EU – CEF (Connecting Europe Facility) / Telecommunications sector
Total eligible costs: 1,883,714.67 EUR
Estimated CEF contribution: 1,412,786.00 EUR
2018-10-01 – 2020-09-30 (24 months)
Research Institute for Linguistics of the Hungarian Academy of Sciences (RILMTA)
The overall objective of the Multilingual Resources for CEF.AT in the legal domain (MARCELL) Action is to enable automatic translation of the body of national legislation (laws, decrees, regulations) in seven countries: Bulgaria, Croatia, Hungary, Poland, Romania, Slovakia and Slovenia. At present, national legislative texts are not automatically available to CEF.AT, and current Machine Translation (MT) systems could be improved if they had access to them.
The Action aims to process two resources available in all seven languages concerned: the multilingual ontology-based thesaurus EUROVOC on the one hand, and the corpora of all national legislation in the respective languages on the other. As a result, the Action will produce the following deliverables:
- Seven large-scale, suitably pre-processed (tokenized and morphologically tagged) monolingual corpora of national legislation documents, classified into EUROVOC topics/descriptors and annotated with the EUROVOC and IATE terms identified in them.
- A comparable corpus in the seven languages, aligned at the level of the topic domains identified by EUROVOC descriptors.
- A Croatian-English parallel corpus consisting of ca. 1,800 legislative documents.
In addition to the expected overall improvement of the MT system in the seven languages concerned, the Action will have an impact on both the e-justice and the Online Dispute Resolution Digital Service Infrastructures (DSIs), as the resources focus on national legislation, which is of direct relevance to both DSIs.