Journal: | Polibits |
Database: | PERIÓDICA |
System number: | 000359002 |
ISSN: | 1870-9044 |
Authors: | Turchi, Marco¹; Ehrmann, Maud¹ |
Institutions: | ¹GlobSec, European Commission, Joint Research Centre, Ispra, Varese, Italy |
Year: | 2011 |
Season: | Jan-Jun |
Number: | 43 |
Country: | Mexico |
Language: | English |
Document type: | Article |
Approach: | Analytical, descriptive |
English abstract: | The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on its parallel training data: phrases that do not appear in the training data are not translated correctly. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system by using external morphological resources instead of adding more parallel data. A set of new phrase associations is added to the translation and reordering models; each new association corresponds to a morphological variation of the source phrase, the target phrase, or both, of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En–Fr and Fr–En translation, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction in out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model. |
Disciplines: | Computer science, Literature and linguistics |
Keywords: | Data processing, Applied linguistics, Computational linguistics, Translation systems, Machine learning, Morphosyntax |