Journal: | Computación y sistemas |
Database: | PERIÓDICA |
System number: | 000411063 |
ISSN: | 1405-5546 |
Authors: | Jakubina, Laurent (1); Langlais, Philippe (1) |
Institutions: | (1) Université de Montréal, Département d'Informatique, Montréal, Québec, Canada |
Year: | 2016 |
Period: | Jul-Sep |
Volume: | 20 |
Issue: | 3 |
Pages: | 449-458 |
Country: | Mexico |
Language: | English |
Document type: | Article |
Approach: | Experimental, applied |
English abstract: | Identifying translations in comparable corpora is a challenge that has long attracted researchers. It has applications in several areas, including Machine Translation and Cross-lingual Information Retrieval. In this study we compare three state-of-the-art approaches to this task: the so-called context-based projection method, the projection of monolingual word embeddings, and a method dedicated to identifying translations of rare words. We carefully explore the hyper-parameters of each method and measure their impact on the task of identifying the French translations of English words in Wikipedia. Contrary to standard practice, we designed a test case in which we do not resort to heuristics to pre-select the target vocabulary among which to find translations, thereby pushing each method to its limit. We show that all the approaches we tested have a clear bias toward frequent words. In fact, the best approach we tested could identify the translation of a third of a set of frequent test words, while it could translate only around 10% of rare words. |
Disciplines: | Computer science, Literature and linguistics |
Keywords (Spanish): | Procesamiento de datos, Lingüística aplicada, Lingüística computacional, Traducción automática, Inducción de léxico |
Keywords: | Computer science, Literature and linguistics, Data processing, Applied linguistics, Computational linguistics, Machine translation, Lexicon induction |