Segmentation Strategies to Face Morphology Challenges in Brazilian-Portuguese/English Statistical Machine Translation and Its Integration in Cross-Language Information Retrieval



Document title: Segmentation Strategies to Face Morphology Challenges in Brazilian-Portuguese/English Statistical Machine Translation and Its Integration in Cross-Language Information Retrieval
Journal: Computación y sistemas
Database: PERIÓDICA
System number: 000383499
ISSN: 1405-5546
Authors: 1
Institutions: 1 Universidade de Sao Paulo, Instituto de Matematica e Estatistica, Sao Paulo, Brazil
Year:
Period: Apr-Jun
Volume: 19
Issue: 2
Pagination: 357-350
Country: Mexico
Language: English
Document type: Article
Approach: Analytical, descriptive
English abstract: The use of morphology is particularly interesting in statistical machine translation as a way to reduce data sparseness and compensate for limited training corpora. In this work, we propose several approaches to introduce morphological knowledge into a standard phrase-based machine translation system. We provide word segmentation using two different tools (COGROO and MORFESSOR), which reduce vocabulary size and data sparseness. We then add to these segmentations the morphological information of a POS language model. We combine all these approaches using a Minimum Bayes Risk strategy. Experiments show significant improvements of the enhanced system over the baseline system on the Brazilian-Portuguese/English language pair. Finally, we report a case study of the impact of enhancing the statistical machine translation system with morphology in a cross-language application such as ONAIR, which allows users to search for information in video fragments through natural-language queries.
Disciplines: Computer science,
Library and information science
Keywords: Artificial intelligence,
Information technology,
Information retrieval,
Machine translation