A Domain Specific Parallel Corpus and Enhanced English-Assamese Neural Machine Translation

Laskar, Sahinur Rahman; Manna, Riyanka; Pakray, Partha; Bandyopadhyay, Sivaji


Document title:	A Domain Specific Parallel Corpus and Enhanced English-Assamese Neural Machine Translation
Journal:	Computación y sistemas
Database:
System number:	000560747
ISSN:	1405-5546
Authors:	Laskar, Sahinur Rahman¹ Manna, Riyanka² Pakray, Partha¹ Bandyopadhyay, Sivaji¹
Institutions:	¹National Institute of Technology Silchar, Department of Computer Science and Engineering, Assam. India ²Adamas University, Department of Computer Science and Engineering, Kolkata, West Bengal. India
Year:	2022
Period:	Oct-Dec
Volume:	26
Issue:	4
Pages:	1669-1687
Country:	Mexico
Language:	English
Document type:	Article
English abstract:	Machine translation is the automatic translation of text from one natural language to another. Neural machine translation is a widely accepted corpus-based approach to machine translation. However, it requires an adequate amount of training data, and domain-wise parallel corpora are needed to improve translation performance and coverage across domains. In this work, we prepare a domain-specific parallel corpus for the low-resource English-Assamese language pair, covering the Agriculture, Government Office, Judiciary, Social Media, Tourism, COVID-19, Sports, and Literature domains. We tackle data scarcity and word-order divergence through data augmentation and a prior alignment concept. We also contribute an Assamese pretrained language model and Assamese word embeddings built from Assamese monolingual data, along with a bilingual dictionary-based post-processing step, to enhance transformer-based neural machine translation. We achieve state-of-the-art results in both the forward (English-to-Assamese) and backward (Assamese-to-English) translation directions.
Disciplines:	Computer science
Keywords:	Data processing, Artificial intelligence