Named Entity Recognition in Hindi using Maximum Entropy and Transliteration

Kumar Saha, Sujan; Sarathi Ghosh, Partha; Sarkar, Sudeshna; Mitra, Pabitra


Document title:	Named Entity Recognition in Hindi using Maximum Entropy and Transliteration
Journal:	Polibits
Database:	PERIÓDICA
System number:	000368520
ISSN:	1870-9044
Authors:	Kumar Saha, Sujan¹; Sarathi Ghosh, Partha²; Sarkar, Sudeshna¹; Mitra, Pabitra¹
Institutions:	¹Indian Institute of Technology, Department of Computer Science and Engineering, Kharagpur, India; ²HCL Technologies, Bangalore, Karnataka, India
Year:	2008
Period:	Jul-Dec
Issue:	38
Country:	Mexico
Language:	English
Document type:	Article
Approach:	Experimental, applied
English abstract:	Named entities are perhaps the most important indexing elements in text for most information extraction and mining tasks. Constructing a Named Entity Recognition (NER) system becomes challenging if proper resources are not available. Gazetteer lists are often used in the development of NER systems. In many resource-poor languages, gazetteer lists of adequate size are not available, but relevant lists are sometimes available in English. Proper transliteration makes the English lists useful in NER tasks for such languages. In this paper, we describe a Maximum Entropy-based NER system for Hindi. We explore different features applicable to the Hindi NER task. We incorporate some gazetteer lists into the system to improve its performance. These lists are collected from the web and are in English. To make these English lists useful in the Hindi NER task, we propose a two-phase transliteration methodology. A considerable performance improvement is observed after using the transliteration-based gazetteer lists in the system. The proposed transliteration-based gazetteer preparation methodology is also applicable to other languages. Apart from Hindi, we have applied the transliteration approach to the Bengali NER task and also achieved a performance improvement.
Disciplines:	Computer science
Keywords:	Computer science, Data processing, Computational linguistics, Transliteration, Natural language processing, Entity recognition
