Part-of-Speech Tagging for Mizo Language Using Conditional Random Field



Document title: Part-of-Speech Tagging for Mizo Language Using Conditional Random Field
Journal: Computación y sistemas
Database:
System number: 000560631
ISSN: 1405-5546
Authors: 1
2
1
3
Institutions: 1. Mizoram University, Department of Information Technology, India
2. National Institute of Technology Silchar, Department of Computer Science and Engineering, India
3. Mizoram University, Department of Electronics and Communication Engineering, India
Year:
Season: Oct-Dec
Volume: 25
Number: 4
Pages: 803-812
Country: Mexico
Language: English
English abstract: Part-of-speech (POS) tagging assigns a class or tag to each token in a sentence. The tag assigned to a word is usually its part of speech, but it can be any other class of interest. Several Natural Language Processing (NLP) applications require POS tagging as a prerequisite. This study presents the development of a part-of-speech tagger for the under-resourced Mizo language using a stochastic model known as the Conditional Random Field (CRF). The CRF is a discriminative probabilistic classifier that considers both the context of a given word and the tag transition probabilities in the training dataset. A corpus of approximately 30,000 words was collected and manually annotated with the proposed tagset for system evaluation. On various sizes of training and test sets, the tagger achieved 89.46% accuracy, 89.3% F1-score, 89.42% precision, and 89.48% recall.
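Illustrative sketch (not part of the original record): the abstract says a CRF combines the context of each word with tag transition scores. The Python below shows one way that combination can be decoded for a linear-chain model. Everything in it is an assumption made for illustration: the feature set in `token_features`, the two-tag set, the hand-set weights, and the English placeholder tokens. None of it is taken from the paper's tagset, corpus, or implementation, and the training step that would learn the weights is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical two-tag set and hand-set weights (a trained CRF would learn these).
TAGS = ["NOUN", "VERB"]
EMIT_W: Dict[Tuple[str, str], float] = {
    ("word=birds", "NOUN"): 2.0,
    ("word=sing", "VERB"): 1.5,
    ("word=sing", "NOUN"): 1.0,
}
TRANS_W: Dict[Tuple[str, str], float] = {
    ("NOUN", "VERB"): 1.0,
    ("NOUN", "NOUN"): -1.0,
}


def token_features(tokens: List[str], i: int) -> List[str]:
    """Context features for token i (an assumed, minimal feature set)."""
    word = tokens[i]
    feats = [f"word={word.lower()}", f"suffix2={word[-2:].lower()}"]
    if word[:1].isupper():
        feats.append("is_capitalized")
    # Neighbouring words supply the context the abstract refers to.
    feats.append(f"prev={tokens[i - 1].lower()}" if i > 0 else "BOS")
    feats.append(f"next={tokens[i + 1].lower()}" if i < len(tokens) - 1 else "EOS")
    return feats


def viterbi(tokens, tags, emit_w, trans_w) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain model."""
    if not tokens:
        return []

    def emit(i: int, tag: str) -> float:
        # Sum the weights of all (feature, tag) pairs active at position i.
        return sum(emit_w.get((f, tag), 0.0) for f in token_features(tokens, i))

    # best[i][tag] = (best score of any path ending in tag at i, previous tag)
    best = [{t: (emit(0, t), None) for t in tags}]
    for i in range(1, len(tokens)):
        row = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][p][0] + trans_w.get((p, t), 0.0), p) for p in tags
            )
            row[t] = (score + emit(i, t), prev)
        best.append(row)

    # Backtrack from the best final tag to recover the full path.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(tokens) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


print(viterbi(["Birds", "sing"], TAGS, EMIT_W, TRANS_W))
```

With these toy weights the transition score for NOUN followed by VERB reinforces the word-level evidence, which is the interaction between context and tag transitions that the abstract describes.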
Keywords: Mizo POS tagging,
Conditional random field,
Mizo part of speech tagger,
Computational linguistics