Journal: | Computación y sistemas |
Database: | |
System number: | 000560631 |
ISSN: | 1405-5546 |
Authors: | Nunsanga, Morrel V. L. (1); Pakray, Partha (2); Lallawmsanga, C. (1); Singh, L. Lolit Kumar (3) |
Institutions: | (1) Mizoram University, Department of Information Technology, India; (2) National Institute of Technology Silchar, Department of Computer Science and Engineering, India; (3) Mizoram University, Department of Electronics and Communication Engineering, India |
Year: | 2021 |
Season: | Oct-Dec |
Volume: | 25 |
Number: | 4 |
Pages: | 803-812 |
Country: | México |
Language: | English |
English abstract | Part-of-speech (POS) tagging assigns a class, or tag, to each token in a sentence. The tag is usually the word's part of speech, but it can be any other class of interest. Many Natural Language Processing (NLP) applications require POS tagging as a prerequisite. This study presents a part-of-speech tagger for the under-resourced Mizo language, built on a stochastic model, the Conditional Random Field (CRF). The CRF is a discriminative probabilistic classifier that considers both the context of a given word and the tag transition probabilities in the training dataset. A corpus of approximately 30,000 words was collected and manually annotated with the proposed tagset for system evaluation. Across various sizes of training and test sets, the tagger achieved 89.46% accuracy, 89.3% F1-score, 89.42% precision, and 89.48% recall. |
Keyword: | Mizo POS tagging, Conditional random field, Mizo part of speech tagger, Computational linguistics |
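The abstract describes a CRF tagger that uses the context of each word. As a rough illustration only, the sketch below shows one common way to turn a tokenized sentence into per-token context features of the kind CRF toolkits typically accept, plus a token-level accuracy helper. The feature names, the window of one word on each side, the helper names `token_features`, `sentence_features`, and `accuracy`, and the English placeholder tokens are all assumptions made for this example; the record does not state which features or tools the authors used.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i.

    Hypothetical feature set: the word itself, simple affixes, and the
    neighbouring words. This is not the feature set from the paper.
    """
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "prefix2": word[:2],
        "suffix3": word[-3:],
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
    }
    # Left context, or a beginning-of-sentence marker.
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    # Right context, or an end-of-sentence marker.
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dictionaries for every token in one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def accuracy(gold: List[str], predicted: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


if __name__ == "__main__":
    # Placeholder sentence, not taken from the Mizo corpus.
    for feats in sentence_features(["The", "tagger", "works"]):
        print(feats)
```

A CRF trained on such feature sequences also learns tag-to-tag transition weights, which corresponds to the second source of information the abstract mentions.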