Comparing Pre-Trained Language Model for Arabic Hate Speech Detection

Document title: Comparing Pre-Trained Language Model for Arabic Hate Speech Detection
Journal: Computación y Sistemas
Database:
System number: 000607892
ISSN: 1405-5546
Authors:
Institutions: ¹Echahid Cheikh Larbi Tebessi University, Laboratory of Vision and Artificial Intelligence, Algeria
²Mohamed Khider University of Biskra, Faculty of Sciences and Technology, Algeria
Year:
Period: Apr-Jun
Volume: 28
Issue: 2
Pages: 681-693
Country: Mexico
Language: English
English abstract: Today, the classification of hate speech in Arabic tweets has garnered significant attention from scholars worldwide. Although numerous classification approaches have been proposed in response to this interest, two primary challenges persist: reliance on handcrafted features and limited performance rates. This paper addresses the task of identifying Arabic hate speech on Twitter, aiming to deepen insights into the efficacy of novel machine-learning techniques. Specifically, we compare the performance of traditional machine-learning approaches with state-of-the-art pre-trained language models based on transfer learning, as well as deep learning models. Our experiments, conducted on a benchmark dataset using a standard evaluation scenario, reveal several key findings. First, multidialectal pre-trained language models demonstrate superior performance compared to monolingual and multilingual variants. Second, fine-tuning the pre-trained large language models significantly enhances the accuracy of hate speech classification in Arabic tweets. Our primary contribution lies in achieving promising results for this task through the application of multidialectal pre-trained language models trained on Twitter data.
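
As an illustration of the fine-tuning recipe the abstract describes, the following minimal Python sketch fine-tunes a pre-trained Arabic language model as a binary hate-speech classifier with the Hugging Face transformers library. It is not the authors' pipeline: the checkpoint UBC-NLP/MARBERT is only one example of a multidialectal model pre-trained on Arabic tweets, the two-tweet dataset is a toy placeholder, and the hyperparameters are common defaults rather than values from the paper.

from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# One example of a multidialectal Arabic model pre-trained on tweets;
# the checkpoint choice here is illustrative, not taken from the paper.
MODEL = "UBC-NLP/MARBERT"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Toy placeholder data; a real run would load a labeled benchmark of
# Arabic tweets with binary hate / not-hate annotations.
train_data = Dataset.from_dict({
    "text": ["tweet text 1", "tweet text 2"],
    "label": [1, 0],  # 1 = hate speech, 0 = not hate speech
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

# Common fine-tuning defaults, not the paper's hyperparameters.
args = TrainingArguments(
    output_dir="arabic-hate-speech-model",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

# Fine-tuning updates all encoder weights on the labeled task data,
# which is the step the abstract credits for the accuracy gains.
Trainer(model=model, args=args, train_dataset=train_data).train()

The same loop applies unchanged to monolingual (e.g., AraBERT) or multilingual checkpoints, which makes the head-to-head comparison the abstract reports straightforward: only the MODEL string changes.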
Keywords: Arabic hate speech detection, fine-tuning, transfer learning, AraBERT