Journal: | Computación y sistemas |
Database: | |
System number: | 000560363 |
ISSN: | 1405-5546 |
Authors: | Abdellaoui, Houssem (1); Zrigui, Mounir (2) |
Institutions: | (1) Universite de Tunis, National High School of Engineering of Tunis, Tunis, Tunisia; (2) University of Monastir, Faculte des Sciences de Monastir, Monastir, Tunisia |
Year: | 2018 |
Period: | Jul-Sep |
Volume: | 22 |
Issue: | 3 |
Pages: | 777-786 |
Country: | Mexico |
Language: | English |
Document type: | Article |
English abstract: | Our paper presents a distant supervision algorithm for automatically collecting and labeling 'TEAD', a dataset for Arabic Sentiment Analysis (SA), using emojis and sentiment lexicons. The data was gathered from Twitter between the 1st of June and the 30th of November 2017. Although the idea of using emojis to collect and label training data for SA is not novel, getting this approach to work for Arabic dialect was very challenging. We ended up with more than 6 million tweets labeled as Positive, Negative or Neutral. We present the algorithm used to deal with mixed-content tweets (Modern Standard Arabic, MSA, and Dialect Arabic, DA). We also provide properties and statistics of the dataset alongside experimental results. Our experiments covered a wide range of standard classifiers proven to be effective for the sentiment classification problem. |
Disciplines: | Computer science |
Spanish keywords: | Inteligencia artificial, Análisis de sentimientos, Dialecto árabe, Emojis, Procesamiento de lenguaje natural |
Keywords: | Emojis, Natural language processing, Twitter, Sentiment analysis, Arabic dialect, Artificial intelligence |