Comparison of Clustering Algorithms for the Identification of Topics on Twitter



Document title: Comparison of Clustering Algorithms for the Identification of Topics on Twitter
Journal: Latin-American Journal of Computing (LAJC)
Database:
System number: 000565061
ISSN: 1390-9134
Authors:
Institutions: University of Technology of Paraná
Year:
Volume: 3
Issue: 1
Pages: 19-26
Country: Ecuador
Language: English
Abstract: Topic identification in social networks has become an important task in event detection, particularly when global communities are affected. To address this problem, text processing techniques and machine learning algorithms have been used extensively. In this paper we compare four clustering algorithms, k-means, k-medoids, DBSCAN and NMF (Non-negative Matrix Factorization), to detect topics in textual messages obtained from Twitter. The algorithms were applied to a database initially composed of tweets carrying hashtags related to the recent Nepal earthquake, which served as the initial context. The results suggest that NMF performs best among the four, producing simpler clusters that are also easier to interpret.
Keywords: Twitter topics identification, NMF algorithm, clustering algorithms, text processing
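The abstract reports NMF as the most interpretable of the four algorithms. As a rough illustration of how NMF can act as a topic clusterer, the following factorizes a term-by-tweet count matrix V into non-negative factors W (term weights per topic) and H (topic weights per tweet), then assigns each tweet to the topic with the largest weight in its column of H. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses raw word counts rather than whatever weighting the paper applies, the Lee and Seung multiplicative updates for the Frobenius loss, a hypothetical document-seeded initialization (farthest-first seeds plus a 0.1 offset), and six invented toy tweets rather than the Nepal earthquake dataset. Only the Python standard library is used.

```python
import math

def build_matrix(docs):
    """Bag-of-words counts: rows are terms, columns are documents."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    V = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.lower().split():
            V[index[w]][j] += 1.0
    return vocab, V

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nmf(V, k, iters=200, eps=1e-9):
    """Factorize V (m x n) ~ W (m x k) @ H (k x n) with multiplicative updates."""
    m, n = len(V), len(V[0])
    docs = transpose(V)
    # Hypothetical init: farthest-first choice of k mutually dissimilar seed documents.
    seeds = [0]
    while len(seeds) < k:
        rest = [j for j in range(n) if j not in seeds]
        seeds.append(min(rest, key=lambda j: max(cosine(docs[j], docs[s]) for s in seeds)))
    # Offset keeps every entry positive so no term weight is locked at zero.
    W = [[docs[s][i] + 0.1 for s in seeds] for i in range(m)]
    H = [[1.0] * n for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        WT = transpose(W)
        num, den = matmul(WT, V), matmul(matmul(WT, W), H)
        H = [[H[r][c] * num[r][c] / (den[r][c] + eps) for c in range(n)] for r in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        HT = transpose(H)
        num, den = matmul(V, HT), matmul(W, matmul(H, HT))
        W = [[W[r][c] * num[r][c] / (den[r][c] + eps) for c in range(k)] for r in range(m)]
    return W, H

def nmf_topics(docs, k, top=3):
    """Cluster label per document (argmax of its H column) and top terms per topic."""
    vocab, V = build_matrix(docs)
    W, H = nmf(V, k)
    labels = [max(range(k), key=lambda t: H[t][j]) for j in range(len(docs))]
    top_terms = [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -W[i][t])[:top]]
                 for t in range(k)]
    return labels, top_terms

# Invented toy tweets (not from the paper's dataset).
tweets = [
    "nepal earthquake rescue",
    "nepal earthquake toll",
    "nepal earthquake aftershock",
    "donate relief fund",
    "donate relief water",
    "donate relief shelter",
]
labels, top_terms = nmf_topics(tweets, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
print(top_terms)
```

Seeding W from mutually dissimilar tweets keeps this toy run deterministic. A realistic pipeline would more likely add TF-IDF weighting and stop-word removal, use a library implementation such as scikit-learn's `NMF`, and choose the number of topics k empirically; the paper's own preprocessing and parameter choices may differ.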