Comparison of Clustering Algorithms for the Identification of Topics on Twitter



Document title: Comparison of Clustering Algorithms for the Identification of Topics on Twitter
Journal: Latin-American Journal of Computing (LAJC)
Database:
System number: 000565061
ISSN: 1390-9134
Authors: 1
1
Institutions: 1University of Technology of Paraná,
Year:
Volume: 3
Number: 1
Pages: 19-26
Country: Ecuador
Language: English
English abstract: Topic identification in social networks has become an important task in event detection, particularly when global communities are affected. Text processing techniques and machine learning algorithms have been used extensively to address this problem. In this paper we compare four clustering algorithms – k-means, k-medoids, DBSCAN and NMF (Non-negative Matrix Factorization) – for detecting topics in textual messages obtained from Twitter. The algorithms were applied to a database initially composed of tweets with hashtags related to the recent Nepal earthquake as initial context. The results suggest that NMF performs best, producing simpler clusters that are also easier to interpret.
Keywords: Twitter topic identification,
NMF algorithm,
clustering algorithms,
text processing
Full text: Full text (View PDF)