Mining Ethnic Content Online with Additively Regularized Topic Models



Document title: Mining Ethnic Content Online with Additively Regularized Topic Models
Journal: Computación y sistemas
Database: PERIÓDICA
System number: 000411058
ISSN: 1405-5546
Authors: names not preserved in this record; affiliation indices in order: 2, 1, 1, 1, 3
Institutions: 1. National Research University, Higher School of Economics, St. Petersburg, Russia
2. Moscow State University, Moscow, Russia
3. Moscow Institute of Physics and Technology, Moscow, Russia
Year:
Season: Jul-Sep
Volume: 20
Number: 3
Pages: 387-403
Country: Mexico
Language: English
Document type: Article
Approach: Experimental, applied
English abstract: Social studies of the Internet have adopted large-scale text mining for unsupervised discovery of topics related to specific subjects. A recently developed approach to topic modeling, additive regularization of topic models (ARTM), provides fast inference and, through a wide variety of possible regularizers, more control over the topics than developing LDA extensions does. We apply ARTM to mining ethnic-related content from the Russian-language blogosphere, introduce a new combined regularizer, and compare models derived from ARTM with LDA. Human evaluations show that ARTM is better suited to mining topics on specific subjects, finding more relevant topics of higher or comparable quality.
Disciplines: Computer science,
Literature and linguistics
Keywords: Data processing,
Applied linguistics,
Computational linguistics,
Text mining,
Additive regularization