Journal: | Computación y sistemas |
Database: | |
System number: | 000607915 |
ISSN: | 1405-5546 |
Authors: | Castillo, Esteban (1); Cervantes, Ofelia (2) |
Institutions: | (1) Tecnológico de Monterrey, Escuela de Ingeniería y Ciencias, Mexico; (2) Universidad de las Américas Puebla, Department of Computer Science, Mexico |
Year: | 2024 |
Period: | Apr-Jun |
Volume: | 28 |
Issue: | 2 |
Pages: | 489-505 |
Country: | Mexico |
Language: | English |
English abstract: | This paper presents a text mining approach for extracting valuable patterns from social media documents in the context of U.S. immigration. The paper focuses on uncovering statistical features alongside linguistic elements using graph techniques. Graphs provide rich data structures for representing lexical and syntactic aspects of texts, allowing the discovery of complex patterns that, when used by experts, could provide valuable insight. The proposed method is applied to a Twitter/X and Reddit dataset comprising English and Spanish samples from 2016 to 2019. Experimental results show that our interpretation of classic statistical techniques provides a baseline understanding of the topic, while a more robust graph-based analysis makes it possible to uncover and predict hidden patterns across a large number of samples. In particular, a co-occurrence graph helped to obtain relevant words, phrases, and sentences, while a user-interaction graph made it possible to detect important users, communities, and the interactions among them. |
Keywords: | Text mining, Statistics, Graph mining, Social network analysis, Natural language processing, Big data |
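The abstract mentions a co-occurrence graph used to surface relevant words. The record does not describe how the authors build it, so the following is only a minimal sketch of the general technique under stated assumptions: whitespace tokenization, a sliding window of two following tokens, and weighted degree as a rough relevance score. The function names (`cooccurrence_graph`, `weighted_degree`) and the toy sentences are hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter


def cooccurrence_graph(docs, window=2):
    """Build an undirected weighted co-occurrence graph as an edge Counter.

    Two words are linked when one appears within `window` tokens after the
    other in the same document; the edge weight counts how often that happens.
    Assumption: plain lowercase whitespace tokenization, no stop-word removal.
    """
    edges = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        for i, left in enumerate(tokens):
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    # Sort the pair so (a, b) and (b, a) share one edge.
                    edges[tuple(sorted((left, right)))] += 1
    return edges


def weighted_degree(edges):
    """Sum incident edge weights per word, a simple relevance proxy."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Invented toy documents, used only to exercise the functions.
docs = [
    "border wall funding debate",
    "immigration policy debate",
    "border policy reform",
]
edges = cooccurrence_graph(docs, window=2)
degree = weighted_degree(edges)
```

Calling `degree.most_common(3)` ranks words by how strongly they are connected. A real pipeline would likely add proper tokenization, stop-word filtering, and a graph library for centrality or community detection; those choices are left open here.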