Social Media - Processing Romanian Chat and Discourse Analysis



Document title: Social Media - Processing Romanian Chat and Discourse Analysis
Journal: Computación y sistemas
Database: PERIÓDICA
System number: 000411059
ISSN: 1405-5546
Authors: 1
1
1
Institutions: 1 Al. I. Cuza University, Faculty of Computer Science, Iasi, Romania
Year:
Period: Jul-Sep
Volume: 20
Issue: 3
Pages: 405-414
Country: Mexico
Language: English
Document type: Article
Approach: Experimental, applied
English abstract: To obtain a balanced corpus, a sub-corpus of 2,576 sentences illustrating contemporary social media language has been added to the Dependency Treebank for Romanian. The texts were taken from chat. This paper describes the second step of processing these non-standard texts with a hybrid POS tagger for Romanian and with a Malt parser, both of which had until now been trained on standard language and on other styles of communication. The results show that the UAIC tools are comparable with tools for other languages trained on similar corpora. A further aim is to develop the Dependency Treebank for Romanian not only quantitatively, doubling its size in a year, but also by converting it to a new format compatible with similar corpora for other languages and by adding new, more complex annotation layers. A semantic layer and a discourse annotation layer will be added, permitting the study of discursive and conversational particularities. Finally, examples illustrating discursive particularities of chat communication are discussed.
Disciplines: Computer science,
Literature and linguistics
Keywords (Spanish): Procesamiento de datos,
Análisis del discurso,
Lingüística aplicada,
Lingüística computacional,
Procesamiento de textos,
Redes sociales
Keywords: Computer science,
Literature and linguistics,
Data processing,
Applied linguistics,
Discourse analysis,
Computational linguistics,
Text processing,
Social networks