Journal: | Polibits |
Database: | PERIÓDICA |
System number: | 000374542 |
ISSN: | 1870-9044 |
Authors: | Neunerdt, Melanie¹; Reyer, Michael¹; Mathar, Rudolf¹ |
Institutions: | ¹RWTH Aachen University, Institute for Theoretical Information Technology, Werringerweg, Aachen, Germany |
Year: | 2013 |
Period: | Jul-Dec |
Issue: | 48 |
Pages: | 61-68 |
Country: | Mexico |
Language: | English |
Document type: | Article |
Approach: | Applied, descriptive |
English abstract: | The use of social media tools such as blogs and forums has become increasingly popular in recent years. Hence, a large collection of social media texts from different communities is available for accessing user opinions, e.g., for marketing studies or acceptance research. Typically, methods from Natural Language Processing are applied to social media texts to automatically recognize user opinions. A fundamental component of the linguistic pipeline in Natural Language Processing is Part-of-Speech tagging. Most state-of-the-art Part-of-Speech taggers are trained on newspaper corpora, which differ in many ways from non-standardized social media text. Consequently, applying common taggers to such texts results in performance degradation. In this paper, we present extensions to a basic Markov model tagger for the annotation of social media texts. Following the German standard Stuttgart/Tübinger TagSet (STTS), we distinguish 54 tag classes. Applying our approach considerably improves the tagging accuracy for social media texts when we train our model on a combination of annotated texts from newspapers and Web comments. |
Disciplines: | Computer science |
Keywords: | Computer science, Artificial intelligence, Natural language processing, Text tagging, Blogs, Text mining |
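The abstract names a "basic Markov model tagger" as the starting point for the authors' extensions. As a rough illustration only, the sketch below shows a generic bigram hidden Markov model tagger decoded with the Viterbi algorithm; it does not implement the extensions described in the paper. The toy training sentences, the four-tag STTS subset (PPER, VVFIN, ART, NN), add-one smoothing of transitions, and the fixed unknown-word probability are all assumptions chosen for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus annotated with a small subset of STTS tags:
# PPER = personal pronoun, VVFIN = finite full verb, ART = article, NN = noun.
TRAIN = [
    [("ich", "PPER"), ("lese", "VVFIN"), ("das", "ART"), ("Forum", "NN")],
    [("du", "PPER"), ("magst", "VVFIN"), ("den", "ART"), ("Blog", "NN")],
    [("ich", "PPER"), ("mag", "VVFIN"), ("den", "ART"), ("Kommentar", "NN")],
]
START = "<s>"  # pseudo-tag for the sentence start


def train(sentences):
    """Count tag bigrams (transitions) and word/tag pairs (emissions)."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] -> count
    emit = defaultdict(Counter)   # emit[tag][word] -> count
    for sent in sentences:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
    return sorted(emit), trans, emit


def log_trans(trans, tags, prev, tag):
    # Add-one (Laplace) smoothing over the tag set (an assumption).
    counts = trans[prev]
    return math.log((counts[tag] + 1) / (sum(counts.values()) + len(tags)))


def log_emit(emit, tag, word, unk=1e-6):
    # Unseen words get the same small probability under every tag, so
    # their tag is decided by the surrounding transitions alone.
    counts = emit[tag]
    c = counts[word.lower()]
    return math.log(c / sum(counts.values())) if c else math.log(unk)


def viterbi(words, tags, trans, emit):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    # best[i][t] = (log score of best path ending in tag t at i, backpointer)
    best = [{t: (log_trans(trans, tags, START, t)
                 + log_emit(emit, t, words[0]), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev_t, score = max(
                ((p, best[i - 1][p][0] + log_trans(trans, tags, p, t))
                 for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + log_emit(emit, t, words[i]), prev_t)
        best.append(column)
    # Backtrack from the highest-scoring final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


if __name__ == "__main__":
    tags, trans, emit = train(TRAIN)
    # "Thread" never occurs in TRAIN; the ART -> NN transition tags it as NN.
    print(viterbi(["ich", "lese", "den", "Thread"], tags, trans, emit))
    # → ['PPER', 'VVFIN', 'ART', 'NN']
```

The unknown-word line hints at why newspaper-trained taggers degrade on Web comments: when many tokens are out of vocabulary, such a model must rely on context alone, which is the kind of weakness that training on added social media data is meant to reduce.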