Disentangling the Wikipedia Category Graph for Corpus Extraction

Ngonga Ngomo, Axel Cyrille; Schumacher, Frank


Document title:	Disentangling the Wikipedia Category Graph for Corpus Extraction
Journal:	Polibits
Database:	PERIÓDICA
System number:	000368167
ISSN:	1870-9044
Authors:	Ngonga Ngomo, Axel Cyrille¹; Schumacher, Frank¹
Institutions:	¹University of Leipzig, Department of Computer Science, Leipzig, Germany
Year:	2009
Period:	Jan-Jun
Issue:	39
Country:	Mexico
Language:	English
Document type:	Article
Approach:	Experimental, applied
Abstract (English):	In several areas of research, such as knowledge management and natural language processing, domain-specific corpora are required for tasks such as terminology extraction and ontology learning. The investigations presented herein are based on the assumption that Wikipedia can be used for the purpose of corpus extraction. Wikipedia has the advantage of possessing a semantic layer, which should ease the extraction of domain-specific corpora. Yet, as the Wikipedia category graph is scale-free, it cannot be used as is for these purposes. In this paper, we propose a novel approach to graph clustering called BorderFlow, which we use and evaluate on the Wikipedia category graph. Additional possible applications of these results in the area of information retrieval are also presented.
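The abstract names BorderFlow as a graph clustering approach but does not define it. As a hedged illustration only, the sketch below assumes a local, seed-based scheme: grow a cluster X from one seed node by repeatedly adding the adjacent node that most increases the ratio of edge weight running from X's border into X to edge weight running from that border out of X, and stop when no addition raises the ratio. This scoring rule, the function names (`omega`, `border`, `outside_neighbors`, `flow_ratio`, `grow_cluster`) and the toy graph are assumptions for illustration, not the authors' implementation; the full paper gives the exact definition and tie-breaking rules.

```python
from typing import Dict, Set

# Hypothetical representation: undirected weighted graph as node -> {neighbor: weight}.
Graph = Dict[str, Dict[str, float]]


def omega(g: Graph, xs: Set[str], ys: Set[str]) -> float:
    """Total weight of edges running from nodes in xs to nodes in ys."""
    return sum(w for x in xs for y, w in g[x].items() if y in ys)


def border(g: Graph, cluster: Set[str]) -> Set[str]:
    """Cluster nodes that have at least one neighbor outside the cluster."""
    return {x for x in cluster if any(y not in cluster for y in g[x])}


def outside_neighbors(g: Graph, cluster: Set[str]) -> Set[str]:
    """Nodes outside the cluster that are adjacent to some cluster node."""
    return {y for x in cluster for y in g[x] if y not in cluster}


def flow_ratio(g: Graph, cluster: Set[str]) -> float:
    """Assumed score: border-to-inside weight divided by border-to-outside weight."""
    b = border(g, cluster)
    outflow = omega(g, b, outside_neighbors(g, cluster))
    if outflow == 0:
        return float("inf")  # nothing leaves the cluster: it covers its whole component
    return omega(g, b, cluster) / outflow


def grow_cluster(g: Graph, seed: str) -> Set[str]:
    """Greedily add the neighbor that maximizes flow_ratio; stop when it no longer rises."""
    cluster = {seed}
    current = flow_ratio(g, cluster)
    while True:
        candidates = outside_neighbors(g, cluster)
        if not candidates:
            return cluster
        # sorted() makes tie-breaking deterministic (first candidate in name order wins).
        best = max(sorted(candidates), key=lambda v: flow_ratio(g, cluster | {v}))
        best_ratio = flow_ratio(g, cluster | {best})
        if best_ratio <= current:
            return cluster
        cluster.add(best)
        current = best_ratio


# Hypothetical toy "category graph": two triangles joined by the single bridge edge c-d.
toy_graph: Graph = {
    "a": {"b": 1, "c": 1},
    "b": {"a": 1, "c": 1},
    "c": {"a": 1, "b": 1, "d": 1},
    "d": {"c": 1, "e": 1, "f": 1},
    "e": {"d": 1, "f": 1},
    "f": {"d": 1, "e": 1},
}

print(sorted(grow_cluster(toy_graph, "a")))  # ['a', 'b', 'c']
```

Seeded at `a`, the cluster grows to the triangle {a, b, c}, whose ratio is 2 (two units of weight from the border node `c` into the cluster, one unit out). Absorbing `d` would drop the ratio to 0.5, so growth stops at the bridge. Because each cluster is grown locally from a seed, this style of clustering never needs to partition the whole graph at once.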
Disciplines:	Computer science
Keywords:	Computer science, Data processing, Computational linguistics, Natural language processing, Domain structure, Graph clustering