Comparison of Different Graph Distance Metrics for Semantic Text Based Classification



Document title: Comparison of Different Graph Distance Metrics for Semantic Text Based Classification
Journal: Polibits
Database: PERIÓDICA
System number: 000376480
ISSN: 1870-9044
Authors: 1
1
2
2
Institutions: 1 Jadavpur University, Computer Science and Engineering Department, Kolkata, West Bengal, India
2 Universidade de Evora, Departamento de Ciencia da Computacao, Evora, Portugal
Year:
Period: Jan-Jun
Issue: 49
Pages: 51-58
Country: Mexico
Language: English
Document type: Article
Approach: Analytical, descriptive
Abstract: Semantic information is now widely used for text classification in place of bag-of-words approaches, because bag-of-words approaches cannot represent certain kinds of documents appropriately. Semantic information can be represented through feature vectors or graphs. Of the two, a graph is normally better than a traditional feature vector because it is a more powerful data structure. However, very few methodologies exist in the literature for the semantic representation of text as graphs. Error-tolerant graph matching techniques such as graph similarity measures can be used for text classification. However, graph similarity measures based on the Maximum Common Subgraph (mcs) and the Minimum Common Supergraph (MCS) are computationally NP-hard. In the present paper, summarized texts are used during the extraction of semantic information to reduce the computational cost. The semantic information of the texts is represented through discourse representation structures and later transformed into graphs. Five different graph distance measures based on the Maximum Common Subgraph (mcs) and the Minimum Common Supergraph (MCS) are used with a k-NN classifier to evaluate the text classification task. The text documents are taken from the Reuters-21578 text database and are distributed over 20 classes. Ten documents of each class are used for both training and testing. The results show that the techniques have roughly equivalent potential for text classification and perform as well as traditional bag-of-words approaches.
Disciplines: Computer science
Keywords: Computer science,
Data processing,
Text classification,
Semantics,
Graphics
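The abstract mentions mcs-based graph distances combined with a k-NN classifier but does not list the five measures. As a rough illustration only, the sketch below uses one widely cited mcs-based distance, d(G1, G2) = 1 - |mcs(G1, G2)| / max(|G1|, |G2|). It rests on several assumptions that are not taken from the record: every node carries a unique label, so the maximum common subgraph reduces to a set intersection and is cheap to compute; graph size is counted as nodes plus edges; and the names `mcs_distance` and `knn_classify` are hypothetical helpers, not the authors' implementation.

```python
from collections import Counter

# A graph is a pair (nodes, edges): a set of unique node labels and a set
# of (source, target) label pairs. Unique labels are an assumption here.

def graph_size(nodes, edges):
    # Size convention assumed for this sketch: node count plus edge count.
    return len(nodes) + len(edges)

def mcs_distance(g1, g2):
    """Distance 1 - |mcs| / max(|g1|, |g2|) for uniquely labelled graphs."""
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    # With unique labels, the maximum common subgraph is the intersection.
    common = graph_size(nodes1 & nodes2, edges1 & edges2)
    largest = max(graph_size(nodes1, edges1), graph_size(nodes2, edges2))
    if largest == 0:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - common / largest

def knn_classify(query, training, k=3):
    """Majority vote over the k training graphs closest to the query."""
    ranked = sorted(training, key=lambda item: mcs_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy graphs with made-up labels, for illustration only.
g_a = ({"bank", "raise", "rate"}, {("bank", "raise"), ("raise", "rate")})
g_b = ({"bank", "cut", "rate"}, {("bank", "cut"), ("cut", "rate")})
g_c = ({"farm", "export", "wheat"}, {("farm", "export"), ("export", "wheat")})

training_set = [(g_a, "interest"), (g_c, "grain")]
predicted = knn_classify(g_b, training_set, k=1)
```

In this toy setup `g_b` shares two nodes with `g_a` and nothing with `g_c`, so the single nearest neighbour is `g_a`. For graphs without unique node labels the intersection shortcut does not apply, and finding the maximum common subgraph is the NP-hard problem the abstract refers to.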