Comparison of Different Graph Distance Metrics for Semantic Text Based Classification



Document title: Comparison of Different Graph Distance Metrics for Semantic Text Based Classification
Journal: Polibits
Database: PERIÓDICA
System number: 000376480
ISSN: 1870-9044
Authors: 1
1
2
2
Institutions: 1 Jadavpur University, Computer Science and Engineering Department, Kolkata, West Bengal, India
2 Universidade de Evora, Departamento de Ciencia da Computacao, Evora, Portugal
Year:
Season: Jan-Jun
Number: 49
Pages: 51-58
Country: Mexico
Language: English
Document type: Article
Approach: Analytical, descriptive
English abstract: Semantic information is now widely used for text classification in place of bag-of-words approaches, because bag-of-words approaches cannot represent certain kinds of documents appropriately. Semantic information can be represented through feature vectors or graphs. Graphs are normally better than traditional feature vectors because they are a more powerful data structure. However, very few methodologies exist in the literature for graph-based semantic representation. Error-tolerant graph matching techniques, such as graph similarity measures, can be used for text classification. However, computing the Maximum Common Subgraph (mcs) and the Minimum Common Supergraph (MCS), on which such measures rely, is NP-hard. In the present paper, summarized texts are used during extraction of semantic information to make the computation faster. The semantic information of the texts is represented through discourse representation structures and later transformed into graphs. Five different graph distance measures based on the Maximum Common Subgraph (mcs) and the Minimum Common Supergraph (MCS) are used with a k-NN classifier to evaluate the text classification task. The text documents are taken from the Reuters-21578 text database and are distributed over 20 classes. Ten documents of each class are used for both training and testing purposes. The results show that the five measures have roughly equivalent potential for text classification and perform as well as traditional bag-of-words approaches.
Disciplines: Computer science
Keyword: Computer science,
Data processing,
Text classification,
Semantics,
Graphs