Semantic Textual Similarity Methods, Tools, and Applications: A Survey

Document title: Semantic Textual Similarity Methods, Tools, and Applications: A Survey
Journal: Computación y sistemas
Database: PERIÓDICA
System number: 000410216
ISSN: 1405-5546
Authors: 1, 1, 2, 3
Institutions: 1 National Institute of Technology Mizoram, Aizawl, Mizoram, India
2 Instituto Politécnico Nacional, Centro de Investigación en Computación, Mexico City, Mexico
3 Benemérita Universidad Autónoma de Puebla, Facultad de Ciencias de la Computación, Puebla, Mexico
Year:
Period: Oct-Dec
Volume: 20
Issue: 4
Pages: 647-665
Country: Mexico
Language: English
Document type: Article
Approach: Experimental, applied
English abstract: Measuring Semantic Textual Similarity (STS) between words/terms, sentences, paragraphs, and documents plays an important role in computer science and computational linguistics. It also has many applications in fields such as Biomedical Informatics and Geoinformation. In this paper, we present a survey of different methods of textual similarity, and we also report on the availability of software and tools that are useful for STS. In natural language processing (NLP), STS is an important component of many tasks, such as document summarization, word sense disambiguation, short answer grading, and information retrieval and extraction. We divide the measures of semantic similarity into three broad categories: (i) Topological/Knowledge-based, (ii) Statistical/Corpus-based, and (iii) String-based. More emphasis is given to the methods related to the WordNet taxonomy, because topological methods play an important role in understanding the intended meaning of an ambiguous word, which is very difficult to process computationally. We also propose a new method for measuring semantic similarity between sentences. The proposed method exploits the advantages of taxonomy-based methods and merges this information into a language model. It uses WordNet synsets for the lexical relationships between nodes/words, and a unigram language model built over a large corpus assigns the information content value between two nodes of different classes.
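The abstract's central idea, weighting taxonomy relations by corpus-derived information content (IC), can be illustrated with a small sketch. This is not the authors' proposed method: it implements the classic Resnik and Lin IC-based measures over a hypothetical seven-concept hierarchy (`PARENT`) standing in for WordNet synsets, estimates IC from unigram counts with add-one smoothing (an assumption), and uses a simple best-match averaging heuristic for sentences (`sentence_similarity`, also hypothetical).

```python
import math
from collections import Counter

# Hypothetical toy "is-a" hierarchy standing in for WordNet synsets.
# Each concept maps to its parent; the root maps to None.
PARENT = {
    "entity": None,
    "animal": "entity", "vehicle": "entity",
    "dog": "animal", "cat": "animal",
    "car": "vehicle", "bus": "vehicle",
}


def ancestors(concept):
    """The concept itself followed by every ancestor up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def information_content(corpus_tokens):
    """IC(c) = log(N / freq(c)), i.e. -log p(c).

    freq(c) is the unigram count of c plus the counts of every concept it
    subsumes (counts propagate up the hierarchy). Add-one smoothing (an
    assumption of this sketch) keeps unseen concepts finite.
    """
    unigram = Counter(t for t in corpus_tokens if t in PARENT)
    freq = Counter()
    for concept in PARENT:
        own = unigram[concept] + 1  # add-one smoothing
        for anc in ancestors(concept):
            freq[anc] += own
    root = next(c for c, p in PARENT.items() if p is None)
    total = freq[root]  # the root subsumes every counted occurrence
    return {c: math.log(total / f) for c, f in freq.items()}


def resnik(c1, c2, ic):
    """Resnik: IC of the most informative common subsumer."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(ic[c] for c in common)


def lin(c1, c2, ic):
    """Lin: 2 * IC(lcs) / (IC(c1) + IC(c2)), a score in [0, 1]."""
    denom = ic[c1] + ic[c2]
    return 1.0 if denom == 0 else 2 * resnik(c1, c2, ic) / denom


def sentence_similarity(s1, s2, ic):
    """Average best-match Lin score in both directions (a common heuristic)."""
    w1 = [w for w in s1.lower().split() if w in PARENT]
    w2 = [w for w in s2.lower().split() if w in PARENT]
    if not w1 or not w2:
        return 0.0

    def directed(a, b):
        return sum(max(lin(x, y, ic) for y in b) for x in a) / len(a)

    return (directed(w1, w2) + directed(w2, w1)) / 2


corpus = "the dog chased the cat while a car and a bus passed the dog".split()
ic = information_content(corpus)
print(round(lin("dog", "cat", ic), 3))
print(round(lin("dog", "car", ic), 3))
print(sentence_similarity("a dog and a cat", "a cat and a dog", ic))
```

Here `dog` and `cat` share the informative subsumer `animal`, so their Lin score is positive, while `dog` and `car` meet only at the root, whose IC is 0, giving a score of 0. A realistic version would map words to actual WordNet synsets (for example through NLTK's WordNet interface) and estimate the counts over a large corpus, as the abstract describes.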
Disciplines: Computer science
Keywords: Computer science,
Data processing,
Text analysis,
Natural language processing,
Semantic similarity,
Information content