Dependency vs. Constituent Based Syntactic N-Grams in Text Similarity Measures for Paraphrase Recognition



Document title: Dependency vs. Constituent Based Syntactic N-Grams in Text Similarity Measures for Paraphrase Recognition
Journal: Computación y sistemas
Database: PERIÓDICA
System number: 000379432
ISSN: 1405-5546
Authors: 1
1
1
Institutions: 1Instituto Politécnico Nacional, Centro de Investigación en Computación, México, Distrito Federal, Mexico
Year:
Period: Jul-Sep
Volume: 18
Issue: 3
Pages: 517-554
Country: Mexico
Language: English
Document type: Article
Approach: Analytical, descriptive
Abstract (English): Paraphrase recognition consists of detecting whether an expression restated as another expression contains the same information. Lexical, syntactic, and semantic techniques have traditionally been used to solve this problem. To measure word overlap, most works use n-grams; however, syntactic n-grams have scarcely been explored. We propose using syntactic dependency and constituent n-grams combined with common NLP techniques such as stemming, synonym detection, similarity measures, linear combination, and a similarity matrix built in turn from syntactic n-grams. We measure and compare the performance of our system using the Microsoft Research Paraphrase Corpus. An in-depth study presents the strengths and weaknesses of each approach, along with a common error-analysis section. Our main motivation was to determine which syntactic approach performs better on this task: syntactic dependency n-grams or syntactic constituent n-grams. We also compare both approaches with traditional n-grams and state-of-the-art systems.
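The abstract's central idea, taking n-grams along paths of a syntactic tree rather than along surface word order, can be illustrated with a minimal sketch. It assumes hand-written dependency trees encoded as a map from a head word's index to its dependents' indices, covers only dependency n-grams (not constituent n-grams), and uses Jaccard overlap as a stand-in similarity measure. The function names, the tree encoding, and the choice of Jaccard are hypothetical illustrations, not the paper's method.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: head word index -> list of dependent word indices.
# In practice a dependency parser would produce this; here it is written by hand.
Tree = Dict[int, List[int]]
Gram = Tuple[str, ...]


def syntactic_ngrams(words: List[str], tree: Tree, n: int) -> Set[Gram]:
    """Collect dependency paths of exactly n words, following head -> dependent arcs."""
    grams: Set[Gram] = set()

    def walk(node: int, path: List[int]) -> None:
        path = path + [node]
        if len(path) == n:
            grams.add(tuple(words[i] for i in path))
            return
        for child in tree.get(node, []):
            walk(child, path)

    # Start a path at every word so paths that begin below the root are counted too.
    for start in range(len(words)):
        walk(start, [])
    return grams


def linear_ngrams(words: List[str], n: int) -> Set[Gram]:
    """Traditional n-grams taken over surface word order, for comparison."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(a: Set[Gram], b: Set[Gram]) -> float:
    """Jaccard overlap of two n-gram sets (assumed measure; 0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# "the cat ate fish": ate -> {cat, fish}, cat -> {the}
s1 = ["the", "cat", "ate", "fish"]
t1: Tree = {2: [1, 3], 1: [0]}
# "the black cat ate fish": ate -> {cat, fish}, cat -> {the, black}
s2 = ["the", "black", "cat", "ate", "fish"]
t2: Tree = {3: [2, 4], 2: [0, 1]}

print(overlap(syntactic_ngrams(s1, t1, 2), syntactic_ngrams(s2, t2, 2)))  # 0.75
print(overlap(linear_ngrams(s1, 2), linear_ngrams(s2, 2)))  # 0.4
```

In this toy pair, inserting the modifier "black" breaks the surface bigram "the cat" but leaves the dependency arcs ate-cat, ate-fish, and cat-the intact, so the syntactic bigram overlap stays higher than the linear one.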
Disciplines: Computer science,
Literature and linguistics
Keywords: Computer science,
Literature and linguistics,
Applied linguistics,
Computational linguistics,
Text analysis,
Paraphrase recognition,
Constituent analysis,
Dependency analysis,
Syntactic n-grams