Paraphrase and Textual Entailment Generation in Czech



Document title: Paraphrase and Textual Entailment Generation in Czech
Journal: Computación y sistemas
Database: PERIÓDICA
System number: 000379433
ISSN: 1405-5546
Authors: 1
Institutions: 1 Masaryk University, Faculty of Informatics, Brno, Jihomoravsky. Czech Republic
Year:
Period: Jul-Sep
Volume: 18
Issue: 3
Pages: 555-568
Country: Mexico
Language: English
Document type: Article
Approach: Analytical, descriptive
English abstract: Paraphrase and textual entailment generation can support natural language processing (NLP) tasks that simulate text understanding, e.g., text summarization, plagiarism detection, or question answering. A paraphrase, i.e., a sentence with the same meaning, conveys a certain piece of information with new words and new syntactic structures. Textual entailment, i.e., an inference that humans will judge most likely true, can employ real-world knowledge in order to make some implicit information explicit. Paraphrases can also be seen as mutual entailments. We present a new system that generates paraphrases and textual entailments from a given text in the Czech language. First, the process is rule-based, i.e., the system analyzes the input text, produces its inner representation, transforms it according to transformation rules, and generates new sentences. Second, the generated sentences are ranked according to a statistical model and only the best ones are output. The decision whether a paraphrase or textual entailment is correct or not is left to humans. For this purpose we designed an annotation game based on a conversation between a detective (the human player) and his assistant (the system). The result of such annotation is a collection of annotated text-hypothesis pairs. Currently, the system and the game are intended to collect data in the Czech language. However, the idea can be applied to other languages. So far, we have collected 3,321 H-T pairs. Of these pairs, 1,563 (47.06%) were judged correct, 1,238 (37.28%) were judged incorrect entailments, and 520 (15.66%) were judged nonsense or unknown.
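The annotation figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary keys and the `share` helper are hypothetical names chosen here, not part of the described system. The counts are the ones reported in the abstract.

```python
# Annotation counts as reported in the abstract.
# The label names are illustrative, not taken from the system itself.
counts = {
    "correct": 1563,
    "incorrect": 1238,
    "nonsense_or_unknown": 520,
}

# Total number of collected H-T pairs.
total = sum(counts.values())


def share(n, total):
    """Return n as a percentage of total, rounded to two decimals."""
    return round(100 * n / total, 2)


# Percentage of each judgement category.
shares = {label: share(n, total) for label, n in counts.items()}

print(total)
print(shares)
```

The three categories sum to the stated total of 3,321 pairs, and the rounded shares match the percentages given in the abstract.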
Disciplines: Computer science,
Literature and linguistics
Keywords: Artificial intelligence,
Applied linguistics,
Textual entailment,
Natural language processing,
Paraphrase