TrainQA: a Training Corpus for Corpus-Based Question Answering Systems



Document title: TrainQA: a Training Corpus for Corpus-Based Question Answering Systems
Journal: Polibits
Database: PERIÓDICA
System number: 000368140
ISSN: 1870-9044
Authors: 1
1
2
2
Institutions: 1 Universidad de Alicante, Departamento de Software y Sistemas Computacionales, Alicante. Spain
2 Universidad Técnica de Valencia, Departamento de Sistemas de Información y Computación, Valencia. Spain
Year:
Period: Jul-Dec
Issue: 40
Country: Mexico
Language: English
Document type: Article
Approach: Experimental, applied
English abstract: This paper describes the development of an English corpus of factoid TREC-like question-answer pairs. The resulting corpus consists of more than 70,000 samples, each containing the following information: a question, its question type, an exact answer to the question, the different context levels (sentence, paragraph and document) in which the answer occurs within a document, and a label indicating whether the answer is correct (a positive sample) or not (a negative sample). For instance, TrainQA can be used to train a binary classifier that decides whether a given answer to a question is correct (positive) or not (negative). To our knowledge, this is the first corpus aimed at training every stage of a trainable Question Answering system: question classification, information retrieval, answer extraction and answer validation.
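The sample layout described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions: the class name `QASample`, its field names, the helper `split_by_label`, the question-type labels and the two toy samples are all hypothetical and are not taken from the corpus or its distribution format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QASample:
    """One question-answer sample (hypothetical field names, not the corpus schema)."""
    question: str        # the question text
    question_type: str   # question class; label values here are assumed
    answer: str          # exact answer string
    sentence: str        # sentence-level context containing the answer
    paragraph: str       # paragraph-level context
    document: str        # document-level context
    is_correct: bool     # True = positive sample, False = negative sample


def split_by_label(samples):
    """Separate positive from negative samples, e.g. before training a binary classifier."""
    positives = [s for s in samples if s.is_correct]
    negatives = [s for s in samples if not s.is_correct]
    return positives, negatives


# Invented toy samples for illustration only; they do not come from TrainQA.
samples = [
    QASample(
        question="Who wrote Hamlet?",
        question_type="PERSON",
        answer="William Shakespeare",
        sentence="Hamlet was written by William Shakespeare.",
        paragraph="Hamlet was written by William Shakespeare. It is a tragedy.",
        document="(full document text)",
        is_correct=True,
    ),
    QASample(
        question="Who wrote Hamlet?",
        question_type="PERSON",
        answer="Denmark",
        sentence="Hamlet is set in Denmark.",
        paragraph="Hamlet is set in Denmark. The play opens at Elsinore.",
        document="(full document text)",
        is_correct=False,
    ),
]

positives, negatives = split_by_label(samples)
print(len(positives), len(negatives))
```

Keeping the three context levels as separate fields mirrors the abstract's description and would let each stage of a question answering pipeline read only the context granularity it needs.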
Disciplines: Computer science
Keywords (Spanish): Sistemas de información,
Sistemas expertos,
Capacitación,
Preguntas
Keywords (English): Computer science,
Information systems,
Expert systems,
Training,
Questions