TrainQA: a Training Corpus for Corpus-Based Question Answering Systems



Document title: TrainQA: a Training Corpus for Corpus-Based Question Answering Systems
Journal: Polibits
Database: PERIÓDICA
System number: 000368140
ISSN: 1870-9044
Authors: 1
1
2
2
Institutions: 1 Universidad de Alicante, Departamento de Software y Sistemas Computacionales, Alicante, Spain
2 Universidad Técnica de Valencia, Departamento de Sistemas de Información y Computación, Valencia, Spain
Year:
Season: Jul-Dec
Number: 40
Country: Mexico
Language: English
Document type: Article
Approach: Experimental, applied
English abstract: This paper describes the development of an English corpus of factoid, TREC-like question-answer pairs. The resulting corpus consists of more than 70,000 samples, each containing the following information: a question, its question type, an exact answer to the question, the different context levels (sentence, paragraph and document) in which the answer occurs inside a document, and a label indicating whether the answer is correct (a positive sample) or not (a negative sample). For instance, TrainQA can be used to train a binary classifier that decides whether a given answer to a question is correct (positive) or not (negative). To our knowledge, this is the first corpus designed for training every stage of a trainable Question Answering system: question classification, information retrieval, answer extraction and answer validation.
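The sample layout described in the abstract can be sketched as a small record type. This is only an illustrative sketch: the class name `QASample`, its field names, the helper `split_by_label`, and the two example records are all hypothetical and are not taken from the TrainQA distribution, whose actual file format is not given here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class QASample:
    """One corpus sample, with the fields listed in the abstract (names assumed)."""
    question: str
    question_type: str
    answer: str        # exact answer string
    sentence: str      # sentence-level context containing the answer
    paragraph: str     # paragraph-level context
    document: str      # document-level context (here just an identifier)
    is_correct: bool   # True = positive sample, False = negative sample


def split_by_label(samples: List[QASample]) -> Tuple[List[QASample], List[QASample]]:
    """Separate positive and negative samples, e.g. before training a binary classifier."""
    positives = [s for s in samples if s.is_correct]
    negatives = [s for s in samples if not s.is_correct]
    return positives, negatives


# Made-up records for illustration only; they do not come from the corpus.
samples = [
    QASample("What is the capital of France?", "LOCATION", "Paris",
             "Paris is the capital of France.",
             "Paris is the capital of France. It lies on the Seine.",
             "doc-001", True),
    QASample("What is the capital of France?", "LOCATION", "Seine",
             "It lies on the Seine.",
             "Paris is the capital of France. It lies on the Seine.",
             "doc-001", False),
]

positives, negatives = split_by_label(samples)
```

Keeping the three context levels as separate fields mirrors the abstract's description and would let each stage of a pipeline (retrieval, extraction, validation) read only the level it needs.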
Disciplines: Computer science
Keyword: Sistemas de información,
Sistemas expertos,
Capacitación,
Preguntas
Keyword: Computer science,
Information systems,
Expert systems,
Training,
Questions