Soft Cardinality in Semantic Text Processing: Experience of the SemEval International Competitions



Document title: Soft Cardinality in Semantic Text Processing: Experience of the SemEval International Competitions
Journal: Polibits
Database: PERIÓDICA
System number: 000383482
ISSN: 1870-9044
Authors: 1
1
2
Institutions: 1 Universidad Nacional de Colombia, Departamento de Ingeniería de Sistemas e Industrial, Bogotá, Colombia
2 Instituto Politécnico Nacional, Centro de Investigación en Computación, México, Distrito Federal, México
Year:
Season: Jan-Jun
Number: 51
Pages: 63-72
Country: México
Language: English
Document type: Article
Approach: Applied, descriptive
English abstract: Soft cardinality is a generalization of classic set cardinality (i.e., the number of elements in a set) that exploits similarities between elements to provide a "soft" count of the number of elements in a collection. The model is general enough to be used interchangeably as the cardinality function in resemblance coefficients such as Jaccard's, Dice's, cosine, and others. Beyond that, cardinality-based features can be extracted from pairs of objects being compared in order to learn adaptive similarity functions from training data. This approach can be used to compare any objects that can be represented as sets or bags. We and other international teams used soft cardinality to address a series of natural language processing (NLP) tasks in the SemEval (semantic evaluation) competitions from 2012 to 2014. The systems based on soft cardinality were consistently among the best systems in all the tasks in which they participated. This paper describes our experience in that journey by presenting the generalities of the model and some practical techniques for using soft cardinality in NLP problems.
Disciplines: Computer science,
Library and information science
Keyword: Computer science,
Library and information science,
Data processing,
Information technology,
Soft computing,
Similarity measures,
Semantics,
Natural language processing
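The soft-counting idea summarized in the abstract can be illustrated with a minimal sketch. The record itself does not state a formula, so the one used here is an assumption: each element contributes the reciprocal of its summed similarity (raised to a softness exponent `p`) to all elements of the collection. The helper names (`char_ngrams`, `ngram_dice`, `soft_cardinality`, `soft_jaccard`) and the character-bigram word similarity are hypothetical choices made for illustration, not necessarily what the authors used.

```python
from typing import Callable, Sequence, Set


def char_ngrams(word: str, n: int = 2) -> Set[str]:
    """Character n-grams of a word padded with '#' (illustrative choice)."""
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_dice(a: str, b: str) -> float:
    """Dice coefficient over character bigrams; equals 1.0 when a == b."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def soft_cardinality(elements: Sequence[str],
                     sim: Callable[[str, str], float],
                     p: float = 1.0) -> float:
    """Assumed formulation: sum over a of 1 / sum over b of sim(a, b) ** p.

    Requires sim(a, a) > 0 so that every denominator is positive.
    """
    return sum(
        1.0 / sum(sim(a, b) ** p for b in elements)
        for a in elements
    )


def soft_jaccard(x: Sequence[str], y: Sequence[str],
                 sim: Callable[[str, str], float],
                 p: float = 1.0) -> float:
    """Jaccard coefficient with soft cardinality in place of set size."""
    union = list(dict.fromkeys(list(x) + list(y)))  # order-preserving dedupe
    card_x = soft_cardinality(x, sim, p)
    card_y = soft_cardinality(y, sim, p)
    card_union = soft_cardinality(union, sim, p)
    # The intersection is inferred from the inclusion-exclusion identity.
    card_inter = card_x + card_y - card_union
    return card_inter / card_union if card_union else 0.0


if __name__ == "__main__":
    print(soft_cardinality(["cat", "cats"], ngram_dice))
    print(soft_jaccard(["the", "cat", "sat"], ["the", "cats", "sat"], ngram_dice))
```

With a crisp similarity (1 for equal elements, 0 otherwise) the function reduces to the classic element count, while similar elements such as "cat" and "cats" are counted as fewer than two distinct items. Swapping `soft_jaccard` for a Dice or cosine variant only changes how the three cardinalities are combined.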