Assessing the Feature-Driven Nature of Similarity-based Sorting of Verbs



Document title: Assessing the Feature-Driven Nature of Similarity-based Sorting of Verbs
Journal: Polibits
Database: PERIÓDICA
System number: 000358997
ISSN: 1870-9044
Authors: 1
1
1
1
1
Institutions: 1 Norwegian University of Science and Technology, Trondheim, Sør-Trøndelag, Norway
Year:
Season: Jan-Jun
Number: 43
Country: Mexico
Language: English
Document type: Article
Approach: Analytical, descriptive
English abstract: The paper presents a computational analysis of the results of a sorting task with Norwegian motion verbs. Human sorting behavior rests on the features people use when they compare two or more words. We investigate what these features are and how strongly each one discriminates between verbs in sorting. The key rationale for our method of analysis is the assumption that a sorting task rests on a similarity assessment process. The main idea is that a set of features underlies this similarity judgment, and that the similarity between two verbs amounts to the weighted sum of their similarities on each feature in the set. The computational methodology used to investigate the features is as follows. Based on the frequency with which verbs co-occur in the human-generated clusters, the weights of a given set of features are computed using linear regression. The weights are used, in turn, to compute a similarity matrix between the verbs. This matrix serves as input to agglomerative hierarchical clustering. If the selected/projected set of features aligns with the features the participants used when sorting verbs into groups, then the clusters obtained with this computational method will align with the clusters generated by humans. Otherwise, the method modifies the feature set and repeats the process. Features that promote clusters aligned with the human-generated clusters are evaluated by a set of human experts, and the results show that the method manages to identify the appropriate feature sets. The method can be applied to a variety of data, ranging from experimental free-production data to linguistic data from controlled experiments, in the assessment of semantic relations and hierarchies within and across languages.
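The pipeline described in the abstract can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and does not come from the paper: the four verbs (English glosses standing in for Norwegian motion verbs), the two binary features, the co-occurrence shares, the choice of exact match as the per-feature similarity, least squares without an intercept, and average linkage. It runs a single pass; the step of modifying the feature set and repeating is left out.

```python
from itertools import combinations
import numpy as np

# Hypothetical toy data: English glosses stand in for the Norwegian verbs.
verbs = ["walk", "run", "swim", "dive"]
feature_names = ["medium", "speed"]
features = np.array([
    [0, 0],  # walk: land, slow
    [0, 1],  # run: land, fast
    [1, 0],  # swim: water, slow
    [1, 1],  # dive: water, fast
])
# Hypothetical share of participants who put each verb pair in the same group.
cooccurrence = {
    ("walk", "run"): 0.9, ("walk", "swim"): 0.2, ("walk", "dive"): 0.0,
    ("run", "swim"): 0.1, ("run", "dive"): 0.2, ("swim", "dive"): 0.8,
}

def fit_feature_weights(features, verbs, cooccurrence):
    """Least-squares weights: co-occurrence ~ sum_f w_f * match_f(i, j)."""
    rows, targets = [], []
    for i, j in combinations(range(len(verbs)), 2):
        # Assumed per-feature similarity: 1 if both verbs share the value.
        rows.append((features[i] == features[j]).astype(float))
        targets.append(cooccurrence[(verbs[i], verbs[j])])
    weights, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return weights

def similarity_matrix(features, weights):
    """Weighted sum of per-feature matches for every verb pair."""
    match = (features[:, None, :] == features[None, :, :]).astype(float)
    return match @ weights

def agglomerate(similarity, n_clusters):
    """Average-linkage agglomerative clustering on a similarity matrix."""
    clusters = [[i] for i in range(len(similarity))]
    while len(clusters) > n_clusters:
        # Merge the two clusters with the highest mean pairwise similarity.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: similarity[np.ix_(clusters[p[0]], clusters[p[1]])].mean(),
        )
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters

weights = fit_feature_weights(features, verbs, cooccurrence)
similarity = similarity_matrix(features, weights)
machine = {frozenset(verbs[i] for i in c) for c in agglomerate(similarity, 2)}
# Hypothetical human-generated grouping to compare against.
human = {frozenset({"walk", "run"}), frozenset({"swim", "dive"})}
print({n: float(w) for n, w in zip(feature_names, weights)}, machine == human)
```

In this toy setting the fitted weight for "medium" is larger than the one for "speed", so the clustering groups verbs by medium. A mismatch between `machine` and `human` would be the signal to revise the feature set and run the steps again.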
Disciplines: Computer science,
Literature and linguistics
Keyword: Data processing,
Applied linguistics,
Computational linguistics,
Verb forms,
Verb classification,
Similarity
Keyword: Computer science,
Literature and linguistics,
Data processing,
Applied linguistics,
Computational linguistics,
Verb features,
Verb sorting,
Similarity