Assesing the Feature-Driven Nature of Similarity-based Sorting of Verbs



Título del documento: Assesing the Feature-Driven Nature of Similarity-based Sorting of Verbs
Revista: Polibits
Base de datos: PERIÓDICA
Número de sistema: 000358997
ISSN: 1870-9044
Autores: 1
1
1
1
1
Instituciones: 1Norwegian University of Science and Technology, Trondheim, Sor-Trondelag. Noruega
Año:
Periodo: Ene-Jun
Número: 43
País: México
Idioma: Inglés
Tipo de documento: Artículo
Enfoque: Analítico, descriptivo
Resumen en inglés The paper presents a computational analysis of the results from a sorting task with motion verbs in Norwegian. The sorting behavior of humans rests on the features they use when they compare two or more words. We investigate what these features are and how differential each feature may be in sorting. The key rationale for our method of analysis is the assumption that a sorting task rests on a similarity assessment process. The main idea is that a set of features underlies this similarity judgment, and similarity between two verbs amounts to the sum of the weighted similarity between the given set of features. The computational methodology used to investigate the features is as follows. Based on the frequency of co–occurrence of verbs in the human generated cluster, weights of a given set of features are computed using linear regression. The weights are used, in turn, to compute a similarity matrix between the verbs. This matrix is used as an input for the agglomerative hierarchical clustering. If the selected/projected set of features aligns with the features the participants used when sorting verbs in groups, then the clusters we obtain using this computational method would align with the clusters generated by humans. Otherwise, the method proceeds with modifying the feature set and repeating the process. Features promoting clusters that align with human–generated clusters are evaluated by a set of human experts and the results show that the method manages to identify the appropriate feature sets. This method can be applied in analyzing a variety of data ranging from experimental free production data, to linguistic data from controlled experiments in the assessment of semantic relations and hierarchies within languages and across languages
Disciplinas: Ciencias de la computación,
Literatura y lingüística
Palabras clave: Procesamiento de datos,
Lingüística aplicada,
Lingüística computacional,
Formas verbales,
Clasificación de verbos,
Similitud
Keyword: Computer science,
Literature and linguistics,
Data processing,
Applied linguistics,
Computing linguistics,
Verb features,
Verd sorting,
Similarity
Texto completo: Texto completo (Ver HTML)