Keywords Identification within Greek URLs

Document title: Keywords Identification within Greek URLs
Journal: Polibits
Database: PERIÓDICA
System number: 000359040
ISSN: 1870-9044
Authors: 1, 1, 1
Institutions: 1 Patras University, Computer Engineering and Informatics Department, Patras, Greece
Year:
Period: Jan-Jun
Issue: 43
Country: Mexico
Language: English
Document type: Article
Approach: Analytical, descriptive
Abstract (English): In this paper we propose a method that identifies and extracts keywords within URLs, focusing on the Greek Web and especially on URLs containing Greek terms. Although previous works have addressed the processing of Greek online content, none of them focuses on keyword identification within URLs of the Greek web domain. Moreover, although many techniques exist for web page categorization based on URLs, none addresses URLs containing transliterated Greek terms. The proposed method integrates two components: a URL tokenizer that segments URL tokens into meaningful words, and a Latin-to-Greek transliteration engine that relies on a dictionary and a set of orthographic and syntactic rules to convert Latin-script word tokens into Greek terms. An experimental evaluation on a sample of 1,000 Greek URLs indicates that the method can be fruitfully exploited for automatic keyword identification within Greek URLs.
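As a rough illustration of the two-component pipeline the abstract describes (URL tokenization followed by dictionary- and rule-based Latin-to-Greek transliteration), the Python sketch below may help. It is not the authors' implementation: the toy lexicon `LEXICON`, the Greek dictionary `DICTIONARY`, the letter rules `RULES`, the `STOPWORDS` set, and all function names are hypothetical placeholders. Segmentation uses a generic fewest-words dynamic program, which may differ from the paper's tokenizer, and the only orthographic rule shown is word-final sigma.

```python
import re
from urllib.parse import urlparse

# Hypothetical toy lexicon of transliterated (Greeklish) words.
LEXICON = {"nea", "kairos", "athina", "thessaloniki", "ellada", "oikonomia"}
# Hypothetical dictionary of exact Greek forms (with accents), tried before rules.
DICTIONARY = {"nea": "νέα", "kairos": "καιρός", "athina": "αθήνα"}
# Hypothetical letter rules; digraphs are listed first so they match before single letters.
RULES = [("th", "θ"), ("ch", "χ"), ("ps", "ψ"), ("ks", "ξ"),
         ("a", "α"), ("b", "β"), ("g", "γ"), ("d", "δ"), ("e", "ε"),
         ("z", "ζ"), ("h", "η"), ("i", "ι"), ("k", "κ"), ("l", "λ"),
         ("m", "μ"), ("n", "ν"), ("x", "ξ"), ("o", "ο"), ("p", "π"),
         ("r", "ρ"), ("s", "σ"), ("t", "τ"), ("u", "υ"), ("y", "υ"),
         ("f", "φ"), ("v", "β"), ("w", "ω")]
# Hypothetical URL noise tokens that are never keywords.
STOPWORDS = {"www", "gr", "com", "html", "htm", "php", "index"}


def split_url(url):
    """Split host, path and query into lowercase alphabetic chunks."""
    parsed = urlparse(url)
    text = f"{parsed.netloc} {parsed.path} {parsed.query}"
    return [c.lower() for c in re.split(r"[^A-Za-z]+", text) if c]


def segment(chunk, lexicon=LEXICON):
    """Segment a chunk into the fewest lexicon words; return [chunk] if impossible."""
    n = len(chunk)
    best = [None] * (n + 1)  # best[i]: fewest-word segmentation of chunk[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and chunk[j:i] in lexicon:
                candidate = best[j] + [chunk[j:i]]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[n] if best[n] is not None else [chunk]


def transliterate(word):
    """Dictionary lookup first, then left-to-right rule application."""
    if word in DICTIONARY:
        return DICTIONARY[word]
    out, i = [], 0
    while i < len(word):
        for latin, greek in RULES:
            if word.startswith(latin, i):
                out.append(greek)
                i += len(latin)
                break
        else:  # no rule matched: pass the character through unchanged
            out.append(word[i])
            i += 1
    result = "".join(out)
    # Orthographic rule: a word-final sigma is written as ς.
    if result.endswith("σ"):
        result = result[:-1] + "ς"
    return result


def extract_keywords(url):
    """Return Greek keywords for recognised words found in the URL."""
    keywords = []
    for chunk in split_url(url):
        if chunk in STOPWORDS:
            continue
        for word in segment(chunk):
            if word in LEXICON:  # keep only words the lexicon recognises
                keywords.append(transliterate(word))
    return keywords


if __name__ == "__main__":
    print(extract_keywords("http://www.example.gr/neakairos/athina-thessaloniki.html"))
```

For the sample URL, the concatenated token `neakairos` is split into `nea` and `kairos`, and the dictionary supplies accented forms for it and `athina`. `thessaloniki` falls back to the rules and yields `θεσσαλονικι` without accents and with ι instead of η. That gap is the kind of ambiguity a fuller dictionary and rule set would need to resolve.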
Disciplines: Computer science, Literature and linguistics
Keywords: Data processing, Applied linguistics, Computational linguistics, Transliteration, Keywords, Word segmentation, Uniform resource locator