String Distances for Near-duplicate Detection



Document title: String Distances for Near-duplicate Detection
Journal: Polibits
Database: PERIÓDICA
System number: 000355835
ISSN: 1870-9044
Authors: (names not preserved in this record; affiliation markers: 1, 1, 1, 2)
Institutions: 1 University of Bucharest, Faculty of Mathematics and Computer Science, Bucharest, Romania
2 University of Bucharest, Faculty of Foreign Languages and Literatures, Bucharest, Romania
Year:
Number: 45
Country: Mexico
Language: English
Document type: Article
Approach: Analytical, descriptive
English abstract: Near-duplicate detection is important when dealing with large, noisy databases in data mining tasks. In this paper, we present the results of applying the Rank distance and the Smith-Waterman distance, along with more popular string similarity measures such as the Levenshtein distance, together with a disjoint set data structure, to the problem of near-duplicate detection.
Disciplines: Computer science
Keywords: Computer science,
Data processing,
Information analysis and systematization,
Data mining,
Duplicate detection,
String similarity,
Databases
Full text: Available in HTML
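
Illustrative note: the abstract describes combining a string distance with a disjoint set data structure to group near-duplicates. The sketch below shows one way that combination could look, using only the Levenshtein distance (the Rank and Smith-Waterman distances are not shown). It is not the authors' implementation: the function and class names, the pairwise comparison strategy, and the `max_distance` threshold are all assumptions made for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with a two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x: int, y: int) -> None:
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]


def cluster_near_duplicates(records, max_distance=2):
    """Group records whose pairwise edit distance is within max_distance.

    max_distance is a hypothetical threshold chosen for this example.
    Grouping is transitive: if A~B and B~C, all three share a cluster.
    """
    sets = DisjointSet(len(records))
    for i, j in combinations(range(len(records)), 2):
        if levenshtein(records[i], records[j]) <= max_distance:
            sets.union(i, j)
    groups = {}
    for index, record in enumerate(records):
        groups.setdefault(sets.find(index), []).append(record)
    return list(groups.values())


if __name__ == "__main__":
    names = ["Bucharest", "Bucarest", "Bucuresti", "Mexico", "México"]
    for group in cluster_near_duplicates(names):
        print(group)
```

The all-pairs comparison is quadratic in the number of records, so a large, noisy database would need some form of candidate filtering before the distance is computed; that step is outside this sketch.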