Maria del Pilar Angeles, Luis Fernando Perez-Franco
2015 International Conference on Informatics, Electronics & Vision (ICIEV), 15 June 2015. DOI: 10.1109/ICIEV.2015.7333979
Analysis of string encoding functions during de-duplication process
The present research aims to help users identify which encoding functions are more effective than others in terms of data-matching quality. To this end, we evaluated data matching with Daitch-Mokotoff Soundex [1], Fuzzy Soundex [2], and Modified Soundex [3] in terms of precision, recall, and F-measure. To our knowledge, these encoding functions have not previously been compared on these measures. Daitch-Mokotoff Soundex achieves better matching quality than the other two functions during string comparison. However, its execution time is longer than that of Fuzzy Soundex and Modified Soundex. This is justifiable, since the Daitch-Mokotoff Soundex algorithm is more complex and meticulous.
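The evaluation described above scores an encoding function by how well "same code" predicts "same real-world entity". A minimal sketch of that idea follows, under stated assumptions: two records are treated as a match exactly when their codes are equal, every pair of records is compared, and precision = TP/(TP+FP), recall = TP/(TP+FN), F-measure = 2PR/(P+R). Classic American Soundex is used only as a stand-in encoder, because the rules of Daitch-Mokotoff, Fuzzy, and Modified Soundex are more elaborate and are not given here. The names `soundex`, `evaluate_pairs`, and `RECORDS` and the toy data are hypothetical, not taken from the paper.

```python
from itertools import combinations

# Classic American Soundex digit classes. Stand-in for illustration only;
# this is NOT Daitch-Mokotoff, Fuzzy, or Modified Soundex.
_CODES = {ch: digit
          for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                 ("L", "4"), ("MN", "5"), ("R", "6"))
          for ch in letters}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code of an ASCII name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")  # a following letter with the first letter's code is dropped
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code:
            if code != prev:  # collapse runs of the same digit
                digits.append(code)
            prev = code
        elif ch not in "HW":
            prev = ""  # vowels and Y separate repeated digits; H and W do not
    return (first + "".join(digits) + "000")[:4]


def evaluate_pairs(records, encode):
    """Pairwise precision, recall, and F-measure of an encoding used as a matcher.

    records: list of (name, true_entity_id); encode: str -> str.
    Assumption: a pair is predicted to match iff both names encode identically.
    """
    tp = fp = fn = 0
    for (n1, e1), (n2, e2) in combinations(records, 2):
        predicted = encode(n1) == encode(n2)
        actual = e1 == e2
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: (name, true entity id). Not from the paper.
RECORDS = [
    ("Smith", 1), ("Smyth", 1), ("Schmidt", 1),
    ("Robert", 2), ("Rupert", 3),
    ("Jackson", 4), ("Jaxon", 4),
    ("Katherine", 5), ("Catherine", 5),
]

if __name__ == "__main__":
    p, r, f = evaluate_pairs(RECORDS, soundex)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=0.80 recall=0.80 f1=0.80
```

The toy data shows both error types that these measures capture: Robert and Rupert share the code R163 (a false positive), while Katherine and Catherine are missed because this Soundex keeps the first letter verbatim (a false negative). Swapping another encoder into `encode` is how such functions could be compared on the same records; timing each call would give the execution-time side of the comparison.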