Chenchen Sun, Derong Shen, Yue Kou, Tiezheng Nie, Ge Yu
2014 11th Web Information System and Application Conference. Published 2014-09-12. DOI: 10.1109/WISA.2014.46 (https://doi.org/10.1109/WISA.2014.46)
ERGP: A Combined Entity Resolution Approach with Genetic Programming
In the real world, an entity often has more than one representation across different data sources, and those representations may contain errors. Identifying which representations refer to the same entity, a task known as entity resolution, is therefore crucial in data integration and data cleaning. We propose a novel approach to entity resolution based on genetic programming, named Entity Resolution with Genetic Programming (ERGP). ERGP learns an effective entity resolution classifier by combining comparisons over several different properties. The evaluation shows that ERGP outperforms state-of-the-art entity resolution algorithms. Above all, ERGP sets the threshold for each individual comparison of an attribute pair by itself, so the user is relieved of the burden of setting thresholds.
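The abstract does not say how ERGP represents a classifier, so the following is only a minimal sketch of the general idea: a rule tree whose leaves each compare one attribute pair against their own threshold, plus a fitness score that a genetic programming search could use to rank candidate rules. The names `Comparison`, `Combine`, `string_similarity` and `f1_fitness`, the use of `difflib` as the similarity measure, the F1-based fitness, and the sample records are all assumptions made for illustration, not details taken from the paper.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Tuple, Union

Record = Dict[str, str]


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (assumed measure, via difflib)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


@dataclass(frozen=True)
class Comparison:
    """Leaf: compares one attribute pair against its own threshold."""
    attribute: str
    threshold: float

    def matches(self, r1: Record, r2: Record) -> bool:
        score = string_similarity(r1[self.attribute], r2[self.attribute])
        return score >= self.threshold


@dataclass(frozen=True)
class Combine:
    """Inner node: combines child decisions with 'and' or 'or'."""
    op: str
    children: Tuple["Rule", ...]

    def matches(self, r1: Record, r2: Record) -> bool:
        results = [child.matches(r1, r2) for child in self.children]
        return all(results) if self.op == "and" else any(results)


Rule = Union[Comparison, Combine]
LabelledPair = Tuple[Record, Record, bool]


def f1_fitness(rule: Rule, labelled_pairs: List[LabelledPair]) -> float:
    """F1 score of a rule on labelled record pairs (assumed fitness)."""
    tp = fp = fn = 0
    for r1, r2, is_match in labelled_pairs:
        predicted = rule.matches(r1, r2)
        if predicted and is_match:
            tp += 1
        elif predicted:
            fp += 1
        elif is_match:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sample data: two spellings of one person, plus another person.
rec_a = {"name": "Jon Smith", "city": "Boston"}
rec_b = {"name": "John Smith", "city": "Boston"}
rec_c = {"name": "Mary Jones", "city": "Boston"}
training_pairs = [(rec_a, rec_b, True), (rec_a, rec_c, False)]

# One candidate rule; a GP search would evolve many such trees,
# mutating both the structure and the per-leaf thresholds.
candidate = Combine("and", (Comparison("name", 0.8), Comparison("city", 0.9)))
city_only = Comparison("city", 0.9)

print(f1_fitness(candidate, training_pairs))
print(f1_fitness(city_only, training_pairs))
```

In this sketch the thresholds are fields of the leaves, so a search procedure can adjust them together with the tree structure; that mirrors the abstract's point that thresholds need not be set by the user, under the assumptions stated above.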