Methods of entity resolution in dataspaces
Yuelin Jia, Wei Lu, Chang Su
International Conference on Images, Signals, and Computing, published 2023-08-21
DOI: 10.1117/12.2692046
Abstract
Dataspaces are a new approach to data integration. Entity resolution identifies records that refer to the same real-world entity. In this paper, a record graph is constructed from the records in the dataset. Redundant comparisons are removed by pruning the record graph, and the records are divided into blocks according to the pruned graph. Subsequent entity resolution is carried out only within blocks. When entities are resolved within a block, attribute mapping and an expression-based representation of attribute values are used to further partition the data and ensure the accuracy of resolution. Experiments with the proposed methods were carried out on real datasets.
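The graph-based blocking pipeline summarized above can be illustrated with a minimal Python sketch. The abstract does not say how edges are formed, which pruning criterion is used, or how records are matched inside a block, so every concrete choice here is an assumption and not the authors' method. Edges join records that share a token and are weighted by Jaccard similarity. Pruning keeps edges at or above the mean weight, a common meta-blocking heuristic that stands in for the paper's unspecified rule. Blocks are the non-singleton connected components of the pruned graph. In-block matching is a simple equality vote over attributes renamed through a hypothetical `ATTRIBUTE_MAP`. The function names, the `threshold` parameter, and the sample records are illustrative only, and the paper's expression-based representation of attribute values is reduced here to lower-casing.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical attribute mapping: heterogeneous source attribute names -> canonical names.
ATTRIBUTE_MAP = {"name": "name", "full_name": "name", "city": "city", "location": "city"}


def tokens(record):
    """Lower-cased word tokens drawn from all attribute values of a record."""
    return {t for v in record.values() for t in str(v).lower().split()}


def build_record_graph(records):
    """Record graph (assumed form): an edge joins two records sharing at least one
    token; the edge weight is the Jaccard similarity of their token sets."""
    index = defaultdict(set)  # token -> ids of the records containing it
    for rid, rec in records.items():
        for t in tokens(rec):
            index[t].add(rid)
    edges = {}
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            if (a, b) not in edges:
                ta, tb = tokens(records[a]), tokens(records[b])
                edges[(a, b)] = len(ta & tb) / len(ta | tb)
    return edges


def prune(edges):
    """Assumed pruning rule: keep only edges whose weight is at least the mean weight."""
    if not edges:
        return {}
    mean = sum(edges.values()) / len(edges)
    return {e: w for e, w in edges.items() if w >= mean}


def blocks_from_graph(record_ids, edges):
    """Blocks = non-singleton connected components of the pruned graph (union-find)."""
    parent = {r: r for r in record_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(list)
    for r in record_ids:
        groups[find(r)].append(r)
    return [sorted(g) for g in groups.values() if len(g) > 1]


def mapped(record):
    """Rename attributes to canonical names and normalize values by lower-casing."""
    return {ATTRIBUTE_MAP.get(k, k): str(v).strip().lower() for k, v in record.items()}


def resolve(records, threshold=0.5):
    """Compare record pairs only inside blocks; declare a match when the fraction of
    shared canonical attributes with equal values reaches the threshold."""
    matches = []
    edges = prune(build_record_graph(records))
    for block in blocks_from_graph(records, edges):
        for a, b in combinations(block, 2):
            ma, mb = mapped(records[a]), mapped(records[b])
            common = ma.keys() & mb.keys()
            if common and sum(ma[k] == mb[k] for k in common) / len(common) >= threshold:
                matches.append((a, b))
    return matches


if __name__ == "__main__":
    records = {
        "r1": {"name": "Alan Turing", "city": "London"},
        "r2": {"full_name": "alan turing", "location": "London"},
        "r3": {"name": "Ada Lovelace", "city": "London"},
    }
    print(resolve(records))  # [('r1', 'r2')]
```

In this example the edges r1–r3 and r2–r3 (weight 0.2, sharing only "london") fall below the mean weight and are pruned. That leaves a single block {r1, r2}, so only one in-block comparison is made instead of three. Attribute mapping then lets `full_name`/`location` be compared against `name`/`city`.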