Entity Resolution Using Logistic Regression as an extension to the Rule-Based Oyster System
Fumiko Kobayashi, Aziz Eram, J. Talburt
2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), 2018-04-10
DOI: https://doi.org/10.1109/MIPR.2018.00033
Citations: 8
Abstract
This paper describes two experiments in entity resolution (ER). In both experiments, person references were classified as "linked" or "not linked" by two different methods. The first method used an ER system with standard "if-then" Boolean matching rules. The second method used logistic regression, a supervised machine learning technique, to classify the references as "linked" or "not linked". The objective was to compare the linking performance of the two methods and thereby evaluate the effectiveness of logistic regression as an extension to the existing match functions provided in the OYSTER ER System. One experiment used actual school enrollment data and the other used synthetic data. In both cases, the performance of the logistic regression classification compared favorably with the rule-based results.
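The contrast between the two methods can be illustrated with a minimal sketch. It is not the OYSTER implementation or the paper's experimental setup. The three agreement features, the toy labeled pairs, the particular rule, and the function names `rule_match`, `train_logistic`, and `predict_linked` are all assumptions made for illustration. The toy labels were chosen to be consistent with the rule, so the two classifiers agree here; this says nothing about the results reported in the paper.

```python
import math

# Hypothetical agreement features for a pair of person references:
# (first_name_match, last_name_match, dob_match), each 0 or 1.
# Toy labels (1 = linked, 0 = not linked); not data from the paper.
PAIRS = [
    ((1, 1, 1), 1),
    ((1, 1, 0), 1),
    ((0, 1, 1), 1),
    ((1, 0, 1), 0),
    ((1, 0, 0), 0),
    ((0, 1, 0), 0),
    ((0, 0, 1), 0),
    ((0, 0, 0), 0),
]


def rule_match(features):
    """Illustrative if-then rule: link when the last name agrees
    and either the first name or the date of birth agrees."""
    first, last, dob = features
    return bool(last and (first or dob))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(pairs, lr=0.5, epochs=20000):
    """Fit logistic regression by full-batch gradient descent
    on the mean log loss."""
    n_features = len(pairs[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in pairs:
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = sigmoid(z) - y  # derivative of log loss w.r.t. z
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(pairs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(pairs)
    return weights, bias


def predict_linked(weights, bias, features, threshold=0.5):
    """Classify a pair as linked when the estimated probability
    reaches the threshold."""
    z = bias + sum(w * xi for w, xi in zip(weights, features))
    return sigmoid(z) >= threshold


weights, bias = train_logistic(PAIRS)
for features, label in PAIRS:
    print(features, label, rule_match(features),
          predict_linked(weights, bias, features))
```

The rule returns a fixed yes/no decision. The logistic model instead learns a weight per feature from labeled pairs and yields a probability, so the decision threshold can be tuned, which is one reason such a classifier could serve as an additional match function alongside Boolean rules.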