Kun Qian, D. Burdick, Sairam Gurajada, Lucian Popa
Proceedings of the 5th Workshop on Data Science for Macro-modeling with Financial and Economic Datasets, 2019-06-30. DOI: 10.1145/3336499.3338010 (https://doi.org/10.1145/3336499.3338010)
Learning Explainable Entity Resolution Algorithms for Small Business Data using SystemER
The 2019 FEIII CALI data challenge aims to link different representations of the same real-world entities across multiple public datasets that collect identification and activity data about small to medium enterprises (SMEs) in California. We formalize this challenge as a learning-based entity resolution (ER) task, the goal of which is to learn a high-precision, high-recall pairwise ER model that classifies pairs of small business entities as matches or non-matches. Realistic ER tasks usually involve a pipeline of labor-intensive and error-prone steps, such as data preprocessing, gathering of training data, feature engineering, and model tuning. In this task, we apply an advanced human-in-the-loop system, named SystemER, to learn ER algorithms for SME entities. Powered by active learning and a carefully designed user interface, SystemER can learn high-quality, explainable ER algorithms with low human effort while achieving high accuracy on the datasets provided by the FEIII CALI data challenge.
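To make the pairwise formulation concrete, the following is a minimal sketch of an explainable matcher: a single rule, written as a conjunction of readable predicates, that labels a pair of business records as a match or a non-match. It is not SystemER's learned algorithm or its API. The record fields (`name`, `city`, `zip_code`), the token-based Jaccard similarity, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Business:
    # Hypothetical record layout; the challenge datasets may use other fields.
    name: str
    city: str
    zip_code: str


def tokens(text):
    """Lowercase the text, drop punctuation, and split it into a set of words."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return set(cleaned.split())


def jaccard(a, b):
    """Token-set Jaccard similarity in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_match(a, b, name_threshold=0.5):
    """Illustrative rule: similar names AND same city AND same 5-digit ZIP.

    Each conjunct can be read and checked by a person, which is the sense
    in which a rule like this is explainable. The threshold is an assumed
    value, not one learned from data.
    """
    return (
        jaccard(a.name, b.name) >= name_threshold
        and a.city.strip().lower() == b.city.strip().lower()
        and a.zip_code[:5] == b.zip_code[:5]
    )


if __name__ == "__main__":
    r1 = Business("Joe's Pizza Inc.", "Oakland", "94607")
    r2 = Business("JOES PIZZA", "oakland", "94607-1234")
    r3 = Business("Joe's Pizza Inc.", "San Jose", "95112")
    print(is_match(r1, r2))
    print(is_match(r1, r3))
```

In a learning-based setting, the predicates and thresholds of such a rule would be chosen from labeled pairs rather than fixed by hand; here they are hard-coded only to show the shape of a pairwise decision.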