A Novel Record Linkage Interface That Incorporates Group Structure to Rapidly Collect Richer Labels
Authors: K. Frisoli, Benjamin LeRoy, Rebecca Nugent
Venue: 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
Published: 2019-10-01
DOI: https://doi.org/10.1109/DSAA.2019.00073
Linking historical data longitudinally allows researchers to better characterize topics like population mobility, the impact of local and national events, and generational changes. The ideal linking process would involve subject matter experts with detailed information about each record, including any relationships to other records; however, this in-depth process is expensive and often infeasible. Record linkage is the process of identifying and labeling records corresponding to unique entities. Statistical record linkage models largely rely on pairwise comparisons, under-utilizing information about group structure and historical knowledge. Moreover, model performance can be limited by labels of unknown certainty or origin: in record linkage, we are rarely given information about the number of labelers, how often they agreed, or the labeling process itself. Understanding how and why records are linked together, both to gain insight into the human decision-making process and to improve record linkage models, is an exciting, high-impact area of research. We present an interactive labeling interface for use at the initial stages of the (potentially crowdsourced) record linkage process. The interface captures labeled records while tracking labelers' actions. It allows labelers to view and interact with the records at both the individual and group level, thereby providing nested labels. We simultaneously receive information about label certainty and the labeler's decision-making process via repeated label instances and click-streams. We demonstrate the utility of this interface on the recently released, unlabeled 1901 and 1911 Ireland Census records and discuss the benefits of richer labels.
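
As a rough illustration of how nested labels and repeated label instances might be turned into a certainty estimate, the sketch below stores each label instance as a group-level link together with the record-level links inside it, then scores each record pair by the fraction of instances for that group link that include it. The `NestedLabel` type, the `link_agreement` function, all identifiers, and the agreement measure are assumptions made for illustration; they are not taken from the paper or its interface, and click-streams are not modeled.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[str, str]  # (id in the 1901 data, id in the 1911 data)


@dataclass(frozen=True)
class NestedLabel:
    """One hypothetical label instance: a group link plus the record links inside it."""
    labeler: str
    group_link: Pair
    record_links: FrozenSet[Pair]


def link_agreement(labels: List[NestedLabel]) -> Dict[Pair, float]:
    """Share of label instances for a group link that contain each record pair.

    Assumes every record belongs to exactly one group, so a record pair
    can only appear under a single group link.
    """
    instances_per_group = Counter(label.group_link for label in labels)
    pair_counts: Counter = Counter()
    for label in labels:
        for pair in label.record_links:
            pair_counts[(label.group_link, pair)] += 1
    return {
        pair: count / instances_per_group[group]
        for (group, pair), count in pair_counts.items()
    }


# Two labelers link the same pair of groups but disagree on one record pair.
labels = [
    NestedLabel("labeler_a", ("g1901_1", "g1911_7"),
                frozenset({("r1", "r9"), ("r2", "r10")})),
    NestedLabel("labeler_b", ("g1901_1", "g1911_7"),
                frozenset({("r1", "r9")})),
]
agreement = link_agreement(labels)
```

Here `agreement[("r1", "r9")]` is 1.0 because both instances contain that pair, while `agreement[("r2", "r10")]` is 0.5. A score like this could accompany each label as a weight when training a linkage model, though other certainty measures are equally plausible.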