Algorithmic Improvement of Crowdsourced Data: Intrinsic Quality Measures, Local Optima, and Consensus

Thomas C. van Dijk, Norbert Fischer, Bernhard Häussner

Proceedings of the 28th International Conference on Advances in Geographic Information Systems, November 3, 2020. DOI: 10.1145/3397536.3422260
Raw crowdsourced data is often of questionable quality. The typical remedy is redundancy: ask multiple independent participants the same question and take some form of majority answer. However, this can be wasteful in terms of human effort. In this paper we show that algorithmic analysis of the data can extract higher-quality results from a given amount of crowd effort (or, alternatively, that less crowd effort would have sufficed for the same level of quality). Our case study is based on a publicly available crowdsourced data set by the New York Public Library, featuring building footprints in historical insurance atlases. Besides evaluating the quality improvement achieved by our methods, we provide both a command-line interface for batch-mode processing and an interactive web interface; both work with standard data formats and are available as open-source software.
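
For concreteness, the redundancy baseline the abstract describes can be sketched in a few lines of Python. This is a minimal illustration of plain majority voting over redundant answers, not the paper's algorithmic method; the function name majority_answer and the example labels are hypothetical.

```python
from collections import Counter

def majority_answer(answers):
    """Return the most common answer among redundant crowd responses.

    This is the plain redundancy baseline the abstract describes,
    not the paper's method. Ties are broken arbitrarily by Counter.
    """
    if not answers:
        raise ValueError("no answers to aggregate")
    return Counter(answers).most_common(1)[0][0]

# Hypothetical example: three participants labeled the same map feature.
print(majority_answer(["building", "building", "empty"]))  # -> "building"
```

The paper's contribution is to improve on exactly this kind of aggregation: by analyzing the data itself (intrinsic quality measures, consensus among geometric answers), the same crowd effort yields higher-quality results than a plain majority vote.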