S. Milusheva, R. Marty, Guadalupe Bedoya, Elizabeth Resor, Sarah Williams, Arianna Legovini
Can crowdsourcing create the missing crash data?
Proceedings of the 3rd ACM SIGCAS Conference on Computing and Sustainable Societies, published 2020-06-15
DOI: https://doi.org/10.1145/3378393.3402264
Updated June 1, 2020. Road traffic crashes (RTCs) are the primary cause of death among children and young adults. Yet data on RTCs is incomplete, which hinders effective road safety policymaking in many developing countries, where mortality is purportedly highest. We web-scrape 850,000 tweets to create crash data and develop a machine learning algorithm to geolocate RTCs. Our algorithm is nearly twice as precise as a standard geoparsing algorithm in identifying the set of locations that includes the crash location. In addition, it identifies the unique location of a crash from the set of possible locations in a majority of cases. We dispatch a set of motorcycle drivers to the site of the presumed crash in real time to verify the validity of the crowdsourced data and document the performance of the algorithm. The study can serve as a proof of concept for countries interested in improving RTC data at low cost through a machine learning approach, substantially increasing the data available to analyze RTCs and prioritize road safety policies.
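
The abstract distinguishes two steps: finding the set of candidate locations mentioned in a tweet, and then choosing the single crash location from that set. The sketch below illustrates that split with a toy gazetteer lookup and a simple preposition heuristic. It is not the authors' machine learning algorithm, which the abstract does not describe in detail; the place names, coordinates, cue words, and function names are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical gazetteer: place name -> (latitude, longitude).
# Names and coordinates are placeholders, not real data.
GAZETTEER = {
    "main street": (0.0, 0.0),
    "river bridge": (0.0, 0.1),
    "central market": (0.1, 0.0),
}

# Assumed cue words that often precede the place where an event happened.
LOCATION_CUES = ("at", "near", "on", "along")


def candidate_locations(text, gazetteer):
    """Return gazetteer names found in the text, in order of appearance."""
    lowered = text.lower()
    hits = []
    for name in gazetteer:
        match = re.search(r"\b" + re.escape(name) + r"\b", lowered)
        if match:
            hits.append((match.start(), name))
    return [name for _, name in sorted(hits)]


def pick_crash_location(text, gazetteer):
    """Choose one candidate: prefer a name that directly follows a cue word,
    otherwise fall back to the first name mentioned. Returns None if the
    text contains no gazetteer name."""
    lowered = text.lower()
    candidates = candidate_locations(text, gazetteer)
    cues = "|".join(LOCATION_CUES)
    for name in candidates:
        pattern = r"\b(?:" + cues + r")\s+(?:the\s+)?" + re.escape(name) + r"\b"
        if re.search(pattern, lowered):
            return name
    return candidates[0] if candidates else None


tweet = "Jam from Main Street after crash at River Bridge"
print(candidate_locations(tweet, GAZETTEER))
print(pick_crash_location(tweet, GAZETTEER))
```

In this toy example the tweet mentions two places, and the cue word "at" is what selects one of them. A real system would need a far larger gazetteer, handling of misspellings and abbreviations, and a learned model in place of the fixed cue list.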