Vandalism Detection in OpenStreetMap via User Embeddings

Yinxiao Li, T. J. Anderson, Yiqi Niu
Proceedings of the 30th ACM International Conference on Information & Knowledge Management, published 2021-10-26. DOI: 10.1145/3459637.3482213
Citations: 3

Abstract

OpenStreetMap (OSM) is a free and openly editable database of geographic information. Over the years, OSM has evolved into the world's largest open knowledge base of geospatial data, and protecting OSM from the risk of vandalized and falsified information has become paramount to ensuring its continued success. However, despite the increasing usage of OSM and a wide interest in vandalism detection on open knowledge bases such as Wikipedia and Wikidata, OSM has not attracted as much attention from the research community, partially due to the lack of a publicly available vandalism corpus. In this paper, we report on the construction of the first OSM vandalism corpus, and release it publicly. We describe a user embedding approach to create OSM user embeddings and add embedding features to a machine learning model to improve vandalism detection in OSM. We validate the model against our vandalism corpus, and observe solid improvements in key metrics. The validated model is deployed into production for vandalism detection on Daylight Map.
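The abstract does not say how the user embeddings are learned or which model consumes them. The sketch below illustrates only the general idea of "adding embedding features to a machine learning model". It appends a per-user embedding vector to per-edit features and feeds the result to a classifier.

Everything concrete here is a hypothetical stand-in, not the authors' method:

- the embedding values and size (`EMBED_DIM`);
- the two edit features (tags deleted, name changed);
- the toy labels;
- the choice of logistic regression.

```python
import math
from typing import Dict, List, Sequence, Tuple

EMBED_DIM = 4  # hypothetical embedding size, kept tiny for illustration


def user_embedding(user_id: int, embeddings: Dict[int, List[float]]) -> List[float]:
    """Return the user's embedding, or a zero vector for users without one
    (e.g. a brand-new account with no editing history)."""
    return list(embeddings.get(user_id, [0.0] * EMBED_DIM))


def build_feature_vector(edit_features: Sequence[float], user_id: int,
                         embeddings: Dict[int, List[float]]) -> List[float]:
    """Concatenate the per-edit features with the editing user's embedding."""
    return list(edit_features) + user_embedding(user_id, embeddings)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def vandalism_score(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Score in (0, 1) that an edit with feature vector x is vandalism."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit logistic regression with plain per-example gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Gradient of the log loss with respect to the linear term z.
            err = vandalism_score(xi, weights, bias) - yi
            for j, xij in enumerate(xi):
                weights[j] -= lr * err * xij
            bias -= lr * err
    return weights, bias


# Toy usage with made-up data: users 1 and 2 have hypothetical embeddings,
# and each edit has two hypothetical features (tags_deleted, name_changed).
embeddings = {1: [1.0, 0.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0, 0.0]}
X = [
    build_feature_vector([0.0, 0.0], 1, embeddings),
    build_feature_vector([1.0, 0.0], 1, embeddings),
    build_feature_vector([0.0, 1.0], 2, embeddings),
    build_feature_vector([1.0, 1.0], 2, embeddings),
]
y = [0, 0, 1, 1]  # 1 = labelled as vandalism in this toy set
weights, bias = train_logistic(X, y)
```

Concatenation lets even a linear model weigh a user's history alongside the content of the edit itself. The zero-vector fallback is one possible (assumed) way to handle users who have no embedding yet. A production system could use any classifier in place of the logistic regression shown here.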