Detecting Wikipedia vandalism via spatio-temporal analysis of revision metadata?

European Workshop on System Security. Pub Date: 2010-04-13. DOI: 10.1145/1752046.1752050

Andrew G. West, Sampath Kannan, Insup Lee


Citations: 92

Abstract



Blatantly unproductive edits undermine the quality of the collaboratively-edited encyclopedia, Wikipedia. They not only disseminate dishonest and offensive content, but force editors to waste time undoing such acts of vandalism. Language-processing has been applied to combat these malicious edits, but as with email spam, these filters are evadable and computationally complex. Meanwhile, recent research has shown spatial and temporal features effective in mitigating email spam, while being lightweight and robust.

In this paper, we leverage the spatio-temporal properties of revision metadata to detect vandalism on Wikipedia. An administrative form of reversion called rollback enables the tagging of malicious edits, which are contrasted with non-offending edits in numerous dimensions. Crucially, none of these features require inspection of the article or revision text. Ultimately, a classifier is produced which flags vandalism at performance comparable to the natural-language efforts we intend to complement (85% accuracy at 50% recall). The classifier is scalable (processing 100+ edits a second) and has been used to locate over 5,000 manually-confirmed incidents of vandalism outside our labeled set.
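The headline figure pairs an accuracy number with a recall level. One way to read it: rank edits by classifier score, flag them from the top down until half of all labeled vandalism has been caught, then report what fraction of the flagged edits really were vandalism. The sketch below computes that quantity. It rests on two assumptions that the abstract does not spell out: that "accuracy" here means precision, and that the classifier emits one suspicion score per edit. The function name `precision_at_recall` and its inputs are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def precision_at_recall(scores: Sequence[float], labels: Sequence[bool],
                        target_recall: float) -> float:
    """Hypothetical helper: precision of the smallest top-k set of edits
    (ranked by score, highest first) whose recall reaches target_recall.

    labels[i] is True when edit i is labeled vandalism (e.g. rolled back).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_pos = sum(1 for y in labels if y)
    if total_pos == 0:
        raise ValueError("need at least one vandalism label")

    # Rank edits from most to least suspicious; ties keep input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    for k, (_, is_vandalism) in enumerate(ranked, start=1):
        if is_vandalism:
            true_pos += 1
        # Stop at the first cutoff that catches enough of the vandalism.
        if true_pos / total_pos >= target_recall:
            return true_pos / k
    # Only reached if target_recall > 1.0.
    return true_pos / len(ranked)


if __name__ == "__main__":
    # Toy scores and labels, invented for illustration only.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    labels = [True, False, True, True, False, True]
    print(precision_at_recall(scores, labels, 0.5))
```

With the toy data, 50% recall is first reached after the top three edits, two of which are vandalism, so the precision at that cutoff is 2/3.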

