一个以内容-上下文为中心的方法来检测维基百科中的破坏行为

Lakshmish Ramaswamy, Raga Sowmya Tummalapenta, Kang Li, C. Pu
{"title":"一个以内容-上下文为中心的方法来检测维基百科中的破坏行为","authors":"Lakshmish Ramaswamy, Raga Sowmya Tummalapenta, Kang Li, C. Pu","doi":"10.4108/ICST.COLLABORATECOM.2013.254059","DOIUrl":null,"url":null,"abstract":"Collaborative online social media (CSM) applications such as Wikipedia have not only revolutionized the World Wide Web, but they also have had a hugely positive effect on modern free societies. Unfortunately, Wikipedia has also become target to a wide-variety of vandalism attacks. Most existing vandalism detection techniques rely upon simple textual features such as existence of abusive language or spammy words. These techniques are ineffective against sophisticated vandal edits, which often do not contain the tell-tale markers associated with vandalism. In this paper, we argue for a context-aware approach for vandalism detection. This paper proposes a content-context-aware vandalism detection framework. The main idea is to quantify how well the words contained in the edit fit into the topic and the existing content of the Wikipedia article. We present two novel metrics, called WWW co-occurrence probability and top-ranked co-occurrence probability for this purpose. We also develop efficient mechanisms for evaluating these two metrics, and machine learning-based schemes that utilize these metrics. The paper presents a range of experiments to demonstrate the effectiveness of the proposed approach.","PeriodicalId":222111,"journal":{"name":"9th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing","volume":"251 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"A content-context-centric approach for detecting vandalism in Wikipedia\",\"authors\":\"Lakshmish Ramaswamy, Raga Sowmya Tummalapenta, Kang Li, C. Pu\",\"doi\":\"10.4108/ICST.COLLABORATECOM.2013.254059\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Collaborative online social media (CSM) applications such as Wikipedia have not only revolutionized the World Wide Web, but they also have had a hugely positive effect on modern free societies. Unfortunately, Wikipedia has also become target to a wide-variety of vandalism attacks. Most existing vandalism detection techniques rely upon simple textual features such as existence of abusive language or spammy words. These techniques are ineffective against sophisticated vandal edits, which often do not contain the tell-tale markers associated with vandalism. In this paper, we argue for a context-aware approach for vandalism detection. This paper proposes a content-context-aware vandalism detection framework. The main idea is to quantify how well the words contained in the edit fit into the topic and the existing content of the Wikipedia article. We present two novel metrics, called WWW co-occurrence probability and top-ranked co-occurrence probability for this purpose. We also develop efficient mechanisms for evaluating these two metrics, and machine learning-based schemes that utilize these metrics. The paper presents a range of experiments to demonstrate the effectiveness of the proposed approach.\",\"PeriodicalId\":222111,\"journal\":{\"name\":\"9th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing\",\"volume\":\"251 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-12-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"9th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4108/ICST.COLLABORATECOM.2013.254059\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"9th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4108/ICST.COLLABORATECOM.2013.254059","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

摘要

像维基百科这样的协作式在线社交媒体(CSM)应用程序不仅彻底改变了万维网,而且对现代自由社会也产生了巨大的积极影响。不幸的是,维基百科也成为了各种破坏攻击的目标。大多数现有的破坏检测技术依赖于简单的文本特征,如存在辱骂性语言或垃圾邮件。这些技术对复杂的破坏编辑是无效的,这些编辑通常不包含与破坏行为相关的泄密标记。在本文中,我们提出了一种用于破坏检测的上下文感知方法。本文提出了一个内容-上下文感知的破坏检测框架。主要的想法是量化编辑中包含的单词与维基百科文章的主题和现有内容的匹配程度。为此,我们提出了两个新的度量标准,称为WWW共现概率和顶级共现概率。我们还开发了评估这两个指标的有效机制,以及利用这些指标的基于机器学习的方案。本文给出了一系列实验来证明所提出方法的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A content-context-centric approach for detecting vandalism in Wikipedia
Collaborative online social media (CSM) applications such as Wikipedia have not only revolutionized the World Wide Web, but they also have had a hugely positive effect on modern free societies. Unfortunately, Wikipedia has also become target to a wide-variety of vandalism attacks. Most existing vandalism detection techniques rely upon simple textual features such as existence of abusive language or spammy words. These techniques are ineffective against sophisticated vandal edits, which often do not contain the tell-tale markers associated with vandalism. In this paper, we argue for a context-aware approach for vandalism detection. This paper proposes a content-context-aware vandalism detection framework. The main idea is to quantify how well the words contained in the edit fit into the topic and the existing content of the Wikipedia article. We present two novel metrics, called WWW co-occurrence probability and top-ranked co-occurrence probability for this purpose. We also develop efficient mechanisms for evaluating these two metrics, and machine learning-based schemes that utilize these metrics. The paper presents a range of experiments to demonstrate the effectiveness of the proposed approach.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信