A content-context-centric approach for detecting vandalism in Wikipedia

9th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing. Pub Date: 2013-12-12. DOI: 10.4108/ICST.COLLABORATECOM.2013.254059

Lakshmish Ramaswamy, Raga Sowmya Tummalapenta, Kang Li, C. Pu


Citations: 4

Abstract

Collaborative online social media (CSM) applications such as Wikipedia have not only revolutionized the World Wide Web but have also had a hugely positive effect on modern free societies. Unfortunately, Wikipedia has also become a target of a wide variety of vandalism attacks. Most existing vandalism detection techniques rely on simple textual features such as the presence of abusive language or spammy words. These techniques are ineffective against sophisticated vandal edits, which often do not contain the tell-tale markers associated with vandalism. In this paper, we argue for a context-aware approach to vandalism detection and propose a content-context-aware vandalism detection framework. The main idea is to quantify how well the words contained in an edit fit the topic and the existing content of the Wikipedia article. For this purpose, we present two novel metrics, called WWW co-occurrence probability and top-ranked co-occurrence probability. We also develop efficient mechanisms for evaluating these two metrics, and machine learning-based schemes that utilize them. The paper presents a range of experiments to demonstrate the effectiveness of the proposed approach.
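The abstract does not define either metric, so the following is only an illustrative sketch of the general idea of scoring how well an edit's words fit an article's topic. It assumes a co-occurrence estimate of the form "fraction of topic-bearing documents that also contain the word", computed over a small in-memory toy corpus rather than web search results. The function names `cooccurrence_probability` and `edit_fit_score`, the averaging step, and the data in `TOY_DOCS` are all hypothetical and should not be read as the authors' definitions.

```python
from typing import List, Set


def cooccurrence_probability(word: str, topic: str, docs: List[Set[str]]) -> float:
    """Assumed estimate: share of documents mentioning `topic` that also mention `word`."""
    topic_docs = [d for d in docs if topic in d]
    if not topic_docs:
        return 0.0  # no evidence about the topic at all
    both = sum(1 for d in topic_docs if word in d)
    return both / len(topic_docs)


def edit_fit_score(edit_words: List[str], topic: str, docs: List[Set[str]]) -> float:
    """Assumed aggregation: mean per-word co-occurrence estimate over the edit's words."""
    if not edit_words:
        return 0.0
    total = sum(cooccurrence_probability(w, topic, docs) for w in edit_words)
    return total / len(edit_words)


# Hypothetical toy corpus: each document is a set of lowercase tokens.
TOY_DOCS = [
    {"einstein", "physics", "relativity"},
    {"einstein", "relativity", "nobel"},
    {"einstein", "physics", "photon"},
    {"pizza", "cheese", "oven"},
]

on_topic = edit_fit_score(["physics", "relativity"], "einstein", TOY_DOCS)
off_topic = edit_fit_score(["pizza", "cheese"], "einstein", TOY_DOCS)
print(f"on-topic: {on_topic:.2f}, off-topic: {off_topic:.2f}")
```

In this toy setting, an edit whose words rarely appear alongside the article topic receives a low score even if it contains no abusive or spammy words. A score of this kind could serve as one input feature to a classifier, which is the role the abstract assigns to its metrics.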
