维基百科中文章与用户共同分类的框架

2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology Pub Date : 2010-08-31 DOI:10.1109/WI-IAT.2010.223

Lei Liu, P. Tan

{"title":"维基百科中文章与用户共同分类的框架","authors":"Lei Liu, P. Tan","doi":"10.1109/WI-IAT.2010.223","DOIUrl":null,"url":null,"abstract":"The massive size of Wikipedia and the ease with which its content can be created and edited has made Wikipedia an interesting domain for a variety of classification tasks, including topic detection, spam detection, and vandalism detection. These tasks are typically cast into a link-based classification problem, in which the class label of an article or a user is determined from its content-based and link-based features. Prior works have focused primarily on classifying either the editors or the articles (but not both). Yet there are many situations in which the classification can be aided by knowing collectively the class labels of the users and articles (e.g., spammers are more likely to post spam content than non-spammers). This paper presents a novel framework to jointly classify the Wikipedia articles and editors, assuming there are correspondences between their classes. Our experimental results demonstrate that the proposed co-classification algorithm outperforms classifiers that are trained independently to predict the class labels of articles and editors.","PeriodicalId":340211,"journal":{"name":"2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"A Framework for Co-classification of Articles and Users in Wikipedia\",\"authors\":\"Lei Liu, P. Tan\",\"doi\":\"10.1109/WI-IAT.2010.223\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The massive size of Wikipedia and the ease with which its content can be created and edited has made Wikipedia an interesting domain for a variety of classification tasks, including topic detection, spam detection, and vandalism detection. These tasks are typically cast into a link-based classification problem, in which the class label of an article or a user is determined from its content-based and link-based features. Prior works have focused primarily on classifying either the editors or the articles (but not both). Yet there are many situations in which the classification can be aided by knowing collectively the class labels of the users and articles (e.g., spammers are more likely to post spam content than non-spammers). This paper presents a novel framework to jointly classify the Wikipedia articles and editors, assuming there are correspondences between their classes. Our experimental results demonstrate that the proposed co-classification algorithm outperforms classifiers that are trained independently to predict the class labels of articles and editors.\",\"PeriodicalId\":340211,\"journal\":{\"name\":\"2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology\",\"volume\":\"22 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-08-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WI-IAT.2010.223\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WI-IAT.2010.223","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 12

摘要

维基百科的庞大规模以及其内容的易于创建和编辑使维基百科成为各种分类任务的有趣领域，包括主题检测、垃圾邮件检测和破坏行为检测。这些任务通常被转换为基于链接的分类问题，其中文章或用户的类标签是根据其基于内容和基于链接的特征确定的。先前的工作主要集中在分类编辑或文章(但不是两者)。然而，在许多情况下，可以通过了解用户和文章的分类标签来辅助分类(例如，垃圾邮件发送者比非垃圾邮件发送者更有可能发布垃圾邮件内容)。本文提出了一种新的框架来对维基百科条目和编者进行联合分类，假设它们的类别之间存在对应关系。我们的实验结果表明，所提出的协同分类算法优于独立训练的分类器来预测文章和编辑的类别标签。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A Framework for Co-classification of Articles and Users in Wikipedia

The massive size of Wikipedia and the ease with which its content can be created and edited has made Wikipedia an interesting domain for a variety of classification tasks, including topic detection, spam detection, and vandalism detection. These tasks are typically cast into a link-based classification problem, in which the class label of an article or a user is determined from its content-based and link-based features. Prior works have focused primarily on classifying either the editors or the articles (but not both). Yet there are many situations in which the classification can be aided by knowing collectively the class labels of the users and articles (e.g., spammers are more likely to post spam content than non-spammers). This paper presents a novel framework to jointly classify the Wikipedia articles and editors, assuming there are correspondences between their classes. Our experimental results demonstrate that the proposed co-classification algorithm outperforms classifiers that are trained independently to predict the class labels of articles and editors.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology

自引率

0.00%

发文量