维基百科中文章与用户共同分类的框架

Lei Liu, P. Tan
{"title":"维基百科中文章与用户共同分类的框架","authors":"Lei Liu, P. Tan","doi":"10.1109/WI-IAT.2010.223","DOIUrl":null,"url":null,"abstract":"The massive size of Wikipedia and the ease with which its content can be created and edited has made Wikipedia an interesting domain for a variety of classification tasks, including topic detection, spam detection, and vandalism detection. These tasks are typically cast into a link-based classification problem, in which the class label of an article or a user is determined from its content-based and link-based features. Prior works have focused primarily on classifying either the editors or the articles (but not both). Yet there are many situations in which the classification can be aided by knowing collectively the class labels of the users and articles (e.g., spammers are more likely to post spam content than non-spammers). This paper presents a novel framework to jointly classify the Wikipedia articles and editors, assuming there are correspondences between their classes. Our experimental results demonstrate that the proposed co-classification algorithm outperforms classifiers that are trained independently to predict the class labels of articles and editors.","PeriodicalId":340211,"journal":{"name":"2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"A Framework for Co-classification of Articles and Users in Wikipedia\",\"authors\":\"Lei Liu, P. Tan\",\"doi\":\"10.1109/WI-IAT.2010.223\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The massive size of Wikipedia and the ease with which its content can be created and edited has made Wikipedia an interesting domain for a variety of classification tasks, including topic detection, spam detection, and vandalism detection. These tasks are typically cast into a link-based classification problem, in which the class label of an article or a user is determined from its content-based and link-based features. Prior works have focused primarily on classifying either the editors or the articles (but not both). Yet there are many situations in which the classification can be aided by knowing collectively the class labels of the users and articles (e.g., spammers are more likely to post spam content than non-spammers). This paper presents a novel framework to jointly classify the Wikipedia articles and editors, assuming there are correspondences between their classes. Our experimental results demonstrate that the proposed co-classification algorithm outperforms classifiers that are trained independently to predict the class labels of articles and editors.\",\"PeriodicalId\":340211,\"journal\":{\"name\":\"2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology\",\"volume\":\"22 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-08-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WI-IAT.2010.223\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WI-IAT.2010.223","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

摘要

维基百科的庞大规模以及其内容的易于创建和编辑使维基百科成为各种分类任务的有趣领域,包括主题检测、垃圾邮件检测和破坏行为检测。这些任务通常被转换为基于链接的分类问题,其中文章或用户的类标签是根据其基于内容和基于链接的特征确定的。先前的工作主要集中在分类编辑或文章(但不是两者)。然而,在许多情况下,可以通过了解用户和文章的分类标签来辅助分类(例如,垃圾邮件发送者比非垃圾邮件发送者更有可能发布垃圾邮件内容)。本文提出了一种新的框架来对维基百科条目和编者进行联合分类,假设它们的类别之间存在对应关系。我们的实验结果表明,所提出的协同分类算法优于独立训练的分类器来预测文章和编辑的类别标签。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A Framework for Co-classification of Articles and Users in Wikipedia
The massive size of Wikipedia and the ease with which its content can be created and edited has made Wikipedia an interesting domain for a variety of classification tasks, including topic detection, spam detection, and vandalism detection. These tasks are typically cast into a link-based classification problem, in which the class label of an article or a user is determined from its content-based and link-based features. Prior works have focused primarily on classifying either the editors or the articles (but not both). Yet there are many situations in which the classification can be aided by knowing collectively the class labels of the users and articles (e.g., spammers are more likely to post spam content than non-spammers). This paper presents a novel framework to jointly classify the Wikipedia articles and editors, assuming there are correspondences between their classes. Our experimental results demonstrate that the proposed co-classification algorithm outperforms classifiers that are trained independently to predict the class labels of articles and editors.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信