On the Use of Reliable-Negatives Selection Strategies in the PU Learning Approach for Quality Flaws Prediction in Wikipedia

Edgardo Ferretti, M. Errecalde, Maik Anderka, Benno Stein
Published in: 2014 25th International Workshop on Database and Expert Systems Applications, 2014-09-01. DOI: 10.1109/DEXA.2014.52 (https://doi.org/10.1109/DEXA.2014.52)
Citations: 16

Abstract

Learning from positive and unlabeled examples (PU learning) has proven to be an effective method in several Web mining applications. In particular, in the 1st International Competition on Quality Flaw Prediction in Wikipedia in 2012, a tailored PU learning approach performed best among the competitors. A key feature of that approach is the introduction of sampling strategies within the original PU learning procedure. This paper revisits the winning approach of 2012 and elaborates on neglected aspects in order to provide evidence for the usefulness of sampling in PU learning. To this end, we propose a modification to this PU learning approach, and we show how the different sampling strategies affect flaw prediction effectiveness. Our analysis is based on the original evaluation corpus of the 2012 competition on quality flaw prediction. A main outcome is that, under the best sampling strategy, our modified version of PU learning increases flaw prediction effectiveness by 18.31% on average compared with the winning approach of the competition.
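
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea of two-step PU learning with a selection strategy for reliable negatives. It is not the authors' implementation: the centroid-based scoring, the two strategies ("farthest" and "random"), and every function and parameter name here are assumptions chosen for illustration.

```python
import random


def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_reliable_negatives(positives, unlabeled, k, strategy="farthest", seed=0):
    """Step 1 (hypothetical): choose up to k unlabeled examples to treat as negatives.

    Candidates are the unlabeled examples that lie closer to the unlabeled
    centroid than to the positive centroid. The strategy then decides which
    candidates are kept.
    """
    c_pos = centroid(positives)
    c_unl = centroid(unlabeled)
    candidates = [u for u in unlabeled if sq_dist(u, c_unl) < sq_dist(u, c_pos)]
    if strategy == "farthest":
        # Keep the candidates that are least similar to the positive class.
        ranked = sorted(candidates, key=lambda u: sq_dist(u, c_pos), reverse=True)
        return ranked[:k]
    if strategy == "random":
        # Keep a uniform random sample of the candidates.
        rng = random.Random(seed)
        return rng.sample(candidates, min(k, len(candidates)))
    raise ValueError(f"unknown strategy: {strategy}")


def train_final(positives, reliable_negatives):
    """Step 2 (hypothetical): nearest-centroid classifier on P vs. reliable negatives."""
    c_pos = centroid(positives)
    c_neg = centroid(reliable_negatives)

    def predict(x):
        # True means "predicted to belong to the positive class".
        return sq_dist(x, c_pos) < sq_dist(x, c_neg)

    return predict


# Toy data: positives cluster near (1, 1); the unlabeled set mixes both regions.
positives = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
unlabeled = [[1.1, 1.0], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1], [0.8, 1.2]]

reliable = select_reliable_negatives(positives, unlabeled, k=2, strategy="farthest")
predict = train_final(positives, reliable)
```

Swapping `strategy="farthest"` for `strategy="random"` changes only which candidates reach the second step, which is the kind of comparison the abstract refers to when it speaks of different sampling strategies. A real system would replace the centroid classifier with a stronger learner in both steps.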