Measuring the quality of web content using factual information

E. Lex, Michael Völske, M. Errecalde, Edgardo Ferretti, L. Cagnina, Christopher Horn, Benno Stein, M. Granitzer
Venue: WebQuality '12 | Published: 2012-04-16 | DOI: 10.1145/2184305.2184308
Citations: 47

Abstract

Nowadays, many decisions are based on information found on the Web. For the most part, the sources disseminating this information are not certified, so assessing the quality and credibility of Web content has become more important than ever. We present factual density, a simple statistical quality measure based on facts extracted from Web content using Open Information Extraction. In a first case study, we use this measure to identify featured/good articles in Wikipedia. We compare factual density with word count, a measure that has been applied successfully to this task in the past. Our evaluation corroborates the good performance of word count on Wikipedia, since featured/good articles are often longer than non-featured ones. However, for articles of similar length, word count fails, whereas factual density separates them with an F-measure of 90.4%. We also investigate the use of relational features for classifying Wikipedia articles as featured/good versus non-featured, achieving an F-measure of 86.7% for articles of similar length and 84% otherwise.
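The abstract names factual density but does not state its formula. The Python sketch below assumes factual density is the number of extracted facts divided by the article's word count, and assumes the (argument, relation, argument) triples have already been produced by an Open Information Extraction system; extraction itself is not shown. The names `Article`, `factual_density`, `classify`, the threshold rule, and the threshold values are hypothetical and chosen for illustration only, not the paper's evaluation setup. The two toy articles are invented. The F-measure is computed as the usual harmonic mean of precision and recall for the featured/good class.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# An Open IE fact: (argument 1, relation phrase, argument 2).
Triple = Tuple[str, str, str]


@dataclass
class Article:
    text: str
    facts: Sequence[Triple]  # assumed precomputed by an external Open IE extractor
    is_featured: bool        # gold label: featured/good (True) vs. non-featured


def word_count(text: str) -> int:
    """Baseline measure: number of whitespace-separated tokens."""
    return len(text.split())


def factual_density(article: Article) -> float:
    """Extracted facts per word (normalizing by word count is an assumption)."""
    n_words = word_count(article.text)
    return len(article.facts) / n_words if n_words else 0.0


def classify(articles: Sequence[Article],
             score: Callable[[Article], float],
             threshold: float) -> List[bool]:
    """Hypothetical rule: predict featured/good when score >= threshold."""
    return [score(a) >= threshold for a in articles]


def f_measure(predicted: Sequence[bool], gold: Sequence[bool]) -> float:
    """F1 for the featured/good class: harmonic mean of precision and recall."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two invented articles of identical length (10 words) but different fact counts.
dense = Article(
    text="Curie discovered polonium in 1898 and won the Nobel Prize",
    facts=[("Curie", "discovered", "polonium"),
           ("Curie", "discovered polonium in", "1898"),
           ("Curie", "won", "the Nobel Prize")],
    is_featured=True,
)
sparse = Article(
    text="This article is about a topic many people find interesting",
    facts=[("This article", "is about", "a topic")],
    is_featured=False,
)

articles = [dense, sparse]
gold = [a.is_featured for a in articles]

# Word count cannot tell equal-length articles apart: both land on the same side.
wc_pred = classify(articles, lambda a: word_count(a.text), threshold=10)
# Factual density separates them: 0.3 vs. 0.1 facts per word.
fd_pred = classify(articles, factual_density, threshold=0.2)

print(f"word count F1:      {f_measure(wc_pred, gold):.3f}")  # 0.667
print(f"factual density F1: {f_measure(fd_pred, gold):.3f}")  # 1.000
```

Because both toy articles contain 10 words, any word-count threshold assigns them to the same class, which mirrors the abstract's observation that word count fails for articles of similar length. Normalizing the fact count by length is what lets the density score separate them.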