A new readability measure for web documents and its evaluation on an effective web search engine

Yume Sasaki, Takuya Komatsuda, Atsushi Keyaki, Jun Miyazaki
Proceedings of the 18th International Conference on Information Integration and Web-based Applications and Services. Published 2016-11-28. DOI: 10.1145/3011141.3011172 (https://doi.org/10.1145/3011141.3011172)
Citations: 2

Abstract

In this study, we propose a readability measure for Web documents and an information retrieval system that takes readability into account. Conventional information retrieval systems aim to identify documents that are relevant to a given query; however, as the information needs of search users become increasingly diverse and complex, systems that incorporate additional criteria are constantly being introduced. This paper focuses on one such criterion, readability. Given that non-native English speakers outnumber native English speakers, incorporating readability into an information retrieval system is crucial. We therefore propose (1) a readability measure that uses document simplicity and document structure as new readability features and (2) a score fusion method that combines relevance and readability scores. In our experiments, the proposed readability measure outperformed an existing readability measure. Moreover, score fusion methods based on a statistical framework called a copula improved overall accuracy compared with existing methods such as linear combination.
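
To make the two fusion styles named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state which copula family, normalization, or weights the paper uses, so everything here is an assumption for illustration: the function names, the min-max normalization, the weight `alpha`, the rank-based empirical CDF, and the choice of a Gumbel copula with parameter `theta`.

```python
import math

def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def linear_fusion(relevance, readability, alpha=0.5):
    """Baseline: weighted sum of normalized scores (alpha is hypothetical)."""
    rel = min_max(relevance)
    read = min_max(readability)
    return [alpha * r + (1 - alpha) * d for r, d in zip(rel, read)]

def empirical_cdf(scores):
    """Rank-based CDF values; dividing by n + 1 keeps them inside (0, 1)."""
    n = len(scores)
    return [sum(1 for t in scores if t <= s) / (n + 1) for s in scores]

def gumbel_copula(u, v, theta=2.0):
    """Gumbel copula C(u, v) = exp(-[(-ln u)^theta + (-ln v)^theta]^(1/theta)).

    Requires theta >= 1; theta = 1 gives the independence copula u * v.
    The Gumbel family is an assumed choice, not taken from the paper.
    """
    a = (-math.log(u)) ** theta
    b = (-math.log(v)) ** theta
    return math.exp(-((a + b) ** (1.0 / theta)))

def copula_fusion(relevance, readability, theta=2.0):
    """Fuse scores by evaluating the copula at each document's CDF values."""
    u = empirical_cdf(relevance)
    v = empirical_cdf(readability)
    return [gumbel_copula(a, b, theta) for a, b in zip(u, v)]

if __name__ == "__main__":
    # Made-up scores for three documents, for illustration only.
    relevance = [0.2, 0.9, 0.5]
    readability = [0.8, 0.3, 0.6]
    print(linear_fusion(relevance, readability))
    print(copula_fusion(relevance, readability))
```

In this sketch the linear baseline trades the two scores off at a fixed rate, whereas the copula score is computed from each document's rank position in both score distributions, so a document that is moderately strong on both criteria can outrank one that is strong on only one.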