Is Simple English Wikipedia As Simple And Easy-to-Understand As We Expect It To Be?

Sanja Štajner, Sergiu Nisioi, Daniel Ibanez
Published in: Proceedings of the 9th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion
Publication date: 2020-12-02
DOI: 10.1145/3439231.3439263 (https://doi.org/10.1145/3439231.3439263)

Abstract

The conceptual complexity of a written text plays an important role in maintaining a reader's interest in it. Automatic text simplification systems should therefore consider conceptual complexity in addition to the lexical and syntactic complexity of a text. In this study, we analyze and compare two widely used English text simplification corpora, one professionally produced (Newsela) and the other collaboratively made by amateurs and enthusiasts (English Wikipedia–Simple English Wikipedia), focusing on 19 conceptual complexity features. The results indicate that the simplification operations made during the production of Simple English Wikipedia in many cases do not follow the patterns of the professionally simplified corpus, casting doubt on the adequacy of Simple English Wikipedia as training material for automatic text simplification systems.