JADES: New Text Simplification Dataset in Japanese Targeted at Non-Native Speakers

Akio Hayakawa, Tomoyuki Kajiwara, Hiroki Ouchi, Taro Watanabe
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
DOI: 10.18653/v1/2022.tsar-1.17 (https://doi.org/10.18653/v1/2022.tsar-1.17)

Abstract

Because the goal of text simplification depends on its intended users, evaluating it is difficult. A targeted evaluation dataset clarifies the purpose of simplification, though its specification is hard to define. We built JADES (JApanese Dataset for the Evaluation of Simplification), a text simplification dataset targeted at non-native Japanese speakers, following public vocabulary and grammar profiles. JADES comprises 3,907 complex-simple sentence pairs annotated by an expert. Analysis of JADES shows that a wide range of rewriting operations, often more than one at a time, were applied during simplification. Furthermore, we analyzed the outputs of several benchmark systems on JADES, together with their automatic and manual evaluation scores. The results of these analyses highlight differences between English and Japanese in both simplification operations and evaluation.