Evaluating Neural Text Simplification in the Medical Domain

Laurens Van den Bercken, Robert-Jan Sips, C. Lofi
{"title":"Evaluating Neural Text Simplification in the Medical Domain","authors":"Laurens Van den Bercken, Robert-Jan Sips, C. Lofi","doi":"10.1145/3308558.3313630","DOIUrl":null,"url":null,"abstract":"Health literacy, i.e. the ability to read and understand medical text, is a relevant component of public health. Unfortunately, many medical texts are hard to grasp by the general population as they are targeted at highly-skilled professionals and use complex language and domain-specific terms. Here, automatic text simplification making text commonly understandable would be very beneficial. However, research and development into medical text simplification is hindered by the lack of openly available training and test corpora which contain complex medical sentences and their aligned simplified versions. In this paper, we introduce such a dataset to aid medical text simplification research. The dataset is created by filtering aligned health sentences using expert knowledge from an existing aligned corpus and a novel simple, language independent monolingual text alignment method. Furthermore, we use the dataset to train a state-of-the-art neural machine translation model, and compare it to a model trained on a general simplification dataset using an automatic evaluation, and an extensive human-expert evaluation.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"1 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"48","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The World Wide Web Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3308558.3313630","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 48

Abstract

Health literacy, i.e. the ability to read and understand medical text, is a relevant component of public health. Unfortunately, many medical texts are hard to grasp by the general population as they are targeted at highly-skilled professionals and use complex language and domain-specific terms. Here, automatic text simplification making text commonly understandable would be very beneficial. However, research and development into medical text simplification is hindered by the lack of openly available training and test corpora which contain complex medical sentences and their aligned simplified versions. In this paper, we introduce such a dataset to aid medical text simplification research. The dataset is created by filtering aligned health sentences using expert knowledge from an existing aligned corpus and a novel simple, language independent monolingual text alignment method. Furthermore, we use the dataset to train a state-of-the-art neural machine translation model, and compare it to a model trained on a general simplification dataset using an automatic evaluation, and an extensive human-expert evaluation.
评价医学领域的神经文本简化
卫生素养,即阅读和理解医学文献的能力,是公共卫生的一个相关组成部分。不幸的是,许多医学文本很难被普通大众理解,因为它们针对的是高技能的专业人士,使用复杂的语言和特定领域的术语。在这里,自动文本简化使文本易于理解将是非常有益的。然而,医学文本简化的研究和发展受到缺乏公开可用的训练和测试语料库的阻碍,这些语料库包含复杂的医学句子及其对齐的简化版本。在本文中,我们引入了这样一个数据集来帮助医学文本简化研究。该数据集是通过使用现有对齐语料库中的专家知识和一种新颖的简单的、独立于语言的单语文本对齐方法过滤对齐的健康句而创建的。此外,我们使用该数据集来训练最先进的神经机器翻译模型,并将其与使用自动评估和广泛的人类专家评估在一般简化数据集上训练的模型进行比较。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信