Evaluating Neural Text Simplification in the Medical Domain

The World Wide Web Conference. Pub Date: 2019-05-13. DOI: 10.1145/3308558.3313630

Laurens Van den Bercken, Robert-Jan Sips, C. Lofi

Citations: 48

Abstract

Health literacy, i.e., the ability to read and understand medical text, is a relevant component of public health. Unfortunately, many medical texts are hard for the general population to grasp, as they are targeted at highly skilled professionals and use complex language and domain-specific terms. Automatic text simplification, which makes text commonly understandable, would be very beneficial here. However, research and development in medical text simplification is hindered by the lack of openly available training and test corpora that contain complex medical sentences and their aligned simplified versions. In this paper, we introduce such a dataset to aid medical text simplification research. The dataset is created by filtering aligned health sentences from an existing aligned corpus using expert knowledge, together with a novel, simple, language-independent monolingual text alignment method. Furthermore, we use the dataset to train a state-of-the-art neural machine translation model and compare it to a model trained on a general simplification dataset, using both an automatic evaluation and an extensive human-expert evaluation.
