Lexical Normalization of User-Generated Medical Text

Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task. Pub Date: 2019-08-01. DOI: 10.18653/v1/W19-3202

A. Dirkson, S. Verberne, G. van Oortmerssen, Wessel Kraaij

Citations: 2

Abstract

In the medical domain, user-generated social media text is increasingly used as a valuable complementary knowledge source to scientific medical literature. The extraction of this knowledge is complicated by colloquial language use and misspellings. Yet, lexical normalization of such data has not been addressed properly. This paper presents an unsupervised, data-driven spelling correction module for medical social media. Our method outperforms state-of-the-art spelling correction and can detect mistakes with an F0.5 of 0.888. Additionally, we present a novel corpus for spelling mistake detection and correction on a medical patient forum.
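The F0.5 score reported above is an instance of the general F-beta measure, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision and R is recall. With beta = 0.5, precision counts more heavily than recall, which suits mistake detection when false alarms (flagging correct words) are costlier than misses. The sketch below is only an illustration of the metric: the function name `f_beta` and the precision and recall values are hypothetical and are not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Return the F-beta score, a weighted harmonic mean of precision and recall.

    beta < 1 favours precision; beta > 1 favours recall; beta == 1 gives F1.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    beta_sq = beta ** 2
    denominator = beta_sq * precision + recall
    if denominator == 0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Hypothetical detector: precise but with moderate recall (illustrative values only).
example_precision = 0.9
example_recall = 0.6

f_half = f_beta(example_precision, example_recall, beta=0.5)
f_one = f_beta(example_precision, example_recall, beta=1.0)

print(f"F0.5 = {f_half:.3f}")
print(f"F1   = {f_one:.3f}")
```

Because the hypothetical precision is higher than the recall, the F0.5 value comes out above the F1 value for the same detector, which shows how the choice of beta changes the picture of a system's quality.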

