{"title":"A Simple Data Augmentation Method to Improve the Performance of Named Entity Recognition Models in Medical Domain","authors":"Abdul Majeed Issifu, M. Ganiz","doi":"10.1109/UBMK52708.2021.9558986","DOIUrl":null,"url":null,"abstract":"Easy Data Augmentation is originally developed for text classification tasks. It consists of four basic methods: Synonym Replacement, Random Insertion, Random Deletion, and Random Swap. They yield accuracy improvements on several deep neural network models. In this study we apply these methods to a new domain. We augment Named Entity Recognition datasets from medical domain. Although the augmentation task is much more difficult due to the nature of named entities which consist of word or word groups in the sentences, we show that we can improve the named entity recognition performance.","PeriodicalId":106516,"journal":{"name":"2021 6th International Conference on Computer Science and Engineering (UBMK)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 6th International Conference on Computer Science and Engineering (UBMK)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/UBMK52708.2021.9558986","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
Abstract
Easy Data Augmentation is originally developed for text classification tasks. It consists of four basic methods: Synonym Replacement, Random Insertion, Random Deletion, and Random Swap. They yield accuracy improvements on several deep neural network models. In this study we apply these methods to a new domain. We augment Named Entity Recognition datasets from medical domain. Although the augmentation task is much more difficult due to the nature of named entities which consist of word or word groups in the sentences, we show that we can improve the named entity recognition performance.
Easy Data Augmentation最初是为文本分类任务开发的。它包括四种基本方法:同义词替换、随机插入、随机删除和随机交换。它们在几个深度神经网络模型上提高了精度。在这项研究中,我们将这些方法应用到一个新的领域。我们增强了医学领域的命名实体识别数据集。尽管由于命名实体是由句子中的词或词组组成的,因此增强任务更加困难,但我们表明我们可以提高命名实体的识别性能。