A survey of data augmentation in named entity recognition

IF 5.5 · Zone 2 (Computer Science) · JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yi Huang, Yuhan Gao, Chengjuan Ren
Neurocomputing, Volume 651, Article 130856. Journal article, published 2025-07-10.
DOI: 10.1016/j.neucom.2025.130856
https://www.sciencedirect.com/science/article/pii/S0925231225015280

Abstract

Data augmentation (DA), which first gained prominence in Computer Vision (CV), has been successfully adapted to Natural Language Processing (NLP), where it has proven effective in mitigating data scarcity in few-shot settings and in other scenarios where deep learning techniques may underperform. The primary goal of DA is to expand and diversify training datasets through a variety of methods, yielding more diverse, high-quality synthetic data for training Named Entity Recognition (NER) models. This survey explores DA techniques for NER, covering linguistic features and four categories of data augmentation methods. It also reviews datasets commonly used in DA tasks, discusses potential practical applications, and examines key challenges and future directions in DA for NER. These findings serve as a reference for learners and offer insights for researchers. As an essential and cost-effective approach, DA alleviates data scarcity and overfitting in NER models by facilitating the integration of diverse augmentation methods.
Source journal
Neurocomputing (Engineering & Technology; Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days
About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.