MixUp based Cross-Consistency Training for Named Entity Recognition

Geonsik Youn, Bohan Yoon, Seungbin Ji, Dahee Ko, J. Rhee
{"title":"基于混合的命名实体识别交叉一致性训练","authors":"Geonsik Youn, Bohan Yoon, Seungbin Ji, Dahee Ko, J. Rhee","doi":"10.1145/3571560.3571576","DOIUrl":null,"url":null,"abstract":"Named Entity Recognition (NER) is one of the first stages in deep natural language understanding. The state-of-the-art deep NER models are dependent on high-quality and massive datasets. Also, the NER tasks require token-level labels. For this reason, there is a problem that annotating many sentences for the NER tasks is time-consuming and expensive. To solve this problem, many prior studies have been conducted to use the auto annotated weakly labeled data. However, the weakly labeled data contains a lot of noises that are obstructive to the training of NER models. We propose to use MixUp and cross-consistency training (CCT) together as a strategy to use weakly labeled data for NER tasks. In this study, the proposed method stems from the idea that MixUp, which was recently considered the data augmentation strategy, hinders the NER model training. Inspired by this point, we propose to use MixUp as a perturbation of cross-consistency training for NER. Experiments conducted on several NER benchmarks demonstrate the proposed method achieves improved performance compared to employing only a few human-annotated data.","PeriodicalId":143909,"journal":{"name":"Proceedings of the 6th International Conference on Advances in Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"MixUp based Cross-Consistency Training for Named Entity Recognition\",\"authors\":\"Geonsik Youn, Bohan Yoon, Seungbin Ji, Dahee Ko, J. Rhee\",\"doi\":\"10.1145/3571560.3571576\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Named Entity Recognition (NER) is one of the first stages in deep natural language understanding. The state-of-the-art deep NER models are dependent on high-quality and massive datasets. Also, the NER tasks require token-level labels. For this reason, there is a problem that annotating many sentences for the NER tasks is time-consuming and expensive. To solve this problem, many prior studies have been conducted to use the auto annotated weakly labeled data. However, the weakly labeled data contains a lot of noises that are obstructive to the training of NER models. We propose to use MixUp and cross-consistency training (CCT) together as a strategy to use weakly labeled data for NER tasks. In this study, the proposed method stems from the idea that MixUp, which was recently considered the data augmentation strategy, hinders the NER model training. Inspired by this point, we propose to use MixUp as a perturbation of cross-consistency training for NER. 
Experiments conducted on several NER benchmarks demonstrate the proposed method achieves improved performance compared to employing only a few human-annotated data.\",\"PeriodicalId\":143909,\"journal\":{\"name\":\"Proceedings of the 6th International Conference on Advances in Artificial Intelligence\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 6th International Conference on Advances in Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3571560.3571576\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 6th International Conference on Advances in Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3571560.3571576","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Named Entity Recognition (NER) is one of the first stages in deep natural language understanding. State-of-the-art deep NER models depend on massive, high-quality datasets, and NER requires token-level labels, which makes annotating enough sentences for NER both time-consuming and expensive. To address this problem, many prior studies have turned to automatically annotated, weakly labeled data. However, weakly labeled data contains a great deal of noise that obstructs the training of NER models. We propose to use MixUp and cross-consistency training (CCT) together as a strategy for exploiting weakly labeled data in NER tasks. The proposed method stems from the observation that MixUp, recently considered a data augmentation strategy, hinders NER model training. Inspired by this, we use MixUp as a perturbation within cross-consistency training for NER. Experiments on several NER benchmarks demonstrate that the proposed method outperforms training on only a small amount of human-annotated data.
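
To make the idea concrete, below is a minimal sketch (not the authors' released code) of how MixUp can serve as the perturbation inside a cross-consistency objective for NER. The `encoder`, `main_head`, and `aux_head` modules, the Beta-distributed mixing ratio, and the KL consistency loss are all illustrative assumptions; padding masks are omitted for brevity.

```python
# Minimal sketch: MixUp as the perturbation in cross-consistency
# training for NER. All module names are hypothetical; this illustrates
# the idea, not the paper's actual implementation.
import torch
import torch.nn.functional as F

def mixup_cct_loss(encoder, main_head, aux_head, tokens_a, tokens_b, alpha=0.4):
    """Consistency loss on a pair of (weakly labeled) sentences.

    tokens_a, tokens_b: LongTensors of shape (batch, seq_len), assumed
    padded to the same length; padding masks are omitted for brevity.
    """
    # Sample the MixUp interpolation ratio from Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()

    # Shared encoder produces per-token hidden states (batch, seq, hidden).
    h_a = encoder(tokens_a)
    h_b = encoder(tokens_b)

    # Main decoder predictions serve as detached consistency targets.
    with torch.no_grad():
        p_a = F.softmax(main_head(h_a), dim=-1)
        p_b = F.softmax(main_head(h_b), dim=-1)

    # MixUp as the perturbation: interpolate hidden states and soft targets.
    h_mix = lam * h_a + (1.0 - lam) * h_b
    p_mix = lam * p_a + (1.0 - lam) * p_b

    # The auxiliary decoder must stay consistent with the mixed targets.
    log_q = F.log_softmax(aux_head(h_mix), dim=-1)
    return F.kl_div(log_q, p_mix, reduction="batchmean")
```

In a full cross-consistency setup, this term would be added to the supervised cross-entropy loss computed on the human-annotated data, so the shared encoder learns from both signals.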