ABB-BERT: A BERT model for disambiguating abbreviations and contractions

Journal: Icon (Q3, Arts and Humanities)
Publication Date: 2022-07-08 · DOI: 10.48550/arXiv.2207.04008
Authors: Prateek Kacker, Andi Cupallari, Aswin Giridhar Subramanian, Nimit Jain
Citation count: 0

Abstract

Abbreviations and contractions are commonly found in text across different domains. For example, doctors' notes contain many contractions that can be personalized based on the writer's habits. Existing spelling correction models are not suited to handling expansions, because many characters are dropped from the original words. In this work, we propose ABB-BERT, a BERT-based model that handles ambiguous text containing abbreviations and contractions. ABB-BERT can rank candidate expansions from thousands of options and is designed for scale. It is trained on Wikipedia text, and the algorithm allows it to be fine-tuned with little compute to improve performance for a particular domain or person. We are publicly releasing the training dataset of abbreviations and contractions derived from Wikipedia.
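The core task the abstract describes is candidate ranking: given an abbreviation in context, score and order possible expansions. The paper's model does this with a fine-tuned BERT; the sketch below only illustrates the ranking interface with a toy scorer (acronym-initials match, in-order character match, and crude context overlap) standing in for BERT's plausibility score. All function names and the heuristic itself are illustrative assumptions, not the paper's method.

```python
# Toy sketch of the expansion-ranking interface. In ABB-BERT the score
# would come from a BERT model; here a simple heuristic stands in.

def is_subsequence(abbrev: str, phrase: str) -> bool:
    """True if the abbreviation's characters appear in order in the phrase."""
    it = iter(phrase.lower())
    return all(ch in it for ch in abbrev.lower())

def toy_score(abbrev: str, candidate: str, context: str) -> float:
    """Stand-in for a learned plausibility score.

    Prefers acronym-style matches (initials line up), then in-order
    character matches, with a small bonus for context word overlap.
    """
    initials = "".join(w[0] for w in candidate.lower().split())
    if initials == abbrev.lower():
        score = 2.0                      # e.g. "pt" -> "physical therapy"
    elif is_subsequence(abbrev, candidate):
        score = 1.0                      # e.g. "pt" -> "patient"
    else:
        score = 0.0
    ctx_words = set(context.lower().split())
    overlap = sum(w in ctx_words for w in candidate.lower().split())
    return score + 0.1 * overlap

def rank_expansions(abbrev: str, candidates: list, context: str) -> list:
    """Return candidate expansions sorted best-first, mirroring how
    ABB-BERT ranks thousands of options per abbreviation."""
    return sorted(candidates,
                  key=lambda c: toy_score(abbrev, c, context),
                  reverse=True)

# Example: disambiguating "pt" in a clinical-style note.
context = "the pt was discharged with a follow up appointment"
candidates = ["patient", "physical therapy", "point"]
print(rank_expansions("pt", candidates, context))
# -> ['physical therapy', 'patient', 'point']
```

A real scorer would replace `toy_score` with a model forward pass, e.g. scoring the sentence with each candidate substituted in; the ranking loop itself stays the same.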
Source Journal: Icon
Field: Arts and Humanities – History and Philosophy of Science
CiteScore: 0.30
Self-citation rate: 0.00%