How to generate data for acronym detection and expansion
Sing Choi, Piyush Puranik, Binay Dahal, Kazem Taghva
Advances in Computational Intelligence, vol. 2, no. 2. Published 2022-04-13. DOI: 10.1007/s43674-021-00024-6
https://link.springer.com/article/10.1007/s43674-021-00024-6
Citations: 2
Abstract
Finding the definitions of acronyms in text is an ongoing problem with multiple proposed solutions. In this paper, we use the Bidirectional Encoder Representations from Transformers (BERT) question-answering model provided by Google to find acronym definitions in a given text. Given an acronym and a passage containing it, the model is expected to find the acronym's expansion in the passage. Our experiments show that the model correctly predicts 94% of acronym expansions, using a Jaro–Winkler similarity threshold of greater than 0.8 to decide whether a predicted expansion matches the reference. One of the main contributions of this paper is a systematic method for creating datasets and using them to build a corpus for acronym expansion. Our approach to data generation can be used in many applications for which no standard datasets exist.
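To make the evaluation criterion concrete, the following is a minimal Python sketch of scoring predicted expansions against reference expansions with a Jaro–Winkler threshold of 0.8. It is not the authors' code. The function names, the lowercase normalization, and the standard Winkler settings (prefix scale 0.1, prefix length capped at 4) are assumptions made for illustration; the abstract does not state how the paper normalizes strings or configures the metric.

```python
from typing import Sequence


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # transpositions
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (standard Winkler variant)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


def accuracy_at_threshold(predictions: Sequence[str],
                          references: Sequence[str],
                          threshold: float = 0.8) -> float:
    """Fraction of predictions whose similarity to the reference exceeds threshold.

    Lowercasing before comparison is an assumption of this sketch.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("at least one example is required")
    hits = sum(
        jaro_winkler(p.lower(), r.lower()) > threshold
        for p, r in zip(predictions, references)
    )
    return hits / len(references)
```

Under this criterion a prediction with a small spelling difference from the reference still counts as correct, while an unrelated span does not. A stricter exact-match criterion would likely report a lower figure on the same predictions.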