Sing Choi, Piyush Puranik, Binay Dahal, Kazem Taghva
Advances in Computational Intelligence, volume 2, issue 2. Published 2022-04-13. DOI: 10.1007/s43674-021-00024-6. https://link.springer.com/article/10.1007/s43674-021-00024-6
How to generate data for acronym detection and expansion
Finding the definitions of acronyms in a given text is an ongoing problem with multiple proposed solutions. In this paper, we use the Bidirectional Encoder Representations from Transformers (BERT) question-answering model provided by Google to find acronym definitions in a given text. Given an acronym and a passage containing the acronym, our model is expected to find the expansion of the acronym in the passage. Through our experiments, we show that this model correctly predicts 94% of acronym expansions, assuming a Jaro–Winkler similarity threshold of greater than 0.8. One of the main contributions of this paper is a systematic method to create datasets and use them to build a corpus for acronym expansion. Our approach to data generation can be used in many applications for which no standard datasets exist.
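The Jaro–Winkler criterion mentioned above can be illustrated with a minimal sketch. The abstract does not describe the implementation used in the paper, so everything here is an assumption for illustration: the function names (`jaro`, `jaro_winkler`, `is_correct_expansion`) are hypothetical, the prefix scale of 0.1 and the maximum prefix length of 4 are the conventional Winkler defaults, and comparing the raw strings without normalisation is our own choice.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


def is_correct_expansion(predicted: str, reference: str,
                         threshold: float = 0.8) -> bool:
    """Assumed acceptance rule: similarity strictly greater than the threshold."""
    return jaro_winkler(predicted, reference) > threshold
```

Under this rule, a predicted span that differs from the reference expansion by a small typo or a swapped pair of letters would still count as correct, while an unrelated span would not. Whether the paper normalises case or whitespace before comparison is not stated in the abstract.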