A BERT-based Idiom Detection Model
Gihan Gamage, Daswin De Silva, A. Adikari, D. Alahakoon
2022 15th International Conference on Human System Interaction (HSI), published 2022-07-28
DOI: https://doi.org/10.1109/HSI55341.2022.9869485
Abstract
Idioms are figures of speech that contradict the principle of compositionality: their meaning cannot be derived from the meanings of their individual words. This property can misdirect Natural Language Processing (NLP) techniques, which mostly focus on the literal meaning of terms. In this paper, we propose a novel idiom detection model that distinguishes between literal and idiomatic expressions. It fine-tunes BERT (Bidirectional Encoder Representations from Transformers) using a token classification approach. It is empirically evaluated on four idiom datasets, yielding an accuracy of more than 0.94. This model adds to the robustness and diversity of NLP techniques available to process and understand growing volumes of free-form text and speech. Furthermore, the social value of this model lies in enabling non-native speakers to comprehend the nuances of a foreign language.
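To make the token classification framing concrete, the following is a minimal sketch of how per-token targets and a token-level accuracy could be computed. It is not the authors' implementation: the abstract does not specify the label scheme, so the binary labels (`LITERAL`, `IDIOM`), the half-open span convention, and the helper names `label_tokens` and `token_accuracy` are all assumptions made for illustration. The BERT fine-tuning step itself is omitted.

```python
from typing import List, Optional, Tuple

# Assumed label set (not taken from the paper):
# 0 = token used literally, 1 = token that is part of an idiomatic expression.
LITERAL, IDIOM = 0, 1


def label_tokens(tokens: List[str], idiom_span: Optional[Tuple[int, int]]) -> List[int]:
    """Build one label per token; idiom_span is a half-open (start, end) index range."""
    labels = [LITERAL] * len(tokens)
    if idiom_span is not None:
        start, end = idiom_span
        for i in range(start, end):
            labels[i] = IDIOM
    return labels


def token_accuracy(gold: List[int], pred: List[int]) -> float:
    """Fraction of tokens whose predicted label matches the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)


if __name__ == "__main__":
    # Idiomatic use: "kicked the bucket" occupies tokens 2..4.
    idiomatic = "He finally kicked the bucket last year".split()
    print(label_tokens(idiomatic, (2, 5)))

    # Literal use of the same words: no token is marked as idiomatic.
    literal = "He kicked the bucket across the yard".split()
    print(label_tokens(literal, None))
```

In a full pipeline, label sequences of this kind would need to be aligned with BERT's sub-word tokens before being passed to a token classification head; that alignment is outside the scope of this sketch.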