{"title":"Hybrid approach for text similarity detection in Vietnamese based on Sentence-BERT and WordNet","authors":"Son Cao, Huy V. Vo, Hang Le, D. Dinh","doi":"10.1145/3548636.3548645","DOIUrl":null,"url":null,"abstract":"In this paper, we explore the task of similarity detection, which determines whether two sentences have the same meaning. Although the task has shown to be important in many natural language processing applications, not much work has been done in Vietnamese. We present an approach based on Sentence-BERT (SBERT) model. Leveraging the pre-trained model and combining it with linguistic knowledge (WordNet), we then tested it on two popular Vietnamese datasets: vnPara and VNPC. Our best model achieves 97.62% F1 score on vnPara and 95.31% F1 score on VNPC.","PeriodicalId":384376,"journal":{"name":"Proceedings of the 4th International Conference on Information Technology and Computer Communications","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 4th International Conference on Information Technology and Computer Communications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3548636.3548645","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0
Abstract
In this paper, we explore the task of similarity detection, which determines whether two sentences have the same meaning. Although the task has been shown to be important in many natural language processing applications, little work has been done on it for Vietnamese. We present an approach based on the Sentence-BERT (SBERT) model. We leverage the pre-trained model, combine it with linguistic knowledge from WordNet, and evaluate the resulting approach on two popular Vietnamese datasets: vnPara and VNPC. Our best model achieves an F1 score of 97.62% on vnPara and 95.31% on VNPC.
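The abstract does not say how the SBERT signal and the WordNet signal are combined, so the following is only a minimal sketch of one plausible hybrid, not the authors' method. It assumes the sentence embeddings are already computed (toy vectors stand in for SBERT output), that a small dictionary mapping words to synset identifiers stands in for a Vietnamese WordNet lookup, and that the two scores are mixed linearly. The names `hybrid_score`, `synonym_overlap`, `alpha`, and `threshold`, as well as the synset identifiers, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def synonym_overlap(tokens_a, tokens_b, synsets):
    """Jaccard overlap after mapping each token to its synset id.

    `synsets` is a toy stand-in for a WordNet lookup (assumption):
    words that share a synset id are treated as the same concept.
    Tokens missing from the mapping are kept as they are.
    """
    concepts_a = {synsets.get(t, t) for t in tokens_a}
    concepts_b = {synsets.get(t, t) for t in tokens_b}
    union = concepts_a | concepts_b
    if not union:
        return 0.0
    return len(concepts_a & concepts_b) / len(union)


def hybrid_score(emb_a, emb_b, tokens_a, tokens_b, synsets, alpha=0.7):
    """Linear mix of embedding and lexical similarity (assumed scheme)."""
    semantic = cosine(emb_a, emb_b)
    lexical = synonym_overlap(tokens_a, tokens_b, synsets)
    return alpha * semantic + (1.0 - alpha) * lexical


def is_paraphrase(score, threshold=0.8):
    """Binary decision; the threshold value is an arbitrary placeholder."""
    return score >= threshold


# Toy example: "ô_tô" and "xe_hơi" both mean "car" in Vietnamese.
toy_synsets = {"ô_tô": "car.n.01", "xe_hơi": "car.n.01"}
sent_a = ["tôi", "mua", "ô_tô"]
sent_b = ["tôi", "mua", "xe_hơi"]
emb_a = [0.6, 0.8]  # placeholder vectors, not real SBERT output
emb_b = [0.6, 0.8]

score = hybrid_score(emb_a, emb_b, sent_a, sent_b, toy_synsets)
print(is_paraphrase(score))
```

In this toy case the synonym mapping lifts the lexical overlap from 0.5 (plain token matching) to 1.0, which illustrates why lexical knowledge might complement an embedding-based score. In practice the weight and threshold would be tuned on held-out data.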