Hybrid approach for text similarity detection in Vietnamese based on Sentence-BERT and WordNet
Authors: Son Cao, Huy V. Vo, Hang Le, D. Dinh
Published in: Proceedings of the 4th International Conference on Information Technology and Computer Communications, 2022-06-23
DOI: 10.1145/3548636.3548645 (https://doi.org/10.1145/3548636.3548645)
In this paper, we explore the task of similarity detection, which determines whether two sentences have the same meaning. Although the task has been shown to be important in many natural language processing applications, little work has been done on it for Vietnamese. We present an approach based on the Sentence-BERT (SBERT) model. We leverage the pre-trained model, combine it with linguistic knowledge from WordNet, and evaluate the resulting system on two popular Vietnamese datasets: vnPara and VNPC. Our best model achieves an F1 score of 97.62% on vnPara and 95.31% on VNPC.
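The abstract does not say how the SBERT representation and the WordNet knowledge are combined, so the following is only a minimal sketch of one plausible hybrid, not the authors' method. It assumes sentence embeddings are already available as plain vectors (standing in for SBERT output), uses a hypothetical toy dictionary `TOY_SYNSETS` in place of a real WordNet lookup, and mixes the two signals with an assumed weight `alpha`. The `f1_score` helper shows how the reported metric is computed from binary predictions.

```python
import math

# Hypothetical stand-in for a WordNet lookup: word -> synset identifier.
# A real system would query a Vietnamese WordNet instead.
TOY_SYNSETS = {
    "car": "vehicle.n.01",
    "automobile": "vehicle.n.01",
    "fast": "quick.a.01",
    "quick": "quick.a.01",
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def synset_overlap(tokens_a, tokens_b, synsets=TOY_SYNSETS):
    """Jaccard overlap after mapping each token to its synset id (or to itself)."""
    a = {synsets.get(t, t) for t in tokens_a}
    b = {synsets.get(t, t) for t in tokens_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_score(emb_a, emb_b, tokens_a, tokens_b, alpha=0.7):
    """Weighted mix of embedding similarity and lexical overlap.

    alpha is an assumed mixing weight, not a value taken from the paper.
    """
    lexical = synset_overlap(tokens_a, tokens_b)
    return alpha * cosine(emb_a, emb_b) + (1 - alpha) * lexical


def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 marks a paraphrase pair."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In this sketch a sentence pair would be labeled a paraphrase when `hybrid_score` exceeds a threshold tuned on held-out data; a trained classifier over both features would be an equally reasonable design.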