Hybrid approach for text similarity detection in Vietnamese based on Sentence-BERT and WordNet

Proceedings of the 4th International Conference on Information Technology and Computer Communications
Published: 2022-06-23
DOI: 10.1145/3548636.3548645

Son Cao, Huy V. Vo, Hang Le, D. Dinh

Abstract

In this paper, we explore the task of similarity detection, which determines whether two sentences have the same meaning. Although the task has been shown to be important in many natural language processing applications, little work has been done on it for Vietnamese. We present an approach based on the Sentence-BERT (SBERT) model that leverages the pre-trained model and combines it with linguistic knowledge from WordNet. We tested this approach on two popular Vietnamese datasets, vnPara and VNPC. Our best model achieves an F1 score of 97.62% on vnPara and 95.31% on VNPC.
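To make the "hybrid" idea concrete, a minimal Python sketch follows. It is not the authors' implementation: the abstract does not say how SBERT and WordNet are combined, so this assumes a simple weighted sum of (a) cosine similarity between two sentence embeddings, standing in for SBERT outputs, and (b) a Jaccard overlap of WordNet synset IDs. The synset table `TOY_SYNSETS`, the weight `alpha`, and the decision `threshold` are all hypothetical, and tokens are assumed to be already word-segmented Vietnamese (multi-syllable words joined with `_`). The `f1_score` helper shows how an F1 score for the paraphrase (positive) class is computed.

```python
import math

# Hypothetical stand-in for a Vietnamese WordNet: each word maps to a synset ID.
# A real system would look words up in an actual WordNet resource instead.
TOY_SYNSETS = {
    "mua": "buy.v.01",
    "sắm": "buy.v.01",
    "xe": "vehicle.n.01",
    "ô_tô": "vehicle.n.01",
}


def cosine(u, v):
    """Cosine similarity between two dense vectors (e.g. SBERT sentence embeddings)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0  # undefined for zero vectors; treat as no similarity
    return dot / (nu * nv)


def wordnet_overlap(tokens1, tokens2, synsets=TOY_SYNSETS):
    """Jaccard overlap after mapping each token to its synset ID (or itself if unknown)."""
    s1 = {synsets.get(t, t) for t in tokens1}
    s2 = {synsets.get(t, t) for t in tokens2}
    if not s1 and not s2:
        return 1.0  # two empty sentences are trivially identical
    return len(s1 & s2) / len(s1 | s2)


def hybrid_score(emb1, emb2, tokens1, tokens2, alpha=0.7):
    """Weighted mix of embedding similarity and synset overlap (alpha is hypothetical)."""
    return alpha * cosine(emb1, emb2) + (1 - alpha) * wordnet_overlap(tokens1, tokens2)


def is_paraphrase(score, threshold=0.5):
    """Binary decision from the hybrid score (threshold is hypothetical)."""
    return score >= threshold


def f1_score(y_true, y_pred):
    """Binary F1 for the positive (paraphrase) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `wordnet_overlap(["tôi", "mua", "xe"], ["tôi", "sắm", "ô_tô"])` returns 1.0 because both sentences map to the same set of synsets, even though only one surface word is shared. In practice the embeddings would come from an SBERT encoder (for instance `SentenceTransformer.encode` in the `sentence-transformers` library), and `alpha` and `threshold` would be tuned on held-out data rather than fixed as here.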
