Authors: Jiangfeng Zhou, Dafei Lin, Xinlai Xing, Xiaochuan Zhang
Venue: 2022 6th Asian Conference on Artificial Intelligence Technology (ACAIT)
Published: 2022-12-09
DOI: 10.1109/ACAIT56212.2022.10137987
Semantic Similarity Evaluation Method Based on Text Generation Data Augmentation
Neural-network-based similarity evaluation methods have achieved good results, but they place high demands on the scale and quality of the training corpus. To address this problem, this paper proposes a semantic similarity evaluation method based on text-generation data augmentation. The method combines Seq2Seq with a masked language model to augment the data, and uses the expanded data to fine-tune a pre-trained language model. The pre-trained language model is then combined with a Siamese network to build the semantic similarity evaluation model. Experiments were carried out on the standard sentence similarity evaluation data set SentEval 2012-2016. Compared with the benchmark model, the Spearman correlation coefficient improved by 3.11%. The experiments show that the data-augmentation-based semantic similarity evaluation method can effectively address the problem of low accuracy caused by a lack of data.
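The evaluation setup described in the abstract (a shared encoder applied to both sentences, a similarity score, and a Spearman correlation against gold labels) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `encode` function is a toy stand-in for the fine-tuned pre-trained language model, and the sentence pairs and gold scores are made up for illustration.

```python
import math
from collections import Counter


def encode(sentence):
    """Toy shared encoder: bag-of-words counts.

    Assumption: this stands in for the pre-trained language model.
    """
    return Counter(sentence.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def siamese_score(a, b):
    """Siamese idea: encode both sentences with the same encoder, then compare."""
    return cosine(encode(a), encode(b))


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


# Made-up sentence pairs with made-up gold scores, for illustration only.
pairs = [
    ("a man is playing a guitar", "a man is playing a guitar", 5.0),
    ("a man is playing a guitar", "a man is playing a piano", 3.0),
    ("a man is playing a guitar", "a dog runs in the park", 0.5),
]
predicted = [siamese_score(a, b) for a, b, _ in pairs]
gold = [g for _, _, g in pairs]
rho = spearman(predicted, gold)
print(f"Spearman correlation: {rho:.3f}")
```

Because Spearman correlation depends only on rank order, the predicted scores do not need to be on the same scale as the gold labels; in this toy example the two rankings agree, so the correlation is approximately 1.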