Paraphrase Generation with Chinese Short Text Dataset
Guohui Song, Yongbin Wang
2020 5th International Conference on Computational Intelligence and Applications (ICCIA), June 2020
DOI: 10.1109/ICCIA49625.2020.00019
Citations: 0
Abstract
A major obstacle to research on paraphrase generation is the lack of high-quality, publicly available labeled datasets of sentential paraphrases. The problem is particularly serious for Chinese, so research on Chinese paraphrase generation is still at an early stage. This paper proposes a novel method for building a Chinese paraphrase dataset, which contains 8K sentence pairs. The data come from a banking question-answering (QA) dataset in which each question is expressed by several different sentences. By computing the similarity between sentences that share the same meaning, we extract paraphrase pairs and use them to construct the dataset. We then perform paraphrase generation with a classical Seq2Seq model with an attention mechanism and, following previous work, evaluate the generated paraphrases on our Chinese dataset. Experimental results show that the dataset is suitable for the Chinese paraphrase generation task and provide a benchmark for further research in this area.
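The abstract describes pairing sentences that express the same bank question and keeping pairs according to their similarity, but it does not state which similarity measure or thresholds were used. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes character-bigram cosine similarity (which sidesteps Chinese word segmentation) and a hypothetical similarity band `[low, high]`. Pairs below `low` may be noisy, and pairs above `high` are near-duplicates that teach a generator little. The function names, the threshold values 0.3 and 0.9, and the toy sentences are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count character n-grams; Chinese has no spaces, so characters are a natural unit."""
    text = text.strip()
    if len(text) < n:
        return Counter([text]) if text else Counter()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mine_paraphrase_pairs(groups, low=0.3, high=0.9, n=2):
    """Return (s1, s2, sim) for same-question sentences whose similarity lies in [low, high].

    ASSUMPTION: the measure (character n-gram cosine) and the band [low, high]
    are illustrative choices; the paper's abstract does not specify them.
    """
    pairs = []
    for sentences in groups.values():
        unique = list(dict.fromkeys(sentences))  # drop exact duplicates, keep order
        for s1, s2 in combinations(unique, 2):
            sim = cosine(char_ngrams(s1, n), char_ngrams(s2, n))
            if low <= sim <= high:
                pairs.append((s1, s2, sim))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy data: each key is one bank question, each value its phrasings.
    groups = {
        "q1": ["如何开通网上银行", "怎么开通网上银行", "网上银行如何开通"],
        "q2": ["信用卡怎么还款", "信用卡怎么还款？"],  # near-duplicate, similarity > 0.9
    }
    for s1, s2, sim in mine_paraphrase_pairs(groups):
        print(f"{sim:.3f}  {s1} ||| {s2}")
    # 0.714  如何开通网上银行 ||| 怎么开通网上银行
    # 0.857  如何开通网上银行 ||| 网上银行如何开通
    # 0.571  怎么开通网上银行 ||| 网上银行如何开通
```

Comparing only sentences within the same question group keeps the pairing quadratic per group rather than over the whole corpus. An embedding-based similarity could replace `cosine` without changing the rest of the sketch.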