Paraphrase Generation with Chinese Short Text Dataset

Guohui Song, Yongbin Wang
Published in: 2020 5th International Conference on Computational Intelligence and Applications (ICCIA)
Publication date: 2020-06-01
DOI: 10.1109/ICCIA49625.2020.00019 (https://doi.org/10.1109/ICCIA49625.2020.00019)

Abstract

A major obstacle to research on paraphrase generation is the shortage of high-quality, publicly available labeled datasets of sentential paraphrases, a problem that is particularly serious for Chinese. As a result, research on Chinese paraphrase generation is still at an early stage. This paper proposes a novel way to create a Chinese paraphrase dataset, which contains 8K sentence pairs. The data come from a bank QA dataset in which each question is expressed by several sentences. By calculating the similarity between sentences that share the same meaning, we obtain paraphrase pairs and build the Chinese paraphrase dataset. We then perform the paraphrase generation task with a classical Seq2Seq model with an attention mechanism and, following previous work, evaluate the generation results on our Chinese dataset. Experimental results show that the dataset is suitable for the Chinese paraphrase generation task and provide a benchmark for further research in this area.
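The pair-mining step described in the abstract could look roughly like the sketch below. The abstract does not say which similarity measure or cut-offs were used, so the character-level Jaccard similarity, the `low` and `high` thresholds, the function names, and the toy sentences are all assumptions made for illustration, not the authors' method or data.

```python
from itertools import combinations


def char_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over the sets of characters in two sentences."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def mine_paraphrase_pairs(groups, low=0.25, high=0.8):
    """Return (sentence_a, sentence_b, score) triples from same-meaning groups.

    `groups` maps a question id to the sentences that express that question.
    Pairs scoring below `low` are treated as too dissimilar to trust, and
    pairs scoring above `high` as near-duplicates. Both thresholds are
    illustrative assumptions, not values from the paper.
    """
    pairs = []
    for sentences in groups.values():
        for a, b in combinations(sentences, 2):
            score = char_jaccard(a, b)
            if low <= score <= high:
                pairs.append((a, b, score))
    return pairs


# Toy input, made up for illustration (not taken from the bank QA dataset).
toy_groups = {
    "q1": ["怎么修改密码", "如何更改密码", "怎么修改密码？"],
}
pairs = mine_paraphrase_pairs(toy_groups)
```

Filtering on both sides keeps pairs that differ enough to be useful as paraphrases while dropping near-copies, such as the third toy sentence, which only adds a question mark to the first. A word-segmented or embedding-based similarity could be swapped in for `char_jaccard` without changing the rest of the sketch.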