Creation of a multi-paraphrase corpus based on various elementary operations
Johanes Effendi, S. Sakti, Satoshi Nakamura
2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA), 2017-11-01
DOI: 10.1109/ICSDA.2017.8384465 (https://doi.org/10.1109/ICSDA.2017.8384465)
Citation count: 1
Abstract
Paraphrases resemble monolingual translations: a source sentence is rewritten into other sentences that must preserve its original meaning. Building an automatic paraphrasing system requires a collection of paraphrased expressions, but collecting paraphrases manually is expensive and time-consuming. Most existing paraphrase corpora contain only one-to-one parallel sentences, neglecting the fact that many paraphrase variants can be generated from a single source sentence. The manipulations applied to the original sentences are also difficult to track. Furthermore, most corpora are dedicated to a single application and cannot be reused in others. In this research, we use a crowdsourcing platform to construct a paraphrase corpus based on various elementary operations (reordering, substitution, deletion, and insertion), generating multiple paraphrases from a single source sentence. These elementary paraphrase operations can be utilized in various applications (e.g., deletion for summarization and reordering for machine translation). Our evaluations show the richness and effectiveness of the created corpus.
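The abstract names four elementary operations but does not say how they are represented or recorded. As a minimal sketch, assuming a whitespace-tokenized sentence and one operation per step, the hypothetical `Op`, `apply_op`, and `paraphrase` names below show how applying operations one at a time can yield several paraphrase variants from one source sentence while keeping a trace of each manipulation. This is an illustration only, not the corpus's actual data format or the authors' crowdsourcing tooling.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Op:
    """Hypothetical elementary paraphrase operation on a token list."""
    kind: str                         # "reorder", "substitute", "delete", or "insert"
    index: int                        # position of the token the operation acts on
    arg: Union[str, int, None] = None # new token (substitute/insert) or target index (reorder)


def apply_op(tokens: List[str], op: Op) -> List[str]:
    """Return a new token list with one elementary operation applied."""
    out = list(tokens)  # copy so the previous sentence in the trace is untouched
    if op.kind == "substitute":
        out[op.index] = op.arg
    elif op.kind == "delete":
        del out[op.index]
    elif op.kind == "insert":
        out.insert(op.index, op.arg)
    elif op.kind == "reorder":
        tok = out.pop(op.index)   # remove the token from its old position...
        out.insert(op.arg, tok)   # ...and place it at the target position
    else:
        raise ValueError(f"unknown operation: {op.kind}")
    return out


def paraphrase(source: str, ops: List[Op]) -> Tuple[str, List[Tuple[str, str]]]:
    """Apply ops in order; return the final sentence and an
    (operation kind, intermediate sentence) trace for every step."""
    tokens = source.split()
    trace = []
    for op in ops:
        tokens = apply_op(tokens, op)
        trace.append((op.kind, " ".join(tokens)))
    return " ".join(tokens), trace


if __name__ == "__main__":
    # Made-up example sentence; indices refer to its whitespace tokens.
    source = "she quickly finished the long report yesterday"
    variants = {
        "reordering": [Op("reorder", 6, 0)],
        "substitution": [Op("substitute", 2, "completed")],
        "deletion": [Op("delete", 4)],
        "insertion": [Op("insert", 4, "very")],
    }
    for name, ops in variants.items():
        print(f"{name}: {paraphrase(source, ops)[0]}")
```

Keeping the operation type next to each intermediate sentence is what would let a user filter, for instance, deletion-only pairs for summarization or reordering-only pairs for machine translation; the sentence and indices above are invented for illustration.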