使用新的 ChatGPT 生成的数据集，比较分析 ChatGPT、GPT-3 和 T5 语言模型的转述性能：ParaGPT

IF 3 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems Pub Date : 2024-08-15 DOI:10.1111/exsy.13699

Meltem Kurt Pehlivanoğlu, Robera Tadesse Gobosho, Muhammad Abdan Syakura, Vimal Shanmuganathan, Luis de-la-Fuente-Valentín

{"title":"使用新的 ChatGPT 生成的数据集，比较分析 ChatGPT、GPT-3 和 T5 语言模型的转述性能：ParaGPT","authors":"Meltem Kurt Pehlivanoğlu, Robera Tadesse Gobosho, Muhammad Abdan Syakura, Vimal Shanmuganathan, Luis de-la-Fuente-Valentín","doi":"10.1111/exsy.13699","DOIUrl":null,"url":null,"abstract":"<p>Paraphrase generation is a fundamental natural language processing (NLP) task that refers to the process of generating a well-formed and coherent output sentence that exhibits both syntactic and/or lexical diversity from the input sentence, while simultaneously ensuring that the semantic similarity between the two sentences is preserved. However, the availability of high-quality paraphrase datasets has been limited, particularly for machine-generated sentences. In this paper, we present ParaGPT, a new paraphrase dataset of 81,000 machine-generated sentence pairs, including 27,000 reference sentences (ChatGPT-generated sentences), and 81,000 paraphrases obtained by using three different large language models (LLMs): ChatGPT, GPT-3, and T5. We used ChatGPT to generate 27,000 sentences that cover a diverse array of topics and sentence structures, thus providing diverse inputs for the models. In addition, we evaluated the quality of the generated paraphrases using various automatic evaluation metrics. Furthermore, we provide insights into the strengths and drawbacks of each LLM in generating paraphrases by conducting a comparative analysis of the paraphrasing performance of the three LLMs. According to our findings, ChatGPT's performance, as per the evaluation metrics provided, was deemed impressive and commendable, owing to its higher-than-average scores for semantic similarity, which implies a higher degree of similarity between the generated paraphrase and the reference sentence, and its relatively lower scores for syntactic diversity, indicating a greater diversity of syntactic structures in the generated paraphrase. ParaGPT is a valuable resource for researchers working on NLP tasks like paraphrasing, text simplification, and text generation. We make the ParaGPT dataset publicly accessible to researchers, and as far as we are aware, this is the first paraphrase dataset produced based on ChatGPT.</p>","PeriodicalId":51053,"journal":{"name":"Expert Systems","volume":"41 11","pages":""},"PeriodicalIF":3.0000,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/exsy.13699","citationCount":"0","resultStr":"{\"title\":\"Comparative analysis of paraphrasing performance of ChatGPT, GPT-3, and T5 language models using a new ChatGPT generated dataset: ParaGPT\",\"authors\":\"Meltem Kurt Pehlivanoğlu, Robera Tadesse Gobosho, Muhammad Abdan Syakura, Vimal Shanmuganathan, Luis de-la-Fuente-Valentín\",\"doi\":\"10.1111/exsy.13699\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Paraphrase generation is a fundamental natural language processing (NLP) task that refers to the process of generating a well-formed and coherent output sentence that exhibits both syntactic and/or lexical diversity from the input sentence, while simultaneously ensuring that the semantic similarity between the two sentences is preserved. However, the availability of high-quality paraphrase datasets has been limited, particularly for machine-generated sentences. In this paper, we present ParaGPT, a new paraphrase dataset of 81,000 machine-generated sentence pairs, including 27,000 reference sentences (ChatGPT-generated sentences), and 81,000 paraphrases obtained by using three different large language models (LLMs): ChatGPT, GPT-3, and T5. We used ChatGPT to generate 27,000 sentences that cover a diverse array of topics and sentence structures, thus providing diverse inputs for the models. In addition, we evaluated the quality of the generated paraphrases using various automatic evaluation metrics. Furthermore, we provide insights into the strengths and drawbacks of each LLM in generating paraphrases by conducting a comparative analysis of the paraphrasing performance of the three LLMs. According to our findings, ChatGPT's performance, as per the evaluation metrics provided, was deemed impressive and commendable, owing to its higher-than-average scores for semantic similarity, which implies a higher degree of similarity between the generated paraphrase and the reference sentence, and its relatively lower scores for syntactic diversity, indicating a greater diversity of syntactic structures in the generated paraphrase. ParaGPT is a valuable resource for researchers working on NLP tasks like paraphrasing, text simplification, and text generation. We make the ParaGPT dataset publicly accessible to researchers, and as far as we are aware, this is the first paraphrase dataset produced based on ChatGPT.</p>\",\"PeriodicalId\":51053,\"journal\":{\"name\":\"Expert Systems\",\"volume\":\"41 11\",\"pages\":\"\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2024-08-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1111/exsy.13699\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Expert Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/exsy.13699\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/exsy.13699","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

意译生成是一项基本的自然语言处理（NLP）任务，指的是生成格式良好、连贯的输出句子的过程，该句子与输入句子在句法和/或词汇上具有多样性，同时确保两个句子之间的语义相似性得以保留。然而，高质量的意译数据集一直很有限，尤其是机器生成的句子。在本文中，我们介绍了 ParaGPT，这是一个由 81,000 个机器生成的句子对组成的新转述数据集，其中包括 27,000 个参考句子（ChatGPT 生成的句子），以及通过使用三种不同的大型语言模型（LLM）获得的 81,000 个转述：ChatGPT、GPT-3 和 T5。我们使用 ChatGPT 生成了 27,000 个句子，这些句子涵盖了不同的主题和句子结构，从而为模型提供了不同的输入。此外，我们还使用各种自动评估指标对生成的转述质量进行了评估。此外，我们还通过对三种 LLM 的转述性能进行比较分析，深入了解了每种 LLM 在生成转述时的优缺点。根据我们的研究结果，ChatGPT 在语义相似性方面的得分高于平均水平，这意味着生成的转述句与参考句之间具有更高的相似性，而在句法多样性方面的得分相对较低，这表明生成的转述句具有更高的句法结构多样性，因此，根据我们提供的评估指标，ChatGPT 的性能令人印象深刻，值得称赞。ParaGPT 是研究转述、文本简化和文本生成等 NLP 任务的研究人员的宝贵资源。我们向研究人员公开 ParaGPT 数据集，据我们所知，这是第一个基于 ChatGPT 生成的意译数据集。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Comparative analysis of paraphrasing performance of ChatGPT, GPT-3, and T5 language models using a new ChatGPT generated dataset: ParaGPT

查看原文本刊更多论文

Comparative analysis of paraphrasing performance of ChatGPT, GPT-3, and T5 language models using a new ChatGPT generated dataset: ParaGPT

Paraphrase generation is a fundamental natural language processing (NLP) task that refers to the process of generating a well-formed and coherent output sentence that exhibits both syntactic and/or lexical diversity from the input sentence, while simultaneously ensuring that the semantic similarity between the two sentences is preserved. However, the availability of high-quality paraphrase datasets has been limited, particularly for machine-generated sentences. In this paper, we present ParaGPT, a new paraphrase dataset of 81,000 machine-generated sentence pairs, including 27,000 reference sentences (ChatGPT-generated sentences), and 81,000 paraphrases obtained by using three different large language models (LLMs): ChatGPT, GPT-3, and T5. We used ChatGPT to generate 27,000 sentences that cover a diverse array of topics and sentence structures, thus providing diverse inputs for the models. In addition, we evaluated the quality of the generated paraphrases using various automatic evaluation metrics. Furthermore, we provide insights into the strengths and drawbacks of each LLM in generating paraphrases by conducting a comparative analysis of the paraphrasing performance of the three LLMs. According to our findings, ChatGPT's performance, as per the evaluation metrics provided, was deemed impressive and commendable, owing to its higher-than-average scores for semantic similarity, which implies a higher degree of similarity between the generated paraphrase and the reference sentence, and its relatively lower scores for syntactic diversity, indicating a greater diversity of syntactic structures in the generated paraphrase. ParaGPT is a valuable resource for researchers working on NLP tasks like paraphrasing, text simplification, and text generation. We make the ParaGPT dataset publicly accessible to researchers, and as far as we are aware, this is the first paraphrase dataset produced based on ChatGPT.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Expert Systems 工程技术-计算机：理论方法

CiteScore

7.40

自引率

6.10%

发文量

266

审稿时长

24 months

期刊介绍： Expert Systems: The Journal of Knowledge Engineering publishes papers dealing with all aspects of knowledge engineering, including individual methods and techniques in knowledge acquisition and representation, and their application in the construction of systems – including expert systems – based thereon. Detailed scientific evaluation is an essential part of any paper. As well as traditional application areas, such as Software and Requirements Engineering, Human-Computer Interaction, and Artificial Intelligence, we are aiming at the new and growing markets for these technologies, such as Business, Economy, Market Research, and Medical and Health Care. The shift towards this new focus will be marked by a series of special issues covering hot and emergent topics.