{"title":"使用 GPT-4 和 Gemini 进行数据增强和分类的优化生物医学实体关系提取方法。","authors":"Cong-Phuoc Phan, Ben Phan, Jung-Hsien Chiang","doi":"10.1093/database/baae104","DOIUrl":null,"url":null,"abstract":"<p><p>Despite numerous research efforts by teams participating in the BioCreative VIII Track 01 employing various techniques to achieve the high accuracy of biomedical relation tasks, the overall performance in this area still has substantial room for improvement. Large language models bring a new opportunity to improve the performance of existing techniques in natural language processing tasks. This paper presents our improved method for relation extraction, which involves integrating two renowned large language models: Gemini and GPT-4. Our new approach utilizes GPT-4 to generate augmented data for training, followed by an ensemble learning technique to combine the outputs of diverse models to create a more precise prediction. We then employ a method using Gemini responses as input to fine-tune the BioNLP-PubMed-Bert classification model, which leads to improved performance as measured by precision, recall, and F1 scores on the same test dataset used in the challenge evaluation. Database URL: https://biocreative.bioinformatics.udel.edu/tasks/biocreative-viii/track-1/.</p>","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11463225/pdf/","citationCount":"0","resultStr":"{\"title\":\"Optimized biomedical entity relation extraction method with data augmentation and classification using GPT-4 and Gemini.\",\"authors\":\"Cong-Phuoc Phan, Ben Phan, Jung-Hsien Chiang\",\"doi\":\"10.1093/database/baae104\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Despite numerous research efforts by teams participating in the BioCreative VIII Track 01 employing various techniques to achieve the high accuracy of biomedical relation tasks, the overall performance in this area still has substantial room for improvement. Large language models bring a new opportunity to improve the performance of existing techniques in natural language processing tasks. This paper presents our improved method for relation extraction, which involves integrating two renowned large language models: Gemini and GPT-4. Our new approach utilizes GPT-4 to generate augmented data for training, followed by an ensemble learning technique to combine the outputs of diverse models to create a more precise prediction. We then employ a method using Gemini responses as input to fine-tune the BioNLP-PubMed-Bert classification model, which leads to improved performance as measured by precision, recall, and F1 scores on the same test dataset used in the challenge evaluation. Database URL: https://biocreative.bioinformatics.udel.edu/tasks/biocreative-viii/track-1/.</p>\",\"PeriodicalId\":3,\"journal\":{\"name\":\"ACS Applied Electronic Materials\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2024-10-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11463225/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACS Applied Electronic Materials\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/database/baae104\",\"RegionNum\":3,\"RegionCategory\":\"材料科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/database/baae104","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0
摘要
尽管参加 "BioCreative VIII Track 01 "的团队做出了大量研究努力,采用各种技术来实现生物医学关系任务的高准确性,但该领域的整体性能仍有很大的提升空间。大型语言模型为提高自然语言处理任务中现有技术的性能带来了新的机遇。本文介绍了我们对关系提取方法的改进,其中包括整合两个著名的大型语言模型:Gemini 和 GPT-4。我们的新方法利用 GPT-4 生成用于训练的增强数据,然后利用集合学习技术将不同模型的输出结合起来,以创建更精确的预测。然后,我们采用一种使用 Gemini 响应作为输入的方法,对 BioNLP-PubMed-Bert 分类模型进行微调,从而在挑战赛评估中使用的相同测试数据集上,通过精确度、召回率和 F1 分数衡量,提高了性能。数据库网址:https://biocreative.bioinformatics.udel.edu/tasks/biocreative-viii/track-1/。
Optimized biomedical entity relation extraction method with data augmentation and classification using GPT-4 and Gemini.
Despite numerous research efforts by teams participating in the BioCreative VIII Track 01 employing various techniques to achieve the high accuracy of biomedical relation tasks, the overall performance in this area still has substantial room for improvement. Large language models bring a new opportunity to improve the performance of existing techniques in natural language processing tasks. This paper presents our improved method for relation extraction, which involves integrating two renowned large language models: Gemini and GPT-4. Our new approach utilizes GPT-4 to generate augmented data for training, followed by an ensemble learning technique to combine the outputs of diverse models to create a more precise prediction. We then employ a method using Gemini responses as input to fine-tune the BioNLP-PubMed-Bert classification model, which leads to improved performance as measured by precision, recall, and F1 scores on the same test dataset used in the challenge evaluation. Database URL: https://biocreative.bioinformatics.udel.edu/tasks/biocreative-viii/track-1/.