Assessing the performance of generative artificial intelligence in retrieving information against manually curated genetic and genomic data.

IF 3.6 | Zone 4, Biology | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation | Pub Date: 2025-02-17 | DOI: 10.1093/database/baaf011

Elly Poretsky, Victoria C Blake, Carson M Andorf, Taner Z Sen

Citations: 0

Abstract

Curated resources at centralized repositories provide high-value service to users by enhancing data veracity. Curation, however, comes at a cost, as it requires dedicated time and effort from personnel with deep domain knowledge. In this paper, we investigate the performance of large language models (LLMs), specifically generative pre-trained transformer (GPT)-3.5 and GPT-4, in extracting and presenting data, using a human curator as the benchmark. To accomplish this task, we used a small set of journal articles on wheat and barley genetics, focusing on traits that are becoming more important, such as salinity tolerance and disease resistance. The 36 papers were then curated by a professional curator for the GrainGenes database (https://wheat.pw.usda.gov). In parallel, we developed a GPT-based retrieval-augmented generation question-answering system and compared how GPT performed in answering questions about traits and quantitative trait loci (QTLs). Our findings show that, on average, GPT-4 correctly categorized manuscripts 97% of the time and correctly extracted 80% of traits and 61% of marker-trait associations. Furthermore, we assessed the ability of a GPT-based DataFrame agent to filter and summarize curated wheat genetics data, showing the potential of human and computational curators working side by side. In one case study, our findings show that, through prompt engineering, GPT-4 was able to retrieve up to 91% of disease-related, human-curated QTLs across the whole genome and up to 96% across a specific genomic region. We also observed that, across most tasks, GPT-4 consistently outperformed GPT-3.5 while generating fewer hallucinations, suggesting that improvements in LLMs will make generative artificial intelligence a much more accurate companion for curators in extracting information from scientific literature. Despite their limitations, LLMs demonstrated a potential to extract and present information to curators and users of biological databases, as long as users are aware of potential inaccuracies and the possibility of incomplete information extraction.
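The percentages quoted in the abstract compare model output against a curator's records. The abstract does not specify how matches were scored, so the following is only a minimal sketch of one plausible scoring: the share of curated items that also appear in the model's answer, after a simple case and whitespace normalization. The function names and the example trait lists are hypothetical and are not taken from the paper or from GrainGenes.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def recall_against_curation(curated, extracted):
    """Fraction of curator-recorded items that also appear in the model output."""
    gold = {normalize(x) for x in curated}
    found = {normalize(x) for x in extracted}
    if not gold:
        raise ValueError("curated set is empty")
    return len(gold & found) / len(gold)


def unmatched_items(curated, extracted):
    """Model outputs with no curated counterpart (candidates for manual review)."""
    gold = {normalize(x) for x in curated}
    return sorted({normalize(x) for x in extracted} - gold)


# Hypothetical example data, invented for illustration only.
curated_traits = [
    "Salinity tolerance",
    "Plant height",
    "Grain yield",
    "Heading date",
    "Leaf rust resistance",
]
model_traits = [
    "salinity  tolerance",
    "grain yield",
    "heading date",
    "Leaf rust resistance",
    "Root length",
]

recall = recall_against_curation(curated_traits, model_traits)
extras = unmatched_items(curated_traits, model_traits)
print(f"recall = {recall:.0%}")
print("not in curation:", extras)
```

Exact string matching is a deliberate simplification. Trait names in real literature vary in wording, so a practical comparison would likely need synonym handling or a manual check of the unmatched items, which may be either hallucinations or valid items the curator did not record.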

Source journal

Database: The Journal of Biological Databases and Curation (MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 9.00

Self-citation rate: 3.40%

Articles published: 100

Review time: >12 weeks

Journal description: Huge volumes of primary data are archived in numerous open-access databases, and with new generation technologies becoming more common in laboratories, large datasets will become even more prevalent. The archiving, curation, analysis and interpretation of all of these data are a challenge. Database development and biocuration are at the forefront of the endeavor to make sense of this mounting deluge of data. Database: The Journal of Biological Databases and Curation provides an open access platform for the presentation of novel ideas in database research and biocuration, and aims to help strengthen the bridge between database developers, curators, and users.