Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge.

IF 1.4 | Zone 4 (Biology) | JCR Q4 | BIOCHEMICAL RESEARCH METHODS

Journal of Computational Biology Pub Date : 2025-05-19 DOI:10.1089/cmb.2025.0078

Gilchan Park, Byung-Jun Yoon, Xihaier Luo, Vanessa López-Marrero, Shinjae Yoo, Shantenu Jha

Citations: 0

Abstract

Understanding the interactions and regulatory relationships among biomolecules is essential for deciphering complex biological systems and elucidating the mechanisms behind diverse biological functions. Traditionally, the collection of such molecular interaction data has relied on expert curation, a process that is both time-consuming and labor-intensive. To address these limitations, this study explores the use of large language models (LLMs) to automate the genome-scale extraction of molecular interaction knowledge. We evaluate the performance of various LLMs on key biological tasks, including the identification of protein-protein interactions, detection of genes associated with pathways influenced by low-dose radiation, and inference of gene regulatory relationships. Our findings demonstrate that larger LLMs tend to perform better, particularly in extracting intricate gene and protein interactions. Despite their strengths, these models face challenges in recognizing functionally diverse gene groups and highly correlated regulatory relationships. Through a comprehensive analysis using established molecular interaction and pathway databases, we show that LLMs possess the potential to identify relevant biomolecules and predict their interactions, offering valuable insights and marking a significant step toward AI-driven biological knowledge discovery.

Source journal

Journal of Computational Biology (Biology - Computer Science: Interdisciplinary Applications)

CiteScore: 3.60

Self-citation rate: 5.90%

Articles published: 113

Review time: 6-12 weeks

About the journal: Journal of Computational Biology is the leading peer-reviewed journal in computational biology and bioinformatics, publishing in-depth statistical, mathematical, and computational analysis of methods, as well as their practical impact. Available only online, this is an essential journal for scientists and students who want to keep abreast of developments in bioinformatics.

Journal of Computational Biology coverage includes:
- Genomics
- Mathematical modeling and simulation
- Distributed and parallel biological computing
- Designing biological databases
- Pattern matching and pattern detection
- Linking disparate databases and data
- New tools for computational biology
- Relational and object-oriented database technology for bioinformatics
- Biological expert system design and use
- Reasoning by analogy, hypothesis formation, and testing by machine
- Management of biological databases