Protein-Protein Interaction Networks Derived from Classical and Machine Learning-Based Natural Language Processing Tools
David J Degnan, Clayton W Strauch, Moses Y Obiri, Erik D VonKaenel, Grace S Kim, James D Kershaw, David L Novelli, Karl Tl Pazdernik, Lisa M Bramer
Journal of Proteome Research, pp. 5395-5404. Published 2024-12-06 (Epub 2024-11-11). DOI: 10.1021/acs.jproteome.4c00535 (https://doi.org/10.1021/acs.jproteome.4c00535)
Abstract
The study of protein-protein interactions (PPIs) provides insight into various biological mechanisms, including the binding of antibodies to antigens, enzymes to inhibitors or promoters, and receptors to ligands. Recent studies of PPIs have led to significant biological breakthroughs; for example, the study of PPIs involved in the human:SARS-CoV-2 viral infection mechanism aided the development of SARS-CoV-2 vaccines. Although several databases exist for the manual curation of PPI networks, text mining methods have routinely proven to be useful alternatives for newly studied or understudied species, for which databases are incomplete. Here, we compared the relationship extraction performance of several open-source tools spanning classical text processing, machine learning (ML)-based natural language processing (NLP), and large language model (LLM)-based NLP. Overall, our results indicated that classical methods tend to achieve high true positive rates at the expense of producing overconnected networks; ML-based NLP methods have lower true positive rates but yield networks whose structures are closest to the target network; and LLM-based NLP methods tend to fall between the two other approaches, with variable performance. The selection of a specific NLP approach should be tied to the needs of a study and to text availability, as model performance varied with the amount of text provided.
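The trade-off described in the abstract, high true positive rates paired with overconnected networks, can be illustrated with a small sketch. This is not the authors' pipeline or any of the tools they evaluated: the sentence-level co-occurrence rule stands in for a generic classical method, and the protein names, sentences, and target network below are hypothetical placeholders chosen only for illustration.

```python
from itertools import combinations


def normalize(edges):
    """Treat interactions as undirected: store each pair as a frozenset."""
    return {frozenset(e) for e in edges if e[0] != e[1]}


def true_positive_rate(predicted, target):
    """Fraction of target interactions that the predicted network recovers."""
    pred, targ = normalize(predicted), normalize(target)
    return len(pred & targ) / len(targ) if targ else 0.0


def density(edges, n_nodes):
    """Edges present divided by edges possible among n_nodes proteins."""
    possible = n_nodes * (n_nodes - 1) // 2
    return len(normalize(edges)) / possible if possible else 0.0


def cooccurrence_edges(sentences, proteins):
    """Assumed classical-style baseline: link every pair of proteins
    that are named in the same sentence."""
    edges = set()
    for sentence in sentences:
        tokens = set(sentence.replace(".", " ").replace(",", " ").split())
        found = sorted(p for p in proteins if p in tokens)
        edges.update(combinations(found, 2))
    return edges


# Hypothetical protein names and text, not taken from the study.
proteins = ["A1", "B2", "C3", "D4"]
sentences = [
    "A1 binds B2 and was studied alongside C3.",
    "C3 phosphorylates D4.",
    "D4 does not interact with A1.",
]
target = {("A1", "B2"), ("C3", "D4")}  # assumed curated reference network

predicted = cooccurrence_edges(sentences, proteins)
print("TPR:", true_positive_rate(predicted, target))
print("predicted density:", density(predicted, len(proteins)))
print("target density:", density(target, len(proteins)))
```

In this toy case the co-occurrence rule recovers both target interactions but also links proteins that are merely mentioned together, including a pair the text explicitly says do not interact, so the predicted network is denser than the target. A relationship extraction model that reads the sentence context would aim to drop such edges, possibly at the cost of missing some true ones.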
About the journal:
Journal of Proteome Research publishes content encompassing all aspects of global protein analysis and function, including the dynamic aspects of genomics, spatio-temporal proteomics, metabonomics and metabolomics, clinical and agricultural proteomics, as well as advances in methodology including bioinformatics. The theme and emphasis is on a multidisciplinary approach to the life sciences through the synergy between the different types of "omics".