{"title":"iTraNet: a web-based platform for integrated trans-omics network visualization and analysis.","authors":"Hikaru Sugimoto, Keigo Morita, Dongzi Li, Yunfan Bai, Matthias Mattanovich, Shinya Kuroda","doi":"10.1093/bioadv/vbae141","DOIUrl":"https://doi.org/10.1093/bioadv/vbae141","url":null,"abstract":"<p><strong>Motivation: </strong>Visualization and analysis of biological networks play crucial roles in understanding living systems. Biological networks include diverse types, from gene regulatory networks and protein-protein interactions to metabolic networks. Metabolic networks include substrates, products, and enzymes, which are regulated by allosteric mechanisms and gene expression. However, the analysis of these diverse omics types is challenging due to the diversity of databases and the complexity of network analysis.</p><p><strong>Results: </strong>We developed iTraNet, a web application that visualizes and analyses trans-omics networks involving four types of networks: gene regulatory networks, protein-protein interactions, metabolic networks, and metabolite exchange networks. Using iTraNet, we found that in wild-type mice, hub molecules within the network tended to respond to glucose administration, whereas in <i>ob/ob</i> mice, this tendency disappeared. With its ability to facilitate network analysis, we anticipate that iTraNet will help researchers gain insights into living systems.</p><p><strong>Availability and implementation: </strong>iTraNet is available at https://itranet.streamlit.app/.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae141"},"PeriodicalIF":2.4,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11493990/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142513909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RNAelem: an algorithm for discovering sequence-structure motifs in RNA bound by RNA-binding proteins.","authors":"Hiroshi Miyake, Risa Karakida Kawaguchi, Hisanori Kiryu","doi":"10.1093/bioadv/vbae144","DOIUrl":"https://doi.org/10.1093/bioadv/vbae144","url":null,"abstract":"<p><strong>Motivation: </strong>RNA-binding proteins (RBPs) play a crucial role in the post-transcriptional regulation of RNA. Given their importance, analyzing the specific RNA patterns recognized by RBPs has become a significant research focus in bioinformatics. Deep Neural Networks have enhanced the accuracy of prediction for RBP-binding sites, yet understanding the structural basis of RBP-binding specificity from these models is challenging due to their limited interpretability. To address this, we developed RNAelem, which combines profile context-free grammar and the Turner energy model for RNA secondary structure to predict sequence-structure motifs in RBP-binding regions.</p><p><strong>Results: </strong>RNAelem exhibited superior detection accuracy compared to existing tools for RNA sequences with structural motifs. Upon applying RNAelem to the eCLIP database, we were not only able to reproduce many known primary sequence motifs in the absence of secondary structures, but also discovered many secondary structural motifs that contained sequence-nonspecific insertion regions. Furthermore, the high interpretability of RNAelem yielded insightful findings such as long-range base-pairing interactions in the binding region of the U2AF protein.</p><p><strong>Availability and implementation: </strong>The code is available at https://github.com/iyak/RNAelem.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae144"},"PeriodicalIF":2.4,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471262/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142482212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advancesPub Date : 2024-09-28eCollection Date: 2024-01-01DOI: 10.1093/bioadv/vbae143
Thomas Cheng Li, Hufeng Zhou, Vineet Verma, Xiangru Tang, Yanjun Shao, Eric Van Buren, Zhiping Weng, Mark Gerstein, Benjamin Neale, Shamil R Sunyaev, Xihong Lin
{"title":"FAVOR-GPT: a generative natural language interface to whole genome variant functional annotations.","authors":"Thomas Cheng Li, Hufeng Zhou, Vineet Verma, Xiangru Tang, Yanjun Shao, Eric Van Buren, Zhiping Weng, Mark Gerstein, Benjamin Neale, Shamil R Sunyaev, Xihong Lin","doi":"10.1093/bioadv/vbae143","DOIUrl":"10.1093/bioadv/vbae143","url":null,"abstract":"<p><strong>Motivation: </strong>Functional Annotation of genomic Variants Online Resources (FAVOR) offers multi-faceted, whole genome variant functional annotations, which is essential for Whole Genome and Exome Sequencing (WGS/WES) analysis and the functional prioritization of disease-associated variants. A versatile chatbot designed to facilitate informative interpretation and interactive, user-centric summary of the whole genome variant functional annotation data in the FAVOR database is needed.</p><p><strong>Results: </strong>We have developed FAVOR-GPT, a generative natural language interface powered by integrating large language models (LLMs) and FAVOR. It is developed based on the Retrieval Augmented Generation (RAG) approach, and complements the original FAVOR portal, enhancing usability for users, especially those without specialized expertise. FAVOR-GPT simplifies raw annotations by providing interpretable explanations and result summaries in response to the user's prompt. It shows high accuracy when cross-referencing with the FAVOR database, underscoring the robustness of the retrieval framework.</p><p><strong>Availability and implementation: </strong>Researchers can access FAVOR-GPT at FAVOR's main website (https://favor.genohub.org).</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae143"},"PeriodicalIF":2.4,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11461909/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142397001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advancesPub Date : 2024-09-27eCollection Date: 2024-01-01DOI: 10.1093/bioadv/vbae139
Haydeé Contreras-Peruyero, Shaday Guerrero-Flores, Claudia Zirión-Martínez, Paulina M Mejía-Ponce, Marisol Navarro-Miranda, J Abel Lovaco-Flores, José M Ibarra-Rodríguez, Anton Pashkov, Cuauhtémoc Licona-Cassani, Nelly Sélem-Mojica
{"title":"Meeting the challenge of genomic analysis: a collaboratively developed workshop for pangenomics and topological data analysis.","authors":"Haydeé Contreras-Peruyero, Shaday Guerrero-Flores, Claudia Zirión-Martínez, Paulina M Mejía-Ponce, Marisol Navarro-Miranda, J Abel Lovaco-Flores, José M Ibarra-Rodríguez, Anton Pashkov, Cuauhtémoc Licona-Cassani, Nelly Sélem-Mojica","doi":"10.1093/bioadv/vbae139","DOIUrl":"10.1093/bioadv/vbae139","url":null,"abstract":"<p><strong>Motivation: </strong>As genomics data analysis becomes increasingly intricate, researchers face the challenge of mastering various software tools. The rise of Pangenomics analysis, which examines the complete set of genes in a group of genomes, is particularly transformative in understanding genetic diversity. Our interdisciplinary team of biologists and mathematicians developed a short Pangenomics Workshop covering Bash, Python scripting, Pangenome, and Topological Data Analysis. These skills provide deeper insights into genetic variations and their implications in Evolutionary Biology. The workshop uses a Conda environment for reproducibility and accessibility. Developed in The Carpentries Incubator infrastructure, the workshop aims to equip researchers with essential skills for Pangenomics research. By emphasizing the role of a community of practice, this work underscores its significance in empowering multidisciplinary professionals to collaboratively develop training that adheres to best practices.</p><p><strong>Results: </strong>Our workshop delivers tangible outcomes by enhancing the skill sets of Computational Biology professionals. Participants gain hands-on experience using real data from the first described pangenome. We share our paths toward creating an open-source, multidisciplinary, and public resource where learners can develop expertise in Pangenomic Analysis. This initiative goes beyond advancing individual capabilities, aligning with the broader mission of addressing educational needs in Computational Biology.</p><p><strong>Availability and implementation: </strong>https://carpentries-incubator.github.io/pangenomics-workshop/.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae139"},"PeriodicalIF":2.4,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525208/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142559613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advancesPub Date : 2024-09-27eCollection Date: 2024-01-01DOI: 10.1093/bioadv/vbae146
David Greenwood, Marianne Shawe-Taylor, Hermaleigh Townsley, Joshua Gahir, Nikita Sahadeo, Yakubu Alhassan, Charlotte Chaloner, Oliver Galgut, Gavin Kelly, David L V Bauer, Emma C Wall, Mary Y Wu, Edward J Carr
{"title":"Chronogram: an R package for data curation and analysis of infection and vaccination cohort studies.","authors":"David Greenwood, Marianne Shawe-Taylor, Hermaleigh Townsley, Joshua Gahir, Nikita Sahadeo, Yakubu Alhassan, Charlotte Chaloner, Oliver Galgut, Gavin Kelly, David L V Bauer, Emma C Wall, Mary Y Wu, Edward J Carr","doi":"10.1093/bioadv/vbae146","DOIUrl":"https://doi.org/10.1093/bioadv/vbae146","url":null,"abstract":"<p><strong>Motivation: </strong>Observational cohort studies that track vaccine and infection responses offer real-world data to inform pandemic policy. Translating biological hypotheses, such as whether different patterns of accumulated antigenic exposures confer differing antibody responses, into analysis code can be onerous, particularly when source data is dis-aggregated.</p><p><strong>Results: </strong>The R package chronogram introduces the class chronogram, where metadata is seamlessly aggregated with sparse infection episode, clinical and laboratory data. Each experimental modality is added sequentially, allowing the incorporation of new data, such as specialized time-consuming research assays, or their downstream analyses. Source data can be any rectangular data format, including database tables (such as structured query language databases). This supports annotations that aggregate data types/sources, for example, combining symptoms, molecular testing, and sequencing of one or more infectious episodes in a pathogen-agnostic manner. Chronogram arranges observational data to allow the translation of biological hypotheses into their corresponding code via a shared vocabulary.</p><p><strong>Availability and implementation: </strong>Chronogram is implemented R and available under an MIT licence at: https://www.github.com/FrancisCrickInstitute/chronogram<b>;</b> a user manual is available at: https://franciscrickinstitute.github.io/chronogram/.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae146"},"PeriodicalIF":2.4,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11470235/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142482208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advancesPub Date : 2024-09-20eCollection Date: 2024-01-01DOI: 10.1093/bioadv/vbae137
Dongjun Guo, Joseph Chi-Fung Ng, Deborah K Dunn-Walters, Franca Fraternali
{"title":"VCAb: a web-tool for structure-guided exploration of antibodies.","authors":"Dongjun Guo, Joseph Chi-Fung Ng, Deborah K Dunn-Walters, Franca Fraternali","doi":"10.1093/bioadv/vbae137","DOIUrl":"https://doi.org/10.1093/bioadv/vbae137","url":null,"abstract":"<p><strong>Motivation: </strong>Effective responses against immune challenges require antibodies of different isotypes performing specific effector functions. Structural information on these isotypes is essential to engineer antibodies with desired physico-chemical features of their antigen-binding properties, and optimal developability as potential therapeutics. <i>In silico</i> mutational scanning profiles on antibody structures would further pinpoint candidate mutations for enhancing antibody stability and function. Current antibody structure databases lack consistent annotations of isotypes and structural coverage of 3D antibody structures, as well as computed deep mutation profiles.</p><p><strong>Results: </strong>The <i>V</i> and <i>C</i> region bearing <i>a</i>nti<i>b</i>ody (VCAb) web-tool is established to clarify these annotations and provides an accessible resource to facilitate antibody engineering and design. VCAb currently provides data on 7,166 experimentally determined antibody structures including both V and C regions from different species. Additionally, VCAb provides annotations of species and isotypes with numbering schemes applied. These information can be interactively queried or downloaded in batch.</p><p><strong>Availability and implementation: </strong>VCAb is implemented as a R shiny application to enable interactive data interrogation. The online application is freely accessible https://fraternalilab.cs.ucl.ac.uk/VCAb/. The source code to generate the database and the online application is available open-source at https://github.com/Fraternalilab/VCAb.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae137"},"PeriodicalIF":2.4,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471263/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142482214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advancesPub Date : 2024-09-20eCollection Date: 2024-01-01DOI: 10.1093/bioadv/vbae136
Slim Karkar, Ashwini Sharma, Carl Herrmann, Yuna Blum, Magali Richard
{"title":"DECOMICS, a shiny application for unsupervised cell type deconvolution and biological interpretation of bulk omic data.","authors":"Slim Karkar, Ashwini Sharma, Carl Herrmann, Yuna Blum, Magali Richard","doi":"10.1093/bioadv/vbae136","DOIUrl":"https://doi.org/10.1093/bioadv/vbae136","url":null,"abstract":"<p><strong>Summary: </strong>Unsupervised deconvolution algorithms are often used to estimate cell composition from bulk tissue samples. However, applying cell-type deconvolution and interpreting the results remain a challenge, even more without prior training in bioinformatics. Here, we propose a tool for estimating and identifying cell type composition from bulk transcriptomes or methylomes. DECOMICS is a shiny-web application dedicated to unsupervised deconvolution approaches of bulk omic data. It provides (i) a variety of existing algorithms to perform deconvolution on the gene expression or methylation-level matrix, (ii) an enrichment analysis module to aid biological interpretation of the deconvolved components, based on enrichment analysis, and (iii) some visualization tools. Input data can be downloaded in csv format and preprocessed in the web application (normalization, transformation, and feature selection). The results of the deconvolution, enrichment, and visualization processes can be downloaded.</p><p><strong>Availability and implementation: </strong>DECOMICS is an R-shiny web application that can be launched (i) directly from a local R session using the R package available here: https://gitlab.in2p3.fr/Magali.Richard/decomics (either by installing it locally or via a virtual machine and a Docker image that we provide); or (ii) in the Biosphere-IFB Clouds Federation for Life Science, a multi-cloud environment scalable for high-performance computing: https://biosphere.france-bioinformatique.fr/catalogue/appliance/193/.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae136"},"PeriodicalIF":2.4,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11479579/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142482210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advancesPub Date : 2024-09-18eCollection Date: 2024-01-01DOI: 10.1093/bioadv/vbae132
Irina Ponamareva, Antonina Andreeva, Maxwell L Bileschi, Lucy Colwell, Alex Bateman
{"title":"Investigation of protein family relationships with deep learning.","authors":"Irina Ponamareva, Antonina Andreeva, Maxwell L Bileschi, Lucy Colwell, Alex Bateman","doi":"10.1093/bioadv/vbae132","DOIUrl":"https://doi.org/10.1093/bioadv/vbae132","url":null,"abstract":"<p><strong>Motivation: </strong>In this article, we propose a method for finding similarities between Pfam families based on the pre-trained neural network ProtENN2. We use the model ProtENN2 per-residue embeddings to produce new high-dimensional per-family embeddings and develop an approach for calculating inter-family similarity scores based on these embeddings, and evaluate its predictions using structure comparison.</p><p><strong>Results: </strong>We apply our method to Pfam annotation by refining clan membership for Pfam families, suggesting both new members of existing clans and potential new clans for future Pfam releases. We investigate some of the failure modes of our approach, which suggests directions for future improvements. Our method is relatively simple with few parameters and could be applied to other protein family classification models. Overall, our work suggests potential benefits of employing deep learning for improving our understanding of protein family relationships and functions of previously uncharacterized families.</p><p><strong>Availability and implementation: </strong>github.com/iponamareva/ProtCNNSim, 10.5281/zenodo.10091909.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae132"},"PeriodicalIF":2.4,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11467057/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142482211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advancesPub Date : 2024-09-13eCollection Date: 2024-01-01DOI: 10.1093/bioadv/vbae134
Joseph Hastings, Donghyung Lee, Michael J O'Connell
{"title":"Batch-effect correction in single-cell RNA sequencing data using JIVE.","authors":"Joseph Hastings, Donghyung Lee, Michael J O'Connell","doi":"10.1093/bioadv/vbae134","DOIUrl":"10.1093/bioadv/vbae134","url":null,"abstract":"<p><strong>Motivation: </strong>In single-cell RNA sequencing analysis, addressing batch effects-technical artifacts stemming from factors such as varying sequencing technologies, equipment, and capture times-is crucial. These factors can cause unwanted variation and obfuscate the underlying biological signal of interest. The joint and individual variation explained (JIVE) method can be used to extract shared biological patterns from multi-source sequencing data while adjusting for individual non-biological variations (i.e. batch effect). However, its current implementation is originally designed for bulk sequencing data, making it computationally infeasible for large-scale single-cell sequencing datasets.</p><p><strong>Results: </strong>In this study, we enhance JIVE for large-scale single-cell data by boosting its computational efficiency. Additionally, we introduce a novel application of JIVE for batch-effect correction on multiple single-cell sequencing datasets. Our enhanced method aims to decompose single-cell sequencing datasets into a joint structure capturing the true biological variability and individual structures, which capture technical variability within each batch. This joint structure is then suitable for use in downstream analyses. We benchmarked the results against four popular tools, Seurat v5, Harmony, LIGER, and Combat-seq, which were developed for this purpose. JIVE performed best in terms of preserving cell-type effects and in scenarios in which the batch sizes are balanced.</p><p><strong>Availability and implementation: </strong>The JIVE implementation used for this analysis can be found at https://github.com/oconnell-statistics-lab/scJIVE.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae134"},"PeriodicalIF":2.4,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11461915/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142395682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advancesPub Date : 2024-09-11eCollection Date: 2024-01-01DOI: 10.1093/bioadv/vbae133
Hasin Rehana, Nur Bengisu Çam, Mert Basmaci, Jie Zheng, Christianah Jemiyo, Yongqun He, Arzucan Özgür, Junguk Hur
{"title":"Evaluating GPT and BERT models for protein-protein interaction identification in biomedical text.","authors":"Hasin Rehana, Nur Bengisu Çam, Mert Basmaci, Jie Zheng, Christianah Jemiyo, Yongqun He, Arzucan Özgür, Junguk Hur","doi":"10.1093/bioadv/vbae133","DOIUrl":"https://doi.org/10.1093/bioadv/vbae133","url":null,"abstract":"<p><strong>Motivation: </strong>Detecting protein-protein interactions (PPIs) is crucial for understanding genetic mechanisms, disease pathogenesis, and drug design. As biomedical literature continues to grow rapidly, there is an increasing need for automated and accurate extraction of these interactions to facilitate scientific discovery. Pretrained language models, such as generative pretrained transformers and bidirectional encoder representations from transformers, have shown promising results in natural language processing tasks.</p><p><strong>Results: </strong>We evaluated the performance of PPI identification using multiple transformer-based models across three manually curated gold-standard corpora: Learning Language in Logic with 164 interactions in 77 sentences, Human Protein Reference Database with 163 interactions in 145 sentences, and Interaction Extraction Performance Assessment with 335 interactions in 486 sentences. Models based on bidirectional encoder representations achieved the best overall performance, with BioBERT achieving the highest recall of 91.95% and F1 score of 86.84% on the Learning Language in Logic dataset. Despite not being explicitly trained for biomedical texts, GPT-4 showed commendable performance, comparable to the bidirectional encoder models. Specifically, GPT-4 achieved the highest precision of 88.37%, a recall of 85.14%, and an F1 score of 86.49% on the same dataset. These results suggest that GPT-4 can effectively detect protein interactions from text, offering valuable applications in mining biomedical literature.</p><p><strong>Availability and implementation: </strong>The source code and datasets used in this study are available at https://github.com/hurlab/PPI-GPT-BERT.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae133"},"PeriodicalIF":2.4,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11419952/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142333636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}