Bioinformatics advances: latest articles

iTraNet: a web-based platform for integrated trans-omics network visualization and analysis.
IF 2.4
Bioinformatics advances Pub Date: 2024-09-30 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae141
Hikaru Sugimoto, Keigo Morita, Dongzi Li, Yunfan Bai, Matthias Mattanovich, Shinya Kuroda
Motivation: Visualization and analysis of biological networks play crucial roles in understanding living systems. Biological networks include diverse types, from gene regulatory networks and protein-protein interactions to metabolic networks. Metabolic networks include substrates, products, and enzymes, which are regulated by allosteric mechanisms and gene expression. However, analyzing these diverse omics types is challenging because of the diversity of databases and the complexity of network analysis.
Results: We developed iTraNet, a web application that visualizes and analyses trans-omics networks involving four types of networks: gene regulatory networks, protein-protein interactions, metabolic networks, and metabolite exchange networks. Using iTraNet, we found that in wild-type mice, hub molecules within the network tended to respond to glucose administration, whereas in ob/ob mice, this tendency disappeared. Because it facilitates network analysis, we anticipate that iTraNet will help researchers gain insights into living systems.
Availability and implementation: iTraNet is available at https://itranet.streamlit.app/.
Bioinformatics advances 4(1): vbae141. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11493990/pdf/
Citations: 0
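The iTraNet result depends on identifying hub molecules in a network assembled from several layers. As an illustration only, the Python sketch below merges hypothetical edge lists for the four network types named in the abstract into one undirected graph and ranks nodes by degree. Using degree centrality as the hub definition is an assumption, since the abstract does not say how iTraNet defines hubs. All node names are placeholders.

```python
from collections import defaultdict


def merge_layers(layers):
    """Merge {layer_name: [(u, v), ...]} into one undirected adjacency map."""
    adjacency = defaultdict(set)
    for edges in layers.values():
        for u, v in edges:
            if u == v:
                continue  # ignore self-loops
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def top_hubs(adjacency, k):
    """Return the k nodes with the highest degree (ties broken alphabetically)."""
    ranked = sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))
    return ranked[:k]


# Hypothetical toy layers; names are placeholders, not iTraNet data.
layers = {
    "gene_regulatory": [("TF_A", "Gene_1"), ("TF_A", "Gene_2")],
    "protein_protein": [("Prot_1", "Prot_2"), ("Prot_1", "TF_A")],
    "metabolic": [("Enzyme_1", "Glucose"), ("Enzyme_1", "G6P")],
    "metabolite_exchange": [("Glucose", "Liver"), ("Glucose", "Muscle")],
}

adjacency = merge_layers(layers)
print(top_hubs(adjacency, 2))  # ['Glucose', 'TF_A']
```

To compare hubs with non-hubs, as the abstract does, you would attach a per-node response flag (for example, whether a molecule changed after glucose administration) and compare the two groups. That step is not shown here.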
RNAelem: an algorithm for discovering sequence-structure motifs in RNA bound by RNA-binding proteins.
IF 2.4
Bioinformatics advances Pub Date: 2024-09-28 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae144
Hiroshi Miyake, Risa Karakida Kawaguchi, Hisanori Kiryu
Motivation: RNA-binding proteins (RBPs) play a crucial role in the post-transcriptional regulation of RNA. Given their importance, analyzing the specific RNA patterns recognized by RBPs has become a significant research focus in bioinformatics. Deep neural networks have improved the accuracy of RBP-binding site prediction, yet their limited interpretability makes it difficult to understand the structural basis of RBP-binding specificity from these models. To address this, we developed RNAelem, which combines a profile context-free grammar with the Turner energy model for RNA secondary structure to predict sequence-structure motifs in RBP-binding regions.
Results: RNAelem showed higher detection accuracy than existing tools on RNA sequences with structural motifs. Applying RNAelem to the eCLIP database, we not only reproduced many known primary sequence motifs in the absence of secondary structures but also discovered many secondary structural motifs that contained sequence-nonspecific insertion regions. Furthermore, the high interpretability of RNAelem yielded insightful findings, such as long-range base-pairing interactions in the binding region of the U2AF protein.
Availability and implementation: The code is available at https://github.com/iyak/RNAelem.
Bioinformatics advances 4(1): vbae144. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471262/pdf/
Citations: 0
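The RNAelem abstract highlights long-range base pairing in the U2AF binding region. The Python sketch below parses a secondary structure in dot-bracket notation and lists base pairs whose partners are far apart. This is only a way to inspect structures: it is not RNAelem's algorithm, which combines a profile context-free grammar with the Turner energy model. The 50-nt threshold and the example structure are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return sorted 0-based (i, j) pairs from a dot-bracket string using '(' and ')'."""
    stack, pairs = [], []
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
        # '.' marks an unpaired base and is skipped
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def long_range_pairs(dot_bracket, min_span=50):
    """Keep pairs whose partners are at least min_span positions apart (hypothetical cut-off)."""
    return [(i, j) for i, j in base_pairs(dot_bracket) if j - i >= min_span]


# Hypothetical structure: one long-range helix followed by a short local hairpin.
structure = "((.." + "." * 60 + "..))" + "(((...)))"
print(long_range_pairs(structure))  # [(0, 67), (1, 66)]
```

A stack is used because dot-bracket notation encodes nested (pseudoknot-free) pairs. The same approach would not handle extended notations that use additional bracket types for pseudoknots.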
FAVOR-GPT: a generative natural language interface to whole genome variant functional annotations.
IF 2.4
Bioinformatics advances Pub Date: 2024-09-28 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae143
Thomas Cheng Li, Hufeng Zhou, Vineet Verma, Xiangru Tang, Yanjun Shao, Eric Van Buren, Zhiping Weng, Mark Gerstein, Benjamin Neale, Shamil R Sunyaev, Xihong Lin
Motivation: Functional Annotation of genomic Variants Online Resources (FAVOR) offers multi-faceted, whole genome variant functional annotations, which are essential for Whole Genome and Exome Sequencing (WGS/WES) analysis and for the functional prioritization of disease-associated variants. A versatile chatbot is needed to support informative interpretation and interactive, user-centric summaries of the whole genome variant functional annotation data in the FAVOR database.
Results: We have developed FAVOR-GPT, a generative natural language interface that integrates large language models (LLMs) with FAVOR. It is built on the Retrieval Augmented Generation (RAG) approach and complements the original FAVOR portal, improving usability, especially for users without specialized expertise. FAVOR-GPT simplifies raw annotations by providing interpretable explanations and result summaries in response to the user's prompt. It shows high accuracy when cross-referenced with the FAVOR database, underscoring the robustness of the retrieval framework.
Availability and implementation: Researchers can access FAVOR-GPT at FAVOR's main website (https://favor.genohub.org).
Bioinformatics advances 4(1): vbae143. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11461909/pdf/
Citations: 0
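FAVOR-GPT uses retrieval augmented generation. In this approach, records relevant to the user's question are retrieved first and passed to a large language model together with the question. The Python sketch below covers only the retrieval and prompt-assembly steps, and it uses bag-of-words cosine similarity as a stand-in retriever. The record texts, the scoring method, and the prompt template are hypothetical and are not FAVOR-GPT's implementation. The LLM call itself is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case bag of alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, records, k=1):
    """Return the k records most similar to the query."""
    q = tokenize(query)
    return sorted(records, key=lambda rec: cosine(q, tokenize(rec)), reverse=True)[:k]


def build_prompt(query, retrieved):
    """Ground the question in the retrieved records (hypothetical template)."""
    context = "\n".join(f"- {rec}" for rec in retrieved)
    return (
        "Answer using only the annotation records below.\n"
        f"Records:\n{context}\n"
        f"Question: {query}"
    )


# Hypothetical placeholder records, not real FAVOR annotations.
records = [
    "variant_1: missense variant in GENE_A, high conservation score",
    "variant_2: intergenic variant, low conservation score",
    "variant_3: synonymous variant in GENE_B",
]
query = "Is variant_1 a missense variant?"
prompt = build_prompt(query, retrieve(query, records, k=1))
print(prompt)
```

A production RAG system would usually replace the token-overlap retriever with dense embeddings or exact lookups keyed on variant identifiers. The overall retrieve-then-generate structure stays the same.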
Meeting the challenge of genomic analysis: a collaboratively developed workshop for pangenomics and topological data analysis.
IF 2.4
Bioinformatics advances Pub Date: 2024-09-27 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae139
Haydeé Contreras-Peruyero, Shaday Guerrero-Flores, Claudia Zirión-Martínez, Paulina M Mejía-Ponce, Marisol Navarro-Miranda, J Abel Lovaco-Flores, José M Ibarra-Rodríguez, Anton Pashkov, Cuauhtémoc Licona-Cassani, Nelly Sélem-Mojica
Motivation: As genomics data analysis becomes increasingly intricate, researchers face the challenge of mastering various software tools. The rise of pangenomic analysis, which examines the complete set of genes in a group of genomes, is particularly transformative for understanding genetic diversity. Our interdisciplinary team of biologists and mathematicians developed a short Pangenomics Workshop covering Bash, Python scripting, pangenome analysis, and Topological Data Analysis. These skills provide deeper insights into genetic variation and its implications in evolutionary biology. The workshop uses a Conda environment for reproducibility and accessibility. Developed in The Carpentries Incubator infrastructure, the workshop aims to equip researchers with essential skills for pangenomics research. By emphasizing the role of a community of practice, this work shows how such a community empowers multidisciplinary professionals to collaboratively develop training that adheres to best practices.
Results: Our workshop delivers tangible outcomes by enhancing the skill sets of computational biology professionals. Participants gain hands-on experience using real data from the first described pangenome. We share our path toward creating an open-source, multidisciplinary, public resource where learners can develop expertise in pangenomic analysis. This initiative goes beyond advancing individual capabilities, aligning with the broader mission of addressing educational needs in computational biology.
Availability and implementation: https://carpentries-incubator.github.io/pangenomics-workshop/.
Bioinformatics advances 4(1): vbae139. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525208/pdf/
Citations: 0
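Pangenomes are commonly summarized by grouping gene families according to how many genomes carry them. The Python sketch below classifies families from a presence/absence mapping into three groups:

- core: present in every genome
- cloud: present in exactly one genome
- shell: everything in between

These cut-offs are one common convention and are assumed here. The workshop materials may use different definitions or dedicated tools, and the genome and gene names are hypothetical.

```python
def partition_pangenome(presence):
    """Split gene families into core, shell, and cloud by how many genomes carry them.

    presence maps each gene family name to the set of genome IDs that contain it.
    """
    genomes = set().union(*presence.values())
    n_genomes = len(genomes)
    partition = {"core": [], "shell": [], "cloud": []}
    for family, carriers in sorted(presence.items()):
        count = len(carriers)
        if count == 0:
            raise ValueError(f"{family} is absent from every genome")
        if count == n_genomes:
            partition["core"].append(family)
        elif count == 1:
            partition["cloud"].append(family)
        else:
            partition["shell"].append(family)
    return partition


# Hypothetical presence/absence data for three genomes.
presence = {
    "geneA": {"g1", "g2", "g3"},
    "geneB": {"g1", "g2"},
    "geneC": {"g3"},
    "geneD": {"g1", "g2", "g3"},
}
print(partition_pangenome(presence))
```

Because the core condition is checked first, a family found in the only genome of a single-genome dataset is counted as core rather than cloud.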
Chronogram: an R package for data curation and analysis of infection and vaccination cohort studies.
IF 2.4
Bioinformatics advances Pub Date: 2024-09-27 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae146
David Greenwood, Marianne Shawe-Taylor, Hermaleigh Townsley, Joshua Gahir, Nikita Sahadeo, Yakubu Alhassan, Charlotte Chaloner, Oliver Galgut, Gavin Kelly, David L V Bauer, Emma C Wall, Mary Y Wu, Edward J Carr
Motivation: Observational cohort studies that track vaccine and infection responses offer real-world data to inform pandemic policy. Translating biological hypotheses into analysis code can be onerous, particularly when source data are disaggregated. An example of such a hypothesis is whether different patterns of accumulated antigenic exposures confer different antibody responses.
Results: The R package chronogram introduces the class chronogram, in which metadata are seamlessly aggregated with sparse infection episode, clinical, and laboratory data. Each experimental modality is added sequentially, allowing the incorporation of new data, such as specialized time-consuming research assays, or their downstream analyses. Source data can be in any rectangular format, including database tables (such as structured query language databases). This supports annotations that aggregate data types and sources, for example, combining symptoms, molecular testing, and sequencing of one or more infectious episodes in a pathogen-agnostic manner. Chronogram arranges observational data so that biological hypotheses can be translated into corresponding code through a shared vocabulary.
Availability and implementation: Chronogram is implemented in R and available under an MIT licence at https://www.github.com/FrancisCrickInstitute/chronogram; a user manual is available at https://franciscrickinstitute.github.io/chronogram/.
Bioinformatics advances 4(1): vbae146. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11470235/pdf/
Citations: 0
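chronogram itself is an R package. The Python sketch below only illustrates the idea the abstract describes: participant metadata are laid out on a per-participant daily calendar, and sparse event-level data are added one modality at a time. The column names, participant IDs, dates, and the dictionary layout keyed by (participant, date) are assumptions for illustration. They do not reflect the package's API, and the titre value is a placeholder.

```python
from datetime import date, timedelta


def build_calendar(metadata, start, end):
    """One row per participant per day, seeded with that participant's metadata."""
    rows = {}
    day = start
    while day <= end:
        for pid, meta in metadata.items():
            rows[(pid, day)] = dict(meta)  # copy so rows do not share state
        day += timedelta(days=1)
    return rows


def add_events(rows, events, column):
    """Add sparse (participant, date, value) events as a new column.

    Events outside the calendar range are ignored; days without an event get None.
    """
    for pid, day, value in events:
        if (pid, day) in rows:
            rows[(pid, day)][column] = value
    for row in rows.values():
        row.setdefault(column, None)
    return rows


# Hypothetical cohort: two participants observed over three days.
metadata = {"P01": {"age": 40}, "P02": {"age": 55}}
calendar = build_calendar(metadata, date(2021, 1, 1), date(2021, 1, 3))
add_events(calendar, [("P01", date(2021, 1, 2), "dose_1")], "vaccine")
add_events(calendar, [("P02", date(2021, 1, 3), 812.0)], "antibody_titre")
print(calendar[("P01", date(2021, 1, 2))])
```

Keying every row on (participant, date) is what lets each new modality be joined in sequentially without reshaping the earlier ones.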
VCAb: a web-tool for structure-guided exploration of antibodies.
IF 2.4
Bioinformatics advances Pub Date: 2024-09-20 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae137
Dongjun Guo, Joseph Chi-Fung Ng, Deborah K Dunn-Walters, Franca Fraternali
Motivation: Effective responses against immune challenges require antibodies of different isotypes performing specific effector functions. Structural information on these isotypes is essential for engineering antibodies with desired physico-chemical features of their antigen-binding properties and optimal developability as potential therapeutics. In silico mutational scanning profiles on antibody structures would further pinpoint candidate mutations for enhancing antibody stability and function. Current antibody structure databases lack consistent isotype annotations, structural coverage of 3D antibody structures, and computed deep mutation profiles.
Results: The V and C region bearing antibody (VCAb) web-tool was established to clarify these annotations and to provide an accessible resource that facilitates antibody engineering and design. VCAb currently provides data on 7,166 experimentally determined antibody structures, including both V and C regions, from different species. VCAb also provides species and isotype annotations with numbering schemes applied. This information can be queried interactively or downloaded in batch.
Availability and implementation: VCAb is implemented as an R Shiny application to enable interactive data interrogation. The online application is freely accessible at https://fraternalilab.cs.ucl.ac.uk/VCAb/. The source code used to generate the database and the online application is available open-source at https://github.com/Fraternalilab/VCAb.
Bioinformatics advances 4(1): vbae137. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471263/pdf/
Citations: 0
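The VCAb abstract refers to in silico mutational scanning and deep mutation profiles. The Python sketch below shows only the enumeration step: it generates every single amino-acid substitution of a sequence using the 20 standard residues. Scoring each variant (for example, with a stability predictor) is out of scope and not shown. The input fragment is a hypothetical placeholder. The mutation label format (wild-type residue, 1-based position, mutant residue) is a common convention assumed here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_mutants(sequence):
    """Yield (label, mutant_sequence) for every single-residue substitution."""
    for index, wild_type in enumerate(sequence):
        for residue in AMINO_ACIDS:
            if residue == wild_type:
                continue  # skip the identity "mutation"
            label = f"{wild_type}{index + 1}{residue}"
            yield label, sequence[:index] + residue + sequence[index + 1:]


# Hypothetical three-residue fragment.
variants = list(single_mutants("EVQ"))
print(len(variants))  # 57, i.e. 3 positions x 19 substitutions
print(variants[0])
```

For a sequence of length L made of standard residues, this yields 19 x L variants. That count grows linearly, so exhaustive single-site scans of full antibody chains remain tractable.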
DECOMICS, a shiny application for unsupervised cell type deconvolution and biological interpretation of bulk omic data.
IF 2.4
Bioinformatics advances Pub Date: 2024-09-20 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae136
Slim Karkar, Ashwini Sharma, Carl Herrmann, Yuna Blum, Magali Richard
Summary: Unsupervised deconvolution algorithms are often used to estimate cell composition from bulk tissue samples. However, applying cell-type deconvolution and interpreting the results remain challenging, especially without prior training in bioinformatics. Here, we propose a tool for estimating and identifying cell type composition from bulk transcriptomes or methylomes. DECOMICS is a Shiny web application dedicated to unsupervised deconvolution of bulk omic data. It provides (i) a variety of existing algorithms for deconvolving the gene expression or methylation-level matrix, (ii) an enrichment analysis module to aid biological interpretation of the deconvolved components, and (iii) visualization tools. Input data can be uploaded in CSV format and preprocessed in the web application (normalization, transformation, and feature selection). The results of the deconvolution, enrichment, and visualization steps can be downloaded.
Availability and implementation: DECOMICS is an R Shiny web application that can be launched (i) directly from a local R session using the R package available at https://gitlab.in2p3.fr/Magali.Richard/decomics (either by installing it locally or via a virtual machine and a Docker image that we provide); or (ii) in the Biosphere-IFB Clouds Federation for Life Science, a multi-cloud environment scalable for high-performance computing: https://biosphere.france-bioinformatique.fr/catalogue/appliance/193/.
Bioinformatics advances 4(1): vbae136. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11479579/pdf/
Citations: 0
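Unsupervised deconvolution of the kind DECOMICS offers is often framed as factoring a non-negative bulk matrix V (features x samples) into non-negative component profiles W and per-sample weights H, so that V ≈ WH. The pure-Python sketch below implements one representative algorithm: non-negative matrix factorization with Lee and Seung's multiplicative updates for the Frobenius loss. It is not a description of any specific method in DECOMICS. The toy matrix, rank, and iteration count are assumptions.

```python
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF: V ~= W H with all entries non-negative."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h


# Hypothetical toy data: 4 features x 3 samples mixed from 2 components.
W_true = [[5, 0], [4, 1], [0, 6], [1, 3]]
H_true = [[0.8, 0.5, 0.2], [0.2, 0.5, 0.8]]
V = matmul(W_true, H_true)

w0, h0 = nmf(V, 2, iterations=0)  # the random initialization
w, h = nmf(V, 2)
print(frobenius_error(V, w0, h0), frobenius_error(V, w, h))
```

NMF solutions are unique only up to scaling and permutation of components. To read H as proportions, you would typically normalize its columns. Assigning a cell-type identity to each component is the interpretation step that the abstract's enrichment module is meant to support.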
Investigation of protein family relationships with deep learning.
IF 2.4
Bioinformatics advances Pub Date: 2024-09-18 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae132
Irina Ponamareva, Antonina Andreeva, Maxwell L Bileschi, Lucy Colwell, Alex Bateman
Motivation: In this article, we propose a method for finding similarities between Pfam families based on the pre-trained neural network ProtENN2. We use ProtENN2 per-residue embeddings to produce new high-dimensional per-family embeddings, develop an approach for calculating inter-family similarity scores from these embeddings, and evaluate its predictions using structure comparison.
Results: We apply our method to Pfam annotation by refining clan membership for Pfam families, suggesting both new members of existing clans and potential new clans for future Pfam releases. We investigate some failure modes of our approach, which suggest directions for future improvements. Our method is relatively simple, has few parameters, and could be applied to other protein family classification models. Overall, our work suggests that deep learning can improve our understanding of protein family relationships and of the functions of previously uncharacterized families.
Availability and implementation: https://github.com/iponamareva/ProtCNNSim; Zenodo DOI: 10.5281/zenodo.10091909.
Bioinformatics advances 4(1): vbae132. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11467057/pdf/
Citations: 0
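The abstract describes turning ProtENN2 per-residue embeddings into per-family embeddings and then scoring inter-family similarity. The Python sketch below assumes two simple choices: mean pooling over all residues of a family's members, and cosine similarity as the score. The paper's actual pooling and scoring may differ. The 3-dimensional vectors and family names are toy placeholders; real embeddings are much higher-dimensional.

```python
import math


def family_embedding(per_residue_embeddings):
    """Mean-pool per-residue vectors (all member residues pooled together)."""
    dim = len(per_residue_embeddings[0])
    n = len(per_residue_embeddings)
    return [sum(vec[d] for vec in per_residue_embeddings) / n for d in range(dim)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical toy per-residue embeddings for three families.
families = {
    "FamilyA": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "FamilyB": [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]],
    "FamilyC": [[0.0, 0.0, 1.0], [0.0, 0.1, 0.9]],
}
embeddings = {name: family_embedding(vecs) for name, vecs in families.items()}

# Score every unordered pair of families, most similar first.
pairs = sorted(
    ((cosine_similarity(embeddings[a], embeddings[b]), a, b)
     for a in embeddings for b in embeddings if a < b),
    reverse=True,
)
print(pairs[0][1:])  # ('FamilyA', 'FamilyB')
```

In a clan-refinement setting like the one the abstract describes, high-scoring pairs that cross existing clan boundaries would be the candidates for manual or structure-based review.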
Batch-effect correction in single-cell RNA sequencing data using JIVE.
IF 2.4
Bioinformatics advances Pub Date: 2024-09-13 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae134
Joseph Hastings, Donghyung Lee, Michael J O'Connell
Motivation: In single-cell RNA sequencing analysis, addressing batch effects (technical artifacts stemming from factors such as varying sequencing technologies, equipment, and capture times) is crucial. These factors can cause unwanted variation and obscure the underlying biological signal of interest. The joint and individual variation explained (JIVE) method can be used to extract shared biological patterns from multi-source sequencing data while adjusting for individual non-biological variation (i.e. batch effects). However, its current implementation was designed for bulk sequencing data, making it computationally infeasible for large-scale single-cell sequencing datasets.
Results: In this study, we enhance JIVE for large-scale single-cell data by boosting its computational efficiency. Additionally, we introduce a novel application of JIVE for batch-effect correction across multiple single-cell sequencing datasets. Our enhanced method aims to decompose single-cell sequencing datasets into a joint structure capturing the true biological variability and individual structures capturing the technical variability within each batch. This joint structure is then suitable for downstream analyses. We benchmarked the results against four popular tools developed for this purpose: Seurat v5, Harmony, LIGER, and ComBat-seq. JIVE performed best at preserving cell-type effects and in scenarios with balanced batch sizes.
Availability and implementation: The JIVE implementation used for this analysis can be found at https://github.com/oconnell-statistics-lab/scJIVE.
Bioinformatics advances 4(1): vbae134. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11461915/pdf/
Citations: 0
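For reference, here is the JIVE model as originally formulated by Lock et al. (2013), which splits each data block into joint, individual, and residual parts. Assume there are k batches and write batch i as a matrix X_i whose columns index the dimension shared by all batches. Treating genes as that shared dimension (columns) and cells as rows is an assumption about orientation; the scJIVE adaptation may arrange the data differently. The model and its constraints are:

```latex
X_i = J_i + A_i + E_i, \qquad i = 1, \dots, k,

J = \begin{bmatrix} J_1 \\ \vdots \\ J_k \end{bmatrix}, \qquad
\operatorname{rank}(J) = r, \qquad
\operatorname{rank}(A_i) = r_i, \qquad
J A_i^{\top} = 0 \quad (i = 1, \dots, k).
```

The three parts map onto the abstract as follows:

- J_i is the joint structure kept for downstream analysis.
- A_i captures batch-specific (technical) variation.
- E_i is residual noise.

The orthogonality constraint keeps the joint and individual parts from sharing row-space directions. The abstract does not state how the ranks r and r_i are chosen.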
Evaluating GPT and BERT models for protein-protein interaction identification in biomedical text.
IF 2.4
Bioinformatics advances Pub Date: 2024-09-11 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae133
Hasin Rehana, Nur Bengisu Çam, Mert Basmaci, Jie Zheng, Christianah Jemiyo, Yongqun He, Arzucan Özgür, Junguk Hur
Motivation: Detecting protein-protein interactions (PPIs) is crucial for understanding genetic mechanisms, disease pathogenesis, and drug design. As the biomedical literature continues to grow rapidly, there is an increasing need for automated and accurate extraction of these interactions to facilitate scientific discovery. Pretrained language models, such as generative pretrained transformers and bidirectional encoder representations from transformers, have shown promising results in natural language processing tasks.
Results: We evaluated PPI identification performance using multiple transformer-based models across three manually curated gold-standard corpora: Learning Language in Logic (164 interactions in 77 sentences), Human Protein Reference Database (163 interactions in 145 sentences), and Interaction Extraction Performance Assessment (335 interactions in 486 sentences). Models based on bidirectional encoder representations achieved the best overall performance, with BioBERT achieving the highest recall (91.95%) and F1 score (86.84%) on the Learning Language in Logic dataset. Despite not being explicitly trained on biomedical texts, GPT-4 performed comparably to the bidirectional encoder models. Specifically, on the same dataset GPT-4 achieved the highest precision (88.37%), with a recall of 85.14% and an F1 score of 86.49%. These results suggest that GPT-4 can effectively detect protein interactions in text, offering valuable applications in mining the biomedical literature.
Availability and implementation: The source code and datasets used in this study are available at https://github.com/hurlab/PPI-GPT-BERT.
Bioinformatics advances 4(1): vbae133. Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11419952/pdf/
Citations: 0
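The PPI results are reported as precision, recall, and F1. These are defined as follows:

- precision = TP / (TP + FP)
- recall = TP / (TP + FN)
- F1 is the harmonic mean of precision and recall.

The Python sketch below computes all three from raw true-positive, false-positive, and false-negative counts. The counts in the example are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts for one evaluation run.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2%} recall={r:.2%} F1={f1:.2%}")
```

When scores are averaged over several runs or folds, the averaged F1 need not equal the harmonic mean of the averaged precision and recall.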