Bioinformatics (Oxford, England): Latest Articles

DeepDR: a deep learning library for drug response prediction.
Bioinformatics (Oxford, England) Pub Date : 2024-11-18 DOI: 10.1093/bioinformatics/btae688
Zhengxiang Jiang, Pengyong Li
Summary: Accurate drug response prediction is critical to advancing precision medicine and drug discovery. Recent advances in deep learning (DL) have shown promise in predicting drug response; however, the lack of convenient tools to support such modeling limits their widespread application. To address this, we introduce DeepDR, the first DL library specifically developed for drug response prediction. DeepDR simplifies the process by automating drug and cell featurization, model construction, training, and inference, all achievable with minimal programming. The library incorporates three types of drug features along with nine drug encoders, four types of cell features along with nine cell encoders, and two fusion modules, enabling the implementation of up to 135 DL models for drug response prediction. We also explored benchmarking performance with DeepDR, and the optimal models are available on a user-friendly visual interface.
Availability and implementation: DeepDR can be installed from PyPI (https://pypi.org/project/deepdr). The source code and experimental data are available on GitHub (https://github.com/user15632/DeepDR).
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Dynamic modelling of signalling pathways when ODEs are not feasible.
Bioinformatics (Oxford, England) Pub Date : 2024-11-18 DOI: 10.1093/bioinformatics/btae683
Timo Rachel, Eva Brombacher, Svenja Wöhrle, Olaf Groß, Clemens Kreutz
Motivation: Mathematical modelling plays a crucial role in understanding inter- and intracellular signalling processes. Currently, ordinary differential equations (ODEs) are the predominant approach in systems biology for modelling such pathways. While ODE models offer mechanistic interpretability, they also suffer from limitations, including the need to consider all relevant compounds, which results in large models that are difficult to handle numerically and require extensive data.
Results: In previous work, we introduced the retarded transient function (RTF) as an alternative method for modelling temporal responses of signalling pathways. Here, we extend the RTF approach to integrate concentration or dose-dependencies into the modelling of dynamics. With this advancement, RTF modelling now fully encompasses the application range of ODE models, which comprises predictions in both time and concentration domains. Moreover, characterizing dose-dependencies provides an intuitive way to investigate and characterize signalling differences between biological conditions or cell types based on their response to stimulating inputs. To demonstrate the applicability of our extended approach, we employ data from time- and dose-dependent inflammasome activation in bone-marrow derived macrophages (BMDMs) treated with nigericin sodium salt. Our results show the effectiveness of the extended RTF approach as a generic framework for modelling dose-dependent kinetics in cellular signalling. The approach results in intuitively interpretable parameters that describe signal dynamics and enables predictive modelling of time- and dose-dependencies even if only individual cellular components are quantified.
Availability: The presented approach is available within the MATLAB-based Data2Dynamics modelling toolbox at https://github.com/Data2Dynamics and https://zenodo.org/records/14008247 and as R code at https://github.com/kreutz-lab/RTF.
Citations: 0
Tiberius: End-to-end deep learning with an HMM for gene prediction.
Bioinformatics (Oxford, England) Pub Date : 2024-11-18 DOI: 10.1093/bioinformatics/btae685
Lars Gabriel, Felix Becker, Katharina J Hoff, Mario Stanke
Motivation: For more than 25 years, learning-based eukaryotic gene predictors were driven by hidden Markov models (HMMs), which take a DNA sequence directly as input. Recently, Holst et al. demonstrated with their program Helixer that the accuracy of ab initio eukaryotic gene prediction can be improved by combining deep learning layers with a separate HMM postprocessor.
Results: We present Tiberius, a novel deep learning-based ab initio gene predictor that integrates convolutional and long short-term memory layers with a differentiable HMM layer end to end. Tiberius uses a custom gene prediction loss and was trained for prediction in mammalian genomes and evaluated on human and two other genomes. It significantly outperforms existing ab initio methods, achieving an F1-score of 62% at gene level for the human genome, compared to 21% for the next best ab initio method. In de novo mode, Tiberius predicts the exon-intron structure of two out of three human genes without error. Remarkably, even Tiberius's ab initio accuracy matches that of BRAKER3, which uses RNA-seq data and a protein database. Tiberius's highly parallelized model is the fastest state-of-the-art gene prediction method, processing the human genome in under 2 hours.
Availability and implementation: https://github.com/Gaius-Augustus/Tiberius.
Citations: 0
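The gene-level F1-score quoted in the Tiberius abstract is the harmonic mean of precision and sensitivity (recall). A minimal sketch of how such a figure can be computed from prediction counts follows; the function name and the counts are hypothetical, chosen only for illustration, and are not taken from the paper or from Tiberius's evaluation code.

```python
def f1_from_counts(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall) from counts."""
    # With no true positives, precision and recall are both zero (or undefined),
    # so report an F1-score of 0.0 instead of dividing by zero.
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)  # fraction of predictions that are correct
    recall = true_pos / (true_pos + false_neg)     # fraction of reference genes recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not results from the paper).
example_f1 = f1_from_counts(true_pos=60, false_pos=40, false_neg=40)
print(f"gene-level F1 = {example_f1:.2f}")
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are high, which is why it is a common single-number summary for gene prediction accuracy.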
STRPsearch: fast detection of structured tandem repeat proteins.
Bioinformatics (Oxford, England) Pub Date : 2024-11-18 DOI: 10.1093/bioinformatics/btae690
Soroush Mozaffari, Paula Nazarena Arrías, Damiano Clementel, Damiano Piovesan, Carlo Ferrari, Silvio C E Tosatto, Alexander Miguel Monzon
Motivation: Structured tandem repeat proteins (STRPs) constitute a subclass of tandem repeats characterized by repetitive structural motifs. These proteins exhibit distinct secondary structures that form repetitive tertiary arrangements, often resulting in large molecular assemblies. Despite highly variable sequences, STRPs can perform important and diverse biological functions, maintaining a consistent structure with a variable number of repeat units. With the advent of protein structure prediction methods, millions of 3D models of proteins are now publicly available. However, automatic detection of STRPs remains challenging with current state-of-the-art tools due to their lack of accuracy and long execution times, hindering their application on large datasets. In most cases, manual curation remains the most accurate method for detecting and classifying STRPs, making it impracticable to annotate millions of structures.
Results: We introduce STRPsearch, a novel tool for the rapid identification, classification, and mapping of STRPs. Leveraging manually curated entries from RepeatsDB as the known conformational space of STRPs, STRPsearch employs the latest advances in structural alignment for a fast and accurate detection of repeated structural motifs in proteins, followed by an innovative approach to map units and insertions through the generation of TM-score profiles. STRPsearch is highly scalable, efficiently processing large datasets, and can be applied to both experimental structures and predicted models. Additionally, it demonstrates superior performance compared to existing tools, offering researchers a reliable and comprehensive solution for STRP analysis across diverse proteomes.
Availability and implementation: STRPsearch is coded in Python. All scripts and associated documentation are available from: https://github.com/BioComputingUP/STRPsearch.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
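The STRPsearch abstract refers to TM-score profiles without giving details. For orientation, the sketch below evaluates the standard TM-score formula for one fixed superposition, given the distances between aligned residue pairs. It is not STRPsearch code: the function names are hypothetical, the 0.5 Å floor on the distance scale is an assumption for short chains, and the full TM-score additionally maximizes this quantity over superpositions, which is omitted here.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in angstroms) of the TM-score."""
    # The cube-root expression is only meaningful for chains longer than 15
    # residues; the 0.5 floor for short chains is an assumption of this sketch.
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length: int) -> float:
    """TM-score of a fixed superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: length of the reference structure used for normalization.
    """
    d0 = tm_d0(target_length)
    # Each aligned pair contributes between 0 and 1; unaligned residues contribute 0.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical input: 80 of 100 reference residues aligned, each pair 2.0 angstroms apart.
print(round(tm_score([2.0] * 80, target_length=100), 3))
```

Normalizing by the reference length rather than the number of aligned pairs means a short but perfect local match still receives a low score, which is what makes the measure usable for scanning a structure against repeat-unit templates.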
Damsel: Analysis and visualisation of DamID sequencing in R.
Bioinformatics (Oxford, England) Pub Date : 2024-11-18 DOI: 10.1093/bioinformatics/btae695
Caitlin G Page, Andrew Londsdale, Katrina A Mitchell, Jan Schröder, Kieran F Harvey, Alicia Oshlack
Summary: DamID sequencing is a technique to map the genome-wide interaction of a protein with DNA. Damsel is the first Bioconductor package to provide an end-to-end analysis for DamID sequencing data within R. Damsel performs quantification and testing of significant binding sites along with exploratory and visual analysis. Damsel produces results consistent with previous analysis approaches.
Availability: The R package Damsel is available for installation through the Bioconductor project (https://bioconductor.org/packages/release/bioc/html/Damsel.html) and the code is available on GitHub (https://github.com/Oshlack/Damsel/).
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Sensitivities in protein allocation models reveal distribution of metabolic capacity and flux control.
Bioinformatics (Oxford, England) Pub Date : 2024-11-18 DOI: 10.1093/bioinformatics/btae691
Samira van den Bogaard, Pedro A Saa, Tobias B Alter
Motivation: Expanding on constraint-based metabolic models, protein allocation models (PAMs) enhance flux predictions by accounting for protein resource allocation in cellular metabolism. Yet, to date, there are no dedicated methods for analyzing and understanding the growth-limiting factors in simulated phenotypes in PAMs.
Results: Here, we introduce a systematic framework for identifying the most sensitive enzyme concentrations (sEnz) in PAMs. The framework exploits the primal and dual formulations of these models to derive sensitivity coefficients based on relations between variables, constraints, and the objective function. This approach enhances our understanding of the growth-limiting factors of metabolic phenotypes under specific environmental or genetic conditions. Compared to other traditional methods for calculating sensitivities, sEnz requires substantially less computation time and facilitates more intuitive comparison and analysis of sensitivities. The sensitivities calculated by sEnz cover enzymes, reactions and protein sectors, enabling a holistic overview of the factors influencing metabolism. When applied to an Escherichia coli PAM, sEnz revealed major pathways and enzymes driving overflow metabolism. Overall, sEnz offers a computationally efficient framework for understanding PAM predictions and unravelling the factors governing a particular metabolic phenotype.
Availability and implementation: sEnz is implemented in the modular toolbox for the generation and analysis of PAMs in Python (PAModelpy; v.0.0.3.3), available on PyPI (https://pypi.org/project/PAModelpy/). The source code, together with all other Python scripts and notebooks, is available on GitHub (https://github.com/iAMB-RWTH-Aachen/PAModelpy).
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Facilitating phenotyping from clinical texts: the medkit library.
Bioinformatics (Oxford, England) Pub Date : 2024-11-15 DOI: 10.1093/bioinformatics/btae681
Antoine Neuraz, Ghislain Vaillant, Camila Arias, Olivier Birot, Kim-Tam Huynh, Thibaut Fabacher, Alice Rogier, Nicolas Garcelon, Ivan Lerner, Bastien Rance, Adrien Coulet
Summary: Phenotyping consists of applying algorithms to identify individuals associated with a specific, potentially complex, trait or condition, typically out of a collection of Electronic Health Records (EHRs). Because much of the clinical information in EHRs lies in text, phenotyping from text plays an important role in studies that rely on the secondary use of EHRs. However, the heterogeneity and highly specialized nature of both the content and form of clinical texts make this task particularly tedious and are a source of time and cost constraints in observational studies.
Results: To facilitate the development, evaluation and reproducibility of phenotyping pipelines, we developed an open-source Python library named medkit. It enables composing data processing pipelines made of easy-to-reuse software bricks, named medkit operations. In addition to the core of the library, we share the operations and pipelines we have already developed and invite the phenotyping community to reuse and enrich them.
Availability and implementation: medkit is available at https://github.com/medkit-lib/medkit.
Supplementary information: Documentation, examples and tutorials are available at https://medkit-lib.org/.
Citations: 0
LmRaC: a functionally extensible tool for LLM interrogation of user experimental results.
Bioinformatics (Oxford, England) Pub Date : 2024-11-15 DOI: 10.1093/bioinformatics/btae679
Douglas B Craig, Sorin Drăghici
Motivation: Large Language Models (LLMs) have provided spectacular results across a wide variety of domains. However, persistent concerns about hallucination and fabrication of authoritative sources raise serious issues for their integral use in scientific research. Retrieval-augmented generation (RAG) is a technique for making data and documents, otherwise unavailable during training, available to the LLM for reasoning tasks. In addition to making dynamic and quantitative data available to the LLM, RAG provides the means by which to carefully control and trace source material, thereby ensuring results are accurate, complete and authoritative.
Results: Here we introduce LmRaC, an LLM-based tool capable of answering complex scientific questions in the context of a user's own experimental results. LmRaC allows users to dynamically build domain-specific knowledge bases from PubMed sources (RAGdom). Answers are drawn solely from this RAG with citations to the paragraph level, virtually eliminating any chance of hallucination or fabrication. These answers can then be used to construct an experimental context (RAGexp) that, along with user-supplied documents (e.g., design, protocols) and quantitative results, can be used to answer questions about the user's specific experiment. Questions about quantitative experimental data are integral to LmRaC and are supported by a user-defined and functionally extensible REST API server (RAGfun).
Availability and implementation: Detailed documentation for LmRaC along with a sample REST API server for defining user functions can be found at https://github.com/dbcraig/LmRaC. The LmRaC web application image can be pulled from Docker Hub (https://hub.docker.com) as dbcraig/lmrac.
Citations: 0
AltGosling: Automatic Generation of Text Descriptions for Accessible Genomics Data Visualization.
Bioinformatics (Oxford, England) Pub Date : 2024-11-14 DOI: 10.1093/bioinformatics/btae670
Thomas C Smits, Sehi L'Yi, Andrew P Mar, Nils Gehlenborg
Motivation: Biomedical visualizations are key to accessing biomedical knowledge and detecting new patterns in large datasets. Interactive visualizations are essential for biomedical data scientists and are omnipresent in data analysis software and data portals. Without appropriate descriptions, these visualizations are not accessible to all people with blindness and low vision, who often rely on screen reader accessibility technologies to access visual information on digital devices. Screen readers require descriptions to convey image content. However, many images lack informative descriptions due to a lack of awareness and the difficulty of writing such descriptions. Describing complex and interactive visualizations, like genomics data visualizations, is even more challenging. Automatic generation of descriptions could be beneficial, yet current alt text generating models are limited to basic visualizations and cannot be used for genomics.
Results: We present AltGosling, an automated description generation tool focused on interactive data visualizations of genome-mapped data, created with the grammar-based genomics toolkit Gosling. The logic-based algorithm of AltGosling creates various descriptions including a tree-structured navigable panel. We co-designed AltGosling with a blind screen reader user (co-author). We show that AltGosling outperforms state-of-the-art large language models and common image-based neural networks for alt text generation of genomics data visualizations. As a first of its kind in genomic research, we lay the groundwork to increase accessibility in the field.
Availability and implementation: The source code, examples, and interactive demo are accessible under the MIT License at https://github.com/gosling-lang/altgosling. The package is available at https://www.npmjs.com/package/altgosling.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
FAPM: Functional Annotation of Proteins using Multi-Modal Models Beyond Structural Modeling.
Bioinformatics (Oxford, England) Pub Date : 2024-11-14 DOI: 10.1093/bioinformatics/btae680
Wenkai Xiang, Zhaoping Xiong, Huan Chen, Jiacheng Xiong, Wei Zhang, Zunyun Fu, Mingyue Zheng, Bing Liu, Qian Shi
Motivation: Assigning accurate property labels to proteins, like functional terms and catalytic activity, is challenging, especially for proteins without homologs and "tail labels" with few known examples. Previous methods mainly focused on protein sequence features, overlooking the semantic meaning of protein labels.
Results: We introduce FAPM, a contrastive multi-modal model that links natural language with protein sequence language. This model combines a pretrained protein sequence model with a pretrained large language model to generate labels, such as Gene Ontology (GO) functional terms and catalytic activity predictions, in natural language. Our results show that FAPM excels in understanding protein properties, outperforming models based solely on protein sequences or structures. It achieves state-of-the-art performance on public benchmarks and in-house experimentally annotated phage proteins, which often have few known homologs. Additionally, FAPM's flexibility allows it to incorporate extra text prompts, like taxonomy information, enhancing both its predictive performance and explainability. This novel approach offers a promising alternative to current methods that rely on multiple sequence alignment for protein annotation. The online demo is at: https://huggingface.co/spaces/wenkai/FAPM_demo.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0