{"title":"DeepDR: a deep learning library for drug response prediction.","authors":"Zhengxiang Jiang, Pengyong Li","doi":"10.1093/bioinformatics/btae688","DOIUrl":"10.1093/bioinformatics/btae688","url":null,"abstract":"<p><strong>Summary: </strong>Accurate drug response prediction is critical to advancing precision medicine and drug discovery. Recent advances in deep learning (DL) have shown promise in predicting drug response; however, the lack of convenient tools to support such modeling limits their widespread application. To address this, we introduce DeepDR, the first DL library specifically developed for drug response prediction. DeepDR simplifies the process by automating drug and cell featurization, model construction, training, and inference, all achievable with brief programming. The library incorporates three types of drug features along with nine drug encoders, four types of cell features along with nine cell encoders, and two fusion modules, enabling the implementation of up to 135 DL models for drug response prediction. We also explored benchmarking performance with DeepDR, and the optimal models are available on a user-friendly visual interface.</p><p><strong>Availability and implementation: </strong>DeepDR can be installed from PyPI (https://pypi.org/project/deepdr). The source code and experimental data are available on GitHub (https://github.com/user15632/DeepDR).</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142670096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Timo Rachel, Eva Brombacher, Svenja Wöhrle, Olaf Groß, Clemens Kreutz
{"title":"Dynamic modelling of signalling pathways when ODEs are not feasible.","authors":"Timo Rachel, Eva Brombacher, Svenja Wöhrle, Olaf Groß, Clemens Kreutz","doi":"10.1093/bioinformatics/btae683","DOIUrl":"10.1093/bioinformatics/btae683","url":null,"abstract":"<p><strong>Motivation: </strong>Mathematical modelling plays a crucial role in understanding inter- and intracellular signalling processes. Currently, ordinary differential equations (ODEs) are the predominant approach in systems biology for modelling such pathways. While ODE models offer mechanistic interpretability, they also suffer from limitations, including the need to consider all relevant compounds, resulting in large models difficult to handle numerically and requiring extensive data.</p><p><strong>Results: </strong>In previous work, we introduced the retarded transient function (RTF) as an alternative method for modelling temporal responses of signalling pathways. Here, we extend the RTF approach to integrate concentration or dose-dependencies into the modelling of dynamics. With this advancement, RTF modelling now fully encompasses the application range of ordinary differential equation (ODE) models, which comprises predictions in both time and concentration domains. Moreover, characterizing dose-dependencies provides an intuitive way to investigate and characterize signalling differences between biological conditions or cell-types based on their response to stimulating inputs. To demonstrate the applicability of our extended approach, we employ data from time- and dose-dependent inflammasome activation in bone-marrow derived macrophages (BMDMs) treated with nigericin sodium salt. Our results show the effectiveness of the extended RTF approach as a generic framework for modelling dose-dependent kinetics in cellular signalling. The approach results in intuitively interpretable parameters that describe signal dynamics and enables predictive modelling of time- and dose-dependencies even if only individual cellular components are quantified.</p><p><strong>Availability: </strong>The presented approach is available within the MATLAB-based Data2Dynamics modelling toolbox at https://github.com/Data2Dynamics and https://zenodo.org/records/14008247 and as R code at https://github.com/kreutz-lab/RTF.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142670100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lars Gabriel, Felix Becker, Katharina J Hoff, Mario Stanke
{"title":"Tiberius: End-to-end deep learning with an HMM for gene prediction.","authors":"Lars Gabriel, Felix Becker, Katharina J Hoff, Mario Stanke","doi":"10.1093/bioinformatics/btae685","DOIUrl":"10.1093/bioinformatics/btae685","url":null,"abstract":"<p><strong>Motivation: </strong>For more than 25 years, learning-based eukaryotic gene predictors were driven by hidden Markov models (HMMs), which were directly inputted a DNA sequence. Recently, Holst et al. demonstrated with their program Helixer that the accuracy of ab initio eukaryotic gene prediction can be improved by combining deep learning layers with a separate HMM postprocessor.</p><p><strong>Results: </strong>We present Tiberius, a novel deep learning-based ab initio gene predictor that end-to-end integrates convolutional and long short-term memory layers with a differentiable HMM layer. Tiberius uses a custom gene prediction loss and was trained for prediction in mammalian genomes and evaluated on human and two other genomes. It significantly outperforms existing ab initio methods, achieving F1-scores of 62% at gene level for the human genome, compared to 21% for the next best ab initio method. In de novo mode, Tiberius predicts the exon-intron structure of two out of three human genes without error. Remarkably, even Tiberius's ab initio accuracy matches that of BRAKER3, which uses RNA-seq data and a protein database. Tiberius's highly parallelized model is the fastest state-of-the-art gene prediction method, processing the human genome in under 2 hours.</p><p><strong>Availability and implementation: </strong>https://github.com/Gaius-Augustus/Tiberius.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142670106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Soroush Mozaffari, Paula Nazarena Arrías, Damiano Clementel, Damiano Piovesan, Carlo Ferrari, Silvio C E Tosatto, Alexander Miguel Monzon
{"title":"STRPsearch: fast detection of structured tandem repeat proteins.","authors":"Soroush Mozaffari, Paula Nazarena Arrías, Damiano Clementel, Damiano Piovesan, Carlo Ferrari, Silvio C E Tosatto, Alexander Miguel Monzon","doi":"10.1093/bioinformatics/btae690","DOIUrl":"10.1093/bioinformatics/btae690","url":null,"abstract":"<p><strong>Motivation: </strong>Structured Tandem Repeats Proteins (STRPs) constitute a subclass of tandem repeats characterized by repetitive structural motifs. These proteins exhibit distinct secondary structures that form repetitive tertiary arrangements, often resulting in large molecular assemblies. Despite highly variable sequences, STRPs can perform important and diverse biological functions, maintaining a consistent structure with a variable number of repeat units. With the advent of protein structure prediction methods, millions of 3D-models of proteins are now publicly available. However, automatic detection of STRPs remains challenging with current state-of-the-art tools due to their lack of accuracy and long execution times, hindering their application on large datasets. In most cases, manual curation remains the most accurate method for detecting and classifying STRPs, making it impracticable to annotate millions of structures.</p><p><strong>Results: </strong>We introduce STRPsearch, a novel tool for the rapid identification, classification, and mapping of STRPs. Leveraging manually curated entries from RepeatsDB as the known conformational space of STRPs, STRPsearch employs the latest advances in structural alignment for a fast and accurate detection of repeated structural motifs in proteins, followed by an innovative approach to map units and insertions through the generation of TM-score profiles. STRPsearch is highly scalable, efficiently processing large datasets, and can be applied to both experimental structures and predicted models. Additionally, it demonstrates superior performance compared to existing tools, offering researchers a reliable and comprehensive solution for STRP analysis across diverse proteomes.</p><p><strong>Availability and implementation: </strong>STRPsearch is coded in Python. All scripts and associated documentation are available from: https://github.com/BioComputingUP/STRPsearch.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142670105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Caitlin G Page, Andrew Londsdale, Katrina A Mitchell, Jan Schröder, Kieran F Harvey, Alicia Oshlack
{"title":"Damsel: Analysis and visualisation of DamID sequencing in R.","authors":"Caitlin G Page, Andrew Londsdale, Katrina A Mitchell, Jan Schröder, Kieran F Harvey, Alicia Oshlack","doi":"10.1093/bioinformatics/btae695","DOIUrl":"10.1093/bioinformatics/btae695","url":null,"abstract":"<p><strong>Summary: </strong>DamID sequencing is a technique to map the genome-wide interaction of a protein with DNA. Damsel is the first Bioconductor package to provide an end to end analysis for DamID sequencing data within R. Damsel performs quantification and testing of significant binding sites along with exploratory and visual analysis. Damsel produces results consistent with previous analysis approaches.</p><p><strong>Availability: </strong>The R package Damsel is available for install through the Bioconductor project https://bioconductor.org/packages/release/bioc/html/Damsel.html and the code is available on GitHub https://github.com/Oshlack/Damsel/.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142670094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Samira van den Bogaard, Pedro A Saa, Tobias B Alter
{"title":"Sensitivities in protein allocation models reveal distribution of metabolic capacity and flux control.","authors":"Samira van den Bogaard, Pedro A Saa, Tobias B Alter","doi":"10.1093/bioinformatics/btae691","DOIUrl":"10.1093/bioinformatics/btae691","url":null,"abstract":"<p><strong>Motivation: </strong>Expanding on constraint-based metabolic models, protein allocation models (PAMs) enhance flux predictions by accounting for protein resource allocation in cellular metabolism. Yet, to this date, there are no dedicated methods for analyzing and understanding the growth-limiting factors in simulated phenotypes in PAMs.</p><p><strong>Results: </strong>Here, we introduce a systematic framework for identifying the most sensitive enzyme concentrations (sEnz) in PAMs. The framework exploits the primal and dual formulations of these models to derive sensitivity coefficients based on relations between variables, constraints, and the objective function. This approach enhances our understanding of the growth-limiting factors of metabolic phenotypes under specific environmental or genetic conditions. Compared to other traditional methods for calculating sensitivities, sEnz requires substantially less computation time and facilitates more intuitive comparison and analysis of sensitivities. The sensitivities calculated by sEnz cover enzymes, reactions and protein sectors, enabling a holistic overview of the factors influencing metabolism. When applied to an Escherichia coli PAM, sEnz revealed major pathways and enzymes driving overflow metabolism. Overall, sEnz offers a computational efficient framework for understanding PAM predictions and unravelling the factors governing a particular metabolic phenotype.</p><p><strong>Availability and implementation: </strong>sEnz is implemented in the modular toolbox for the generation and analysis of PAMs in Python (PAModelpy; v.0.0.3.3), available on Pypi (https://pypi.org/project/PAModelpy/). The source code together with all other python scripts and notebooks are available on GitHub (https://github.com/iAMB-RWTH-Aachen/PAModelpy).</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142670104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Antoine Neuraz, Ghislain Vaillant, Camila Arias, Olivier Birot, Kim-Tam Huynh, Thibaut Fabacher, Alice Rogier, Nicolas Garcelon, Ivan Lerner, Bastien Rance, Adrien Coulet
{"title":"Facilitating phenotyping from clinical texts: the medkit library.","authors":"Antoine Neuraz, Ghislain Vaillant, Camila Arias, Olivier Birot, Kim-Tam Huynh, Thibaut Fabacher, Alice Rogier, Nicolas Garcelon, Ivan Lerner, Bastien Rance, Adrien Coulet","doi":"10.1093/bioinformatics/btae681","DOIUrl":"https://doi.org/10.1093/bioinformatics/btae681","url":null,"abstract":"<p><strong>Summary: </strong>Phenotyping consists in applying algorithms to identify individuals associated with a specific, potentially complex, trait or condition, typically out of a collection of Electronic Health Records (EHRs). Because a lot of the clinical information of EHRs are lying in texts, phenotyping from text takes an important role in studies that rely on the secondary use of EHRs. However, the heterogeneity and highly specialized aspect of both the content and form of clinical texts makes this task particularly tedious, and is the source of time and cost constraints in observational studies.</p><p><strong>Results: </strong>To facilitate the development, evaluation and reproducibility of phenotyping pipelines, we developed an open-source Python library named medkit. It enables composing data processing pipelines made of easy-to-reuse software bricks, named medkit operations. In addition to the core of the library, we share the operations and pipelines we already developed and invite the phenotyping community for their reuse and enrichment.</p><p><strong>Availability and implementation: </strong>medkit is available at https://github.com/medkit-lib/medkit.</p><p><strong>Supplementary information: </strong>Documentation, examples and tutorials are available at https://medkit-lib.org/.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142640428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LmRaC: a functionally extensible tool for LLM interrogation of user experimental results.","authors":"Douglas B Craig, Sorin Drăghici","doi":"10.1093/bioinformatics/btae679","DOIUrl":"https://doi.org/10.1093/bioinformatics/btae679","url":null,"abstract":"<p><strong>Motivation: </strong>Large Language Models (LLMs) have provided spectacular results across a wide variety of domains. However, persistent concerns about hallucination and fabrication of authoritative sources raise serious issues for their integral use in scientific research. Retrieval-augmented generation (RAG) is a technique for making data and documents, otherwise unavailable during training, available to the LLM for reasoning tasks. In addition to making dynamic and quantitative data available to the LLM, RAG provides the means by which to carefully control and trace source material, thereby ensuring results are accurate, complete and authoritative.</p><p><strong>Results: </strong>Here we introduce LmRaC, an LLM-based tool capable of answering complex scientific questions in the context of a user's own experimental results. LmRaC allows users to dynamically build domain specific knowledge-bases from PubMed sources (RAGdom). Answers are drawn solely from this RAG with citations to the paragraph level, virtually eliminating any chance of hallucination or fabrication. These answers can then be used to construct an experimental context (RAGexp) that, along with user supplied documents (e.g., design, protocols) and quantitative results, can be used to answer questions about the user's specific experiment. Questions about quantitative experimental data are integral to LmRaC and are supported by a user-defined and functionally extensible REST API server (RAGfun).</p><p><strong>Availability and implementation: </strong>Detailed documentation for LmRaC along with a sample REST API server for defining user functions can be found at https://github.com/dbcraig/LmRaC. The LmRaC web application image can be pulled from Docker Hub (https://hub.docker.com) as dbcraig/lmrac.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142640460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thomas C Smits, Sehi L'Yi, Andrew P Mar, Nils Gehlenborg
{"title":"AltGosling: Automatic Generation of Text Descriptions for Accessible Genomics Data Visualization.","authors":"Thomas C Smits, Sehi L'Yi, Andrew P Mar, Nils Gehlenborg","doi":"10.1093/bioinformatics/btae670","DOIUrl":"https://doi.org/10.1093/bioinformatics/btae670","url":null,"abstract":"<p><strong>Motivation: </strong>Biomedical visualizations are key to accessing biomedical knowledge and detecting new patterns in large datasets. Interactive visualizations are essential for biomedical data scientists and are omnipresent in data analysis software and data portals. Without appropriate descriptions, these visualizations are not accessible to all people with blindness and low vision, who often rely on screen reader accessibility technologies to access visual information on digital devices. Screen readers require descriptions to convey image content. However, many images lack informative descriptions due to unawareness and difficulty writing such descriptions. Describing complex and interactive visualizations, like genomics data visualizations, is even more challenging. Automatic generation of descriptions could be beneficial, yet current alt text generating models are limited to basic visualizations and cannot be used for genomics.</p><p><strong>Results: </strong>We present AltGosling, an automated description generation tool focused on interactive data visualizations of genome-mapped data, created with the grammar-based genomics toolkit Gosling. The logic-based algorithm of AltGosling creates various descriptions including a tree-structured navigable panel. We co-designed AltGosling with a blind screen reader user (co-author). We show that AltGosling outperforms state-of-the-art large language models and common image-based neural networks for alt text generation of genomics data visualizations. As a first of its kind in genomic research, we lay the groundwork to increase accessibility in the field.</p><p><strong>Availability and implementation: </strong>The source code, examples, and interactive demo are accessible under the MIT License at https://github.com/gosling-lang/altgosling. The package is available at https://www.npmjs.com/package/altgosling.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142634347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FAPM: Functional Annotation of Proteins using Multi-Modal Models Beyond Structural Modeling.","authors":"Wenkai Xiang, Zhaoping Xiong, Huan Chen, Jiacheng Xiong, Wei Zhang, Zunyun Fu, Mingyue Zheng, Bing Liu, Qian Shi","doi":"10.1093/bioinformatics/btae680","DOIUrl":"https://doi.org/10.1093/bioinformatics/btae680","url":null,"abstract":"<p><strong>Motivation: </strong>Assigning accurate property labels to proteins, like functional terms and catalytic activity, is challenging, especially for proteins without homologs and \"tail labels\" with few known examples. Previous methods mainly focused on protein sequence features, overlooking the semantic meaning of protein labels.</p><p><strong>Results: </strong>We introduce FAPM, a contrastive multi-modal model that links natural language with protein sequence language. This model combines a pretrained protein sequence model with a pretrained large language model to generate labels, such as Gene Ontology (GO) functional terms and catalytic activity predictions, in natural language. Our results show that FAPM excels in understanding protein properties, outperforming models based solely on protein sequences or structures. It achieves state-of-the-art performance on public benchmarks and in-house experimentally annotated phage proteins, which often have few known homologs. Additionally, FAPM's flexibility allows it to incorporate extra text prompts, like taxonomy information, enhancing both its predictive performance and explainability. This novel approach offers a promising alternative to current methods that rely on multiple sequence alignment for protein annotation. The online demo is at: https://huggingface.co/spaces/wenkai/FAPM_demo.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142634393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}