Wenzhe Xu, Xiaorong Liu, Jie Wang, Fan Zhang, Dongfeng Hu, Dongfeng Hu
{"title":"UAMRL: Multi-Granularity Uncertainty-Aware Multimodal Representation Learning for Drug-Target Affinity Prediction.","authors":"Wenzhe Xu, Xiaorong Liu, Jie Wang, Fan Zhang, Dongfeng Hu, Dongfeng Hu","doi":"10.1093/bioinformatics/btaf512","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf512","url":null,"abstract":"<p><strong>Motivation: </strong>Computational prediction of drug-target affinity (DTA) plays a critical role in modern drug discovery. However, the limited interpretability of traditional deep learning models and the heterogeneity of multimodal data from compounds and proteins hinder their reliability in practical drug development applications.</p><p><strong>Results: </strong>We propose a novel Uncertainty-aware Multimodal Representation Learning (UAMRL) framework to address these challenges. UAMRL employs a dual-stream encoder to learn cross-modal association mappings between drugs and targets in a latent space and integrates heterogeneous information from different modalities. Moreover, an uncertainty quantification mechanism based on the Normal-Inverse-Gamma distribution is introduced to model the reliability of heterogeneous information and suppress less trustworthy contributions during fusion. 
Experiments show that UAMRL achieves superior predictive accuracy on multiple public DTA datasets, improving both prediction performance and decision transparency.</p><p><strong>Availability: </strong>The source code is available at https://github.com/Astraea2xu/UAMRL.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145194148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alexandra M Kasianova, Anna V Klepikova, Oleg A Gusev, Guzel R Gazizova, Maria D Logacheva, Aleksey A Penin
{"title":"Full-length isoform constructor (FLIC) - a tool for isoform discovery based on long reads.","authors":"Alexandra M Kasianova, Anna V Klepikova, Oleg A Gusev, Guzel R Gazizova, Maria D Logacheva, Aleksey A Penin","doi":"10.1093/bioinformatics/btaf551","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf551","url":null,"abstract":"<p><strong>Motivation: </strong>Advances in high-throughput sequencing have illuminated the complexity of the transcriptome landscape in eukaryotes. An inherent part of this complexity is the presence of multiple isoforms generated by alternative splicing and the use of alternative transcription start and polyadenylation sites. However, currently available tools have limited capacity to infer full-length isoforms.</p><p><strong>Results: </strong>We developed a new pipeline, FLIC (Full-Length Isoform Constructor). FLIC is based on long-read transcriptome data and integrates several key features: 1) utilizing biological replicate concordance to filter out noise and artifacts; 2) employing peak calling to precisely identify transcription start and polyadenylation sites; 3) enabling robust isoform reconstruction with minimal reliance on existing annotations. We evaluated FLIC using a dedicated set of real and simulated Arabidopsis thaliana cDNA sequencing data. Results demonstrate that FLIC accurately reconstructs known and novel isoforms, outperforming existing tools, especially in the absence of reference annotations. A direct comparison with CAGE, currently regarded as the gold standard for transcription start site identification, shows that FLIC is equally accurate, while being much less time-consuming. 
Thus, FLIC provides a valuable tool for comprehensive transcript characterization, particularly for non-model organisms or when dealing with incomplete or inaccurate annotations.</p><p><strong>Availability: </strong>FLIC is available at https://github.com/albidgy/FLIC.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145202389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kevin Dradjat, Massinissa Hamidi, Pierre Bartet, Blaise Hanczar
{"title":"Self-supervised Representation Learning on Gene Expression Data.","authors":"Kevin Dradjat, Massinissa Hamidi, Pierre Bartet, Blaise Hanczar","doi":"10.1093/bioinformatics/btaf533","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf533","url":null,"abstract":"<p><strong>Motivation: </strong>Predicting phenotypes from gene expression data is a crucial task in biomedical research, enabling insights into disease mechanisms, drug responses, and personalized medicine. Traditional machine learning and deep learning rely on supervised learning, which requires large quantities of labeled data that are costly and time-consuming to obtain in the case of gene expression data. Self-supervised learning has recently emerged as a promising approach to overcome these limitations by extracting information directly from the structure of unlabeled data.</p><p><strong>Results: </strong>In this study, we investigate the application of state-of-the-art self-supervised learning methods to bulk gene expression data for phenotype prediction. We selected three self-supervised methods, based on different approaches, to assess their ability to exploit the inherent structure of the data and to generate qualitative representations which can be used for downstream predictive tasks. By using several publicly available gene expression datasets, we demonstrate how the selected methods can effectively capture complex information and improve phenotype prediction accuracy. The results obtained show that self-supervised learning methods can outperform traditional supervised models while also offering a significant advantage by reducing the dependency on annotated data. We provide a comprehensive analysis of the performance of each method by highlighting their strengths and limitations. We also provide recommendations for using these methods depending on the case under study. 
Finally, we outline future research directions to enhance the application of self-supervised learning in the field of gene expression data analysis. This study is the first work that deals with bulk RNA-Seq data and self-supervised learning.</p><p><strong>Availability: </strong>The code and results are available at https://github.com/kdradjat/ssrl-rnaseq.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145202439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cancer Survival Prediction based on Soft-Label Guided Contrastive Learning and Global Feature Fusion.","authors":"Huiying Jiang, Wenlan Chen, Fei Guo, Cheng Liang","doi":"10.1093/bioinformatics/btaf552","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf552","url":null,"abstract":"<p><strong>Motivation: </strong>The high complexity and heterogeneity of cancer pose significant challenges to personalized treatment, making the improvement of cancer survival prediction accuracy crucial for clinical decision-making. The integration of multi-omics data enables a more comprehensive capture of multi-layered information in complex biological processes. However, existing survival analysis models still face limitations in accurately extracting and effectively integrating the unique and shared information from multi-omics data.</p><p><strong>Results: </strong>In this paper, we propose a novel prediction model for cancer survival based on soft-label guided contrastive learning and global feature fusion, namely SLCGF. Our model first extracts paired feature representations for each omics using Siamese encoders. We then perform intra-view and inter-view contrastive learning simultaneously, employing a neighborhood-based paradigm to enhance feature discrimination and alignment across omics. To ensure reliable neighbor retention and improve model robustness, we treat the affinities between samples and their high-order neighbors as soft labels to guide the contrastive learning process at both levels. In addition, we adopt a global self-attention mechanism to obtain the unified representation for cancer survival prediction, where the cross-omics connections are fully exploited and complementary information is adaptively integrated. 
We comprehensively evaluate the performance of our model on 13 cancer multi-omics datasets, and the experimental results demonstrate its superiority over existing approaches.</p><p><strong>Availability and implementation: </strong>Source code is available at https://github.com/LiangSDNULab/SLCGF.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145202377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Michael Sierk, Daniel Danis, Sujay Patil, Nobal Kishor, Rajdeep Mondal, Abhishek Jha, Qingrong Chen, Chunhua Yan, Monica Munoz-Torres, Daoud Meerzaman, Peter N Robinson, Justin T Reese
{"title":"Oncopacket: Integration of Cancer Research Data using GA4GH Phenopackets.","authors":"Michael Sierk, Daniel Danis, Sujay Patil, Nobal Kishor, Rajdeep Mondal, Abhishek Jha, Qingrong Chen, Chunhua Yan, Monica Munoz-Torres, Daoud Meerzaman, Peter N Robinson, Justin T Reese","doi":"10.1093/bioinformatics/btaf546","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf546","url":null,"abstract":"<p><strong>Summary: </strong>Lack of data integration remains a significant impediment to cancer research, and many analyses still require customized software to transform and prepare cancer data. We describe a software package to harmonize genetic and clinical cancer data into the GA4GH Phenopacket Schema, an ISO standard for representing clinical case data. We integrated demographic, mutation, morphology, diagnosis, intervention, and survival data using case data from the National Cancer Institute (NCI) for 12 cancer types. The Phenopacket standard provides a foundation for downstream use, including sophisticated statistical and AI/ML analyses. We demonstrate fitness for purpose by using the integrated data to recapitulate a known association between mutations in the gene encoding isocitrate dehydrogenase 1 (IDH1) and survival time in brain cancer patients.</p><p><strong>Availability and implementation: </strong>Source code is freely available at: https://github.com/monarch-initiative/oncopacket (archived at 10.5281/zenodo.15353125).</p><p><strong>Supplementary information: </strong>Phenopackets for 23650 individuals from 12 cancer types, 7816 of which have mutational data (average 80 variants affecting 62 unique genes per patient), are available as a Zenodo dataset: https://doi.org/10.5281/zenodo.14610228. 
An example of plots summarizing a cohort of phenopackets is available in the online supplement.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145187969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assembly and reasoning over semantic mappings at scale for biomedical data integration.","authors":"Charles Tapley Hoyt, Klas Karis, Benjamin M Gyori","doi":"10.1093/bioinformatics/btaf542","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf542","url":null,"abstract":"<p><strong>Motivation: </strong>Hundreds of resources assign identifiers to biomedical concepts including genes, small molecules, biological processes, diseases, and cell types. Often, these resources overlap by assigning identifiers to the same or related concepts. This creates a data interoperability bottleneck, as integrating data sets and knowledge bases that use identifiers for the same concepts from different resources requires such identifiers to be mapped to each other. However, available mappings are incomplete and fragmented across individual resources, motivating their large-scale integration.</p><p><strong>Results: </strong>We developed SeMRA, a software tool that integrates mappings from multiple sources into a graph data structure. Using graph algorithms, it infers missing mappings implied by available ones while keeping track of provenance and confidence. This allows connecting identifier spaces between which direct mapping was previously not possible. SeMRA implements a customizable workflow that takes a declarative specification as input describing sources to integrate with additional configuration parameters. We used SeMRA to produce the SeMRA Raw Mappings Database, an aggregation of 43.4 million mappings from 127 sources that jointly cover identifiers from 445 ontologies and databases. We also describe benchmarks on specific use cases such as integrating mappings between resources cataloging diseases and cell types.</p><p><strong>Availability: </strong>The code is available under the MIT license at https://github.com/biopragmatics/semra. 
The SeMRA Raw Mappings Database assembled by SeMRA is available at https://doi.org/10.5281/zenodo.11082038.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145194088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Donn Liew, Akesha Dinuli Dharmatilleke, Edwin See, Ee Hou Yong
{"title":"G4STAB: A multi-input deep learning model to predict G-quadruplex thermodynamic stability based on sequence and salt concentration.","authors":"Donn Liew, Akesha Dinuli Dharmatilleke, Edwin See, Ee Hou Yong","doi":"10.1093/bioinformatics/btaf545","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf545","url":null,"abstract":"<p><strong>Motivation: </strong>G-quadruplexes (G4s) are non-canonical nucleic acid structures formed in guanine-rich regions that modulate gene regulation and genomic stability. The thermodynamic stability of G4s directly influences their biological functions and potential as therapeutic targets. However, current quantitative frameworks for predicting G4 stability rely on predetermined structural features, limiting their effectiveness for diverse G4 topologies, and fail to account for environmental factors such as ion concentration and pH that significantly modulate G4 stability in cellular contexts.</p><p><strong>Results: </strong>We present G4STAB, a multi-input deep learning neural network that accurately predicts DNA G4 melting temperatures based on sequence features, salt concentration, and pH. Trained on 2,382 diverse DNA G4 sequences, our model achieves high accuracy (R² = 0.8) without relying on predetermined G4 structural features. G4STAB successfully captures established G4 stability determinants and proposes previously unobserved sequence-stability relationships. Analysis of 391,502 experimentally validated G4s reveals that cancer-like ionic environments alter G4 stability profiles, with a 13.5-fold increase in the number of structures exhibiting physiological melting temperatures (36-42°C). 
These findings suggest systematic genomic patterns in G4 stability responses across chromosomes and gene types.</p><p><strong>Availability and implementation: </strong>G4STAB is available at https://github.com/donn-liew/G4STAB; G4STAB web database interface is available at https://donn-liew.github.io/g4stab-web-database/.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145180775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adrian Chan, Isabel S Naarmann-de Vries, Christoph Dieterich
{"title":"Ψ-co-mAFiA: Concurrent detection of pseudouridine and m6A in single RNA molecules.","authors":"Adrian Chan, Isabel S Naarmann-de Vries, Christoph Dieterich","doi":"10.1093/bioinformatics/btaf536","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf536","url":null,"abstract":"<p>The development of third-generation sequencing technologies enables the detection of RNA modifications at single-molecule resolution. Specifically for direct RNA sequencing (dRNA-Seq) on the ONT platform, we have previously developed an m6A detection algorithm called mAFiA. Here, we present the updated method, now covering all 18 DRACH m6A contexts as well as the identification of pseudouridine sites (Ψ). Our modification level predictions compare favorably with orthogonal methods and respond to knockdown or knockout of writer proteins. The simultaneous detection of multiple modifications on a single RNA molecule opens up the possibility to study cross-modification interactions.</p><p><strong>Availability: </strong>Ψ-co-mAFiA is available at https://github.com/dieterich-lab/psi-co-mAFiA and licensed under GPLv3.0. An archived version of the software is available on Zenodo at https://doi.org/10.5281/zenodo.16797676.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145152172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GRUMB: A Genome-Resolved Metagenomic Framework for Monitoring Urban Microbiomes and Diagnosing Pathogen Risk.","authors":"Suleiman Aminu, AbdulAziz Ascandari, Rachid Benhida, Rachid Daoud","doi":"10.1093/bioinformatics/btaf548","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf548","url":null,"abstract":"<p><strong>Summary: </strong>Urban infrastructure hosts dynamic microbial communities that complicate biosurveillance and antimicrobial resistance (AMR) monitoring. Existing tools rarely combine genome-resolved reconstruction with ecological modeling and batch-aware analytics tailored to infrastructure-scale studies. We present GRUMB (Genome-Resolved Urban Microbiome Biosurveillance), an open-source, SLURM-compatible pipeline that reconstructs high-quality metagenome-assembled genomes (MAGs) from shotgun sequencing reads and integrates taxonomic/functional annotation (CARD, VFDB), batch-aware normalization, ecological diagnostics and machine learning classification of environment types with uncertainty and risk scoring. GRUMB accepts either SRA project accessions or paired-end FASTQ files with metadata, and produces assemblies, MAGs, taxonomic and functional profiles, ecological outputs and risk-informed classification. Its modular design enables reproducible, infrastructure-scale biosurveillance across diverse environments.</p><p><strong>Implementation and availability: </strong>GRUMB is freely available under the MIT License at: https://github.com/SuleimanAminu/genome-resolved-urban-microbiome-biosurveillance; Zenodo DOI: https://doi.org/10.5281/zenodo.15505402. Requirements: Linux (Ubuntu 20.04+), Python 3.11, R 4.2+, SLURM. 
Issues and feature requests are tracked on GitHub.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145152230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shawn T O'Neil, Brian M Schilder, Kevin Schaper, Corey Cox, Daniel Korn, Sarah Gehrke, Christopher J Mungall, Melissa A Haendel
{"title":"monarchr: An R Package for Querying Biomedical Knowledge Graphs.","authors":"Shawn T O'Neil, Brian M Schilder, Kevin Schaper, Corey Cox, Daniel Korn, Sarah Gehrke, Christopher J Mungall, Melissa A Haendel","doi":"10.1093/bioinformatics/btaf549","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf549","url":null,"abstract":"<p><strong>Summary: </strong>Biomedical Knowledge Graphs (KGs) aggregate and provide a wealth of information, linking genes and their variants, diseases, phenotypes, and much more. While these data are available in raw and API-hosted form, to date functionality for working with KGs in the R programming language has been limited. We introduce monarchr, a package for querying and manipulating KG data. Support for the expansive Monarch Initiative KG is built in, and monarchr can accommodate any KG in the Knowledge Graph eXchange (KGX) format. This tidy-inspired interface offers researchers an intuitive, iterative approach to querying and visualizing KG data.</p><p><strong>Availability and implementation: </strong>Source code, documentation, and installation instructions are available at https://github.com/monarch-initiative/monarchr.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145152194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}