Bioinformatics advances | Pub Date: 2026-04-26 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag119
Snehal Shah, Liangjiang Wang
{"title":"Accurate prediction of candidate lncRNAs associated with DNA damage response based on gene expression patterns from graph neural networks.","authors":"Snehal Shah, Liangjiang Wang","doi":"10.1093/bioadv/vbag119","DOIUrl":"https://doi.org/10.1093/bioadv/vbag119","url":null,"abstract":"<p><strong>Motivation: </strong>DNA damage response (DDR) is essential for maintaining genome stability and preventing tumorigenesis. While protein-coding DDR genes have been extensively investigated, long non-coding RNAs (lncRNAs) remain relatively understudied despite the growing evidence of their involvement in DDR. Particularly, it is rather challenging to systematically identify DDR-associated lncRNAs through experimental approaches, which are often time-consuming, labor-intensive, and expensive. Moreover, lncRNAs lack translational open reading frames often targeted by experimental methods.</p><p><strong>Results: </strong>In this study, we have developed a new machine learning approach, GlncDDR, which utilizes graph-based node embedding of gene expression features and supervised learning algorithms to predict candidate lncRNAs associated with DDR. GlncDDR models achieved robust predictive performance with ROC-AUC reaching ∼0.93 on test data. We used the models to predict 1232 candidate lncRNAs, including several known DDR regulators such as <i>JADRR, PINCR, TP53TG1, HOTAIR, MALAT1, ENRICD, and DINOL</i>. Interestingly, 212 of the candidates were found to be located near known DDR genes in the genome, supporting the potential functions of these lncRNAs in DDR. 
The results demonstrate the effectiveness of predicting DDR-associated lncRNAs based on cancer transcriptomic data and provide valuable targets for exploring the non-coding regulatory landscape of genome stability and cancer drug discovery.</p><p><strong>Availability: </strong>The source code and datasets used in the study are available at https://github.com/BioDataLearning/GlncDDR.git.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag119"},"PeriodicalIF":2.8,"publicationDate":"2026-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13143433/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147846810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2026-04-26 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag016
Paolo Bresolin, Fabio Vandin
{"title":"General-purpose topology-aware embedding of tumor phylogenetic trees with graph neural networks.","authors":"Paolo Bresolin, Fabio Vandin","doi":"10.1093/bioadv/vbag016","DOIUrl":"https://doi.org/10.1093/bioadv/vbag016","url":null,"abstract":"<p><strong>Motivation: </strong>Phylogenetic trees are tree-like data structures commonly adopted to mathematically represent cancer clonal evolution. The information encoded by phylogenetic trees is important for clinical outcomes, but the automatic extraction of such information remains hard, partly because working directly with tree-like data structures is complex. This is especially true for machine learning tasks, where models are usually designed for vector data.</p><p><strong>Results: </strong>We introduce CPhyT-GNN, a novel deep learning method to compute unsupervised embeddings of phylogenetic trees. The embeddings learnt by CPhyT-GNN are vectors that can be used for a variety of machine learning tasks. CPhyT-GNN is based on Graph Neural Networks, which yield representations that combine the information provided by the alterations present in the tumor and the topological information provided by the corresponding phylogenetic tree. 
Experiments with cancer data show that the embeddings learnt by our model are general-purpose and can be applied to different tasks, with results that improve the state-of-the-art.</p><p><strong>Availability and implementation: </strong>Data and code are available at the following link: https://github.com/VandinLab/CPhyT-GNN.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag016"},"PeriodicalIF":2.8,"publicationDate":"2026-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13125753/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147824203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2026-04-25 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag070
Marta Lloret-Llinares, Daniel Thomas-Lopez, José Carbonell-Caballero, Laurence Calzone, Javier Conejero, Jesse P Harrison, Miroslav Kratochvil, Arnau Montagud, Vincent Noël, Henrik Nortamo, Miguel Ponce-de-León, Pablo Rodríguez-Mier, Marco Ruscone, Dénes Türei, Miguel Vazquez, Alessandra Villa, Nadja Zlender, Brane Leskosek, Mariano Vazquez, Alfonso Valencia, Vera Matser, Cath Brooksbank
{"title":"Leveraging training expertise to build capacity in computational personalised medicine.","authors":"Marta Lloret-Llinares, Daniel Thomas-Lopez, José Carbonell-Caballero, Laurence Calzone, Javier Conejero, Jesse P Harrison, Miroslav Kratochvil, Arnau Montagud, Vincent Noël, Henrik Nortamo, Miguel Ponce-de-León, Pablo Rodríguez-Mier, Marco Ruscone, Dénes Türei, Miguel Vazquez, Alessandra Villa, Nadja Zlender, Brane Leskosek, Mariano Vazquez, Alfonso Valencia, Vera Matser, Cath Brooksbank","doi":"10.1093/bioadv/vbag070","DOIUrl":"https://doi.org/10.1093/bioadv/vbag070","url":null,"abstract":"<p><strong>Summary: </strong>The rapid development of genomic technologies in recent years is enabling personalised medicine to become an essential part of healthcare. Advanced computational methods are required to extract relevant insights that can be applied in clinical settings. This presents a challenge for clinicians and biomedical researchers, who need specialised training to adopt these tools. Within the context of PerMedCoE, the first European Centre of Excellence in Personalised Medicine, we developed and delivered a competency-based training programme to support professionals in the life sciences to work with modelling and simulation tools that integrate omics data to identify biological processes relevant to disease. We identified a set of required competencies in the field and built a series of career profiles with specific competence levels in each. The competencies and profiles helped define the focus and target audience of the training activities delivered: a combination of self-paced learning resources, webinars and online and face-to-face synchronous courses. The outputs of the programme (competencies, career profiles and training materials) can be used by biomedical professionals for their own career development or to train others. 
In addition, the approach can be adopted by other fields with rapid technological advancements and a constant need to upskill professionals.</p><p><strong>Availability and implementation: </strong>The competency framework is reproduced in full in this paper as supplementary material and available on the Competency Hub at https://competency.ebi.ac.uk/framework/permedcoe/2.1.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag070"},"PeriodicalIF":2.8,"publicationDate":"2026-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13110006/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147790464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2026-04-24 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag102
Valerio Arnaboldi, Lynn M Schriml, Matt Jeffryes, Pengyuan Li, Susan Bello, Paul Sternberg, Carol Bult, Charles E Cook
{"title":"AI in biocuration: challenges, opportunities, and a roadmap for sustainable integration.","authors":"Valerio Arnaboldi, Lynn M Schriml, Matt Jeffryes, Pengyuan Li, Susan Bello, Paul Sternberg, Carol Bult, Charles E Cook","doi":"10.1093/bioadv/vbag102","DOIUrl":"https://doi.org/10.1093/bioadv/vbag102","url":null,"abstract":"<p><strong>Motivation: </strong>Biocuration is the integration of biological information databases for the enhancement of research. Curation of these databases is challenged by the exponential growth of scientific data and literature. Integration of machine learning and artificial intelligence methods into biocuration workflows may help address this challenge. We report on the discussions, ideas, and recommendations gathered from a workshop \"AI and biodata resources: implications for sustainability and best practices in biocuration\" at the 18th Annual International Biocuration Conference 2025.</p><p><strong>Results: </strong>Participants agreed that while AI offers transformative potential for efficiency and expanded curatorial capacity, its integration faces substantial hurdles. Key challenges revolve around data and model quality. Reproducibility issues and a lack of open, domain-specific training datasets further compound these problems. Broader concerns include inconsistent data standards, underdeveloped ontologies, unstructured data, legacy systems, and underfunded teams. Despite these issues, several successful AI applications were identified, including tools for literature summarization and workflow assistance. Participants emphasized the need for a refined model of human-AI collaboration requiring clear data provenance and transparency, new skills, and avoiding over-reliance on AI-generated data. 
The workshop ultimately recommended concerted efforts in infrastructure development, standardization, training, and quality assurance to guide the community toward effective human-AI collaboration that maintains scientific rigor.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag102"},"PeriodicalIF":2.8,"publicationDate":"2026-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13135627/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147846751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2026-04-17 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag034
Alireza Dehghan, Karim Abbasi, Mohammad Rasoul Kazemi Najaf Abadi
{"title":"Multi-DDA: drug-disease association prediction using a hybrid graph convolutional network with multi-modal drug representations.","authors":"Alireza Dehghan, Karim Abbasi, Mohammad Rasoul Kazemi Najaf Abadi","doi":"10.1093/bioadv/vbag034","DOIUrl":"https://doi.org/10.1093/bioadv/vbag034","url":null,"abstract":"<p><strong>Motivation: </strong>Predicting drug-disease associations (DDAs) is essential for efficient drug repurposing. Although graph convolutional networks (GCNs) on heterogeneous drug-disease graphs are state-of-the-art, they often underutilize the rich, multi-modal data available for drugs, such as targets, enzymes, pathways, and chemical substructures.</p><p><strong>Results: </strong>To address this, we introduce Multi-DDA, a novel framework that systematically integrates these multi-modal drug features into a dedicated learning branch. These enriched drug descriptors are hierarchically combined with the outputs of each graph convolution layer, allowing subsequent layers to selectively refine the most informative node representations. This multi-modal fusion creates more comprehensive drug and disease embeddings. The representations are then processed by a graph attention layer to weigh the importance of different node connections before a final Multi-Layer Perceptron predicts the association matrix. Evaluated on a benchmark dataset of 269 drugs and 598 diseases, Multi-DDA outperforms seven existing methods across key metrics: Area Under the Precision-Recall Curve (AUPR), Area Under the Receiver Operating Characteristic Curve (AUC), and Recall. 
The significant gains in AUPR and Recall demonstrate its enhanced capability to identify potential DDAs, offering a powerful tool for advancing personalized medicine and drug discovery.</p><p><strong>Availability and implementation: </strong>The source code for Multi-DDA is freely available at https://github.com/dehghan1401/Multi-DDA.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag034"},"PeriodicalIF":2.8,"publicationDate":"2026-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13130202/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147824196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2026-04-17 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag110
Zhiyong Liu, Yuhao Zhou, Wenqing Yang, Yashu Zhang, Yan He, Donghan Li, Songming Zheng, Shiqi Chen, Lijun Fan
{"title":"ULSL: Unified Latent and Similarity Learning for robust multi-omics cancer subtype identification.","authors":"Zhiyong Liu, Yuhao Zhou, Wenqing Yang, Yashu Zhang, Yan He, Donghan Li, Songming Zheng, Shiqi Chen, Lijun Fan","doi":"10.1093/bioadv/vbag110","DOIUrl":"https://doi.org/10.1093/bioadv/vbag110","url":null,"abstract":"<p><strong>Motivation: </strong>Cancer's high heterogeneity necessitates precise molecular classification for improved clinical outcomes. However, current multi-omics clustering often struggles with molecular complexity. We propose Unified Latent and Similarity Learning (ULSL), a novel framework that simultaneously learns latent embeddings and similarity matrices through unified optimization. ULSL employs graph fusion for cross-omics structural consistency and latent representation learning to project data into low-dimensional spaces, effectively mitigating noise and high dimensionality.</p><p><strong>Results: </strong>ULSL was evaluated on synthetic datasets and 10 public cancer datasets from The Cancer Genome Atlas (TCGA). It consistently outperformed seven state-of-the-art methods in accuracy and robustness for subtype identification. On simulated datasets, ULSL maintained superior performance even with weak signal features and high noise levels. On TCGA datasets, ULSL not only identified survival-associated subtypes in a larger number of cancer types but also detected a greater number of clinically enriched features compared to competing approaches. 
Furthermore, the specific case study on AML demonstrated that ULSL aligns with the biological basis of the traditional FAB classification while offering distinct advantages in prognostic stratification.</p><p><strong>Availability and implementation: </strong>The source code for ULSL is available at https://github.com/codelzy-01/ULSL-1.git.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag110"},"PeriodicalIF":2.8,"publicationDate":"2026-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13138254/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147846479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2026-04-16 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag112
Sumit Tarafder, Debswapna Bhattacharya
{"title":"PARSEbp: pairwise agreement-based RNA scoring with emphasis on base pairings.","authors":"Sumit Tarafder, Debswapna Bhattacharya","doi":"10.1093/bioadv/vbag112","DOIUrl":"10.1093/bioadv/vbag112","url":null,"abstract":"<p><strong>Motivation: </strong>High-fidelity scoring of RNA three-dimensional structures remains a major challenge in RNA structure prediction and conformational sampling. While single-model methods for scoring RNA structures can capture individual structural features, they fail to capture the broader structural consensus within a conformational ensemble, limiting their effectiveness in ranking and model selection.</p><p><strong>Results: </strong>We present PARSEbp, a fast and effective multi-model RNA scoring method that integrates pairwise structural agreement across the conformational ensemble with base pairing consistency. By leveraging both alignment-based global structural agreement at the three-dimensional level and base pairing consistency at the two-dimensional level, PARSEbp efficiently constructs a consensus similarity matrix from which per-structure accuracy scores are computed. 
Tested on RNA targets from the Critical Assessment of Structure Prediction (CASP) challenges CASP16 and CASP15, PARSEbp significantly outperforms existing single- and multi-model RNA scoring functions, including traditional statistical potentials, state-of-the-art deep learning methods, and consensus-based approaches, as well as a baseline variant of PARSEbp without the emphasis on base pairings, across a wide range of complementary assessment metrics.</p><p><strong>Availability and implementation: </strong>PARSEbp is freely available at https://github.com/Bhattacharya-Lab/PARSEbp.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag112"},"PeriodicalIF":2.8,"publicationDate":"2026-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13132658/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147824175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
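The PARSEbp abstract above describes building a consensus similarity matrix from pairwise agreement at the 3D and base-pairing levels, then deriving a per-structure score from it. The abstract does not give the exact formula, so the following is only a minimal sketch of generic consensus scoring under stated assumptions: the function name `consensus_scores`, the linear blend, and the `weight_bp` parameter are hypothetical and are not taken from the PARSEbp code.

```python
def consensus_scores(sim_3d, sim_bp, weight_bp=0.5):
    """Score each model by its mean blended similarity to all other models.

    sim_3d and sim_bp are symmetric n x n lists of pairwise similarities
    in [0, 1]. weight_bp is a hypothetical blending weight (an assumption,
    not a PARSEbp parameter).
    """
    n = len(sim_3d)
    if n < 2:
        raise ValueError("consensus scoring needs at least two models")
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue  # a model's agreement with itself carries no information
            total += (1.0 - weight_bp) * sim_3d[i][j] + weight_bp * sim_bp[i][j]
        scores.append(total / (n - 1))
    return scores


# Three toy models: 0 and 1 agree with each other, 2 is an outlier.
toy_3d = [[1.0, 0.9, 0.2], [0.9, 1.0, 0.3], [0.2, 0.3, 1.0]]
toy_bp = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.1], [0.1, 0.1, 1.0]]
toy_scores = consensus_scores(toy_3d, toy_bp)
```

In this toy ensemble the outlier model receives the lowest score, which is the behaviour a consensus-based ranking is meant to produce.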
Bioinformatics advances | Pub Date: 2026-04-15 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag108
Adedayo Olowolayemo, Amina Souag, Konstantinos Sirlantzis, Scott Turner, Cornelia Wilson
{"title":"A review of multi-omics integration techniques across five machine learning method families.","authors":"Adedayo Olowolayemo, Amina Souag, Konstantinos Sirlantzis, Scott Turner, Cornelia Wilson","doi":"10.1093/bioadv/vbag108","DOIUrl":"https://doi.org/10.1093/bioadv/vbag108","url":null,"abstract":"<p><strong>Motivation: </strong>Multi-omics integration methods are now common in cancer studies, but results remain sensitive to design choices, including when fusion occurs, what is fused, and how missingness is handled. As a result, it is difficult to compare studies and determine which integration choices are most reliable for cross-cohort cancer analyses.</p><p><strong>Results: </strong>From a PRISMA-guided review of 30 studies (2020-2025), we find that graph-based or hybrid pipelines dominate, with deep learning as the next most common family, and survival prediction as the main use case. Method families tend to align with the task and time of fusion; graph-hybrid approaches favour early- to intermediate-stage fusion, while deep learning spans the three stages of fusion. Across studies, three recurring trade-offs emerge: early-intermediate fusion can stabilize high-dimensional inputs but is sensitive to modality imbalance; shared latent-space designs better preserve partially observed samples; and late fusion supports more stable subtype structure but makes feature attribution less direct. 
The main message is that integration works best when fusion choices match the data's noise, sparsity, and missingness, and when interpretability is built into the architecture rather than added later.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag108"},"PeriodicalIF":2.8,"publicationDate":"2026-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13135628/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147846833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2026-04-15 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag100
Connor J Newstead, Josephine Bunch, Melanie J Bailey, Alex Dexter
{"title":"Dimensional cophenetic integrity: a method for evaluation of dimensionality reduction in MSI.","authors":"Connor J Newstead, Josephine Bunch, Melanie J Bailey, Alex Dexter","doi":"10.1093/bioadv/vbag100","DOIUrl":"https://doi.org/10.1093/bioadv/vbag100","url":null,"abstract":"<p><strong>Motivation: </strong>Mass spectrometry imaging data typically contains tens of thousands of pixels, and m/z channels which may relate to biomolecules of interest. It is impossible to visualize such high-dimensional data, and many multivariate analyses cannot be conducted without reducing dimensionality. Dimensionality reduction algorithms are commonly used for data visualisation, feature selection and as part of data clustering workflows in the examination of large mass spectrometry imaging datasets. In this work, we seek to develop methods to determine the ability of dimensionality reduction algorithms to preserve local and global structure within reduced data.</p><p><strong>Results: </strong>We have developed a novel evaluation method, Dimensional Cophenetic Integrity, which measures the structure and pattern preservation of dimensionality reduction algorithms based on the cophenetic distance of hierarchically clustered samples. We demonstrate that Dimensional Cophenetic Integrity results are indicative of expected tissue segmentation and image quality when compared to known synthetic data. Additionally, we find that optimum dimensionality reduction embeddings derive from hyperparameter selection far outside the typical range and show that Dimensional Cophenetic Integrity can be used as an objective criterion for Bayesian optimization. 
It is shown that optimizing dimensionality reduction preserves cluster relationships better than the default algorithm parameter choices.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag100"},"PeriodicalIF":2.8,"publicationDate":"2026-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13110009/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147790476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances | Pub Date: 2026-04-13 | eCollection Date: 2026-01-01 | DOI: 10.1093/bioadv/vbag107
Gal Gilad, Roded Sharan
{"title":"An optimization framework for hierarchical clustering.","authors":"Gal Gilad, Roded Sharan","doi":"10.1093/bioadv/vbag107","DOIUrl":"https://doi.org/10.1093/bioadv/vbag107","url":null,"abstract":"<p><strong>Motivation: </strong>Hierarchical clustering is a fundamental problem in computational biology, with popular greedy heuristics such as average linkage dating back to the 1950s but no well-defined objective. Recently, a combinatorial optimization criterion for the problem was suggested by Dasgupta. While minimizing this criterion is NP-hard, the popular average linkage method serves as a strong baseline. Nevertheless, its myopic, greedy nature frequently leads to structurally suboptimal hierarchies.</p><p><strong>Results: </strong>To remedy this, we introduce a novel average-linkage-based clustering approach that combines local and global considerations by generating multiple views of the input data and learning how to blend them into an integrated similarity measure. We demonstrate that our method, DOMUS, consistently outperforms strong baselines, including a beam search heuristic, on a wide range of synthetic and classic benchmark datasets. 
Furthermore, we validate its real-world applicability through a rigorous benchmark on single-cell RNA sequencing data, where it compares favorably with the state-of-the-art HiDeF algorithm.</p><p><strong>Availability and implementation: </strong>The DOMUS framework is implemented in Python and freely available at https://github.com/GalGilad/DOMUS.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"6 1","pages":"vbag107"},"PeriodicalIF":2.8,"publicationDate":"2026-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13128330/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147824163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
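The abstract above refers to the combinatorial criterion for hierarchical clustering suggested by Dasgupta. As a minimal sketch, assuming the standard form of that criterion (each pair of leaves contributes its similarity multiplied by the number of leaves under the pair's lowest common ancestor, and lower is better), the cost of a given tree can be computed as follows. The nested-tuple tree encoding and the `frozenset`-keyed similarity dictionary are illustrative choices made here, not part of DOMUS.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaf labels under a tree given as nested tuples."""
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def dasgupta_cost(tree, sim):
    """Sum, over all leaf pairs, similarity times the size of the smallest
    subtree containing both leaves (the pair's lowest common ancestor)."""
    if not isinstance(tree, tuple):
        return 0.0
    child_leaves = [leaves(child) for child in tree]
    n_here = sum(len(group) for group in child_leaves)
    cost = 0.0
    # Pairs separated at this node have this node as lowest common ancestor.
    for left, right in combinations(child_leaves, 2):
        for i in left:
            for j in right:
                cost += sim[frozenset((i, j))] * n_here
    # Pairs inside one child are charged deeper in the recursion.
    return cost + sum(dasgupta_cost(child, sim) for child in tree)


# Toy data: a-b and c-d are similar pairs, all other pairs are weakly similar.
toy_sim = {frozenset(pair): 0.1 for pair in combinations("abcd", 2)}
toy_sim[frozenset("ab")] = 1.0
toy_sim[frozenset("cd")] = 1.0

good_tree = (("a", "b"), ("c", "d"))  # merges similar leaves first
bad_tree = (("a", "c"), ("b", "d"))   # separates similar leaves at the root
good_cost = dasgupta_cost(good_tree, toy_sim)
bad_cost = dasgupta_cost(bad_tree, toy_sim)
```

The tree that merges the similar pairs low in the hierarchy gets the lower cost, which is the behaviour the criterion rewards.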