Bioinformatics (Oxford, England): Latest Articles

Delineating inter- and intra-antibody repertoire evolution with AntibodyForests.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-09 DOI: 10.1093/bioinformatics/btaf560

Daphne van Ginneken, Valentijn Tromp, Lucas Stalder, Tudor-Stefan Cotet, Sophie Bakker, Anamay Samant, Sai T Reddy, Alexander Yermanos

Motivation: The rapid advancements in immune repertoire sequencing, powered by single-cell technologies and artificial intelligence, have created unprecedented opportunities to study B cell evolution at a novel scale and resolution. However, fully leveraging these data requires specialized software capable of performing inter- and intra-repertoire analyses to unravel the complex dynamics of B cell repertoire evolution during immune responses.

Results: Here, we present AntibodyForests, software to infer B cell lineages, quantify inter- and intra-antibody repertoire evolution, and analyze somatic hypermutation using protein language models and protein structure.

Availability: This R package is available on CRAN and GitHub at https://github.com/alexyermanos/AntibodyForests. A vignette is available at https://cran.case.edu/web/packages/AntibodyForests/vignettes/AntibodyForests_vignette.html.

Supplementary information: Supplementary data are available at Bioinformatics online.

Citations: 0

ImmunoPepper: Extracting personalized peptides from complex splicing graphs.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-09 DOI: 10.1093/bioinformatics/btaf492

Laurie Prélot, Jiayu Chen, Matthias Hüser, André Kahles, Gunnar Rätsch

Motivation: RNA sequencing enables the characterization of a cell's transcript isoforms in healthy and disease conditions. In the context of cancer, local transcript variability may translate to splicing-derived tumor-associated peptides recognized by the immune system. A software tool that extracts such candidate peptides is of great interest for personalized cancer therapy.

Results: We present the open-source software tool ImmunoPepper, which extracts a set of biologically plausible peptides from a splicing graph derived from a set of RNA-Seq datasets. This peptide set can be personalized with germline and somatic variation and takes novel RNA splice variants into account. ImmunoPepper supports several filtering options, including subtraction of normal tissue background, prediction of MHC-binding affinity, and MassSpec-based validation of identified peptides. We analyzed 32 ovarian cancer (TCGA-OV) and 31 breast invasive carcinoma (TCGA-BRCA) samples with a strict cancer-specific filtering configuration, and obtained on average 834 and 569 cancer-specific predicted MHC-I-binding 9-mers per sample for the two cohorts, respectively. MassSpec validation with the target-decoy competition Subset-Neighbor-Search (SNS) showed an average validation rate of 4.5% per TCGA-OV sample and 5.3% per TCGA-BRCA sample. This corresponded to an average of 25 MHC-I-binding 9-mers per TCGA-OV sample and 20 per TCGA-BRCA sample. Finally, we draw conclusions about the best framework for generating splicing-derived neoepitopes: we recommend using joint data structures when processing a cancer cohort and a normal cohort homogeneously, and focusing on the reproducibility of candidates across generation pipelines.

Availability: ImmunoPepper is implemented in Python 3 and is available as open-source software at https://github.com/ratschlab/immunopepper. The online documentation can be found at https://immunopepper.readthedocs.io/en/latest/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Citations: 0

The Role of Graph Topology in the Performance of Biomedical Knowledge Graph Completion Models.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-07 DOI: 10.1093/bioinformatics/btaf547

Alberto Cattaneo, Stephen Bonner, Thomas Martynec, Edward Morrissey, Carlo Luschi, Ian P Barrett, Daniel Justus

Motivation: Knowledge Graph Completion has been increasingly adopted as a useful method for helping address several tasks in biomedical research, such as drug repurposing or drug-target identification. To that end, a variety of datasets and Knowledge Graph Embedding models have been proposed over the years. However, little is known about the properties that render a dataset, and associated modelling choices, useful for a given task. Moreover, even though theoretical properties of Knowledge Graph Embedding models are well understood, their practical utility in this field remains controversial.

Results: In this work, we conduct a comprehensive investigation into the topological properties of publicly available biomedical Knowledge Graphs and establish links to the accuracy observed in real-world tasks. By releasing all model predictions and a new suite of analysis tools we invite the community to build upon our work and continue improving the understanding of these crucial applications.

Availability and implementation: The code used to perform experiments and analyse results in this article as well as all experimental data is available at https://github.com/graphcore-research/kg-topology-toolbox/tree/main/the_role_of_graph_topology_paper and archived on Zenodo, at https://doi.org/10.5281/zenodo.12097376.

Supplementary information: Supplementary data are provided at Bioinformatics online.

Citations: 0

Parsing GTF and FASTA files using the eccLib Library.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-07 DOI: 10.1093/bioinformatics/btaf558

Tomasz Chady, Zuzanna Karolina Filutowska

Summary: Leveraging the Python/C API, eccLib was developed as a high-performance library designed for parsing genomic files and analysing genomic contexts. To the best of the authors' knowledge, it is the fastest Python-based solution available. With eccLib, users can efficiently parse GTF/GFFv3 and FASTA files and utilise the provided methods for additional analysis.

Availability and implementation: This library is implemented in C and distributed under the GPL-3.0 licence. It is compatible with any system that has the Python interpreter (CPython) installed. The use of C enables numerous optimisations at both the implementation and algorithmic levels, which are either unachievable or impractical in Python.

Contact: tomcha@st.amu.edu.pl, platyna@amu.edu.pl, eccdna@eccdna.pl.

Supplementary information: This library is available for installation from the Python Package Index (PyPI) under the name eccLib (https://pypi.org/project/eccLib/). The source code is available at https://gitlab.platinum.edu.pl/eccdna/eccLib. The version described by this document (1.1.0) is archived as https://doi.org/10.5281/zenodo.17024282. More detailed documentation can be accessed at https://gitlab-pages.platinum.edu.pl/eccdna/eccLib/.

Citations: 0

DeNoFo: a file format and toolkit for standardised, comparable de novo gene annotation.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-06 DOI: 10.1093/bioinformatics/btaf539

Elias Dohmen, Margaux Aubel, Lars A Eicholt, Paul Roginski, Victor Luria, Amir Karger, Anna Grandchamp

Motivation: De novo genes emerge from previously non-coding regions of the genome, challenging the traditional view that new genes primarily arise through duplication and adaptation of existing ones. Characterised by their rapid evolution and their novel structural properties or functional roles, de novo genes represent a young area of research. Therefore, the field currently lacks established standards and methodologies, leading to inconsistent terminology and challenges in comparing and reproducing results.

Results: This work presents a standardised annotation format to document the methodology of de novo gene datasets in a reproducible way. We developed DeNoFo, a toolkit to provide easy access to this format that simplifies annotation of datasets and facilitates comparison across studies. Unifying the different protocols and methods in one standardised format, while providing integration into established file formats, such as fasta or gff, ensures comparability of studies and advances new insights in this rapidly evolving field.

Availability and implementation: DeNoFo is available through the official Python Package Index (PyPI) and at https://github.com/EDohmen/denofo. All tools have a graphical user interface and a command line interface. The toolkit is implemented in Python 3, available for all major platforms and installable with pip and uv.

Citations: 0

Doblin: Inferring dominant clonal lineages from high-resolution DNA barcoding time series.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-06 DOI: 10.1093/bioinformatics/btaf555

Melis Gencel, David Gagné-Leroux, Adrian W R Serohijos

Motivation: The lineage dynamics and history of cells in a population reflect the interplay of evolutionary forces they experience, including mutation, drift, and selection. When the population is polyclonal, lineage dynamics also manifest the extent of clonal competition among co-existing mutational variants. If the population exists in a community of other species, the lineage dynamics could also reflect the population's ecological interaction with the rest of the community. Recent advances in high-resolution lineage tracking via DNA barcoding, coupled with next-generation sequencing of bacteria, yeast, and mammalian cells, allow for precise quantification of clonal dynamics in these organisms.

Results: In this work, we introduce Doblin, an R suite for identifying dominant barcode lineages based on high-resolution lineage tracking data. We first benchmarked Doblin's accuracy using lineage data from evolutionary simulations, showing that it recovers the clones' identity and relative fitness in the simulation. Next, we applied Doblin to analyze clonal dynamics in laboratory evolutions of E. coli populations undergoing antibiotic treatment and in colonization experiments of the gut microbial community. Doblin's versatility allows it to be applied to lineage time-series data across different experimental setups.

Availability and implementation: Doblin is available on CRAN (https://CRAN.R-project.org/package=doblin) and GitHub (https://github.com/dagagf/doblin).

Citations: 0

NOODAI: A webserver for network-oriented multi-omics data analysis and integration pipeline.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-06 DOI: 10.1093/bioinformatics/btaf553

Tiberiu Totu, Rafael Riudavets Puig, Lukas Jonathan Häuser, Mattia Tomasoni, Hella Anna Bolck, Marija Buljan

Summary: Omics profiling has proven of great use for unbiased and comprehensive identification of key features that define biological phenotypes and underlie medical conditions. While each omics profile assists characterization of specific molecular components relevant for the studied phenotype, their joint evaluation can offer deeper insights into the overall mechanistic functioning of biological systems. Here, we introduce an approach where, starting from representative traits (e.g., differentially expressed elements) obtained for each omics profile, we construct and analyze joint interaction networks. The resulting networks rely on the existing knowledge of confident interactions among biological entities. We use these maps to identify and describe central elements, which connect multiple entities characteristic of the studied phenotypes, and we leverage the MONET network decomposition tool to highlight functionally connected network modules. To enable broad usage of this approach, we developed the NOODAI software platform, which enables integrative omics analysis through a user-friendly interface. The analysis outcomes are presented both as raw output tables and as informative summary plots and written reports. Since the MONET tool enables the use of algorithms with strong performance in identifying disease-relevant modules, the NOODAI software platform can be of high value for analyzing clinical multi-omics datasets.

Availability and implementation: NOODAI is freely accessible at https://omics-oracle.com. Source code is available under GPL3 at https://github.com/TotuTiberiu/NOODAI with the DOI 10.5281/zenodo.17203984.

Supplementary information: Supplementary data are available at Bioinformatics online.

Citations: 0

CANA v1.0.0: efficient quantification of canalization in automata networks.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-02 DOI: 10.1093/bioinformatics/btaf461

Austin M Marcus, Jordan Rozum, Herbert Sizek, Luis M Rocha

Summary: The biomolecular networks underpinning cell function exhibit canalization, or the buffering of fluctuations required to function in a noisy environment. We present a new major release of CANA, v1.0.0, an open-source Python package for understanding canalization in automata network models, discrete dynamical systems in which activation of biomolecular entities (e.g. transcription of genes) is modeled as the activity of coupled automata. One understudied putative mechanism for canalization is the functional equivalence of biomolecular regulators (e.g. among the transcription factors for a gene). We study this mechanism using the theory of symmetry in discrete functions. We present a new exact method, schematodes, for finding maximal symmetry groups among the inputs to discrete functions, and integrate it into CANA. The schematodes method substantially outperforms the inexact method of previous CANA versions both in speed and accuracy. We apply CANA v1.0.0 to study symmetry in 74 experimentally supported automata network models from the Cell Collective (CC) repository. The symmetry distribution is significantly different in the CC than in random automata with the same in-degree (connectivity) and bias (average output) (Kolmogorov-Smirnov test, P ≪ .001). Its spread is much wider than in a null model (IQR 0.31 versus IQR 0.20 with equal medians), demonstrating that the CC is enriched in functions with extreme symmetry or asymmetry.

Availability and implementation: CANA source is on https://github.com/CASCI-lab/CANA and is installable via pip install cana. Source for schematodes is on https://github.com/CASCI-lab/schematodes. Analysis scripts are on https://github.com/CASCI-lab/symmetryInCellCollective.

Citations: 0

Integrating population-level and cell-based signatures for drug repositioning.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-02 DOI: 10.1093/bioinformatics/btaf498

Chunfeng He, Yue Xu, Yuan Zhou, Jiayao Fan, Chunxiao Cheng, Ran Meng, Lang Wu, Ruiyuan Pan, Ravi V Shah, Eric R Gamazon, Dan Zhou

Motivation: Drug repositioning presents a streamlined and cost-efficient way to expand the range of therapeutic possibilities. Drugs with human genetic evidence are more likely to advance successfully through clinical trials toward Food and Drug Administration approval. Single gene-based drug repositioning methods have been implemented, but approaches leveraging a broad spectrum of molecular signatures remain underexplored.

Results: We propose a framework called "Transcriptome-informed Reversal Distance" (TReD) that embeds the disease signatures and drug response profiles into a high-dimensional normed space to quantify the reversal potential of candidate drugs in a disease-related cell-based screening. We applied TReD to COVID-19, type 2 diabetes, and Alzheimer's disease (AD), identifying 36, 16, and 11 candidate drugs, respectively. Among these, literature supports 69% (25/36), 31% (5/16), and 64% (7/11) of the drugs, with clinical trials conducted for seven COVID-19 candidates and three AD candidates. In summary, we propose a comprehensive genetics-anchored framework integrating population-level signatures and cell-based screening that has the potential to accelerate the search for new therapeutic strategies.

Availability and implementation: Source code and datasets considered in this study are available at GitHub (https://github.com/zdangm/TReD). An archived snapshot is deposited at Zenodo (https://doi.org/10.5281/zenodo.16791909).

Citations: 0

CLUEY enables knowledge-guided clustering and cell type detection from single-cell omics data.

IF 5.4

Bioinformatics (Oxford, England) Pub Date : 2025-10-02 DOI: 10.1093/bioinformatics/btaf528

Daniel Kim, Carissa Chen, Lijia Yu, Jean Yee Hwa Yang, Pengyi Yang

Motivation: Clustering is a fundamental task in single-cell omics data analysis and can significantly impact downstream analyses and biological interpretations. The standard approach involves grouping cells based on their gene expression profiles, followed by annotating each cluster to a cell type using marker genes. However, the number of cell types detected by different clustering methods can vary substantially due to several factors, including the dimension reduction method used and the choice of parameters of the chosen clustering algorithm. These discrepancies can lead to subjective interpretations in downstream analyses, particularly in manual cell type annotation.

Results: To address these challenges, we propose CLUEY, a knowledge-guided framework for cell type detection and clustering of single-cell omics data. CLUEY integrates prior biological knowledge into the clustering process, providing guidance on the optimal number of clusters and enhancing the interpretability of results. We apply CLUEY to both unimodal (e.g. scRNA-seq, scATAC-seq) and multimodal datasets (e.g. CITE-seq, SHARE-seq) and demonstrate its effectiveness in providing biologically meaningful clustering outcomes. These results highlight CLUEY's ability to provide much-needed guidance in clustering analyses of single-cell omics data.

Availability and implementation: The CLUEY package is freely available from https://github.com/SydneyBioX/CLUEY.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12506888/pdf/

Citations: 0