NAR Genomics and Bioinformatics: Latest Articles

Phenotype prediction in plants is improved by integrating large-scale transcriptomic datasets.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-27 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae184
Zefeng Wu, Yali Sun, Xiaoqiang Zhao, Zigang Liu, Wenqi Zhou, Yining Niu
Abstract: Research on the dynamic expression of genes in plants is important for understanding different biological processes. We used the large amounts of publicly available transcriptomic data from various plant sample sources to investigate whether the expression levels of a subset of highly variable genes (HVGs) can be used to accurately identify the phenotypes of plants. Using maize (Zea mays L.) as an example, we built machine learning (ML) models to predict phenotypes using a gene expression dataset of 21 612 bulk RNA sequencing samples. We showed that the ML models achieved excellent prediction accuracy using only the HVGs to identify different phenotypes, including tissue types, developmental stages, cultivars and stress conditions. Using the ML models, several important functional genes were found to be associated with different phenotypes. We performed a similar analysis in rice (Oryza sativa L.) and found that the ML models could be generalized across species. However, the models trained from maize did not perform well in rice, probably because of the expression divergence of the conserved HVGs between the two species. Overall, our results provide an ML framework for phenotype prediction using gene expression profiles, which may contribute to precision management of crops in agricultural practices.
Volume 6, issue 4, article lqae184. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11672113/pdf/
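The abstract above describes predicting phenotypes from a subset of highly variable genes. A minimal sketch of that idea follows. It is not the authors' pipeline: the abstract does not say which ML models were used, so a nearest-centroid classifier stands in here, HVGs are ranked by plain variance, and every function name and the toy expression matrix are hypothetical.

```python
from statistics import mean, pvariance


def top_variable_genes(expr, n_genes):
    """Return the column indices of the n_genes genes with the highest variance."""
    n_cols = len(expr[0])
    variances = [pvariance([row[j] for row in expr]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:n_genes])


def fit_centroids(expr, labels, genes):
    """Average the selected genes per phenotype label (stand-in for an ML model)."""
    groups = {}
    for row, label in zip(expr, labels):
        groups.setdefault(label, []).append([row[j] for j in genes])
    return {label: [mean(col) for col in zip(*rows)] for label, rows in groups.items()}


def predict(sample, centroids, genes):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    x = [sample[j] for j in genes]
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


# Toy matrix (made-up values): rows are samples, columns are genes.
expr = [
    [9.0, 1.0, 5.0, 0.5],
    [8.5, 1.2, 5.1, 0.4],
    [1.0, 7.5, 5.0, 0.6],
    [1.5, 8.0, 4.9, 0.5],
]
labels = ["leaf", "leaf", "root", "root"]

hvgs = top_variable_genes(expr, n_genes=2)
centroids = fit_centroids(expr, labels, hvgs)
prediction = predict([8.8, 0.9, 5.0, 0.5], centroids, hvgs)
print(hvgs, prediction)
```

Restricting the model to the two most variable columns discards the near-constant genes, which carry almost no information about the label in this toy matrix.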
Citations: 0
SARS-CoV-2 Illumina GeNome Assembly Line (SIGNAL), a Snakemake workflow for rapid and bulk analysis of Illumina sequencing of SARS-CoV-2 genomes.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae176
Jalees A Nasir, Finlay Maguire, Kendrick M Smith, Emily M Panousis, Sheridan J C Baker, Patryk Aftanas, Amogelang R Raphenya, Brian P Alcock, Hassaan Maan, Natalie C Knox, Arinjay Banerjee, Karen Mossman, Bo Wang, Jared T Simpson, Robert A Kozak, Samira Mubareka, Andrew G McArthur
Abstract: The incorporation of sequencing technologies in frontline healthcare and public health settings was vital in developing virus surveillance programs during the Coronavirus Disease 2019 (COVID-19) pandemic caused by transmission of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, increased data acquisition poses challenges for both rapid and accurate analyses. To overcome these hurdles, we developed the SARS-CoV-2 Illumina GeNome Assembly Line (SIGNAL) for quick bulk analyses of Illumina short-read sequencing data. SIGNAL is a Snakemake workflow that seamlessly manages parallel tasks to process large volumes of sequencing data. A series of outputs are generated, including consensus genomes, variant calls, lineage assessments and identified variants of concern (VOCs). Compared to other existing SARS-CoV-2 sequencing workflows, SIGNAL is one of the fastest-performing analysis tools while maintaining high accuracy. The source code is publicly available (github.com/jaleezyy/covid-19-signal) and is optimized to run on various systems, with software compatibility and resource management all handled within the workflow. Overall, SIGNAL illustrated its capacity for high-volume analyses through several contributions to publicly funded government public health surveillance programs. It can be a valuable tool for continuing SARS-CoV-2 Illumina sequencing efforts and will inform the development of similar strategies for rapid viral sequence assessment.
Volume 6, issue 4, article lqae176. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655287/pdf/
Citations: 0
Exploring transcription modalities from bimodal, single-cell RNA sequencing data.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae179
Enikő Regényi, Mir-Farzin Mashreghi, Christof Schütte, Vikram Sunkara
Abstract: There is a growing interest in generating bimodal, single-cell RNA sequencing (RNA-seq) data for studying biological pathways. These data are predominantly utilized in understanding phenotypic trajectories using RNA velocities; however, the shape information encoded in the two-dimensional resolution of such data is not yet exploited. In this paper, we present an elliptical parametrization of two-dimensional RNA-seq data, from which we derived statistics that reveal four different modalities. These modalities can be interpreted as manifestations of the changes in the rates of splicing, transcription or degradation. We performed our analysis on a cell cycle and a colorectal cancer dataset. In both datasets, we found genes that are not picked up by differential gene expression analysis (DGEA), and are consequently unnoticed, yet visibly delineate phenotypes. This indicates that, in addition to DGEA, searching for genes that exhibit the discovered modalities could aid recovering genes that set phenotypes apart. For communities studying biomarkers and cellular phenotyping, the modalities present in bimodal RNA-seq data broaden the search space of genes, and furthermore, allow for incorporating cellular RNA processing into regulatory analyses.
Volume 6, issue 4, article lqae179. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655292/pdf/
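One common way to describe a two-dimensional point cloud by an ellipse is through its covariance matrix, sketched below. This is only an illustration of the general idea of an elliptical parametrization. The abstract does not give the authors' parametrization or their four modality statistics, and the reading of the two dimensions as spliced and unspliced counts per cell is an assumption borrowed from RNA velocity data. The function name and toy points are hypothetical.

```python
from math import atan2, degrees, sqrt


def ellipse_parameters(points):
    """Describe 2-D points by the center, axes and tilt of their covariance ellipse."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n

    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2
    spread = sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major = sqrt(half_trace + spread)
    minor = sqrt(max(half_trace - spread, 0.0))  # clamp tiny negative rounding error

    # Orientation of the major axis relative to the x axis.
    angle = 0.5 * atan2(2 * sxy, sxx - syy)
    return {
        "center": (mx, my),
        "major": major,
        "minor": minor,
        "angle_deg": degrees(angle),
    }


# Toy data: (spliced, unspliced) values for one gene across five cells (assumed layout).
points = [(1.0, 0.5), (2.0, 1.1), (3.0, 1.4), (4.0, 2.1), (5.0, 2.4)]
params = ellipse_parameters(points)
print(params)
```

Shape summaries such as the axis ratio or the tilt could then be compared between phenotypes, although which statistics matter is a question the paper itself answers, not this sketch.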
Citations: 0
Mining single-cell data for cell type-disease associations.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae180
Kevin G Chen, Kathryn O Farley, Timo Lassmann
Abstract: A robust understanding of the cellular mechanisms underlying diseases sets the foundation for the effective design of drugs and other interventions. The wealth of existing single-cell atlases offers the opportunity to uncover high-resolution information on expression patterns across various cell types and time points. To better understand the associations between cell types and diseases, we leveraged previously developed tools to construct a standardized analysis pipeline and systematically explored associations across four single-cell datasets, spanning a range of tissue types, cell types and developmental time periods. We utilized a set of existing tools to identify co-expression modules and temporal patterns per cell type and then investigated these modules for known disease and phenotype enrichments. Our pipeline reveals known and novel putative cell type-disease associations across all investigated datasets. In addition, we found that automatically discovered gene co-expression modules and temporal clusters are enriched for drug targets, suggesting that our analysis could be used to identify novel therapeutic targets.
Volume 6, issue 4, article lqae180. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655289/pdf/
Citations: 0
PyNetCor: a high-performance Python package for large-scale correlation analysis.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae177
Shibin Long, Yan Xia, Lifeng Liang, Ying Yang, Hailiang Xie, Xiaokai Wang
Abstract: The development of multi-omics technologies has generated an abundance of biological datasets, providing valuable resources for investigating potential relationships within complex biological systems. However, most correlation analysis tools face computational challenges when dealing with these high-dimensional datasets containing millions of features. Here, we introduce pyNetCor, a fast and scalable tool for constructing correlation networks on large-scale and high-dimensional data. PyNetCor features optimized algorithms for both full correlation coefficient matrix computation and top-k correlation search, outperforming other tools in the field in terms of runtime and memory consumption. It utilizes a linear interpolation strategy to rapidly estimate P-values and achieve false discovery rate control, demonstrating a speedup of over 110 times compared to existing methods. Overall, pyNetCor supports large-scale correlation analysis, a crucial foundational step for various bioinformatics workflows, and can be easily integrated into downstream applications to accelerate the process of extracting biological insights from data.
Volume 6, issue 4, article lqae177. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655297/pdf/
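To make the notion of a top-k correlation search concrete, here is a small plain-Python sketch. It does not use or reproduce the pyNetCor API or its optimized algorithms, which the abstract does not describe. It shows the generic idea of keeping only the k strongest pairs in a bounded heap instead of storing the full correlation matrix. The function names and the toy feature table are hypothetical.

```python
import heapq
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def top_k_correlations(features, k):
    """Return the k feature pairs with the largest |r|, holding only k pairs in memory."""
    heap = []  # min-heap of (|r|, r, name_a, name_b); the weakest kept pair sits at heap[0]
    for (name_a, a), (name_b, b) in combinations(features.items(), 2):
        r = pearson(a, b)
        item = (abs(r), r, name_a, name_b)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)
    return [(name_a, name_b, r) for _, r, name_a, name_b in sorted(heap, reverse=True)]


# Toy feature table (made-up values): one list of measurements per feature.
features = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10.5],
    "g3": [5, 4, 3, 2, 1],
    "g4": [1, 3, 2, 5, 4],
}

strongest = top_k_correlations(features, k=1)
print(strongest)
```

The heap bounds memory at k entries, but this sketch still evaluates every pair, so it illustrates the memory side of the problem only, not the runtime optimizations the abstract reports.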
Citations: 0
Improved characterization of 3' single-cell RNA-seq libraries with paired-end avidity sequencing.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae175
John T Chamberlin, Austin E Gillen, Aaron R Quinlan
Abstract: Prevailing poly(dT)-primed 3' single-cell RNA-seq protocols generate barcoded cDNA fragments containing the reverse transcriptase priming site or in principle the polyadenylation site. Direct sequencing across this site was historically difficult because of DNA sequencing errors induced by the homopolymeric primer at the 'barcode' end. Here, we evaluate the capability of 'avidity base chemistry' DNA sequencing from Element Biosciences to sequence through the primer and enable accurate paired-end read alignment and precise quantification of polyadenylation sites. We find that the Element Aviti instrument sequences through the thymine homopolymer into the subsequent cDNA sequence without detectable loss of accuracy. The additional sequence enables direct and independent assignment of reads to polyadenylation sites, which bypasses the complexities and limitations of conventional approaches but does not consistently improve read mapping rates compared to single-end alignment. We also characterize low-level artifacts and demonstrate necessary adjustments to adapter trimming and sequence alignment regardless of platform, particularly in the context of extended read lengths. Our analyses confirm that Element avidity sequencing is an effective alternative to Illumina sequencing for standard single-cell RNA-seq, particularly for polyadenylation site measurement, but do not rule out the potential for similar performance from other emerging platforms.
Volume 6, issue 4, article lqae175. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655283/pdf/
Citations: 0
Navigating Illumina DNA methylation data: biology versus technical artefacts.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae181
Selina Glaser, Helene Kretzmer, Iris Tatjana Kolassa, Matthias Schlesner, Anja Fischer, Isabell Fenske, Reiner Siebert, Ole Ammerpohl
Abstract: Illumina-based BeadChip arrays have revolutionized genome-wide DNA methylation profiling, pushing it into diagnostics. However, comprehensive quality assessment remains challenging within a wide range of available tissue materials and sample preparation methods. This study tackles two critical issues: differentiating between biological effects and technical artefacts in suboptimal quality samples and the impact of the first sample on the Illumina-like normalization algorithm. We introduce three quality control scores based on global DNA methylation distribution (DB-Score), bin distance from copy number variation analysis (BIN-Score) and consistently methylated CpGs (CM-Score) that rely on biological features rather than internal array controls. These scores, designed to be adjustable for different analysis tools and sample cohort characteristics, were explored and benchmarked across independent cohorts. Additionally, we reveal deviations in beta values caused by different sample rankings with the Illumina-like normalization algorithm, verified these with whole-genome methylation sequencing data and showed effects on differential DNA methylation analysis. Our findings underscore the necessity of consistently utilizing a pre-defined normalization sample within the ranking process to boost reproducibility of the Illumina-like normalization algorithm. Overall, our study delivers valuable insights, practical recommendations and R functions designed to enhance reproducibility and quality assurance of DNA methylation analysis, particularly for challenging sample types.
Volume 6, issue 4, article lqae181. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655293/pdf/
Citations: 0
AntiBody Sequence Database.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae171
Simon Malesys, Rachel Torchet, Bertrand Saunier, Nicolas Maillet
Abstract: Antibodies play a crucial role in the humoral immune response against health threats, such as viral infections. Although the theoretical number of human immunoglobulins is well over a trillion, the total number of unique antibody protein sequences accessible in databases is much lower than the number found in a single individual. Training AI (Artificial Intelligence) models, for example to assist in developing serodiagnoses or antibody-based therapies, requires building datasets according to strict criteria to include as many standardized antibody sequences as possible. However, the available sequences are scattered across partially redundant databases, making it difficult to compile them into single non-redundant datasets. Here, we introduce ABSD (AntiBody Sequence Database, https://absd.pasteur.cloud), which contains data from major publicly available resources, creating the largest standardized, automatically updated and non-redundant source of public antibody sequences. This user-friendly and open website enables users to generate lists of antibodies based on selected criteria and download the unique sequence pairs of their variable regions.
Volume 6, issue 4, article lqae171. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655285/pdf/
Citations: 0
Cell- and tissue-specific glycosylation pathways informed by single-cell transcriptomics.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae169
Panagiotis Chrysinas, Shriramprasad Venkatesan, Isaac Ang, Vishnu Ghosh, Changyou Chen, Sriram Neelamegham, Rudiyanto Gunawan
Abstract: While single-cell studies have made significant impacts in various subfields of biology, they lag in the Glycosciences. To address this gap, we analyzed single-cell glycogene expressions in the Tabula Sapiens dataset of human tissues and cell types using a recent glycosylation-specific gene ontology (GlycoEnzOnto). At the median sequencing (count) depth, ∼40-50 out of 400 glycogenes were detected in individual cells. Upon increasing the sequencing depth, the number of detectable glycogenes saturates at ∼200 glycogenes, suggesting that the average human cell expresses about half of the glycogene repertoire. Hierarchies in glycogene and glycopathway expressions emerged from our analysis: nucleotide-sugar synthesis and transport exhibited the highest gene expressions, followed by genes for core enzymes, glycan modification and extensions, and finally terminal modifications. Interestingly, the same cell types showed variable glycopathway expressions based on their organ or tissue origin, suggesting nuanced cell- and tissue-specific glycosylation patterns. Probing deeper into the transcription factors (TFs) of glycogenes, we identified distinct groupings of TFs controlling different aspects of glycosylation: core biosynthesis, terminal modifications, etc. We present webtools to explore the interconnections across glycogenes, glycopathways and TFs regulating glycosylation in human cell/tissue types. Overall, the study presents an overview of glycosylation across multiple human organ systems.
Volume 6, issue 4, article lqae169. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655298/pdf/
Citations: 0
IDclust: Iterative clustering for unsupervised identification of cell types with single cell transcriptomics and epigenomics.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae174
Pacôme Prompsy, Mélissa Saichi, Félix Raimundo, Céline Vallot
Abstract: The increasing diversity of single-cell datasets requires systematic cell type characterization. Clustering is a critical step in single-cell analysis, heavily influencing downstream analyses. However, current unsupervised clustering algorithms rely on biologically irrelevant parameters that require manual optimization and fail to capture hierarchical relationships between clusters. We developed IDclust, a framework that identifies clusters with significant biological features at multiple resolutions using biologically meaningful thresholds like fold change, adjusted P-value and fraction of expressing cells. By iteratively processing and clustering subsets of the dataset, IDclust guarantees that all clusters found have significantly different features and stops only when no more interpretable cluster is found. It also creates a hierarchy of clusters, enabling visualization of the hierarchical relationships between different clusters. Analyzing multiple single-cell transcriptomic reference datasets, IDclust achieves superior clustering accuracy compared to state-of-the-art algorithms. We showcase its utility by identifying previously unannotated clusters and identifying branching patterns in scATAC datasets. Using its unsupervised nature and ability to analyze different -omics, we compare the resolution of different histone marks in a multi-omic paired-tag dataset. Overall, IDclust automates single-cell exploration, facilitates cell type annotation and provides a biologically interpretable basis for clustering.
Volume 6, issue 4, article lqae174. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655290/pdf/
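The iterative scheme described above, splitting subsets repeatedly and keeping a split only when the resulting clusters differ by biologically meaningful thresholds, can be sketched as follows. This is not the IDclust implementation. The clustering step is replaced by a deliberately crude stand-in (cutting the most variable gene at its mean), only fold change and fraction of expressing cells are checked (no adjusted P-value), and all names, default thresholds and the toy matrix are hypothetical.

```python
from math import log2
from statistics import mean, pvariance


def split_cells(cells, ids):
    """Stand-in clustering step: cut the most variable gene at its mean."""
    n_genes = len(cells[ids[0]])
    gene = max(range(n_genes), key=lambda g: pvariance([cells[i][g] for i in ids]))
    cut = mean(cells[i][gene] for i in ids)
    low = [i for i in ids if cells[i][gene] <= cut]
    high = [i for i in ids if cells[i][gene] > cut]
    return low, high


def has_marker(cells, group, rest, min_log2fc, min_frac):
    """True if some gene is expressed widely in `group` and enriched versus `rest`."""
    for g in range(len(cells[group[0]])):
        in_mean = mean(cells[i][g] for i in group)
        out_mean = mean(cells[i][g] for i in rest)
        frac = sum(cells[i][g] > 0 for i in group) / len(group)
        if frac >= min_frac and log2((in_mean + 1) / (out_mean + 1)) >= min_log2fc:
            return True
    return False


def iterative_clusters(cells, ids=None, name="C", min_log2fc=1.0, min_frac=0.5, min_size=2):
    """Recursively split cells; keep a split only if both halves have a marker gene."""
    ids = list(range(len(cells))) if ids is None else ids
    if len(ids) < 2 * min_size:
        return {name: ids}
    low, high = split_cells(cells, ids)
    if len(low) < min_size or len(high) < min_size:
        return {name: ids}
    if not (has_marker(cells, low, high, min_log2fc, min_frac)
            and has_marker(cells, high, low, min_log2fc, min_frac)):
        return {name: ids}
    result = {}
    for suffix, part in (("1", low), ("2", high)):
        result.update(
            iterative_clusters(cells, part, f"{name}.{suffix}", min_log2fc, min_frac, min_size)
        )
    return result


# Toy matrix (made-up values): rows are cells, columns are genes.
cells = [
    [10, 0, 0, 0], [11, 0, 0, 0], [9, 0, 0, 0], [10, 0, 0, 0],
    [0, 8, 6, 0], [0, 9, 7, 0],
    [0, 8, 0, 5], [0, 9, 0, 6],
]
clusters = iterative_clusters(cells)
print(clusters)
```

The dotted cluster names record the hierarchy: `C.1.1` and `C.1.2` are children of `C.1`, while `C.2` stays whole because any further split of it would leave a group below the minimum size.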
Citations: 0