{"title":"Identification of immune and major depressive disorder-related diagnostic markers for early nonalcoholic fatty liver disease by WGCNA and machine learning.","authors":"Yuyun Jia, Yanping Cao, Qin Yin, Xueqian Li, Xiu Wen","doi":"10.3389/fbinf.2025.1594971","DOIUrl":"10.3389/fbinf.2025.1594971","url":null,"abstract":"<p><strong>Background: </strong>Major depressive disorder (MDD) and nonalcoholic fatty liver disease (NAFLD) are highly prevalent conditions that exhibit significant pathophysiological overlap, particularly in metabolic and immune pathways.</p><p><strong>Objective: </strong>This study aims to bridge this gap by integrating transcriptomic data from publicly available repositories and advanced machine learning algorithms to identify novel biomarkers and construct a predictive model that facilitates the provision of clinical psychological nursing interventions for early-stage NAFLD in MDD patients.</p><p><strong>Method: </strong>We systematically analyzed transcriptomic data of simple steatosis (SS), nonalcoholic steatohepatitis (NASH), and major depressive disorder (MDD) from GEO databases to construct and validate a diagnostic model. After removing batch effects, we identified differentially expressed genes (DEGs) that distinguished disease and control groups. We further applied Weighted Gene Co-expression Network Analysis (WGCNA) to identify immune-related genes in SS/NASH patients versus controls. The intersection of shared DEGs across both conditions and WGCNA-identified genes was determined and subjected to functional enrichment analysis. Immune cell infiltration levels were quantified using single-sample gene set enrichment analysis (ssGSEA). A predictive model for SS/NASH was developed by evaluating nine machine-learning algorithms with 10-fold cross-validation on the datasets.</p><p><strong>Results: </strong>Fourteen genes strongly linked to both the immune system and the two conditions were identified. 
Immune cell infiltration profiling revealed distinct immune landscapes in patients versus healthy controls. Moreover, an eight-gene signature was developed, demonstrating superior diagnostic accuracy in both testing and training cohorts. Notably, these eight genes were found to correlate with the severity of early-stage NAFLD.</p><p><strong>Conclusion: </strong>This study established a predictive model for early-stage NAFLD through the integration of bioinformatics and machine learning approaches, with a focus on immune- and MDD-related genes. The eight-gene signature identified in this study represents a novel diagnostic tool for precision medicine, enabling targeted psychological nursing intervention in comorbid populations.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1594971"},"PeriodicalIF":2.8,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12271764/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144676714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ngx-mol-viewers: Angular components for interactive molecular visualization in bioinformatics.","authors":"Damiano Clementel, Alessio Del Conte, Alexander Miguel Monzon, Silvio C E Tosatto","doi":"10.3389/fbinf.2025.1586744","DOIUrl":"10.3389/fbinf.2025.1586744","url":null,"abstract":"<p><p>Advancements in bioinformatics have been propelled by technologies like machine learning and have resulted in substantial increases in data generated from both empirical observations and computational models. Hence, well-known biological databases are growing in size and centrality by integrating data from different sources. While the primary goal of these databases is to collect and distribute data through application programming interfaces (APIs), providing visualization and analysis tools directly on the browser interface is crucial for users to understand the data, which increases the usefulness and overall impact of the databases. Currently, some front-end frameworks are available for the sustained development of the user interface (UI) and user experience (UX) of these resources. Angular is one of the most popular frameworks to be broadly adopted within the BioCompUP laboratory. This work describes a library of reusable and customizable components that can be easily integrated into the Angular framework to provide visualizations of various aspects of protein molecules, such as their sequences, structures, and annotations. Currently, the library includes three main independent components. The first is the ngx-structure-viewer, which allows visualization of molecules through the MolStar three-dimensional viewer. The second is the ngx-sequence-viewer, which provides visualization and annotation capabilities for a single sequence or multiple sequence alignments. The third, the ngx-features-viewer, enables the mapping and visualization of various biological annotations onto the same molecule. 
All these tools are available for download through the Node Package Manager (NPM), and more information is available at https://biocomputingup.github.io/ngx-mol-viewers/ (under development).</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1586744"},"PeriodicalIF":2.8,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12243869/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144610386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"<i>In-silico</i> molecular analysis and blocking of the viral G protein of Nipah virus interacting with ephrin B2 and B3 receptor by using peptide mass fingerprinting.","authors":"Ayesha Sajjad, Ihteshamul Haq, Rabia Syed, Faheem Anwar, Muhammad Hamza, Muhammad Musharaf, Tehmina Kiani, Faisal Nouroz","doi":"10.3389/fbinf.2025.1526566","DOIUrl":"10.3389/fbinf.2025.1526566","url":null,"abstract":"<p><strong>Introduction: </strong>The Nipah virus (NiV), a zoonotic paramyxovirus closely related to the Hendra virus, poses a significant global health threat due to its high mortality rate, zoonotic nature, and recurring outbreaks primarily in Malaysia, Bangladesh, and India. Infection with NiV leads to severe encephalitis and carries a case fatality rate ranging from 40% to 75%. The lack of a vaccine and limited understanding of NiV pathogenesis underscore the urgent need for effective therapeutics. This study focuses on identifying viral peptides of the Nipah virus using the peptide mass fingerprinting technique. This approach identified antiviral peptides acting as potent inhibitors, targeting the viral G-protein's interaction with cellular ephrin-B2 and B3 receptors. These receptors are crucial for viral entry into host cells and subsequent pathogenesis.</p><p><strong>Methods: </strong>Identifying NiV viral peptides not only enhances our understanding of the virus's structural and functional properties but also opens avenues for developing novel therapeutic strategies. By blocking the interaction between the viral G-protein and host receptors, these antiviral peptides offer promising prospects for drug development against NiV.</p><p><strong>Results and discussion: </strong>Twenty-one peptides were identified using peptide mass fingerprinting. These peptides were then subjected to docking analysis with two antiviral peptides of the ephrin B2 receptor and a monoclonal antibody, demonstrating robust stability and binding affinity. 
These predicted peptides contribute to the broader field of virology by elucidating key aspects of NiV biology and paving the way for the development of targeted antiviral therapies. Future studies may further explore the therapeutic potential of these peptides and their application in combating other viral infections.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1526566"},"PeriodicalIF":2.8,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12238059/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144602360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrated single-cell and bulk RNA dequencing to identify and validate prognostic genes related to T Cell senescence in acute myeloid leukemia.","authors":"Mengyao Sha, Jun Chen, Haifeng Hou, Huaihui Dou, Yan Zhang","doi":"10.3389/fbinf.2025.1606284","DOIUrl":"10.3389/fbinf.2025.1606284","url":null,"abstract":"<p><strong>Background: </strong>T-cell suppression in patients with acute myeloid leukemia (AML) limits tumor cell clearance. This study aimed to explore the role of T-cell senescence-related genes in AML progression using single-cell RNA sequencing (scRNA-seq), bulk RNA sequencing (RNA-seq), and survival data of patients with AML in the TCGA database.</p><p><strong>Methods: </strong>The Uniform Manifold Approximation and Projection (UMAP) algorithm was used to identify different cell clusters in the GSE116256 dataset, and differentially expressed genes (DEGs) in T-cells were identified using the FindAllMarkers analysis. GSE114868 was used to identify DEGs in AML and control samples. Both were crossed with the CellAge database to identify aging-related genes. Univariate and multivariate regression analyses were performed to screen prognostic genes using the AML Cohort in The Cancer Genome Atlas (TCGA) Database (TCGA-LAML), and risk models were constructed to identify high-risk and low-risk patients. Line graphs showing the survival of patients with AML were created based on the independent prognostic factors, and receiver operating characteristic (ROC) curves were used to calculate the predictive accuracy of the line graph. GSE71014 was used to validate the prognostic ability of the risk score model. Tumor immune infiltration analysis was used to compare differences in tumor immune microenvironments between high- and low-risk AML groups. 
Finally, the expression levels of prognostic genes were verified using reverse transcription quantitative polymerase chain reaction (RT-qPCR).</p><p><strong>Results: </strong>From 31 aging-associated AML DEGs, four prognostic genes (CALR, CDK6, HOXA9, and PARP1) were identified by univariate, multivariate, and stepwise regression analyses with risk modeling. The ROC curves suggested that the line graph based on the independent prognostic factors accurately predicted the 1-, 3-, and 5-year survival of patients with AML. Tumor immune infiltration analyses suggested significant differences in the tumor immune microenvironment between low- and high-risk groups. Prognostic genes showed strong binding activity to target drugs (IGF1R and ABT737). RT-qPCR verified that prognostic gene expression was consistent with the data prediction results.</p><p><strong>Conclusion: </strong>CALR, CDK6, HOXA9, and PARP1 predicted disease progression and prognosis in patients with AML. Based on these, we developed and validated a new AML risk model with great potential for predicting patients' prognosis and survival.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1606284"},"PeriodicalIF":2.8,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12238043/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144602361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ICARus: a pipeline to extract robust gene expression signatures from transcriptome datasets.","authors":"Zhaorong Li, Juan I Fuxman Bass","doi":"10.3389/fbinf.2025.1604418","DOIUrl":"10.3389/fbinf.2025.1604418","url":null,"abstract":"<p><p>Gene signature extraction from transcriptomics datasets has been instrumental to identify sets of co-regulated genes, identify associations with prognosis, and for biomarker discovery. Independent component analysis (ICA) is a powerful tool to extract such signatures to uncover hidden patterns in complex data and identify coherent gene sets. The ICARus package offers a robust pipeline to perform ICA on transcriptome datasets. While other packages perform ICA using one value of the main parameter (i.e., the number of signatures), ICARus identifies a range of near-optimal parameter values, iterates through these values, and assesses the robustness and reproducibility of the signature components identified. To test the performance of ICARus, we analyzed transcriptome datasets obtained from COVID-19 patients with different outcomes and from lung adenocarcinoma. We identified several reproducible gene expression signatures significantly associated with prognosis, temporal patterns, and cell type composition. The GSEA of these signatures matched findings from previous clinical studies and revealed potentially new biological mechanisms. 
ICARus, with a vignette, is available on GitHub at https://github.com/Zha0rong/ICArus.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1604418"},"PeriodicalIF":2.8,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12222331/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144562166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial: Expert opinions in genomic analysis.","authors":"João C Setubal, Alberto Paccanaro","doi":"10.3389/fbinf.2025.1641083","DOIUrl":"https://doi.org/10.3389/fbinf.2025.1641083","url":null,"abstract":"","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1641083"},"PeriodicalIF":2.8,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12224187/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144562165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing genomic prediction in <i>Arabidopsis thaliana</i> with optimized SNP subset by leveraging gene ontology priors and bin-based combinatorial optimization.","authors":"Qingfang Ba, Heng Zhou, Zheming Yuan, Zhijun Dai","doi":"10.3389/fbinf.2025.1607119","DOIUrl":"10.3389/fbinf.2025.1607119","url":null,"abstract":"<p><p>With the rapid development of high-density molecular marker chips and high-throughput sequencing technologies, genomic selection/prediction (GS/GP) has been widely applied in plant breeding. <i>Arabidopsis thaliana</i>, as a common model organism, provides important resources for dissecting genetic variation and evolutionary mechanisms of complex traits. Quantitative traits are typically influenced by multiple minor-effect genes, which are often functionally related and can be enriched within gene ontology (GO) pathways. However, optimizing marker subsets associated with these pathways to enhance GP performance remains challenging. In this study, we propose an improved GS framework called binGO-GS by integrating GO-based biological priors with a novel bin-based combinatorial SNP subset selection strategy. We evaluated the performance of binGO-GS on nine quantitative traits from two <i>A. thaliana</i> datasets, comprising nearly 1,000 samples and over 1.8 million SNPs. Compared with using either the full marker set or randomly selected markers with Genomic BLUP (GBLUP), binGO-GS achieved statistically significant improvements in prediction accuracy across all traits. Similar improvements were observed across six additional regression models when applying binGO-GS instead of the full marker set. Furthermore, the selected markers for identical or similar morphological traits exhibited consistent patterns in quantity and genomic distribution, supporting the polygenic model of complex quantitative traits driven by minor-effect genes. 
Taken together, binGO-GS offers a powerful and interpretable approach to enhance GS performance, providing a methodological reference for accelerating plant breeding and germplasm innovation.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1607119"},"PeriodicalIF":2.8,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12213587/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144556070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temporal GeneTerrain: advancing precision medicine through dynamic gene expression visualization.","authors":"Ehsan Saghapour, Rahul Sharma, Delower Hossain, Kevin Song, Zhandos Sembay, Jake Y Chen","doi":"10.3389/fbinf.2025.1602850","DOIUrl":"10.3389/fbinf.2025.1602850","url":null,"abstract":"<p><strong>Introduction: </strong>Understanding the temporal dynamics of gene expression is vital for interpreting biological responses, especially in drug treatment studies. Conventional visualization techniques, such as heatmaps and static clustering, often fail to effectively capture these temporal dynamics, particularly when analyzing large-scale multidimensional datasets. These traditional methods tend to obscure fine-grained temporal transitions, resulting in overcrowded visualizations, diminished clarity, and limited interpretability of biologically significant patterns.</p><p><strong>Methods: </strong>To address these visualization challenges, we introduce Temporal GeneTerrain, an advanced method designed to represent dynamic changes in gene expression over time. We applied Temporal GeneTerrain to compare transcriptomic perturbations induced by mefloquine (M), tamoxifen (T), and withaferin A (W), both individually and in all-pairwise and triple combinations (TM, TW, MW, and TMW), in LNCaP prostate cancer cells using the GSE149428 dataset (0, 3, 6, 9, 12, and 24 h). Expression values were first Z-score normalized, and the 1,000 most variably expressed genes were selected. To ensure coordinated temporal dynamics, we calculated Pearson correlation coefficients among these genes and retained those with r ≥ 0.5, resulting in 999 strongly co-expressed candidates. We then constructed a protein-protein interaction network for these genes and embedded it in two dimensions using the Kamada-Kawai force-directed algorithm. 
Finally, for each time point and treatment, we mapped the normalized expression values of the corresponding genes onto the fixed Kamada-Kawai layout as Gaussian density fields (σ = 0.03), generating a distinct Temporal GeneTerrain map for each time-condition combination.</p><p><strong>Results: </strong>The application of Temporal GeneTerrain revealed intricate temporal shifts in gene expression, particularly unveiling delayed responses in pathways such as NGF-stimulated transcription and the unfolded protein response under combined drug treatments. Compared to traditional heatmap visualizations, Temporal GeneTerrain significantly improved both resolution and interpretability, effectively capturing gene expression patterns' multidimensional and transient nature. This enhancement provides a solid foundation for further research and analysis, assuring the scientific community of the method's reliability.</p><p><strong>Discussion: </strong>Temporal GeneTerrain addresses the limitations of traditional visualization methods by offering an intuitive and detailed representation of gene expression dynamics. Compared to other approaches, such as heatmaps and static clustering, Temporal GeneTerrain uniquely captures the transient nature of gene expression patterns. 
This method significantly enhances the interpretability of complex biological datasets, thereby supporting informed decision-making in biological research and therapeutic developme","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1602850"},"PeriodicalIF":2.8,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12213653/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144556071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Utility of regional STR marker variations in Tunisian and sub-Saharan populations: insights into forensic and population genetics.","authors":"Asma Attaoui, Hajer Foddha, Houcemeddine Othman, Hassen Ben Abdennebi, Amel Haj Khelil","doi":"10.3389/fbinf.2025.1550730","DOIUrl":"10.3389/fbinf.2025.1550730","url":null,"abstract":"<p><strong>Introduction: </strong>This study investigates the genetic variability and forensic applicability of Short Tandem Repeat (STR) loci, including autosomal, X-, and Y-STR markers, across distinct Tunisian regions and among sub-Saharan African populations. Our objectives were to examine the regional allelic diversity of STR markers in Tunisia and to assess the utility of these markers for forensic differentiation between Tunisian and sub-Saharan African populations.</p><p><strong>Methods: </strong>Twenty-two STRs were genotyped in 500 Tunisian individuals and 501 sub-Saharan corpses by capillary electrophoresis using commercial system kits. A Chi-square test for homogeneity was applied to assess allele distribution, and Principal Component Analysis (PCA) was used to assess geographical allele variations. Bioinformatic methods in R packages were used, such as a logistic regression model (LRM) to predict geographic group membership and random forest (RF) models to evaluate the discriminative power of the analyzed STRs.</p><p><strong>Results and discussion: </strong>Statistical analyses revealed significant allelic variability between Northern, Central, and Southern Tunisia for markers such as D1S1656, D8S1179, and CSF1PO. PCA illustrated a clear genetic distinction between Tunisian and sub-Saharan populations, largely attributable to geographical and historical gene flow barriers. LRM achieved high accuracy (95.96%) in predicting geographic affiliation. RF analysis identified DYS391 as highly discriminative in population differentiation. 
Our findings align with prior research on Tunisian genetic diversity and extend this knowledge by illustrating allelic frequency variations in order to establish region-specific databases.</p><p><strong>Conclusion: </strong>This study contributes valuable insights into the genetic structure of Tunisian and sub-Saharan populations, emphasizing tailored approaches in forensic practices.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1550730"},"PeriodicalIF":2.8,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12209214/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144546454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimization of clustering parameters for single-cell RNA analysis using intrinsic goodness metrics.","authors":"Nicolina Sciaraffa, Antonino Gagliano, Luigi Augugliaro, Claudia Coronnello","doi":"10.3389/fbinf.2025.1562410","DOIUrl":"10.3389/fbinf.2025.1562410","url":null,"abstract":"<p><strong>Introduction: </strong>The accurate clustering of cell subpopulations is a crucial aspect of single-cell RNA sequencing. The ability to correctly subdivide cell subpopulations hinges on the efficacy of unsupervised clustering. Despite the advancements and numerous adaptations of clustering algorithms, the correct clustering of cells remains a challenging endeavor that is dependent on the data in question and on the parameters selected for the clustering process. In this context, the present study aimed to predict the accuracy of clustering methods when varying different parameters by exploiting the intrinsic goodness metrics.</p><p><strong>Methods: </strong>This study utilized three datasets, each originating from a distinct anatomical district and with a ground truth cell annotation. Moreover, the investigation employed two clustering methods: the Leiden and the Deep Embedding for Single-cell Clustering (DESC) algorithm. Firstly, a robust linear mixed regression model has been implemented in order to analyze the impact of clustering parameters on the accuracy. Consequently, fifteen intrinsic measures have been calculated and used to train an ElasticNet regression model in both intra- and cross-dataset approaches to evaluate the possibility of predicting the clustering accuracy.</p><p><strong>Results and discussion: </strong>The first-order interactions demonstrated that the use of the UMAP method for the generation of the neighborhood graph and an increase in resolution has a beneficial impact on accuracy. 
The impact of the resolution parameter is accentuated by the reduced number of nearest neighbors, resulting in sparser and more locally sensitive graphs, which better preserve fine-grained cellular relationships. Furthermore, it is advisable to test different numbers of principal components, given that this parameter is highly affected by data complexity. This procedure has enabled the effective prediction of clustering accuracy through the utilization of intrinsic metrics. The findings demonstrated that the within-cluster dispersion and the Banfield-Raftery index could be effectively used as proxies for accuracy, for an immediate comparison of different clustering parameter configurations.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1562410"},"PeriodicalIF":2.8,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12187673/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144499694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}