{"title":"Accelerating drug discovery targeting dihydroorotate dehydrogenase using machine learning and generative AI approaches","authors":"Gayathri Krishnamurthy Ganga","doi":"10.1016/j.compbiolchem.2025.108443","DOIUrl":"10.1016/j.compbiolchem.2025.108443","url":null,"abstract":"<div><div>Dihydroorotate dehydrogenase (DHODH) is a key enzyme in pyrimidine biosynthesis, making it an attractive drug target for cancer, autoimmune diseases, and infections. Traditional DHODH inhibitor discovery is slow and costly. Our study integrated machine learning (ML) and generative artificial intelligence (AI) to accelerate this process, enhancing efficiency and reducing costs. We employed Random Forest (RF), XGBoost (XGB), and Logistic Regression (LR) to predict pIC50 values, with RF achieving the highest accuracy (93 % test accuracy, 81 % on unseen molecules), demonstrating superior generalization. Using a Graph Convolutional Network-based Variational Autoencoder (GCN-VAE), we generated 59 unique drug-like molecules, five with pIC50 > 7, expanding the chemical space beyond conventional screening.</div><div>Docking studies confirmed strong binding affinities, with the most promising newly generated molecule showing a binding energy of –11.1 kcal/mol and an inhibition constant (Ki) of 269.8 nM. Key interactions with residues such as ALA59, PHE36, TYR38, GLN47, and ARG36 further validated stability and inhibitory potential. This AI-driven workflow accelerates DHODH inhibitor discovery by significantly reducing screening time, enhancing molecular diversity, and improving predictive accuracy. 
Our approach presents a scalable, cost-effective strategy for developing novel therapeutics, offering a transformative shift in drug discovery.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"118 ","pages":"Article 108443"},"PeriodicalIF":2.6,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143783041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
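The abstract uses pIC50 > 7 as its potency cut-off. pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, so a threshold of 7 corresponds to IC50 values below 100 nM. The sketch below shows this conversion. It assumes IC50 values are reported in nanomolar, and the function names are hypothetical rather than taken from the authors' code.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)


def ic50_nm_from_pic50(pic50: float) -> float:
    """Inverse conversion: IC50 (nM) = 10 ** (9 - pIC50)."""
    return 10.0 ** (9.0 - pic50)


def is_potent(pic50: float, cutoff: float = 7.0) -> bool:
    """Strict 'greater than' cut; 7.0 corresponds to IC50 below 100 nM."""
    return pic50 > cutoff
```

For example, `pic50_from_ic50_nm(50.0)` gives about 7.30, which clears the cut-off. An IC50 of exactly 100 nM gives 7.0 and does not clear it, because the abstract's criterion is strictly greater than 7.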
{"title":"Metagenomics studies in aquaculture systems: Big data analysis, bioinformatics, machine learning and quantum computing","authors":"Orkid Coskuner-Weber , Semih Alpsoy , Ozgur Yolcu , Egehan Teber , Ario de Marco , Spase Shumka","doi":"10.1016/j.compbiolchem.2025.108444","DOIUrl":"10.1016/j.compbiolchem.2025.108444","url":null,"abstract":"<div><div>The burgeoning field of aquaculture has become a pivotal contributor to global food security and economic growth, presently surpassing capture fisheries in aquatic animal production as evidenced by recent statistics. However, the dense fish populations inherent in aquaculture systems exacerbate abiotic stressors and promote pathogenic spread, posing a risk to sustainability and yield. This study delves into the transformative potential of metagenomics, a method that directly retrieves genetic material from environmental samples, in elucidating microbial dynamics within aquaculture ecosystems. Our findings affirm that metagenomics, bolstered by tools in big data analytics, bioinformatics, and machine learning, can significantly enhance the precision of microbial assessment and pathogen detection. Furthermore, we explore quantum computing’s emergent role, which promises unparalleled efficiency in data processing and model construction, poised to address the limitations of conventional computational techniques. Distinct from metabarcoding, metagenomics offers an expansive, unbiased profile of microbial biodiversity, revolutionizing our capacity to monitor, predict, and manage aquaculture systems with high accuracy and adaptability. 
Despite the challenges of computational demands and variability in data standardization, this study advocates for continued technological integration, thereby fostering resilient and sustainable aquaculture practices in a climate of escalating global food requirements.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"118 ","pages":"Article 108444"},"PeriodicalIF":2.6,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143776323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"pACPs-DNN: Predicting anticancer peptides using novel peptide transformation into evolutionary and structure matrix-based images with self-attention deep learning model","authors":"Shahid , Maqsood Hayat , Ali Raza , Shahid Akbar , Wajdi Alghamdi , Nadeem Iqbal , Quan Zou","doi":"10.1016/j.compbiolchem.2025.108441","DOIUrl":"10.1016/j.compbiolchem.2025.108441","url":null,"abstract":"<div><div>Globally, cancer remains a major health challenge due to its high mortality rates. Traditional experimental approaches and therapies are resource-intensive and often cause significant side effects. Anticancer peptides (ACPs) have emerged as alternative therapeutic agents owing to their selectivity, safety, and potential to mitigate drug resistance. In this paper, we propose pACPs-DNN, a novel attention mechanism-based deep learning model developed for the accurate prediction of ACPs and non-ACPs. The pACPs-DNN model transforms input peptides into image representations using Residue-wise Energy Contact Matrix (RECM), Substitution Matrix Representation (SMR), and Position Specific Scoring Matrix (PSSM) embeddings, followed by local binary pattern (LBP)-based decomposition to capture enhanced structural and local semantic features. These transformations generate novel feature sets, including RECM_LBP, LBP_SMR, and LBP_PSSM. Subsequently, a two-tier feature selection approach is employed to identify a high-ranking optimal feature set, which is then used to train an attention-based deep neural network. The proposed pACPs-DNN model achieves a training accuracy of 96.91 % and an AUC of 0.98. To evaluate its generalization capability, the model was validated on independent datasets, demonstrating significant improvements of 5 % and 3.5 % in accuracy over existing models on the Ind-I and Ind-II datasets, respectively. 
The demonstrated efficacy and robustness of pACPs-DNN highlight its potential as a valuable tool for advancing drug discovery and academic research in cancer-related therapeutic development.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"117 ","pages":"Article 108441"},"PeriodicalIF":2.6,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143739969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
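The abstract describes turning each peptide's RECM, SMR, or PSSM matrix into an image and then applying local binary pattern (LBP) decomposition. The sketch below shows the textbook 3×3 LBP operator on a plain numeric matrix, for example an L × 20 PSSM with L ≥ 3. Several details are common defaults assumed here and are not confirmed by the abstract: the neighbour ordering, the ≥ comparison, and the 256-bin histogram. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

# Clockwise order of the 8 neighbours, starting at the top-left cell (assumed).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image: Matrix) -> List[List[int]]:
    """Basic 3x3 local binary pattern: each interior cell gets an 8-bit code."""
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                # Set this bit when the neighbour is >= the centre value.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(image: Matrix) -> List[int]:
    """256-bin histogram of LBP codes, usable as a fixed-length feature vector."""
    hist = [0] * 256
    for row in lbp_codes(image):
        for code in row:
            hist[code] += 1
    return hist
```

The histogram step gives a vector of the same length for peptides of different lengths. Some fixed-length representation of this kind is needed before feature selection and classification.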
{"title":"SSE-Net: A novel network based on sequence spatial equation for Camellia sinensis lysine acetylation identification","authors":"Lichao Zhang , Xue Wang , Ge Gao , Zhengyan Bian , Liang Kong","doi":"10.1016/j.compbiolchem.2025.108442","DOIUrl":"10.1016/j.compbiolchem.2025.108442","url":null,"abstract":"<div><div>Lysine acetylation (Kace) is one of the most important post-translational modifications, and identifying Kace sites is key to understanding regulatory mechanisms in Camellia sinensis. In this study, we defined a mathematical formula, named the sequence spatial equation (SSE), which assigns each amino acid a coordinate in 3-D space through rotation and translation. Based on SSE, a network, SSE-Net, was constructed to represent spatial structure information. Centrality metrics of SSE-Net were used to design structural feature vectors that reflect the importance of sites. The optimal features were fed into a classifier to construct the model SSE-ET. The results showed that SSE-ET outperformed the other classifiers. Meanwhile, all MCC values exceeded 0.7 across different machine learning classifiers, indicating that SSE-Net effectively represents Kace sites in Camellia sinensis. Moreover, we implemented existing models on our dataset, and the comparison showed that SSE-ET was much more powerful than the others. Specifically, its sensitivity (SN) was nearly 20 % higher than that of the other models. 
These results show that the proposed SSE is a valuable mathematical concept for capturing 3-D spatial information about Kace sites in Camellia sinensis, and SSE-Net may serve as an essential complement to biology and bioinformatics research.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"117 ","pages":"Article 108442"},"PeriodicalIF":2.6,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143739970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
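The SSE-Net abstract reports MCC above 0.7 and an SN roughly 20 % higher than other models. SN usually means sensitivity in site-prediction papers, and that reading is assumed here. Under that assumption, both metrics can be computed from the binary confusion matrix. The sketch below treats label 1 as a Kace site (also an assumption), and the helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels, with 1 = Kace site (assumed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp: int, fn: int) -> float:
    """SN = TP / (TP + FN): the fraction of true sites that are recovered."""
    return tp / (tp + fn) if tp + fn else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For instance, with 40 TP, 45 TN, 5 FP, and 10 FN, SN is 0.80 and MCC is about 0.70. MCC uses all four cells, so unlike SN it also penalizes false positives.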
{"title":"Genome-Wide Analysis for TLDc domain-containing genes in Oryza sativa","authors":"Renu Kumari , Divya Gupta , Preetom Regon , Kocsy Gábor , Sanjib Kumar Panda","doi":"10.1016/j.compbiolchem.2025.108428","DOIUrl":"10.1016/j.compbiolchem.2025.108428","url":null,"abstract":"<div><div>OXidation Resistance (OXR) is a family of eukaryotic proteins characterized by the presence of the highly conserved TLDc (TBC (Tre2/Bub2/Cdc16), LysM (lysine motif), domain catalytic) domain in the C-terminal half, which plays a crucial role in cellular defense mechanisms, particularly in response to oxidative stress. TLDc (TBC/LysM domain catalytic) domain-containing proteins are essential regulators of oxidative stress responses in plants, a key juncture for various stress signaling pathways. This study identified six putative TLDc genes in the rice (<em>Oryza sativa</em> L.) genome through a comprehensive in silico analysis. These genes were characterized by their conserved TLDc domain, with gene expression analysis via qRT-PCR confirming their significant upregulation under drought and salt stress conditions. These findings suggest a potential role for TLDc genes in enhancing stress tolerance through oxidative stress regulation, making them promising miRNA targets for modulating stress responses. Comparative phylogenetic analysis reveals that rice TLDc genes share close evolutionary relationships with wheat, maize, and <em>Arabidopsis thaliana</em>, suggesting a conserved role across species. In particular, the study finds that gene duplications contribute to the diversity of TLDc genes and examines how these duplications may influence protein subcellular localization, primarily in the plasma membrane, nucleus, and chloroplast, which are crucial for stress signaling pathways. 
This work builds on existing research by expanding our understanding of TLDc genes in <em>Oryza sativa</em>, addressing gaps in the functional characterization of the gene family in stress responses, and offering valuable insights for further exploration of their roles in plant resilience.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"117 ","pages":"Article 108428"},"PeriodicalIF":2.6,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143734881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
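The TLDc abstract reports qRT-PCR upregulation under drought and salt stress but does not state how fold changes were computed. One common choice is the Livak 2^(−ΔΔCt) method, which assumes close to 100 % amplification efficiency for both the target and the reference gene. The sketch below implements that method under this assumption. Ct replicates are averaged by their arithmetic mean, and the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_ct: Sequence[float], reference_ct: Sequence[float]) -> float:
    """Delta Ct = mean Ct of the target gene minus mean Ct of the reference gene."""
    return mean(target_ct) - mean(reference_ct)


def fold_change_ddct(target_stress: Sequence[float], reference_stress: Sequence[float],
                     target_control: Sequence[float], reference_control: Sequence[float]) -> float:
    """Relative expression 2 ** (-ddCt), stressed vs control (Livak method, assumed)."""
    ddct = delta_ct(target_stress, reference_stress) - delta_ct(target_control, reference_control)
    return 2.0 ** (-ddct)
```

A fold change above 1 indicates upregulation under stress. For example, if the target amplifies two cycles earlier relative to the reference gene under stress, the result is a fourfold increase.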
{"title":"Interpretable lung cancer risk prediction using ensemble learning and XAI based on lifestyle and demographic data","authors":"Shahid Mohammad Ganie , Pijush Kanti Dutta Pramanik","doi":"10.1016/j.compbiolchem.2025.108438","DOIUrl":"10.1016/j.compbiolchem.2025.108438","url":null,"abstract":"<div><div>Lung cancer is a leading cause of cancer-related death worldwide. The early and accurate detection of lung cancer is crucial for improving patient outcomes. Traditional predictive models often lack the accuracy and interpretability required in clinical settings. This study aims to enhance lung cancer prediction accuracy using ensemble learning methods while integrating explainable AI (XAI) techniques to ensure model interpretability. Advanced ensemble learning techniques, such as Voting and Stacking, have been implemented to improve predictive accuracy compared to traditional models. The models are implemented on three real lung cancer datasets comprising patients' lifestyle data and are assessed using various performance metrics, highlighting their reliability in clinical diagnosis. XAI methods are incorporated to ensure the models are interpretable, fostering trust among clinicians. SHAP (SHapley Additive exPlanations) values are utilized to identify and prioritize clinical and demographic factors influencing risk predictions. The ensemble models demonstrate superior performance metrics, significantly improving lung cancer prediction accuracy. Specifically, the Stacking ensemble model achieves an average prediction accuracy of 99.59 %, precision of 100 %, recall of 97.64 %, F1-score of 98.65 %, AUC of 100 %, Kappa of 98.40 %, and MCC of 98.44 % across the three datasets. We employed the Friedman aligned ranks test and Holm post hoc analysis to validate performance, showing that the Stacking ensemble consistently outperformed the others with higher accuracy and reliable predictions. 
Feature importance analysis reveals critical risk factors, providing insights into their interconnectivity and enhancing risk assessment frameworks. Integrating XAI techniques ensures the models are interpretable, promoting their potential adoption in clinical practices. The findings support the development of targeted interventions and effective risk management strategies, aiming to improve patient outcomes in lung cancer diagnosis and treatment.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"117 ","pages":"Article 108438"},"PeriodicalIF":2.6,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143746620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
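Along with accuracy, the lung cancer abstract reports Cohen's Kappa. Kappa discounts the agreement a classifier would reach by chance, given how often each label occurs. The sketch below computes it in pure Python for two label sequences. The function name is hypothetical, and returning 1.0 when chance agreement is already total is a convention chosen here.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Cohen's kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use one identical label (convention assumed here)
    return (observed - expected) / (1.0 - expected)
```

As an example, take y_true = [1, 1, 1, 0, 0, 0, 0, 0] and y_pred = [1, 1, 0, 0, 0, 0, 0, 1]. Accuracy is 0.75, but Kappa is only about 0.47, because predicting the majority class already yields a lot of chance agreement.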
{"title":"PayloadGenX, a multi-stage hybrid virtual screening approach for payload design: A microtubule inhibitor case study","authors":"Faheem Ahmed , Anupama Samantasinghar , Naina Sunildutt , Kyung Hyun Choi","doi":"10.1016/j.compbiolchem.2025.108439","DOIUrl":"10.1016/j.compbiolchem.2025.108439","url":null,"abstract":"<div><div>Due to the rapid emergence of treatment-resistant cancers, there is a growing need to discover new anticancer therapies. Antibody-drug conjugates (ADCs) aim to solve this problem by specifically targeting cancer cells and delivering cytotoxic payloads directly to them, thereby minimizing damage to healthy cells and enhancing treatment efficacy. Therefore, it is highly important to find an effective cytotoxic payload to ensure maximum therapeutic benefit and overcome cancer resistance. To address this challenge, we have developed a multi-stage hybrid virtual screening (VS) approach for payload design. We collected approximately 900 million molecules from databases such as ZINC12, ChEMBL, PubChem, and QM9. Additionally, 220 approved small-molecule anticancer drugs were collected. Initially, these molecules were screened against the Lipinski Rule of Five (RO5) criteria, yielding 20 million molecules that met the drug-likeness criteria. Subsequently, fragments, a key element of this approach, were generated from the approved small-molecule anticancer drugs. This fragment-based screening identified 6500, 36,770, and 150,000 anticancer-like drugs at similarity thresholds greater than 0.6, 0.5, and 0.4, respectively. Raising the similarity threshold closer to 1 increases the chance that the retained molecules are anticancer-like. Further molecular docking of these anticancer-like drugs against β-tubulin identified the top 1000 ranked drugs as microtubule inhibitors. 
ADMET analysis and synthetic validation, followed by cell cytotoxicity assays, further helped shortlist the 5 most effective payloads for confirmation in a preclinical setting. Additionally, molecular dynamics simulations were performed to confirm the structural stability and conformational dynamics of the β-tubulin-ligand complexes over 100 ns. In conclusion, this study effectively uses extensive compound databases and multi-stage screening methods to identify potent payloads, demonstrating promising advances in the discovery of effective anticancer therapies.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"117 ","pages":"Article 108439"},"PeriodicalIF":2.6,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143740009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
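The first two PayloadGenX stages are a Lipinski Rule of Five filter (MW ≤ 500 Da, logP ≤ 5, at most 5 H-bond donors and 10 acceptors) and a similarity screen against fragments of approved anticancer drugs. Both can be sketched in pure Python under several assumptions:

- Descriptors and fingerprints are precomputed; in practice a cheminformatics toolkit would supply them.
- Similarity is the Tanimoto coefficient on sets of fingerprint bit indices.
- Zero RO5 violations are allowed by default.
- The reference fingerprints stand in for the drug-derived fragments.

The abstract does not state the fingerprint type or the similarity metric, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Molecule:
    name: str
    mol_weight: float            # Da
    logp: float                  # octanol/water partition coefficient
    h_donors: int
    h_acceptors: int
    fingerprint: FrozenSet[int]  # indices of bits set in a precomputed fingerprint


def ro5_violations(m: Molecule) -> int:
    """Count Lipinski violations: MW > 500, logP > 5, HBD > 5, HBA > 10."""
    return sum([m.mol_weight > 500, m.logp > 5, m.h_donors > 5, m.h_acceptors > 10])


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """|A & B| / |A | B| on sets of on-bits; 0.0 for two empty fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen(library: Iterable[Molecule], references: List[FrozenSet[int]],
           threshold: float, max_violations: int = 0) -> List[Tuple[str, float]]:
    """Keep RO5-compliant molecules whose best reference similarity exceeds threshold."""
    hits = []
    for m in library:
        if ro5_violations(m) > max_violations:
            continue  # stage 1: drug-likeness filter
        best = max((tanimoto(m.fingerprint, ref) for ref in references), default=0.0)
        if best > threshold:  # stage 2: strict "greater than" cut, as in the abstract
            hits.append((m.name, best))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

The cut is strict, so lowering the threshold from 0.6 to 0.4 can only enlarge the hit list. This matches the growth from 6500 to 150,000 molecules reported in the abstract.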
{"title":"An insight into in silico strategies used for exploration of medicinal utility and toxicology of nanomaterials","authors":"Tahmeena Khan","doi":"10.1016/j.compbiolchem.2025.108435","DOIUrl":"10.1016/j.compbiolchem.2025.108435","url":null,"abstract":"<div><div>Nanomaterials (NMs) and the exploration of their many uses are an emerging research area. Owing to their unique shape and size, NMs have improved physicochemical and biological properties and diverse functionality, and they are therefore being explored for a wide range of uses, particularly as medicinal and therapeutic agents. Nanoparticles (NPs), including metal and metal oxide-based NPs, have received substantial attention because of their biological applications. Computer-aided drug design (CADD) strategies such as homology modelling, molecular docking, virtual screening (VS), and quantitative structure-activity relationship (QSAR) modelling are important for lead identification and target identification. Despite this importance, very few computational studies have so far explored the binding of NMs to target proteins and macromolecules. Although the structural properties of nanomaterials are well documented, how they interact with target proteins remains a practical open question. This review discusses important computational strategies such as molecular docking and simulation, Nano-QSAR, quantum chemical calculations based on Density Functional Theory (DFT), and computational nanotoxicology. Nano-QSAR modelling, based on semiempirical calculations and computational simulation, can be useful for biomedical applications, whereas DFT calculations make it possible to understand the behaviour of a material from quantum-mechanical calculations, without requiring higher-order material properties. 
Beyond beneficial interactions, it is also important to understand the hazardous consequences of engineered nanostructures and NPs, which can penetrate more deeply into the human body, and computational nanotoxicology has emerged as a potential strategy to predict the deleterious effects of NMs. Although computational tools are helpful, further studies such as <em>in vitro</em> assays are still required to obtain the complete picture, which is essential for developing potent and safe drug entities.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"117 ","pages":"Article 108435"},"PeriodicalIF":2.6,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143725415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Functionalized p-cymene and pyrazine derivatives: Physicochemical, ADMT, drug-likeness, and DFT studies","authors":"Goncagül Serdaroğlu","doi":"10.1016/j.compbiolchem.2025.108434","DOIUrl":"10.1016/j.compbiolchem.2025.108434","url":null,"abstract":"<div><div>The <em>p</em>-cymene and pyrazine derivatives functionalized with hydroxy and methoxy group(s) were investigated through systematic computational analyses to explore their electronic structural properties, which could play a critical role in their biochemical reactivity. DFT computations on the data set were performed at the B3LYP/6-311G** level to predict the structural and electronic properties as well as the physicochemical values. Physicochemical properties such as lipophilicity and water solubility were determined because these values should be balanced against each other in early-stage drug design. The averaged lipophilicity values of the <em>p</em>-cymene and pyrazine derivatives were calculated as CYM3 (2.39) < CYM1 (2.82) < CYM4 (3.11) < CYM2 (3.21) < CYM (3.50) and PYZ3 (1.22) < PYZ (1.28) < PYZ1 (1.40) < PYZ2 (1.79) < PYZ4 (2.00), respectively. According to the ESOL approach, the water solubility values (mg/mL × 10<sup>−2</sup>) of the <em>p</em>-cymene and pyrazine compounds followed the orders CYM3 (15.6) > CYM4 (10.2) > CYM1 (7.40) > CYM2 (5.16) > CYM (3.12) and PYZ (512) > PYZ1 (170) > PYZ3 (166) > PYZ2 (118) > PYZ4 (77.3), respectively. The ADMT properties of the data set were examined in detail to estimate structural advantages or disadvantages, because possible side effects on human health and the environment must be considered alongside potency when designing a novel agent. All compounds could be promising agents in terms of Caco-2 and MDCK permeability and P-gp inhibition potential. 
According to the IGC<sub>50</sub>, LC<sub>50</sub>FM, and LC<sub>50</sub>DM results, the <em>p</em>-cymene compounds could pose a lower (or no) risk compared with glyphosate and the pyrazine derivatives, as was also the case for the BCF scores. FMO analyses were performed to estimate the likely reactive regions for nucleophilic or electrophilic attack.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"117 ","pages":"Article 108434"},"PeriodicalIF":2.6,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143734879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
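The p-cymene and pyrazine solubility values come from the ESOL approach. In Delaney's original ESOL model, log S (mol/L) is a linear function of four inputs: calculated logP, molecular weight, rotatable-bond count, and aromatic proportion (aromatic heavy atoms over all heavy atoms). The sketch below uses the originally published coefficients and adds the conversion to the mg/mL units quoted in the abstract. Software tools may use refitted coefficients, and the abstract does not say which implementation was used. The function names are hypothetical.

```python
def esol_log_s(clogp: float, mol_weight: float, rotatable_bonds: int,
               aromatic_proportion: float) -> float:
    """ESOL estimate of log10 aqueous solubility (mol/L), original Delaney coefficients."""
    return (0.16
            - 0.63 * clogp
            - 0.0062 * mol_weight
            + 0.066 * rotatable_bonds
            - 0.74 * aromatic_proportion)


def solubility_mg_per_ml(log_s: float, mol_weight: float) -> float:
    """Convert log S (mol/L) to mg/mL: mol/L x g/mol = g/L, and 1 g/L = 1 mg/mL."""
    return (10.0 ** log_s) * mol_weight
```

The logP coefficient is negative, which fits the broad trend in the abstract: within each series, the more lipophilic compounds generally show lower predicted solubility.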
{"title":"Comprehensive analysis of single-cell and bulk transcriptome unravels immune landscape of atherosclerosis and develops a S100 family based-diagnostic model","authors":"Yanfei Mo , Yaoqi Ge , Dan Wang , Jizheng Wang , Rihua Zhang , Yifang Hu , Xiaoxuan Qin , Yanyan Hu , Shan Lu , Yun Liu , Wen-Song Zhang","doi":"10.1016/j.compbiolchem.2025.108436","DOIUrl":"10.1016/j.compbiolchem.2025.108436","url":null,"abstract":"<div><h3>Background</h3><div>The S100 family of calcium-binding proteins (S100s) has been closely linked to the biological processes of various cardiovascular diseases. This study aims to investigate the expression of S100s in Atherosclerosis (AS) and explore their potential as diagnostic biomarkers and therapeutic targets.</div></div><div><h3>Methods</h3><div>We analyzed multiple sequencing datasets from the GEO database to compare the expression profiles of S100s in AS tissues versus normal samples. Using unsupervised clustering, we identified AS subtypes based on variations in S100-related gene expression profiles. Subsequent analyses of immune cell infiltration and GSVA pathway enrichment characterized the immune landscape of the different AS subtypes. Machine learning techniques were employed to develop a diagnostic model for AS. Single-cell RNA analysis was used to investigate the cellular distribution of S100 hub genes in AS.</div></div><div><h3>Results</h3><div>Unsupervised clustering analysis identified two distinct AS subtypes (C1 and C2), characterized by specific S100 gene expression patterns. 
The RF-based diagnostic model exhibited the highest efficacy (AUC=0.881), and the top five genes (S100A4, S100A10, S100A11, S100A13, S100Z) were used to construct a diagnostic nomogram.</div></div><div><h3>Conclusion</h3><div>This study systematically elucidates the roles of S100s in AS, offering insights into molecular subtyping, immune characteristics, and diagnostic model construction. The findings provide valuable implications for the precise treatment and prognosis assessment of AS and pave the way for further research into related mechanisms.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"117 ","pages":"Article 108436"},"PeriodicalIF":2.6,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143734880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
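The RF diagnostic model in the atherosclerosis abstract is summarized by an AUC of 0.881. AUC equals the probability that a randomly chosen AS sample receives a higher score than a randomly chosen control, with ties counted as half; this is the Mann-Whitney formulation. The sketch below assumes model scores are already available and that label 1 marks AS. The function name is hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive (1) and negative (0) scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))
```

The double loop is O(n·m), which is fine for cohort-sized data. For larger sets, a rank-based version gives the same value in O(n log n).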