Biodata Mining: Latest Articles

An intelligent healthcare system for rare disease diagnosis utilizing electronic health records based on a knowledge-guided multimodal transformer framework.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-10-07 DOI: 10.1186/s13040-025-00487-0

Ahed Abugabah, Prashant Kumar Shukla, Piyush Kumar Shukla, Ankur Pandey

Abstract: Rare diseases collectively affect millions of patients globally, but their diagnosis is difficult because of varied clinical presentations, small sample sizes, and disparate biomedical data sources. Current diagnostic tools cannot combine multimodal information effectively, which results in delayed or incorrect diagnoses. To fill this gap, this paper proposes a multimodal healthcare framework that integrates electronic health records (EHRs), genomic sequences, and medical imaging to improve the detection of rare diseases. The framework uses a Swin Transformer to extract hierarchical visual features from radiographic scans, Med-BERT and Transformer-XL to learn semantic and long-term temporal relations in longitudinal EHR narratives, and a Graph Neural Network (GNN)-based encoder to learn functional and structural relations in genomic sequences. Cross-modal representation alignment is further improved with a Knowledge-Guided Contrastive Learning (KGCL) mechanism, which uses rare disease ontologies from Orphanet to improve model interpretability and knowledge infusion. The paper proposes using the Nutcracker Optimization Algorithm (NOA) to optimize hyperparameters, calibrate attention mechanisms, and enhance multimodal fusion. Experimental results on the MIMIC-IV (EHR), ClinVar (genomics), and CheXpert (imaging) datasets show that the proposed framework significantly outperforms state-of-the-art multimodal baselines in accuracy and robustness for early rare disease diagnosis. The paper highlights the opportunity to integrate hierarchical vision transformers, domain-specific language models, graph-based genomic encoders, and knowledge-directed optimization to support explainable, accurate, and clinically applicable healthcare decisions in rare disease settings.
Biodata Mining, vol. 18, no. 1, article 70. Publication type: Journal Article.
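The abstract above does not give the form of the KGCL objective. As a rough illustration of how contrastive alignment between two modalities is commonly set up, the sketch below computes an InfoNCE-style loss in plain Python. Everything here is an assumption for illustration: the function names, the temperature value, and the choice of treating index-matched embeddings as positives (in a knowledge-guided variant, an ontology such as Orphanet could instead decide which pairs count as positives). Vectors are assumed to be nonzero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two nonzero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, candidates, temperature=0.1):
    """Mean InfoNCE loss where anchors[i] is assumed to pair with candidates[i].

    Hypothetical helper: each anchor (e.g. an imaging embedding) is scored
    against every candidate (e.g. EHR embeddings); the matching index is the
    positive and all other candidates act as negatives.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)
```

The loss is small when each anchor is most similar to its own partner and large when the pairing is scrambled, which is the property a cross-modal alignment objective relies on.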

Citations: 0

A graph-theoretic framework for quantitative analysis of angiogenic networks.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-10-02 DOI: 10.1186/s13040-025-00478-1

Goodluck Okoro, Pawel Wityk, Michael B Nelappana, Karl A Jackiewicz, Veronica Z Kucharczyk, Annie Tigranyan, Catherine C Applegate, Iwona T Dobrucki, Lawrence W Dobrucki

Abstract: The endothelial tube formation assay is an established in vitro model for evaluating angiogenesis. Although widely used, quantification of angiogenic behavior in such assays remains semi-empirical and often lacks spatial, topological, and structural context. Here, we present a graph-theoretic framework to quantify network morphology, temporal dynamics, and spatial heterogeneity in tube formation assays. We simulated two distinct angiogenic network morphologies using human umbilical vein endothelial cells (HUVECs) seeded at two densities and imaged at 2, 4, and 18 h post-seeding. Skeletonized images were converted to mathematical graphs from which 11 graph-based metrics were extracted. This framework captured both morphological differences and temporal progression. Sparse networks exhibited significantly higher average node degree (p = 0.00079), clustering coefficient (p = 0.00109), and tortuosity (p = 0.0171), whereas dense networks showed greater node and edge counts (p = 0.00109). Over time, networks evolved from fragmented forms at 2 h to integrated structures at 18 h, as reflected by increased largest component size (p = 0.00216), connectivity index (p = 0.00216), and efficiency (p = 0.0152). ROC AUC analysis revealed that metrics such as average degree (AUC = 0.98) and clustering coefficient (AUC = 0.96) effectively distinguished between sparse and dense morphologies, while component-based metrics perfectly separated 2- and 18-hour networks (AUC = 1.00). Radial zone analysis revealed that vascular distribution becomes more compartmentalized over time, with increasing standard deviation and coefficient of variation. This approach provides a sensitive and scalable method for quantifying angiogenic dynamics, offering insight into both therapeutic efficacy and disease-related vascular remodeling.
Biodata Mining, vol. 18, no. 1, article 69. Publication type: Journal Article.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12492523/pdf/
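Three of the graph metrics named in the abstract above (average node degree, clustering coefficient, largest component size) can be computed from an edge list with a few lines of plain Python. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the graph is undirected and unweighted, and the step that turns a skeletonized image into edges is not shown because the abstract does not describe it.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_degree(adj):
    """Mean number of neighbours per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbours count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist between pairs of neighbours of this node.
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def largest_component_size(adj):
    """Number of nodes in the largest connected component (breadth-first search)."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best
```

For example, a triangle with one pendant node plus a separate two-node fragment, `build_graph([(0, 1), (1, 2), (2, 0), (2, 3), (4, 5)])`, has a largest component of 4 nodes; a fragmented early network would show a smaller value than a later, integrated one.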

Citations: 0

Cross-regional radiomics: a novel framework for relationship-based feature extraction with validation in Parkinson's disease motor subtyping.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-09-29 DOI: 10.1186/s13040-025-00483-4

Mahboube Sadat Hosseini, Seyyed Mahmoud Reza Aghamiri, Mehdi Panahi

Abstract: Traditional radiomics approaches focus on single-region feature extraction, limiting their ability to capture complex inter-regional relationships crucial for understanding pathophysiological mechanisms in complex diseases. This study introduces a novel cross-regional radiomics framework that systematically extracts relationship-based features between anatomically and functionally connected brain regions. We analyzed T1-weighted magnetic resonance imaging (MRI) data from 140 early-stage Parkinson's disease patients (70 tremor-dominant, 70 postural instability gait difficulty) from the Parkinson's Progression Markers Initiative (PPMI) database across multiple imaging centers. Eight bilateral motor circuit regions (putamen, caudate nucleus, globus pallidus, substantia nigra) were segmented using standardized atlases. Two feature sets were developed: 48 traditional single-region of interest (ROI) features and 60 novel motor-circuit features capturing cross-regional ratios, asymmetry indices, volumetric relationships, and shape distributions. Six feature engineering scenarios were evaluated using center-based 5-fold cross-validation with six machine learning classifiers to ensure robust generalization across different imaging centers. Motor-circuit features demonstrated superior performance compared to single-ROI features across enhanced preprocessing scenarios. Peak performance was achieved with an area under the curve (AUC) of 0.821 ± 0.117 versus 0.650 ± 0.220 for single-ROI features (p = 0.0012, Cohen's d = 0.665). Cross-regional ratios, particularly putamen-substantia nigra relationships, dominated the most discriminative features. Motor-circuit features showed superior generalization across multi-center data and better clinical utility through decision curve analysis and calibration curves. The proposed cross-regional radiomics framework significantly outperforms traditional single-region approaches for Parkinson's disease motor subtype classification. This methodology provides a foundation for advancing radiomics applications in complex diseases where inter-regional connectivity patterns are fundamental to pathophysiology.
Biodata Mining, vol. 18, no. 1, article 67. Publication type: Journal Article.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12482597/pdf/
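To make the idea of relationship-based features concrete, the sketch below derives left/right asymmetry indices and between-region volume ratios from a table of regional volumes. It is an illustration only, not the paper's feature definitions: the asymmetry formula (L - R) / (L + R), the `_L`/`_R` key convention, the feature names, and the example volumes are all assumptions.

```python
def asymmetry_index(left, right):
    """Signed left/right asymmetry in [-1, 1]; one common convention, assumed here."""
    return (left - right) / (left + right)


def motor_circuit_features(volumes):
    """Derive relationship-based features from per-hemisphere volumes.

    `volumes` maps hypothetical keys such as "putamen_L" and "putamen_R"
    to positive numbers. Returns one asymmetry index per region and one
    bilateral volume ratio per pair of regions.
    """
    regions = sorted({name.rsplit("_", 1)[0] for name in volumes})
    features = {}
    for region in regions:
        features[f"{region}_asym"] = asymmetry_index(
            volumes[f"{region}_L"], volumes[f"{region}_R"]
        )
    for i, a in enumerate(regions):
        for b in regions[i + 1:]:
            total_a = volumes[f"{a}_L"] + volumes[f"{a}_R"]
            total_b = volumes[f"{b}_L"] + volumes[f"{b}_R"]
            features[f"{a}_to_{b}_ratio"] = total_a / total_b
    return features
```

With illustrative volumes `{"putamen_L": 4.0, "putamen_R": 6.0, "substantia_nigra_L": 0.5, "substantia_nigra_R": 0.5}`, the function returns a putamen asymmetry index, a substantia nigra asymmetry index, and a putamen-to-substantia-nigra ratio, none of which a single-region feature set would contain.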

Citations: 0

Proteome mining of Yersinia Enterocolitica for drug targets and computational inhibitor identification with ADMET, anti-inflammation potential and formulation characteristics.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-09-29 DOI: 10.1186/s13040-025-00482-5

Zarrin Basharat, Youssef Saeed Alghamdi, Mutaib M Mashraqi, Hanan A Ogaly, Fatimah A M Al-Zahrani, Calvin R Wei, Ibrar Ahmed, Seil Kim

Abstract: Yersinia enterocolitica infection can manifest as self-limiting gastroenteritis and may lead to more severe conditions, such as mesenteric lymphadenitis, reactive arthritis, or rare systemic infections. Fluoroquinolones and third-generation cephalosporins are the most effective treatment options, but the effectiveness of tetracyclines and co-trimoxazole may vary with resistance patterns. To explore new therapeutic options in case of antibiotic resistance, we initially mined drug targets from the Yersinia enterocolitica proteome using a subtractive proteomics approach. Subsequently, we repurposed FDA-approved and Traditional Chinese Medicine (TCM) compounds against its cell wall synthesis mechanism by targeting DD-transpeptidase. DrugRep screening prioritized FDA-approved hits (Digitoxin, Irinotecan, Acetyldigitoxin; ≤ -9.4 kcal/mol) and TCM hits (Vaccarin, Narirutin, Hinokiflavone; ≤ -9.5 kcal/mol). Machine learning-based validation identified Hinokiflavone and Acetyldigitoxin as the most potent binders. Molecular dynamics simulations (100 ns) revealed RMSD values < 1 nm for all complexes, indicating stable binding. ADMET profiling predicted all compounds to be non-allergenic and the TCM compounds to have poor absorption. SBE-β-cyclodextrin coupling with FormulationAI showed improved compound solubility and oral bioavailability. InflamNat predicted strong anti-inflammatory potential for Hinokiflavone, highlighting its dual role in antibacterial and host-directed immunomodulatory activity. These computational insights mark an initial step in drug discovery, prompting comprehensive testing of prioritized compounds against Yersinia enterocolitica.
Biodata Mining, vol. 18, no. 1, article 68. Publication type: Journal Article.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12482588/pdf/

Citations: 0

Decoding ancestry-specific genetic risk: interpretable deep feature selection reveals prostate cancer SNP disparities in diverse populations.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-09-29 DOI: 10.1186/s13040-025-00470-9

Zhong Chen, Zichen Lao, You Lu, Wensheng Zhang, Andrea Edwards, Kun Zhang

Abstract:
Background: The clinical potential of single nucleotide polymorphisms (SNPs) in prostate cancer (PCa) diagnosis has been extensively explored using conventional statistical and machine learning approaches. However, the predictive power and interpretability of these methods remain inadequate for clinical translation, primarily due to limited generalization across high-dimensional SNP datasets. This study addresses the contested diagnostic utility of SNPs by integrating interpretable feature selection with deep learning to enhance both classification performance and biological relevance.
Methods: We propose an interpretable deep feature selection framework designed to enhance both the classification performance and biological relevance of SNP markers in distinguishing between benign and malignant prostate cancer samples. This study specifically investigates the debated diagnostic value of SNPs in PCa classification by integrating feature selection with deep learning to uncover actionable insights. Specifically, our framework comprises four key components: (1) heuristic feature reduction, which eliminates irrelevant SNPs during gradient computation for training deep neural networks (DNNs); (2) iterative SNP subset optimization, which aims to maximize classification AUC during model training; (3) gradient variance minimization, which mitigates instability caused by limited sample sizes; and (4) nonlinear interaction modeling, which extracts high-level SNP interactions through hierarchical representations.
Results: Evaluated on the PLCO, BPC3, and MEC-AA datasets, our method achieved mean AUC scores of 0.747, 0.751, and 0.559, respectively, demonstrating statistically significant improvements (p < 0.05, paired t-test) over existing approaches. Notably, the lower AUC for MEC-AA may reflect inherent population-specific complexities, as this dataset focuses on African American men, a group historically underrepresented in genomic studies. For interpretability, our framework identified 345, 373, and 437 consensus SNP markers across the PLCO, BPC3, and MEC-AA cohorts, respectively. Key SNPs were further validated against prior research on PCa racial disparities: rs10086908 and rs2273669 (PLCO); rs12284087, rs902774, rs9364554, and rs7611694 (BPC3); and rs3123078 and rs1447295 (MEC-AA) exhibited strong concordance with established loci linked to ethnic-specific risk profiles. For instance, rs1447295 on chromosome 8q24, recurrently associated with African ancestry, underscores the method's ability to recover population-relevant variants.
Conclusion: By synergizing interpretable feature selection with deep learning, this work advances the translation of SNP-based biomarkers into clinically actionable tools while clarifying their contested diagnostic role in PCa.
Biodata Mining, vol. 18, no. 1, article 66. Publication type: Journal Article.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12481780/pdf/

Citations: 0

MoRFs_TransFuse: a MoRFs predictor based on multimodal feature fusion and the lightweight Transformer network.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-09-29 DOI: 10.1186/s13040-025-00481-6

Lele Zhang, Hao He, Xuesen Shi

Abstract: Molecular recognition features (MoRFs) can facilitate specific protein-protein interactions by undergoing disorder-to-order transitions when binding to their protein partners. Thus, it is essential to accurately predict MoRFs. In this paper, we propose a MoRFs prediction method, named MoRFs_TransFuse, based on multimodal feature fusion and a lightweight Transformer network. To construct high-quality biological features, MoRFs_TransFuse integrates physicochemical properties, evolutionary features, and pre-trained model embeddings, while retaining optimal feature combinations through multi-window extraction and Random Forest secondary screening. In terms of architecture, MoRFs_TransFuse overcomes the limitations of modeling long-range dependencies by using a self-attention mechanism to accurately capture long-range residue associations in protein sequences. Comparative experiments on benchmark datasets show that MoRFs_TransFuse significantly outperforms existing single-component and combined-component predictors. Additionally, the lightweight design greatly improves computational efficiency while ensuring prediction accuracy.
Biodata Mining, vol. 18, no. 1, article 65. Publication type: Journal Article.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12482271/pdf/
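The abstract above mentions multi-window feature extraction without giving details. One plausible reading is that each residue is described by the average of a per-residue score over windows of several sizes centred on it. The sketch below illustrates that reading only; the function names, the default window sizes, the use of odd window lengths, and the truncation of windows at sequence ends are all assumptions.

```python
def window_means(values, window):
    """Mean of `values` in a centred window of odd length `window` around each index.

    Windows are truncated at the sequence ends rather than padded.
    """
    half = window // 2
    means = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        segment = values[lo:hi]
        means.append(sum(segment) / len(segment))
    return means


def multi_window_features(values, windows=(1, 3, 5)):
    """Return one feature row per residue, with one column per window size."""
    columns = [window_means(values, w) for w in windows]
    return [list(row) for row in zip(*columns)]
```

Here `values` would be any per-residue numeric score (for instance a physicochemical property); small windows keep local detail while larger ones summarize the surrounding sequence context.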

Citations: 0

Temporal phenotyping and prognostic stratification of patients with sepsis through longitudinal clustering.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-09-26 DOI: 10.1186/s13040-025-00480-7

Patrizia Ribino, Maria Mannone, Claudia Di Napoli, Giovanni Paragliola, Davide Chicco, Francesca Gasparini

Abstract: Sepsis is a critical medical condition characterized by a highly variable and rapidly evolving clinical course, often necessitating early intervention and tailored treatment plans to improve patient outcomes. Due to its complexity and heterogeneity, understanding the progression of sepsis across different patient populations remains a significant challenge. In this study, we use an analytical framework based on k-means multivariate longitudinal clustering to capture the diverse trajectories of sepsis. We do so by analyzing multiple clinical parameters tracked over time, providing a nuanced view of disease progression. By incorporating Dynamic Time Warping (DTW) as the distance metric, the proposed method accounts for temporal misalignments and variability in the rate of disease progression, an essential capability given the unpredictable and heterogeneous nature of sepsis. This integration enhances the model's ability to detect distinct temporal patterns and phenotypic subgroups that may remain undetected using conventional analytical approaches. By leveraging sepsis-related electronic health records (EHRs), which provide rich time-series data on laboratory results along with patient demographics and underlying health conditions, the proposed method reveals distinct sepsis phenotypes that reflect variations in disease progression. We perform several experiments varying the number of clusters and clinical variable combinations, evaluating clustering performance using the Silhouette score, Calinski-Harabasz Index, and Davies-Bouldin Index as reference quality metrics. Our results confirm the prognostic role of the Thrombin-Antigen complex and the Prothrombin Time-International Normalized Ratio for septic patients. Furthermore, to evaluate the relevance of subjects' stratification, the Adjusted Rand Index is used to quantify the survival prediction capability of our longitudinal clustering method, considering the 28-day death feature as the target variable. The same metric demonstrates that our proposal outperforms other longitudinal clustering algorithms available in the literature.
Biodata Mining, vol. 18, no. 1, article 64. Publication type: Journal Article.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12465323/pdf/
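Dynamic Time Warping, which the abstract above uses as the clustering distance, can be shown with the classic dynamic-programming recurrence. The sketch below is a minimal univariate version with absolute difference as the local cost; the study works with multivariate trajectories, so the single-variable input and the function name are simplifying assumptions.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first i
    points of `a` with the first j points of `b`, where each step may
    advance one sequence, the other, or both.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Two trajectories with the same shape but different pacing, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, have a DTW distance of 0, whereas a pointwise comparison cannot even be computed for sequences of different length. That tolerance to timing differences is what makes DTW suitable for patients whose disease progresses at different rates.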

Citations: 0

Construction and validation of a machine learning-based model predicting early readmission in patients with decompensated cirrhosis: a prospective two-center cohort study.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-09-24 DOI: 10.1186/s13040-025-00479-0

Fang Yang, Jia Li, Ziyi Yang, Liping Wu, Han Wang, Chao Sun

Abstract:
Background: Early 30-day readmission remains a significant socioeconomic and healthcare burden in the context of decompensated cirrhosis. Early recognition and accurate identification are crucial. However, current evidence is elusive and traditional scores of liver disease severity lack specificity and sensitivity. We sought to construct and validate an explainable machine learning (ML)-based prediction model, and evaluate its prognostic implementation in patients readmitted due to acute episodes. The prediction model for discovery and validation was based on a two-center prospective investigation. Our discovery sample, comprising 636 patients with cirrhosis, was divided into a training set and a test set, with an additional cohort of 150 patients serving as an external validation. Eleven ML methods were applied to establish an indicative model based on a variety of easily accessible variables from the electronic health record. The area under the ROC curve (AUC), alongside several evaluation parameters, was used to compare predictive performance. For feature importance and final model explanation, we adopted the SHapley Additive exPlanation method for ranking. Furthermore, prognostic implementation was verified by subgrouping according to the final model and clinical outcomes during follow-up.
Results: Among all 11 ML algorithms, the random forest (RF) algorithm showed the best discriminatory capability. Feature reduction generated a final 7-feature RF model with explainability based on the importance ranking. The constructed model achieved moderately accurate prediction in internal and external validation, with respective AUCs of 0.853 and 0.838, and was further transformed into an online tool to facilitate daily practice. Patients flagged as positive by the prediction model had more severe underlying disease and poor psychophysiologic reserve.
Conclusions: The final explainable ML model was capable of predicting early readmission and was closely connected with adverse outcomes in individual patients experiencing decompensated cirrhosis. Notably, it allayed the "black-box" concerns inherent to ML techniques with an indirect interpretation.
Biodata Mining, vol. 18, no. 1, article 63. Publication type: Journal Article.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12462353/pdf/
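The AUC values reported above have a simple interpretation: the probability that a randomly chosen readmitted patient receives a higher predicted score than a randomly chosen non-readmitted patient. The sketch below computes AUC directly from that definition by comparing all positive/negative pairs. It is an illustration with a hypothetical function name and toy data, not the study's evaluation code, and the quadratic pairwise loop is only suitable for small samples.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    `labels` holds 1 for the positive class and 0 for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For the toy input `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])`, three of the four positive/negative pairs are ranked correctly, giving 0.75. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.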

Citations: 0

Investigating causal effects of HDL-C on cognitive function through cross-sectional and Mendelian randomization analyses: concentration-response patterns and clues for Alzheimer's disease prevention.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-09-22 DOI: 10.1186/s13040-025-00484-3

Longmin Fan, Haitao Jiang, Zheyu Zhang

Abstract:
Background: Disrupted cholesterol homeostasis may accelerate cognitive aging. This study investigated the relationship between serum HDL-C levels and cognitive function, utilizing cross-sectional data and Mendelian randomization (MR) analysis.
Methods: A cross-sectional study was conducted using data from the National Health and Nutrition Examination Survey (NHANES) 2011-2014, including 19,931 participants. Among them, 2,777 individuals aged 60 years and older with complete HDL-C levels and cognitive function data were included. Cognitive function was assessed using tests such as the Consortium to Establish a Registry for Alzheimer's Disease Immediate and Delayed Recall, the Animal Fluency Test, and the Digit Symbol Substitution Test. Additionally, MR analysis was employed to assess the causal relationship between genetically predicted HDL-C and dementia.
Results: Gender-stratified analyses revealed sex-specific patterns in the relationship between HDL-C and cognitive function. In fully adjusted linear models, men showed consistently positive associations across all cognitive domains, including delayed recall (β = 0.10, 95% CI 0.04-0.17, p < 0.001), immediate recall (β = 0.06, 95% CI 0.00-0.12, p = 0.047), verbal fluency (β = 0.20, 95% CI 0.14-0.26, p < 0.001), processing speed (β = 0.09, 95% CI 0.05-0.14, p < 0.001), and overall composite score (β = 0.45, 95% CI 0.29-0.62, p < 0.001). In women, these associations were attenuated or non-significant for immediate recall, delayed recall, and composite cognition, suggesting non-linearity. Further concentration-response analyses revealed a linear positive association in men and an inverted U-shaped relationship in women. MR analyses indicated a protective association between genetically predicted HDL-C and Alzheimer's disease risk (OR = 0.51, 95% CI 0.29-0.89, p = 0.019). However, sensitivity analyses revealed attenuation after MR-PRESSO outlier correction (β = -0.013, p = 0.756) and inconsistent estimates across methods, with significant heterogeneity (Q-test p < 0.001) and evidence of pleiotropy. In multivariable analysis, adjusting for LDL-C and TG, IVW (β = 0.290, p = 0.048) and Lasso regression (β = 0.752, p = 0.008) indicated weak positive correlations. However, MR-Egger (β = 0.752, p = 0.008) revealed potential pleiotropic interference (intercept p = 0.050).
Conclusions: Our findings suggest that maintaining optimal serum HDL-C levels may help preserve cognitive function in older adults. Notably, sex-specific associations were observed, warranting further investigation into underlying mechanisms.
Biodata Mining, vol. 18, no. 1, article 62. Publication type: Journal Article.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12455801/pdf/
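The IVW estimate mentioned in the abstract above refers to the inverse-variance weighted method, which combines per-variant effects into one causal estimate. The sketch below implements the standard fixed-effect form, in which the estimate is the weighted regression slope of outcome effects on exposure effects through the origin, weighted by the inverse variance of the outcome effects. The function name and the example numbers are hypothetical, and the study's actual instruments, software, and multivariable adjustments are not reproduced here.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted Mendelian randomization estimate.

    Each list holds one value per genetic variant: the variant's effect on
    the exposure, its effect on the outcome, and the standard error of the
    outcome effect. Returns (causal_estimate, standard_error).
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error
```

When the outcome is binary and effects are on the log-odds scale, `math.exp(estimate)` converts the result to an odds ratio. If every variant's outcome effect is exactly half its exposure effect, the function returns an estimate of 0.5, which is a quick sanity check of the weighting.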

Citations: 0

Identification of severity related mutation hotspots in SARS-CoV-2 using a density-based clustering approach.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-09-01 DOI: 10.1186/s13040-025-00476-3

Sohyun Youn, Dabin Jeong, Hwijun Kwon, Eonyong Han, Sun Kim, Inuk Jung

Abstract:
Background: The immune response to SARS-CoV-2 varies greatly among individuals, yielding widely differing severity levels among patients. While there are various methods to identify severity-associated biomarkers in COVID-19 patients, we investigated highly mutated regions, or mutation hotspots, within the SARS-CoV-2 genome that correlate with patient severity levels. SARS-CoV-2 mutation hotspots were searched for in the GISAID database using a density-based clustering algorithm, Mutclust, that searches for loci with high mutation density and diversity.
Results: Using Mutclust, 477 mutation hotspots were identified in the SARS-CoV-2 genome, of which 28 showed significant association with severity levels in a multi-omics COVID-19 cohort comprising 387 infected patients. The patients were further stratified into moderate and severe patient groups based on the 28 severity-related mutation hotspots, which showed distinctive cytokine and gene expression levels in both cytokine profile and single-cell RNA-seq samples. The effect of the SARS-CoV-2 mutation hotspots on human genes was further investigated by network propagation analysis, where two mutation hotspots specific to the severe group showed association with NK cell activity. One of them was shown to decrease the affinity between the viral epitope of the hotspot region and its binding HLA when compared to the non-mutated epitope.
Conclusion: Genes related to the immunological function of NK cells, especially the NK cell receptor and co-activating receptor genes, were significantly dysregulated in the severe patient group at both cytokine and single-cell levels. Collectively, mutation hotspots associated with severity and their related NK cell associated gene expression regulation were identified.
Biodata Mining, vol. 18, no. 1, article 61. Publication type: Journal Article.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12400602/pdf/
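The abstract above does not describe how Mutclust works internally, so the sketch below is not that algorithm. It only illustrates the general idea of density-based clustering on genome coordinates, using a one-dimensional DBSCAN-style rule: a position is a core point if enough mutations fall within a fixed distance of it, and neighbouring core points are merged into one hotspot. The function names, the parameters `eps` and `min_pts`, and the example coordinates are assumptions.

```python
from bisect import bisect_left, bisect_right


def _span(points, core_run, eps):
    """Extend a run of core points to include border points within eps of its ends."""
    start = points[bisect_left(points, core_run[0] - eps)]
    end = points[bisect_right(points, core_run[-1] + eps) - 1]
    return (start, end)


def find_hotspots(positions, eps, min_pts):
    """Return (start, end) spans of dense regions among 1-D mutation positions.

    A position is a core point when at least `min_pts` positions (itself
    included) lie within `eps` of it. Consecutive core points no more than
    `eps` apart belong to the same hotspot. Isolated positions are ignored.
    """
    points = sorted(positions)
    cores = [
        p for p in points
        if bisect_right(points, p + eps) - bisect_left(points, p - eps) >= min_pts
    ]
    hotspots = []
    run = []
    for p in cores:
        if run and p - run[-1] > eps:
            hotspots.append(_span(points, run, eps))
            run = []
        run.append(p)
    if run:
        hotspots.append(_span(points, run, eps))
    return hotspots
```

With hypothetical positions `[100, 101, 102, 103, 500, 900, 902, 904]`, `eps=3`, and `min_pts=3`, the function reports two hotspots, (100, 103) and (900, 904), and discards the isolated mutation at 500 as noise.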

Citations: 0