Biodata Mining: Latest Articles (Page 2)

Improving classification on imbalanced genomic data via KDE-based synthetic sampling.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-08-29 DOI: 10.1186/s13040-025-00474-5

Edoardo Taccaliti, Jesus S Aguilar-Ruiz

Abstract: Class imbalance poses a serious challenge in biomedical machine learning, particularly in genomics, where datasets are characterized by extremely high dimensionality and very limited sample sizes. In such settings, standard classifiers tend to favor the majority class, leading to biased predictions, an especially problematic issue in clinical diagnostics where rare conditions must not be overlooked. In this study, we introduce a Kernel Density Estimation (KDE)-based oversampling approach to rebalance imbalanced genomic datasets by generating synthetic minority class samples. Unlike conventional methods such as SMOTE, KDE estimates the global probability distribution of the minority class and resamples accordingly, avoiding local interpolation pitfalls. We evaluate our method on 15 real-world genomic datasets using three classifiers (Naïve Bayes, Decision Trees, and Random Forests) and compare it to SMOTE and baseline training. Experimental results demonstrate that KDE oversampling consistently improves classification performance, especially in metrics robust to imbalance, such as AUC of the IMCP curve. Notably, KDE achieves superior results in tree-based models while dramatically simplifying the sampling process. This approach offers a statistically grounded and effective solution for balancing genomic datasets, with strong potential for improving fairness and accuracy in high-stakes medical decision-making.

Biodata Mining 18(1):60. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12395650/pdf/

Citations: 0
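
The KDE-based oversampling described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian kernel with a per-feature (diagonal) bandwidth chosen by Scott's rule of thumb, and the function name `kde_oversample` and the toy data are hypothetical. Sampling from a Gaussian KDE amounts to picking a stored minority sample uniformly at random and perturbing it with Gaussian noise scaled by the bandwidth.

```python
import random
import statistics


def kde_oversample(minority, n_new, seed=0):
    """Draw n_new synthetic rows from a Gaussian KDE fitted to `minority`.

    `minority` is a list of equal-length numeric rows (at least two rows).
    """
    rng = random.Random(seed)
    n, d = len(minority), len(minority[0])
    # Scott's rule of thumb for the bandwidth factor (an assumption here).
    factor = n ** (-1.0 / (d + 4))
    # Per-feature bandwidth = factor * sample standard deviation.
    bandwidths = [
        factor * statistics.stdev([row[j] for row in minority])
        for j in range(d)
    ]
    synthetic = []
    for _ in range(n_new):
        centre = rng.choice(minority)  # pick one kernel centre at random
        synthetic.append(
            [rng.gauss(centre[j], bandwidths[j]) for j in range(d)]
        )
    return synthetic


# Toy minority class with two features (made-up values).
minority = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.2], [1.1, 2.1]]
synthetic = kde_oversample(minority, n_new=6)
print(len(synthetic))  # 6
```

SMOTE, by contrast, interpolates between a sample and one of its nearest neighbours. With thousands of features and few samples, as in genomic data, the bandwidth choice would likely need more care than a rule of thumb.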

Development of an AI-powered AR glasses system for real-time first aid guidance in emergency situations.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-08-26 DOI: 10.1186/s13040-025-00473-6

Mohammed Abo-Zahhad, Mostafa N Zakaria, Farida M Sharaf, May M Ismaiel, Habiba Hafrag, Yousef M Amer

Citations: 0

Mapping the evolving trend of research on efferocytosis: a comprehensive data-mining-based study.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-08-25 DOI: 10.1186/s13040-025-00475-4

Yanpeng Jian, Shijia Dong, Weijie Liu, Genfeng Li, Xiaoyu Lian, Yigong Wang

Abstract:

Background: Efferocytosis, the process by which apoptotic cells are recognized and removed by phagocytes, plays a critical role in maintaining tissue homeostasis and modulating inflammatory responses. Over recent decades, an increasing number of studies have investigated the molecular mechanisms and clinical implications of efferocytosis. This bibliometric analysis aims to map the evolving trends, identify key contributors, and outline emerging research themes in this field.

Methods: A comprehensive search was conducted in the Web of Science database to collect literature related to efferocytosis from 2006 to 2024. The dataset was analyzed using several tools, including CiteSpace and VOSviewer. Analyses included evaluation of publication trends, citation networks, keyword co-occurrence, and co-cited references. Key metrics such as the most prolific authors, top contributing countries, and major research clusters were identified to understand the field's evolution and interdisciplinary collaborations.

Results: The final dataset comprised 1549 scholarly works, consisting of 1166 original research articles and 383 review papers. The analysis revealed a steady increase in the number of publications concerning efferocytosis, particularly in the past decade. Geographically, China and the United States emerged as dominant contributors, representing over 64.4% of total publications. Among institutions, Harvard University demonstrated the highest research output in this field. Keyword analysis showed that current research focuses on the molecular mechanisms and signaling regulation of efferocytosis, macrophage polarization and inflammatory modulation, and the pathological implications and therapeutic potential of efferocytosis in diseases. Inflammation, atherosclerosis, cardiovascular disease, myocardial infarction, and COPD are the diseases that have received the most attention in this field. Several research topics, including nanoparticles, neuroinflammation, fibrosis, immunometabolism, exosomes, apoptotic bodies, mesenchymal stem cells, aging, microglia, reactive oxygen species, CD47, lipid metabolism, immunotherapy, mitochondria, and ferroptosis, may become hot topics in the near future. Gene-focused investigations identified TNF, MERTK, IL10, IL6, and IL1b as the most extensively studied genetic elements in efferocytosis research.

Conclusions: This bibliometric study provides a comprehensive overview of the evolving research landscape in efferocytosis. These insights not only highlight the current milestones but also serve as a valuable guide for future research and policy-making aimed at harnessing efferocytosis for therapeutic innovations.

Biodata Mining 18(1):58. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12376401/pdf/

Citations: 0

The application of artificial intelligence models in predicting the risk of diabetic foot: a multicenter study.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-08-21 DOI: 10.1186/s13040-025-00477-2

Yao Li, Siyuan Zhou, Bichen Ren, Shuai Ju, Xiaoyan Li, Wenqiang Li, Bingzhe Li, Yunmin Cai, Chunlei Chang, Lihong Huang, Zhihui Dong

Abstract: This study explores diabetic foot (DF), a severe complication in diabetes, by combining deep learning (DL) and machine learning (ML) to develop a multi-model prediction tool. Early identification of high-risk DF patients can reduce disability and mortality. The research also aims to create an integrated application to assist clinicians in precise, efficient risk assessment for early intervention. In this multicenter retrospective study, 6,180 elderly diabetic patients (aged 60-85) were enrolled from 11 community hospitals in Shanghai in 2024. Lasso regression was used to identify 16 key DF risk factors, including age, MMSE score, lower limb discomfort, ABI, and hematocrit. Fourteen ML models (RF, XGBoost, CART, MLP, etc.) and three DL models (DNN, CNN, Transformer) were trained, with hyperparameters optimized via cross-validation and grid search. An application was developed integrating these models, offering both single and batch prediction options with visualization tools for clinical use. Experimental results showed the logistic regression ensemble model achieved robust performance, with AUC values of 0.943 (validation set, 95% CI: 0.935-0.951) and 0.938 (test set, 95% CI: 0.929-0.947), along with high accuracy, precision, recall, and F1 scores. SHAP analysis revealed key predictive features including ABI results, lower limb discomfort, and MMSE score. The developed app integrates multiple models, compares their predictions for different clinical scenarios, and enhances prediction transparency and reliability. The multi-model approach demonstrates strong predictive performance for DF risk, offering clinicians an intuitive and accurate assessment tool tailored to individual patients. By combining multiple models, we enhance result stability and clinical applicability compared to single-model approaches. Future work will focus on algorithm optimization, expanded datasets, and real-time monitoring integration to enable more precise, dynamic risk evaluation for improved DF prevention and early intervention.

Biodata Mining 18(1):57. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12372307/pdf/

Citations: 0

A simple guide to the use of Student's t-test, Mann-Whitney U test, Chi-squared test, and Kruskal-Wallis test in biostatistics.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-08-20 DOI: 10.1186/s13040-025-00465-6

Davide Chicco, Andrea Sichenze, Giuseppe Jurman

Abstract: In an age when machine learning and artificial intelligence are broadly employed, traditional statistics can still provide insightful information and results quickly and at a low computational cost. Statistics, in fact, offers many useful tools to researchers, including a series of univariate statistical tests that can identify relationships between pairs of numeric samples: Student's t-test, Mann-Whitney U test, Chi-squared test, and Kruskal-Wallis test. These tests generate several outcomes, including probability values (p-values), which are compared against a chosen threshold to decide whether to reject the null hypothesis. Although effective, these tests are often misused or employed in the wrong contexts, especially in biostatistics studies. Many scientific researchers do not seem to know how to choose one test over the others, and this misuse can lead to incorrect results and wrong conclusions. Here we present a simple theoretical and practical guide to the use of these four tests, first describing their theoretical properties and then displaying the results obtained by applying these tests to real-world medical datasets. Finally, we explain when and how to use each test based on the data types of the samples considered. Our study can have a strong impact on scientific research by potentially influencing future studies involving these tests. Our recommendations, in turn, can help researchers produce more reliable and sound scientific results, thus increasing the quality of multiple scientific studies across various fields.

Biodata Mining 18(1):56. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12366075/pdf/

Citations: 0
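
Of the four tests named in the abstract above, the chi-squared test of independence is the easiest to compute by hand, so it makes a convenient worked example. The sketch below is not taken from the paper: the function name `chi_squared_statistic` and the 2x2 table of counts are hypothetical, and it computes only the Pearson statistic and degrees of freedom (no continuity correction, no p-value).

```python
def chi_squared_statistic(table):
    """Return (chi2, dof) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Made-up counts: rows = treatment groups, columns = outcome categories.
chi2, dof = chi_squared_statistic([[10, 20], [20, 10]])
print(round(chi2, 2), dof)  # 6.67 1
```

The statistic applies to counts of categorical outcomes. The t-test, Mann-Whitney U test, and Kruskal-Wallis test compare numeric or ordinal measurements between groups instead.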

Skin in the game: a review of computational models of the skin.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-08-19 DOI: 10.1186/s13040-025-00471-8

Seda Ceylan, Didem Demir, Cayla Harris, Semih Latif İpek, Vasileios Vavourakis, Marco Manca, Sandrine Dubrac, Roman Bauer

Abstract: With the vast advances in computing technology, computational (or in silico) modelling has emerged as a transformative tool in dermatology. These findings can provide novel insights into complex biological processes and aid in the development of innovative therapeutic and regenerative strategies for the skin. Modelling combines experimental data and knowledge across multiple disciplines, serving as a common framework to elucidate the workings of the skin. From a biomedical perspective, the mechanisms of skin diseases can be studied by simulating cellular interactions and signalling pathways. Computational investigations of these mechanisms can be categorised into two distinct approaches: data-driven and model-based. Data-driven approaches allow the diagnosis of skin diseases on the basis of data collection via imaging or feedback from portable sensors, often yielding performance exceeding that of their human counterparts. Model-based methods are well suited to address topics such as skin cell biology and biomechanics, contributing to wound healing and skin cancer research. Furthermore, such modelling has found utility in the development of virtual skin models and skin-on-chip devices, enabling the prediction of skin responses to various substances, including cosmetics and drugs. In the realm of dermatological surgery, computational tools have been instrumental in optimizing surgical planning and improving clinical outcomes. While significant advancements have been made, challenges such as data availability, model validation, and interdisciplinary collaboration persist. This review highlights the current state of the art in computational modelling in dermatology, identifies key challenges, and outlines its prospects.

Biodata Mining 18(1):55. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12366154/pdf/

Citations: 0

Exploring the common genetic basis of metabolic syndrome-related diseases and chronic kidney disease: insights from extensive genome-wide cross-trait analyses.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-08-17 DOI: 10.1186/s13040-025-00472-7

Yu Yin, Chenkai Zhao, Yibo Hua, Fei Yang, Dandan Qiu, Jiasheng Yan, Xiaodong Jin

Abstract:

Background: Chronic kidney disease (CKD) is a globally prevalent chronic condition characterized by progressive renal function decline, imposing significant economic and psychological burdens on patients. Metabolic syndrome (MetS), characterized by obesity, hypertension, hyperglycemia, and dyslipidemia, is a significant risk factor for CKD. A strong epidemiological association exists between CKD and MetS. This study explores the genetic connections between MetS-related diseases and CKD, focusing on identifying shared risk loci, key tissues, and underlying genetic mechanisms.

Methods: We performed a cross-trait pleiotropy analysis using summary-level GWAS data from ten MetS-related diseases and CKD obtained from the IEU database to detect shared pleiotropic loci and genes. Functional annotation and tissue-specific analyses were conducted to reveal potential associations between CKD and MetS. Additionally, we used metabolite colocalization methods to explore the metabolic perspective of these diseases' associations. Finally, Mendelian randomization (MR) was employed for further association analysis.

Results: The study identified shared genetic mechanisms between MetS-related diseases and CKD, revealing 1,437 pleiotropic loci at genome-wide significance. Forty-four dominant risk SNP loci were annotated, with 11 loci confirmed through causal colocalization analysis. Further gene-level analysis identified eight unique pleiotropic genes, including APOC1, APOE, BICC1, and PDILT. Pathway analysis identified the significant involvement of the Metabolism of Fat-Soluble Vitamins, Positive Regulation of Plasma Membrane-Bounded Cell Projection Assembly, and Positive Regulation of RNA Metabolic Process pathways in these diseases. Tissue enrichment analyses at the SNP and gene levels indicated that pleiotropic mechanisms play crucial roles in the Adipose Visceral Omentum, Brain Cerebellum, and Testis. Finally, phenotypic-level metabolite colocalization analysis revealed a metabolic intermediary mechanism linking MetS-related diseases and CKD.

Conclusion: This study uncovers the complex genetic interactions between CKD and MetS-related diseases, identifying shared genetic loci and biological pathways and providing novel insights for future therapeutic strategies.

Biodata Mining 18(1):54. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12359972/pdf/

Citations: 0

Short- and long-term weekly patient-reported outcomes prediction undergoing radiotherapy: single-patient time series model vs. transformer-based multi-patient time series model.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-08-12 DOI: 10.1186/s13040-025-00464-7

Yang Yan, Zhong Chen, Xinglei Shen, Ronald C Chen, Hao Gao

Abstract:

Background: Patient-reported outcomes (PROs) are direct reports from patients on health status, symptoms, quality of life, or treatment satisfaction, offering critical insights into subjective experiences that clinical metrics may overlook. Accurately predicting personalized short- and long-term weekly PROs during radiotherapy is essential for monitoring health status, optimizing treatment efficacy, and enabling timely interventions to manage side effects.

Methods: Based on a well-documented prostate cancer PRO dataset with 17 patients after pre-processing, this study evaluates single-patient time series models (i.e., vector autoregression (VAR) and VAR with incremental ground truth PRO data (VAR-Inc)) and a transformer-based multi-patient model (i.e., Temporal Fusion Transformer (TFT)) for short- and long-term weekly PRO prediction. VAR-Inc integrates follow-up PRO data to refine predictions, while TFT leverages multi-patient heterogeneous information to capture complex temporal patterns.

Results: Key experimental results on prostate cancer patients show that (1) VAR-Inc achieved superior performance (lower MAE/RMSE) over VAR, highlighting the importance of incremental PRO updates. (2) TFT significantly outperformed both VAR models in long-term prediction, with statistical significance, by utilizing multi-patient data. (3) TFT effectively captured weekly PRO trends and variations, aligning closely with ground truth. (4) Unlike single-patient models, TFT built robust predictive frameworks by integrating cross-patient similarities and complementary patients' PRO information. VAR-Inc's performance deteriorated with missing follow-up PROs, whereas TFT remained stable, overcoming this limitation. On average, TFT achieved the lowest MAE (0.7715), compared with 1.1329 for VAR and 0.8089 for VAR-Inc. TFT also achieved the lowest RMSE (0.9586), compared with 1.4817 for VAR and 1.0693 for VAR-Inc.

Conclusion: TFT emerges as a reliable approach for PRO prediction, excelling in long-term accuracy, trend capture, and resilience to data gaps by leveraging multi-patient information. Its ability to synthesize heterogeneous PRO data offers advantages over single-patient models, supporting personalized treatment adaptation and informed clinical decision-making. This underscores the potential of transformer-based models in enhancing PRO-driven radiotherapy management.

Biodata Mining 18(1):53. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12341308/pdf/

Citations: 0
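
The abstract above compares models by MAE and RMSE. As a reminder of how the two metrics are computed and why they can differ, here is a minimal sketch; the weekly scores are made-up numbers, not data from the study, and the function names are illustrative.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical weekly PRO scores for one patient and one model's forecast.
observed = [2, 3, 4, 5]
predicted = [2, 4, 4, 3]
print(mae(observed, predicted))             # 0.75
print(round(rmse(observed, predicted), 3))  # 1.118
```

RMSE is never smaller than MAE on the same errors, and the gap widens when a few predictions miss badly. This is one reason both are usually reported side by side.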

Exo-Tox: Identifying Exotoxins from secreted bacterial proteins.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-08-08 DOI: 10.1186/s13040-025-00469-2

Tanja Krueger, Damla A Durmaz, Luisa F Jimenez-Soto

Abstract:

Background: Bacterial exotoxins are secreted proteins that affect target cells and are associated with disease. Their accurate identification can enhance drug discovery and ensure the safety of bacteria-based medical applications. However, current toxin predictors prioritize broad coverage by mixing toxins from multiple biological kingdoms and diverse control sets. This general approach has proven sub-optimal for identifying niche toxins, such as bacterial exotoxins. Recent Protein Language Models offer an opportunity to improve toxin prediction by capturing global sequence context and biochemical properties from protein sequences.

Results: We introduce Exo-Tox, a specialized predictor trained exclusively on curated datasets of bacterial exotoxins and secreted non-toxic bacterial proteins, represented as embeddings by Protein Language Models. Compared to Basic Local Alignment Search Tool (BLAST)-based methods and generalized toxin predictors, Exo-Tox performs better across multiple metrics, achieving a Matthews correlation coefficient > 0.9. Notably, Exo-Tox's performance remains robust regardless of protein length or the presence of signal peptides. We analyze its limited transferability to bacteriophage proteins and non-secreted proteins.

Conclusion: Exo-Tox reliably identifies bacterial exotoxins, filling a niche overlooked by generalized predictors. Our findings highlight the importance of domain-specific training data and emphasize that specialized predictors are necessary for accurate classification. We provide open access to the model, training data, and usage guidelines via the LMU Munich Open Data repository.

Biodata Mining 18(1):52. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12333140/pdf/

Citations: 0
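
The headline metric in the abstract above is the Matthews correlation coefficient (MCC). For readers unfamiliar with it, the sketch below computes MCC from a binary confusion matrix. The function name and the counts are hypothetical, chosen only to show the arithmetic; they are not results from the Exo-Tox paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from binary confusion-matrix counts; ranges from -1 to +1."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Common convention: return 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Made-up counts: 50 toxins and 50 non-toxins, with a few errors.
print(round(matthews_corrcoef(tp=45, tn=48, fp=2, fn=5), 3))
# A perfect classifier scores exactly 1.0.
print(matthews_corrcoef(tp=50, tn=50, fp=0, fn=0))  # 1.0
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so a high value requires good performance on both the positive and the negative class.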

Drug repurposing for Alzheimer's disease using a graph-of-thoughts based large language model to infer drug-disease relationships in a comprehensive knowledge graph.

IF 6.1, Zone 3 (Biology)

Biodata Mining Pub Date : 2025-08-05 DOI: 10.1186/s13040-025-00466-5

Zhiping Paul Wang, Xi Li, Nicholas Matsumoto, Mythreye Venkatesan, Jui-Hsuan Chang, Jay Moran, Hyunjun Choi, Binglan Li, Yufei Meng, Miguel E Hernandez, Jason H Moore

Abstract: Drug repurposing (DR) offers a promising alternative to the high cost and low success rate of traditional drug development, especially for complex diseases like Alzheimer's disease (AD). This study addressed DR for AD from three key angles: (1) demonstrating how disease-specific knowledge graphs can improve DR performance, (2) evaluating the role of large language models (LLMs) in enhancing the usability and efficiency of these graphs, and (3) assessing whether Graph-of-Thoughts (GoT)-enhanced LLMs, when integrated with AD knowledge graphs, can outperform traditional machine learning and LLM-based approaches. We tested five distinct DR strategies (DR1-DR5) for AD: DR1, a machine learning method using TxGNN; DR2, a machine learning model leveraging the Alzheimer's KnowledgeBase (AlzKB); DR3, an LLM-based chatbot built on AlzKB; DR4, our ESCARGOT framework combining GoT-enhanced LLMs with AlzKB; and DR5, a general reasoning-driven LLM approach. Results showed that AlzKB significantly improved DR outcomes. ESCARGOT further enhanced performance while reducing the need for coding or advanced expertise in knowledge graph analysis. Because the architecture of AlzKB is easily adaptable to other diseases and ESCARGOT can integrate with various knowledge graph platforms, this framework offers a broadly applicable, innovative tool for accelerating drug discovery through repurposing.

Biodata Mining 18(1):51. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12326721/pdf/

Citations: 0