Biodata Mining | Pub Date: 2025-01-04 | DOI: 10.1186/s13040-024-00414-9
Xingyu Li, Lu Peng, Yu-Ping Wang, Weihua Zhang
{"title":"Open challenges and opportunities in federated foundation models towards biomedical healthcare.","authors":"Xingyu Li, Lu Peng, Yu-Ping Wang, Weihua Zhang","doi":"10.1186/s13040-024-00414-9","DOIUrl":"10.1186/s13040-024-00414-9","url":null,"abstract":"<p>This survey explores the transformative impact of foundation models (FMs) in artificial intelligence, focusing on their integration with federated learning (FL) in biomedical research. Foundation models such as ChatGPT, LLaMA, and CLIP, which are trained on vast datasets through methods including unsupervised pretraining, self-supervised learning, instruction fine-tuning, and reinforcement learning from human feedback, represent significant advancements in machine learning. These models, with their ability to generate coherent text and realistic images, are crucial for biomedical applications that require processing diverse data forms such as clinical reports, diagnostic images, and multimodal patient interactions. Combining FL with these sophisticated models is a promising strategy for harnessing their analytical power while safeguarding the privacy of sensitive medical data. This approach not only enhances the capabilities of FMs in medical diagnostics and personalized treatment but also addresses critical concerns about data privacy and security in healthcare. This survey reviews current applications of FMs in federated settings, highlights the open challenges, and identifies future research directions, including scaling FMs, managing data diversity, and improving communication efficiency within FL frameworks. The objective is to encourage further research into the combined potential of FMs and FL, laying the groundwork for healthcare innovations.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"2"},"PeriodicalIF":4.0,"publicationDate":"2025-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142928515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
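The survey's premise is that FL lets institutions train a shared model without pooling raw patient data. As a minimal sketch, assuming the widely used FedAvg aggregation rule (the survey abstract does not prescribe a specific one), a server averages client parameters weighted by each client's sample count. The parameter names, hospitals, and sizes below are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def fed_avg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Sample-size-weighted average of client parameters (FedAvg-style aggregation)."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one sample size per client")
    total = sum(client_sizes)
    aggregated: Params = {}
    for name, first in client_params[0].items():
        acc = [0.0] * len(first)
        for params, n in zip(client_params, client_sizes):
            weight = n / total  # larger sites contribute proportionally more
            for i, value in enumerate(params[name]):
                acc[i] += weight * value
        aggregated[name] = acc
    return aggregated


# Hypothetical parameters from two hospitals; only these numbers leave each site,
# never the underlying patient records.
hospital_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
hospital_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
global_model = fed_avg([hospital_a, hospital_b], client_sizes=[100, 300])
print(global_model)  # {'layer.weight': [2.5, 3.5], 'layer.bias': [0.75]}
```

In a real deployment the server would broadcast `global_model` back to the sites for the next local training round; communication cost per round scales with parameter count, which is why the survey flags communication efficiency as a challenge for large FMs.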
Biodata Mining | Pub Date: 2024-12-30 | DOI: 10.1186/s13040-024-00417-6
Jakub Horvath, Pavel Jedlicka, Marie Kratka, Zdenek Kubat, Eduard Kejnovsky, Matej Lexa
{"title":"Correction: Detection and classification of long terminal repeat sequences in plant LTR-retrotransposons and their analysis using explainable machine learning.","authors":"Jakub Horvath, Pavel Jedlicka, Marie Kratka, Zdenek Kubat, Eduard Kejnovsky, Matej Lexa","doi":"10.1186/s13040-024-00417-6","DOIUrl":"10.1186/s13040-024-00417-6","url":null,"abstract":"","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"62"},"PeriodicalIF":4.0,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11687018/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142907814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining | Pub Date: 2024-12-28 | DOI: 10.1186/s13040-024-00413-w
Zhendong Sha, Philip J Freda, Priyanka Bhandary, Attri Ghosh, Nicholas Matsumoto, Jason H Moore, Ting Hu
{"title":"Distinct network patterns emerge from Cartesian and XOR epistasis models: a comparative network science analysis.","authors":"Zhendong Sha, Philip J Freda, Priyanka Bhandary, Attri Ghosh, Nicholas Matsumoto, Jason H Moore, Ting Hu","doi":"10.1186/s13040-024-00413-w","DOIUrl":"10.1186/s13040-024-00413-w","url":null,"abstract":"<p><strong>Background: </strong>Epistasis, the phenomenon in which the effect of one gene (or variant) is masked or modified by one or more other genes, contributes significantly to the phenotypic variance of complex traits. Traditionally, epistasis has been modeled using the Cartesian epistatic model, a multiplicative approach based on standard statistical regression. However, a recent study investigating epistasis in obesity-related traits has identified potential limitations of the Cartesian epistatic model, revealing that it likely detects only a fraction of the genetic interactions occurring in natural systems. In contrast, the exclusive-or (XOR) epistatic model has shown promise in detecting a broader range of epistatic interactions and revealing more biologically relevant functions associated with interacting variants. To investigate whether the XOR epistatic model also forms network structures distinct from those of the Cartesian model, we applied network science to examine genetic interactions underlying body mass index (BMI) in rats (Rattus norvegicus).</p><p><strong>Results: </strong>Our comparative analysis of XOR and Cartesian epistatic models in rats reveals distinct topological characteristics. The XOR model exhibits enhanced sensitivity to epistatic interactions between the network communities found in the Cartesian epistatic network, facilitating the identification of novel trait-related biological functions via community-based enrichment analysis. Additionally, the XOR network features triangle network motifs, indicative of higher-order epistatic interactions. This research also evaluates the impact of linkage disequilibrium (LD)-based edge pruning on network-based epistasis analysis, finding that LD-based edge pruning may lead to increased network fragmentation, which may hinder the effectiveness of network analysis for the investigation of epistasis. We confirmed through network permutation analysis that most XOR and Cartesian epistatic networks derived from the data display structural properties distinct from those of randomly shuffled networks.</p><p><strong>Conclusions: </strong>Collectively, these findings highlight the XOR model's ability to uncover meaningful biological associations and higher-order epistasis derived from lower-order network topologies. The introduction of community-based enrichment analysis and motif-based epistatic discovery emphasizes network science as a critical approach for advancing epistasis research and understanding complex genetic architectures.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"61"},"PeriodicalIF":4.0,"publicationDate":"2024-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11681696/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142899656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
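The abstract reads triangle motifs as a sign of higher-order epistasis and compares network structure against randomly shuffled networks. A minimal sketch of that idea, assuming an undirected edge list of interacting variants (the SNP names are hypothetical) and, as a simpler stand-in for the authors' unspecified shuffling scheme, random graphs with the same node set and edge count (a G(n, m) null model).

```python
import itertools
import random
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def clean_edges(edges: Iterable[Edge]) -> Set[Edge]:
    """Undirected, de-duplicated edges without self-loops."""
    return {tuple(sorted((u, v))) for u, v in edges if u != v}


def count_triangles(edges: Iterable[Edge]) -> int:
    """Count triangles (3-node fully connected motifs) in an undirected graph."""
    neighbors: Dict[str, Set[str]] = {}
    for u, v in clean_edges(edges):
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    count = 0
    for u, nbrs in neighbors.items():
        for v in nbrs:
            if u < v:
                # Count each triangle u < v < w exactly once.
                count += sum(1 for w in nbrs & neighbors[v] if w > v)
    return count


def triangle_permutation_test(edges: Iterable[Edge], n_perm: int = 1000, seed: int = 0):
    """Observed triangle count and an empirical p-value against random graphs
    with the same nodes and number of edges."""
    observed_edges = clean_edges(edges)
    nodes = sorted({n for e in observed_edges for n in e})
    all_pairs = list(itertools.combinations(nodes, 2))
    observed = count_triangles(observed_edges)
    rng = random.Random(seed)
    null = [count_triangles(rng.sample(all_pairs, len(observed_edges)))
            for _ in range(n_perm)]
    # +1 correction keeps the p-value strictly positive.
    p_value = (1 + sum(t >= observed for t in null)) / (n_perm + 1)
    return observed, p_value


# Hypothetical epistatic network: each edge links two interacting variants.
snp_edges = [("snp1", "snp2"), ("snp2", "snp3"), ("snp1", "snp3"),
             ("snp3", "snp4"), ("snp4", "snp5")]
observed, p = triangle_permutation_test(snp_edges, n_perm=200)
print(observed, round(p, 3))
```

A degree-preserving edge swap would be a stricter null than G(n, m), since it controls for hub variants; which null the authors used is not stated in the abstract.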
Biodata Mining | Pub Date: 2024-12-24 | DOI: 10.1186/s13040-024-00416-7
Dani Livne, Sol Efroni
{"title":"Pathway metrics accurately stratify T cells to their cells states.","authors":"Dani Livne, Sol Efroni","doi":"10.1186/s13040-024-00416-7","DOIUrl":"10.1186/s13040-024-00416-7","url":null,"abstract":"<p>Pathway analysis is a powerful approach for extracting insights from gene expression changes and associating those changes with cellular phenotypes. The overarching objective of pathway research is to identify critical molecular drivers within a cellular context and uncover novel signaling networks from groups of relevant biomolecules. In this work, we present PathSingle, a Python-based pathway analysis tool tailored for single-cell data analysis. PathSingle employs a unique graph-based algorithm to enable the classification of diverse cellular states, such as T cell subtypes. Designed to be open-source, extensible, and computationally efficient, PathSingle is available at https://github.com/zurkin1/PathSingle under the MIT license. This tool provides researchers with a versatile framework for uncovering biologically meaningful insights from high-dimensional single-cell transcriptomics data, facilitating a deeper understanding of cellular regulation and function.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"60"},"PeriodicalIF":4.0,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11668091/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142883414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
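The abstract does not describe PathSingle's graph-based algorithm, so the following is not PathSingle's method. It is a sketch of a generic baseline pathway activity score, assuming a genes-by-cells expression matrix: standardize each gene across cells, then average the z-scores of the pathway's member genes per cell. The gene names and values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def zscore_per_gene(expr: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Standardize each gene across cells; constant genes map to 0."""
    scaled = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


def pathway_scores(expr: Dict[str, List[float]], pathway: Sequence[str]) -> List[float]:
    """Per-cell score = mean z-score of the pathway genes present in the data."""
    z = zscore_per_gene(expr)
    members = [g for g in pathway if g in z]
    if not members:
        raise ValueError("no pathway genes found in the expression matrix")
    n_cells = len(z[members[0]])
    return [mean(z[g][cell] for g in members) for cell in range(n_cells)]


# Hypothetical genes x cells matrix (4 cells).
expr = {
    "GZMB": [0.0, 0.0, 5.0, 6.0],
    "PRF1": [1.0, 0.0, 4.0, 5.0],
    "CCR7": [5.0, 6.0, 0.0, 1.0],
}
cytotoxic = ["GZMB", "PRF1", "NKG7"]  # NKG7 is absent from the matrix and skipped
print([round(s, 2) for s in pathway_scores(expr, cytotoxic)])
```

Here cells 3 and 4 score higher on the hypothetical cytotoxic gene set; scores like these could then feed a classifier of cell states, which is the role pathway metrics play in the abstract.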
Biodata Mining | Pub Date: 2024-12-18 | DOI: 10.1186/s13040-024-00411-y
Nina Kastendiek, Roberta Coletti, Thilo Gross, Marta B Lopes
{"title":"Exploring glioma heterogeneity through omics networks: from gene network discovery to causal insights and patient stratification.","authors":"Nina Kastendiek, Roberta Coletti, Thilo Gross, Marta B Lopes","doi":"10.1186/s13040-024-00411-y","DOIUrl":"10.1186/s13040-024-00411-y","url":null,"abstract":"<p>Gliomas are primary malignant brain tumors with a typically poor prognosis, exhibiting significant heterogeneity across different cancer types. Each glioma type possesses distinct molecular characteristics that determine patient prognosis and therapeutic options. This study aims to explore the molecular complexity of gliomas at the transcriptome level, employing a comprehensive approach grounded in network discovery. The graphical lasso method was used to estimate a gene co-expression network for each glioma type from a transcriptomics dataset. Causality was subsequently inferred from the correlation networks by estimating the Jacobian matrix. The networks were then analyzed for gene importance using centrality measures and modularity detection, leading to the selection of genes that might play an important role in the disease. To explore the pathways and biological functions in which these genes are involved, KEGG and Gene Ontology (GO) enrichment analyses were performed on the disclosed gene sets, highlighting the significance of the selected genes across several relevant pathways and GO terms. Spectral clustering based on patient similarity networks was applied to stratify patients into groups with similar molecular characteristics and to assess whether the resulting clusters align with the diagnosed glioma type. The results highlight the ability of the proposed methodology to uncover relevant genes associated with glioma intertumoral heterogeneity. Further investigation might encompass biological validation of the putative biomarkers disclosed.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"56"},"PeriodicalIF":4.0,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11657291/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142856223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
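In a graphical lasso model, two genes are linked when the corresponding off-diagonal entry of the estimated sparse precision (inverse covariance) matrix is nonzero, meaning they are conditionally dependent given all other genes. A minimal sketch, assuming such a matrix has already been estimated (in practice, for example, via scikit-learn's `GraphicalLasso` and its `precision_` attribute); the gene names and matrix values below are made up. Degree centrality, one of the simplest centrality measures, then ranks candidate hub genes.

```python
from typing import Dict, List, Sequence, Tuple


def precision_to_edges(genes: Sequence[str], precision: List[List[float]],
                       tol: float = 1e-8) -> List[Tuple[str, str]]:
    """Edge between genes i and j when the off-diagonal precision entry is nonzero."""
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if abs(precision[i][j]) > tol:
                edges.append((genes[i], genes[j]))
    return edges


def degree_centrality(genes: Sequence[str], edges: List[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of the other genes that each gene is connected to."""
    if len(genes) < 2:
        raise ValueError("need at least two genes")
    degree = {g: 0 for g in genes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {g: d / (len(genes) - 1) for g, d in degree.items()}


genes = ["EGFR", "PDGFRA", "IDH1", "TP53"]
precision = [  # hypothetical sparse precision matrix (symmetric)
    [2.0, -0.4, 0.0, -0.3],
    [-0.4, 1.5, 0.0, 0.0],
    [0.0, 0.0, 1.2, -0.2],
    [-0.3, 0.0, -0.2, 1.8],
]
edges = precision_to_edges(genes, precision)
centrality = degree_centrality(genes, edges)
print(edges)
print(sorted(centrality, key=centrality.get, reverse=True)[:2])
```

The sparsity of the precision matrix, and therefore the network's density, is controlled by the graphical lasso penalty; the abstract's later steps (Jacobian-based causal inference, modularity detection) build on a network like this one.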
Biodata Mining | Pub Date: 2024-12-18 | DOI: 10.1186/s13040-024-00377-x
Jiang Zhao, Qian Zhang, Cunle Zhu, Wu Yuqi, Guohui Zhang, Qianliang Wang, Xingyou Dong, Benyi Li, Xiangwei Wang
{"title":"Prognostic feature based on androgen-responsive genes in bladder cancer and screening for potential targeted drugs.","authors":"Jiang Zhao, Qian Zhang, Cunle Zhu, Wu Yuqi, Guohui Zhang, Qianliang Wang, Xingyou Dong, Benyi Li, Xiangwei Wang","doi":"10.1186/s13040-024-00377-x","DOIUrl":"10.1186/s13040-024-00377-x","url":null,"abstract":"<p><strong>Objectives: </strong>Bladder cancer (BLCA) affects men more often than women. The biological function and prognostic value of androgen-responsive genes (ARGs) in BLCA are currently unknown. To address this, we established an androgen signature to determine the prognosis of BLCA.</p><p><strong>Methods: </strong>BLCA sequencing data from the TCGA and GEO datasets were analyzed. The tumor microenvironment (TME) was measured using CIBERSORT and ssGSEA. Prognosis-related genes were identified and a risk score model was constructed using univariate Cox regression, LASSO regression, and multivariate Cox regression. Drug sensitivity analysis was performed using the Genomics of Drug Sensitivity in Cancer (GDSC) database. Real-time quantitative PCR was performed to assess the expression of representative genes in clinical samples.</p><p><strong>Results: </strong>ARGs (especially CDK6, FADS1, PGM3, SCD, PTK2B, and TPD52) might regulate the progression of BLCA. Different expression patterns of ARGs may lead to different immune cell infiltration. The risk model indicates that patients with higher risk scores have a poorer prognosis, more stromal infiltration, and an enrichment of biological functions. Single-cell RNA analysis, bulk RNA data, and PCR analysis support the reliability of this risk model, and a nomogram was also established for clinical use. Drug prediction analysis showed that high-risk patients had a better response to fludarabine, AZD8186, and carmustine.</p><p><strong>Conclusion: </strong>ARGs play an important role in the progression, immune infiltration, and prognosis of BLCA. The ARG model predicts the prognosis of BLCA patients with high accuracy and provides guidance for more effective medication.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"59"},"PeriodicalIF":4.0,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11657289/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142856224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
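Risk models built with univariate Cox, LASSO, and multivariate Cox regression typically score each patient as a linear combination of signature-gene expression weighted by the fitted Cox coefficients, then split patients into high- and low-risk groups, often at the median. A minimal sketch assuming the coefficients are already fitted; the coefficient and expression values below are hypothetical, not the paper's, and the median cutoff is an assumption.

```python
from statistics import median
from typing import Dict


def risk_scores(expr_by_patient: Dict[str, Dict[str, float]],
                coefs: Dict[str, float]) -> Dict[str, float]:
    """Risk score = sum over signature genes of coefficient * expression."""
    return {pid: sum(beta * expr[gene] for gene, beta in coefs.items())
            for pid, expr in expr_by_patient.items()}


def stratify(scores: Dict[str, float]) -> Dict[str, str]:
    """Median split into high- and low-risk groups (ties go to low risk)."""
    cutoff = median(scores.values())
    return {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}


coefs = {"CDK6": 0.30, "FADS1": -0.20, "SCD": 0.15}  # hypothetical coefficients
patients = {  # hypothetical normalized expression values
    "P1": {"CDK6": 2.0, "FADS1": 1.0, "SCD": 1.0},
    "P2": {"CDK6": 0.5, "FADS1": 2.0, "SCD": 0.5},
    "P3": {"CDK6": 3.0, "FADS1": 0.5, "SCD": 2.0},
    "P4": {"CDK6": 1.0, "FADS1": 1.5, "SCD": 0.0},
}
groups = stratify(risk_scores(patients, coefs))
print(groups)  # {'P1': 'high', 'P2': 'low', 'P3': 'high', 'P4': 'low'}
```

A positive coefficient marks a gene whose higher expression raises the hazard; survival curves for the two groups would then be compared, for example with a log-rank test.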
Biodata Mining | Pub Date: 2024-12-18 | DOI: 10.1186/s13040-024-00408-7
Shilpa R Thandla, Grace Q Armstrong, Adil Menon, Aashna Shah, David L Gueye, Clara Harb, Estefania Hernandez, Yasaswini Iyer, Abigail R Hotchner, Riddhi Modi, Anusha Mudigonda, Maria A Prokos, Tharun M Rao, Olivia R Thomas, Camilo A Beltran, Taylor Guerrieri, Sydney LeBlanc, Skanda Moorthy, Sara G Yacoub, Jacob E Gardner, Benjamin M Greenberg, Alyssa Hubal, Yuliana P Lapina, Jacqueline Moran, Joseph P O'Brien, Anna C Winnicki, Christina Yoka, Junwei Zhang, Peter A Zimmerman
{"title":"Comparing new tools of artificial intelligence to the authentic intelligence of our global health students.","authors":"Shilpa R Thandla, Grace Q Armstrong, Adil Menon, Aashna Shah, David L Gueye, Clara Harb, Estefania Hernandez, Yasaswini Iyer, Abigail R Hotchner, Riddhi Modi, Anusha Mudigonda, Maria A Prokos, Tharun M Rao, Olivia R Thomas, Camilo A Beltran, Taylor Guerrieri, Sydney LeBlanc, Skanda Moorthy, Sara G Yacoub, Jacob E Gardner, Benjamin M Greenberg, Alyssa Hubal, Yuliana P Lapina, Jacqueline Moran, Joseph P O'Brien, Anna C Winnicki, Christina Yoka, Junwei Zhang, Peter A Zimmerman","doi":"10.1186/s13040-024-00408-7","DOIUrl":"10.1186/s13040-024-00408-7","url":null,"abstract":"<p><strong>Introduction: </strong>The transformative feature of Artificial Intelligence (AI) is its massive capacity for interpreting unstructured data and transforming it into a coherent and meaningful context. In general, the potential for AI to alter traditional approaches to student research and its evaluation appears to be significant. With regard to research in global health, it is important for students and research experts to assess the strengths and limitations of generative AI (GenAI) within this space. Thus, the goal of our research was to evaluate the information literacy of GenAI compared with the expectations that graduate students meet in writing research papers.</p><p><strong>Methods: </strong>After completing the course Fundamentals of Global Health (INTH 401) at Case Western Reserve University (CWRU), graduate students who had successfully completed their required research paper were recruited to compare their original papers with a paper generated by ChatGPT-4o from the original assignment prompt. Students also completed a Google Forms survey evaluating different sections of the AI-generated paper (e.g., Adherence to Introduction guidelines, Presentation of three perspectives, Conclusion) and of their original papers, as well as their overall satisfaction with the AI work. The comparison of original student papers with ChatGPT-4o papers also enabled evaluation of narrative elements and references.</p><p><strong>Results: </strong>Of the 54 students who completed the required research paper, 28 (51.8%) agreed to collaborate in the comparison project. A summary of the survey responses suggested that students evaluated the AI-generated paper as inferior or similar to their own paper (overall satisfaction average = 2.39 (1.61-3.17); Likert scale: 1 to 5 with lower scores indicating inferiority). Evaluating the average individual student responses for 5 Likert item queries showed that 17 scores were < 2.9; 7 scores were between 3.0 and 3.9; 4 scores were ≥ 4.0, consistent with inferiority of the AI-generated paper. Evaluation of reference selection by ChatGPT-4o (n = 729 total references) showed that 54% (n = 396) were authentic and 46% (n = 333) did not exist. Of the authentic references, 26.5% (105/396) were relevant to the paper narrative, i.e., 14.4% of the 729 total references.</p><p><strong>Discussion: </strong>Our findings reveal strengths and limitations in the potential of AI tools to assist in understanding the complexities of global health topics. Strengths mentioned by students included the ability of ChatGPT-4o to produce content very quickly and to suggest topics that they had not considered in the 3-perspective sections of their papers. Consistently presenting up-to-date facts and references, as well as further examining or summarizing the complexities of global health topics, appears to be a current limitation of ChatGPT-4o. Because ChatGPT-4o generated nonexistent references attributed to highly credible biomedical research journals, our findings conclude that ChatGPT-4o failed a","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"58"},"PeriodicalIF":4.0,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11656723/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142856210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
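The participation and reference percentages in this abstract can be rechecked directly from the counts it reports. This snippet recomputes each one; the only inputs are the counts stated in the abstract.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_refs, authentic, nonexistent, relevant = 729, 396, 333, 105
assert authentic + nonexistent == total_refs  # reference counts are internally consistent

print(pct(authentic, total_refs))    # 54.3
print(pct(nonexistent, total_refs))  # 45.7
print(pct(relevant, authentic))      # 26.5
print(pct(relevant, total_refs))     # 14.4
print(pct(28, 54))                   # 51.9 (the abstract reports 51.8%, i.e. truncated)
print(17 + 7 + 4 == 28)              # True: each participant falls in exactly one Likert band
```

All figures match the abstract, except that 28/54 = 51.85% rounds to 51.9% rather than the 51.8% reported.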
Biodata Mining | Pub Date: 2024-12-18 | DOI: 10.1186/s13040-024-00410-z
Jakub Horvath, Pavel Jedlicka, Marie Kratka, Zdenek Kubat, Eduard Kejnovsky, Matej Lexa
{"title":"Detection and classification of long terminal repeat sequences in plant LTR-retrotransposons and their analysis using explainable machine learning.","authors":"Jakub Horvath, Pavel Jedlicka, Marie Kratka, Zdenek Kubat, Eduard Kejnovsky, Matej Lexa","doi":"10.1186/s13040-024-00410-z","DOIUrl":"10.1186/s13040-024-00410-z","url":null,"abstract":"<p><strong>Background: </strong>Long terminal repeats (LTRs) represent important parts of LTR retrotransposons and retroviruses found in high copy numbers in a majority of eukaryotic genomes. LTRs contain regulatory sequences essential for the life cycle of the retrotransposon. Previous experimental and sequence studies have provided only limited information about LTR structure and composition, mostly from model systems. To enhance our understanding of these key sequence modules, we focused on the contrasts between LTRs of various retrotransposon families and other genomic regions. Furthermore, this approach can be used for the classification and prediction of LTRs.</p><p><strong>Results: </strong>We used machine learning methods suitable for DNA sequence classification and applied them to a large dataset of plant LTR retrotransposon sequences. We trained three machine learning models using (i) traditional model ensembles (Gradient Boosting), (ii) hybrid convolutional/long short-term memory (LSTM) network models, and (iii) a DNA pre-trained transformer-based model using k-mer sequence representation. All three approaches were successful in classifying and isolating LTRs in these data, as well as in providing valuable insights into LTR sequence composition. The best classification performance for LTR detection (F1 score = 0.85) was achieved with the hybrid network model. The most accurate classification task was superfamily classification (F1=0.89), while the least accurate was family classification (F1=0.74). The trained models were subjected to explainability analysis. Positional analysis identified a mixture of interesting features, many of which had a preferred absolute position within the LTR and/or were biologically relevant, such as a centrally positioned TATA-box regulatory sequence, and TG..CA nucleotide patterns around both LTR edges.</p><p><strong>Conclusions: </strong>Our results show that the models used here recognized biologically relevant motifs, such as core promoter elements in the LTR detection task, and a development- and stress-related subclass of transcription factor binding sites in the family classification task. Explainability analysis also highlighted the importance of the 5' and 3' edges in LTR identity and revealed the need to analyze more than just dinucleotides at these ends. Our work shows the applicability of machine learning models to regulatory sequence analysis and classification, and demonstrates the important role of the identified motifs in LTR detection.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"57"},"PeriodicalIF":4.0,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11656987/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142856213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
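Transformer DNA models in the DNABERT family represent a sequence as overlapping k-mers (stride 1) before tokenization. The abstract does not state which k or which pre-trained model was used, so the default k=6 and k=3 in the example are assumptions. The second helper checks the classic 5'-TG...CA-3' LTR termini mentioned in the abstract. The fragment is hypothetical.

```python
from typing import List


def kmerize(seq: str, k: int = 6) -> List[str]:
    """Overlapping k-mers with stride 1 (DNABERT-style input representation)."""
    if k < 1:
        raise ValueError("k must be positive")
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def has_canonical_termini(ltr: str) -> bool:
    """True if the sequence starts with TG and ends with CA (classic LTR edges)."""
    ltr = ltr.upper()
    return ltr.startswith("TG") and ltr.endswith("CA")


fragment = "tgttataaaca"  # hypothetical LTR-like fragment containing a TATAAA motif
tokens = kmerize(fragment, k=3)
print(" ".join(tokens))  # TGT GTT TTA TAT ATA TAA AAA AAC ACA
print(has_canonical_termini(fragment))  # True
```

Overlapping k-mers let a motif such as a TATA box appear in several consecutive tokens. This is one reason attribution methods can localize such features positionally, as in the explainability analysis described above.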
Biodata Mining | Pub Date: 2024-12-06 | DOI: 10.1186/s13040-024-00409-6
Zhaoming Kong, Rong Zhou, Xinwei Luo, Songlin Zhao, Ann B Ragin, Alex D Leow, Lifang He
{"title":"TGNet: tensor-based graph convolutional networks for multimodal brain network analysis.","authors":"Zhaoming Kong, Rong Zhou, Xinwei Luo, Songlin Zhao, Ann B Ragin, Alex D Leow, Lifang He","doi":"10.1186/s13040-024-00409-6","DOIUrl":"10.1186/s13040-024-00409-6","url":null,"abstract":"<p>Multimodal brain network analysis enables a comprehensive understanding of neurological disorders by integrating information from multiple neuroimaging modalities. However, existing methods often struggle to effectively model the complex structures of multimodal brain networks. In this paper, we propose a novel tensor-based graph convolutional network (TGNet) framework that combines tensor decomposition with multi-layer GCNs to capture both the homogeneity and intricate graph structures of multimodal brain networks. We evaluate TGNet on four datasets (HIV, Bipolar Disorder (BP), Parkinson's Disease (PPMI), and Alzheimer's Disease (ADNI)), demonstrating that it significantly outperforms existing methods for disease classification tasks, particularly in scenarios with limited sample sizes. The robustness and effectiveness of TGNet highlight its potential for advancing multimodal brain network analysis. The code is available at https://github.com/rongzhou7/TGNet.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"55"},"PeriodicalIF":4.0,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11622555/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142787246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
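TGNet stacks multi-layer GCNs on top of tensor decomposition. The tensor part is not specified in the abstract and is not shown here. As a minimal sketch of the standard GCN layer only (the widely used Kipf and Welling propagation rule, assumed rather than taken from TGNet), one layer computes ReLU(Â H W), where Â = D^{-1/2}(A + I)D^{-1/2}. The 3-region network, features, and weights below are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    propagated = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]


adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]            # hypothetical 3-region brain network
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # hypothetical node features
weights = [[0.5, -1.0], [0.5, 1.0]]                # hypothetical learned weights
print(gcn_layer(adj, features, weights))
```

For multimodal data, each modality (for example, structural and functional connectivity) yields its own adjacency matrix. Stacking them gives the third-order tensor that TGNet decomposes.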
Biodata Mining | Pub Date: 2024-12-02 | DOI: 10.1186/s13040-024-00399-5
Richa Gupta, Mansi Bhandari, Anhad Grover, Taher Al-Shehari, Mohammed Kadrie, Taha Alfakih, Hussain Alsalman
{"title":"Predictive modeling of ALS progression: an XGBoost approach using clinical features.","authors":"Richa Gupta, Mansi Bhandari, Anhad Grover, Taher Al-Shehari, Mohammed Kadrie, Taha Alfakih, Hussain Alsalman","doi":"10.1186/s13040-024-00399-5","DOIUrl":"10.1186/s13040-024-00399-5","url":null,"abstract":"<p>This research presents a predictive model for estimating the progression of Amyotrophic Lateral Sclerosis (ALS) from clinical features collected from a dataset of 50 patients. Important features included evaluations of speech, mobility, and respiratory function. We used an XGBoost regression model to forecast scores on the revised ALS Functional Rating Scale (ALSFRS-R), achieving a training mean squared error (MSE) of 0.1651 and a testing MSE of 0.0073, with R² values of 0.9800 for training and 0.9993 for testing. The model demonstrates high accuracy, providing a useful tool for clinicians to track disease progression and enhance patient management and treatment strategies.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"54"},"PeriodicalIF":4.0,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11610297/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142774029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
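The two metrics reported in this abstract follow standard definitions: MSE is the mean of squared residuals, and R² = 1 - SS_res / SS_tot compares the model against always predicting the mean. The following is a minimal sketch computing both from paired observed and predicted ALSFRS-R scores. The values are hypothetical. Predictions from a fitted model (for example, the output of `xgboost.XGBRegressor.predict`) could be passed in the same way.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


observed = [40, 35, 30, 25]   # hypothetical ALSFRS-R totals
predicted = [39, 36, 30, 24]  # hypothetical model predictions
print(mse(observed, predicted))                  # 0.75
print(round(r_squared(observed, predicted), 3))  # 0.976
```

Because R² is normalized by the spread of the observed scores, it is easier to compare across training and test splits than raw MSE, whose scale depends on how variable each split happens to be.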