Biodata Mining | Pub Date: 2025-02-27 | DOI: 10.1186/s13040-025-00434-z
Patrick Maximilian Schwehn, Pascal Falter-Braun
{"title":"Inferring protein from transcript abundances using convolutional neural networks.","authors":"Patrick Maximilian Schwehn, Pascal Falter-Braun","doi":"10.1186/s13040-025-00434-z","DOIUrl":"10.1186/s13040-025-00434-z","url":null,"abstract":"<p><strong>Background: </strong>Although transcript abundance is often used as a proxy for protein abundance, it is an unreliable predictor. As proteins execute biological functions and their expression levels influence phenotypic outcomes, we developed a convolutional neural network (CNN) to predict protein abundances from mRNA abundances, protein sequence, and mRNA sequence in Homo sapiens (H. sapiens) and the reference plant Arabidopsis thaliana (A. thaliana).</p><p><strong>Results: </strong>After hyperparameter optimization and initial data exploration, we implemented distinct training modules for value-based and sequence-based data. By analyzing the learned weights, we revealed common and organism-specific sequence features that influence protein-to-mRNA ratios (PTRs), including known and putative sequence motifs. Adding condition-specific protein interaction information identified genes correlated with many PTRs but did not improve predictions, likely due to insufficient data. The integrated model predicted protein abundance on unseen genes with a coefficient of determination (r<sup>2</sup>) of 0.30 in H. sapiens and 0.32 in A. thaliana.</p><p><strong>Conclusions: </strong>For H. sapiens, our model improves prediction performance by nearly 50% compared to previous sequence-based approaches, and for A. thaliana it represents the first model of its kind. 
The model's learned motifs recapitulate known regulatory elements, supporting its utility in systems-level and hypothesis-driven research approaches related to protein regulation.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"18"},"PeriodicalIF":4.0,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11866710/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143525013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining | Pub Date: 2025-02-18 | DOI: 10.1186/s13040-025-00429-w
Tayo Obafemi-Ajayi, Steven F Jennings, Yu Zhang, Kara Li Liu, Joan Peckham, Jason H Moore
{"title":"AI as an accelerator for defining new problems that transcends boundaries.","authors":"Tayo Obafemi-Ajayi, Steven F Jennings, Yu Zhang, Kara Li Liu, Joan Peckham, Jason H Moore","doi":"10.1186/s13040-025-00429-w","DOIUrl":"10.1186/s13040-025-00429-w","url":null,"abstract":"<p><p>Interdisciplinary, transdisciplinary, convergence, and No-Boundary Thinking (NBT) research are methodology- and technology-agnostic approaches to problem solving. The focus is on defining problems informed by access to multiple knowledge sources and expert perspectives across different domains, with the goal of accessing all available knowledge sources and perspectives. While access to all available knowledge sources and perspectives could be seen as a difficult-to-attain objective, with the recent rise of AI we might be closer to approaching this goal. We review several examples of methodologies and technologies that have been used to put these strategies into action, but the primary focus of this paper is on how recent advances in AI now enable a quantum leap forward in defining new problems. By leveraging the capacity of AI to synthesize knowledge from multiple domains, these tools can be used to propose multiple candidate problem definitions. AI is uniquely able to draw upon many more knowledge sources than any individual, or even a very large team, could. 
Coupled with human intelligence, better problems can be defined to address complex scholarly or societal challenges.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"17"},"PeriodicalIF":4.0,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11837601/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143450623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining | Pub Date: 2025-02-17 | DOI: 10.1186/s13040-025-00431-2
Arezoo Abasi, Ahmad Nazari, Azar Moezy, Seyed Ali Fatemi Aghda
{"title":"Machine learning models for reinjury risk prediction using cardiopulmonary exercise testing (CPET) data: optimizing athlete recovery.","authors":"Arezoo Abasi, Ahmad Nazari, Azar Moezy, Seyed Ali Fatemi Aghda","doi":"10.1186/s13040-025-00431-2","DOIUrl":"10.1186/s13040-025-00431-2","url":null,"abstract":"<p><strong>Background: </strong>Cardiopulmonary Exercise Testing (CPET) provides detailed insights into athletes' cardiovascular and pulmonary function, making it a valuable tool in assessing recovery and injury risks. However, traditional statistical models often fail to leverage the full potential of CPET data in predicting reinjury. Machine learning (ML) algorithms offer promising capabilities in uncovering complex patterns within this data, allowing for more accurate injury risk assessment.</p><p><strong>Objective: </strong>This study aimed to develop machine learning models to predict reinjury risk among elite soccer players using CPET data. Specifically, we sought to identify key physiological and performance variables that correlate with reinjury and to evaluate the performance of various ML algorithms in generating accurate predictions.</p><p><strong>Methods: </strong>A dataset of 256 elite soccer players from 16 national and top-tier teams in Iran was analyzed, incorporating physiological variables and categorical data. Several machine learning models, including CatBoost, SVM, Random Forest, and XGBoost, were employed to predict reinjury risk. Model performance was assessed using metrics such as accuracy, precision, recall, F1-score, AUC, and SHAP values to ensure robust evaluation and interpretability.</p><p><strong>Results: </strong>CatBoost and SVM exhibited the best performance, with CatBoost achieving the highest accuracy (0.9138) and F1-score (0.9148), and SVM achieving the highest AUC (0.9725). 
A significant association was found between a history of concussion and reinjury risk (χ² = 13.0360, p = 0.0015), highlighting the importance of neurological recovery in preventing future injuries. Heart rate metrics, particularly HRmax and HR2, were also significantly lower in players who experienced reinjury, indicating reduced cardiovascular capacity in this group.</p><p><strong>Conclusion: </strong>Machine learning models, particularly CatBoost and SVM, provide promising tools for predicting reinjury risk using CPET data. These models offer clinicians more precise, data-driven insights into athlete recovery and risk management. Future research should explore the integration of external factors such as training load and psychological readiness to further refine these predictions and enhance injury prevention protocols.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"16"},"PeriodicalIF":4.0,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11834553/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143442544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining | Pub Date: 2025-02-15 | DOI: 10.1186/s13040-025-00430-3
Christel Sirocchi, Martin Urschler, Bastian Pfeifer
{"title":"Feature graphs for interpretable unsupervised tree ensembles: centrality, interaction, and application in disease subtyping.","authors":"Christel Sirocchi, Martin Urschler, Bastian Pfeifer","doi":"10.1186/s13040-025-00430-3","DOIUrl":"10.1186/s13040-025-00430-3","url":null,"abstract":"<p><p>Explainable and interpretable machine learning has emerged as essential in leveraging artificial intelligence within high-stakes domains such as healthcare to ensure transparency and trustworthiness. Feature importance analysis plays a crucial role in improving model interpretability by pinpointing the most relevant input features, particularly in disease subtyping applications, aimed at stratifying patients based on a small set of signature genes and biomarkers. While clustering methods, including unsupervised random forests, have demonstrated good performance, approaches for evaluating feature contributions in an unsupervised regime are notably scarce. To address this gap, we introduce a novel methodology to enhance the interpretability of unsupervised random forests by elucidating feature contributions through the construction of feature graphs, both over the entire dataset and individual clusters, that leverage parent-child node splits within the trees. Feature selection strategies to derive effective feature combinations from these graphs are presented and extensively evaluated on synthetic and benchmark datasets against state-of-the-art methods, standing out for performance, computational efficiency, reliability, versatility and ability to provide cluster-specific insights. In a disease subtyping application, clustering kidney cancer gene expression data over a feature subset selected with our approach reveals three patient groups with different survival outcomes. 
Cluster-specific analysis identifies distinctive feature contributions and interactions, essential for devising targeted interventions, conducting personalised risk assessments, and enhancing our understanding of the underlying molecular complexities.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"15"},"PeriodicalIF":4.0,"publicationDate":"2025-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11829558/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143426202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining | Pub Date: 2025-02-10 | DOI: 10.1186/s13040-025-00428-x
Maryam Ramezani, Mohammadreza Mobinizadeh, Ahad Bakhtiari, Hamid R Rabiee, Maryam Ramezani, Hakimeh Mostafavi, Alireza Olyaeemanesh, Ali Akbar Fazaeli, Alireza Atashi, Saharnaz Sazgarnejad, Efat Mohamadi, Amirhossein Takian
{"title":"Agenda setting for health equity assessment through the lenses of social determinants of health using machine learning approach: a framework and preliminary pilot study.","authors":"Maryam Ramezani, Mohammadreza Mobinizadeh, Ahad Bakhtiari, Hamid R Rabiee, Maryam Ramezani, Hakimeh Mostafavi, Alireza Olyaeemanesh, Ali Akbar Fazaeli, Alireza Atashi, Saharnaz Sazgarnejad, Efat Mohamadi, Amirhossein Takian","doi":"10.1186/s13040-025-00428-x","DOIUrl":"10.1186/s13040-025-00428-x","url":null,"abstract":"<p><strong>Introduction: </strong>The integration of Artificial Intelligence (AI) and Machine Learning (ML) is transforming public health by enhancing the assessment and mitigation of health inequities. As the use of AI tools, especially ML techniques, rises, they play a pivotal role in informing policies that promote a more equitable society. This study aims to develop a framework utilizing ML to analyze health system data and set agendas for health equity interventions, focusing on social determinants of health (SDH).</p><p><strong>Method: </strong>This study utilized the CRISP-ML(Q) model to introduce a platform for health equity assessment, facilitating its design and implementation in health systems. Initially, a conceptual model was developed through a comprehensive literature review and document analysis. A pilot implementation was conducted to test the feasibility and effectiveness of using ML algorithms in assessing health equity. Life expectancy was chosen as the health outcome for this pilot; data from 2000 to 2020 with 140 features was cleaned, transformed, and prepared for modeling. Multiple ML models were developed and evaluated using SPSS Modeler software version 18.0.</p><p><strong>Results: </strong>ML algorithms effectively identified key SDH influencing life expectancy. 
Among the classification algorithms, the Linear Discriminant algorithm was selected as the best model due to its high accuracy in both testing and training phases, its strong performance in identifying key features, and its good generalizability to new data. Additionally, among the numeric models, CHAID was the best for predicting the actual value of life expectancy based on various features. These models highlighted the importance of features like current health expenditure, domestic general government health expenditure, and GDP in predicting life expectancy.</p><p><strong>Conclusion: </strong>The findings underscore the significance of employing innovative methods like CRISP-ML(Q) and ML algorithms to enhance health equity. Integrating this platform into health systems can help countries better prioritize and address health inequities. The pilot implementation demonstrated these methods' practical applicability and effectiveness, aiding policymakers in making informed decisions to improve health equity.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"14"},"PeriodicalIF":4.0,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11808983/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143392203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Immune cell profiles and predictive modeling in osteoporotic vertebral fractures using XGBoost machine learning algorithms.","authors":"Yi-Chou Chen, Hui-Chen Su, Shih-Ming Huang, Ching-Hsiao Yu, Jen-Huei Chang, Yi-Lin Chiu","doi":"10.1186/s13040-025-00427-y","DOIUrl":"10.1186/s13040-025-00427-y","url":null,"abstract":"<p><strong>Background: </strong>Osteoporosis significantly increases the risk of vertebral fractures, particularly among postmenopausal women, decreasing their quality of life. These fractures, often undiagnosed, can lead to severe health consequences and are influenced by bone mineral density and abnormal loads. Management strategies range from non-surgical interventions to surgical treatments. Moreover, the interaction between immune cells and bone cells plays a crucial role in bone repair processes, highlighting the importance of osteoimmunology in understanding and treating bone pathologies.</p><p><strong>Methods: </strong>This study aims to investigate the xCell signature-based immune cell profiles in osteoporotic patients with and without vertebral fractures (VF), utilizing advanced predictive modeling through the XGBoost algorithm.</p><p><strong>Results: </strong>Our findings reveal an increased presence of CD4+ naïve T cells and central memory T cells in VF patients, indicating distinct adaptive immune responses. The XGBoost model identified Th1 cells, CD4 memory T cells, and hematopoietic stem cells as key predictors of VF. Notably, VF patients exhibited a reduction in Th1 cells and an enrichment of Th17 cells, which promote osteoclastogenesis and bone resorption. Gene expression analysis further highlighted an upregulation of osteoclast-related genes and a downregulation of osteoblast-related genes in VF patients, emphasizing the disrupted balance between bone formation and resorption. 
These findings underscore the critical role of immune cells in the pathogenesis of osteoporotic fractures and highlight the potential of XGBoost in identifying key biomarkers and therapeutic targets for mitigating fracture risk in osteoporotic patients.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"13"},"PeriodicalIF":4.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11792337/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143191123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining | Pub Date: 2025-02-03 | DOI: 10.1186/s13040-024-00415-8
Salman Khan, Sumaiya Noor, Tahir Javed, Afshan Naseem, Fahad Aslam, Salman A AlQahtani, Nijad Ahmad
{"title":"XGBoost-enhanced ensemble model using discriminative hybrid features for the prediction of sumoylation sites.","authors":"Salman Khan, Sumaiya Noor, Tahir Javed, Afshan Naseem, Fahad Aslam, Salman A AlQahtani, Nijad Ahmad","doi":"10.1186/s13040-024-00415-8","DOIUrl":"10.1186/s13040-024-00415-8","url":null,"abstract":"<p><p>Posttranslational modifications (PTMs) are essential for regulating protein localization and stability, significantly affecting gene expression, biological functions, and genome replication. Among these, sumoylation, a PTM that attaches a chemical group to protein sequences, plays a critical role in protein function. Identifying sumoylation sites is particularly important due to their links to Parkinson's and Alzheimer's diseases. This study introduces XGBoost-Sumo, a robust model to predict sumoylation sites by integrating protein structure and sequence data. The model utilizes a transformer-based attention mechanism to encode peptides and extract evolutionary features through the PsePSSM-DWT approach. By fusing word embeddings with evolutionary descriptors, it applies the SHapley Additive exPlanations (SHAP) algorithm for optimal feature selection and uses eXtreme Gradient Boosting (XGBoost) for classification. XGBoost-Sumo achieved an impressive accuracy of 99.68% on benchmark datasets using 10-fold cross-validation and 96.08% on independent samples. This marks a significant improvement, outperforming existing models by 10.31% on training data and 2.74% on independent tests. 
The model's reliability and high performance make it a valuable resource for researchers, with strong potential for applications in pharmaceutical development.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"12"},"PeriodicalIF":4.0,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11792219/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143123566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining | Pub Date: 2025-01-30 | DOI: 10.1186/s13040-025-00422-3
Hyunwook Koh, Jihun Kim, Hyojung Jang
{"title":"MiCML: a causal machine learning cloud platform for the analysis of treatment effects using microbiome profiles.","authors":"Hyunwook Koh, Jihun Kim, Hyojung Jang","doi":"10.1186/s13040-025-00422-3","DOIUrl":"10.1186/s13040-025-00422-3","url":null,"abstract":"<p><strong>Background: </strong>Treatment effects are heterogeneous across patients due to differences in their microbiomes, which in turn implies that we can enhance the treatment effect by manipulating the patient's microbiome profile. Accordingly, the coadministration of microbiome-based dietary supplements/therapeutics along with the primary treatment has been the subject of intensive investigation. However, for this, we first need to comprehend which microbes help (or hinder) the treatment in curing the patient's disease.</p><p><strong>Results: </strong>In this paper, we introduce a cloud platform, named microbiome causal machine learning (MiCML), for the analysis of treatment effects using microbiome profiles on user-friendly web environments. MiCML is distinguished in particular by two up-to-date features: (i) batch effect correction to mitigate systematic variation in collective large-scale microbiome data due to the differences in their underlying batches, and (ii) causal machine learning to estimate treatment effects with consistency and then discern microbial taxa that enhance (or lower) the efficacy of the primary treatment. We also stress that MiCML can handle the data from either randomized controlled trials or observational studies.</p><p><strong>Conclusion: </strong>We describe MiCML as a useful analytic tool for microbiome-based personalized medicine. MiCML is freely available on our web server ( http://micml.micloud.kr ). 
MiCML can also be implemented locally on the user's computer through our GitHub repository ( https://github.com/hk1785/micml ).</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"10"},"PeriodicalIF":4.0,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11783787/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143068960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A deep learning approach for classifying and predicting children's nutritional status in Ethiopia using LSTM-FC neural networks.","authors":"Getnet Bogale Begashaw, Temesgen Zewotir, Haile Mekonnen Fenta","doi":"10.1186/s13040-025-00425-0","DOIUrl":"10.1186/s13040-025-00425-0","url":null,"abstract":"<p><strong>Background: </strong>This study employs an LSTM-FC neural network to address the critical public health issue of child undernutrition in Ethiopia. By employing this method, the study aims to classify children's nutritional status and predict transitions between different undernutrition states over time. This analysis is based on longitudinal data extracted from the Young Lives cohort study, which tracked 1,997 Ethiopian children across five survey rounds conducted from 2002 to 2016. This paper applies rigorous data preprocessing, including handling missing values, normalization, and balancing, to ensure optimal model performance. Feature selection was performed using SHapley Additive exPlanations to identify key factors influencing nutritional status predictions. Hyperparameter tuning was thoroughly applied during model training to optimize performance. Furthermore, this paper compares the performance of LSTM-FC with existing baseline models to demonstrate its superiority. We used Python's TensorFlow and Keras libraries on a GPU-equipped system for model training.</p><p><strong>Results: </strong>LSTM-FC demonstrated superior predictive accuracy and long-term forecasting compared to baseline models for assessing child nutritional status. The classification and prediction performance of the model showed high accuracy rates above 93%, with perfect predictions for Normal (N) and Stunted & Wasted (SW) categories, minimal errors in most other nutritional statuses, and slight over- or underestimations in a few instances. 
The LSTM-FC model demonstrates strong generalization performance across multiple folds, with high recall and consistent F1-scores, indicating its robustness in predicting nutritional status. We analyzed the prevalence of children's nutritional status during their transition from late adolescence to early adulthood. The results show a notable decline in normal nutritional status among males, decreasing from 58.3% at age 5 to 33.5% by age 25. At the same time, the risk of severe undernutrition, including conditions of being underweight, stunted, and wasted (USW), increased from 1.3% to 9.4%.</p><p><strong>Conclusions: </strong>The LSTM-FC model outperforms baseline methods in classifying and predicting Ethiopian children's nutritional statuses. The findings reveal a critical rise in undernutrition, emphasizing the need for urgent public health interventions.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"11"},"PeriodicalIF":4.0,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11783927/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143068942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A generative deep neural network for pan-digestive tract cancer survival analysis.","authors":"Lekai Xu, Tianjun Lan, Yiqian Huang, Liansheng Wang, Junqi Lin, Xinpeng Song, Hui Tang, Haotian Cao, Hua Chai","doi":"10.1186/s13040-025-00426-z","DOIUrl":"10.1186/s13040-025-00426-z","url":null,"abstract":"<p><strong>Background: </strong>The accurate identification of molecular subtypes in digestive tract cancer (DTC) is crucial for making informed treatment decisions and selecting potential biomarkers. With the rapid advancement of artificial intelligence, various machine learning algorithms have been successfully applied in this field. However, the complexity and high dimensionality of the data features may lead to overlapping and ambiguous subtypes during clustering.</p><p><strong>Results: </strong>In this study, we propose GDEC, a multi-task generative deep neural network designed for precise digestive tract cancer subtyping. The network optimization process involves employing an integrated loss function consisting of two modules: the generative-adversarial module facilitates spatial data distribution understanding for extracting high-quality information, while the clustering module aids in identifying disease subtypes. The experiments conducted on digestive tract cancer datasets demonstrate that GDEC exhibits exceptional performance compared to other advanced methodologies and can separate different cancer molecular subtypes that possess both statistical and biological significance. Subsequently, 21 hub genes related to pan-DTC heterogeneity and prognosis were identified based on the subtypes clustered by GDEC. 
Subsequent drug analysis suggested Dasatinib and YM155 as potential therapeutic agents for improving the prognosis of patients in pan-DTC immunotherapy, thereby contributing to the enhancement of cancer patient survival.</p><p><strong>Conclusions: </strong>The experiments indicate that GDEC outperforms other deep-learning-based methods, and the interpretable algorithm can select biologically significant genes and potential drugs for DTC treatment.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"9"},"PeriodicalIF":4.0,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11771125/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143054000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}