Biodata Mining, Pub Date: 2024-12-02, DOI: 10.1186/s13040-024-00399-5
Richa Gupta, Mansi Bhandari, Anhad Grover, Taher Al-Shehari, Mohammed Kadrie, Taha Alfakih, Hussain Alsalman
{"title":"Predictive modeling of ALS progression: an XGBoost approach using clinical features.","authors":"Richa Gupta, Mansi Bhandari, Anhad Grover, Taher Al-Shehari, Mohammed Kadrie, Taha Alfakih, Hussain Alsalman","doi":"10.1186/s13040-024-00399-5","DOIUrl":"10.1186/s13040-024-00399-5","url":null,"abstract":"<p><p>This research presents a predictive model aimed at estimating the progression of Amyotrophic Lateral Sclerosis (ALS) based on clinical features collected from a dataset of 50 patients. Important features included evaluations of speech, mobility, and respiratory function. We utilized an XGBoost regression model to forecast scores on the ALS Functional Rating Scale (ALSFRS-R), achieving a training mean squared error (MSE) of 0.1651 and a testing MSE of 0.0073, with R² values of 0.9800 for training and 0.9993 for testing. The model demonstrates high accuracy, providing a useful tool for clinicians to track disease progression and enhance patient management and treatment strategies.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"54"},"PeriodicalIF":4.0,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11610297/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142774029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
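The two metrics quoted in the abstract above, mean squared error (MSE) and the coefficient of determination (R²), can be computed as in the following minimal sketch. The scores in `y_true` and `y_pred` are made-up illustration values, not data from the study, and the function names are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical ALSFRS-R-like scores, invented purely for illustration.
y_true = [40, 35, 30, 25]
y_pred = [39, 36, 30, 24]

print("MSE:", mse(y_true, y_pred))
print("R2:", r2(y_true, y_pred))
```

Both quantities share the residual sum of squares, so on a fixed dataset R² equals one minus the MSE divided by the variance of the observed scores.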
{"title":"Deep learning-based Emergency Department In-hospital Cardiac Arrest Score (Deep EDICAS) for early prediction of cardiac arrest and cardiopulmonary resuscitation in the emergency department.","authors":"Yuan-Xiang Deng, Jyun-Yi Wang, Chia-Hsin Ko, Chien-Hua Huang, Chu-Lin Tsai, Li-Chen Fu","doi":"10.1186/s13040-024-00407-8","DOIUrl":"10.1186/s13040-024-00407-8","url":null,"abstract":"<p><strong>Background: </strong>Timely identification of deteriorating patients is crucial to prevent the progression to cardiac arrest. However, current methods predicting emergency department cardiac arrest are primarily static and rule-based, with limited precision, and cannot accommodate time-series data. Deep learning has the potential to continuously update data and provide more precise predictions throughout the emergency department stay.</p><p><strong>Methods: </strong>We developed and internally validated a deep learning-based scoring system, the Deep EDICAS, for early prediction of cardiac arrest and a subset of arrest, cardiopulmonary resuscitation (CPR), in the emergency department. Our proposed model effectively integrates tabular and time series data to enhance predictive accuracy. To address data imbalance and bolster early prediction capabilities, we implemented data augmentation techniques.</p><p><strong>Results: </strong>Our system achieved an AUPRC of 0.5178 and an AUROC of 0.9388 on data from the National Taiwan University Hospital. For early prediction, our system achieved an AUPRC of 0.2798 and an AUROC of 0.9046, demonstrating superiority over other early warning scores. Moreover, Deep EDICAS offers interpretability through feature importance analysis.</p><p><strong>Conclusion: </strong>Our study demonstrates the effectiveness of deep learning in predicting cardiac arrest in the emergency department. 
Despite the higher clinical value associated with detecting patients requiring CPR, there is a scarcity of literature utilizing deep learning in CPR detection tasks. Therefore, this study embarks on an initial exploration into the task of CPR detection.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"52"},"PeriodicalIF":4.0,"publicationDate":"2024-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11585162/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142695993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining, Pub Date: 2024-11-23, DOI: 10.1186/s13040-024-00406-9
Mitja Briscik, Gabriele Tazza, László Vidács, Marie-Agnès Dillies, Sébastien Déjean
{"title":"Supervised multiple kernel learning approaches for multi-omics data integration.","authors":"Mitja Briscik, Gabriele Tazza, László Vidács, Marie-Agnès Dillies, Sébastien Déjean","doi":"10.1186/s13040-024-00406-9","DOIUrl":"10.1186/s13040-024-00406-9","url":null,"abstract":"<p><strong>Background: </strong>Advances in high-throughput technologies have led to an ever-increasing availability of omics datasets. The integration of multiple heterogeneous data sources is currently an issue for biology and bioinformatics. Multiple kernel learning (MKL) has been shown to be a flexible and valid approach to consider the diverse nature of multi-omics inputs, despite being an underused tool in genomic data mining.</p><p><strong>Results: </strong>We provide novel MKL approaches based on different kernel fusion strategies. To learn from the meta-kernel of input kernels, we adapted unsupervised integration algorithms for supervised tasks with support vector machines. We also tested deep learning architectures for kernel fusion and classification. The results show that MKL-based models can outperform more complex, state-of-the-art, supervised multi-omics integrative approaches.</p><p><strong>Conclusion: </strong>Multiple kernel learning offers a natural framework for predictive models in multi-omics data. It proved to be a fast and reliable solution that can compete with and outperform more complex architectures. 
Our results offer a direction for bio-data mining research, biomarker discovery and further development of methods for heterogeneous data integration.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"53"},"PeriodicalIF":4.0,"publicationDate":"2024-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11585117/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142695995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining, Pub Date: 2024-11-14, DOI: 10.1186/s13040-024-00404-x
Yang Qixin, Huang Jing, He Jiang, Liu Xueyang, Yu Lu, Li Yuehua
{"title":"Transcriptome-based network analysis related to regulatory T cells infiltration identified RCN1 as a potential biomarker for prognosis in clear cell renal cell carcinoma.","authors":"Yang Qixin, Huang Jing, He Jiang, Liu Xueyang, Yu Lu, Li Yuehua","doi":"10.1186/s13040-024-00404-x","DOIUrl":"10.1186/s13040-024-00404-x","url":null,"abstract":"<p><strong>Background: </strong>Regulatory T cells (Tregs) play a critical role in shaping the immunosuppressive microenvironment within tumors. Investigating the role of Tregs in Clear cell renal cell carcinoma (ccRCC) is crucial for identifying prognostic markers and therapeutic targets for ccRCC.</p><p><strong>Methods: </strong>Weighted gene co-expression network analysis (WGCNA) was utilized to pinpoint modules related to Treg infiltration in TCGA-KIRC samples. Following this, consensus clustering was employed to derive two clusters associated with Treg infiltration in ccRCC. A prognostic model was then developed using the gene module associated with Treg infiltration. We then evaluated the ability of the prognostic model to predict ccRCC overall survival and demonstrated that RCN1 can be used as a target to predict ccRCC prognosis.</p><p><strong>Results: </strong>We deduce that the two clusters associated with Treg infiltration exhibit distinct compositions of the immune microenvironment, pathway activations, prognosis, and drug sensitivities commonly utilized in ccRCC treatment. Furthermore, a 7-gene model risk score, developed based on ccRCC Treg infiltration, proved to be a reliable prognostic marker in both training and validation cohorts. Additionally, survival analysis indicated that RCN1 serves as a reliable prognostic factor for ccRCC. Single-cell sequencing analysis revealed that RCN1 is predominantly expressed in tumor cells. 
A pan-cancer analysis highlighted that RCN1 is linked with poor prognosis and the activation of inflammatory response pathways across various cancers.</p><p><strong>Conclusion: </strong>We developed a prognostic model associated with Treg infiltration, which facilitates the clinical categorization of ccRCC progression. Moreover, our findings underscore the significant potential of RCN1 as a ccRCC biomarker.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"51"},"PeriodicalIF":4.0,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11566375/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142631060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining, Pub Date: 2024-11-13, DOI: 10.1186/s13040-024-00400-1
Pradeep Varathan Pugalenthi, Bing He, Linhui Xie, Kwangsik Nho, Andrew J Saykin, Jingwen Yan
{"title":"Deciphering the tissue-specific functional effect of Alzheimer risk SNPs with deep genome annotation.","authors":"Pradeep Varathan Pugalenthi, Bing He, Linhui Xie, Kwangsik Nho, Andrew J Saykin, Jingwen Yan","doi":"10.1186/s13040-024-00400-1","DOIUrl":"10.1186/s13040-024-00400-1","url":null,"abstract":"<p><p>Alzheimer's disease (AD) is a highly heritable brain dementia, along with substantial failure of cognitive function. Large-scale genome-wide association studies (GWASs) have led to a set of SNPs significantly associated with AD and related traits. GWAS hits usually emerge as clusters where a lead SNP with the highest significance is surrounded by other less significant neighboring SNPs. Although functionality is not guaranteed even with the strongest associations in GWASs, lead SNPs have historically been the focus of the field, with the remaining associations inferred to be redundant. Recent deep genome annotation tools enable the prediction of function from a segment of a DNA sequence with significantly improved precision, which allows in-silico mutagenesis to interrogate the functional effect of SNP alleles. In this project, we explored the impact of top AD GWAS hits around APOE region on chromatin functions and whether it will be altered by the genetic context (i.e., alleles of neighboring SNPs). Our results showed that highly correlated SNPs in the same LD block could have distinct impacts on downstream functions. 
Although some GWAS lead SNPs showed dominant functional effects regardless of the neighborhood SNP alleles, several other SNPs did exhibit enhanced loss or gain of function under certain genetic contexts, suggesting potential additional information hidden in the LD blocks.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"50"},"PeriodicalIF":4.0,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11558841/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142631056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating potential drug targets for IgA nephropathy and membranous nephropathy through multi-queue plasma protein analysis: a Mendelian randomization study based on SMR and co-localization analysis.","authors":"Xinyi Xu, Changhong Miao, Shirui Yang, Lu Xiao, Ying Gao, Fangying Wu, Jianbo Xu","doi":"10.1186/s13040-024-00405-w","DOIUrl":"10.1186/s13040-024-00405-w","url":null,"abstract":"<p><strong>Background: </strong>Membranous nephropathy (MN) and IgA nephropathy (IgAN) pose challenges in clinical treatment with existing therapies primarily focusing on symptom relief and often yielding unsatisfactory outcomes. The search for novel drug targets remains crucial to address the shortcomings in managing both kidney diseases.</p><p><strong>Methods: </strong>Utilizing GWAS data for MN (ncase = 2150, ncontrol = 5829) and IgAN (ncase = 15587, ncontrol = 462197), instrumental variables for plasma proteins were derived from recent GWAS. Sensitivity analysis involved bidirectional Mendelian randomization analysis, MR Steiger, Bayesian co-localization, and Phenotype scanning. The SMR analysis using eQTL data from the eQTLGen Consortium was conducted to assess the availability of selected protein targets. The PPI network was constructed to reveal potential associations with existing drug treatment targets.</p><p><strong>Results: </strong>The study, subjected to the stringent Bonferroni correction, revealed significant associations: four proteins with MN and three proteins with IgAN. In plasma protein cis-pQTL data from two cohorts, an increase in one standard deviation in PLA2R1 (OR = 2.01, 95%CI = 1.83-2.21), AIF1 (OR = 9.04, 95%CI = 4.69-17.41), MLN (OR = 3.79, 95%CI = 2.12-6.78), and NFKB1 (OR = 29.43, 95%CI = 7.73-112.0) was associated with an increased risk of MN. 
Additionally, in plasma protein cis-pQTL data, a standard deviation increase in FCGR3B (OR = 1.15, 95%CI = 1.09-1.22) and BTN3A1 (OR = 4.05, 95%CI = 2.65-6.19) correlated with elevated IgAN risk, while AIF1 (OR = 0.58, 95%CI = 0.46-0.73) exhibited IgAN protection. Bayesian co-localization indicated that PLA2R1 (coloc.abf-PPH4 = 0.695), NFKB1 (coloc.abf-PPH4 = 0.949), FCGR3B (coloc.abf-PPH4 = 0.909), and BTN3A1 (coloc.abf-PPH4 = 0.685) share the same variants associated with MN and IgAN. The SMR analysis indicated a causal link between NFKB1 and BTN3A1 plasma protein eQTL in both conditions, and BTN3A1 was validated externally.</p><p><strong>Conclusion: </strong>Genetically influenced plasma levels of PLA2R1 and NFKB1 impact MN risk, while FCGR3B and BTN3A1 levels are causally linked to IgAN risk, suggesting potential drug targets for further clinical exploration, notably BTN3A1 for IgAN.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"49"},"PeriodicalIF":4.0,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11545554/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142631058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
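The odds ratios and 95% confidence intervals reported above can be related to the underlying log-odds estimate and its standard error, as in the rough sketch below. It assumes the intervals are Wald intervals, symmetric on the log-odds scale, which the abstract does not state. The function names are hypothetical, and only the PLA2R1 and FCGR3B figures are taken from the text.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_or_and_se(odds_ratio, lo, hi, z=Z95):
    """Back out beta = ln(OR) and its SE from a reported OR and CI.

    Assumes a Wald interval: exp(beta - z*se) to exp(beta + z*se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(hi) - math.log(lo)) / (2.0 * z)
    return beta, se


def or_with_ci(beta, se, z=Z95):
    """Forward direction: OR and Wald CI from beta and SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Values as reported in the abstract: (OR, lower bound, upper bound).
reported = {
    "PLA2R1 (MN)": (2.01, 1.83, 2.21),
    "FCGR3B (IgAN)": (1.15, 1.09, 1.22),
}

for name, (odds_ratio, lo, hi) in reported.items():
    beta, se = log_or_and_se(odds_ratio, lo, hi)
    print(name, "beta=%.4f se=%.4f" % (beta, se))
```

Rebuilding the interval with `or_with_ci` should reproduce the reported bounds up to rounding if the Wald assumption holds.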
{"title":"Deep joint learning diagnosis of Alzheimer's disease based on multimodal feature fusion.","authors":"Jingru Wang, Shipeng Wen, Wenjie Liu, Xianglian Meng, Zhuqing Jiao","doi":"10.1186/s13040-024-00395-9","DOIUrl":"10.1186/s13040-024-00395-9","url":null,"abstract":"<p><p>Alzheimer's disease (AD) is an advanced and incurable neurodegenerative disease. Genetic variations are intrinsic etiological factors contributing to the abnormal expression of brain function and structure in AD patients. A new multimodal feature fusion method called \"magnetic resonance imaging (MRI)-p value\" was proposed to construct 3D fusion images by introducing genes as a priori knowledge. Moreover, a new deep joint learning diagnostic model was constructed to fully learn image features. One branch trained a residual network (ResNet) to learn the features of local pathological regions. The other branch learned the position information of brain regions with different changes in the different categories of subjects' brains by introducing attention convolution, and then obtained the discriminative probability information from locations via convolution and global average pooling. The feature and position information of the two branches were linearly interacted to acquire the diagnostic basis for classifying the different categories of subjects. The diagnoses of AD and healthy control (HC), AD and mild cognitive impairment (MCI), HC and MCI were performed with data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). The results showed that the proposed method achieved optimal results in AD-related diagnosis. The classification accuracy (ACC) and area under the curve (AUC) of the three experimental groups were 93.44% and 96.67%, 89.06% and 92%, and 84% and 81.84%, respectively. 
Moreover, a total of six novel genes were found to be significantly associated with AD, namely NTM, MAML2, NAALADL2, FHIT, TMEM132D and PCSK5, which provided new targets for the potential treatment of neurodegenerative diseases.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"48"},"PeriodicalIF":4.0,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11536794/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142584754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining, Pub Date: 2024-11-01, DOI: 10.1186/s13040-024-00403-y
Amani Almohaimeed, Ishag Adam
{"title":"Modeling heterogeneity of Sudanese hospital stay in neonatal and maternal unit: non-parametric random effect models with Gamma distribution.","authors":"Amani Almohaimeed, Ishag Adam","doi":"10.1186/s13040-024-00403-y","DOIUrl":"10.1186/s13040-024-00403-y","url":null,"abstract":"<p><strong>Objective: </strong>Studies looking into patient and institutional variables linked to extended hospital stays have arisen as a result of the increased focus on severe maternal morbidity and mortality. Understanding the length of hospitalization of patients after delivery is important to gain insights into when hospitals will reach capacity and to predict corresponding staffing or equipment requirements. In Sudan, the distribution of the length of stay during delivery hospitalizations is heavily skewed, with an average length of stay of 2 to 3 days. This study aimed to investigate the use of a non-parametric random effect model with a Gamma distributed response for analyzing skewed hospital length of stay data in a neonatal and maternal unit in Sudan.</p><p><strong>Methods: </strong>We applied Gamma regression models with unknown random effects, estimated using the non-parametric maximum likelihood (NPML) technique [5]. The NPML reduces the heterogeneity in the distribution of the response and produces a robust estimate since it does not require any assumptions on the distribution. The same applies to the log-Gamma link, which does not require any transformation of the data and can handle outliers in the data points. In this study, the models are fitted with and without covariates and compared using AIC and BIC values.</p><p><strong>Results: </strong>The findings imply that in the context of health care database investigations, Gamma regression models with non-parametric random effect consistently reduce heterogeneity and improve model accuracy. 
The generalized linear model with covariates and random effect (k = 4) had the best fit, indicating that Sudanese hospital length of stay data could be classified into four groups with varying average stays influenced by maternal, neonatal, and obstetrics data.</p><p><strong>Conclusion: </strong>Identifying factors contributing to longer stays allows hospitals to implement strategies for improvement. Non-parametric random effect model with Gamma distributed response effectively accounts for unobserved heterogeneity and individual-level variability, leading to more accurate inferences and improved patient care. Including random effects can significantly affect variable significance in statistical models, emphasizing the need to consider unobserved heterogeneity when analyzing data containing potential individual-level variability. The findings emphasise the importance of making robust methodological choices in healthcare research in order to inform accurate policy decisions.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"47"},"PeriodicalIF":4.0,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529257/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142565124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining, Pub Date: 2024-10-30, DOI: 10.1186/s13040-024-00397-7
Vanesa Gómez-Martínez, David Chushig-Muzo, Marit B Veierød, Conceição Granja, Cristina Soguero-Ruiz
{"title":"Ensemble feature selection and tabular data augmentation with generative adversarial networks to enhance cutaneous melanoma identification and interpretability.","authors":"Vanesa Gómez-Martínez, David Chushig-Muzo, Marit B Veierød, Conceição Granja, Cristina Soguero-Ruiz","doi":"10.1186/s13040-024-00397-7","DOIUrl":"10.1186/s13040-024-00397-7","url":null,"abstract":"<p><strong>Background: </strong>Cutaneous melanoma is the most aggressive form of skin cancer, responsible for most skin cancer-related deaths. Recent advances in artificial intelligence, jointly with the availability of public dermoscopy image datasets, have made it possible to assist dermatologists in melanoma identification. While image feature extraction holds potential for melanoma detection, it often leads to high-dimensional data. Furthermore, most image datasets present the class imbalance problem, where a few classes have numerous samples, whereas others are under-represented.</p><p><strong>Methods: </strong>In this paper, we propose to combine ensemble feature selection (FS) methods and data augmentation with the conditional tabular generative adversarial networks (CTGAN) to enhance melanoma identification in imbalanced datasets. We employed dermoscopy images from two public datasets, PH2 and Derm7pt, which contain melanoma and not-melanoma lesions. To capture intrinsic information from skin lesions, we conducted two feature extraction (FE) approaches, including handcrafted and embedding features. For the former, color, geometric and first-, second-, and higher-order texture features were extracted, whereas for the latter, embeddings were obtained using ResNet-based models. To alleviate the high-dimensionality in the FE, ensemble FS with filter methods was used and evaluated. For data augmentation, we conducted a progressive analysis of the imbalance ratio (IR), related to the number of synthetic samples created, and evaluated the impact on the predictive results. 
To gain interpretability on predictive models, we used SHAP, bootstrap resampling statistical tests and UMAP visualizations.</p><p><strong>Results: </strong>The combination of ensemble FS, CTGAN, and linear models achieved the best predictive results, achieving AUCROC values of 87% (with support vector machine and IR=0.9) and 76% (with LASSO and IR=1.0) for the PH2 and Derm7pt, respectively. We also identified that melanoma lesions were mainly characterized by features related to color, while not-melanoma lesions were characterized by texture features.</p><p><strong>Conclusions: </strong>Our results demonstrate the effectiveness of ensemble FS and synthetic data in the development of models that accurately identify melanoma. This research advances skin lesion analysis, contributing to both melanoma detection and the interpretation of main features for its identification.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"46"},"PeriodicalIF":4.0,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526724/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142548479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining, Pub Date: 2024-10-29, DOI: 10.1186/s13040-024-00401-0
Laila Musib, Roberta Coletti, Marta B Lopes, Helena Mouriño, Eunice Carrasquinha
{"title":"Priority-Elastic net for binary disease outcome prediction based on multi-omics data.","authors":"Laila Musib, Roberta Coletti, Marta B Lopes, Helena Mouriño, Eunice Carrasquinha","doi":"10.1186/s13040-024-00401-0","DOIUrl":"10.1186/s13040-024-00401-0","url":null,"abstract":"<p><strong>Background: </strong>High-dimensional omics data integration has emerged as a prominent avenue within the healthcare industry, presenting substantial potential to improve predictive models. However, the data integration process faces several challenges, including data heterogeneity, priority sequence in which data blocks are prioritized for rendering predictive information contained in multiple blocks, assessing the flow of information from one omics level to the other and multicollinearity.</p><p><strong>Methods: </strong>We propose the Priority-Elastic net algorithm, a hierarchical regression method extending Priority-Lasso for the binary logistic regression model by incorporating a priority order for blocks of variables while fitting Elastic-net models sequentially for each block. The fitted values from each step are then used as an offset in the subsequent step. Additionally, we considered the adaptive elastic-net penalty within our priority framework to compare the results.</p><p><strong>Results: </strong>The Priority-Elastic net and Priority-Adaptive Elastic net algorithms were evaluated on a brain tumor dataset available from The Cancer Genome Atlas (TCGA), accounting for transcriptomics, proteomics, and clinical information measured over two glioma types: Lower-grade glioma (LGG) and glioblastoma (GBM).</p><p><strong>Conclusion: </strong>Our findings suggest that the Priority-Elastic net is a highly advantageous choice for a wide range of applications. 
It offers moderate computational complexity, flexibility in integrating prior knowledge while introducing a hierarchical modeling perspective, and, importantly, improved stability and accuracy in predictions, making it superior to the other methods discussed. This evolution marks a significant step forward in predictive modeling, offering a sophisticated tool for navigating the complexities of multi-omics datasets in pursuit of precision medicine's ultimate goal: personalized treatment optimization based on a comprehensive array of patient-specific data. This framework can be generalized to time-to-event, Cox proportional hazards regression and multicategorical outcomes. A practical implementation of this method is available upon request in R script, complete with an example to facilitate its application.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"45"},"PeriodicalIF":4.0,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11523883/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142548496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
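The block-wise procedure described in the Methods above (fit an elastic-net logistic model per block in priority order, and pass the fitted values on as an offset to the next block) might look roughly like the sketch below. This is not the authors' R script: all function names are hypothetical, the penalty strength `lam` and mixing weight `alpha` are fixed by assumption rather than tuned by cross-validation, the solver is a plain proximal-gradient loop, and the data are simulated.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linpred(X, b0, beta, offset):
    """Linear predictor: offset + intercept + X @ beta, row by row."""
    return [o + b0 + sum(b * v for b, v in zip(beta, row))
            for o, row in zip(offset, X)]


def log_loss(y, eta):
    """Mean negative Bernoulli log-likelihood for linear predictor eta."""
    return sum(max(e, 0.0) + math.log1p(math.exp(-abs(e))) - yi * e
               for yi, e in zip(y, eta)) / len(y)


def fit_block(X, y, offset, lam=0.05, alpha=0.5, n_iter=300):
    """Elastic-net logistic fit for one block, with a fixed offset.

    Penalty: lam * (alpha * |beta|_1 + (1 - alpha) / 2 * |beta|_2^2).
    The intercept is left unpenalized.
    """
    n, p = len(X), len(X[0])
    sq = sum(v * v for row in X for v in row)
    # Step size from a conservative (Frobenius-norm) Lipschitz bound.
    step = 1.0 / (0.25 * (n + sq) / n + lam * (1.0 - alpha))
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        eta = linpred(X, b0, beta, offset)
        resid = [sigmoid(e) - yi for e, yi in zip(eta, y)]
        g0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n
                + lam * (1.0 - alpha) * beta[j] for j in range(p)]
        b0 -= step * g0
        for j in range(p):
            z = beta[j] - step * grad[j]
            # Soft-thresholding handles the L1 part of the penalty.
            beta[j] = math.copysign(max(abs(z) - step * lam * alpha, 0.0), z)
    return b0, beta


def priority_enet(blocks, y, **kw):
    """Fit blocks in the given priority order, chaining offsets."""
    offset = [0.0] * len(y)
    fits = []
    for X in blocks:  # highest-priority block first
        b0, beta = fit_block(X, y, offset, **kw)
        offset = linpred(X, b0, beta, offset)  # becomes the next offset
        fits.append((b0, beta))
    return fits, offset


# Simulated toy data: a one-column "clinical" block, then a three-column
# "omics" block. Neither corresponds to the TCGA data used in the paper.
random.seed(0)
n = 81
clinical = [[random.gauss(0, 1)] for _ in range(n)]
omics = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
y = [1 if random.random() < sigmoid(1.5 * c[0] + 1.0 * g[0]) else 0
     for c, g in zip(clinical, omics)]

fits, eta = priority_enet([clinical, omics], y)
print("per-block (intercept, coefficients):", fits)
print("final training log-loss:", log_loss(y, eta))
```

Because each later block starts from the offset left by the earlier ones, lower-priority blocks can only explain what the higher-priority blocks have not already captured. A fuller implementation would choose `lam` per block by cross-validation.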