Biodata Mining latest articles, page 10

From COVID-19 to monkeypox: a novel predictive model for emerging infectious diseases.

IF 4, Zone 3 (Biology)

Biodata Mining, Pub Date: 2024-10-22, DOI: 10.1186/s13040-024-00396-8

Deren Xu, Weng Howe Chan, Habibollah Haron, Hui Wen Nies, Kohbalan Moorthy

Abstract: The outbreak of emerging infectious diseases poses significant challenges to global public health. Accurate early forecasting is crucial for effective resource allocation and emergency response planning. This study aims to develop a comprehensive predictive model for emerging infectious diseases, integrating the blending framework, transfer learning, incremental learning, and the biological feature Rt to increase prediction accuracy and practicality. By transferring features from a COVID-19 dataset to a monkeypox dataset and introducing dynamically updated incremental learning techniques, the model's predictive capability in data-scarce scenarios was significantly improved. The research findings demonstrate that the blending framework performs exceptionally well in short-term (7-day) predictions. Furthermore, the combination of transfer learning and incremental learning techniques significantly enhanced the adaptability and precision, with a 91.41% improvement in the RMSE and an 89.13% improvement in the MAE. In particular, the inclusion of the Rt feature enabled the model to more accurately reflect the dynamics of disease spread, further improving the RMSE by 1.91% and the MAE by 2.17%. This study underscores the significant application potential of multimodel fusion and real-time data updates in infectious disease prediction, offering new theoretical perspectives and technical support. This research not only enriches the theoretical foundation of infectious disease prediction models but also provides reliable technical support for public health emergency responses. Future research should continue to explore integrating data from multiple sources and enhancing model generalization capabilities to further enhance the practicality and reliability of predictive tools.

Biodata Mining 17(1): 42. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11494870/pdf/

Citations: 0
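The abstract reports gains as percentage improvements in RMSE and MAE but does not give the formula. Assuming "improvement" means the relative reduction of an error metric against a baseline model, the arithmetic can be sketched as follows; the function names and the numbers in the comment are illustrative, not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and forecast case counts."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between observed and forecast case counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline_error, new_error):
    """Percent reduction of an error metric (lower is better) versus a baseline.

    Assumed definition; e.g. relative_improvement(10.0, 0.859) is about 91.41
    (hypothetical numbers).
    """
    return 100.0 * (baseline_error - new_error) / baseline_error
```

Because both metrics are "lower is better", a positive value means the new model reduced the error; the same function applies unchanged to RMSE and MAE.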

PAGER: A novel genotype encoding strategy for modeling deviations from additivity in complex trait association studies.

IF 4, Zone 3 (Biology)

Biodata Mining, Pub Date: 2024-10-11, DOI: 10.1186/s13040-024-00393-x

Philip J Freda, Attri Ghosh, Priyanka Bhandary, Nicholas Matsumoto, Apurva S Chitre, Jiayan Zhou, Molly A Hall, Abraham A Palmer, Tayo Obafemi-Ajayi, Jason H Moore

Background: The additive model of inheritance assumes that heterozygotes (Aa) are exactly intermediate with respect to homozygotes (AA and aa). While this model is commonly used in single-locus genetic association studies, significant deviations from additivity are well-documented and contribute to phenotypic variance across many traits and systems. This assumption can introduce type I and type II errors by overestimating or underestimating the effects of variants that deviate from additivity. Alternative genotype encoding strategies have been explored to account for different inheritance patterns, but they often incur significant computational or methodological costs. To address these challenges, we introduce PAGER (Phenotype Adjusted Genotype Encoding and Ranking), an efficient pre-processing method that encodes each genetic variant based on normalized mean phenotypic differences between diallelic genotype classes (AA, Aa, and aa). This approach more accurately reflects each variant's true inheritance model, improving model precision while minimizing the costs associated with alternative encoding strategies.

Results: Through extensive benchmarking on SNPs simulated with both binary and continuous phenotypes, we demonstrate that PAGER accurately represents various inheritance patterns (including additive, dominant, recessive, and heterosis), achieves levels of statistical power that meet or exceed other encoding strategies, and attains computation speeds up to 55 times faster than a similar method, EDGE. We also apply PAGER to publicly available real-world data and identify a novel, relevant putative QTL associated with body mass index in rats (Rattus norvegicus) that is not detected with the additive model.

Conclusions: Overall, we show that PAGER is an efficient genotype encoding approach that can uncover sources of missing heritability and reveal novel insights in the study of complex traits while incurring minimal costs.

Biodata Mining 17(1): 41. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11468469/pdf/

Citations: 0
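PAGER replaces the usual 0/1/2 allele-count coding with values derived from the mean phenotype of each genotype class. A minimal sketch of that idea follows, assuming min-max normalization of the class means; the hypothetical `pager_like_encoding` is a simplification, and the published PAGER normalization may differ.

```python
from statistics import mean


def pager_like_encoding(genotypes, phenotypes):
    """Encode genotype classes (e.g. 0=AA, 1=Aa, 2=aa) by their mean phenotype.

    Hypothetical simplification of PAGER: class means are min-max normalized
    to [0, 1]. Returns (encoded genotype vector, class -> code mapping).
    """
    classes = sorted(set(genotypes))
    # Mean phenotype of the individuals in each genotype class.
    class_means = {
        g: mean(p for gg, p in zip(genotypes, phenotypes) if gg == g) for g in classes
    }
    lo, hi = min(class_means.values()), max(class_means.values())
    if hi == lo:
        # No phenotypic difference between classes: encode every class as 0.
        codes = {g: 0.0 for g in classes}
    else:
        codes = {g: (m - lo) / (hi - lo) for g, m in class_means.items()}
    return [codes[g] for g in genotypes], codes
```

For a dominant pattern with class means 1, 3 and 3, the heterozygote and the alternate homozygote both receive the code 1.0, so a downstream linear regression no longer forces Aa to sit halfway between AA and aa.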

Decoding the genetic comorbidity network of Alzheimer's disease.

IF 4, Zone 3 (Biology)

Biodata Mining, Pub Date: 2024-10-09, DOI: 10.1186/s13040-024-00394-w

Xueli Zhang, Dantong Li, Siting Ye, Shunming Liu, Shuo Ma, Min Li, Qiliang Peng, Lianting Hu, Xianwen Shang, Mingguang He, Lei Zhang

Citations: 0

MDVarP: modifier ~ disease-causing variant pairs predictor.

IF 4, Zone 3 (Biology)

Biodata Mining, Pub Date: 2024-10-08, DOI: 10.1186/s13040-024-00392-y

Hong Sun, Yunqin Chen, Liangxiao Ma

Background: Modifiers significantly impact disease phenotypes by modulating the effects of disease-causing variants, resulting in varying disease manifestations among individuals. However, identifying genetic interactions between modifier and disease-causing variants is challenging.

Results: We developed MDVarP, an ensemble model comprising 1000 random forest predictors, to identify modifier ~ disease-causing variant combinations. MDVarP achieves high accuracy and precision, as verified using an independent dataset with published evidence of genetic interactions. We identified 25 novel modifier ~ disease-causing variant combinations and obtained supporting evidence for these associations. MDVarP outputs a class label ("Associated-pair" or "Nonrelevant-pair") and two prediction scores indicating the probability of a true association.

Conclusions: MDVarP prioritizes variant pairs associated with phenotypic modulations, enabling more effective mapping of functional contributions from disease-causing and modifier variants. This framework interprets genetic interactions underlying phenotypic variations in human diseases, with potential applications in personalized medicine and disease prevention.

Biodata Mining 17(1): 39. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11460193/pdf/

Citations: 0
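MDVarP is described as an ensemble of 1000 random forest predictors that outputs a class label and two probability-style scores. The abstract does not say how member outputs are combined, so the sketch below assumes a simple scheme: average the members' probabilities and count the fraction of members voting "associated". The aggregation rule, the 0.5 threshold and both scores are hypothetical stand-ins, not MDVarP's published outputs.

```python
from statistics import mean


def aggregate_ensemble(member_probs, threshold=0.5):
    """Combine per-member probabilities that a variant pair is associated.

    Returns (label, mean_probability, vote_fraction). Hypothetical aggregation,
    not the MDVarP implementation.
    """
    # Score 1: average probability across all ensemble members.
    mean_probability = mean(member_probs)
    # Score 2: fraction of members whose own probability clears the threshold.
    vote_fraction = sum(p >= threshold for p in member_probs) / len(member_probs)
    # Strict majority vote decides the class label.
    label = "Associated-pair" if vote_fraction > 0.5 else "Nonrelevant-pair"
    return label, mean_probability, vote_fraction
```

Reporting both a mean probability and a vote fraction lets a user see when a confident minority of members disagrees with the majority, which a single averaged score would hide.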

Deep learning-based approaches for multi-omics data integration and analysis.

IF 4, Zone 3 (Biology)

Biodata Mining, Pub Date: 2024-10-02, DOI: 10.1186/s13040-024-00391-z

Jenna L Ballard, Zexuan Wang, Wenrui Li, Li Shen, Qi Long

Background: The rapid growth of deep learning, as well as the vast and ever-growing amount of available data, has provided ample opportunity for advances in fusion and analysis of complex and heterogeneous data types. Different data modalities provide complementary information that can be leveraged to gain a more complete understanding of each subject. In the biomedical domain, multi-omics data includes molecular (genomics, transcriptomics, proteomics, epigenomics, metabolomics, etc.) and imaging (radiomics, pathomics) modalities which, when combined, have the potential to improve performance on prediction, classification, clustering and other tasks. Deep learning encompasses a wide variety of methods, each of which has certain strengths and weaknesses for multi-omics integration.

Method: In this review, we categorize recent deep learning-based approaches by their basic architectures and discuss their unique capabilities in relation to one another. We also discuss some emerging themes advancing the field of multi-omics integration.

Results: Deep learning-based multi-omics integration methods were categorized broadly into non-generative (feedforward neural networks, graph convolutional neural networks, and autoencoders) and generative (variational methods, generative adversarial models, and a generative pretrained model). Generative methods have the advantage of being able to impose constraints on the shared representations to enforce certain properties or incorporate prior knowledge. They can also be used to generate or impute missing modalities. Recent advances achieved by these methods include the ability to handle incomplete data as well as going beyond the traditional molecular omics data types to integrate other modalities such as imaging data.

Conclusion: We expect to see further growth in methods that can handle missingness, as this is a common challenge in working with complex and heterogeneous data. Additionally, methods that integrate more data types are expected to improve performance on downstream tasks by capturing a comprehensive view of each sample.

Biodata Mining 17(1): 38. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11446004/pdf/

Citations: 0

Assessing the limitations of relief-based algorithms in detecting higher-order interactions.

IF 4, Zone 3 (Biology)

Biodata Mining, Pub Date: 2024-10-01, DOI: 10.1186/s13040-024-00390-0

Philip J Freda, Suyu Ye, Robert Zhang, Jason H Moore, Ryan J Urbanowicz

Background: Epistasis, the interaction between genetic loci where the effect of one locus is influenced by one or more other loci, plays a crucial role in the genetic architecture of complex traits. However, as the number of loci considered increases, the investigation of epistasis becomes exponentially more complex, making the selection of key features vital for effective downstream analyses. Relief-Based Algorithms (RBAs) are often employed for this purpose due to their reputation as "interaction-sensitive" algorithms and uniquely non-exhaustive approach. However, the limitations of RBAs in detecting interactions, particularly those involving multiple loci, have not been thoroughly defined. This study seeks to address this gap by evaluating the efficiency of RBAs in detecting higher-order epistatic interactions. Motivated by previous findings that suggest some RBAs may rank predictive features involved in higher-order epistasis negatively, we explore the potential of absolute value ranking of RBA feature weights as an alternative approach for capturing complex interactions. In this study, we assess the performance of ReliefF, MultiSURF, and MultiSURFstar on simulated genetic datasets that model various patterns of genotype-phenotype associations, including 2-way to 5-way genetic interactions, and compare their performance to two control methods: a random shuffle and mutual information.

Results: Our findings indicate that while RBAs effectively identify lower-order (2- to 3-way) interactions, their capability to detect higher-order interactions is significantly limited, primarily by large feature count but also by signal noise. Specifically, we observe that RBAs are successful in detecting fully penetrant 4-way XOR interactions using an absolute value ranking approach, but this is restricted to datasets with only 20 total features.

Conclusions: These results highlight the inherent limitations of current RBAs and underscore the need for the development of Relief-based approaches with enhanced detection capabilities for the investigation of epistasis, particularly in datasets with large feature counts and complex higher-order interactions.

Biodata Mining 17(1): 37. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11443793/pdf/

Citations: 0
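The study ranks features by the absolute value of Relief-based weights so that features involved in higher-order epistasis, which some RBAs may score negatively, are not pushed to the bottom of the list. A minimal sketch of that ranking idea follows. It uses the original single-neighbor Relief with Hamming distance on 0/1/2 genotypes, a simpler relative of ReliefF, MultiSURF and MultiSURF* (which average over several neighbors or use adaptive radii); it is not the implementation evaluated in the study.

```python
def relief_weights(X, y):
    """Original Relief (one nearest hit, one nearest miss per instance).

    X: list of rows of discrete features (e.g. SNP genotypes 0/1/2).
    y: binary class labels. Distances are Hamming distances.
    """
    n, p = len(X), len(X[0])
    totals = [0.0] * p

    def hamming(a, b):
        return sum(ai != bi for ai, bi in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: hamming(X[i], X[j]))
        miss = min(misses, key=lambda j: hamming(X[i], X[j]))
        for f in range(p):
            # Reward features that differ from the nearest miss,
            # penalize features that differ from the nearest hit.
            totals[f] += (X[i][f] != X[miss][f]) - (X[i][f] != X[hit][f])
    return [t / n for t in totals]


def rank_by_absolute_weight(weights):
    """Feature indices ordered by |weight|, largest first (ties keep index order)."""
    return sorted(range(len(weights)), key=lambda f: abs(weights[f]), reverse=True)
```

Ranking by `abs(weight)` keeps a strongly negative feature near the top instead of discarding it; the trade-off is that irrelevant features can also receive large negative weights, which is part of why the study finds the approach effective only with few total features.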

Identifying heterogeneous subgroups of systemic autoimmune diseases by applying a joint dimension reduction and clustering approach to immunomarkers

IF 4.5, Zone 3 (Biology)

Biodata Mining, Pub Date: 2024-09-16, DOI: 10.1186/s13040-024-00389-7

Chia-Wei Chang, Hsin-Yao Wang, Wan-Ying Lin, Yu-Chiang Wang, Wei-Lin Lo, Ting-Wei Lin, Jia-Ruei Yu, Yi-Ju Tseng

Abstract: The high complexity of systemic autoimmune diseases (SADs) has hindered precise management. This study aims to investigate heterogeneity in SADs. We applied a joint cluster analysis, which combined multiple correspondence analysis and k-means, to immunomarkers and measured the heterogeneity of clusters by examining differences in immunomarkers and clinical manifestations. The electronic health records of patients who received an antinuclear antibody test and were diagnosed with SADs, namely systemic lupus erythematosus (SLE), rheumatoid arthritis (RA), and Sjögren's syndrome (SS), were retrieved between 2001 and 2016 from hospitals in Taiwan. With distinctive patterns of immunomarkers, a total of 11,923 patients with the three SADs were grouped into six clusters. None of the clusters was composed only of a single SAD, and these clusters demonstrated considerable differences in clinical manifestation. Patients with SLE and those with SS were both more widely dispersed across the six clusters. Among patients with SLE, the occurrence of renal compromise was higher in Clusters 3 and 6 (52% and 51%) than in the other clusters (p < 0.001). Cluster 3 also had a higher proportion of patients with discoid lupus (60%) than did Cluster 6 (39%; p < 0.001). Patients with SS in Cluster 3 were the most distinctive because of the high occurrence of immunity disorders (63%) and other and unspecified benign neoplasm (58%), both significantly higher than in the other clusters (all p < 0.05). The immunomarker-driven clustering method could recognise more clinically relevant subgroups of SADs and would provide a more precise basis for diagnosis.

Citations: 0

Development, evaluation and comparison of machine learning algorithms for predicting in-hospital patient charges for congestive heart failure exacerbations, chronic obstructive pulmonary disease exacerbations and diabetic ketoacidosis

IF 4.5, Zone 3 (Biology)

Biodata Mining, Pub Date: 2024-09-12, DOI: 10.1186/s13040-024-00387-9

Monique Arnold, Lathan Liou, Mary Regina Boland

Abstract: Hospitalizations for exacerbations of congestive heart failure (CHF), chronic obstructive pulmonary disease (COPD) and diabetic ketoacidosis (DKA) are costly in the United States. The purpose of this study was to predict in-hospital charges for each condition using machine learning (ML) models. We conducted a retrospective cohort study on national discharge records of hospitalized adult patients from January 1st, 2016, to December 31st, 2019. We constructed six ML models (linear regression, ridge regression, support vector machine, random forest, gradient boosting and extreme gradient boosting) to predict total in-hospital cost for admission for each condition. Our models had good predictive performance, with testing R-squared values of 0.701-0.750 (mean of 0.713) for CHF; 0.694-0.724 (mean 0.709) for COPD; and 0.615-0.729 (mean 0.694) for DKA. We identified important key features driving costs, including patient age, length of stay, number of procedures, and elective/nonelective admission. ML methods may be used to accurately predict costs and identify drivers of high cost for COPD exacerbations, CHF exacerbations and DKA. Overall, our findings may inform future studies that seek to decrease the underlying high patient costs for these conditions.

Citations: 0
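Model quality above is summarized as testing R-squared per condition, reported as a range and a mean across the six models. A minimal sketch of how such numbers are computed on a held-out set is shown below; the six model types themselves are not reimplemented, and the helper names are illustrative.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination on held-out data: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def summarize_scores(scores):
    """(min, max, mean) of test R-squared values across several models."""
    return min(scores), max(scores), mean(scores)
```

On test data R-squared can fall below zero when a model does worse than always predicting the test-set mean charge, so a range such as 0.615-0.729 indicates every model beat that trivial baseline.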

Private pathological assessment via machine learning and homomorphic encryption

IF 4.5, Zone 3 (Biology)

Biodata Mining, Pub Date: 2024-09-10, DOI: 10.1186/s13040-024-00379-9

Ahmad Al Badawi, Mohd Faizal Bin Yusof

Abstract: The objective of this research is to explore the applicability of machine learning and fully homomorphic encryption (FHE) to private pathological assessment, with a focus on the inference phase of support vector machines (SVM) for the classification of confidential medical data. A framework is introduced that utilizes the Cheon-Kim-Kim-Song (CKKS) FHE scheme, facilitating the execution of SVM inference on encrypted datasets. This framework ensures the privacy of patient data and removes the need for decryption during the analytical process. Additionally, an efficient feature extraction technique is presented for the transformation of medical imagery into vectorial representations. The system's evaluation across various datasets substantiates its practicality and efficacy. The proposed method delivers classification accuracy and performance on par with traditional, non-encrypted SVM inference, while upholding a 128-bit security level against established cryptographic attacks targeting the CKKS scheme. The secure inference process completes within seconds. The findings of this study underscore the viability of FHE in enhancing the security and efficiency of bioinformatics analyses, potentially benefiting fields such as cardiology, oncology, and medical imagery. The implications of this research are significant for the future of privacy-preserving machine learning, promoting progress in diagnostic procedures, tailored medical treatments, and clinical investigations.

Citations: 0
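CKKS packs a vector of values into ciphertext "slots" and supports slot-wise addition, slot-wise multiplication and cyclic rotation of slots. A common way to obtain an inner product, the core of linear SVM inference, from only those operations is rotate-and-sum. The sketch below mimics that pattern on plaintext Python lists: it performs no encryption, assumes a linear kernel (the abstract does not state which kernel is used), and is not the authors' framework.

```python
def rotate(slots, k):
    """Cyclic left rotation by k positions, mimicking a CKKS slot rotation."""
    return slots[k:] + slots[:k]


def rotate_and_sum(slots):
    """Sum all slots using log2(n) rotations and additions (n a power of two)."""
    n = len(slots)
    if n == 0 or n & (n - 1):
        raise ValueError("slot count must be a positive power of two")
    acc = list(slots)
    step = 1
    while step < n:
        # After each round, every slot holds the sum of 2 * step neighbours.
        acc = [a + b for a, b in zip(acc, rotate(acc, step))]
        step *= 2
    return acc[0]


def linear_svm_score(weights, bias, features):
    """Decision value w.x + b built only from slot-wise ops and rotations."""
    n = 1
    while n < len(features):
        n *= 2
    # Zero-pad both vectors to the slot count so padding adds nothing.
    w = list(weights) + [0.0] * (n - len(weights))
    x = list(features) + [0.0] * (n - len(features))
    products = [wi * xi for wi, xi in zip(w, x)]  # one slot-wise multiplication
    return rotate_and_sum(products) + bias
```

In an encrypted setting the server would run these steps on ciphertexts and return the encrypted score; one simple option is for the data owner to decrypt it and take the sign as the class label, which avoids evaluating a non-polynomial sign function under encryption.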

Knowledge-slanted random forest method for high-dimensional data and small sample size with a feature selection application for gene expression data

IF 4.5, Zone 3 (Biology)

Biodata Mining, Pub Date: 2024-09-10, DOI: 10.1186/s13040-024-00388-8

Erika Cantor, Sandra Guauque-Olarte, Roberto León, Steren Chabert, Rodrigo Salas

Abstract: The use of prior knowledge in the machine learning framework has been considered a potential tool to handle the curse of dimensionality in genetic and genomics data. Although random forest (RF) represents a flexible non-parametric approach with several advantages, it can provide poor accuracy in high-dimensional settings, especially when sample sizes are small. We propose a knowledge-slanted RF that integrates biological networks as prior knowledge into the model to improve its performance and explainability, exemplifying its use for selecting and identifying relevant genes. Knowledge-slanted RF combines two stages. First, prior knowledge represented by graphs is translated by running a random walk with restart algorithm to determine the relevance of each gene based on its connection and localization on a protein-protein interaction network. Then, each gene's relevance is used to modify its probability of being drawn as a candidate split feature in the conventional RF. Experiments in simulated datasets with very small sample sizes ($n \le 30$) comparing knowledge-slanted RF against conventional RF and logistic lasso regression suggest improved precision in outcome prediction compared with the other methods. The knowledge-slanted RF was complemented with a modified version of the Boruta feature selection algorithm. Finally, knowledge-slanted RF identified more relevant biological genes, offering a higher level of explainability for users than conventional RF. These findings were corroborated in one real case to identify genes relevant to calcific aortic valve stenosis.

Citations: 0
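The knowledge-slanted RF first scores genes with a random walk with restart (RWR) on a protein-protein interaction network, then uses those scores to bias which genes are drawn as candidate split features. A minimal pure-Python sketch of both stages follows. It assumes an unweighted adjacency matrix, a hypothetical set of seed genes, a restart probability of 0.3 and a small probability floor so unconnected genes can still be drawn; none of these settings come from the paper, and the code does not build a forest.

```python
import random


def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0, with W the column-normalized adjacency."""
    n = len(adjacency)
    degree = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]  # column sums
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = list(p0)
    for _ in range(max_iter):
        new = [
            (1 - restart)
            * sum(adjacency[i][j] / degree[j] * p[j] for j in range(n) if degree[j] > 0)
            + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(new, p)) < tol:
            return new
        p = new
    return p


def selection_probabilities(relevance, floor=1e-6):
    """Turn relevance scores into per-gene draw probabilities (floor is hypothetical)."""
    weights = [r + floor for r in relevance]
    total = sum(weights)
    return [w / total for w in weights]


def draw_split_candidates(probs, mtry, rng):
    """Weighted sampling without replacement of mtry candidate features."""
    remaining = list(range(len(probs)))
    chosen = []
    for _ in range(mtry):
        pick = rng.choices(remaining, weights=[probs[i] for i in remaining], k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen
```

For example, on a four-gene path network with gene 0 as the only seed, `random_walk_with_restart([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], seeds={0})` assigns decreasing relevance along the path, so `draw_split_candidates(selection_probabilities(scores), 2, random.Random(0))` favours genes near the seed while still allowing distant ones.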