Gideon Vos, Liza van Eijk, Zoltan Sarnyai, Mostafa Rahimi Azghadi

DOI: 10.1016/j.cmpb.2025.108899
Journal: Computer Methods and Programs in Biomedicine, Volume 269, Article 108899 (Q1, Computer Science, Interdisciplinary Applications; Impact Factor 4.9)
Published: 2025-06-21
Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights
Introduction:
Machine Learning (ML) is transforming medical research by enhancing diagnostic accuracy, predicting disease progression, and personalizing treatments. While general models trained on large datasets identify broad patterns across populations, the diversity of human biology, shaped by genetics, environment, and lifestyle, often limits their effectiveness. This has driven a shift towards subject-specific models that incorporate individual biological and clinical data for more precise predictions and personalized care. However, developing these models presents significant practical and financial challenges. Additionally, ML models initialized through stochastic processes with random seeds can suffer from reproducibility issues when those seeds are changed, leading to variations in predictive performance and feature importance. To address this challenge, this study introduces a novel validation approach that enhances model interpretability, stabilizing predictive performance and feature importance at both the group and subject-specific levels.
Methods:
We conducted initial experiments using a single Random Forest (RF) model, initialized with a random seed for key stochastic processes, on nine datasets that varied in domain problem, sample size, and demographics. Different validation techniques were applied to assess model accuracy and reproducibility while evaluating feature importance consistency. Next, the experiment was repeated on each dataset for up to 400 trials per subject, re-seeding the machine learning algorithm before each trial. This introduced variability into the initialization of model parameters, providing a more comprehensive evaluation of the machine learning model's features and performance consistency. The repeated trials generated up to 400 feature sets per subject. By aggregating feature importance rankings across trials, our method identified the most consistently important features, reducing the impact of noise and random variation on feature selection. The top subject-specific feature importance set across all trials was then identified. Finally, using all subject-specific feature sets, the top group-specific feature importance set was also created. This process resulted in stable, reproducible feature rankings, enhancing both subject-level and group-level model explainability.
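The repeated-trials aggregation described above can be sketched as follows. This is a minimal illustration assuming scikit-learn and a synthetic dataset; the trial count, dataset shape, and rank-sum aggregation scheme are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for one subject's data.
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)

n_trials = 50  # the study used up to 400 trials per subject
n_features = X.shape[1]
rank_sums = np.zeros(n_features)

for seed in range(n_trials):
    # Re-seed the model before each trial, as in the repeated-trials design.
    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    rf.fit(X, y)
    # Convert this trial's importances to ranks (0 = most important).
    order = np.argsort(-rf.feature_importances_)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(n_features)
    rank_sums += ranks

# Features with the lowest aggregate rank are the most consistently
# important across trials, damping seed-driven noise.
stable_order = np.argsort(rank_sums)
print("Most consistently important features:", stable_order[:3])
```

Aggregating ranks rather than raw importance values keeps a single noisy trial from dominating the final ordering.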
Results:
We found that machine learning models with stochastic initialization were particularly susceptible to variations in reproducibility, predictive accuracy, and feature importance due to random seed selection and validation techniques during training. Changes in random seeds altered weight initialization, optimization paths, and feature rankings, leading to fluctuations in test accuracy and interpretability. These findings align with prior research on the sensitivity of stochastic models to initialization randomness. This study builds on that understanding by introducing a novel repeated trials validation approach with random seed variation, significantly reducing variability in feature rankings and improving the consistency of model performance metrics. The method enabled robust identification of key features for each subject using a single, generic machine learning model, making predictions more interpretable and stable across experiments.
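The seed sensitivity reported above is easy to observe directly. The sketch below, assuming scikit-learn with an arbitrarily small forest and synthetic data, records the top-ranked feature under three different seeds; with identical data, the ranking can still shift between runs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=150, n_features=8, n_informative=3,
                           random_state=0)

top_features = []
for seed in (0, 1, 2):
    # Identical data and hyperparameters; only the random seed changes.
    rf = RandomForestClassifier(n_estimators=25, random_state=seed).fit(X, y)
    top_features.append(int(np.argmax(rf.feature_importances_)))

# With few trees, the top-ranked feature can differ between seeds,
# which is the instability the repeated-trials validation targets.
print("Top feature per seed:", top_features)
```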
Conclusion:
Subject-specific models improve generalization by addressing variability in human biology but are often costly and impractical for clinical trials. In this study, we introduce a novel validation technique for determining both group- and subject-specific feature importance within a general machine learning model, achieving greater stability in feature selection, higher predictive accuracy, and improved model interpretability. Our proposed approach ensures reproducible accuracy metrics and reliable feature rankings when using models incorporating stochastic processes, making machine learning models more robust and clinically applicable.
Journal overview:
The journal aims to encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustrating fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; to support the distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; and to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards, and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.