Mohamad Zulfikrie Abas, Ken Li, Noran Naqiah Hairi, Wan Yuen Choo, Kim Sui Wan
Machine learning based predictive model of Type 2 diabetes complications using Malaysian National Diabetes Registry: A study protocol
Journal of Public Health Research, 13(1), article 22799036241231786. Published 2024-02-29.
DOI: https://doi.org/10.1177/22799036241231786
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10906050/pdf/
Abstract
Background: The prevalence of diabetes in Malaysia is increasing, and identifying patients at higher risk of complications is crucial for effective management. Prediction models developed with machine learning (ML) have been shown to outperform non-ML models. This study aims to develop predictive models for Type 2 Diabetes (T2D) complications in Malaysia using ML techniques.
Design and methods: This 10-year retrospective cohort study uses clinical audit datasets from the Malaysian National Diabetes Registry from 2011 to 2021. T2D patients who received treatment in public health clinics in the southern region of Malaysia and who have at least two data points within the 10-year period are included. Patients with diabetes complications at baseline are excluded to ensure temporality between the predictors and the target variable. Appropriate methods are used to address data cleaning, missing data imputation, data splitting, feature selection, and class imbalance. The study uses seven ML algorithms (logistic regression, support vector machine, k-nearest neighbours, decision tree, random forest, extreme gradient boosting, and light gradient boosting machine) to develop predictive models for four target variables: nephropathy, retinopathy, ischaemic heart disease, and stroke. Hyperparameter tuning is performed for each algorithm. Model training is performed using stratified k-fold cross-validation. The best model for each algorithm is evaluated on a hold-out dataset using multiple metrics.
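The data-splitting step described above (a stratified hold-out set plus stratified k-fold cross-validation on the remainder) can be illustrated with a minimal sketch. The abstract does not state the hold-out fraction, the value of k, or the software used, so the 20% hold-out, k = 4, the toy labels, and the function names `stratified_holdout` and `stratified_kfold` are all assumptions made for illustration only. In practice a library routine such as scikit-learn's `StratifiedKFold` would usually be preferred.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Split row indices into (train, holdout), preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, holdout = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        holdout.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(holdout)


def stratified_kfold(indices, labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs whose class ratios roughly match the whole."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal each class round-robin so every fold receives its share.
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, val


# Toy, made-up labels: 1 = complication developed, 0 = none (10% positive).
labels = [1] * 10 + [0] * 90

# Assumed 20% hold-out, kept aside for the final evaluation only.
train_idx, holdout_idx = stratified_holdout(labels, test_fraction=0.2)

# Assumed k = 4; hyperparameters would be tuned on these folds.
folds = list(stratified_kfold(train_idx, labels, k=4))
```

Stratifying matters here because complications are typically the minority class: without it, a fold could contain very few positive cases, which would make the validation metrics for that fold unreliable.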
Expected impact of the study on public health: The prediction model may be a valuable tool for diabetes management and secondary prevention by enabling earlier interventions and optimal resource allocation, leading to better health outcomes.
About the journal:
The Journal of Public Health Research (JPHR) is an online Open Access, peer-reviewed journal in the field of public health science. The aim of the journal is to stimulate debate and dissemination of knowledge in the public health field in order to improve efficacy, effectiveness and efficiency of public health interventions to improve health outcomes of populations. This aim can only be achieved by adopting a global and multidisciplinary approach. The Journal of Public Health Research publishes contributions from both the "traditional" disciplines of public health, including hygiene, epidemiology, health education, environmental health, occupational health, health policy, hospital management, health economics, law and ethics, as well as from the area of new health care fields including social science, communication science, eHealth and mHealth philosophy, health technology assessment, genetics research implications, population-mental health, gender and disparity issues, global and migration-related themes. In support of this approach, JPHR strongly encourages the use of real multidisciplinary approaches and analyses in the manuscripts submitted to the journal. In addition to Original research, Systematic Review, Meta-analysis, Meta-synthesis and Perspectives and Debate articles, JPHR publishes newsworthy Brief Reports, Letters and Study Protocols related to public health and public health management activities.