Junmo Kim, Su Hyun Park, Hyesu Lee, Su Kyoung Lee, Jihye Kim, Suhyun Kim, Yong Jin Kwon, Kwangsoo Kim
International Journal of Medical Informatics, Volume 195, Article 105775. Published 2024-12-25. DOI: 10.1016/j.ijmedinf.2024.105775
Impact factor 3.7; JCR Q2 (Computer Science, Information Systems)
https://www.sciencedirect.com/science/article/pii/S1386505624004386
Identifying potential medical aid beneficiaries using machine learning: A Korean Nationwide cohort study
Objective
To identify potential medical aid beneficiaries from individuals' demographic data and medical history, and to analyze the important features qualitatively.
Methods
This retrospective, national cohort, case-control study used data from the National Health Insurance Service (NHIS) in Korea collected between January 1, 2002 and December 31, 2019. Potential medical aid beneficiaries were classified using several machine learning models (linear and tree-based). Demographic data (age, sex, region, insurance type, and insurance fee) and medical history (diagnoses, operations, statements, visits, and costs) were collected. These data were transformed into a one-dimensional vector for each individual so that the machine learning models could be trained on them. Feature importance was calculated as the average gain across all splits that used each feature.
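The two mechanical steps described above, flattening each individual's records into one vector and averaging split gain per feature, can be illustrated with a minimal sketch. The field names, the per-year layout, and the toy gain values below are assumptions made for illustration; the abstract does not specify the study's actual feature layout or code. The averaging rule matches the wording of the "gain" importance type in XGBoost's Python API, although the abstract does not say which call the authors used.

```python
from collections import defaultdict

# Hypothetical field names and ordering; not taken from the study.
DEMOGRAPHIC_KEYS = ["age", "sex", "insurance_fee"]
HISTORY_KEYS = ["visits", "costs"]


def flatten_record(demographics, yearly_history):
    """Concatenate demographics and per-year history into one flat list."""
    vector = [float(demographics[key]) for key in DEMOGRAPHIC_KEYS]
    # Sort years so every individual's vector uses the same column order.
    for year in sorted(yearly_history):
        vector.extend(float(yearly_history[year][key]) for key in HISTORY_KEYS)
    return vector


def average_gain(splits):
    """Average gain per feature over every split that uses that feature.

    `splits` is a list of (feature_name, gain) pairs, one per tree split.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for feature, gain in splits:
        total[feature] += gain
        count[feature] += 1
    return {feature: total[feature] / count[feature] for feature in total}


# Toy individual with two years of history (made-up numbers).
person = flatten_record(
    {"age": 52, "sex": 1, "insurance_fee": 30000},
    {2017: {"visits": 4, "costs": 120000}, 2018: {"visits": 9, "costs": 410000}},
)

# Toy split log from a hypothetical tree ensemble (made-up gains).
importance = average_gain(
    [("insurance_fee", 8.0), ("costs", 3.0), ("insurance_fee", 4.0), ("visits", 1.0)]
)
```

Here `insurance_fee` appears in two splits with gains 8.0 and 4.0, so its average gain is 6.0. A feature used in a single high-gain split can therefore rank above one used many times with small gains, which is worth keeping in mind when reading gain-based rankings.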
Results
In total, 274,635 individuals were included in the study population, of whom 62,501 were classified as potential medical aid beneficiaries. XGBoost classified potential medical aid beneficiaries with an AUROC of around 0.891. When predicting two years in advance, performance held up, with an AUROC of around 0.832. Economic variables, such as insurance fees and several cost measures, were the most important, but variables describing medical status, such as blood test results and history of chronic diseases, were also important.
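For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison on made-up labels and scores; it is not the study's evaluation code. It also works out the share of the study population in the potential-beneficiary group from the two counts reported above.

```python
def auroc(labels, scores):
    """Fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = potential beneficiary) and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.5, 0.8, 0.1]
toy_auroc = auroc(labels, scores)  # 8 of 9 pairs are ordered correctly

# Counts reported in the Results: 62,501 of 274,635 individuals.
beneficiary_share = 62_501 / 274_635  # a little under 23%
```

The pairwise loop is quadratic in the number of individuals, so it is only suitable for small illustrations; rank-based implementations give the same value far more efficiently on a cohort of this size.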
Conclusion
Machine learning-based models successfully screened potential medical aid beneficiaries. Qualitative analysis of the important features was consistent with prior knowledge in public health. These findings can contribute to sound healthcare financing and to the improvement of public health.
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer-based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.