Zuhai Hu, Xiaosheng Li, Yuliang Yuan, Qianjie Xu, Wei Zhang, Haike Lei
International Journal of Medical Informatics, Volume 195, Article 105770. Published 2024-12-19. DOI: 10.1016/j.ijmedinf.2024.105770. Impact factor: 3.7 (JCR Q2, Computer Science, Information Systems). https://www.sciencedirect.com/science/article/pii/S1386505624004337
Development and validation of machine learning models for predicting venous thromboembolism in colorectal cancer patients: A cohort study in China
Background
With advancements in healthcare, traditional venous thromboembolism (VTE) risk assessment tools are increasingly insufficient to meet the demands of high-quality care, underscoring the need for innovative and specialized assessment methods.
Objective
Given the success of supervised machine learning in disease prediction, our objective is to develop a reliable and efficient model for assessing VTE risk from the basic data and clinical characteristics of colorectal cancer patients at our medical facility.
Methods
Six commonly used machine learning algorithms were applied to predict the occurrence of VTE in patients with rectal cancer. During modeling, LASSO regression was used to identify and exclude variables not associated with VTE. Hyperparameters were tuned via 5-fold cross-validation to mitigate overfitting, and 200 bootstrap samples were used to adjust the apparent performance on the training set. The final VTE assessment model was selected by evaluating performance criteria such as the area under the ROC curve (AUC), accuracy (ACC), and F1 score.
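The bootstrap adjustment of apparent performance can be sketched as follows. This is a minimal illustration that assumes the common optimism-correction scheme (refit on each resample, then subtract the average gap between resample performance and original-sample performance); the abstract does not state that this exact variant was used. The data are synthetic, `fit_centroid` is a hypothetical toy classifier standing in for the study's models, and all function names are illustrative.

```python
import random
from statistics import mean


def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid(X, y):
    """Toy stand-in model (assumption): project onto the difference of class means."""
    d = len(X[0])
    mu1 = [mean(x[j] for x, t in zip(X, y) if t == 1) for j in range(d)]
    mu0 = [mean(x[j] for x, t in zip(X, y) if t == 0) for j in range(d)]
    w = [a - b for a, b in zip(mu1, mu0)]
    return lambda rows: [sum(wj * xj for wj, xj in zip(w, x)) for x in rows]


def optimism_corrected_auc(X, y, fit, n_boot=200, seed=0):
    """Return (apparent AUC, optimism-adjusted AUC) using n_boot resamples."""
    rng = random.Random(seed)
    n = len(y)
    apparent = auc(fit(X, y)(X), y)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # a resample must contain both classes to be scored
        model = fit(Xb, yb)
        # Optimism: performance on the resample minus performance on the original data.
        optimism.append(auc(model(Xb), yb) - auc(model(X), y))
    return apparent, apparent - mean(optimism)


# Synthetic, imbalanced data: one informative feature plus four noise features.
rng = random.Random(42)
y = [1 if rng.random() < 0.2 else 0 for _ in range(150)]
X = [[rng.gauss(0.8 * t, 1.0)] + [rng.gauss(0, 1) for _ in range(4)] for t in y]

apparent, adjusted = optimism_corrected_auc(X, y, fit_centroid)
print(f"apparent AUC = {apparent:.3f}, adjusted AUC = {adjusted:.3f}")
```

Any model can be plugged in through the `fit` argument as long as it returns a scoring function, so the same loop would apply to a random forest or a LASSO-selected feature set.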
Results
The random forest (RF) model exhibits consistent and efficient performance. Specifically, in the internal validation dataset, after adjustment for generalizability (AD), the RF model achieved the highest scores across multiple metrics: AD-AUC (0.895), AD-ACC (0.871), AD-F1 (0.311), AD-MCC (0.316), AD-Precision (0.241), AD-Specificity (0.888). For external validation on unseen colon cancer data, the RF model also performed best in terms of ACC (0.728), F1 (0.292), MCC (0.225), Precision (0.192), and Specificity (0.740), with a suboptimal AUC of 0.745 and a Sensitivity (Recall) of 0.615. Additionally, the RF model demonstrates strong performance not only on the original dataset but also on datasets processed via alternative imbalance handling techniques.
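The external-validation metrics are linked by standard definitions, so they can be cross-checked against each other. The sketch below is a back-of-envelope check, not part of the study: it recomputes F1 from the reported precision and recall, then infers the VTE prevalence implied by precision, sensitivity, and specificity and uses it to recompute accuracy. The function names are illustrative, and the implied prevalence is a derived estimate rather than a reported figure.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_prevalence(precision, sensitivity, specificity):
    """Solve precision = TP / (TP + FP) for prevalence p, where
    TP = sensitivity * p and FP = (1 - specificity) * (1 - p)."""
    fp_rate = 1 - specificity
    return precision * fp_rate / (sensitivity * (1 - precision) + precision * fp_rate)


def accuracy_from(prevalence, sensitivity, specificity):
    """Accuracy is the prevalence-weighted mix of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


# Reported external-validation values for the RF model.
precision, sensitivity, specificity = 0.192, 0.615, 0.740

f1 = f1_from(precision, sensitivity)
prevalence = implied_prevalence(precision, sensitivity, specificity)
accuracy = accuracy_from(prevalence, sensitivity, specificity)

print(f"F1 = {f1:.3f}, implied prevalence = {prevalence:.3f}, accuracy = {accuracy:.3f}")
```

Both recomputed values land within about 0.001 of the reported F1 (0.292) and ACC (0.728), which is the agreement expected from three-decimal rounding of the inputs. The implied prevalence of roughly 9% also explains why precision and F1 stay low despite reasonable sensitivity and specificity.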
Conclusions
Our research established and validated a model for assessing the risk of VTE in colorectal cancer patients.
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer-based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.