{"title":"基于数据转换的深度学习改善了口腔癌前病变的癌症风险预测","authors":"John Adeoye, Yuxiong Su","doi":"10.1016/j.imed.2024.11.003","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>Oral cancer is the most common head and neck malignancy and may develop from oral leukoplakia (OL) and oral lichenoid disease (OLD). Machine learning classifiers using structured (tabular) data have been employed to predict malignant transformation in OL and OLD. However, current models require improved discrimination, and their frameworks may limit feature fusion and multimodal risk prediction. Therefore, this study investigates whether tabular-to-image data conversion and deep learning (DL) based on convolutional neural networks (CNNs) can improve malignant transformation prediction compared to traditional classifiers.</div></div><div><h3>Methods</h3><div>This study used retrospective data of 1,010 patients with OL and OLD treated at Queen Mary Hospital, Hong Kong, from January 2003 to December 2023, to construct artificial intelligence-based models for oral cancer risk stratification in OL/OLD. Twenty-five input features and information on oral cancer development in OL/OLD were retrieved from electronic health records. Tabular-to-2D image data transformation was achieved by creating a feature matrix from encoded labels of the input variables arranged according to their correlation coefficient. Then, 2D images were used to populate five pre-trained DL models (VGG16, VGG19, MobileNetV2, ResNet50, and EfficientNet-B0). 
Area under the receiver operating characteristic curve (AUC), Brier scores, and net benefit of the DL models were calculated and compared to five traditional classifiers based on structured data and the binary epithelial dysplasia grading system (current method).</div></div><div><h3>Results</h3><div>This study found that the DL models had better AUC values (0.893–0.955) and Brier scores (0.072–0.106) compared to the traditional classifiers (AUC: 0.887–0.941 and Brier score: 0.074–0.136) during validation. During internal testing, VGG16 and VGG19 had better AUC values and Brier scores than other CNNs (AUC: 0.998–1.00; Brier score: 0.036–0.044) and the best traditional classifier (random forest) (AUC: 0.906; Brier score: 0.153). Additionally, VGG16 and VGG19 models outperformed random forest in discrimination and calibration during external testing (AUC: 1.00 <em>vs</em>. 0.976; Brier score: 0.022–0.034 <em>vs</em>. 0.129). The best CNNs also had better discriminatory performance and calibration than binary dysplasia grading at internal and external testing. Overall, decision curve analysis showed that the optimal DL models with transformed data had a higher net benefit than random forest and binary dysplasia grading.</div></div><div><h3>Conclusion</h3><div>Tabular-to-2D image data transformation may improve the use of structured input features for developing optimal intelligent models for oral cancer risk prediction in OL and OLD using convolutional networks. 
This approach may have the potential to robustly handle structured data in multimodal DL frameworks for oncological outcome prediction.</div></div>","PeriodicalId":73400,"journal":{"name":"Intelligent medicine","volume":"5 2","pages":"Pages 141-150"},"PeriodicalIF":6.9000,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deep learning with data transformation improves cancer risk prediction in oral precancerous conditions\",\"authors\":\"John Adeoye, Yuxiong Su\",\"doi\":\"10.1016/j.imed.2024.11.003\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><div>Oral cancer is the most common head and neck malignancy and may develop from oral leukoplakia (OL) and oral lichenoid disease (OLD). Machine learning classifiers using structured (tabular) data have been employed to predict malignant transformation in OL and OLD. However, current models require improved discrimination, and their frameworks may limit feature fusion and multimodal risk prediction. Therefore, this study investigates whether tabular-to-image data conversion and deep learning (DL) based on convolutional neural networks (CNNs) can improve malignant transformation prediction compared to traditional classifiers.</div></div><div><h3>Methods</h3><div>This study used retrospective data of 1,010 patients with OL and OLD treated at Queen Mary Hospital, Hong Kong, from January 2003 to December 2023, to construct artificial intelligence-based models for oral cancer risk stratification in OL/OLD. Twenty-five input features and information on oral cancer development in OL/OLD were retrieved from electronic health records. Tabular-to-2D image data transformation was achieved by creating a feature matrix from encoded labels of the input variables arranged according to their correlation coefficient. 
Then, 2D images were used to populate five pre-trained DL models (VGG16, VGG19, MobileNetV2, ResNet50, and EfficientNet-B0). Area under the receiver operating characteristic curve (AUC), Brier scores, and net benefit of the DL models were calculated and compared to five traditional classifiers based on structured data and the binary epithelial dysplasia grading system (current method).</div></div><div><h3>Results</h3><div>This study found that the DL models had better AUC values (0.893–0.955) and Brier scores (0.072–0.106) compared to the traditional classifiers (AUC: 0.887–0.941 and Brier score: 0.074–0.136) during validation. During internal testing, VGG16 and VGG19 had better AUC values and Brier scores than other CNNs (AUC: 0.998–1.00; Brier score: 0.036–0.044) and the best traditional classifier (random forest) (AUC: 0.906; Brier score: 0.153). Additionally, VGG16 and VGG19 models outperformed random forest in discrimination and calibration during external testing (AUC: 1.00 <em>vs</em>. 0.976; Brier score: 0.022–0.034 <em>vs</em>. 0.129). The best CNNs also had better discriminatory performance and calibration than binary dysplasia grading at internal and external testing. Overall, decision curve analysis showed that the optimal DL models with transformed data had a higher net benefit than random forest and binary dysplasia grading.</div></div><div><h3>Conclusion</h3><div>Tabular-to-2D image data transformation may improve the use of structured input features for developing optimal intelligent models for oral cancer risk prediction in OL and OLD using convolutional networks. 
This approach may have the potential to robustly handle structured data in multimodal DL frameworks for oncological outcome prediction.</div></div>\",\"PeriodicalId\":73400,\"journal\":{\"name\":\"Intelligent medicine\",\"volume\":\"5 2\",\"pages\":\"Pages 141-150\"},\"PeriodicalIF\":6.9000,\"publicationDate\":\"2025-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Intelligent medicine\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2667102625000300\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intelligent medicine","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667102625000300","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Deep learning with data transformation improves cancer risk prediction in oral precancerous conditions
Background
Oral cancer is the most common head and neck malignancy and may develop from oral leukoplakia (OL) and oral lichenoid disease (OLD). Machine learning classifiers using structured (tabular) data have been employed to predict malignant transformation in OL and OLD. However, current models require improved discrimination, and their frameworks may limit feature fusion and multimodal risk prediction. Therefore, this study investigates whether tabular-to-image data conversion and deep learning (DL) based on convolutional neural networks (CNNs) can improve malignant transformation prediction compared to traditional classifiers.
Methods
This study used retrospective data from 1,010 patients with OL or OLD treated at Queen Mary Hospital, Hong Kong, between January 2003 and December 2023 to construct artificial intelligence-based models for oral cancer risk stratification in OL/OLD. Twenty-five input features, together with information on oral cancer development, were retrieved from electronic health records. Tabular-to-2D image transformation was achieved by arranging the encoded labels of the input variables into a feature matrix ordered by their correlation coefficients. The resulting 2D images were then used to fine-tune five pre-trained DL models (VGG16, VGG19, MobileNetV2, ResNet50, and EfficientNet-B0). The area under the receiver operating characteristic curve (AUC), Brier score, and net benefit of the DL models were calculated and compared with those of five traditional classifiers trained on the structured data and with the binary epithelial dysplasia grading system (the current method).
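The tabular-to-image step can be sketched as follows. The study's exact encoding and matrix layout are not specified in the abstract, so the 5×5 arrangement, the min–max rescaling, and the correlation ranking below are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def tabular_to_image(row, corr_order, side=5):
    """Arrange one patient's 25 encoded feature values into a 5x5 matrix.

    `corr_order` ranks features by a (hypothetical) correlation with the
    outcome; the study's exact layout is not specified, so this is a sketch.
    """
    ordered = np.asarray(row, dtype=np.float32)[corr_order]
    img = ordered.reshape(side, side)
    # Rescale to [0, 255] so the matrix can be treated as a grayscale image.
    span = img.max() - img.min()
    if span > 0:
        img = (img - img.min()) / span * 255.0
    return img

# Toy example: 25 label-encoded features for one patient.
gen = np.random.default_rng(0)
row = gen.integers(0, 4, size=25)
order = np.argsort(-np.abs(gen.normal(size=25)))  # stand-in for |correlation| ranking
image = tabular_to_image(row, order)
print(image.shape)  # (5, 5)
```

In practice such a single-channel matrix would still need to be resized and replicated to three channels (e.g., 224×224 RGB) before being fed to ImageNet-pre-trained networks such as VGG16.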
Results
During validation, the DL models achieved better AUC values (0.893–0.955) and Brier scores (0.072–0.106) than the traditional classifiers (AUC: 0.887–0.941; Brier score: 0.074–0.136). During internal testing, VGG16 and VGG19 had better AUC values and Brier scores (AUC: 0.998–1.00; Brier score: 0.036–0.044) than the other CNNs and the best traditional classifier, random forest (AUC: 0.906; Brier score: 0.153). VGG16 and VGG19 also outperformed random forest in discrimination and calibration during external testing (AUC: 1.00 vs. 0.976; Brier score: 0.022–0.034 vs. 0.129). The best CNNs likewise showed better discrimination and calibration than binary dysplasia grading in both internal and external testing. Overall, decision curve analysis showed that the optimal DL models with transformed data yielded a higher net benefit than random forest and binary dysplasia grading.
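The three evaluation measures reported above can be computed as in the sketch below. The labels and probabilities are toy values, not the study's data; net benefit at a threshold p_t follows the standard decision-curve formula TP/n − (FP/n) · p_t/(1 − p_t).

```python
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss

# Toy labels and predicted probabilities standing in for held-out
# malignant-transformation predictions (illustrative only).
y_true = np.array([0, 0, 0, 1, 0, 1, 1, 0, 1, 1])
y_prob = np.array([0.1, 0.2, 0.15, 0.8, 0.3, 0.7, 0.9, 0.25, 0.6, 0.85])

auc = roc_auc_score(y_true, y_prob)       # discrimination: 1.0 = perfect ranking
brier = brier_score_loss(y_true, y_prob)  # calibration: mean squared error of probabilities

def net_benefit(y_true, y_prob, p_t):
    """Net benefit at probability threshold p_t (decision curve analysis)."""
    n = len(y_true)
    pred_pos = y_prob >= p_t
    tp = np.sum(pred_pos & (y_true == 1))
    fp = np.sum(pred_pos & (y_true == 0))
    return tp / n - (fp / n) * (p_t / (1 - p_t))

print(auc, brier, net_benefit(y_true, y_prob, 0.2))
```

AUC measures ranking (discrimination), the Brier score penalizes miscalibrated probabilities, and net benefit weighs true against false positives at a clinically chosen threshold, which is why all three are reported together.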
Conclusion
Tabular-to-2D image transformation may allow structured input features to be exploited more effectively when building convolutional-network models for oral cancer risk prediction in OL and OLD. The approach may also enable structured data to be handled robustly within multimodal DL frameworks for oncological outcome prediction.