Development of a prognostic model for diagnosis of prostate cancer based on radiomics of biparametric magnetic resonance imaging apparent diffusion coefficient maps and stacking of machine learning algorithms
A. I. Kuznetsov. Digital Diagnostics. Published 2024-07-03. DOI: 10.17816/dd626145 (https://doi.org/10.17816/dd626145)
BACKGROUND: Prostate cancer is one of the most common cancers among men [1, 2]. In recent years, a number of prognostic models based on texture analysis of biparametric magnetic resonance images have been created. Research has shown that radiomics features extracted from apparent diffusion coefficient maps are the most reproducible [3]. However, these models are limited in accuracy because each is built with a single machine learning algorithm that takes into account only linear dependencies [4–6].
AIM: To increase the accuracy of a prognostic model for diagnosing prostate cancer by stacking machine learning algorithms that take into account both linear and nonlinear dependencies, based on radiomics of biparametric magnetic resonance imaging apparent diffusion coefficient maps.
MATERIALS AND METHODS: A single-center retrospective cohort study of patients with suspected prostate cancer was conducted in the X-ray Diagnostics and Tomography Department of the United Hospital and Polyclinic (Moscow, Russia) from 2017 to 2023. The presence of prostate cancer was confirmed by biopsy or radical prostatectomy. Statistical analysis was performed using Python 3.11.
RESULTS: The study involved 67 men aged 60 [54; 66] years, of whom 57 were diagnosed with prostate cancer and 10 with benign prostate lesions. The LIFEx software extracted 96 radiomic features.
Statistically significant differences were found for: PARAMS_ZSpatialResampling (the voxel size of the image: Z dimension) (p=0.001), SHAPE_Sphericity[onlyFor3DROI] (how spherical a Volume of Interest is) (p=0.006), SHAPE_Compacity[onlyFor3DROI] (how compact the Volume of Interest is) (p=0.004), GLRLM_HGRE (p=0.039), GLRLM_SRHGE (p=0.041), GLRLM_RLNU (p=0.039), where GLRLM is the Grey-Level Run Length Matrix. Univariate logistic regression showed that SHAPE_Compacity[onlyFor3DROI] (R2=15%) and PARAMS_ZSpatialResampling (R2=18%) had a statistically significant effect on the outcome. First, a prognostic model that takes into account only linear dependencies was built using multivariate logistic regression. The model includes 3 features that together have a statistically significant effect on the outcome (R2=23%): SHAPE_Sphericity[onlyFor3DROI], PARAMS_ZSpatialResampling and GLRLM_RLNU.
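The abstract does not say which definition of R2 was used for the logistic regression models. As an illustration only, the sketch below computes McFadden's pseudo-R2, one common choice, from predicted probabilities. The function names and the toy inputs are assumptions, not part of the study.

```python
import math


def log_likelihood(y_true, p_pred):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted probabilities."""
    eps = 1e-12  # keeps log() finite when a probability is exactly 0 or 1
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def mcfadden_r2(y_true, p_pred):
    """McFadden pseudo-R2: 1 - LL(model) / LL(null).

    The null model predicts the observed prevalence for every case.
    Assumes both outcome classes are present in y_true.
    """
    prevalence = sum(y_true) / len(y_true)
    ll_null = log_likelihood(y_true, [prevalence] * len(y_true))
    ll_model = log_likelihood(y_true, p_pred)
    return 1.0 - ll_model / ll_null


# Toy data (hypothetical): three positive cases and one negative case.
y = [1, 1, 1, 0]
r2_informative = mcfadden_r2(y, [0.9, 0.9, 0.9, 0.1])
r2_null = mcfadden_r2(y, [0.75, 0.75, 0.75, 0.75])
```

A model that only reproduces the prevalence scores 0 on this measure, and values rise toward 1 as the predicted probabilities separate the two classes.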
To describe nonlinear relationships, another model was built based on the “Decision Tree” algorithm. It included 4 features (R2=58%): DISCRETIZED_HISTO_Entropy_log10 (the randomness of the distribution), SHAPE_Sphericity[onlyFor3DROI], PARAMS_ZSpatialResampling and GLRLM_SRE.
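A decision tree captures nonlinear relationships by splitting a feature at thresholds rather than fitting a single slope. The sketch below shows the core step of a CART-style tree, the search for the threshold that minimizes weighted Gini impurity on one feature. The abstract does not state which splitting criterion or library was used, so the criterion, the function names and the toy values are assumptions.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule 'value <= threshold'.

    Returns (None, impurity_of_all_labels) when no split improves on the
    unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot place a threshold between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2.0, score)  # midpoint between neighbours
    return best


# Toy feature values and outcomes (hypothetical, not study data).
threshold, impurity = best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```

A full tree repeats this search over every candidate feature and then recursively within each resulting subset.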
Stacking of the algorithms, which consists of calculating the arithmetic mean of the predictions of the multivariate logistic regression and “Decision Tree” algorithms, made it possible to construct a model that takes into account both linear and nonlinear dependencies. The model includes 5 features (R2=77%). The constructed model formed the basis of a calculator program [7], which has been introduced into radiology practice.
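The combination step described above, an arithmetic mean of the two models' predictions, can be sketched as follows. This is a minimal illustration assuming each model outputs a probability of prostate cancer per patient. The function names, the 0.5 decision threshold and the example probabilities are assumptions, not values from the study.

```python
def average_predictions(p_logreg, p_tree):
    """Arithmetic mean of two models' predicted probabilities, case by case."""
    if len(p_logreg) != len(p_tree):
        raise ValueError("both models must score the same cases")
    return [(a + b) / 2.0 for a, b in zip(p_logreg, p_tree)]


def classify(probabilities, threshold=0.5):
    """Turn averaged probabilities into 0/1 predictions (threshold is assumed)."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical outputs of the two base models for three patients.
p_logreg = [0.80, 0.30, 0.55]  # multivariate logistic regression
p_tree = [1.00, 0.10, 0.25]    # decision tree

combined = average_predictions(p_logreg, p_tree)
predicted = classify(combined)
```

Averaging with equal weights is the simplest way to combine base models. Stacking in the stricter sense would train a separate meta-model on the base predictions, which the abstract does not describe.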
CONCLUSION: The new model built on apparent diffusion coefficient maps performs better (area under the ROC curve 99.0% [97.7; 100.0]) than existing models (area under the ROC curve 83.6% [78.3; 88.9]), which also show high heterogeneity (I2=71%). The accuracy of the new model increased owing to the stacking of machine learning algorithms, which made it possible to take into account both linear and nonlinear effects of features on the outcome.
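For readers unfamiliar with the two summary statistics in the conclusion, the sketch below shows how each is commonly defined. The area under the ROC curve is computed as the share of (positive, negative) pairs in which the positive case receives the higher score, and I2 uses the Higgins formula based on Cochran's Q. The abstract does not describe how either value or its interval was obtained, so the function names and example inputs are assumptions.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def i_squared(q, df):
    """Higgins I2 in percent from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Toy inputs (hypothetical, not study data).
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
heterogeneity = i_squared(10.0, 5)
```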