Guihong Wan, Sara Khattab, Katie Roster, Nga Nguyen, Boshen Yan, Hannah Rashdan, Hossein Estiri, Yevgeniy R. Semenov
{"title":"Individualized melanoma risk prediction using machine learning with electronic health records","authors":"Guihong Wan, Sara Khattab, Katie Roster, Nga Nguyen, Boshen Yan, Hannah Rashdan, Hossein Estiri, Yevgeniy R. Semenov","doi":"10.1101/2024.07.26.24311080","DOIUrl":null,"url":null,"abstract":"Background:\nMelanoma is a lethal form of skin cancer with a high propensity for metastasizing, making early detection crucial. This study aims to develop a machine learning model using electronic health record data to identify patients at high risk of developing melanoma to prioritize them for dermatology screening.\nMethods:\nThis retrospective study included patients diagnosed with melanoma (cases), as well as matched patients without melanoma (controls), from Massachusetts General Hospital (MGH), Brigham and Women's Hospital (BWH), Dana-Farber Cancer Institute (DFCI), and other hospital centers within the Research Patient Data Registry at Mass General Brigham healthcare system between 1992 and 2022. Patient demographics, family history, diagnoses, medications, procedures, laboratory tests, reasons for visits, and allergy data six months prior to the date of first melanoma diagnosis or date of censoring were extracted. A machine learning framework for health outcomes (MLHO) was utilized to build the model. Performance was evaluated using five-fold cross-validation of the MGH cohort (internal validation) and by using the MGH cohort for model training and the non-MGH cohort for independent testing (external validation). The Area Under the Receiver Operating Characteristic Curve (AUC-ROC) and the Area Under the Precision-Recall Curve (AUC-PR), along with 95% Confidence Intervals (CIs), were computed. Results:\nThis study identified 10,778 patients with melanoma and 10,778 matched patients without melanoma, including 8,944 from MGH and 1,834 from non-MGH hospitals in each cohort, both with an average follow-up duration of 9 years. In the internal and external validations, the model achieved AUC-ROC values of 0.826 (95% CI: 0.819-0.832) and 0.823 (95% CI: 0.809-0.837) and AUC-PR scores of 0.841 (95% CI: 0.834-0.848) and 0.822 (95% CI: 0.806-0.839), respectively. Important risk features included a family history of melanoma, a family history of skin cancer, and a prior diagnosis of benign neoplasm of skin. Conversely, medical examination without abnormal findings was identified as a protective feature.\nConclusions:\nMachine learning techniques and electronic health records can be effectively used to predict melanoma risk, potentially aiding in identifying high-risk patients and enabling individualized screening strategies for melanoma.","PeriodicalId":501385,"journal":{"name":"medRxiv - Dermatology","volume":"8 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"medRxiv - Dermatology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2024.07.26.24311080","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Background:
Melanoma is a lethal form of skin cancer with a high propensity for metastasizing, making early detection crucial. This study aims to develop a machine learning model using electronic health record data to identify patients at high risk of developing melanoma to prioritize them for dermatology screening.
Methods:
This retrospective study included patients diagnosed with melanoma (cases), as well as matched patients without melanoma (controls), from Massachusetts General Hospital (MGH), Brigham and Women's Hospital (BWH), Dana-Farber Cancer Institute (DFCI), and other hospital centers within the Research Patient Data Registry at Mass General Brigham healthcare system between 1992 and 2022. Patient demographics, family history, diagnoses, medications, procedures, laboratory tests, reasons for visits, and allergy data six months prior to the date of first melanoma diagnosis or date of censoring were extracted. A machine learning framework for health outcomes (MLHO) was utilized to build the model. Performance was evaluated using five-fold cross-validation of the MGH cohort (internal validation) and by using the MGH cohort for model training and the non-MGH cohort for independent testing (external validation). The Area Under the Receiver Operating Characteristic Curve (AUC-ROC) and the Area Under the Precision-Recall Curve (AUC-PR), along with 95% Confidence Intervals (CIs), were computed. Results:
This study identified 10,778 patients with melanoma and 10,778 matched patients without melanoma, including 8,944 from MGH and 1,834 from non-MGH hospitals in each cohort, both with an average follow-up duration of 9 years. In the internal and external validations, the model achieved AUC-ROC values of 0.826 (95% CI: 0.819-0.832) and 0.823 (95% CI: 0.809-0.837) and AUC-PR scores of 0.841 (95% CI: 0.834-0.848) and 0.822 (95% CI: 0.806-0.839), respectively. Important risk features included a family history of melanoma, a family history of skin cancer, and a prior diagnosis of benign neoplasm of skin. Conversely, medical examination without abnormal findings was identified as a protective feature.
Conclusions:
Machine learning techniques and electronic health records can be effectively used to predict melanoma risk, potentially aiding in identifying high-risk patients and enabling individualized screening strategies for melanoma.