Individualized melanoma risk prediction using machine learning with electronic health records

medRxiv - Dermatology. Pub Date: 2024-07-27. DOI: 10.1101/2024.07.26.24311080

Guihong Wan, Sara Khattab, Katie Roster, Nga Nguyen, Boshen Yan, Hannah Rashdan, Hossein Estiri, Yevgeniy R. Semenov



Abstract

Background: Melanoma is a lethal form of skin cancer with a high propensity for metastasizing, making early detection crucial. This study aims to develop a machine learning model using electronic health record data to identify patients at high risk of developing melanoma to prioritize them for dermatology screening.

Methods: This retrospective study included patients diagnosed with melanoma (cases), as well as matched patients without melanoma (controls), from Massachusetts General Hospital (MGH), Brigham and Women's Hospital (BWH), Dana-Farber Cancer Institute (DFCI), and other hospital centers within the Research Patient Data Registry at Mass General Brigham healthcare system between 1992 and 2022. Patient demographics, family history, diagnoses, medications, procedures, laboratory tests, reasons for visits, and allergy data six months prior to the date of first melanoma diagnosis or date of censoring were extracted. A machine learning framework for health outcomes (MLHO) was utilized to build the model. Performance was evaluated using five-fold cross-validation of the MGH cohort (internal validation) and by using the MGH cohort for model training and the non-MGH cohort for independent testing (external validation). The Area Under the Receiver Operating Characteristic Curve (AUC-ROC) and the Area Under the Precision-Recall Curve (AUC-PR), along with 95% Confidence Intervals (CIs), were computed.

Results: This study identified 10,778 patients with melanoma and 10,778 matched patients without melanoma, including 8,944 from MGH and 1,834 from non-MGH hospitals in each cohort, both with an average follow-up duration of 9 years. In the internal and external validations, the model achieved AUC-ROC values of 0.826 (95% CI: 0.819-0.832) and 0.823 (95% CI: 0.809-0.837) and AUC-PR scores of 0.841 (95% CI: 0.834-0.848) and 0.822 (95% CI: 0.806-0.839), respectively. Important risk features included a family history of melanoma, a family history of skin cancer, and a prior diagnosis of benign neoplasm of skin. Conversely, medical examination without abnormal findings was identified as a protective feature.

Conclusions: Machine learning techniques and electronic health records can be effectively used to predict melanoma risk, potentially aiding in identifying high-risk patients and enabling individualized screening strategies for melanoma.
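The two metrics reported in the abstract, AUC-ROC and AUC-PR with 95% CIs, can be illustrated with a minimal sketch. This is not the authors' MLHO pipeline, and the abstract does not say how the CIs were derived. The sketch assumes a percentile bootstrap for the intervals and average precision as the AUC-PR estimator. The function names (`auc_roc`, `auc_pr`, `bootstrap_ci`) and the synthetic scores are hypothetical and only stand in for model output on a balanced case-control sample.

```python
import random

def auc_roc(labels, scores):
    """AUC-ROC via rank sums: P(random case outscores random control), ties count half."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank shared by tied scores
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def auc_pr(labels, scores):
    """Average precision: mean precision at the rank of each case (one AUC-PR estimator)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / rank
    return total / hits

def bootstrap_ci(metric, labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI (an assumed method, not stated in the abstract)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes to define the metric
            stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    return stats[int((alpha / 2) * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Synthetic, balanced stand-in for model scores (NOT data from the study).
rng = random.Random(42)
labels = [1] * 200 + [0] * 200
scores = [rng.gauss(1.0, 1.0) if y == 1 else rng.gauss(0.0, 1.0) for y in labels]

roc, pr = auc_roc(labels, scores), auc_pr(labels, scores)
roc_lo, roc_hi = bootstrap_ci(auc_roc, labels, scores)
pr_lo, pr_hi = bootstrap_ci(auc_pr, labels, scores)
print(f"AUC-ROC {roc:.3f} (95% CI: {roc_lo:.3f}-{roc_hi:.3f})")
print(f"AUC-PR  {pr:.3f} (95% CI: {pr_lo:.3f}-{pr_hi:.3f})")
```

One point worth keeping in mind when reading the reported AUC-PR values: in a 1:1 matched design the case prevalence is 0.5, which is roughly the AUC-PR a random scorer would be expected to reach, so the AUC-PR baseline here is 0.5 rather than the much lower value it would take in an unselected screening population.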


