Melanoma Risk Prediction with Structured Electronic Health Records

Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
Pub Date: 2018-08-15
DOI: 10.1145/3233547.3233561

Aaron N. Richter, T. Khoshgoftaar


Citations: 6

Abstract

Melanoma is one of the fastest-growing cancers in the world, and can affect patients earlier in life than most other cancers. It is therefore imperative to identify patients at high risk for melanoma and enroll them in screening programs to detect the cancer early. In this study, we explore data from dermatology outpatients to build a risk model for the disease. Using millions of patient records with thousands of data points in each record, we show that we can build a melanoma risk model from real-world Electronic Health Record (EHR) data without any expert knowledge or manually engineered features. While other risk models for melanoma have been developed, this is the first to use routinely collected EHR data rather than expert features targeted specifically at melanoma. The random forest model achieves performance similar to or better than that of these previous models (AUC 0.79, sensitivity 0.71, specificity 0.72), which allows larger populations of patients to be screened for melanoma risk without specialized and time-consuming data collection. Important features can be extracted from the model and studied, and the features influencing a specific prediction can be explained to providers and patients. The process for building this model can be further refined to improve performance, and can also be used for risk prediction of other diseases.
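The three figures quoted in the abstract (AUC, sensitivity, specificity) are standard ways of scoring a binary risk model. As a minimal sketch of how such metrics can be computed from predicted risk scores, the Python below uses a small made-up set of labels and scores; the function names, the 0.5 threshold, and the data are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity), flagging scores >= threshold as high risk."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)  # cases correctly flagged
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)   # cases missed
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)   # non-cases correctly cleared
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)  # non-cases wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random non-case.

    Ties count as half a win. This pairwise form is quadratic in the number
    of records, so it is only suitable for small illustrations.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = melanoma case, 0 = non-case; scores are model risk outputs.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.8]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
model_auc = auc(labels, scores)
print(f"AUC={model_auc:.4f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a risk model is usually reported with all three.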
