Melanoma Risk Prediction with Structured Electronic Health Records
Aaron N. Richter, T. Khoshgoftaar
Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
Published: 2018-08-15
DOI: https://doi.org/10.1145/3233547.3233561
Citations: 6
Abstract
Melanoma is one of the fastest-growing cancers in the world and can affect patients earlier in life than most other cancers. It is therefore imperative to identify patients at high risk for melanoma and enroll them in screening programs that detect the cancer early. In this study, we explore data from dermatology outpatients to build a risk model for the disease. Using millions of patient records with thousands of data points in each record, we show that a melanoma risk model can be built from real-world Electronic Health Record (EHR) data without any expert knowledge or manually engineered features. While other risk models for melanoma have been developed, this is the first to use routinely collected EHR data rather than expert features targeted specifically at melanoma. The random forest model achieves performance similar to or better than that of these previous models (AUC 0.79, sensitivity 0.71, specificity 0.72), which allows larger populations of patients to be screened for melanoma risk without specialized and time-consuming data collection. Important features can be extracted from the model and studied, and the features influencing a specific prediction can be explained to providers and patients. The process for building this model can be further refined to improve performance and can also be applied to risk prediction for other diseases.
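For readers less familiar with the three metrics quoted in the abstract, the following is a minimal sketch of how AUC, sensitivity, and specificity can be computed from a model's predicted risk scores. It is not the authors' code: the labels, scores, the 0.5 decision threshold, and the function names are all hypothetical, chosen only for illustration.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity), calling scores >= threshold positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = melanoma, 0 = no melanoma.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.8]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
roc_auc = auc(labels, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={roc_auc:.2f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why an abstract like this one typically reports all three together.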