Building and Interpreting Risk Models from Imbalanced Clinical Data
Aaron N. Richter, T. Khoshgoftaar
2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI), 2018-11-01
DOI: 10.1109/ICTAI.2018.00031 (https://doi.org/10.1109/ICTAI.2018.00031)
Citation count: 2
Abstract
As more clinical data becomes available for research, it is important to be able to build effective models and understand the predictions made from them. In this paper, we present a case study modeling melanoma risk using structured clinical records. Advanced modeling techniques are required as the data set is large, sparse, and imbalanced. We explore the use of logistic regression, decision tree, and random forest classifiers with various feature selection and random undersampling techniques. For clinical models to be used in practice, both providers and patients should have insight into why a certain prediction is made. Therefore, interpretability must be a key factor when choosing a model for a clinical prediction task, and we explore the level of interpretation given by the models compared to their predictive performance.
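The random undersampling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_undersample`, its `ratio` and `seed` parameters, and the toy data are all hypothetical, chosen only to show the general idea of discarding majority-class records at random until the classes reach a target ratio.

```python
import random
from collections import Counter


def random_undersample(rows, labels, majority_label=0, ratio=1.0, seed=0):
    """Randomly drop majority-class rows until majority:minority is about ratio:1.

    Hypothetical helper for illustration; all minority rows are kept.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y != majority_label]
    majority_idx = [i for i, y in enumerate(labels) if y == majority_label]
    # Number of majority rows to keep, capped at how many actually exist.
    n_keep = min(len(majority_idx), int(round(ratio * len(minority_idx))))
    kept = rng.sample(majority_idx, n_keep) + minority_idx
    rng.shuffle(kept)  # avoid leaving the classes in contiguous runs
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (assumed, not from the paper): 95 negatives and 5 positives.
labels = [0] * 95 + [1] * 5
rows = [[i] for i in range(100)]

X_bal, y_bal = random_undersample(rows, labels, ratio=1.0, seed=42)
counts = Counter(y_bal)
print(counts[0], counts[1])
```

In a modeling workflow, resampling of this kind is normally applied to the training split only, so that evaluation data keeps the original class distribution.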