Modernizing Analytics for Melanoma with a Large-Scale Research Dataset
Aaron N. Richter, T. Khoshgoftaar
2017 IEEE International Conference on Information Reuse and Integration (IRI), 2017-08-01
DOI: 10.1109/IRI.2017.45 (https://doi.org/10.1109/IRI.2017.45)
Citations: 10
Abstract
We present the Modernizing Analytics for MELanoma (MAMEL) dataset: a real-world, dermatology-specific research dataset built to advance data mining and machine learning research in melanoma diagnosis, analysis, and treatment. The dataset was collected and curated from Modernizing Medicine’s EMA Dermatology™ application, a cloud-based Electronic Health Record (EHR) platform. A big data processing architecture, built on Apache Hadoop and Apache Spark, was used to collect all patient data, identify patients for the MAMEL dataset, and create and document all data elements. This paper outlines the application and data processing architectures, provides an exploratory analysis of the data elements available in MAMEL, and discusses avenues for using the dataset in clinical decision support applications for melanoma.