Gunjan Mansingh, Kweku-Muata A. Osei-Bryson, L. Rao, Maurice McNaughton
Data preparation: Art or science?
2016 International Conference on Data Science and Engineering (ICDSE), published 2016-08-01
DOI: 10.1109/ICDSE.2016.7823936 (https://doi.org/10.1109/ICDSE.2016.7823936)
Citation count: 3
Abstract
Data preparation is often cited as the most time-consuming phase of a Knowledge Discovery and Data Mining (KDDM) process. This is attributed to the fact that this phase is highly dependent on the expertise of the analyst. Although process models exist for KDDM, their descriptions of the phases focus on outlining what must be done but often do not detail how it should be done. While some research addresses the how of the phases, the data preparation phase is thought to be the most challenging and is often described as an art rather than a science. The tasks defined in this phase are thought to be highly dependent on the expertise of the analyst and on the context. While we are of the view that there will always be an art to data preparation, we will demonstrate that the science can actually enhance the art. We further contend that as more research of this kind is published, demonstrating a variety of data preparation techniques that enhance the data mining process, the science of data preparation will become more effective.