Challenges in Enabling Quality of Analytics in the Cloud
Hong Linh Truong, A. Murguzur, Erica Y. Yang
Journal of Data and Information Quality (JDIQ), vol. 6, no. 1, pp. 1-4. Journal Article, published 2018-01-15.
DOI: 10.1145/3138806
Citations: 2
Abstract
Currently, domain scientists (DSs) face challenges in managing quality across multiple data analytics contexts (DACs). We identify and define quality of analytics (QoA) in dynamic and diverse environments, e.g., those based on cloud computing resources for big data sources, as a composition of quality of data (data quality), performance, and cost, to name just the main factors. QoA is a complex matter: it is not only about data quality or performance, which are typically considered separately when evaluating existing data analytics frameworks and algorithms. Frequently, a DS needs to use multiple frameworks to run different (sub-)analytics, and, at the same time, the sub-analytics executed in these frameworks exchange inputs and outputs with each other. In these cases, we observe different DACs, where a DAC refers to a particular situation in which the DS works with a specific framework to run sub-analytics carried out by one or more pipelines or by tasks within a pipeline. Each DAC has a set of interactions in the following categories: