Challenges in Enabling Quality of Analytics in the Cloud
Authors: Hong Linh Truong, A. Murguzur, Erica Y. Yang
Journal of Data and Information Quality (JDIQ), volume 6 1, pages 1-4. Published 2018-01-15. DOI: https://doi.org/10.1145/3138806
Currently, domain scientists (DSs) face challenges in managing quality across multiple data analytics contexts (DACs). We identify and define quality of analytics (QoA) in dynamic and diverse environments, e.g., those based on cloud computing resources for big data sources, as a composition of quality of data (data quality), performance, and cost, to name just the main factors. QoA is a complex matter: it is not just about quality of data or performance, which are typically considered separately when evaluating existing data analytics frameworks and algorithms. Frequently, the DS needs to use multiple frameworks to run different sub-analytics, while the sub-analytics executed in these frameworks exchange inputs and outputs with each other. In these cases, we observe different DACs, where a DAC refers to a particular situation in which the DS works with a specific framework to run a sub-analytics carried out by pipeline(s) or by tasks in a pipeline. Each DAC has a set of interactions in the following categories: