Finding the Best Box-Cox Transformation in Big Data with Meta-Model Learning: A Case Study on QCT Developer Cloud
Authors: Yuxiang Gao, Tonglin Zhang, B. Yang
Published in: 2017 IEEE 4th International Conference on Cyber Security and Cloud Computing (CSCloud), 2017-06-26
DOI: 10.1109/CSCloud.2017.53 (https://doi.org/10.1109/CSCloud.2017.53)
Citations: 2
Abstract
Finding the best model to reveal potential relationships in a given data set is not easy and often requires many iterations of trial and error for model selection, feature selection, and parameter tuning. The problem is greatly complicated in the big data era, where I/O bottlenecks significantly increase the time needed to find the best model. In this article, we examine the case of the Box-Cox transformation, which is applied when the assumptions of a regression model are violated. Specifically, we construct and compute a set of summary statistics and transform the maximum likelihood computation into a per-row operation. The proposed algorithms reduce the big data machine learning problem to a stream-based small data learning problem. Once the Box-Cox information array is obtained, the optimal power transformation and the corresponding estimates of the model parameters can be computed quickly. To evaluate performance, we implemented the proposed Box-Cox algorithms on the QCT developer cloud. Our results show that, by leveraging both the algorithms and the QCT cloud technology, finding the best-fitting model among 101 candidate parameters is much faster than with the conventional approach.
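The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a linear regression with an intercept, a grid of 101 candidate powers on [-2, 2] (the range is an assumption; the abstract gives only the count), and the standard Box-Cox profile log-likelihood. The class name `BoxCoxAccumulator` and its methods are hypothetical. Each chunk of rows is read once and folded into small summary arrays, after which every candidate power is scored without touching the data again.

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox power transform of strictly positive values y."""
    if abs(lam) < 1e-12:
        return np.log(y)
    return (y ** lam - 1.0) / lam

class BoxCoxAccumulator:
    """Hypothetical one-pass accumulator of per-lambda summary statistics."""

    def __init__(self, n_features, lambdas):
        self.lambdas = np.asarray(lambdas, dtype=float)
        p = n_features + 1                        # +1 for the intercept
        k = len(self.lambdas)
        self.n = 0                                # rows seen so far
        self.sum_log_y = 0.0                      # Jacobian term of the likelihood
        self.xtx = np.zeros((p, p))               # X'X, shared by all lambdas
        self.xty = np.zeros((k, p))               # X'y(lambda), one row per lambda
        self.yty = np.zeros(k)                    # y(lambda)'y(lambda)

    def update(self, X_chunk, y_chunk):
        """Fold one chunk of rows into the summaries; the chunk can then be dropped."""
        y = np.asarray(y_chunk, dtype=float)
        X = np.column_stack([np.ones(len(y)), X_chunk])
        self.n += len(y)
        self.sum_log_y += np.log(y).sum()
        self.xtx += X.T @ X
        for j, lam in enumerate(self.lambdas):
            z = boxcox(y, lam)
            self.xty[j] += X.T @ z
            self.yty[j] += z @ z

    def best(self):
        """Return (lambda, beta, log-likelihood) maximizing the profile likelihood."""
        best = (None, None, -np.inf)
        for j, lam in enumerate(self.lambdas):
            beta = np.linalg.solve(self.xtx, self.xty[j])
            rss = self.yty[j] - self.xty[j] @ beta   # residual sum of squares
            ll = -0.5 * self.n * np.log(rss / self.n) + (lam - 1.0) * self.sum_log_y
            if ll > best[2]:
                best = (lam, beta, ll)
        return best

# Synthetic demonstration: sqrt(y) is linear in x, so the true power is 0.5.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 3.0, size=5000)
y = (1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=5000)) ** 2

lambdas = np.linspace(-2.0, 2.0, 101)             # 101 candidates (assumed range)
acc = BoxCoxAccumulator(n_features=1, lambdas=lambdas)
for start in range(0, len(y), 1000):              # stream the data in chunks
    acc.update(x[start:start + 1000], y[start:start + 1000])

lam_hat, beta_hat, ll_hat = acc.best()
print(lam_hat, beta_hat)
```

The memory held by the accumulator depends only on the number of features and the number of candidate powers, not on the number of rows, which is what makes a single streaming pass sufficient in this sketch.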