Random Sparse Approximation for a Fast and Scalable Gaussian Process
Muluken Regas Eressa, Hakim Badis, R. Langar, Dorian Grosso
2022 2nd International Seminar on Machine Learning, Optimization, and Data Science (ISMODE), published 2022-12-22. DOI: 10.1109/ISMODE56940.2022.10181004
For machine learning algorithms, the availability of huge datasets offers ample opportunity to learn and draw educated generalizations. For Gaussian processes, however, the size of the data is a challenge to their wider adoption in big-data applications. Various approaches have been proposed to ensure scalability and computational efficiency; kernel approximation and variational inference are notable examples. This paper proposes a random sparse Gaussian approximation (RSGA) method based on stochastic column sampling. It uses frequency analysis to select subsets of points that generalize the observed data, then applies sparsity and a sampling-without-replacement strategy when building the model. The predictive performance of the model is evaluated against the Variational Gaussian Process (VGA) as a benchmark: we run a Monte Carlo style model-building and evaluation scheme using the mean squared error (MSE) and the R² score as quality metrics, training and evaluating an ensemble of models at different sample sizes under the same setting. The experiments show that RSGA is, on average, 10 times faster than VGA and offers better predictive performance. In addition, RSGA responds more robustly to changes in kernel type than VGA. Hence, for fast optimal kernel estimation and big-data analysis, RSGA offers an alternative route to model building and inference.
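The abstract does not spell out the selection mechanics, but the random-column-sampling idea maps naturally onto a Nyström-style (subset-of-regressors) approximation, which cuts the usual O(n³) GP training cost to O(nm²) for m sampled columns. The sketch below is a minimal, hypothetical illustration under that assumption, not the authors' exact algorithm: `rbf_kernel`, `fit_sparse_gp`, and the uniform `choice` sampler are illustrative stand-ins (the paper selects points via frequency analysis), while the sampling-without-replacement step and the Monte Carlo MSE/R² loop mirror the scheme the abstract describes.

```python
# Illustrative sketch only: a Nystrom / subset-of-regressors GP built from a
# random column subset -- an assumed reading of RSGA-style sampling, not the
# paper's published algorithm.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel: k(a, b) = s^2 * exp(-||a - b||^2 / (2 l^2))
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)

def fit_sparse_gp(X, y, m=100, noise=1e-2, seed=None):
    # Sample m columns uniformly WITHOUT replacement (a stand-in for the
    # paper's frequency-analysis selection), then solve the m x m
    # subset-of-regressors system: alpha = (noise*Kmm + Kmn Knm)^{-1} Kmn y.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    Xm = X[idx]
    Kmm = rbf_kernel(Xm, Xm)
    Knm = rbf_kernel(X, Xm)                # n x m instead of n x n
    A = noise * Kmm + Knm.T @ Knm          # m x m system: O(n m^2) work
    alpha = np.linalg.solve(A + 1e-8 * np.eye(m), Knm.T @ y)
    return Xm, alpha

def predict(Xm, alpha, Xstar):
    # Predictive mean at new inputs from the m inducing columns.
    return rbf_kernel(Xstar, Xm) @ alpha

# Monte Carlo style evaluation with MSE and R^2, echoing the paper's scheme:
# refit over several random column draws and report mean +/- spread.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(2000, 1))
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.standard_normal(2000)

mses, r2s = [], []
for trial in range(20):
    Xm, alpha = fit_sparse_gp(X, y, m=100, seed=trial)
    pred = predict(Xm, alpha, X)
    mse = float(np.mean((y - pred) ** 2))
    mses.append(mse)
    r2s.append(1.0 - mse / float(np.var(y)))

print(f"MSE {np.mean(mses):.4f} +/- {np.std(mses):.4f}, "
      f"R2 {np.mean(r2s):.4f} +/- {np.std(r2s):.4f}")
```

On this toy problem, each fit costs O(nm²) rather than O(n³), which is the kind of gap behind the order-of-magnitude speedup the paper reports; actual gains will depend on the column-selection scheme, the kernel, and the data.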