R. Maas, Jeremy Hyrkas, O. Telford, M. Balazinska, A. Connolly, Bill Howe
{"title":"Gaussian Mixture Models Use-Case: In-Memory Analysis with Myria","authors":"R. Maas, Jeremy Hyrkas, O. Telford, M. Balazinska, A. Connolly, Bill Howe","doi":"10.1145/2803140.2803143","DOIUrl":null,"url":null,"abstract":"In our work with scientists, we find that Gaussian Mixture Modeling is a common type of analysis applied to increasingly large datasets. We implement this algorithm in the Myria shared-nothing relational data management system, which performs the computation in memory. We study resulting memory utilization challenges and implement several optimizations that yield an efficient and scalable solution. Empirical evaluations on large astronomy and oceanography datasets confirm that our Myria approach scales well and performs up to an order of magnitude faster than Hadoop.","PeriodicalId":175654,"journal":{"name":"Proceedings of the 3rd VLDB Workshop on In-Memory Data Mangement and Analytics","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 3rd VLDB Workshop on In-Memory Data Mangement and Analytics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2803140.2803143","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9
Abstract
In our work with scientists, we find that Gaussian Mixture Modeling is a common type of analysis applied to increasingly large datasets. We implement this algorithm in the Myria shared-nothing relational data management system, which performs the computation in memory. We study resulting memory utilization challenges and implement several optimizations that yield an efficient and scalable solution. Empirical evaluations on large astronomy and oceanography datasets confirm that our Myria approach scales well and performs up to an order of magnitude faster than Hadoop.