A Framework for Statistical Analysis of Datasets on Heterogeneous Clusters
R. Cariño, I. Banicescu
2005 IEEE International Conference on Cluster Computing, 2005-09-01
DOI: 10.1109/CLUSTR.2005.347019 (https://doi.org/10.1109/CLUSTR.2005.347019)
This paper proposes a framework for the statistical analysis of multiple related datasets on heterogeneous clusters. The set of processors assigned to the framework is partitioned into groups according to rack location, with the group sizes chosen to match the degree of concurrency in the analysis procedure. The datasets are initially divided among the groups. Dynamic loop scheduling is employed to address load imbalance arising from differences in the computational power of the groups, variability in dataset sizes, and unpredictable irregularities in the cluster environment. Results of preliminary tests indicate the effectiveness of the framework in fitting gamma-ray burst datasets with vector functional coefficient autoregressive time series models on 64 processors of a heterogeneous general-purpose Linux cluster.
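The load-balancing idea in the abstract can be illustrated with a small simulation. The sketch below is not the paper's implementation: it assumes the simplest form of dynamic scheduling (self-scheduling, one dataset at a time), and the function names, dataset costs, and group speeds are hypothetical values chosen only for illustration. It compares a fixed up-front division of datasets among groups with a scheme in which whichever group becomes idle first takes the next dataset.

```python
import heapq


def static_makespan(costs, speeds):
    """Divide datasets round-robin up front; return when the last group finishes."""
    finish = [0.0] * len(speeds)
    for i, cost in enumerate(costs):
        g = i % len(speeds)
        finish[g] += cost / speeds[g]  # time = work / group speed
    return max(finish)


def dynamic_makespan(costs, speeds):
    """Self-scheduling: the group that becomes idle first takes the next dataset."""
    # Min-heap of (time at which the group becomes idle, group index).
    idle = [(0.0, g) for g in range(len(speeds))]
    heapq.heapify(idle)
    makespan = 0.0
    for cost in costs:
        t, g = heapq.heappop(idle)
        t += cost / speeds[g]
        makespan = max(makespan, t)
        heapq.heappush(idle, (t, g))
    return makespan


# Hypothetical workload: eight datasets of uneven cost, two groups,
# the second twice as fast as the first.
costs = [8.0, 1.0, 6.0, 2.0, 7.0, 3.0, 5.0, 4.0]
speeds = [1.0, 2.0]

print("static :", static_makespan(costs, speeds))
print("dynamic:", dynamic_makespan(costs, speeds))
```

With these made-up numbers the static division finishes at time 26.0 because the slow group receives the four largest datasets, while the dynamic scheme finishes at 12.5, close to the ideal of 12.0 (total work 36 divided by combined speed 3). Practical dynamic loop scheduling methods typically hand out chunks of varying size rather than single items to reduce scheduling overhead.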