{"title":"Comparative performance analysis of a Big Data NORA problem on a variety of architectures","authors":"P. Kogge, D. Bayliss","doi":"10.1109/CTS.2013.6567199","DOIUrl":null,"url":null,"abstract":"Non Obvious Relationship Analysis (NORA) is one of the most stressing classes of Big Data Analytics problems. This paper proposes a reference NORA problem that is representative of real problems, and can rationally scale to very large sizes. It then develops a highly concurrent implementation that can run on large systems. Each step of this implementation is sized in terms of how much of four different resources (CPU, memory, disk, and network) might be used. From this, a parameterized model projecting both execution time and utilizations is used to identify the “tall poles” in performance. The parameters are then modified to represent several different target systems, from a large cluster typical of today to variations in an advanced architecture where processing has been moved into memory. A “thought experiment” then uses this model to discover the parameters of a system that would provide both a near 100X speedup, but with a balanced design where no resource is badly over or under utilized.","PeriodicalId":256633,"journal":{"name":"2013 International Conference on Collaboration Technologies and Systems (CTS)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 International Conference on Collaboration Technologies and Systems (CTS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CTS.2013.6567199","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8
Abstract
Non Obvious Relationship Analysis (NORA) is one of the most stressing classes of Big Data Analytics problems. This paper proposes a reference NORA problem that is representative of real problems, and can rationally scale to very large sizes. It then develops a highly concurrent implementation that can run on large systems. Each step of this implementation is sized in terms of how much of four different resources (CPU, memory, disk, and network) might be used. From this, a parameterized model projecting both execution time and utilizations is used to identify the “tall poles” in performance. The parameters are then modified to represent several different target systems, from a large cluster typical of today to variations in an advanced architecture where processing has been moved into memory. A “thought experiment” then uses this model to discover the parameters of a system that would provide both a near 100X speedup, but with a balanced design where no resource is badly over or under utilized.