{"title":"Improving the Efficiency of Future Exascale Systems with rCUDA","authors":"C. Reaño, Javier Prades, F. Silla","doi":"10.1109/HiPINEB.2018.00014","DOIUrl":"https://doi.org/10.1109/HiPINEB.2018.00014","url":null,"abstract":"The computing power of supercomputers and data centers has grown noticeably over the last decades at the cost of an ever-increasing energy demand. The energy (and power) needs of these facilities have finally limited the evolution of high performance computing, so that many researchers are now concerned not only about performance but also about energy efficiency. However, despite the many concerns about energy consumption, the search for computing power continues. In this regard, research on exascale systems, able to deliver 10^18 floating point operations per second, has reached a wide consensus that these systems should operate within a maximum power budget of 20 megawatts. Many efficiency improvements are necessary to achieve this goal. One of these improvements is the use of low-power ARM processors, as the Mont-Blanc project proposes. In this paper we propose the combined use of ARM processors with the rCUDA remote GPU virtualization framework as a way to improve efficiency even further. Results show that it is possible to speed up applications by more than 12x when rCUDA is used to access high-end GPUs.","PeriodicalId":247186,"journal":{"name":"2018 IEEE 4th International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB)","volume":"19 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132359544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Node-Type-Based Load-Balancing Routing for Parallel Generalized Fat-Trees","authors":"John Gliksberg, Jean-Noël Quintin, P. García","doi":"10.1109/HiPINEB.2018.00010","DOIUrl":"https://doi.org/10.1109/HiPINEB.2018.00010","url":null,"abstract":"High-Performance Computing (HPC) clusters are made up of a variety of node types (usually compute, I/O, service, and GPGPU nodes), and applications do not use nodes of different types in the same way. The resulting communication patterns reflect the organization of groups of nodes, and current optimal routing algorithms for all-to-all patterns will not always maximize performance for group-specific communications. Since application communication patterns are rarely available beforehand, we choose to rely on node types as a good guess for node usage. We provide a description of node type heterogeneity and analyse the performance degradation caused by an unlucky distribution of nodes of the same type. We provide an extension to routing algorithms for Parallel Generalized Fat-Tree topologies (PGFTs) which balances load amongst groups of nodes of the same type. We show how it removes these performance issues by comparing results in a variety of situations against corresponding classical algorithms.","PeriodicalId":247186,"journal":{"name":"2018 IEEE 4th International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB)","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124353122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing Topology Parameters for Achieving Energy-Efficient k-ary n-cubes","authors":"Francisco J. Andújar, S. Coll, M. Alonso, Juan-Miguel Martínez, P. López, F. J. Alfaro, J. L. Sánchez, R. Martínez","doi":"10.1109/HiPINEB.2018.00012","DOIUrl":"https://doi.org/10.1109/HiPINEB.2018.00012","url":null,"abstract":"Achieving an optimal performance/energy ratio is a challenge for massively parallel computer architects, and in particular for interconnection network designers. The k-ary n-cube is one of the most popular topologies used in the largest current supercomputers. In this paper, we present a study that considers two alternatives to build k-ary n-cube topologies taking advantage of the high-radix switches currently available: a topology with more dimensions and one NIC per router, or a topology with fewer dimensions, link aggregation and several NICs per router. Using link aggregation eases the implementation of simple power consumption reduction techniques. Using a simple power model, we evaluate by trace-driven simulation the impact on energy and performance of several network sizes for both topology proposals. To make a fair comparison, we keep the theoretical network bandwidth fixed.","PeriodicalId":247186,"journal":{"name":"2018 IEEE 4th International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126669130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating Energy-Saving Strategies on Torus, K-Ary N-Tree, and Dragonfly","authors":"F. Zahn, Armin Schoffer, H. Fröning","doi":"10.1109/HiPINEB.2018.00011","DOIUrl":"https://doi.org/10.1109/HiPINEB.2018.00011","url":null,"abstract":"Energy is one of the most crucial factors in the design of large-scale computing systems, especially high-performance computing. While exascale systems could be built with current hardware solutions, the required funding exceeds the budget of most institutions. Since a system is never fully utilized, energy-proportional components can save a substantial amount of energy. However, current interconnect technologies still operate at a fixed power consumption rate. Therefore, network power consumption becomes increasingly important as its contribution to overall power consumption is increasing. Energy-proportional interconnection networks are a research area that is still emerging. In this work, we analyze the effects of different topology characteristics on power consumption and potential energy savings of interconnection networks. We compare the differences in the design of common topologies and the related impact on energy savings. In particular, we analyze the power consumption of torus, k-ary n-tree, and dragonfly. We also use existing topology-independent power-saving policies to derive potential energy savings for each topology and compare the policies to other work which is specific to topology hardware features. The comparison concludes that topology-independent policies are superior for energy savings and the other work is superior for execution time.","PeriodicalId":247186,"journal":{"name":"2018 IEEE 4th International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115185036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis and Improvement of Valiant Routing in Low-Diameter Networks","authors":"M. Benito, Pablo Fuentes, E. Vallejo, R. Beivide","doi":"10.1109/HiPINEB.2018.00009","DOIUrl":"https://doi.org/10.1109/HiPINEB.2018.00009","url":null,"abstract":"Valiant routing randomizes network traffic to avoid pathological congestion issues by diverting traffic to a random intermediate switch. It has received significant attention in recently proposed high-radix, low-diameter topologies, which are prone to congestion issues. It has been implemented obliviously, or as the basis of some non-minimal adaptive routing algorithms. An analysis of the original mechanism identifies two potential improvements regarding the selection of the intermediate switch. First, when traffic is local, the randomization introduced by Valiant results in unnecessarily long paths. Instead, the introduced Restricted Valiant routing randomizes traffic within a local partition, avoiding congestion and generating shorter paths. Second, in certain cases the path to the selected random intermediate node can be blocked; a version with recomputation selects a new random intermediate node as long as the associated path remains stalled. The proposals are evaluated by simulation in a state-of-the-art Dragonfly network with different traffic patterns. Results show that Restricted Valiant is highly effective in cases of local traffic, with a small improvement under global patterns. Valiant with recomputation increases injection, further reducing average latency and increasing throughput. However, the higher injection increases congestion effects in some cases. This problem is accentuated when more injection buffers are added, because of the increased pressure on the interconnect. Overall, the results are very relevant for routing in high-radix networks and might constitute the basis for other adaptive routing algorithms.","PeriodicalId":247186,"journal":{"name":"2018 IEEE 4th International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB)","volume":"33 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129543452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VEF3 Traces: Towards a Complete Framework for Modelling Network Workloads for Exascale Systems","authors":"Javier Cano-Cano, Francisco J. Andújar, F. J. Alfaro, J. L. Sánchez","doi":"10.1109/HiPINEB.2018.00013","DOIUrl":"https://doi.org/10.1109/HiPINEB.2018.00013","url":null,"abstract":"To meet the expected performance requirements of applications running on future exascale systems, the number of processing nodes included in such systems will have to increase and, according to the current trend, also the number of cores in each node. In these systems, the networks, both off- and on-chip, interconnecting these nodes and cores inside nodes, respectively, will have to be much more efficient than current ones. Simulation is the most common technique used to research and develop interconnection networks. Simulators have traditionally used synthetic traffic as the network workload, which does not represent the network workload that real applications generate. The use of application communication trace files is a better strategy for this purpose. In this paper, we extend an existing tool with functionality related to communication within each node. In this way, the tool will allow interconnection network simulators to model traffic due to all the communications generated in exascale systems.","PeriodicalId":247186,"journal":{"name":"2018 IEEE 4th International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116172646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}