Kevin R. Wadleigh, John Amelio, K. Collins, G. Edwards
{"title":"摘要:混合核计算机的混合广度优先搜索实现","authors":"Kevin R. Wadleigh, John Amelio, K. Collins, G. Edwards","doi":"10.1109/SC.Companion.2012.184","DOIUrl":null,"url":null,"abstract":"Summary form only given. The Graph500 benchmark is designed to evaluate the suitability of supercomputing systems for graph algorithms, which are increasingly important in HPC. The timed Graph500 kernel, Breadth First Search, exhibits memory access patterns typical of these types of applications, with poor spatial locality and synchronization between multiple streams of execution. The Graph500 benchmark was ported to the Convey HC-2ex and MX-100, hybrid-core computers with an Intel host system and a coprocessor incorporating four reprogrammable Xilinx FPGAs. The computers contain a unique memory system designed to sustain high bandwidth for random memory accesses. The BFS kernel was implemented as a hybrid algorithm with concurrent processing on both the host and coprocessor. The early steps use a top-down algorithm on the host with results copied to coprocessor memory for use in a bottom-up algorithm. The coprocessor uses thousands of threads to traverse the graph. The resulting implementation runs at over 16 billion TEPS.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"5 1","pages":"1354-1354"},"PeriodicalIF":0.0000,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Abstract: Hybrid Breadth First Search Implementation for Hybrid-Core Computers\",\"authors\":\"Kevin R. Wadleigh, John Amelio, K. Collins, G. Edwards\",\"doi\":\"10.1109/SC.Companion.2012.184\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Summary form only given. The Graph500 benchmark is designed to evaluate the suitability of supercomputing systems for graph algorithms, which are increasingly important in HPC. The timed Graph500 kernel, Breadth First Search, exhibits memory access patterns typical of these types of applications, with poor spatial locality and synchronization between multiple streams of execution. The Graph500 benchmark was ported to the Convey HC-2ex and MX-100, hybrid-core computers with an Intel host system and a coprocessor incorporating four reprogrammable Xilinx FPGAs. The computers contain a unique memory system designed to sustain high bandwidth for random memory accesses. The BFS kernel was implemented as a hybrid algorithm with concurrent processing on both the host and coprocessor. The early steps use a top-down algorithm on the host with results copied to coprocessor memory for use in a bottom-up algorithm. The coprocessor uses thousands of threads to traverse the graph. The resulting implementation runs at over 16 billion TEPS.\",\"PeriodicalId\":6346,\"journal\":{\"name\":\"2012 SC Companion: High Performance Computing, Networking Storage and Analysis\",\"volume\":\"5 1\",\"pages\":\"1354-1354\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-11-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 SC Companion: High Performance Computing, Networking Storage and Analysis\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SC.Companion.2012.184\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SC.Companion.2012.184","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: Hybrid Breadth First Search Implementation for Hybrid-Core Computers
Summary form only given. The Graph500 benchmark is designed to evaluate the suitability of supercomputing systems for graph algorithms, which are increasingly important in HPC. The timed Graph500 kernel, Breadth First Search, exhibits memory access patterns typical of these types of applications, with poor spatial locality and synchronization between multiple streams of execution. The Graph500 benchmark was ported to the Convey HC-2ex and MX-100, hybrid-core computers with an Intel host system and a coprocessor incorporating four reprogrammable Xilinx FPGAs. The computers contain a unique memory system designed to sustain high bandwidth for random memory accesses. The BFS kernel was implemented as a hybrid algorithm with concurrent processing on both the host and coprocessor. The early steps use a top-down algorithm on the host with results copied to coprocessor memory for use in a bottom-up algorithm. The coprocessor uses thousands of threads to traverse the graph. The resulting implementation runs at over 16 billion TEPS.