{"title":"FPGA图链路预测的边连通Jaccard相似度","authors":"P. Sathre, Atharva Gondhalekar, Wu-chun Feng","doi":"10.1109/HPEC55821.2022.9926326","DOIUrl":null,"url":null,"abstract":"Graph analysis is a critical task in many fields, such as social networking, epidemiology, bioinformatics, and fraud de-tection. In particular, understanding and inferring relationships between graph elements lies at the core of many graph-based workloads. Real-world graph workloads and their associated data structures create irregular computational patterns that compli-cate the realization of high-performance kernels. Given these complications, there does not exist a de facto “best” architecture, language, or algorithmic approach that simultaneously balances performance, energy efficiency, portability, and productivity. In this paper, we realize different algorithms of edge-connected Jaccard similarity for graph link prediction and characterize their performance across a broad spectrum of graphs on an Intel Stratix 10 FPGA. By utilizing a high-level synthesis (HLS)-driven, high-productivity approach (via the C++-based SYCL language) we rapidly prototype two implementations - a from-scratch edge-centric version and a faithfully-ported commodity GPU implementation - which would have been intractable via a hardware description language. With these implementations, we further consider the benefit and necessity of four HLS-enabled optimizations, both in isolation and in concert - totaling seven distinct synthesized hardware pipelines. Leveraging real-world graphs of up to 516 million edges, we show empirically-measured speedups of up to 9.5 x over the initial HLS implementations when all optimizations work in concert.","PeriodicalId":200071,"journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Edge-Connected Jaccard Similarity for Graph Link Prediction on FPGA\",\"authors\":\"P. Sathre, Atharva Gondhalekar, Wu-chun Feng\",\"doi\":\"10.1109/HPEC55821.2022.9926326\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Graph analysis is a critical task in many fields, such as social networking, epidemiology, bioinformatics, and fraud de-tection. In particular, understanding and inferring relationships between graph elements lies at the core of many graph-based workloads. Real-world graph workloads and their associated data structures create irregular computational patterns that compli-cate the realization of high-performance kernels. Given these complications, there does not exist a de facto “best” architecture, language, or algorithmic approach that simultaneously balances performance, energy efficiency, portability, and productivity. In this paper, we realize different algorithms of edge-connected Jaccard similarity for graph link prediction and characterize their performance across a broad spectrum of graphs on an Intel Stratix 10 FPGA. By utilizing a high-level synthesis (HLS)-driven, high-productivity approach (via the C++-based SYCL language) we rapidly prototype two implementations - a from-scratch edge-centric version and a faithfully-ported commodity GPU implementation - which would have been intractable via a hardware description language. With these implementations, we further consider the benefit and necessity of four HLS-enabled optimizations, both in isolation and in concert - totaling seven distinct synthesized hardware pipelines. Leveraging real-world graphs of up to 516 million edges, we show empirically-measured speedups of up to 9.5 x over the initial HLS implementations when all optimizations work in concert.\",\"PeriodicalId\":200071,\"journal\":{\"name\":\"2022 IEEE High Performance Extreme Computing Conference (HPEC)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE High Performance Extreme Computing Conference (HPEC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPEC55821.2022.9926326\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC55821.2022.9926326","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Edge-Connected Jaccard Similarity for Graph Link Prediction on FPGA
Graph analysis is a critical task in many fields, such as social networking, epidemiology, bioinformatics, and fraud de-tection. In particular, understanding and inferring relationships between graph elements lies at the core of many graph-based workloads. Real-world graph workloads and their associated data structures create irregular computational patterns that compli-cate the realization of high-performance kernels. Given these complications, there does not exist a de facto “best” architecture, language, or algorithmic approach that simultaneously balances performance, energy efficiency, portability, and productivity. In this paper, we realize different algorithms of edge-connected Jaccard similarity for graph link prediction and characterize their performance across a broad spectrum of graphs on an Intel Stratix 10 FPGA. By utilizing a high-level synthesis (HLS)-driven, high-productivity approach (via the C++-based SYCL language) we rapidly prototype two implementations - a from-scratch edge-centric version and a faithfully-ported commodity GPU implementation - which would have been intractable via a hardware description language. With these implementations, we further consider the benefit and necessity of four HLS-enabled optimizations, both in isolation and in concert - totaling seven distinct synthesized hardware pipelines. Leveraging real-world graphs of up to 516 million edges, we show empirically-measured speedups of up to 9.5 x over the initial HLS implementations when all optimizations work in concert.