Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms (Latest Publications)

Tensor-matrix products with a compressed sparse tensor
Shaden Smith, G. Karypis
DOI: https://doi.org/10.1145/2833179.2833183 | Published: 2015-11-15
Abstract: The Canonical Polyadic Decomposition (CPD) of tensors is a powerful tool for analyzing multi-way data and is used extensively to analyze very large and extremely sparse datasets. The bottleneck of computing the CPD is multiplying a sparse tensor by several dense matrices. Algorithms for tensor-matrix products fall into two classes. The first class saves floating-point operations by storing a compressed tensor for each dimension of the data; these methods are fast but incur high memory costs. The second class uses a single uncompressed tensor at the cost of additional floating-point operations. In this work, we bridge the gap between the two approaches and introduce the compressed sparse fiber (CSF), a data structure for sparse tensors, along with a novel parallel algorithm for tensor-matrix multiplication. CSF offers operation reductions similar to existing compressed methods while using only a single tensor structure. We validate our contributions with experiments comparing against state-of-the-art methods on a diverse set of datasets. Our work uses 58% less memory than the state of the art while achieving 81% of its parallel performance on 16 threads.
Citations: 111
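The paper defines CSF precisely; as rough intuition only, the layout generalizes CSR's row-pointer idea to a tree of fibers with one level per tensor mode. The C sketch below is a hypothetical three-mode variant (field names such as fptr, fids, and csf3_mttkrp are mine, not the authors') showing how such a structure can drive the sparse tensor-times-dense-matrix kernel (MTTKRP) that the abstract identifies as the CPD bottleneck.

```c
#include <stddef.h>

/* A minimal CSF-like layout for a 3-mode sparse tensor (hypothetical names).
 * Level 0 = mode-i slices, level 1 = mode-j fibers, level 2 = mode-k nonzeros.
 * fptr[l][n] .. fptr[l][n+1] bound the children of node n at level l,
 * exactly as row_ptr does in CSR. */
typedef struct {
    size_t  nslices;     /* number of non-empty i-slices                     */
    size_t *fptr[2];     /* fptr[0]: slice -> fibers, fptr[1]: fiber -> nnz  */
    size_t *fids[3];     /* fids[0]: i per slice, fids[1]: j per fiber,
                            fids[2]: k per nonzero                           */
    double *vals;        /* one value per nonzero                            */
} csf3;

/* M(i,:) += sum_{j,k} X(i,j,k) * B(j,:) .* C(k,:)   (MTTKRP with rank r).
 * B, C, M are row-major dense matrices with r columns; M must be zeroed
 * by the caller. */
void csf3_mttkrp(const csf3 *X, const double *B, const double *C,
                 double *M, size_t r)
{
    for (size_t s = 0; s < X->nslices; s++) {
        size_t i = X->fids[0][s];
        for (size_t f = X->fptr[0][s]; f < X->fptr[0][s + 1]; f++) {
            size_t j = X->fids[1][f];
            for (size_t n = X->fptr[1][f]; n < X->fptr[1][f + 1]; n++) {
                size_t k = X->fids[2][n];
                double v = X->vals[n];
                for (size_t c = 0; c < r; c++)
                    M[i * r + c] += v * B[j * r + c] * C[k * r + c];
            }
        }
    }
}
```

A production kernel would accumulate v * C(k,:) per fiber and multiply by B(j,:) once per fiber rather than inside the innermost loop; that hoisting is roughly where the operation savings of compressed layouts come from.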
Improving graph partitioning for modern graphs and architectures
Dominique LaSalle, Md. Mostofa Ali Patwary, N. Satish, N. Sundaram, P. Dubey, G. Karypis
DOI: https://doi.org/10.1145/2833179.2833188 | Published: 2015-11-15
Abstract: Graph partitioning is an important preprocessing step in applications dealing with sparse, irregular data. As such, the ability to efficiently partition a graph in parallel is crucial to the performance of these applications. The number of compute cores in a compute node continues to increase, demanding ever more scalability from shared-memory graph partitioners. In this paper we present algorithmic improvements to the multithreaded graph partitioner mt-Metis. We experimentally evaluate our methods on a 36-core machine, using 20 different graphs from a variety of domains. Our improvements decrease the runtime by 1.5-11.7x and improve strong scaling by 82%.
Citations: 46
GAIL: the graph algorithm iron law
S. Beamer, K. Asanović, D. Patterson
DOI: https://doi.org/10.1145/2833179.2833187 | Published: 2015-11-15
Abstract: As new applications for graph algorithms emerge, there has been a great deal of research interest in improving graph processing. However, it is often difficult to understand how these new contributions improve performance. Execution time, the most commonly reported metric, distinguishes which alternative is the fastest but does not give any insight as to why. A new contribution may have an algorithmic innovation that allows it to examine fewer graph edges. It could also have an implementation optimization that reduces communication. It could even have optimizations that allow it to increase its memory bandwidth utilization. More interestingly, a new innovation may simultaneously affect all three of these factors (algorithmic work, communication volume, and memory bandwidth utilization). We present the Graph Algorithm Iron Law (GAIL) to quantify these tradeoffs and help understand graph algorithm performance.
Citations: 7
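The abstract names the three factors without showing how they compose. As a hedged reconstruction (my notation; the paper states the law precisely), an iron-law-style decomposition of execution time along those three axes looks like:

```latex
\text{execution time} \;=\;
\underbrace{\text{edges traversed}}_{\text{algorithmic work}}
\times
\underbrace{\frac{\text{memory requests}}{\text{edge traversed}}}_{\text{communication volume}}
\times
\underbrace{\frac{\text{time}}{\text{memory request}}}_{\text{inverse memory bandwidth utilization}}
```

Reporting all three terms rather than only their product is what lets such a law attribute a speedup to a better algorithm, to less communication, or to better use of memory bandwidth.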
Hybrid memory cube performance characterization on data-centric workloads
M. Gokhale, G. S. Lloyd, C. Macaraeg
DOI: https://doi.org/10.1145/2833179.2833184 | Published: 2015-11-15
Abstract: The Hybrid Memory Cube is an early commercial product embodying attributes of future stacked DRAM architectures, namely large capacity, high bandwidth, an on-package memory controller, and a high-speed serial interface. We study the performance and energy of a Gen2 HMC on data-centric workloads through a combination of emulation and execution on an HMC FPGA board. An in-house FPGA emulator has been used to obtain memory traces for a small collection of data-centric benchmarks. Our FPGA emulator is based on a 32-bit ARM processor and non-intrusively captures complete memory access traces at only 20x slowdown from real time. We have developed tools to run combined trace fragments from multiple benchmarks on the HMC board, giving a unique capability to characterize HMC performance and power usage under a data-parallel workload. We find that the HMC's separate read and write channels are not well exploited by read-dominated data-centric workloads. Our benchmarks achieve between 66% and 80% of peak bandwidth (80 GB/s for 32-byte packets with a 50-50 read/write mix) on the HMC, suggesting that combined read/write channels might show higher utilization on these access patterns. Bandwidth scales linearly up to saturation with increased demand on highly concurrent application workloads with many independent memory requests. There is a corresponding increase in latency, ranging from 80 ns under an extremely light load to 130 ns at high bandwidth.
Citations: 32
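For concreteness, the quoted fractions of the 80 GB/s peak translate into absolute sustained bandwidth as follows (simple arithmetic on the numbers above):

```latex
0.66 \times 80\ \text{GB/s} \approx 52.8\ \text{GB/s},
\qquad
0.80 \times 80\ \text{GB/s} = 64\ \text{GB/s}
```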
PathFinder: a signature-search miniapp and its runtime characteristics
Aditya M. Deshpande, J. Draper, J. Rigdon, R. Barrett
DOI: https://doi.org/10.1145/2833179.2833190 | Published: 2015-11-15
Abstract: Graphs are widely used in data analytics applications in a variety of fields and are rapidly gaining attention in the computational science and engineering (CSE) application community. An important graph application is binary (executable) signature search, which addresses the possibility of a suspect binary evading signature detection through obfuscation. A control flow graph generated from a binary allows identification of a pattern of system calls, an ordered sequence of which can then be used as a signature in the search. An application proxy, named PathFinder, represents these properties, allowing examination of the performance characteristics of algorithms used in the search. In this work, we describe PathFinder, its signature search algorithm (a modified depth-first recursive search in which a node's adjacent labels are compared before recursing down its edges), and its general performance and cache characteristics. We highlight some important differences between PathFinder and traditional CSE applications. For example, the L2 cache hit ratio in PathFinder (less than 60%) is observed to be substantially lower than those observed for traditional CSE applications.
Citations: 2
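To make the search strategy concrete, here is a minimal, hypothetical C sketch of a label-sequence search over a control flow graph in the spirit described above (the graph layout, the CSR-style adjacency, and names such as signature_search are my assumptions, not PathFinder's actual code): each neighbor's label is compared against the next signature entry before the search recurses down that edge.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Minimal adjacency-list graph with one string label per node
 * (hypothetical layout; PathFinder's real data structures differ). */
typedef struct {
    size_t        nnodes;
    const size_t *adj_ptr;   /* CSR-style offsets into adj              */
    const size_t *adj;       /* neighbor node ids                       */
    const char  **label;     /* label (e.g., system-call name) per node */
} graph;

/* Does some path starting at `node` visit nodes whose labels match
 * sig[0..siglen-1] in order?  Each neighbor's label is compared against
 * the next signature entry before recursing down that edge. */
static bool match_from(const graph *g, size_t node,
                       const char **sig, size_t siglen)
{
    if (siglen == 0)
        return true;                               /* signature consumed */
    for (size_t e = g->adj_ptr[node]; e < g->adj_ptr[node + 1]; e++) {
        size_t next = g->adj[e];
        if (strcmp(g->label[next], sig[0]) == 0 &&  /* compare first...  */
            match_from(g, next, sig + 1, siglen - 1)) /* ...then recurse */
            return true;
    }
    return false;
}

/* Search the whole graph for the signature. */
bool signature_search(const graph *g, const char **sig, size_t siglen)
{
    for (size_t v = 0; v < g->nnodes; v++)
        if (siglen > 0 && strcmp(g->label[v], sig[0]) == 0 &&
            match_from(g, v, sig + 1, siglen - 1))
            return true;
    return false;
}
```

Because every recursive step consumes one signature entry, the recursion depth is bounded by the signature length even when the control flow graph contains cycles.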
PL2AP: fast parallel cosine similarity search
D. Anastasiu, G. Karypis
DOI: https://doi.org/10.1145/2833179.2833182 | Published: 2015-11-15
Abstract: Solving the AllPairs similarity search problem entails finding all pairs of vectors in a high-dimensional sparse dataset that have a similarity value higher than a given threshold. The output from this problem is a crucial component in many real-world applications, such as clustering, online advertising, recommender systems, near-duplicate document detection, and query refinement. A number of serial algorithms have been proposed that solve the problem by pruning many of the possible similarity candidates for each query object after accessing only a few of their non-zero values. The pruning process results in unpredictable memory access patterns that can reduce search efficiency. In this context, we introduce pL2AP, which efficiently solves the AllPairs cosine similarity search problem in a multi-core environment. Our method uses a number of cache-tiling optimizations, combined with fine-grained, dynamically balanced parallel tasks, to solve the problem 1.5x-238x faster than existing parallel baselines on datasets with hundreds of millions of non-zeros.
Citations: 10
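As a reference point for the problem being solved (not for pL2AP's pruning or cache tiling, which the paper describes), the following hypothetical C sketch performs a naive AllPairs cosine search over a CSR-style sparse dataset; pL2AP's contribution is avoiding most of the work this quadratic loop performs.

```c
#include <math.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Sparse dataset in a CSR-like layout (hypothetical field names).
 * Each row is one vector; column ids are sorted and rows are non-empty. */
typedef struct {
    size_t        nrows;
    const size_t *ptr;   /* ptr[i]..ptr[i+1] bound row i's nonzeros */
    const size_t *idx;   /* sorted feature ids                      */
    const double *val;   /* nonzero values                          */
} csr;

/* Dot product of two sorted sparse rows (merge scan). */
static double spdot(const csr *d, size_t a, size_t b)
{
    size_t i = d->ptr[a], j = d->ptr[b];
    double s = 0.0;
    while (i < d->ptr[a + 1] && j < d->ptr[b + 1]) {
        if (d->idx[i] == d->idx[j])      s += d->val[i++] * d->val[j++];
        else if (d->idx[i] < d->idx[j])  i++;
        else                             j++;
    }
    return s;
}

/* Print every pair of rows whose cosine similarity is at least t. */
void allpairs_naive(const csr *d, double t)
{
    double *norm = malloc(d->nrows * sizeof *norm);
    if (!norm) return;
    for (size_t i = 0; i < d->nrows; i++)
        norm[i] = sqrt(spdot(d, i, i));           /* Euclidean row norms */

    for (size_t a = 0; a < d->nrows; a++)
        for (size_t b = a + 1; b < d->nrows; b++) {
            double sim = spdot(d, a, b) / (norm[a] * norm[b]);
            if (sim >= t)
                printf("(%zu, %zu) %.3f\n", a, b, sim);
        }
    free(norm);
}
```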
Betweenness centrality on multi-GPU systems
M. Bernaschi, Giancarlo Carbone, Flavio Vella
DOI: https://doi.org/10.1145/2833179.2833192 | Published: 2015-11-15
Abstract: Betweenness centrality (BC) is steadily growing in popularity as a metric of the influence of a vertex in a graph. Exact BC computation for a large-scale graph is extraordinarily challenging and requires high-performance computing techniques to produce results in a reasonable amount of time. Here, we present the techniques we developed to speed up the computation of BC on multi-GPU systems. Our approach combines a bi-dimensional (2-D) decomposition of the graph with multi-level parallelism. Experimental results show that the proposed techniques are well suited to computing BC scores on graphs that are too large to fit in a single GPU's memory. In particular, the computation time for a graph with 234 million edges is reduced to less than 2 hours.
Citations: 6
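For reference, the metric being computed is the standard betweenness centrality (this definition is textbook material, not specific to the paper):

```latex
BC(v) \;=\; \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}},
```

where sigma_st is the number of shortest paths from s to t and sigma_st(v) is the number of those paths passing through v. Exact computation therefore requires a shortest-path search rooted at every vertex, which is what makes large graphs so expensive and motivates the multi-GPU decomposition.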
Generalised vectorisation for sparse matrix: vector multiplication
A. N. Yzelman
DOI: https://doi.org/10.1145/2833179.2833185 | Published: 2015-11-15
Abstract: This work generalises the various ways in which a sparse matrix-vector (SpMV) multiplication can be vectorised. It arrives at a novel data structure that generalises three earlier well-known data structures for sparse computations: the Blocked CRS format, the (sliced) ELLPACK format, and segmented-scan-based formats. The new data structure is relevant because efficient use of new hardware requires the use of increasingly wide vector registers. Normally, the use of vectorisation for sparse computations is limited by bandwidth constraints. In cases where computations are limited by memory latencies instead of memory bandwidth, however, vectorisation can still help performance. The Intel Xeon Phi, appearing as a component in several Top500 supercomputers, displays exactly this behaviour for SpMV multiplication. On this architecture the new generalised vectorisation scheme increases performance by up to 178 percent.
Citations: 10
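The paper's new structure unifies the three formats named above. As background for readers unfamiliar with them, the following hypothetical C sketch shows SpMV over one of those formats, sliced ELLPACK, where rows are grouped into slices of fixed height and each slice is padded and stored column-major so that one vector lane can handle one row of the slice (field names and layout details are my assumptions, not the paper's generalised structure).

```c
#include <stddef.h>

/* Sliced-ELLPACK storage (hypothetical names): rows are grouped into slices
 * of S consecutive rows; each slice is padded to its longest row and stored
 * column-major.  Every slice holds slice_len[s] * S entries, including
 * padding (value 0, any valid column index) and ghost rows in the final
 * slice. */
typedef struct {
    size_t        nrows, S;   /* matrix rows, slice height (= vector width) */
    size_t        nslices;    /* ceil(nrows / S)                            */
    const size_t *slice_ptr;  /* entry offset of each slice in val/col      */
    const size_t *slice_len;  /* padded row length of each slice            */
    const size_t *col;        /* column index per stored entry              */
    const double *val;        /* value per stored entry (0 for padding)     */
} sell;

/* y = A * x.  The caller must zero-initialize y. */
void sell_spmv(const sell *A, const double *x, double *y)
{
    for (size_t s = 0; s < A->nslices; s++) {
        size_t base = A->slice_ptr[s];
        for (size_t c = 0; c < A->slice_len[s]; c++) {
            for (size_t r = 0; r < A->S; r++) {
                size_t row = s * A->S + r;
                if (row >= A->nrows) break;          /* ghost rows in last slice */
                size_t e = base + c * A->S + r;      /* column-major in slice    */
                y[row] += A->val[e] * x[A->col[e]];
            }
        }
    }
}
```

The inner loop over r reads S contiguous entries of val and col, which is the stride-one pattern a compiler or vector intrinsics can map onto a single wide register; the paper's contribution is a structure that recovers this property across Blocked CRS, ELLPACK, and segmented-scan formats at once.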
Data-centric GPU-based adaptive mesh refinement
M. Wahib, N. Maruyama
DOI: https://doi.org/10.1145/2833179.2833181 | Published: 2015-11-15
Abstract: It has been demonstrated that explicit stencil computations for high-resolution schemes can benefit greatly from GPUs. This includes Adaptive Mesh Refinement (AMR), a model for locally adapting the resolution of a stencil grid. Unlike uniform-grid stencils, however, adapting the grid is typically done on the CPU side, which requires transferring the stencil data arrays to and from the CPU every time the grid is adapted. We propose a data-centric approach to GPU-based AMR: porting all the mesh adaptation operations that touch the data arrays to the GPU. This allows the stencil data arrays to reside in GPU memory for the entirety of the simulation, so the GPU code specializes on the data residing in its memory while the CPU specializes on the AMR metadata residing in CPU memory. We compare the performance of the proposed method to a basic GPU implementation and to an optimized GPU implementation that overlaps communication and computation. The performance of two GPU-based AMR applications is improved by 2.21x and 2.83x over the basic implementation.
Citations: 11
A scalable architecture for ordered irregular parallelism
Daniel Sánchez
DOI: https://doi.org/10.1145/2833179.2833193 | Published: 2015-11-15
Abstract: We present a new parallel architecture that exploits ordered irregular parallelism, which is abundant but hard to mine with current software and hardware techniques. In this architecture, called Swarm, programs consist of short tasks, as small as tens of instructions each, with programmer-specified order constraints. Swarm executes tasks speculatively and out of order, and efficiently speculates thousands of tasks ahead of the earliest active task to uncover enough parallelism. Furthermore, Swarm sends tasks to run close to their data whenever possible, reducing data movement. We contribute several new techniques that allow Swarm to scale to large core counts and speculation windows, including a new execution model, speculation-aware hardware task management, selective aborts, and scalable ordered task commits.
Citations: 0