The Effect of Core Parking for Energy-efficient Matrix-Matrix Multiplication by using AVX and OpenMP

2021 21st International Conference on Control, Automation and Systems (ICCAS). Pub Date: 2021-10-12. DOI: 10.23919/ICCAS52745.2021.9649926

Nwe Zin Oo, P. Chaikan



Abstract

Modern computers use multi-core processor architectures that support parallel computing with single instruction, multiple data (SIMD) execution. Depending on the memory structure, CPU core performance plays a vital role in power-saving profiles across a multi-core architecture. Although core parking is normally controlled entirely by the operating system on both laptop and desktop computers, performance can be boosted by tweaking CPU core parking and changing frequency scaling in real time. This paper studies the effect of core parking on parallel matrix-matrix multiplication in shared memory, implemented with AVX and OpenMP. When large matrices are multiplied in parallel in shared memory, the overheads of memory capacity and data transfer become the main causes of both increased power consumption and decreased performance. Large square matrix multiplications, ranging from 1024×1024 to 16384×16384, are tested using Advanced Vector Extensions (AVX) intrinsics and OpenMP while the power-saving profile is varied dynamically. The default power profile on a computer is Balanced; performance was tested by tweaking CPU parking under four modes (Balanced, High Performance, Bitsum Highest Performance, and Power Saving). According to the test results, the Bitsum Highest Performance mode achieved the highest performance and the lowest power and energy consumption of the four profiles.
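The abstract does not show the kernel itself, so the following is only a minimal sketch of how a shared-memory matrix-matrix multiplication might combine AVX intrinsics with OpenMP. It is not the authors' implementation. The function name `matmul_avx_omp`, the row-major layout, the single-precision element type, and the i-k-j loop order are all assumptions made for illustration. The AVX path is compiled only when the compiler defines `__AVX__` (for example with `-mavx`); otherwise the scalar loop does all the work, and the OpenMP pragma is ignored unless OpenMP is enabled (for example with `-fopenmp`).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* Hypothetical helper: C = A * B for row-major n x n float matrices. */
void matmul_avx_omp(const float *a, const float *b, float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    memset(c, 0, n * n * sizeof(float));

    /* Rows of C are independent, so each thread gets a block of rows. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; ++i) {
        const size_t row = (size_t)i * n;
        for (size_t k = 0; k < n; ++k) {
            const float aik = a[row + k];
            size_t j = 0;
#if defined(__AVX__)
            /* Broadcast A[i][k] and update eight elements of row i of C. */
            const __m256 va = _mm256_set1_ps(aik);
            for (; j + 8 <= n; j += 8) {
                __m256 vb = _mm256_loadu_ps(&b[k * n + j]);
                __m256 vc = _mm256_loadu_ps(&c[row + j]);
                vc = _mm256_add_ps(vc, _mm256_mul_ps(va, vb));
                _mm256_storeu_ps(&c[row + j], vc);
            }
#endif
            /* Scalar tail, or the whole row when AVX is unavailable. */
            for (; j < n; ++j)
                c[row + j] += aik * b[k * n + j];
        }
    }
}
```

The i-k-j order keeps the inner loop walking contiguous rows of B and C, which suits unaligned 256-bit loads. The kernel is independent of the operating system's power plan: core parking and frequency scaling are configured outside the program and change how the OpenMP threads are scheduled onto cores, not the code.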
