声子玻尔兹曼输运方程解的可伸缩并行化

Proceedings of the 37th International Conference on Supercomputing Pub Date : 2023-06-21 DOI:10.1145/3577193.3593723

H. Tran, Siddharth Saurav, P. Sadayappan, S. Mazumder, H. Sundar

{"title":"声子玻尔兹曼输运方程解的可伸缩并行化","authors":"H. Tran, Siddharth Saurav, P. Sadayappan, S. Mazumder, H. Sundar","doi":"10.1145/3577193.3593723","DOIUrl":null,"url":null,"abstract":"The Boltzmann Transport Equation (BTE) for phonons is often used to predict thermal transport at submicron scales in semiconductors. The BTE is a seven-dimensional nonlinear integro-differential equation, resulting in difficulty in its solution even after linearization under the single relaxation time approximation. Furthermore, parallelization and load balancing are challenging, given the high dimensionality and variability of the linear systems' conditioning. This work presents a 'synthetic' scalable parallelization method for solving the BTE on large-scale systems. The method includes cell-based parallelization, combined band+cell-based parallelization, and batching technique. The essential computational ingredient of cell-based parallelization is a sparse matrix-vector product (SpMV) that can be integrated with an existing linear algebra library like PETSc. The combined approach enhances the cell-based method by further parallelizing the band dimension to take advantage of low inter-band communication costs. For the batched approach, we developed a batched SpMV that enables multiple linear systems to be solved simultaneously, merging many MPI messages to reduce communication costs, thus maintaining scalability when the grain size becomes very small. We present numerical experiments to demonstrate our method's excellent speedups and scalability up to 16384 cores for a problem with 12.6 billion unknowns.","PeriodicalId":424155,"journal":{"name":"Proceedings of the 37th International Conference on Supercomputing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Scalable parallelization for the solution of phonon Boltzmann Transport Equation\",\"authors\":\"H. Tran, Siddharth Saurav, P. Sadayappan, S. Mazumder, H. Sundar\",\"doi\":\"10.1145/3577193.3593723\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Boltzmann Transport Equation (BTE) for phonons is often used to predict thermal transport at submicron scales in semiconductors. The BTE is a seven-dimensional nonlinear integro-differential equation, resulting in difficulty in its solution even after linearization under the single relaxation time approximation. Furthermore, parallelization and load balancing are challenging, given the high dimensionality and variability of the linear systems' conditioning. This work presents a 'synthetic' scalable parallelization method for solving the BTE on large-scale systems. The method includes cell-based parallelization, combined band+cell-based parallelization, and batching technique. The essential computational ingredient of cell-based parallelization is a sparse matrix-vector product (SpMV) that can be integrated with an existing linear algebra library like PETSc. The combined approach enhances the cell-based method by further parallelizing the band dimension to take advantage of low inter-band communication costs. For the batched approach, we developed a batched SpMV that enables multiple linear systems to be solved simultaneously, merging many MPI messages to reduce communication costs, thus maintaining scalability when the grain size becomes very small. We present numerical experiments to demonstrate our method's excellent speedups and scalability up to 16384 cores for a problem with 12.6 billion unknowns.\",\"PeriodicalId\":424155,\"journal\":{\"name\":\"Proceedings of the 37th International Conference on Supercomputing\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 37th International Conference on Supercomputing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3577193.3593723\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 37th International Conference on Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3577193.3593723","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

声子的玻尔兹曼输运方程(BTE)常用于预测半导体中亚微米尺度的热输运。BTE是一个七维非线性积分微分方程，在单一的松弛时间近似下，即使经过线性化处理也很难求解。此外，考虑到线性系统的高维性和可变性，并行化和负载平衡是具有挑战性的。本文提出了一种“综合”可扩展并行化方法，用于解决大规模系统上的BTE问题。该方法包括基于单元的并行化、组合频带+基于单元的并行化和批处理技术。基于单元的并行化的基本计算要素是稀疏矩阵向量积(SpMV)，它可以与现有的线性代数库(如PETSc)集成。该组合方法通过进一步并行化频带维度来增强基于小区的方法，以利用低频带间通信成本的优势。对于批处理方法，我们开发了一种批处理SpMV，可以同时解决多个线性系统，合并许多MPI消息以降低通信成本，从而在粒度变得非常小时保持可扩展性。我们提出了数值实验来证明我们的方法具有出色的加速和可扩展性，最多可达16384个内核，用于解决具有126亿个未知数的问题。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Scalable parallelization for the solution of phonon Boltzmann Transport Equation

The Boltzmann Transport Equation (BTE) for phonons is often used to predict thermal transport at submicron scales in semiconductors. The BTE is a seven-dimensional nonlinear integro-differential equation, resulting in difficulty in its solution even after linearization under the single relaxation time approximation. Furthermore, parallelization and load balancing are challenging, given the high dimensionality and variability of the linear systems' conditioning. This work presents a 'synthetic' scalable parallelization method for solving the BTE on large-scale systems. The method includes cell-based parallelization, combined band+cell-based parallelization, and batching technique. The essential computational ingredient of cell-based parallelization is a sparse matrix-vector product (SpMV) that can be integrated with an existing linear algebra library like PETSc. The combined approach enhances the cell-based method by further parallelizing the band dimension to take advantage of low inter-band communication costs. For the batched approach, we developed a batched SpMV that enables multiple linear systems to be solved simultaneously, merging many MPI messages to reduce communication costs, thus maintaining scalability when the grain size becomes very small. We present numerical experiments to demonstrate our method's excellent speedups and scalability up to 16384 cores for a problem with 12.6 billion unknowns.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 37th International Conference on Supercomputing

自引率

0.00%

发文量