{"title":"Performance speedup of Quantum Espresso using optimized AOCL-FFTW","authors":"S. Raut","doi":"10.1109/HPEC55821.2022.9926349","journal":{"name":"2022 IEEE High Performance Extreme Computing Conference (HPEC)"},"publicationDate":"2022-09-19","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/HPEC55821.2022.9926349"}
Performance speedup of Quantum Espresso using optimized AOCL-FFTW
Quantum Espresso (QE) is an open-source software suite for electronic-structure calculations and materials modeling at the nanoscale. QE depends on multiple libraries, including an internal or external library for FFT computations. The iterative diagonalization process and the computation of charge density in QE use forward and inverse 3D FFTs, which account for a large portion of the total application runtime. AOCL-FFTW is the FFT library recommended for QE on AMD CPU systems. QE currently uses the FFTW library in a sub-optimal manner and therefore does not achieve the best performance. This paper presents a new set of design and implementation strategies applied in AOCL-FFTW to overcome the major limitations in QE's use of FFTW without requiring any code changes in QE. It also presents results showing the performance benefits of the proposed optimizations. Speed-ups are achieved in both single-node and multi-node test executions, accelerating the QE application.
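For readers unfamiliar with the operation the abstract refers to, the following is a minimal, illustrative sketch of a forward and inverse 3D transform on a small grid. It is not code from QE or AOCL-FFTW and says nothing about the optimizations in the paper. It uses a naive O(n^2) DFT in pure Python instead of a real FFT library, and the function names (`dft_1d`, `dft_3d`, `roundtrip_error`) are hypothetical. The sign convention (-1 forward, +1 inverse, both unnormalized) is chosen to match FFTW's documented convention.

```python
import cmath


def dft_1d(values, sign):
    """Naive O(n^2) 1D DFT. sign=-1 is forward, sign=+1 is the unnormalized inverse."""
    n = len(values)
    return [
        sum(values[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]


def dft_3d(grid, sign):
    """3D DFT of grid[x][y][z], computed as 1D transforms along z, then y, then x."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    out = [[list(row) for row in plane] for plane in grid]  # work on a copy

    # Transform along z (innermost index).
    for x in range(nx):
        for y in range(ny):
            out[x][y] = dft_1d(out[x][y], sign)

    # Transform along y.
    for x in range(nx):
        for z in range(nz):
            line = dft_1d([out[x][y][z] for y in range(ny)], sign)
            for y in range(ny):
                out[x][y][z] = line[y]

    # Transform along x.
    for y in range(ny):
        for z in range(nz):
            line = dft_1d([out[x][y][z] for x in range(nx)], sign)
            for x in range(nx):
                out[x][y][z] = line[x]

    return out


def roundtrip_error(grid):
    """Largest absolute difference after forward + inverse + 1/(nx*ny*nz) scaling."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    spectrum = dft_3d(grid, -1)   # forward transform
    back = dft_3d(spectrum, +1)   # inverse transform, still unnormalized
    scale = nx * ny * nz
    return max(
        abs(back[x][y][z] / scale - grid[x][y][z])
        for x in range(nx) for y in range(ny) for z in range(nz)
    )
```

Because both directions are unnormalized in this sketch, a forward transform followed by an inverse one returns the input multiplied by the number of grid points, which is why `roundtrip_error` divides by `nx * ny * nz`. A production code would call an FFT library for these transforms; the repeated forward and inverse passes over the grid are the kind of work the abstract identifies as a large share of QE's runtime.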