{"title":"A learning‐based approach to regression analysis for climate data–A case of Northeast China","authors":"Jiaxu Guo, Yidan Xu, Liang Hu, Xianwei Wu, Gaochao Xu, Xilong Che","doi":"10.1002/eng2.12797","DOIUrl":"https://doi.org/10.1002/eng2.12797","url":null,"abstract":"Global climate change is an important issue that all of humanity needs to address together. Precipitation is an important climatic feature for agricultural development and food security, and the study of precipitation and its associated climatic factors is important for the analysis of global change. As an important part of China's food production, Northeast China has a temperate monsoon climate with simultaneous rain and heat, which is favorable for crop growth. In this paper, a scientific workflow for climate data analysis with a learning‐based method is designed. Using climate data from typical models in CMIP6, a machine learning‐based approach is used to establish regression relationships between precipitation and climate variables such as temperature, humidity and wind speed in Northeast China, which is validated through a time series approach. We design a weight‐based model ensemble method and a learning‐based bias correction method, so that the ensemble model can achieve better performance. We also analyze the precipitation trends in Northeast China under the three Shared Socio‐economic Pathways (SSPs). 
This will help researchers to analyze the long‐term evolution and factors of climate.","PeriodicalId":11735,"journal":{"name":"Engineering Reports","volume":"210 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138981265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Refactoring BZIP2 on the new‐generation sunway supercomputer","authors":"Xiaohui Liu, Zekun Yin, Haodong Tian, Wubing Wan, Mengyuan Hua, Wenlai Zhao, Zhenchun Huang, Ping Gao, Fangjin Zhu, Hua Wang, Xiaohui Duan","doi":"10.1002/eng2.12806","DOIUrl":"https://doi.org/10.1002/eng2.12806","url":null,"abstract":"High‐performance computing is progressively assuming a fundamental role in advancing scientific research and engineering domains. However, the ever‐expanding scales of scientific simulations pose challenges for efficient data I/O and storage. Data compression technology has garnered significant attention as a solution to reduce data transmission and storage costs while enhancing performance. In particular, the BZIP2 lossless compression algorithm has been widely used due to its exceptional compression ratio, moderate compression speed, high reliability, and open‐source nature. This paper focuses on the design and realization of a parallelized BZIP2 algorithm tailored for deployment on the New‐Generation Sunway supercomputing platform. By leveraging the unique cache patterns of the New‐Generation Sunway processor, we propose highly tuned multi‐threading and multi‐node implementations of the BZIP2 applications for different scenarios. Moreover, we also propose efficient BZIP2 libraries based on the management processing element and computing processing element which support the commonly used high‐level (de)compression interfaces. The test results indicate that our multi‐threading implementation achieves a maximum speedup of 23.09 (8.57) in decompression (compression) compared to the sequential implementation. 
Furthermore, the multi‐node implementation achieves 50.81% (26.35%) parallel efficiency and peak performance of 16.6 GB/s (52.8 GB/s) for compression (decompression) when scaling up to 2048 processes.","PeriodicalId":11735,"journal":{"name":"Engineering Reports","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135821318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating ray tracing engine of BLENDER on the new Sunway architecture","authors":"Zhaoqi Sun, Zhen Wang, Mengyuan Hua, Puyu Xiong, Wubing Wan, Ping Gao, Wenlai Zhao, Zhenchun Huang, Lin Han","doi":"10.1002/eng2.12789","DOIUrl":"https://doi.org/10.1002/eng2.12789","url":null,"abstract":"With the increasing popularity of high‐resolution displays, there is a growing demand for more realistic rendered images. Ray tracing has become the most effective algorithm for image rendering, but its complexity and large amount of computing data require sophisticated HPC solutions. In this article, we present our efforts to port the ray tracing engine CYCLES of Blender to the new generation of Sunway supercomputers. We propose optimizations that are tailored to the new hardware architecture, including a multi‐level parallel scheme that efficiently maps and scales Blender onto the novel Sunway architecture, strategies to address memory bottlenecks, a revised task dispatching method that achieves excellent load balancing, and a pipeline approach that maximizes computation and communication overlap. By combining all these optimizations, we achieve a significant reduction in rendering time for a single‐frame image, from 2260 s using the single‐core serial version to 71 s using 48 processes, which is a speedup of about 32×. 
Accelerating the ray tracing engine CYCLES of Blender in the new generation of Sunway supercomputers.","PeriodicalId":11735,"journal":{"name":"Engineering Reports","volume":"18 18","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135863399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Field‐programmable gate array acceleration of the Tersoff potential in LAMMPS","authors":"Quan Deng, Qiang Liu","doi":"10.1002/eng2.12694","DOIUrl":"https://doi.org/10.1002/eng2.12694","url":null,"abstract":"Molecular dynamics simulation is a common method to help humans understand the microscopic world. The traditional general‐purpose high‐performance computing platforms are hindered by low computational and power efficiency, constraining the practical application of large‐scale and long‐time many‐body molecular dynamics simulations. In order to address these problems, a novel molecular dynamics accelerator for the Tersoff potential is designed based on field‐programmable gate array (FPGA) platforms, which enables the acceleration of LAMMPS using FPGAs. Firstly, an on‐the‐fly method is proposed to build neighbor lists and reduce storage usage. Secondly, multilevel parallelizations are implemented to enable the accelerator to be flexibly deployed on FPGAs of different scales and achieve good performance. Finally, mathematical models of the accelerator are built, and a method for using the models to determine the optimal‐performance parameters is proposed. Experimental results show that, when tested on the Xilinx Alveo U200, the proposed accelerator achieves a performance of 9.51 ns/day for the Tersoff simulation in a 55,296‐atom system, which is a 2.00× increase in performance when compared to Intel I7‐8700K and 1.70× to NVIDIA Titan Xp under the same test case. 
In addition, in terms of computational efficiency and power efficiency, the proposed accelerator achieves improvements of 2.00× and 7.19× compared to Intel I7‐8700K, and 4.33× and 2.11× compared to NVIDIA Titan Xp, respectively.","PeriodicalId":11735,"journal":{"name":"Engineering Reports","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135792476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}