K. Ibrahim, Samuel Williams, E. Epifanovsky, A. Krylov
2014 21st International Conference on High Performance Computing (HiPC). DOI: 10.1109/HIPC.2014.7116881 (https://doi.org/10.1109/HIPC.2014.7116881)
Analysis and tuning of libtensor framework on multicore architectures
Libtensor is a framework for implementing the tensor contractions that arise from the coupled-cluster and equation-of-motion equations of computational quantum chemistry. It exploits symmetry and sparsity to be memory efficient, which allows it to run efficiently on ubiquitous, cost-effective SMP architectures. Unfortunately, the movement of memory controllers on chip has endowed these SMP systems with strong NUMA properties. Moreover, the many-core trend in processor architecture demands that the implementation be extremely thread-scalable on a node. To date, Libtensor has been generally agnostic of these effects. In this paper, we therefore explore a number of optimization techniques: a thread-friendly, NUMA-aware memory allocator and garbage collector, tuning of the tensor tiling factor, and tuning of the scheduling quanta. Together, these optimizations improve the performance of contractions implemented in Libtensor by up to 2× on representative Ivy Bridge, Nehalem, and Opteron SMPs.
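To make the notion of a tiling factor concrete, the following is a minimal sketch of a blocked contraction over one index, C[i][j] = sum_k A[i][k] * B[k][j]. It is an illustration only and does not reproduce Libtensor's API or its block tensor format: the function names `contract_tiled` and `count_tasks` and the parameter `tile` are hypothetical, and symmetry, sparsity, threading, and NUMA placement are all left out.

```python
from itertools import product


def contract_tiled(a, b, tile):
    """Compute C[i][j] = sum_k A[i][k] * B[k][j] one tile at a time.

    `tile` is the edge length of each block (the tiling factor in this
    sketch, a hypothetical parameter rather than a Libtensor setting).
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    # Each (i0, j0, k0) triple is one block-level unit of work.
    # Units that share (i0, j0) accumulate into the same block of C.
    for i0, j0, k0 in product(range(0, n, tile),
                              range(0, p, tile),
                              range(0, m, tile)):
        for i in range(i0, min(i0 + tile, n)):
            for j in range(j0, min(j0 + tile, p)):
                acc = 0.0
                for k in range(k0, min(k0 + tile, m)):
                    acc += a[i][k] * b[k][j]
                c[i][j] += acc
    return c


def count_tasks(n, m, p, tile):
    """Number of block-level units of work for a given tile size."""
    def blocks(extent):
        # Ceiling division: a partial tile still counts as one block.
        return (extent + tile - 1) // tile
    return blocks(n) * blocks(p) * blocks(m)


if __name__ == "__main__":
    a = [[float(i + k) for k in range(5)] for i in range(4)]
    b = [[float(k * j + 1) for j in range(3)] for k in range(5)]
    for tile in (1, 2, 5):
        c = contract_tiled(a, b, tile)
        print(tile, count_tasks(4, 5, 3, tile), c[0])
```

The result does not depend on `tile`, but the number of work units does. Small tiles yield many fine-grained units that are easy to balance across threads but carry more per-unit overhead, while large tiles yield few coarse units. That trade-off is the kind of effect that tuning a tiling factor targets.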