C. Young, Joseph A. Bank, R. Dror, J. P. Grossman, J. Salmon, D. Shaw
{"title":"一个32x32x32,空间分布的3D FFT在4微秒安东","authors":"C. Young, Joseph A. Bank, R. Dror, J. P. Grossman, J. Salmon, D. Shaw","doi":"10.1145/1654059.1654083","DOIUrl":null,"url":null,"abstract":"Anton, a massively parallel special-purpose machine for molecular dynamics simulations, performs a 32 × 32 × 32 FFT in 3.7 microseconds and a 64 × 64 × 64 FFT in 13.3 microseconds on a configuration with 512 nodes-an order of magnitude faster than all other FFT implementations of which we are aware. Achieving this FFT performance requires a coordinated combination of computation and communication techniques that leverage Anton's underlying hardware mechanisms. Most significantly, Anton's communication subsystem provides over 300 gigabits per second of bandwidth per node, message latency in the hundreds of nanoseconds, and support for word-level writes and single-ended communication. In addition, Anton's general-purpose computation system incorporates primitives that support the efficient parallelization of small 1D FFTs. Although Anton was designed specifically for molecular dynamics simulations, a number of the hardware primitives and software implementation techniques described in this paper may also be applicable to the acceleration of FFTs on general-purpose high-performance machines.","PeriodicalId":371415,"journal":{"name":"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"39","resultStr":"{\"title\":\"A 32x32x32, spatially distributed 3D FFT in four microseconds on Anton\",\"authors\":\"C. Young, Joseph A. Bank, R. Dror, J. P. Grossman, J. Salmon, D. Shaw\",\"doi\":\"10.1145/1654059.1654083\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Anton, a massively parallel special-purpose machine for molecular dynamics simulations, performs a 32 × 32 × 32 FFT in 3.7 microseconds and a 64 × 64 × 64 FFT in 13.3 microseconds on a configuration with 512 nodes-an order of magnitude faster than all other FFT implementations of which we are aware. Achieving this FFT performance requires a coordinated combination of computation and communication techniques that leverage Anton's underlying hardware mechanisms. Most significantly, Anton's communication subsystem provides over 300 gigabits per second of bandwidth per node, message latency in the hundreds of nanoseconds, and support for word-level writes and single-ended communication. In addition, Anton's general-purpose computation system incorporates primitives that support the efficient parallelization of small 1D FFTs. Although Anton was designed specifically for molecular dynamics simulations, a number of the hardware primitives and software implementation techniques described in this paper may also be applicable to the acceleration of FFTs on general-purpose high-performance machines.\",\"PeriodicalId\":371415,\"journal\":{\"name\":\"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis\",\"volume\":\"37 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-11-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"39\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1654059.1654083\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1654059.1654083","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A 32x32x32, spatially distributed 3D FFT in four microseconds on Anton
Anton, a massively parallel special-purpose machine for molecular dynamics simulations, performs a 32 × 32 × 32 FFT in 3.7 microseconds and a 64 × 64 × 64 FFT in 13.3 microseconds on a configuration with 512 nodes-an order of magnitude faster than all other FFT implementations of which we are aware. Achieving this FFT performance requires a coordinated combination of computation and communication techniques that leverage Anton's underlying hardware mechanisms. Most significantly, Anton's communication subsystem provides over 300 gigabits per second of bandwidth per node, message latency in the hundreds of nanoseconds, and support for word-level writes and single-ended communication. In addition, Anton's general-purpose computation system incorporates primitives that support the efficient parallelization of small 1D FFTs. Although Anton was designed specifically for molecular dynamics simulations, a number of the hardware primitives and software implementation techniques described in this paper may also be applicable to the acceleration of FFTs on general-purpose high-performance machines.