{"title":"可伸缩、多线程、局部就地排序","authors":"D. Haglin, Robert Adolf, Greg E. Mackey","doi":"10.1109/IPDPSW.2013.74","DOIUrl":null,"url":null,"abstract":"A recent trend in hardware development is producing computing systems that are stretching the number of cores and size of shared-memory beyond where most fundamental serial algorithms perform well. The expectation is that this trend will continue. So it makes sense to rethink our fundamental algorithms such as sorting. There are many situations where data that needs to be sorted will actually fit into the shared memory so applications could benefit from an efficient parallel sorting algorithm. When sorting large data (at least hundreds of Gigabytes) in a single shared memory, there are two factors that affect the algorithm choice. First, does the algorithm sort in-place? And second, does the algorithm scale well beyond tens of threads? Surprisingly, existing algorithms possess either one of these factors, but not both. We present an approach that gracefully degrades in performance as the amount of available working memory decreases relative to the size of the input.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Scalable, Multithreaded, Partially-in-Place Sorting\",\"authors\":\"D. Haglin, Robert Adolf, Greg E. Mackey\",\"doi\":\"10.1109/IPDPSW.2013.74\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A recent trend in hardware development is producing computing systems that are stretching the number of cores and size of shared-memory beyond where most fundamental serial algorithms perform well. The expectation is that this trend will continue. So it makes sense to rethink our fundamental algorithms such as sorting. There are many situations where data that needs to be sorted will actually fit into the shared memory so applications could benefit from an efficient parallel sorting algorithm. When sorting large data (at least hundreds of Gigabytes) in a single shared memory, there are two factors that affect the algorithm choice. First, does the algorithm sort in-place? And second, does the algorithm scale well beyond tens of threads? Surprisingly, existing algorithms possess either one of these factors, but not both. We present an approach that gracefully degrades in performance as the amount of available working memory decreases relative to the size of the input.\",\"PeriodicalId\":234552,\"journal\":{\"name\":\"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-05-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW.2013.74\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2013.74","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A recent trend in hardware development is producing computing systems that are stretching the number of cores and size of shared-memory beyond where most fundamental serial algorithms perform well. The expectation is that this trend will continue. So it makes sense to rethink our fundamental algorithms such as sorting. There are many situations where data that needs to be sorted will actually fit into the shared memory so applications could benefit from an efficient parallel sorting algorithm. When sorting large data (at least hundreds of Gigabytes) in a single shared memory, there are two factors that affect the algorithm choice. First, does the algorithm sort in-place? And second, does the algorithm scale well beyond tens of threads? Surprisingly, existing algorithms possess either one of these factors, but not both. We present an approach that gracefully degrades in performance as the amount of available working memory decreases relative to the size of the input.