Dynamic Thread Partition Algorithm Based on Sharing Data on CMP
Deng Zhou, Ye Tian, Hong Shen
2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies, published 2011-10-20
https://doi.org/10.1109/PDCAT.2011.36
On multi-core processors whose cores share a cache, data shared among threads running on different cores may not enjoy the benefit of non-uniform cache access: each cache block is designated as local to one core, and it is difficult to determine which core should serve as the local position of a shared data block. Studies have found that the cost of long-latency accesses can be reduced by using a proper thread partition/allocation algorithm [5]. However, existing work pays little attention to thread partitioning algorithms that can reduce this cost. In this paper, we present a dynamic thread partitioning algorithm driven by data sharing among threads on shared-cache multi-core processors. The algorithm makes a best effort to minimize the number of shared blocks accessed by threads on different cores. Compared with existing work, our algorithm achieves a performance improvement. In experiments with 4 cores and more than 100 threads, the results show that it reduces the interaction between threads belonging to different cores by 30% to 50% relative to previously known solutions.
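To make the stated objective concrete, the sketch below counts the blocks touched by threads on more than one core and applies a simple greedy placement that tries to keep that count low. It is an illustration only, not the algorithm from the paper, which the abstract does not specify. The names `cross_core_sharing`, `greedy_partition`, `accesses`, and `capacity` are hypothetical, and the sketch assumes that each thread's set of accessed blocks is already known.

```python
def cross_core_sharing(assignment, accesses):
    """Count blocks that are accessed by threads placed on more than one core."""
    cores_per_block = {}
    for thread, blocks in accesses.items():
        for block in blocks:
            cores_per_block.setdefault(block, set()).add(assignment[thread])
    return sum(1 for cores in cores_per_block.values() if len(cores) > 1)


def greedy_partition(accesses, num_cores, capacity):
    """Hypothetical heuristic: place each thread on the core whose resident
    threads already touch the most blocks in common with it."""
    if num_cores * capacity < len(accesses):
        raise ValueError("not enough core capacity for all threads")
    core_blocks = [set() for _ in range(num_cores)]  # blocks touched per core
    core_load = [0] * num_cores                      # threads placed per core
    assignment = {}
    # Visit threads with the largest footprint first; ties broken by name.
    for thread in sorted(accesses, key=lambda t: (-len(accesses[t]), t)):
        candidates = [c for c in range(num_cores) if core_load[c] < capacity]
        # Prefer the largest overlap, then the lightest load, then the lowest id.
        best = max(
            candidates,
            key=lambda c: (len(accesses[thread] & core_blocks[c]), -core_load[c], -c),
        )
        assignment[thread] = best
        core_blocks[best] |= accesses[thread]
        core_load[best] += 1
    return assignment


# Toy input (made up for illustration): thread name -> set of block ids.
accesses = {"t0": {1, 2}, "t1": {2, 3}, "t2": {10, 11}, "t3": {11, 12}}
placement = greedy_partition(accesses, num_cores=2, capacity=2)
round_robin = {"t0": 0, "t1": 1, "t2": 0, "t3": 1}
print(cross_core_sharing(placement, accesses))
print(cross_core_sharing(round_robin, accesses))
```

This sketch is a single static pass. One way to approximate a dynamic scheme would be to rerun the placement periodically on freshly observed access sets, but that is an assumption here rather than something the abstract describes.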