2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT): Latest Papers

Discovering and understanding performance bottlenecks in transactional applications
Ferad Zyulkyarov, Srdjan Stipic, T. Harris, O. Unsal, A. Cristal, I. Hur, M. Valero
{"title":"Discovering and understanding performance bottlenecks in transactional applications","authors":"Ferad Zyulkyarov, Srdjan Stipic, T. Harris, O. Unsal, A. Cristal, I. Hur, M. Valero","doi":"10.1145/1854273.1854311","DOIUrl":"https://doi.org/10.1145/1854273.1854311","url":null,"abstract":"Many researchers have developed applications using transactional memory (TM) with the purpose of benchmarking different implementations, and studying whether or not TM is easy to use. However, comparatively little has been done to provide general-purpose tools for profiling and tuning programs which use transactions.","PeriodicalId":422461,"journal":{"name":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115046329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 39
Approximating age-based arbitration in on-chip networks
M. J. Lee, John Kim, D. Abts, Michael R. Marty, Jae W. Lee
{"title":"Approximating age-based arbitration in on-chip networks","authors":"M. J. Lee, John Kim, D. Abts, Michael R. Marty, Jae W. Lee","doi":"10.1145/1854273.1854359","DOIUrl":"https://doi.org/10.1145/1854273.1854359","url":null,"abstract":"The on-chip network of emerging many-core CMPs enables the sharing of numerous on-chip components. This on-chip network needs to ensure fairness when accessing the shared resources. In this work, we propose providing equality of service (EoS) in future many-core CMPs on-chip networks by leveraging distance, or hop count, to approximate the age of packets in the network. We propose probabilistic arbitration combined with distance-based weights to achieve EoS and overcome the limitation of conventional round-robin arbiter. We describe how nonlinear weights need to be used with probabilistic arbiters and propose three different arbitration weight metrics - fixed weight, constantly increasing weight, and variably increasing weight. By only modifying the arbitration of an on-chip router, we do not require any additional buffers or virtual channels and create a complexity-effective mechanism for achieving EoS.","PeriodicalId":422461,"journal":{"name":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132112537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
SPACE: Sharing pattern-based directory coherence for multicore scalability
Hongzhou Zhao, Arrvindh Shriraman, S. Dwarkadas
{"title":"SPACE: Sharing pattern-based directory coherence for multicore scalability","authors":"Hongzhou Zhao, Arrvindh Shriraman, S. Dwarkadas","doi":"10.1145/1854273.1854294","DOIUrl":"https://doi.org/10.1145/1854273.1854294","url":null,"abstract":"An important challenge in multicore processors is the maintenance of cache coherence in a scalable manner. Directory-based protocols save bandwidth and achieve scalability by associating information about sharer cores with every cache block. As the number of cores and cache sizes increase, the directory itself adds significant area and energy overheads.","PeriodicalId":422461,"journal":{"name":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133010893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 77
Ordered and unordered algorithms for parallel breadth first search
M. A. Hassaan, Martin Burtscher, K. Pingali
{"title":"Ordered and unordered algorithms for parallel breadth first search","authors":"M. A. Hassaan, Martin Burtscher, K. Pingali","doi":"10.1145/1854273.1854341","DOIUrl":"https://doi.org/10.1145/1854273.1854341","url":null,"abstract":"We describe and evaluate ordered and unordered algorithms for shared-memory parallel breadth-first search. The unordered algorithm is based on viewing breadth-first search as a fixpoint computation, and in general, it may perform more work than the ordered algorithms while requiring less global synchronization.","PeriodicalId":422461,"journal":{"name":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133083623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
System-level Max POwer (SYMPO) - a systematic approach for escalating system-level power consumption using synthetic benchmarks
K. Ganesan, Jungho Jo, W. Bircher, Dimitris Kaseridis, Zhibin Yu, L. John
{"title":"System-level Max POwer (SYMPO) - a systematic approach for escalating system-level power consumption using synthetic benchmarks","authors":"K. Ganesan, Jungho Jo, W. Bircher, Dimitris Kaseridis, Zhibin Yu, L. John","doi":"10.1145/1854273.1854282","DOIUrl":"https://doi.org/10.1145/1854273.1854282","url":null,"abstract":"To effectively design a computer system for the worst case power consumption scenario, system architects often use hand-crafted maximum power consuming benchmarks at the assembly language level. These stressmarks, also called power viruses, are very tedious to generate and require significant domain knowledge. In this paper, we propose SYMPO, an automatic SYstem level Max POwer virus generation framework, which maximizes the power consumption of the CPU and the memory system using genetic algorithm and an abstract workload generation framework. For a set of three ISAs, we show the efficacy of the power viruses generated using SYMPO by comparing the power consumption with that of MPrime torture test, which is widely used by industry to test system stability. Our results show that the usage of SYMPO results in the generation of power viruses that consume 14–41% more power compared to MPrime on SPARC ISA. The genetic algorithm achieved this result in about 70 to 90 generations in 11 to 15 hours when using a full system simulator. We also show that the power viruses generated in the Alpha ISA consume 9–24% more power compared to the previous approach of stressmark generation. We measure and provide the power consumption of these benchmarks on hardware by instrumenting a quad-core AMD Phenom II X4 system. The SYMPO power virus consumes more power compared to various industry grade power viruses on x86 hardware. We also provide a microarchitecture independent characterization of various industry standard power viruses.","PeriodicalId":422461,"journal":{"name":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125863323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 38
Tiled-MapReduce: Optimizing resource usages of data-parallel applications on multicore with tiling
Rong-Xin Chen, Haibo Chen, B. Zang
{"title":"Tiled-MapReduce: Optimizing resource usages of data-parallel applications on multicore with tiling","authors":"Rong-Xin Chen, Haibo Chen, B. Zang","doi":"10.1145/1854273.1854337","DOIUrl":"https://doi.org/10.1145/1854273.1854337","url":null,"abstract":"The prevalence of chip multiprocessor opens opportunities of running data-parallel applications originally in clusters on a single machine with many cores. MapReduce, a simple and elegant programming model to program large scale clusters, has recently been shown to be a promising alternative to harness the multicore platform.","PeriodicalId":422461,"journal":{"name":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127236447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 130
Energy efficient speculative threads: Dynamic thread allocation in same-ISA heterogeneous multicore systems
Yangchun Luo, Venkatesan Packirisamy, W. Hsu, Antonia Zhai
{"title":"Energy efficient speculative threads: Dynamic thread allocation in same-ISA heterogeneous multicore systems","authors":"Yangchun Luo, Venkatesan Packirisamy, W. Hsu, Antonia Zhai","doi":"10.1145/1854273.1854329","DOIUrl":"https://doi.org/10.1145/1854273.1854329","url":null,"abstract":"Thread-level parallelism at the chip level is critical in overcoming some of the challenges that have been ushered in through the advent of modern multicore processors (CMP). Extracting speculatively parallel threads from sequential applications and executing these threads on multicore processors is a promising technique to speed up these applications on multicore systems. However, the potential degradation in energy efficiency associated is an important factor that hinders the deployment of this technique. For multicore systems that integrate same-ISA heterogeneous cores, it is possible to judiciously allocate speculative threads to achieve energy-efficient performance improvement.","PeriodicalId":422461,"journal":{"name":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131382451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
Adaptive spatiotemporal node selection in dynamic networks
P. Hari, John B. P. McCabe, Jon Banafato, Marcus Henry, Kevin Ko, Emmanouil Koukoumidis, U. Kremer, M. Martonosi, L. Peh
{"title":"Adaptive spatiotemporal node selection in dynamic networks","authors":"P. Hari, John B. P. McCabe, Jon Banafato, Marcus Henry, Kevin Ko, Emmanouil Koukoumidis, U. Kremer, M. Martonosi, L. Peh","doi":"10.1145/1854273.1854304","DOIUrl":"https://doi.org/10.1145/1854273.1854304","url":null,"abstract":"Dynamic networks—spontaneous, self-organizing groups of devices—are a promising new computing platform. Writing applications for such networks is a daunting task, however, due to their extreme variability and unpredictability, with many devices having significant resource limitations. Intelligent, automated distribution of work across network nodes is needed to get the most out of limited resource budgets.","PeriodicalId":422461,"journal":{"name":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116946558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Automatic vector instruction selection for dynamic compilation
R. Barik, Jisheng Zhao, Vivek Sarkar
{"title":"Automatic vector instruction selection for dynamic compilation","authors":"R. Barik, Jisheng Zhao, Vivek Sarkar","doi":"10.1145/1854273.1854358","DOIUrl":"https://doi.org/10.1145/1854273.1854358","url":null,"abstract":"Accelerating program performance via short SIMD vector units is very common in modern processors, as evidenced by the use of SSE, MMX, and AltiVec SIMD instructions in multimedia, scientific, and embedded applications. To take full advantage of the vector capabilities, a compiler needs to generate efficient vector code automatically. However, most commercial and open-source compilers still fall short of using the full potential of vector units, and only generate vector code for simple loop nests. In this poster, we present the design and implementation of an auto-vectorization framework in the back-end of a dynamic compiler that not only generates optimized vector code but is also well integrated with the instruction scheduler and register allocator. Additionally, we describe a vector instruction selection algorithm based on dynamic programming. Our results obtained in JikesRVM dynamic compilation environment show performance improvement of up to 57.71% on an Intel Xeon processor, compared to non-vectorized execution.","PeriodicalId":422461,"journal":{"name":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116503568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Feedback-directed pipeline parallelism
M. A. Suleman, Moinuddin K. Qureshi, Khubaib, Y. Patt
{"title":"Feedback-directed pipeline parallelism","authors":"M. A. Suleman, Moinuddin K. Qureshi, Khubaib, Y. Patt","doi":"10.1145/1854273.1854296","DOIUrl":"https://doi.org/10.1145/1854273.1854296","url":null,"abstract":"Extracting high performance from Chip Multiprocessors requires that the application be parallelized. A common software technique to parallelize loops is pipeline parallelism in which the programmer/compiler splits each loop iteration into stages and each stage runs on a certain number of cores. It is important to choose the number of cores for each stage carefully because the core-to-stage allocation determines performance and power consumption. Finding the best core-to-stage allocation for an application is challenging because the number of possible allocations is large, and the best allocation depends on the input set and machine configuration. This paper proposes Feedback-Directed Pipelining (FDP), a software framework that chooses the core-to-stage allocation at run-time. FDP first maximizes the performance of the workload and then saves power by reducing the number of active cores, without impacting performance. Our evaluation on a real SMP system with two Core2Quad processors (8 cores) shows that FDP provides an average speedup of 4.2x which is significantly higher than the 2.3x speedup obtained with a practical profile-based allocation. We also show that FDP is robust to changes in machine configuration and input set.","PeriodicalId":422461,"journal":{"name":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127070306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 64