{"title":"Addressing the challenges of future large-scale many-core architectures","authors":"P. Petrides, P. Trancoso","doi":"10.1145/2482767.2482776","DOIUrl":"https://doi.org/10.1145/2482767.2482776","url":null,"abstract":"Current processor trends show an increasing number of cores and a diversity of characteristics among them. Such processors offer a large potential for achieving high performance for different applications. Nevertheless, exploiting the characteristics of such processors is a challenge. In particular, considering all cores to be the same for scheduling tasks is not valid any longer. In this work we address three important characteristics for future many-core processors: (1) a many-core processor will include groups of different cores, (2) the latency to access off-chip memory will be larger for cores further from the on-chip memory controller and (3) as the number of cores per memory controller increases so does the pressure regarding the off-chip access bandwidth. To address these issues we propose a task assignment policy that monitors the demands of the application task and accordingly assigns the task to a better matching core if available. The assignment policy triggers, if needed, task migration in order to optimize both the execution time and the power consumption. 
In this paper we describe the assignment algorithm and how we will implement it on a many-core system.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121687741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance analysis and predictability of the software layer in dynamic binary translators/optimizers","authors":"Aleksandar Brankovic, Kyriakos Stavrou, E. Gibert, Antonio González","doi":"10.1145/2482767.2482786","DOIUrl":"https://doi.org/10.1145/2482767.2482786","url":null,"abstract":"Dynamic Binary Translators and Optimizers (DBTOs) have been established as a common architecture during the last years. They are used in many different systems, such as emulation, instrumentation tools and innovative HW/SW co-designed microarchitectures. Although many researchers worked on characterizing and reducing the emulation overhead, there are no published results that explain how the DBTO behaves from the microarchitectural prospective and how its behavior may be predicted based on high-level, guest application statistics. Such results are important for guiding design decisions and system optimization.\u0000 In this paper we study the DBTO as an independent application by dividing its functionality into modules. We show that the behavior of the DBTO is not constant at all. The contribution of the different modules in the total overhead, the overhead itself, the microarchitectural interaction with the emulated application and the microarchitectural profile of the different modules changes significantly based on the emulated application. This result comes in contrast to numerous papers that consider this behavior constant and exclude the DBTO from the simulation. 
Throughout this paper we detail this variance, we quantify it and we explain the reasons behind it.\u0000 The insights presented in this work can be exploited towards the design of more efficient DBTOs and their early performance evaluation.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134191691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Load balancing in a changing world: dealing with heterogeneity and performance variability","authors":"Michael Boyer, K. Skadron, Shuai Che, N. Jayasena","doi":"10.1145/2482767.2482794","DOIUrl":"https://doi.org/10.1145/2482767.2482794","url":null,"abstract":"Fully utilizing the power of modern heterogeneous systems requires judiciously dividing work across all of the available computational devices. Existing approaches for partitioning work require offline training and generate fixed partitions that fail to respond to fluctuations in device performance that occur at run time. We present a novel dynamic approach to work partitioning that requires no offline training and responds automatically to performance variability to provide consistently good performance. Using six diverse OpenCL#8482; applications, we demonstrate the effectiveness of our approach in scenarios both with and without run-time performance variability, as well as in more extreme scenarios in which one device is non-functional.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133990016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"To cache or not to cache: a trade-off analysis for locally cached database systems","authors":"K. Rietveld, H. Wijshoff","doi":"10.1145/2482767.2482807","DOIUrl":"https://doi.org/10.1145/2482767.2482807","url":null,"abstract":"In this paper, we study the feasibility of using performance models to support an analysis of the computational load in local database caching. Local database caching is used, for example, to relieve the computational load of a main DBMS in large deployments of web applications. This is done by caching part of the database contents in a DBMS local to the application server. While for common scenarios with a high browse-to-order ratio this frequently results in a reduction of the computational load, there are also scenarios in which there is not a clear advantage of local database caching. This is especially the case when each local write also results in a write into the main database server, thereby increasing the computational resource requirements. In this paper, two methods are presented which can be used to obtain significant computational parameters. We demonstrate how these parameters are used on two different hardware platforms and show that a reasonable prediction accuracy within actual measured results can be reached.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125131816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TCNet: cross-node virtual machine communication acceleration","authors":"Chunkun Bo, Rui Hou, Junmin Wu, Tao Jiang, Liuhang Zhang","doi":"10.1145/2482767.2482810","DOIUrl":"https://doi.org/10.1145/2482767.2482810","url":null,"abstract":"Driven by rapid development of cloud computing, virtualized environments are becoming popular in data center. Frequent communication among multiple virtual machines is required by a large amount of applications. Although many virtualization acceleration techniques have been proposed, the network performance is still a hot research topic due to the complicated and costly implementations of I/O virtualization mechanism. Some previous research focuses on improving the efficiency of communication among virtual machines in the same host. But studying how to accelerate cross-node virtual machine communication is also necessary. On the other hand, many high efficient, tight-coupling interconnects have been proposed as data center interconnects. They have advantages in performance and efficiency, while traditional Ethernet and InfiniBand have good scalability. However, these two kinds of interconnects can coexist very well. Tight-coupling protocol is suitable for connecting small-scale data center nodes, which we call super-node, while super-node is connected by traditional interconnect. In our opinion, data center with such hybrid interconnect architecture is one of important trends. Targeting the hybrid interconnect architecture, this paper proposes an efficient mechanism, named as TCNet (abbreviation for tight-coupling network), to accelerate cross-node virtual machine communication.\u0000 To verify the acceleration mechanism, we build a prototype system which chooses PCIe (for inner-super-node interconnect) and Ethernet (for inter-super-node interconnect) as the hybrid interconnect and use KVM as software environments. We use several benchmarks to evaluate the mechanism. 
The latency of TCNet is 23% shorter than that of Gigabit Ethernet on average and the bandwidth is 1.14 times as large as that of Gigabit Ethernet on average. Besides, we use Specweb2006 to evaluate its web service ability. TCNet can support 20% more clients simultaneously than that of Ethernet and response requests 19% faster. The results demonstrate that TCNet has great potential to accelerate cross-node virtual machine communication for data center with hybrid interconnect.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131602880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating hybrid SSD FTL schemes for Hadoop workloads","authors":"Hyeran Jeon, Kaoutar El Maghraoui, G. Kandiraju","doi":"10.1145/2482767.2482793","DOIUrl":"https://doi.org/10.1145/2482767.2482793","url":null,"abstract":"The Flash Translation Layer (FTL) is the core engine for Solid State Disks (SSD). It is responsible for managing the virtual to physical address mappings and emulating the functionality of a normal block-level device. SSD performance is highly dependent on the design of the FTL. For the last few years, several FTL schemes have been proposed. Hybrid FTL schemes have gained more popularity since they try to combine the benefits of both page-level mapping and block-level mapping schemes. Examples include BAST, FAST, LAST, etc. To provide high performance, FTL designers face several cross cutting issues: the right balance between coarse and fine grain address mapping, the asymmetric nature of reads and writes, the write amplification property of Flash memory, and the wear-out behavior of flash.\u0000 The MapReduce paradigm has become a very popular paradigm for performing parallel and distributed computations on large data. Hadoop, an open-source implementation of MapReduce, has accelerated MapReduce adoption. Flash SSD is increasingly being used as a storage solution in Hadoop deployments for faster processing and better energy utilization. Little work has been done to understand the endurance implications of SSD on Hadoop-based workloads. In this paper, using a highly flexible and reconfigurable kernel-level simulation infrastructure, we investigate the internal characteristics of various hybrid FTL schemes using a representative set of Hadoop workloads. 
Our investigation brings out the wear-out behavior of SSD for Hadoop-based workloads including wear-leveling details, garbage collection, translation and block/page mappings, and advocates the need for dynamic tuning of FTL parameters for these workloads.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130399597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"D3AS project: a different approach to the manycore challenges","authors":"L. Verdoscia, R. Vaccaro","doi":"10.1145/2212908.2212948","DOIUrl":"https://doi.org/10.1145/2212908.2212948","url":null,"abstract":"The number of cores integrated onto a single die is expected to climb steadily in the foreseeable future. The main aim of Demand Data Driven Architecture System (D3AS) project is an attempt to provide a new programming model and architecture to allow efficient programming of highly parallel systems based on thousands of simple, thin cores. After a detailed description of the proposed prototype, some experimental results, obtained by a demonstrator, are discussed. Results show that the D3AS approach is feasible and promising.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125272280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SnCTM: reducing false transaction aborts by adaptively changing the source of conflict detection","authors":"Isuru Herath, Demian Rosas-Ham, M. Luján, I. Watson","doi":"10.1145/2212908.2212919","DOIUrl":"https://doi.org/10.1145/2212908.2212919","url":null,"abstract":"Optimistic concurrency provided by Transactional Memory (TM) makes it a good candidate for maintaining synchronization in future multi-core processors. Speculative execution and bulk level conflict detection enable TM to provide synchronization at fine grain without the complexity of managing fine grain locks. Early hardware TM systems proposed to store the information needed for checking conflicts in the Level 1 (L1) cache, thereby limiting the size of a transaction to the size of the L1 cache. The introduction of signatures to TM systems removed this limitation and allowed transactions to be of any size.\u0000 However signatures produce false positives which leads to performance degradation in TM systems. The objective of introducing signatures to TM is that the size of a transaction can be bigger than the L1 cache. Once signatures are integrated to a TM system, they are used to detect conflicts regardless of the size of a transaction. This means signatures are being used even for transactions that can store their read and write sets in the L1 cache.\u0000 Based on this observation we propose SnCTM, a TM system that adaptively changes the source used to detect conflicts. In our approach, when a transaction fits in the L1 cache, cache line information is used to detect conflicts and signatures are used otherwise. By adaptively changing the source, SnCTM achieved up to 4.62 and 2.93 times speed-up over a baseline TM using lazy versioning and lazy conflict detection with two commonly used signature configurations. 
We also show that our system, even with a smaller signature (64 bit), can achieve performance comparable to a system with a perfect signature (8k bit).","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"88 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127027363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mont-Blanc: towards energy-efficient HPC systems","authors":"Nikola Puzovic","doi":"10.1145/2212908.2212961","DOIUrl":"https://doi.org/10.1145/2212908.2212961","url":null,"abstract":"This talk will present the Mont-Blanc project, an European initiative to build exascale systems using energy-efficient parts coming from the embedded market. The energy consumption of current general purpose and high-performance chips would require an unaffordable total power budget for an exascale system to be build using these parts.\u0000 The Mont-Blanc project aims to lower the total power of exascale systems by using parts from the embedded market which have a much higher FLOPS/Watt ration than traditional general purpose processor, at the cost of a lower peak performance per chip. Hence, exascale systems built using embedded parts would require a very high number of processors. In this context, overlapping communications and computations is key for applications to reach the system peak performance. This would require highly tuned application code which most users would not be able to afford. The Mont-Blanc project heavily relies on the OmpSs programming model. OmpSs provide a simple parallel programming interface that most users can easily use, and an advanced runtime system that automatically overlaps computation and communication. 
Furthermore, the OmpSs runtime system is also able to dynamically adapt the load of each node to accomplish the overall system load balance.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"546 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122503123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GA-GPU: extending a library-based global address spaceprogramming model for scalable heterogeneouscomputing systems","authors":"V. Tipparaju, J. Vetter","doi":"10.1145/2212908.2212918","DOIUrl":"https://doi.org/10.1145/2212908.2212918","url":null,"abstract":"Scalable heterogeneous computing (SHC) architectures are emerging as a response to new requirements for low cost, power efficiency, and high performance. For example, numerous contemporary HPC systems are using commodity Graphical Processing Units (GPU) to supplement traditional multicore processors. Yet scientists still face a number of challenges in utilizing SHC systems. First and foremost, they are forced to combine a number of programming models and then delicately optimize the data movement among these multiple programming systems on each architecture. In this paper, we investigate a new programming model for SHC systems that attempts to unify data access to the aggregate memory available in GPUs in the system. In particular, we extend the popular and easy to use Global Address Space (GAS) programming model to SHC systems. We explore multiple implementation options, and demonstrate our solution in the context of Global Arrays, a library based GAS model. We then evaluate these options in the context of kernels and applications, such as a scalable chemistry application: NWChem. 
Our results reveal that GA-GPU can offer considerable benefit to users in terms of programmability, and both our empirical results and performance model provide encouraging performance benefits for future systems that offer a tightly integrated memory system.","PeriodicalId":430420,"journal":{"name":"ACM International Conference on Computing Frontiers","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129301585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}