2010 22nd International Symposium on Computer Architecture and High Performance Computing: Latest Papers

Improving In-memory Column-Store Database Predicate Evaluation Performance on Multi-core Systems
Hong Min, H. Franke
{"title":"Improving In-memory Column-Store Database Predicate Evaluation Performance on Multi-core Systems","authors":"Hong Min, H. Franke","doi":"10.1109/SBAC-PAD.2010.17","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.17","url":null,"abstract":"The ability to analyze a large volume of data for the purpose of business intelligence has led to various innovations in database technology. One example is the increased interest in using column-oriented data layout to address query performance in analytical and warehousing workloads. As system architectures move towards multi-core designs, it is important to optimize performance for these workloads on these platforms. In this paper we present SPHINX, an architecture that utilizes multi-core systems for search-based predicate evaluation operations in analytical query workloads against an in-memory column store. We discuss the natural parallelism of predicate evaluations and various bottlenecks that impact search performance. We present several performance improvement techniques and apply a scan sharing technique based on cache reuse efficiency to further improve the performance. We demonstrate the performance benefits of our scan sharing scheduler over other scheduling approaches in a workload of mixed search queries.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"262 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123104397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
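The scan sharing idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not SPHINX or its scheduler. It only assumes that a column is processed in fixed-size chunks and that every pending predicate is evaluated on a chunk while it is likely still cache-resident, before the scan moves on. The names `shared_scan`, `chunk_size`, and the sample predicates are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Predicate = Callable[[int], bool]


def shared_scan(column: Sequence[int],
                predicates: Dict[str, Predicate],
                chunk_size: int = 4) -> Dict[str, List[int]]:
    """Return, per predicate, the row indices whose value satisfies it.

    Each chunk of the column is read once and reused by all predicates,
    instead of running one full column scan per predicate.
    """
    matches: Dict[str, List[int]] = {name: [] for name in predicates}
    for start in range(0, len(column), chunk_size):
        chunk = column[start:start + chunk_size]
        # Reuse the same chunk for every query before advancing.
        for name, pred in predicates.items():
            for offset, value in enumerate(chunk):
                if pred(value):
                    matches[name].append(start + offset)
    return matches


column = [5, 12, 7, 30, 18, 3, 25, 9]
result = shared_scan(column, {
    "gt10": lambda v: v > 10,
    "odd": lambda v: v % 2 == 1,
})
```

In a real multi-core system the chunks would also be distributed across threads; the sketch keeps a single thread so the reuse pattern stays visible.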
Using Support Vector Machines to Learn How to Compile a Method
Ricardo Nabinger Sanchez, J. N. Amaral, D. Szafron, Marius Pirvu, Mark G. Stoodley
{"title":"Using Support Vector Machines to Learn How to Compile a Method","authors":"Ricardo Nabinger Sanchez, J. N. Amaral, D. Szafron, Marius Pirvu, Mark G. Stoodley","doi":"10.1109/SBAC-PAD.2010.35","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.35","url":null,"abstract":"The question addressed in this paper is what subset of code transformations should be attempted for a given method in a Just-in-Time compilation environment. The solution proposed is to use a Support Vector Machine (SVM) to learn a model based on method features and on the measured compilation and execution times of the methods. An extensive exploration phase collects a set of example compilations to be used by the SVM to train the model. This paper reports on work in progress. So far, linear-SVM models, applied to benchmarks from the SPECjvm98 suite, have not outperformed the compilation plans engineered by the development team over many years. However, the models almost match that performance for the javac benchmark.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117229590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
MOPSO Applied to Architecture Tuning with Unified Second-Level Cache for Energy and Performance Optimization
F. Cordeiro, A. Silva-Filho, G. R. Carvalho
{"title":"MOPSO Applied to Architecture Tuning with Unified Second-Level Cache for Energy and Performance Optimization","authors":"F. Cordeiro, A. Silva-Filho, G. R. Carvalho","doi":"10.1109/SBAC-PAD.2010.40","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.40","url":null,"abstract":"Design Space Exploration (DSE) has been a suitable strategy to configure a parameterized SoC platform in terms of system requirements such as energy and performance. In this work, a multi-objective approach (MOPSO) based on Particle Swarm Optimization was applied to DSE problems to support architecture tuning in a memory hierarchy with a unified second-level cache. The proposed approach considers two objectives to be optimized, energy consumption and application performance, and reduces the design space by exploring only 2.64% of the exploration space. With regard to the cost function, MOPSO found solutions approaching the Pareto optimum in terms of energy consumption and performance in the majority of cases, about 66% of the studied cases. Experiments based on simulations were carried out on 18 applications from the MiBench and PowerStone benchmark suites.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124653417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
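The two-objective setting in the abstract above (minimize energy and minimize execution time) rests on Pareto dominance. The following sketch shows only that dominance test and a brute-force Pareto filter, not the MOPSO search itself. The configuration values are made up for illustration, and `dominates` and `pareto_front` are hypothetical helper names.

```python
from typing import List, Tuple

# One candidate cache configuration, scored as (energy, cycles).
# Both objectives are to be minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and differs from b."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (O(n^2) filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative (energy, cycles) scores; not data from the paper.
configs = [(4.0, 100.0), (3.0, 120.0), (5.0, 90.0), (4.5, 110.0), (3.0, 130.0)]
front = pareto_front(configs)
```

A swarm-based optimizer would use the same dominance test to maintain its archive of non-dominated solutions while sampling only a small part of the design space.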
Parallel Linear Octree Meshing with Immersed Surfaces
J. Camata, A. Coutinho
{"title":"Parallel Linear Octree Meshing with Immersed Surfaces","authors":"J. Camata, A. Coutinho","doi":"10.1109/SBAC-PAD.2010.26","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.26","url":null,"abstract":"A parallel octree-based mesh generation method is proposed to create reasonable-quality, geometry-adapted unstructured hexahedral meshes automatically from triangulated surface models. We present algorithms for the construction, 2:1 balancing, and meshing of large linear octrees on distributed memory machines. Our scheme uses efficient computer graphics algorithms for surface detection, allowing us to represent complex geometries. Isogranular analysis is performed on a variety of input surfaces and demonstrates good scalability. Our implementation is able to execute the 2:1 balancing operations over 4.0e08 octants on 128 cores in less than 10 seconds per 2e05 octants/core.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130391330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
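Linear octrees are commonly stored as a flat list of leaf octants sorted by a space-filling-curve key rather than as a pointer-based tree. The sketch below shows one common choice, the Morton (Z-order) key, as general background. Whether the paper above uses exactly this encoding is an assumption, and `morton3d` is a hypothetical helper name.

```python
from typing import List, Tuple


def morton3d(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into one Z-order key.

    Bit i of x lands at position 3*i, of y at 3*i + 1, of z at 3*i + 2.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key


# Anchor coordinates of four same-level octants, in arbitrary order.
octants: List[Tuple[int, int, int]] = [(1, 1, 0), (0, 0, 0), (1, 0, 0), (0, 1, 0)]

# Sorting by key yields the linear (flat, ordered) representation.
linear = sorted(octants, key=lambda o: morton3d(*o))
```

Because the sorted order preserves spatial locality, contiguous slices of such a list can be assigned to different processes on a distributed memory machine.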
A Comparative Analysis of Load Balancing Algorithms Applied to a Weather Forecast Model
E. Rodrigues, P. Navaux, J. Panetta, Á. Fazenda, C. Mendes, L. Kalé
{"title":"A Comparative Analysis of Load Balancing Algorithms Applied to a Weather Forecast Model","authors":"E. Rodrigues, P. Navaux, J. Panetta, Á. Fazenda, C. Mendes, L. Kalé","doi":"10.1109/SBAC-PAD.2010.18","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.18","url":null,"abstract":"Among the many reasons for load imbalance in weather forecasting models, the dynamic imbalance caused by localized variations on the state of the atmosphere is the hardest one to handle. As an example, active thunderstorms may substantially increase load at a certain time step with respect to previous time steps in an unpredictable manner – after all, tracking storms is one of the reasons for running a weather forecasting model. In this paper, we present a comparative analysis of different load balancing algorithms to deal with this kind of load imbalance. We analyze the impact of these strategies on computation and communication and the effects caused by the frequency at which the load balancer is invoked on execution time. This is done without any code modification, employing the concept of processor virtualization, which basically means that the domain is over-decomposed and the unit of rebalance is a sub-domain. With this approach, we were able to reduce the execution time of a full, real-world weather model.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116866385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 43
High Level Power and Energy Exploration Using ArchC
T. Gupta, C. Bertolini, O. Héron, N. Ventroux, T. Zimmer, F. Marc
{"title":"High Level Power and Energy Exploration Using ArchC","authors":"T. Gupta, C. Bertolini, O. Héron, N. Ventroux, T. Zimmer, F. Marc","doi":"10.1109/SBAC-PAD.2010.13","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.13","url":null,"abstract":"With the increase in the design complexity of MPSoC architectures, estimating power consumption is very complex and time consuming at lower levels of abstraction. We propose a methodology using ArchC named Power-ArchC for a fast high-level estimation of processor power consumption. Power values are obtained by an instruction level power characterization at gate level. The requirements for the power evaluation infrastructure are compatible processor models written in ArchC and RTL, and the technology library. We show power results for a 32-bit MIPS processor with different benchmarks, based on 45nm technology.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"216 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130331555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Performance Debugging of GPGPU Applications with the Divergence Map
Bruno Coutinho, Diogo Sampaio, Fernando Magno Quintão Pereira, Wagner Meira Jr
{"title":"Performance Debugging of GPGPU Applications with the Divergence Map","authors":"Bruno Coutinho, Diogo Sampaio, Fernando Magno Quintão Pereira, Wagner Meira Jr","doi":"10.1109/SBAC-PAD.2010.38","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.38","url":null,"abstract":"The increasing programmability and the high computational power of Graphical Processing Units (GPU) make them attractive to general purpose programming. However, taking full benefit of this execution environment is a challenging task. One of these challenges stems from divergences, a phenomenon that occurs when threads that execute in lock-step are forced to take different program paths due to branches in the code. In the face of divergences, some threads will have to wait, idly, while their diverging siblings execute. Optimizing the code to avoid divergences is difficult, because this task demands a deep understanding of programs that might be large and convoluted. In order to facilitate the detection of divergences, this paper introduces the divergence map, a data structure that indicates the location and the volume of divergences in a program. We build this map via dynamic profiling techniques, which we have implemented on top of an open source CUDA compiler. To illustrate the importance of the divergence map, we have used it to pin-point the core regions that must be optimized in well known public applications. By hand optimizing some applications, we have added 9-11% speedups onto kernels that have already gone through the sieve of many programmers.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121775449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
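The notion of a divergence map described in the abstract above can be made concrete with a small CPU-side simulation. This is only a sketch under simplifying assumptions: threads are grouped into fixed-size warps, each branch is modelled as a Boolean function of the thread id, and a divergence is counted whenever threads of one warp disagree on a branch. The paper builds its map by dynamic profiling on top of a CUDA compiler; the function `divergence_map` and the branch labels here are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict


def divergence_map(num_threads: int,
                   warp_size: int,
                   branches: Dict[str, Callable[[int], bool]]) -> Counter:
    """Count, per branch label, how many warps diverge at that branch."""
    counts: Counter = Counter()
    for warp_start in range(0, num_threads, warp_size):
        lanes = range(warp_start, min(warp_start + warp_size, num_threads))
        for label, cond in branches.items():
            # A warp diverges if its threads take both branch directions.
            outcomes = {cond(tid) for tid in lanes}
            if len(outcomes) > 1:
                counts[label] += 1
    return counts


dmap = divergence_map(
    num_threads=16,
    warp_size=4,
    branches={
        "tid_even": lambda t: t % 2 == 0,  # splits every warp
        "tid_lt_8": lambda t: t < 8,       # aligned with warp boundaries
        "tid_lt_6": lambda t: t < 6,       # splits only the warp holding threads 4-7
    },
)
```

Branches with the highest counts are the natural first candidates for restructuring, for example by making the condition uniform within each warp.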
Flexible Error Protection for Energy Efficient Reliable Architectures
Timothy N. Miller, Nagarjuna Surapaneni, R. Teodorescu
{"title":"Flexible Error Protection for Energy Efficient Reliable Architectures","authors":"Timothy N. Miller, Nagarjuna Surapaneni, R. Teodorescu","doi":"10.1109/SBAC-PAD.2010.37","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.37","url":null,"abstract":"Technology scaling is having an increasingly detrimental effect on microprocessor reliability, with increased variability and higher susceptibility to errors. At the same time, as integration of chip multiprocessors increases, power consumption is becoming a significant bottleneck that could threaten their growth. To deal with these competing trends, energy-efficient solutions are needed to deal with reliability problems. This paper presents a reliable multicore architecture that provides targeted error protection by adapting to the characteristics of individual cores and workloads, with the goal of providing reliability with minimum energy. The user can specify an acceptable reliability target for each chip, core, or application. The system then adjusts a range of parameters, including replication and supply voltage, to meet that reliability goal. In this multicore architecture, each core consists of a pair of pipelines that can run independently (running separate threads) or in concert (running the same thread and verifying results). Redundancy is enabled selectively, at functional unit granularity. The architecture also employs timing speculation for mitigation of variation-induced timing errors and to reduce the power overhead of error protection. On-line control based on machine learning dynamically adjusts multiple parameters to minimize energy consumption. Evaluation shows that dynamic adaptation of voltage and redundancy can reduce the energy delay product of a CMP by 30-60% compared to static dual modular redundancy.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125791943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
Feedback-Driven Restructuring of Multi-threaded Applications for NUCA Cache Performance in CMPs
S. Bartolini, P. Foglia, M. Solinas, C. Prete
{"title":"Feedback-Driven Restructuring of Multi-threaded Applications for NUCA Cache Performance in CMPs","authors":"S. Bartolini, P. Foglia, M. Solinas, C. Prete","doi":"10.1109/SBAC-PAD.2010.20","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.20","url":null,"abstract":"This paper addresses feedback-directed restructuring techniques tuned to Non Uniform Cache Architectures (NUCA) in CMPs running multi-threaded applications. Access time to NUCA caches depends on the location of the referred block, so the locality and cache mapping of the application influence the overall performance. We show techniques for altering the distribution of applications into the cache space so as to achieve improved average memory access time. In CMPs running multi-threaded applications, the aggregated accesses (and locality) of the processors form the actual cache load and pose specific issues. We consider a number of Splash-2 and Parsec benchmarks on an 8 processor system and we show that a relatively simple remapping algorithm is able to improve the average Static-NUCA (SNUCA) cache access time by 5.5% and allows an SNUCA cache to surpass the performance of a more complex dynamic-NUCA (DNUCA) for most benchmarks. Then, we present a more sophisticated remapping algorithm, relying on cache geometry information and on the access distribution statistics from individual processors, that reduces the average cache access time by 10.2% and is very stable across all benchmarks.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125424508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Characterizing Energy Consumption in Hardware Transactional Memory Systems
Epifanio Gaona-Ramírez, J. Gil, Juan Fernández, M. Acacio
{"title":"Characterizing Energy Consumption in Hardware Transactional Memory Systems","authors":"Epifanio Gaona-Ramírez, J. Gil, Juan Fernández, M. Acacio","doi":"10.1109/SBAC-PAD.2010.11","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.11","url":null,"abstract":"Transactional Memory is currently being advocated as a promising alternative to lock-based synchronization because it simplifies multithreaded programming. In this way, future many-core CMP architectures may need to provide hardware support for transactional memory. On the other hand, power dissipation constitutes a first class consideration in multicore processor design. In this work, we characterize the performance and energy consumption of two well-known Hardware Transactional Memory systems that employ opposite policies for data versioning and conflict management. More specifically, we compare the LogTM-SE Eager-Eager system and a version of the Scalable TCC Lazy-Lazy system that enables parallel commits. To the best of our knowledge, this is the first characterization in terms of energy consumption of hardware transactional memory systems. To do that, we extended the GEMS simulator to estimate the energy consumed in the on-chip caches according to CACTI, and used the interconnection network energy model given by Orion 2. Results show that the energy consumption of the Eager-Eager system is 60% higher on average than in the Lazy-Lazy case, whereas performance differences between the two systems are 42% on average. Finally, we found that, although Lazy-Lazy beats Eager-Eager on average, there are considerable deviations in performance depending on the particular characteristics of each application.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132968556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19