2011 23rd International Symposium on Computer Architecture and High Performance Computing: Latest Papers (Page 2)

Applying CUDA Architecture to Accelerate Full Search Block Matching Algorithm for High Performance Motion Estimation in Video Encoding

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.19

Eduarda Monteiro, B. Vizzotto, C. Diniz, B. Zatt, S. Bampi

Citations: 14

Workload Balancing Methodology for Data-Intensive Applications with Divisible Load

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.15

C. Rosas, A. Sikora, Josep Jorba, Eduardo César

Abstract: Data-intensive applications are those that explore, query, analyze, and, in general, process very large data sets. In High Performance Computing (HPC), the main performance problem associated with these applications is load imbalance or inefficient resource utilization. This paper proposes a methodology for improving the performance of data-intensive applications based on performing multiple data partitions prior to execution and ordering the data chunks according to their processing times during application execution. As a first step, we consider that a single execution includes multiple related explorations of the same data set. Consequently, we propose to monitor the processing of each exploration and use the data gathered to dynamically tune the performance of the application. The tuning parameters included in the methodology are the partition factor of the data set, the distribution of the data chunks, and the number of processing nodes to be used by the application. The methodology has been initially tested using the well-known bioinformatics tool BLAST, obtaining encouraging results (up to a 40% improvement).

Citations: 10
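The abstract above mentions ordering data chunks by their processing times. One common way to use such an ordering is the longest-processing-time-first (LPT) greedy heuristic, sketched below. This is an illustration under assumptions, not the authors' methodology: the function name `schedule_longest_first`, the dictionary input format, and the use of LPT itself are all hypothetical choices made for the example.

```python
import heapq


def schedule_longest_first(chunk_times, num_workers):
    """Assign chunks to workers, longest measured time first (LPT heuristic).

    chunk_times: dict mapping chunk id -> processing time measured in an
    earlier exploration (hypothetical input format, not from the paper).
    Returns (assignment, makespan).
    """
    # Min-heap of (accumulated load, worker index): the least-loaded worker pops first.
    heap = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}

    # Longest chunks first; ties are broken by chunk id so the result is deterministic.
    ordered = sorted(chunk_times.items(), key=lambda kv: (-kv[1], kv[0]))
    for chunk, t in ordered:
        load, w = heapq.heappop(heap)
        assignment[w].append(chunk)
        heapq.heappush(heap, (load + t, w))

    makespan = max(load for load, _ in heap)
    return assignment, makespan


if __name__ == "__main__":
    plan, span = schedule_longest_first({"a": 4, "b": 3, "c": 3, "d": 2}, 2)
    print(plan, span)
```

Placing the expensive chunks first leaves the cheap ones to fill gaps at the end, which is why an ordering by processing time tends to reduce idle workers.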

Distributed Skycube Computation with Anthill

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.29

R. R. Veloso, L. Cerf, Chedy Raïssi, Wagner Meira Jr

Abstract: Recently, skyline queries have gained considerable attention and are among the most important tools for multi-criteria analysis. In order to process all possible combinations of criteria along with their inherent analysis, researchers introduced and studied the notion of the skycube. Simply put, a skycube is a pre-materialization of all possible subspaces with their associated skylines. An efficient skycube computation relies on the detection of redundancies in the different processing steps and enhanced result sharing between subspaces. Lately, the Orion algorithm was proposed to compute the skycube in a very efficient way. The approach relies on the derivation of skyline points over different subspaces. Nevertheless, because there are 2^{|D|} - 1 subspaces (where D is the set of dimensions) in a skycube, the running time still grows exponentially with the number of dimensions and easily becomes intractable on real-world datasets. In this study, we detail the distribution of Orion within a filter-stream framework and we conduct an extensive set of experiments on large datasets collected from Twitter to demonstrate the efficiency of our method.

Citations: 3
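To make the definition in the abstract above concrete, the following sketch materializes a skycube naively: it enumerates all 2^{|D|} - 1 non-empty subspaces and computes each skyline with a quadratic dominance check. It is not the Orion algorithm and shares no results between subspaces. The function names and the convention that smaller values are better are assumptions made for the example.

```python
from itertools import combinations


def dominates(p, q, dims):
    """True if p is no worse than q on every dimension in dims and strictly
    better on at least one (assumption: smaller values are better)."""
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)


def skyline(points, dims):
    """Points not dominated by any other point within the subspace dims."""
    return [p for p in points if not any(dominates(q, p, dims) for q in points)]


def skycube(points, num_dims):
    """Map every non-empty subspace (a tuple of dimension indices) to its skyline."""
    cube = {}
    for k in range(1, num_dims + 1):
        for dims in combinations(range(num_dims), k):
            cube[dims] = skyline(points, dims)
    return cube


if __name__ == "__main__":
    pts = [(1, 4), (2, 2), (4, 1), (3, 3)]
    for subspace, sky in skycube(pts, 2).items():
        print(subspace, sky)
```

The outer loops are what grows exponentially with the number of dimensions, which is the cost the abstract refers to.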

Data Parallelism for Belief Propagation in Factor Graphs

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.34

N. Ma, Yinglong Xia, V. Prasanna

Citations: 4

Predictive and Distributed Routing Balancing on High-Speed Cluster Networks

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.27

Carlos Nunez Castillo, D. Lugones, Daniel Franco, E. Luque

Citations: 2

Watershed: A High Performance Distributed Stream Processing System

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.31

Thatyene Louise Alves de Souza Ramos, R. S. Oliveira, Ana Paula de Carvalho, R. Ferreira, Wagner Meira Jr

Abstract: The task of extracting information from datasets that grow larger on a daily basis, such as those collected from the web, is an increasing challenge, but it also provides more interesting insights and analysis. Current analyses go beyond content and now focus on tracking and understanding users' relationships and interactions. Such computation is intensive both in terms of the processing demand imposed by the algorithms and the sheer amount of data that has to be handled. In this paper we introduce Watershed, a distributed computing framework designed to support the analysis of very large data streams online and in real time. Data are obtained from streams by the system's processing components, transformed, and directed to other streams, creating large flows of information. The processing components are decoupled from each other and their connections are strictly data-driven. They can be dynamically inserted and removed, providing an environment in which it is feasible for different applications to share intermediate results or cooperate toward a global purpose. Our experiments demonstrate the flexibility in creating a set of data analysis algorithms and their composition into a powerful stream analysis environment.

Citations: 10
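The abstract above describes processing components that read from named streams, transform items, write to other streams, and can be attached or removed at run time. A single-process toy version of that idea might look like the sketch below. It is not Watershed's API: the class `StreamBus` and its methods `attach`, `detach`, and `publish` are hypothetical names, and distribution across machines is left out entirely.

```python
from collections import defaultdict


class StreamBus:
    """Toy in-process stream router; components only know stream names."""

    def __init__(self):
        # stream name -> list of (transform function, output stream or None)
        self._subscribers = defaultdict(list)

    def attach(self, in_stream, fn, out_stream=None):
        """Insert a component; returns a handle that detach() accepts."""
        entry = (fn, out_stream)
        self._subscribers[in_stream].append(entry)
        return entry

    def detach(self, in_stream, entry):
        """Remove a previously attached component."""
        self._subscribers[in_stream].remove(entry)

    def publish(self, stream, item):
        """Deliver item to every component reading stream; forward non-None results."""
        for fn, out_stream in list(self._subscribers[stream]):
            result = fn(item)
            if result is not None and out_stream is not None:
                self.publish(out_stream, result)


if __name__ == "__main__":
    bus = StreamBus()
    seen = []
    lower = bus.attach("raw", str.lower, "lower")  # transform stage
    bus.attach("lower", seen.append)               # sink stage
    bus.publish("raw", "Hello")
    bus.detach("raw", lower)                       # remove the stage at run time
    bus.publish("raw", "World")                    # no longer reaches the sink
    print(seen)
```

Because components refer only to stream names, a second application could attach to `"lower"` and reuse the intermediate result without the first one knowing about it.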

MRU-Tour-based Replacement Algorithms for Last-Level Caches

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.13

A. Valero, J. Sahuquillo, S. Petit, P. López, J. Duato

Abstract: Memory hierarchy design is a major concern in current microprocessors. Much research work focuses on the Last-Level Cache (LLC), which is designed to hide the long miss penalty of accessing main memory. To reduce both capacity and conflict misses, LLCs are implemented as large memory structures with high associativities. To exploit temporal locality, LRU is the replacement algorithm usually implemented in caches. However, for a highly associative cache, its implementation is costly in terms of area and power consumption. Indeed, LRU is not well suited for the LLC because, as this cache level does not see all memory accesses, it cannot cope with temporal locality. In addition, blocks must descend to the LRU position of the stack before eviction, even when they are no longer useful. In this paper, we show that most of the blocks are not referenced again once they leave the MRU position. Moreover, the probability of being referenced again does not depend on the location on the LRU stack. Based on these observations, we define the number of MRU-Tours (MRUTs) of a block as the number of times that a block occupies the MRU position while it is stored in the cache, and propose the MRUT replacement algorithm, which selects the block to be replaced among the blocks that show only one MRUT. Variations of this algorithm have also been proposed to exploit both MRUT behavior and recency of information. Experimental results show that, compared to LRU, the proposal reduces the MPKI by up to 22%, while IPC is improved by 48%.

Citations: 6
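The MRU-Tour definition in the abstract above can be illustrated with a model of a single cache set. The sketch below counts a new tour each time a block re-enters the MRU position and, on a miss, evicts a block that has had only one tour. Several details are assumptions rather than the paper's design: the least recently used single-tour block is chosen (closer in spirit to the recency-aware variations the abstract mentions), the current MRU block is never a candidate, and plain LRU is the fallback when no single-tour block exists. The class name `MRUTSet` is hypothetical.

```python
class MRUTSet:
    """One cache set. stack[0] is the MRU position, stack[-1] the LRU position."""

    def __init__(self, ways):
        self.ways = ways
        self.stack = []   # block tags ordered from MRU to LRU
        self.tours = {}   # tag -> number of MRU-Tours so far

    def access(self, tag):
        """Access tag; return the evicted tag, or None if nothing was evicted."""
        if tag in self.tours:                  # hit
            if self.stack[0] != tag:           # re-entering MRU starts a new tour
                self.stack.remove(tag)
                self.stack.insert(0, tag)
                self.tours[tag] += 1
            return None

        victim = None
        if len(self.stack) == self.ways:       # miss on a full set
            victim = self._pick_victim()
            self.stack.remove(victim)
            del self.tours[victim]
        self.stack.insert(0, tag)              # the fill begins the first tour
        self.tours[tag] = 1
        return victim

    def _pick_victim(self):
        # Scan from LRU toward MRU, skipping the current MRU block (assumption).
        for tag in self.stack[:0:-1]:
            if self.tours[tag] == 1:
                return tag
        return self.stack[-1]                  # fallback: plain LRU (assumption)


if __name__ == "__main__":
    cache_set = MRUTSet(ways=3)
    print([cache_set.access(t) for t in "ABACDE"])
```

In the access sequence A, B, A, C, D, E, block A has two tours by the time E arrives, so the single-tour block C is evicted even though A sits in the LRU position; plain LRU would have evicted A.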

A Power-Efficient Co-designed Out-of-Order Processor

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.9

Abhishek Deb, J. M. Codina, Antonio González

Abstract: A co-designed processor helps cut down both complexity and power consumption by co-designing certain key performance enablers. In this paper, we propose a FIFO-based co-designed out-of-order processor. Multiple FIFOs are added in order to dynamically schedule the micro-ops in a complexity-effective manner. We propose a commit logic that is able to commit the program state as a superblock commits atomically. This enables us to get rid of the Reorder Buffer (ROB) entirely. Instead, to maintain the correct program state, we propose a four/eight-entry Superblock Ordering Buffer (SOB). We also propose the per-superblock Register Rename Table (SRRT), which holds the register state pertaining to the superblock. Our proposed processor dissipates 6% less power and obtains a 12% speedup for SPECFP; as a result, it consumes less energy. Furthermore, we propose an enhanced steering heuristic and an early release mechanism to increase the performance of a FIFO-based out-of-order processor. We obtain performance improvements of nearly 25% and 70% for four-FIFO and two-FIFO configurations, respectively. We also show that our proposed steering-heuristic-based processor consumes 10% less energy than the previously proposed steering heuristic.

Citations: 1
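The step from lower power and higher speed to lower energy in the abstract above can be made explicit with a back-of-the-envelope calculation. It assumes that the 6% power reduction and the 12% speedup refer to the same SPECFP runs and that power means average power over the run; the resulting figure is an estimate derived here, not a number reported in the listing.

```latex
\frac{E}{E_0} \;=\; \frac{P}{P_0}\cdot\frac{T}{T_0}
\;=\; 0.94 \times \frac{1}{1.12} \;\approx\; 0.84
```

Under those assumptions the energy per run would drop by roughly 16%, since energy is the product of average power and execution time and both factors shrink.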

Modeling the Performance of the Hadoop Online Prototype

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.24

Emanuel Vianna, Giovanni V. Comarela, Tatiana Pontes, J. Almeida, Virgílio A. F. Almeida, K. Wilkinson, Harumi A. Kuno, U. Dayal

Citations: 14

Efficiently Managing Advance Reservations Using Lists of Free Blocks

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.25

Jörg Schneider, B. Linnert

Abstract: Advance reservation was identified as a key technology to enable guaranteed Quality of Service and co-allocation in the Grid. Nonetheless, most Grid and local resource management systems still use the queuing approach because of the additional complexity introduced by advance reservation. A planning-based resource management system has to keep track of future reservations and needs a good overview of the available capacity during the negotiation of incoming reservations. For advance reservation, resource management becomes a two-dimensional problem. In this paper, different data structures are investigated and discussed with regard to their fit for planning-based resource management. As a result, the benefits of using lists of resource allocations or free blocks are shown. This general idea, widely used to manage continuous resources, is extended to cover not only the resource dimension but also the time dimension. The list-of-blocks approach is evaluated in a Grid-level and a resource-level resource management system. Extensive simulations showed a better runtime and a higher reservation success rate compared with the currently favored slotted-time approach.

Citations: 2
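The two-dimensional bookkeeping described in the abstract above, with capacity on one axis and time on the other, can be sketched with a sorted list of free blocks. The version below is a simplification and not the data structure evaluated in the paper: it tracks one resource as contiguous `[start, end, free]` intervals and splits them at reservation boundaries. The class name `FreeBlockList` and its interface are hypothetical.

```python
class FreeBlockList:
    """Free capacity of one resource over time, as contiguous [start, end, free] blocks."""

    def __init__(self, capacity, horizon):
        # Initially the whole capacity is free for the entire planning horizon.
        self.blocks = [[0, horizon, capacity]]

    def _split(self, t):
        """Split the block that strictly contains time t, if there is one."""
        for i, (s, e, f) in enumerate(self.blocks):
            if s < t < e:
                self.blocks[i:i + 1] = [[s, t, f], [t, e, f]]
                return

    def reserve(self, start, end, amount):
        """Reserve amount units during [start, end); return True on success."""
        if not (self.blocks[0][0] <= start < end <= self.blocks[-1][1]):
            return False   # outside the planning horizon
        if any(f < amount for s, e, f in self.blocks if s < end and e > start):
            return False   # some overlapping block lacks free capacity
        self._split(start)
        self._split(end)
        for block in self.blocks:
            if start <= block[0] and block[1] <= end:
                block[2] -= amount
        return True


if __name__ == "__main__":
    plan = FreeBlockList(capacity=4, horizon=10)
    print(plan.reserve(2, 5, 3), plan.reserve(4, 6, 2), plan.reserve(5, 8, 4))
    print(plan.blocks)
```

Unlike a slotted-time table, the list only grows where reservations actually begin or end, so long idle periods cost a single entry.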