2011 23rd International Symposium on Computer Architecture and High Performance Computing最新文献

High Performance by Exploiting Information Locality through Reverse Computing 通过反向计算利用信息局部性实现高性能

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.10

Mouad Bahi, C. Eisenbeis

引用次数: 1

Accelerating Maximum Likelihood Based Phylogenetic Kernels Using Network-on-Chip 利用片上网络加速基于最大似然的系统发育核

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.17

Turbo Majumder, P. Pande, A. Kalyanaraman

引用次数: 6

Parallel Biological Sequence Comparison on Heterogeneous High Performance Computing Platforms with BSP++ 基于BSP++的异构高性能计算平台上并行生物序列比较

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.16

Khaled Hamidouche, F. Mendonca, J. Falcou, D. Etiemble

引用次数: 3

FAIRIO: An Algorithm for Differentiated I/O Performance 一种差分I/O性能算法

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.26

Sarala Arunagiri, Yipkei Kwok, P. Teller, Ricardo Portillo, Seetharami R. Seelam

{"title":"FAIRIO: An Algorithm for Differentiated I/O Performance","authors":"Sarala Arunagiri, Yipkei Kwok, P. Teller, Ricardo Portillo, Seetharami R. Seelam","doi":"10.1109/SBAC-PAD.2011.26","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2011.26","url":null,"abstract":"Providing differentiated service in a consolidated storage environment is a challenging task. To address this problem, we introduce FAIRIO, a cycle-based I/O scheduling algorithm that provides differentiated service to workloads concurrently accessing a consolidated RAID storage system. FAIRIO enforces proportional sharing of I/O service through fair scheduling of disk time. During each cycle of the algorithm, I/O requests are scheduled according to workload weights and disk-time utilization history. Experiments, which were driven by the I/O request streams of real and synthetic I/O benchmarks and run on a modified version of DiskSim, provide evidence of FAIRIO's effectiveness and demonstrate that fair scheduling of disk time is key to achieving differentiated service. In particular, the experimental results show that, for a broad range of workload request types, sizes, and access characteristics, the algorithm provides differentiated storage throughput that is within 10% of being perfectly proportional to workload weights, and, it achieves this with little or no degradation of aggregate throughput. The core design concepts of FAIRIO, including service-time allocation and history-driven compensation, potentially can be used to design I/O scheduling algorithms that provide workloads with differentiated service in storage systems comprised of RAIDs, multiple RAIDs, SANs, and hypervisors for Clouds.","PeriodicalId":390734,"journal":{"name":"2011 23rd International Symposium on Computer Architecture and High Performance Computing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125914282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7

Speeding Up Learning in Real-Time Search through Parallel Computing 通过并行计算加速实时搜索中的学习

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.30

Vinícius Marques, L. Chaimowicz, R. Ferreira

{"title":"Speeding Up Learning in Real-Time Search through Parallel Computing","authors":"Vinícius Marques, L. Chaimowicz, R. Ferreira","doi":"10.1109/SBAC-PAD.2011.30","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2011.30","url":null,"abstract":"Real-time search algorithms solve the problem of path planning, regardless the size and complexity of the maps, and the massive presence of entities in the same environment. In such methods, the learning step aims to avoid local minima and improve the results for future searches, ensuring the convergence to the optimal path when the same planning task is solved repeatedly. However, performing search in a limited area due to real-time constraints makes the run to convergence a lengthy process. In this work, we present a parallelization strategy that aims to reduce the time to convergence, maintaining the real-time properties of the search. The parallelization technique consists on using auxiliary searches without the real-time restrictions present in the main search. In addition, the same learning is shared by all searches. The empirical evaluation shows that even with the additional cost required to coordinate the auxiliary searches, the reduction in time to convergence is significant, showing gains from searches occurring in environments with fewer local minima to larger searches on complex maps, where performance improvement is even better.","PeriodicalId":390734,"journal":{"name":"2011 23rd International Symposium on Computer Architecture and High Performance Computing","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125398898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Why Online Dynamic Mesh Refinement is Better for Parallel Climatological Models 为什么在线动态网格细化对并行气候模型更好

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.14

C. Schepke, N. Maillard, Jörg Schneider, Hans-Ulrich Heiß

{"title":"Why Online Dynamic Mesh Refinement is Better for Parallel Climatological Models","authors":"C. Schepke, N. Maillard, Jörg Schneider, Hans-Ulrich Heiß","doi":"10.1109/SBAC-PAD.2011.14","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2011.14","url":null,"abstract":"Forecast precisions of climatological models are limited by computing power and time available for the executions. As more and faster processors are used in the computation, the resolution of the mesh adopted to represent the Earth's atmosphere can be increased, and consequently the numerical forecast is more accurate and shows local phenomena. However, a finer mesh resolution, able to include local phenomena in a global atmosphere integration, is still not possible. To overcome this situation, different mesh refinement levels can be used at the same time for different areas. In this context, this paper evaluates how mesh refinement at run time can improve performance for climatological models. In order to contribute with this analysis, an online dynamic mesh refinement was developed. It increases mesh resolution in parts of a parallel distributed model, when special atmosphere conditions are registered during the execution. The results show that the parallel execution of this improvement provides better resolution for the meshes, without a significant increase of execution time.","PeriodicalId":390734,"journal":{"name":"2011 23rd International Symposium on Computer Architecture and High Performance Computing","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133527071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Improving the Accuracy of High Performance BLAS Implementations Using Adaptive Blocked Algorithms 使用自适应阻塞算法提高高性能BLAS实现的准确性

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.21

M. Badin, P. D'Alberto, L. Bic, M. Dillencourt, A. Nicolau

{"title":"Improving the Accuracy of High Performance BLAS Implementations Using Adaptive Blocked Algorithms","authors":"M. Badin, P. D'Alberto, L. Bic, M. Dillencourt, A. Nicolau","doi":"10.1109/SBAC-PAD.2011.21","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2011.21","url":null,"abstract":"Matrix multiply is ubiquitous in scientific computing. Considerable effort has been spent on improving its performance. Once methods that make efficient use of the processor have been exhausted, methods that use less operations than the canonical matrix multiply must be explored. Combining the two methods yields a hybrid matrix multiply algorithm. Hybrid matrix multiply algorithms tend to be less accurate than the canonical matrix multiply implementation, leaving room for improvement. There are well-known techniques for improving accuracy, but they tend to be slow and it is not immediately obvious how best to apply them to hybrid algorithms without lowering performance. Previous attempts have focused on the bottom of the hybrid matrix multiply algorithm, modifying the high-performance matrix multiply implementation. In contrast, the top-down approach presented here does not require the modification of the high-performance matrix multiply implementation at the bottom, nor does it require modification of the fast asymptotic matrix multiply algorithm at the top. The three-level hybrid algorithm presented here not only has up to 10% better performance than the fastest high-performance matrix multiply, but is also more accurate.","PeriodicalId":390734,"journal":{"name":"2011 23rd International Symposium on Computer Architecture and High Performance Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133151306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

A New Parallel Schema for Branch-and-Bound Algorithms Using GPGPU 基于GPGPU的分支定界算法并行架构

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.20

T. Carneiro, A. Muritiba, Marcos Negreiros, G. Campos

引用次数: 34

Structure-Constrained Microcode Compression 结构约束的微码压缩

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.32

E. Borin, G. Araújo, M. Breternitz, Youfeng Wu

{"title":"Structure-Constrained Microcode Compression","authors":"E. Borin, G. Araújo, M. Breternitz, Youfeng Wu","doi":"10.1109/SBAC-PAD.2011.32","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2011.32","url":null,"abstract":"Microcode enables programmability of (micro) architectural structures to enhance functionality and to apply patches to an existing design. As more features get added to a CPU core, the area and power costs associated with microcode increase. One solution to address the microcode size issue is to store the microcode in a compressed form and decompress it during execution. Furthermore, the reuse of a single hardware building block layout to implement different dictionaries in the two-level microcode compression reduces the cost and the design time of the decompression engine. However, the reuse of the hardware building block imposes structural constraints to the compression algorithm, and existing algorithms may yield poor compression. In this paper, we develop the SC2 algorithm that considers the structural constraint in its objective function and reduces the area expansion when reusing hardware building blocks to implement different dictionaries. Our experimental results show that the SC2 algorithm is able to produce similar sized dictionaries and achieves the similar compression ratio to the non-constrained algorithm.","PeriodicalId":390734,"journal":{"name":"2011 23rd International Symposium on Computer Architecture and High Performance Computing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131502766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Classification and Elimination of Conflicts in Hardware Transactional Memory Systems 硬件事务性存储系统中冲突的分类与消除

2011 23rd International Symposium on Computer Architecture and High Performance Computing Pub Date : 2011-10-26 DOI: 10.1109/SBAC-PAD.2011.18

M. Waliullah, P. Stenström

{"title":"Classification and Elimination of Conflicts in Hardware Transactional Memory Systems","authors":"M. Waliullah, P. Stenström","doi":"10.1109/SBAC-PAD.2011.18","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2011.18","url":null,"abstract":"This paper analyzes the sources of performance losses in hardware transactional memory and investigates techniques to reduce the losses. It dissects the root causes of data conflicts in hardware transactional memory systems (HTM) into four classes of conflicts: true sharing, false sharing, silent store, and write-write conflicts. These conflicts can cause performance and energy losses due to aborts and extra communication. To quantify losses, the paper first proposes the 5C cache-miss classification model that extends the well-established 4C model with a new class of cache misses known as contamination misses. The paper also contributes with two techniques for removal of data conflicts: One for removal of false sharing conflicts and another for removal of silent store conflicts. In addition, it revisits and adapts a technique that is able to reduce losses due to both true and false conflicts. All of the proposed techniques can be accommodated in a lazy versioning and lazy conflict resolution HTM built on top of a MESI cache-coherence infrastructure with quite modest extensions. Their ability to reduce performance is quantitatively established, individually as well as in combination. Performance is improved substantially.","PeriodicalId":390734,"journal":{"name":"2011 23rd International Symposium on Computer Architecture and High Performance Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130596394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 11