IEEE International Symposium on Workload Characterization (IISWC'10): Latest Publications

Improving virtualization performance and scalability with advanced hardware accelerations

IEEE International Symposium on Workload Characterization (IISWC'10) Pub Date : 2010-12-02 DOI: 10.1109/IISWC.2010.5649499

Yaozu Dong, Xudong Zheng, Xiantao Zhang, J. Dai, Jianhui Li, Xin Li, Gang Zhai, Haibing Guan

Citations: 14

Parallelization and characterization of GARCH option pricing on GPUs

IEEE International Symposium on Workload Characterization (IISWC'10) Pub Date : 2010-12-02 DOI: 10.1109/IISWC.2010.5648864

Ren-Shuo Liu, Yun-Cheng Tsai, Chia-Lin Yang

Abstract: Option pricing is an important problem in computational finance due to the fast-growing market and increasing complexity of options. For option pricing, a model is required to describe the price process of the underlying asset. The GARCH model is one of the prominent option pricing models since it can model stochastic volatility of the underlying asset. To derive expected profit based on the GARCH model, tree-based simulations are one of the commonly used approaches. Tree-based GARCH option pricing is computing intensive since the tree grows exponentially, and it requires enormous floating point arithmetic operations. In this paper, we present the first work on accelerating the tree-based GARCH option pricing on GPUs with CUDA. As the conventional tree data structure is not memory access friendly to GPUs, we propose a new family of tree data structures which position concurrently accessed nodes in contiguous and aligned memory locations. Moreover, to reduce memory bandwidth requirement, we apply fusion optimization, which combines two threads into one to keep data with temporal locality in register files. Our results show 50× speedup compared to a multi-threaded program on a 4-core CPU.
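As a rough illustration of the layout idea in this abstract, the Python sketch below stores a recombining tree level by level in one flat array, so that the nodes of a level, which a GPU would process concurrently, sit at consecutive indices. It is not the authors' data structure: it uses a plain constant-volatility binomial tree as a stand-in for the GARCH tree, runs sequentially on the CPU, and the names `level_offset`, `node_index`, and `price_call` are hypothetical.

```python
import math


def level_offset(level: int) -> int:
    """Index of the first node of `level` in the flat array.

    Level k of a recombining binomial tree has k + 1 nodes, so the levels
    before it occupy 1 + 2 + ... + k = k * (k + 1) / 2 slots.
    """
    return level * (level + 1) // 2


def node_index(level: int, j: int) -> int:
    """Flat index of node j (number of up moves) within `level`."""
    return level_offset(level) + j


def price_call(s0, strike, rate, sigma, maturity, steps):
    """European call on a constant-volatility binomial tree (stand-in model)."""
    dt = maturity / steps
    u = math.exp(sigma * math.sqrt(dt))   # up factor
    d = 1.0 / u                           # down factor
    p = (math.exp(rate * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-rate * dt)

    values = [0.0] * level_offset(steps + 1)

    # Leaf payoffs: one contiguous run of steps + 1 entries.
    for j in range(steps + 1):
        s = s0 * (u ** j) * (d ** (steps - j))
        values[node_index(steps, j)] = max(s - strike, 0.0)

    # Backward induction: each level reads one contiguous run (its children)
    # and writes another. On a GPU, one thread per j would touch neighbours.
    for level in range(steps - 1, -1, -1):
        base = level_offset(level)
        child = level_offset(level + 1)
        for j in range(level + 1):
            values[base + j] = disc * (
                p * values[child + j + 1] + (1.0 - p) * values[child + j]
            )
    return values[0]
```

The point of the sketch is only the indexing: adjacent values of `j` map to adjacent array slots, which is the property the abstract describes as contiguous placement of concurrently accessed nodes. Alignment and the thread-fusion optimization are not modeled here.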

Citations: 4

Performance characterization and acceleration of Optical Character Recognition on handheld platforms

IEEE International Symposium on Workload Characterization (IISWC'10) Pub Date : 2010-12-02 DOI: 10.1109/IISWC.2010.5648852

S. Srinivasan, Li Zhao, Lin Sun, Zhen Fang, Peng Li, Tao Wang, R. Iyer, Dong Liu

Abstract: Optical Character Recognition (OCR) converts images of handwritten or printed text captured by camera or scanner into editable text. OCR has seen limited adoption in mobile platforms due to the performance constraints of these systems. Intel® Atom™ processors have enabled general purpose applications to be executed on handheld devices. In this paper, we analyze a reference implementation of the OCR workload on a low power general purpose processor and identify the primary hotspot functions that incur a large fraction of the overall response time. We also present a detailed architectural characterization of the hotspot functions in terms of CPI, MPI, etc. We then implement and analyze several software/algorithmic optimizations such as i) Multi-threading, ii) image sampling for a hotspot function and iii) miscellaneous code optimization. Our results show that up to 2X performance improvement in execution time of the application and almost 9X improvement for a hotspot can be achieved by using various software optimizations. We designed and implemented a hardware accelerator for one of the hotspots to further reduce the execution time and power. Overall, we believe our analysis provides a detailed understanding of the processing overheads for OCR running on a new class of low power compute platforms.

Citations: 8

Exploiting approximate value locality for data synchronization on multi-core processors

IEEE International Symposium on Workload Characterization (IISWC'10) Pub Date : 2010-12-02 DOI: 10.1109/IISWC.2010.5650333

Jaswanth Sreeram, S. Pande

Abstract: This paper shows that for a variety of parallel “soft computing” programs that use optimistic synchronization, the approximate nature of the values produced during execution can be exploited to improve performance significantly. Specifically, through mechanisms for imprecise sharing of values between threads, the amount of contention in these programs can be reduced, thereby avoiding expensive aborts and improving parallel performance while keeping the results produced by the program within the bounds of an acceptable approximation. This is made possible by our observation that for many such programs, a large fraction of the values produced during execution exhibit a substantial amount of value locality. We describe how this locality can be exploited using extensions to C/C++ language types that allow specification of limits on the precision and accuracy required and a novel value-aware conflict detection scheme that minimizes the number of conflicts while respecting these limits. Our experiments indicate that for the programs studied substantial speedups can be achieved: up to 5.7x over the original program for the same number of threads. We also present experimental evidence that for the programs studied, the amount of error introduced often grows relatively slowly.
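The contrast between conventional and value-aware conflict detection described in this abstract can be sketched in a few lines of Python. This is a minimal single-threaded model under stated assumptions, not the paper's scheme or its C/C++ type extensions: `ApproxCell`, its absolute `tolerance` field, and `try_commit` are hypothetical names, and "conflict" is reduced to comparing the value a transaction read with the value present at commit time.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ApproxCell:
    """A shared variable with a hypothetical per-variable error limit."""
    value: float
    tolerance: float  # assumed: maximum acceptable absolute drift


def exact_conflict(read_value: float, cell: ApproxCell) -> bool:
    """Conventional check: any change since the read is a conflict."""
    return cell.value != read_value


def value_aware_conflict(read_value: float, cell: ApproxCell) -> bool:
    """Value-aware check: only drift beyond the tolerance is a conflict."""
    return abs(cell.value - read_value) > cell.tolerance


def try_commit(
    cell: ApproxCell,
    read_value: float,
    new_value: float,
    conflict: Callable[[float, ApproxCell], bool],
) -> bool:
    """Commit `new_value` unless the chosen check reports a conflict."""
    if conflict(read_value, cell):
        return False  # the transaction would abort and retry
    cell.value = new_value
    return True


# Usage: a transaction reads 10.0, then another thread nudges the cell.
cell = ApproxCell(value=10.0, tolerance=0.5)
snapshot = cell.value
cell.value = 10.25  # concurrent update by some other thread (simulated)

aborted = not try_commit(cell, snapshot, 11.0, exact_conflict)
committed = try_commit(cell, snapshot, 11.0, value_aware_conflict)
```

In this run the exact check aborts the transaction while the value-aware check lets it commit, because the intervening update stayed within the declared tolerance. A real implementation would also need atomic validation and commit, which this sketch leaves out.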

Citations: 12

Data handling inefficiencies between CUDA, 3D rendering, and system memory

IEEE International Symposium on Workload Characterization (IISWC'10) Pub Date : 2010-12-02 DOI: 10.1109/IISWC.2010.5648828

Brian Gordon, S. Sohoni, D. Chandler

Citations: 13

Real Java applications in software transactional memory

IEEE International Symposium on Workload Characterization (IISWC'10) Pub Date : 2010-12-02 DOI: 10.1109/IISWC.2010.5654431

T. Nakaike, Rei Odaira, T. Nakatani, Maged M. Michael

Abstract: Transactional Memory (TM) shows promise as a new concurrency control mechanism to replace lock-based synchronization. However, there have been few studies of TM systems with real applications, and the real-world benefits and barriers of TM remain unknown. In this paper, we present a detailed analysis of the behavior of real applications on a software transactional memory system. Based on this analysis, we aim to clarify what programming work is required to achieve reasonable performance in TM-based applications. We selected three existing Java applications: (1) HSQLDB, (2) the Geronimo application server, and (3) the GlassFish application server, because each application has a scalability problem caused by lock contentions. We identified the critical sections where lock contentions frequently occur, and modified the source code so that the critical sections are executed transactionally. However, this simple modification proved insufficient to achieve reasonable performance because of excessive data conflicts. We found that most of the data conflicts were caused by application-level optimizations such as reusing objects to reduce the memory usage. After modifying the source code to disable those optimizations, the TM-based applications showed higher or competitive performance compared to lock-based applications. Another finding is that the number of variables that actually cause data conflicts is much smaller than the number of variables that can be accessed in critical sections. This implies that the performance tuning of TM-based applications may be easier than that of lock-based applications where we need to take care of all of the variables that can be accessed in the critical sections.

Citations: 5

Performance variations of two open-source cloud platforms

IEEE International Symposium on Workload Characterization (IISWC'10) Pub Date : 2010-12-02 DOI: 10.1109/IISWC.2010.5650280

Yohei Ueda, T. Nakatani

Citations: 22

Runtime workload behavior prediction using statistical metric modeling with application to dynamic power management

IEEE International Symposium on Workload Characterization (IISWC'10) Pub Date : 2010-12-02 DOI: 10.1109/IISWC.2010.5650339

R. Sarikaya, C. Isci, A. Buyuktosunoglu

Abstract: Adaptive computing systems rely on accurate predictions of workload behavior to understand and respond to the dynamically-varying application characteristics. In this study, we propose a Statistical Metric Model (SMM) that is system- and metric-independent for predicting workload behavior. SMM is a probability distribution over workload patterns and it attempts to model how frequently a specific behavior occurs. The Maximum Likelihood Estimation (MLE) criterion is used to estimate the parameters of the SMM. The model parameters are further refined with a smoothing method to improve prediction robustness. The SMM learns the application patterns during runtime as applications run, and at the same time predicts the upcoming program phases based on what it has learned so far. An extensive and rigorous series of prediction experiments demonstrates the superior performance of the SMM predictor over existing predictors on a wide range of benchmarks. For some of the benchmarks, SMM improves prediction accuracy by 10X and 3X, compared to the existing last-value and table-based prediction approaches respectively. SMM's improved prediction accuracy results in superior power-performance trade-offs when it is applied to dynamic power management.
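To make the ingredients named in this abstract concrete (a distribution over workload patterns, frequency-based maximum likelihood estimates, smoothing, and learning while predicting), here is a small Python sketch. It is an assumption-laden stand-in rather than the SMM itself: the abstract does not say which smoothing method or pattern definition is used, so the sketch assumes additive (Laplace) smoothing and a fixed-length history of discrete phase IDs. The class name `PhasePredictor` and its parameters are hypothetical.

```python
from collections import Counter, defaultdict, deque


class PhasePredictor:
    """Online next-phase predictor over discrete phase IDs 0..num_phases-1."""

    def __init__(self, num_phases: int, history_len: int = 2, alpha: float = 1.0):
        self.num_phases = num_phases
        self.alpha = alpha  # assumed additive-smoothing constant
        self.history = deque(maxlen=history_len)
        # counts[context][phase] = times `phase` followed `context`
        self.counts = defaultdict(Counter)

    def probability(self, phase: int) -> float:
        """Smoothed estimate of P(phase | current history).

        With alpha = 0 this is the plain MLE (relative frequency); alpha > 0
        keeps unseen phases from getting probability zero.
        """
        seen = self.counts.get(tuple(self.history), Counter())
        total = sum(seen.values())
        return (seen[phase] + self.alpha) / (total + self.alpha * self.num_phases)

    def predict(self) -> int:
        """Most probable next phase given what has been learned so far."""
        return max(range(self.num_phases), key=self.probability)

    def observe(self, phase: int) -> None:
        """Update counts with the phase that actually occurred."""
        if len(self.history) == self.history.maxlen:
            self.counts[tuple(self.history)][phase] += 1
        self.history.append(phase)


# Usage: learn a repeating 0, 1, 2 phase pattern at runtime.
predictor = PhasePredictor(num_phases=3)
for phase in [0, 1, 2] * 10:
    predictor.observe(phase)
next_phase = predictor.predict()
```

After the loop the history is `(1, 2)`, a context that was always followed by phase 0, so the predictor favors phase 0. A last-value predictor would guess 2 here, which is the kind of pattern-following behavior the abstract contrasts against.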

Citations: 22

Tackling the challenges of server consolidation on multi-core systems

IEEE International Symposium on Workload Characterization (IISWC'10) Pub Date : 2010-12-02 DOI: 10.1109/IISWC.2010.5654398

Hui Lv, Xudong Zheng, Zhiteng Huang, Jiangang Duan

Citations: 8

Eigenbench: A simple exploration tool for orthogonal TM characteristics

IEEE International Symposium on Workload Characterization (IISWC'10) Pub Date : 2010-12-02 DOI: 10.1109/IISWC.2010.5648812

Sungpack Hong, Tayo Oguntebi, J. Casper, N. Bronson, C. Kozyrakis, K. Olukotun

Abstract: There are a significant number of Transactional Memory (TM) proposals, varying in almost all aspects of the design space. Although several transactional benchmarks have been suggested, a simple, yet thorough, evaluation framework is still needed to completely characterize a TM system and allow for comparison among the various proposals. Unfortunately, TM system evaluation is difficult because the application characteristics which affect performance are often difficult to isolate from each other. We propose a set of orthogonal application characteristics that form a basis for transactional behavior and are useful in fully understanding the performance of a TM system. In this paper, we present EigenBench, a lightweight yet powerful microbenchmark for fully evaluating a transactional memory system. We show that EigenBench is useful for thoroughly exploring the orthogonal space of TM application characteristics. Because of its flexibility, our microbenchmark is also capable of reproducing a representative set of TM performance pathologies. In this paper, we use EigenBench to evaluate two well-known TM systems and provide significant insight about their strengths and weaknesses. We also demonstrate how EigenBench can be used to mimic the evaluation coverage of a popular TM benchmark suite called STAMP.

Citations: 74