IEEE International Workshop on Workload Characterization, 2004. WWC-7. 2004: Latest Papers

Evaluating performance of BLAST on Intel Xeon and Itanium2 processors
IEEE International Workshop on Workload Characterization, 2004. WWC-7. 2004 Pub Date: 2004-12-13 DOI: 10.1007/978-3-540-30566-8_115
R. Radhakrishnan, R. Ali, G. Kochhar, Kalyana Chadalavada, R. Rajagopalan, J. Hsieh, O. Celebioglu
Citations: 4
On the extraction and analysis of prevalent dataflow patterns
IEEE International Workshop on Workload Characterization, 2004. WWC-7. 2004 Pub Date: 2004-10-25 DOI: 10.1109/WWC.2004.1437389
P. Sassone, D. S. Wills
Abstract: The complexity-effectiveness of modern wire-dominated architectures is heavily influenced by operand movement patterns within workloads. Unfortunately, the study of these common patterns is burdensome given the NP-completeness of the problem and the size of the dataflow graphs in modern applications. In response, we present CPX, a fast and memory-efficient tool for the extraction of common dataflow subgraphs from application binaries. Using this tool and a practical metric of pattern popularity, we analyze MediaBench and Spec2000int benchmarks and present their most frequent communication patterns. Results confirm the intuition of prior research that dependence chains dominate integer code, but more importantly demonstrate that dataflow communication is restricted to a tractable set of templates. A set of only ten small patterns characterizes over 90% of Spec2000int and over 75% of MediaBench dynamic instructions. These common dataflow idioms are amenable to dynamic optimization, more efficient code representations, and reducing the broadcast nature of micro-architectural resources.
Citations: 8
GENIUS: a generator of interactive user media sessions
IEEE International Workshop on Workload Characterization, 2004. WWC-7. 2004 Pub Date: 2004-10-25 DOI: 10.1109/WWC.2004.1437393
C. Costa, C. Ramos, I. Cunha, J. Almeida
Abstract: The generation of realistic interactive synthetic streaming media workloads is of great importance for the effective evaluation of alternative media distribution techniques. This paper fills a gap left by previous studies and proposes a hierarchical model that captures key aspects of media user behavior and workloads, in particular, interactivity and heterogeneity. It also introduces GENIUS, a highly flexible and realistic streaming media workload generator that implements the proposed model and is parameterized with results from an extensive characterization of a rich set of real media workloads. Preliminary experiments show that our generator accurately captures the workload aspects that have a key impact on system performance.
Citations: 10
Evaluation of a speculative multithreading compiler by characterizing program dependences
IEEE International Workshop on Workload Characterization, 2004. WWC-7. 2004 Pub Date: 2004-10-25 DOI: 10.1109/WWC.2004.1437390
A. Bhowmik, M. Franklin
Abstract: Speculative multithreading (SpMT) promises to be an effective mechanism for parallelizing non-numeric programs. Proper thread formation is crucial for obtaining good speedup in an SpMT system. We have developed an SpMT compiler framework for partitioning sequential programs into multiple threads. Since control and data speculation are the essence of the SpMT execution model, inter-thread data dependences and inter-thread control predictions at run time play crucial roles in the performance of the SpMT system. Therefore, to evaluate existing SpMT compilers or hardware systems, and to design more efficient systems, it is necessary to characterize the dynamic program dependences carefully. In this paper, we study in detail the run-time behavior of inter-thread data and control dependences of the threads generated by our compiler and use it to analyze performance. The analyses reveal that our compiler has successfully modeled the inter-thread data and control dependences of non-numeric applications and minimized them while generating the threads.
Citations: 0
Construction and performance characterization of parallel interior point solver on 4-way Intel Itanium 2 multiprocessor system
IEEE International Workshop on Workload Characterization, 2004. WWC-7. 2004 Pub Date: 2004-10-25 DOI: 10.1109/WWC.2004.1437402
P. Koka, T. Suh, M. Smelyanskiy, R. Grzeszczuk, C. Dulong
Abstract: In recent years the interior point method (IPM) has become a dominant choice for solving large convex optimization problems in many scientific, engineering and commercial applications. Two reasons for the success of the IPM are its good scalability on existing multiprocessor systems with a small number of processors and its potential to deliver scalable performance on systems with a large number of processors. The scalability of a parallel IPM is determined by several key issues such as exploiting parallelism due to sparsity of the problem, reducing communication overhead and proper load balancing. In this paper we present an implementation of a parallel linear programming IPM workload and characterize its scalability on a 4-way Itanium® 2 system. We show a speedup of up to 3 times for some of the datasets. We also present a detailed micro-architectural analysis of the workload using the VTune™ performance analyzer. Our results suggest that a good IPM implementation is latency-bound. Based on these findings, we make suggestions on how to improve the performance of the IPM workload in the future.
Citations: 3
Does halting make hardware trace collection inaccurate? A study using Pentium 4 performance counters and SPEC2000
IEEE International Workshop on Workload Characterization, 2004. WWC-7. 2004 Pub Date: 2004-10-25 DOI: 10.1109/WWC.2004.1437397
M. Watson, J. Flanagan
Abstract: Processor address traces are invaluable for characterizing workloads and testing proposed memory hierarchies. Long traces are needed to exercise modern cache designs and produce meaningful results, but are difficult to collect with hardware monitors because microprocessors access memory too frequently for disks or other large storage to keep up. The small, fast buffers of the monitors fill quickly; in order to obtain long contiguous traces, the processor must be stopped while the buffer is emptied. This halting may perturb the traces collected, but this cannot be measured directly, since long uninterrupted traces cannot be collected. We make the case that hardware performance counters, which collect runtime statistics without influencing execution, can be used to measure halting effects. We use the performance counters of the Pentium 4 processor to collect statistics while halting the processor as if traces were being collected. We then compare these results to the statistics obtained from unhalted runs. We present our results in terms of which counters are affected, why, and what this means for trace-collection systems.
Citations: 2
The USAR characterization model
IEEE International Workshop on Workload Characterization, 2004. WWC-7. 2004 Pub Date: 2004-10-25 DOI: 10.1109/WWC.2004.1437399
A. Pereira, G. Franco, L. Silva, Wagner Meira Jr., W. Santos
Abstract: Understanding user behavior is necessary to analyze the performance and scalability of Web servers. This knowledge is used, for instance, to build workload generators that help evaluate the performance of those servers. Current workload generators are typically memory-less and thus unable to mimic actual user interaction with the system. In this work we propose a hierarchical characterization and simulation model focused on user behavior, named USAR. We use the latency and inter-arrival time of the requests to model user actions, which are the basis of our model. We validate this model through a proxy-cache server case study, where we perform the characterization and construct a user behavior simulator. The results suggest that more realistic workloads can be generated.
Citations: 11
Characterizing the impact of different memory-intensity levels
IEEE International Workshop on Workload Characterization, 2004. WWC-7. 2004 Pub Date: 2004-10-25 DOI: 10.1109/WWC.2004.1437388
R. Kotla, A. Devgan, S. Ghiasi, T. Keller, F. Rawson
Abstract: Applications on today's high-end processors typically make varying load demands over time. A single application may have many different phases during its lifetime, and workload mixes show interleaved phases. This work examines and uses the differences between memory- and CPU-intensive phases to reduce power. Today's processors provide resources that are underutilized during memory-intensive phases, consuming power while producing little incremental gain in performance. This work examines a deployed system consisting of identical cores with a goal of running each one at a different effective frequency. The initial goal is to find the appropriate frequency at which to run each phase. This paper demonstrates that memory intensity directly affects the throughput of applications. The results indicate that simple metrics such as IPC (instructions per cycle) cannot be used to determine the frequency at which to run a phase. Instead, the work uses performance counters that directly monitor memory behavior to identify memory-intensive phases, which can then be run on a slower core without incurring significant performance penalties. The key result of the paper is the introduction of a very simple, online model that uses the performance counter data to predict the performance of a program phase at any particular frequency setting. The information from this model allows a scheduler to decide which core to use to execute the program phase. A sophisticated power model for the processor family shows that this approach significantly reduces power consumption. The model was evaluated using a subset of SPECCPU and the SPECjbb and TPC-W benchmarks. It predicts performance with an average error of less than 10%. The power modeling shows that memory-intensive benchmarks achieve up to a 58% power reduction at a performance loss of less than 20% when run at 80% of nominal frequency.
Citations: 44
Experiments with subsetting benchmark suites
IEEE International Workshop on Workload Characterization, 2004. WWC-7. 2004 Pub Date: 2004-10-25 DOI: 10.1109/WWC.2004.1437398
H. Vandierendonck, K. De Bosschere
Abstract: Benchmarks are one of the most popular tools to compare the performance of computing systems. Benchmark suites typically contain multiple benchmark programs with more or less the same properties. Hence the suite contains redundancy, which increases the cost of executing or simulating the benchmark suite without adding value. To limit simulation time, researchers frequently subset benchmark suites. However, correctly identifying a representative subset is of paramount importance to perform a trustworthy evaluation. This paper shows that subsetting a benchmark suite in such a way that representativeness of the suite is maintained is non-trivial. We show that a small randomly selected subset is not representative of the full benchmark suite. We discuss algorithms to subset the SPEC CPU 2000 benchmark suite and show that they provide more representative subsets than randomly selected subsets. However, the algorithms evaluated in this paper do not always compute representative subsets: the algorithms produce bad results for some subset sizes. In this sense, these algorithms are unreliable, as it remains necessary to validate the benchmark suite subset. We find one subsetting algorithm that is reliable. It is, however, uncertain whether this algorithm is also reliable under other circumstances.
Citations: 27
Micro-architectural anatomy of a commercial TCP/IP stack
IEEE International Workshop on Workload Characterization, 2004. WWC-7. 2004 Pub Date: 2004-10-25 DOI: 10.1109/WWC.2004.1437394
R. Illikkal, R. Iyer, D. Newell
Abstract: Over the last couple of decades, computer architects and performance analysts have routinely attempted to profile the overhead of TCP/IP processing in an effort to understand where the time was spent. It is well understood that this is a rather difficult problem since the processing time is spread across various software modules such as the network stack, interrupt routines, drivers, O/S scheduler, etc. As a result, the problem of extracting the micro-architectural characteristics of TCP/IP processing is significantly more challenging. In this paper, we start by covering the previous attempts at this problem and show what existing tools can provide in terms of execution time characteristics. We then propose a detailed methodology that combines full-system simulation, cycle-accurate performance simulations and symbol annotation to provide a rich cycle-accurate view of TCP/IP packet processing execution. We discuss initial results based on our profiling methodology and discuss where the time is spent. This includes an analysis of micro-architectural characteristics (such as instruction breakdown, CPI, MPI and TLB misses on a state-of-the-art microprocessor).
Citations: 2