{"title":"Performance and memory-access characterization of data mining applications","authors":"J. P. Bradford, J. Fortes","doi":"10.1109/WWC.1998.809358","DOIUrl":null,"url":null,"abstract":"Characterizes the performance and memory-access behavior of a decision tree induction program, a previously unstudied application used in data mining and knowledge discovery in databases. Performance is studied via RSIM, an execution-driven simulator, for three uniprocessor models that exploit instruction-level parallelism to varying degrees. Several properties of the program are noted. Out-of-order dispatch and multiple issue provide a significant performance advantage: 50%-250% improvement in inter-processor communication (IPC) for out-of-order dispatch vs. in-order dispatch, and 5%-120% improvement in IPC for four-way issue vs. single issue. Multiple issue provides a greater performance improvement for larger L2 cache sizes, when the program is limited by CPU performance; out-of-order dispatch provides a greater performance improvement for smaller L2 cache sizes. The program has a very small instruction footprint: for an 8-kB L1 instruction cache, the instruction miss rate is below 0.1%. A small (8 kB) L1 data cache is sufficient to capture most of the locality of the data references, resulting in L1 miss rates between 10%-20%. Increasing the size of the L2 data cache does not significantly improve performance until a significant fraction (over 1/4) of the data set fits into the L2 cache. Lastly, a procedure is developed for scaling the cache sizes when using scaled-down data sets, allowing the results for smaller data sets to be used to predict the performance of full-sized data sets.","PeriodicalId":190931,"journal":{"name":"Workload Characterization: Methodology and Case Studies. Based on the First Workshop on Workload Characterization","volume":"87 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workload Characterization: Methodology and Case Studies. Based on the First Workshop on Workload Characterization","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WWC.1998.809358","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 29
Abstract
Characterizes the performance and memory-access behavior of a decision tree induction program, a previously unstudied application used in data mining and knowledge discovery in databases. Performance is studied via RSIM, an execution-driven simulator, for three uniprocessor models that exploit instruction-level parallelism to varying degrees. Several properties of the program are noted. Out-of-order dispatch and multiple issue provide a significant performance advantage: a 50%-250% improvement in instructions per cycle (IPC) for out-of-order dispatch vs. in-order dispatch, and a 5%-120% improvement in IPC for four-way issue vs. single issue. Multiple issue provides a greater performance improvement for larger L2 cache sizes, when the program is limited by CPU performance; out-of-order dispatch provides a greater performance improvement for smaller L2 cache sizes. The program has a very small instruction footprint: for an 8 kB L1 instruction cache, the instruction miss rate is below 0.1%. A small (8 kB) L1 data cache is sufficient to capture most of the locality of the data references, resulting in L1 miss rates between 10% and 20%. Increasing the size of the L2 data cache does not significantly improve performance until a significant fraction (over 1/4) of the data set fits into the L2 cache. Lastly, a procedure is developed for scaling the cache sizes when using scaled-down data sets, allowing the results for smaller data sets to be used to predict the performance of full-sized data sets.
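
The abstract does not spell out the cache-scaling procedure itself. As a rough illustration of the idea only (an assumption, not the authors' method), the minimal sketch below keeps the ratio of L2 capacity to data-set size constant, so that the same fraction of the scaled-down data set fits into the simulated L2 cache. The function name scaled_l2_size and the power-of-two rounding are illustrative choices.

import math

def scaled_l2_size(full_l2_bytes: int, full_dataset_bytes: int,
                   scaled_dataset_bytes: int) -> int:
    """Pick an L2 size for a scaled-down data set by preserving the
    ratio of cache size to data-set size, rounded to a power of two."""
    ratio = scaled_dataset_bytes / full_dataset_bytes
    target = full_l2_bytes * ratio
    # Simulated caches are typically modeled with power-of-two capacities.
    return 1 << max(0, round(math.log2(target)))

# Example: to mimic a 1 MB L2 while using a data set 1/8 the full size,
# simulate roughly a 128 kB L2.
print(scaled_l2_size(1 << 20, 512 << 20, 64 << 20))  # 131072 bytes = 128 kB

Under this (assumed) proportional rule, the fraction of the data set resident in L2 is held fixed, which is the quantity the abstract identifies as governing when a larger L2 starts to pay off.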