Memory characterization of a parallel data mining workload

Jin-Soo Kim, Xiaohan Qin, Yarsun Hsu
{"title":"Memory characterization of a parallel data mining workload","authors":"Jin-Soo Kim, Xiaohan Qin, Yarsun Hsu","doi":"10.1109/WWC.1998.809359","DOIUrl":null,"url":null,"abstract":"Studies a representative of an important class of emerging applications: a parallel data mining workload. The application, extracted from the IBM Intelligent Miner, identifies groups of records that are mathematically similar, based on a neural network model called a self-organizing map. We examine and compare, in detail, two implementations of the application: (1) temporal locality or working set size; (2) spatial locality and memory block utilization; (3) communication characteristics and scalability; and (4) translation lookaside buffer (TLB) performance. First, we find that the working set hierarchy of the application is governed by two parameters, namely the size of an input record and the size of prototype array; it is independent of the number of input records. Second, the application shows good spatial locality, with the implementation optimized for sparse data sets having slightly worse spatial locality. Third, due to the batch update scheme, the application bears very low communication. Finally, a two-way set-associative TLB may result in severely skewed TLB performance in a multiprocessor environment, caused by the large discrepancy in the number of conflict misses. Increasing the set associativity is more effective in mitigating the problem than increasing the TLB size.","PeriodicalId":190931,"journal":{"name":"Workload Characterization: Methodology and Case Studies. Based on the First Workshop on Workload Characterization","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workload Characterization: Methodology and Case Studies. Based on the First Workshop on Workload Characterization","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WWC.1998.809359","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 24

Abstract

Studies a representative of an important class of emerging applications: a parallel data mining workload. The application, extracted from the IBM Intelligent Miner, identifies groups of records that are mathematically similar, based on a neural network model called a self-organizing map. We examine and compare, in detail, two implementations of the application: (1) temporal locality or working set size; (2) spatial locality and memory block utilization; (3) communication characteristics and scalability; and (4) translation lookaside buffer (TLB) performance. First, we find that the working set hierarchy of the application is governed by two parameters, namely the size of an input record and the size of prototype array; it is independent of the number of input records. Second, the application shows good spatial locality, with the implementation optimized for sparse data sets having slightly worse spatial locality. Third, due to the batch update scheme, the application bears very low communication. Finally, a two-way set-associative TLB may result in severely skewed TLB performance in a multiprocessor environment, caused by the large discrepancy in the number of conflict misses. Increasing the set associativity is more effective in mitigating the problem than increasing the TLB size.
并行数据挖掘工作负载的内存特性
研究一类重要的新兴应用程序的代表:并行数据挖掘工作负载。该应用程序是从IBM Intelligent Miner中提取出来的,它基于一种称为自组织地图的神经网络模型,识别出数学上相似的记录组。我们详细检查和比较了应用程序的两种实现:(1)时间局部性或工作集大小;(2)空间局部性和内存块利用率;(3)通信特性和可扩展性;(4)翻译暂存缓冲(TLB)性能。首先,我们发现应用程序的工作集层次结构由两个参数控制,即输入记录的大小和原型数组的大小;它与输入记录的数量无关。其次,应用程序表现出良好的空间局部性,对空间局部性稍差的稀疏数据集进行了优化。第三,由于采用批量更新方案,应用程序的通信负担非常低。最后,在多处理器环境下,双向集关联TLB可能会导致TLB性能严重倾斜,这是由于冲突丢失次数的巨大差异造成的。增加集合结合性比增加TLB大小更有效地缓解了这个问题。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信