Scalable Proximity-Based Methods for Large-Scale Analysis of Atom Probe Data

2018 IEEE 25th International Conference on High Performance Computing (HiPC). Pub Date: 2018-12-01. DOI: 10.1109/HiPC.2018.00034

Hao Lu, S. Seal, J. Poplawsky


Citations: 0

Abstract


Powered by recent advances in data acquisition technologies, today's state-of-the-art atom probe microscopes yield data sets with sizes ranging from a few million atoms to billions of atoms. Analysis of these atomic data sets within reasonable turnaround times is a pressing data analysis challenge for materials scientists, whose current software systems do not scale to these massive data sets. Here, we present the shared memory component of a larger ongoing effort to develop a multi-feature data analysis framework capable of analyzing atom probe data of all sizes and scales, from desktop multicore machines to large-scale high-performance computing platforms with hybrid (shared and distributed memory) architectures. Our focus here is on a broad class of popular atom probe data analysis methods that rely on time-consuming k-NN queries at their core. We present a scalable, heuristic algorithm for k-NN queries using three-dimensional range trees. To demonstrate its efficacy, the k-NN algorithm is integrated with two use cases of atom probe data analysis methods, and the resulting analysis times are shown to speed up by over 20X on a 32-core Cray XC40 node using workloads of up to 8 million atoms, which is already beyond the at-scale capabilities of existing atom probe software. Using this k-NN algorithm, we also introduce a novel parameter estimation method for a class of cluster finding methods, called friends-of-friends (FoF) methods, to completely bypass their expensive pre-processing steps. In each case, we validate the results on a variety of control data sets.
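For readers unfamiliar with the two building blocks named in the abstract, the following is a minimal illustrative sketch of a k-NN distance query and a friends-of-friends (FoF) grouping over 3D points. It is not the authors' algorithm: it uses a brute-force O(n^2) search in place of the three-dimensional range trees, runs serially, and the function names `knn_distances` and `friends_of_friends` and the `linking_length` parameter are hypothetical names chosen for illustration only.

```python
import heapq
import math
from itertools import combinations


def knn_distances(points, k):
    """For each point, return the ascending distances to its k nearest neighbours.

    Brute force, O(n^2): an assumed stand-in for a range-tree search.
    """
    result = []
    for i, p in enumerate(points):
        dists = (math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(heapq.nsmallest(k, dists))
    return result


def friends_of_friends(points, linking_length):
    """Group point indices so that any two points within linking_length
    of each other (directly or through a chain) share a cluster."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of points that lies within the linking length.
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= linking_length:
            parent[find(i)] = find(j)

    # Collect indices by their cluster root.
    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    atoms = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0),
             (5, 5, 5), (5.1, 5, 5), (9, 0, 0)]
    print(knn_distances(atoms, 1)[0])
    print(friends_of_friends(atoms, 0.5))
```

In this toy setting, the k-th nearest-neighbour distances returned by `knn_distances` are the kind of quantity one might inspect when choosing a linking length for FoF; how the paper actually estimates FoF parameters from k-NN results is not described in the abstract and is not reproduced here.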
