使用Dash对海量数据进行子集移除

TeraGrid Conference Pub Date : 2011-07-18 DOI:10.1145/2016741.2016750

Jonathan Myers, M. Tatineni, R. Sinkovits

{"title":"使用Dash对海量数据进行子集移除","authors":"Jonathan Myers, M. Tatineni, R. Sinkovits","doi":"10.1145/2016741.2016750","DOIUrl":null,"url":null,"abstract":"Ongoing efforts by the Large Synoptic Survey Telescope (LSST) involve the study of asteroid search algorithms and their performance on both real and simulated data. Images of the night sky reveal large numbers of events caused by the reflection of sunlight from asteroids. Detections from consecutive nights can then be grouped together into tracks that potentially represent small portions of the asteroids' sky-plane motion. The analysis of these tracks is extremely time consuming and there is strong interest in the development of techniques that can eliminate unnecessary tracks, thereby rendering the problem more manageable. One such approach is to collectively examine sets of tracks and discard those that are subsets of others. Our implementation of a subset removal algorithm has proven to be fast and accurate on modest sized collections of tracks, but unfortunately has extremely large memory requirements for realistic data sets and cannot effectively use conventional high performance computing resources. We report our experience running the subset removal algorithm on the TeraGrid Appro Dash system, which uses the vSMP software developed by ScaleMP to aggregate memory from across multiple compute nodes to provide access to a large, logical shared memory space. Our results show that Dash is ideally suited for this algorithm and has performance comparable to or superior to that obtained on specialized, heavily demanded, large-memory systems such as the SGI Altix UV.","PeriodicalId":257555,"journal":{"name":"TeraGrid Conference","volume":"14 4","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Subset removal on massive data with Dash\",\"authors\":\"Jonathan Myers, M. Tatineni, R. Sinkovits\",\"doi\":\"10.1145/2016741.2016750\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Ongoing efforts by the Large Synoptic Survey Telescope (LSST) involve the study of asteroid search algorithms and their performance on both real and simulated data. Images of the night sky reveal large numbers of events caused by the reflection of sunlight from asteroids. Detections from consecutive nights can then be grouped together into tracks that potentially represent small portions of the asteroids' sky-plane motion. The analysis of these tracks is extremely time consuming and there is strong interest in the development of techniques that can eliminate unnecessary tracks, thereby rendering the problem more manageable. One such approach is to collectively examine sets of tracks and discard those that are subsets of others. Our implementation of a subset removal algorithm has proven to be fast and accurate on modest sized collections of tracks, but unfortunately has extremely large memory requirements for realistic data sets and cannot effectively use conventional high performance computing resources. We report our experience running the subset removal algorithm on the TeraGrid Appro Dash system, which uses the vSMP software developed by ScaleMP to aggregate memory from across multiple compute nodes to provide access to a large, logical shared memory space. Our results show that Dash is ideally suited for this algorithm and has performance comparable to or superior to that obtained on specialized, heavily demanded, large-memory systems such as the SGI Altix UV.\",\"PeriodicalId\":257555,\"journal\":{\"name\":\"TeraGrid Conference\",\"volume\":\"14 4\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"TeraGrid Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2016741.2016750\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"TeraGrid Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2016741.2016750","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

大型综合巡天望远镜(LSST)正在进行的工作包括研究小行星搜索算法及其在真实和模拟数据上的表现。夜空的图像揭示了由小行星反射阳光引起的大量事件。从连续的夜晚探测到的信息可以组合成可能代表小行星天空运动的一小部分的轨迹。对这些轨迹的分析非常耗时，人们对开发能够消除不必要轨迹的技术非常感兴趣，从而使问题更易于管理。其中一种方法是集体检查轨道集，并丢弃那些是其他子集的轨道集。我们的子集移除算法的实现已经被证明在中等大小的轨道集合上是快速和准确的，但不幸的是，对于现实数据集有非常大的内存需求，并且不能有效地使用传统的高性能计算资源。我们报告了在TeraGrid Appro Dash系统上运行子集移除算法的经验，该系统使用ScaleMP开发的vSMP软件从多个计算节点聚合内存，以提供对大型逻辑共享内存空间的访问。我们的研究结果表明，Dash非常适合这种算法，其性能与SGI Altix UV等专用、高需求、大内存系统上获得的性能相当或更好。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Subset removal on massive data with Dash

Ongoing efforts by the Large Synoptic Survey Telescope (LSST) involve the study of asteroid search algorithms and their performance on both real and simulated data. Images of the night sky reveal large numbers of events caused by the reflection of sunlight from asteroids. Detections from consecutive nights can then be grouped together into tracks that potentially represent small portions of the asteroids' sky-plane motion. The analysis of these tracks is extremely time consuming and there is strong interest in the development of techniques that can eliminate unnecessary tracks, thereby rendering the problem more manageable. One such approach is to collectively examine sets of tracks and discard those that are subsets of others. Our implementation of a subset removal algorithm has proven to be fast and accurate on modest sized collections of tracks, but unfortunately has extremely large memory requirements for realistic data sets and cannot effectively use conventional high performance computing resources. We report our experience running the subset removal algorithm on the TeraGrid Appro Dash system, which uses the vSMP software developed by ScaleMP to aggregate memory from across multiple compute nodes to provide access to a large, logical shared memory space. Our results show that Dash is ideally suited for this algorithm and has performance comparable to or superior to that obtained on specialized, heavily demanded, large-memory systems such as the SGI Altix UV.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

TeraGrid Conference

自引率

0.00%

发文量