{"title":"使用Dash对海量数据进行子集移除","authors":"Jonathan Myers, M. Tatineni, R. Sinkovits","doi":"10.1145/2016741.2016750","DOIUrl":null,"url":null,"abstract":"Ongoing efforts by the Large Synoptic Survey Telescope (LSST) involve the study of asteroid search algorithms and their performance on both real and simulated data. Images of the night sky reveal large numbers of events caused by the reflection of sunlight from asteroids. Detections from consecutive nights can then be grouped together into tracks that potentially represent small portions of the asteroids' sky-plane motion. The analysis of these tracks is extremely time consuming and there is strong interest in the development of techniques that can eliminate unnecessary tracks, thereby rendering the problem more manageable. One such approach is to collectively examine sets of tracks and discard those that are subsets of others. Our implementation of a subset removal algorithm has proven to be fast and accurate on modest sized collections of tracks, but unfortunately has extremely large memory requirements for realistic data sets and cannot effectively use conventional high performance computing resources. We report our experience running the subset removal algorithm on the TeraGrid Appro Dash system, which uses the vSMP software developed by ScaleMP to aggregate memory from across multiple compute nodes to provide access to a large, logical shared memory space. Our results show that Dash is ideally suited for this algorithm and has performance comparable to or superior to that obtained on specialized, heavily demanded, large-memory systems such as the SGI Altix UV.","PeriodicalId":257555,"journal":{"name":"TeraGrid Conference","volume":"14 4","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Subset removal on massive data with Dash\",\"authors\":\"Jonathan Myers, M. Tatineni, R. Sinkovits\",\"doi\":\"10.1145/2016741.2016750\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Ongoing efforts by the Large Synoptic Survey Telescope (LSST) involve the study of asteroid search algorithms and their performance on both real and simulated data. Images of the night sky reveal large numbers of events caused by the reflection of sunlight from asteroids. Detections from consecutive nights can then be grouped together into tracks that potentially represent small portions of the asteroids' sky-plane motion. The analysis of these tracks is extremely time consuming and there is strong interest in the development of techniques that can eliminate unnecessary tracks, thereby rendering the problem more manageable. One such approach is to collectively examine sets of tracks and discard those that are subsets of others. Our implementation of a subset removal algorithm has proven to be fast and accurate on modest sized collections of tracks, but unfortunately has extremely large memory requirements for realistic data sets and cannot effectively use conventional high performance computing resources. We report our experience running the subset removal algorithm on the TeraGrid Appro Dash system, which uses the vSMP software developed by ScaleMP to aggregate memory from across multiple compute nodes to provide access to a large, logical shared memory space. Our results show that Dash is ideally suited for this algorithm and has performance comparable to or superior to that obtained on specialized, heavily demanded, large-memory systems such as the SGI Altix UV.\",\"PeriodicalId\":257555,\"journal\":{\"name\":\"TeraGrid Conference\",\"volume\":\"14 4\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"TeraGrid Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2016741.2016750\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"TeraGrid Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2016741.2016750","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Ongoing efforts by the Large Synoptic Survey Telescope (LSST) involve the study of asteroid search algorithms and their performance on both real and simulated data. Images of the night sky reveal large numbers of events caused by the reflection of sunlight from asteroids. Detections from consecutive nights can then be grouped together into tracks that potentially represent small portions of the asteroids' sky-plane motion. The analysis of these tracks is extremely time consuming and there is strong interest in the development of techniques that can eliminate unnecessary tracks, thereby rendering the problem more manageable. One such approach is to collectively examine sets of tracks and discard those that are subsets of others. Our implementation of a subset removal algorithm has proven to be fast and accurate on modest sized collections of tracks, but unfortunately has extremely large memory requirements for realistic data sets and cannot effectively use conventional high performance computing resources. We report our experience running the subset removal algorithm on the TeraGrid Appro Dash system, which uses the vSMP software developed by ScaleMP to aggregate memory from across multiple compute nodes to provide access to a large, logical shared memory space. Our results show that Dash is ideally suited for this algorithm and has performance comparable to or superior to that obtained on specialized, heavily demanded, large-memory systems such as the SGI Altix UV.