2017 New York Scientific Data Summit (NYSDS): Latest Papers

Meeting the challenges of data analysis on the wire
2017 New York Scientific Data Summit (NYSDS) Pub Date : 2017-08-01 DOI: 10.1109/NYSDS.2017.8085050
Alya Boumiza, Cole Lewis, A. Martin, Shilpi Bhattacharyya, Junwei Zhang, D. Katramatos, Meng Yue, Shinjae Yoo
Abstract: The Analysis on the Wire (AoW) project at Brookhaven National Laboratory aims to execute generic computations in the network fabric, "on the wire," to support early decision making. We are researching, developing, and evaluating hardware and software mechanisms and middleware for data analysis on the wire. We also seek to address several complex challenges encountered in such a computing environment, such as handling raw packet data, dealing with computations of a streaming nature, and eliminating performance-killing bottlenecks and delays by taking advantage of any available means of acceleration.
Citations: 0
Implementing a distributed volumetric data analytics toolkit on Apache Spark
2017 New York Scientific Data Summit (NYSDS) Pub Date : 2017-08-01 DOI: 10.1109/NYSDS.2017.8085038
Chao Chen, Yuzhong Yan, Lei Huang, Lijun Qian
Abstract: The multidimensional array is a fundamental data structure that is widely used in scientific computing and in many big data analytics applications. Distributed multidimensional arrays have been well studied on High Performance Computing (HPC) platforms; however, little research has been done on the widely used big data analytics platforms. In this paper, we present an implementation of the Distributed Multi-dimensional Array Toolkit (DMAT) on top of the Apache Spark big data analytics platform. The toolkit supports several schemes for multidimensional array distribution, repartitioning, transposition, access, and data parallelism, with a variety of parallel execution templates. This paper introduces the software architecture and implementation of DMAT, and studies the performance characteristics of some typical multidimensional array operations under different configurations.
Citations: 2
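The DMAT abstract above mentions distributing a multidimensional array across workers. As a rough illustration of what a block distribution involves, the following sketch splits each axis into nearly equal contiguous ranges and maps a global index to the block that owns it. It does not use Spark and is not DMAT's API; the function names `block_bounds`, `partition`, and `owner` are hypothetical, chosen only for this example.

```python
from itertools import product


def block_bounds(extent, num_blocks, block_index):
    """Return the half-open [start, stop) range of one block along one axis.

    The first `extent % num_blocks` blocks get one extra element, so block
    sizes differ by at most one.
    """
    base, extra = divmod(extent, num_blocks)
    start = block_index * base + min(block_index, extra)
    stop = start + base + (1 if block_index < extra else 0)
    return start, stop


def partition(shape, grid):
    """Map each block coordinate in `grid` to its per-axis index ranges."""
    blocks = {}
    for coord in product(*(range(g) for g in grid)):
        blocks[coord] = tuple(
            block_bounds(n, g, b) for n, g, b in zip(shape, grid, coord)
        )
    return blocks


def owner(index, shape, grid):
    """Return the block coordinate that holds a global element index."""
    coord = []
    for i, n, g in zip(index, shape, grid):
        base, extra = divmod(n, g)
        boundary = extra * (base + 1)  # first index owned by a "small" block
        if i < boundary:
            coord.append(i // (base + 1))
        else:
            coord.append(extra + (i - boundary) // base)
    return tuple(coord)
```

For a 10 x 4 x 6 volume on a 3 x 2 x 1 grid, `partition((10, 4, 6), (3, 2, 1))` yields six blocks, and `owner` tells a caller which block to ask for a given element. Repartitioning then amounts to moving data between two such layouts.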
Preliminary performance analysis of Hadoop 3.0.0-alpha3
2017 New York Scientific Data Summit (NYSDS) Pub Date : 2017-08-01 DOI: 10.1109/NYSDS.2017.8085052
Rohit G. Masur, Suzanne K. McIntosh
Abstract: The Apache Software Foundation has released an alpha version of Hadoop, Apache Hadoop 3.0.0-alpha3. This paper analyzes the performance of the new version relative to an older version, Apache Hadoop 2.7.3. The analysis runs various benchmark tests included in the Apache Hadoop distribution on the two versions, installed on different isolated virtual machines. The results of the test runs are detailed in this paper.
Citations: 2
Remote sensing data integration for mapping glacial extents
2017 New York Scientific Data Summit (NYSDS) Pub Date : 2017-08-01 DOI: 10.1109/NYSDS.2017.8085048
Daniel Cisek, M. Mahajan, M. Brown, David C. Genaway
Abstract: Glaciers serve as one of the most prominent natural indicators of climate change, given their sensitivity to even incremental changes in temperature. In recent years, mapping glacier extent has become accepted as an effective way to measure the effects of climate change. By measuring the extent of a glacier's terminus over multiple years, one can gain valuable information about how a glacier and its surrounding environment are changing over time. The glaciology community has recently embraced the advantages that Geographic Information Systems (GIS) offer for glacier mapping, with projects such as Global Land Ice Measurements from Space (GLIMS) establishing a global database for glacier data, including extent measurements. This paper proposes a workflow for integrating data from different platforms, including aerial imagery, natural-color satellite imagery, and multispectral imagery, to map glacier extent in a GIS. We focus on this workflow's ability to enhance longitudinal glacier studies by increasing the number of unique data sources that can be drawn from. To demonstrate the effectiveness of this method, we mapped the terminus extent of five glaciers in Alaska's Juneau Icefield over a thirty-year period (1981-2011), integrating data from three different airborne sensor platforms in our analysis.
Citations: 0
Making a case for high-bandwidth monitoring - a use case for analysis on the wire
2017 New York Scientific Data Summit (NYSDS) Pub Date : 2017-08-01 DOI: 10.1109/NYSDS.2017.8085037
M. Dephillips, D. Katramatos, Shilpi Bhattacharyya
Abstract: This paper describes current efforts to architect, research, develop, and test a next-generation, high-bandwidth network monitoring framework designed to handle the rigors of large scientific feeds. This framework will be capable of transparently capturing and analyzing network traffic in real time so as to enable early and rapid response to potential threats. We seek to adapt and integrate existing and ongoing work on streaming data analysis on the wire and on packet capture with real-time analytics using accelerators. Flow interrogation in real time will transparently divert selected network flows to an attached computing infrastructure and subject them to processing and analysis. With acceptable quality of service (QoS), this system will detect suspicious activities: innocent flows are allowed to proceed to their original destination, while suspicious flows are either dropped or further processed and monitored with appropriate storage and analysis. The aim is to go beyond detecting the preponderance of attack vectors to identifying all attack vectors, including the subtle methods of Advanced Persistent Threats (APTs). Although the existing systems are hard to hack, with no direct monitoring or air gap, a determined adversary such as an APT could find a way onto a government network.
Citations: 1
Parallelizing X-ray photon correlation spectroscopy software tools using Python multiprocessing
2017 New York Scientific Data Summit (NYSDS) Pub Date : 2017-08-01 DOI: 10.1109/NYSDS.2017.8085042
Sameera K. Abeykoon, Meifeng Lin, K. K. van Dam
Abstract: Third-generation synchrotron facilities designed to deliver highly intense and bright X-ray beams, along with new area detectors capable of achieving high dynamic ratios and fast frame rates, have enabled novel coherent X-ray scattering experiments. X-ray Photon Correlation Spectroscopy is one such technique; it measures nano- and mesoscale dynamics in materials. The scikit-beam Python analysis library developed at the National Synchrotron Light Source-II at Brookhaven National Laboratory contains a serial version of X-ray Photon Correlation Spectroscopy software tools to perform streaming analysis of the structural dynamics of materials, which can be time consuming given the anticipated fast data rates and high image resolutions at the National Synchrotron Light Source-II. Therefore, it is essential to parallelize these data analysis tools to achieve the best performance on the available workstations that contain multi-core processors. In this paper, we report the progress that we have made in using the Python multiprocessing module to parallelize the time-correlation functions in scikit-beam. We compare the results from different multiprocessing approaches and discuss the pros and cons of each method.
Citations: 3
Visualization of higher genus carbon nanomaterials: free energy, persistent current, and entanglement entropy
2017 New York Scientific Data Summit (NYSDS) Pub Date : 2017-08-01 DOI: 10.1109/NYSDS.2017.8085055
T. Duong, M. McGuigan
Abstract: The goal of this project is to explore computational nanoscience. We theoretically investigate the fascinating quantum structures, as well as the electrical and chemical properties, of carbon-based nanomaterials. We examine the toroidal topology of a tight-binding model of carbon-based nanomaterials with magnetic flux using visualization and simulation, with applications to nanoelectronics and beyond-Moore's-Law computing. Our method starts with constructing a 3D structure with a molecular editing and modeling program, such as Avogadro or Visual Molecular Dynamics (VMD). Then we generate an adjacency matrix of each structure using C++ code and determine the eigenvalues from the adjacency matrix. Finally, we animate these nanomaterial properties at a finite temperature, density, and flux potential to observe how the free energy, persistent current, and entanglement entropy change for each of the carbon-based nanomaterial structures, such as the ring, double ring, Möbius strip, nanotorus, and double nanotorus. Additionally, each illustrative calculation will be exported as a 3D animated plot video, and the geometry of each material will be 3D printed and viewed on virtual reality (VR) headsets using UnityMol as well as on high-resolution displays.
Citations: 2
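The pipeline in the abstract above (adjacency matrix, then eigenvalues, then thermodynamic quantities) can be illustrated on the simplest structure it lists, a single ring. The sketch below is an assumption-laden toy in Python rather than the authors' C++ code: it uses the textbook single-orbital tight-binding ring with a Peierls phase, takes the flux in units of the flux quantum, sets k_B = 1, and uses the grand potential of non-interacting fermions as the free energy. All function names are hypothetical.

```python
import cmath
import math


def ring_hamiltonian(n_sites, flux, hopping=1.0):
    """Tight-binding Hamiltonian of an n-site ring threaded by `flux`.

    The ring's adjacency matrix is weighted by -hopping and a Peierls phase
    of 2*pi*flux/n_sites per bond. Requires n_sites >= 3.
    """
    theta = 2 * math.pi * flux / n_sites
    h = [[0j] * n_sites for _ in range(n_sites)]
    for j in range(n_sites):
        k = (j + 1) % n_sites
        h[j][k] = -hopping * cmath.exp(1j * theta)
        h[k][j] = -hopping * cmath.exp(-1j * theta)  # Hermitian conjugate
    return h


def ring_eigenvalues(n_sites, flux, hopping=1.0):
    """Closed-form spectrum E_k = -2 t cos(2 pi (k + flux) / N), sorted."""
    return sorted(
        -2 * hopping * math.cos(2 * math.pi * (k + flux) / n_sites)
        for k in range(n_sites)
    )


def residual(h, k, energy):
    """Largest |(H psi - E psi)_j| for the plane wave psi_j = exp(2 pi i k j / N)."""
    n = len(h)
    psi = [cmath.exp(2j * math.pi * k * j / n) for j in range(n)]
    return max(
        abs(sum(h[j][m] * psi[m] for m in range(n)) - energy * psi[j])
        for j in range(n)
    )


def free_energy(energies, temperature, mu=0.0):
    """Grand potential of non-interacting fermions, with k_B = 1."""
    return -temperature * sum(
        math.log1p(math.exp(-(e - mu) / temperature)) for e in energies
    )


def persistent_current(n_sites, flux, temperature, step=1e-5):
    """Central-difference estimate of I = -dF/d(flux)."""
    upper = free_energy(ring_eigenvalues(n_sites, flux + step), temperature)
    lower = free_energy(ring_eigenvalues(n_sites, flux - step), temperature)
    return -(upper - lower) / (2 * step)
```

`residual` checks the closed-form eigenvalues against the explicit matrix, which is a cheap sanity test before moving to structures (Möbius strip, nanotorus) where no simple closed form is assumed and a numerical eigensolver would be needed instead.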
Near real time ETEM streaming video analysis
2017 New York Scientific Data Summit (NYSDS) Pub Date : 2017-08-01 DOI: 10.1109/NYSDS.2017.8085054
Yuewei Lin, D. Zakharov, R. Mégret, Shinjae Yoo, E. Stach
Abstract: Environmental Transmission Electron Microscopy (ETEM) provides a powerful tool for observing the formation and evolution of nanoparticles over time. However, ETEM generates extremely large amounts of data, at the level of 3 GB/s, which is impossible to analyze manually or even with a single PC. Moreover, the image stream obtained from the ETEM is very noisy. In this project, our goal is to automatically analyze the physical characteristics of the nanoparticles. We propose an approach that detects the nanoparticles in each frame and then tracks all of them over time, so that we can analyze their dynamic physical characteristics, such as merging, absorption, and changes in size and distance over time. Specifically, our proposed approach detects the nanoparticles in each frame independently, so detection can be highly parallelized. The experimental results show that the proposed model can detect and track nanoparticles robustly.
Citations: 0
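The detect-then-track structure described in the ETEM abstract above can be sketched in a few lines. The abstract does not say which detector or tracker the authors use, so the two techniques here are stand-ins chosen for brevity: intensity thresholding with 4-connected component labeling for detection, and greedy nearest-neighbour matching of centroids for tracking. The names `detect_particles` and `link` are hypothetical.

```python
from collections import deque
import math


def detect_particles(frame, threshold):
    """Return centroids (row, col) of 4-connected regions brighter than threshold."""
    rows, cols = len(frame), len(frame[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or frame[r][c] <= threshold:
                continue
            # Breadth-first flood fill collects one connected bright region.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and frame[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            centroids.append((
                sum(p[0] for p in pixels) / len(pixels),
                sum(p[1] for p in pixels) / len(pixels),
            ))
    return centroids


def link(previous, current, max_distance):
    """Greedily match each previous centroid to its nearest unused current one.

    Returns a list of (previous_index, current_index) pairs; centroids farther
    apart than max_distance are left unmatched.
    """
    pairs, used = [], set()
    for i, p in enumerate(previous):
        best, best_d = None, max_distance
        for j, q in enumerate(current):
            d = math.dist(p, q)
            if j not in used and d <= best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs
```

Because `detect_particles` looks at one frame at a time, it can be mapped over frames in parallel; only `link` needs consecutive frames in order. Unmatched centroids would be the starting point for spotting events such as merging, which this sketch does not attempt.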
Sensor network-based wind field estimation using deep learning
2017 New York Scientific Data Summit (NYSDS) Pub Date : 2017-08-01 DOI: 10.1109/NYSDS.2017.8085047
Daniel Lee, Daniel Cisek, Shinjae Yoo
Abstract: The incorporation of wind fields, or the movement of clouds, significantly improves the accuracy of time-series-based solar irradiance prediction models. Resolving the cost and accuracy problems of current wind field estimation methods poses two challenges: estimating wind fields using only solar irradiance sensor networks, and evaluating the performance of the resulting models. We propose a cost-effective and reliable method to estimate wind fields through the application of deep learning and computational geometry algorithms. Using a realistic cloud simulator, validation datasets for the proposed model were generated, accounting for various complex factors that directly impact sensor data, including the topology of sensor placement, changing wind speed and direction, and cloud density. Preliminary qualitative and quantitative results indicate promising potential for practical deployment as an estimation model.
Citations: 2
Building near-real-time processing pipelines with the Spark-MPI platform
2017 New York Scientific Data Summit (NYSDS) Pub Date : 2017-08-01 DOI: 10.1109/NYSDS.2017.8085039
N. Malitsky, Aashish Chaudhary, S. Jourdain, Matt Cowan, P. O'Leary, M. Hanwell, K. K. Dam
Abstract: Advances in detectors and computational technologies provide new opportunities for applied research and the fundamental sciences. Concurrently, dramatic increases in the three V's (Volume, Velocity, and Variety) of experimental data and in the scale of computational tasks have produced demand for new real-time processing systems at experimental facilities. Recently, this demand was addressed by the Spark-MPI approach, which connects the Spark data-intensive platform with the MPI high-performance framework. In contrast with existing data management and analytics systems, Spark introduced a new middleware based on resilient distributed datasets (RDDs), which decoupled various data sources from high-level processing algorithms. The RDD middleware significantly advanced the scope of data-intensive applications, ranging from SQL queries to machine learning to graph processing. Spark-MPI further extended the Spark ecosystem with MPI applications using the Process Management Interface. The paper explores this integrated platform within the context of online ptychographic and tomographic reconstruction pipelines.
Citations: 10