Automated X-ray diffraction of irradiated materials
J. A. Rodman, Yuewei Lin, D. Sprouster, L. Ecker, Shinjae Yoo
2017 New York Scientific Data Summit (NYSDS), August 2017. DOI: 10.1109/NYSDS.2017.8085053
Abstract: Synchrotron-based X-ray diffraction (XRD) and small-angle X-ray scattering (SAXS) characterization of unirradiated and irradiated reactor pressure vessel steels yields large amounts of data. Machine learning techniques offer a novel way to analyze and visualize these large data sets in order to determine the effects of chemistry and irradiation conditions on the formation of radiation-induced precipitates. Before such analysis can run, the data must be preprocessed to convert it to a usable format and to mask the 2-D detector images to account for experimental variations. Once preprocessed, the data can be organized and visualized using principal component analysis (PCA), multi-dimensional scaling, and k-means clustering. These techniques show that sample chemistry has a notable effect on the formation of radiation-induced precipitates in reactor pressure vessel steels.
A transfer learning approach to parking lot classification in aerial imagery
Daniel Cisek, M. Mahajan, J. Dale, Susan Pepper, Yuewei Lin, Shinjae Yoo
2017 New York Scientific Data Summit (NYSDS), August 2017. DOI: 10.1109/NYSDS.2017.8085049
Abstract: The importance of satellite imagery analysis has increased dramatically over the last several years, keeping pace with rapid improvements in both remote sensing platforms and sensors. As the field expands, so does interest in using machine learning to automate parts of the imagery analyst's workflow. In this paper we address one aspect of this challenge: a method for the automatic extraction of parking lots from aerial imagery. To the best of our knowledge, no prior work has developed an end-to-end pipeline for this particular task. Because our dataset is small, and to accommodate the potentially limited size of future datasets, we propose a deep learning approach based on transfer learning: state-of-the-art convolutional neural networks (CNNs), pretrained on general image classification datasets, are fine-tuned on our custom dataset to establish a comprehensive benchmark for the task. Our method exhibits promising results for automatic parking lot extraction and is generalizable enough to work with different input types, including high-resolution aerial orthoimagery, satellite imagery, full motion video (FMV), and UAV imagery.
A scientific data provenance harvester for distributed applications
E. Stephan, B. Raju, T. Elsethagen, Line C. Pouchard, Carlos Gamboa
2017 New York Scientific Data Summit (NYSDS), August 2017. DOI: 10.1109/NYSDS.2017.8085041
Abstract: Data provenance gives scientists a way to observe how experimental data originates, conveys process history, and explains influential factors such as experimental rationale and the environmental factors reflected in system metrics measured at runtime. The US Department of Energy Office of Science Integrated end-to-end Performance Prediction and Diagnosis for Extreme Scientific Workflows (IPPD) project has developed a provenance harvester capable of collecting observations from the file-based evidence typically produced by distributed applications. To achieve this, file-based evidence is extracted and transformed into an intermediate data format, inspired in part by the W3C CSV on the Web recommendations, called the Harvester Provenance Application Interface (HAPI) syntax. This syntax provides a general means to pre-stage provenance into messages that are both human readable and capable of being written to a provenance store, the Provenance Environment (ProvEn). HAPI is being applied to harvest provenance from climate ensemble runs for the Accelerated Climate Modeling for Energy (ACME) project, funded under the U.S. Department of Energy's Office of Biological and Environmental Research (BER) Earth System Modeling (ESM) program. ACME informally provides provenance in native form through configuration files, directory structures, and log files that contain success/failure indicators, code traces, and performance measurements. Because of its generic format, HAPI is also being applied to harvest tabular job-management provenance from Belle II DIRAC scheduler relational database tables, as well as from other scientific applications that log provenance-related information.
{"title":"Statistical data reduction for streaming data","authors":"Kesheng Wu, Dongeun Lee, A. Sim, Jaesik Choi","doi":"10.1109/NYSDS.2017.8085035","DOIUrl":"https://doi.org/10.1109/NYSDS.2017.8085035","url":null,"abstract":"Bulk of the streaming data from scientific simulations and experiments consists of numerical values, and these values often change in unpredictable ways over a short time horizon. Such data values are known to be hard to compress, however, much of the random fluctuation is not essential to the scientific application and could therefore be removed without adverse impact. We have developed a compression technique based on statistical similarity that could reduce the storage requirement by over 100-fold while preserve prominent features in the data stream. We achieve these impressive compression ratios because most data blocks have similar probability distribution and could be reproduced from a small block. The core concept behind this work is the exchangeability in statistics. To create a practical compression algorithm, we choose to work with fixed size blocks and use Kolmogorov-Smirnov test to measure similarity. The resulting technique could be regarded as a dictionary-based compression scheme. In this paper, we describe the method and explore its effectiveness on two sets of application data. We pay particular attention to the Fourier components of the reconstructed data and show that in addition to preserving unique features in data it is also faithfully preserving the Fourier components whose periods extend more than a few blocks.","PeriodicalId":380859,"journal":{"name":"2017 New York Scientific Data Summit (NYSDS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124790411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Progressive clustering of big data with GPU acceleration and visualization
Jun Wang, E. Papenhausen, B. Wang, S. Ha, A. Zelenyuk, K. Mueller
2017 New York Scientific Data Summit (NYSDS), August 2017. DOI: 10.1109/NYSDS.2017.8085036
Abstract: Clustering has become an unavoidable step in big data analysis. It can arrange data into a compact format, making operations on big data manageable. However, clustering big data requires not only the capacity to handle data of large volume and high dimensionality, but also the ability to process streaming data, and both capabilities are underdeveloped in most current algorithms. Furthermore, big data processing is seldom interactive, which conflicts with users who seek answers immediately. The best one can do is process incrementally, so that partial and, hopefully, accurate results become available relatively quickly and are then progressively refined over time. We propose a clustering framework that uses multi-dimensional scaling for layout and GPU acceleration to accomplish these goals. Our domain application is the clustering of mass spectral data of individual aerosol particles, with 8 million data points of 450 dimensions each.
{"title":"Comparative study of deep learning framework in HPC environments","authors":"HamidReza Asaadi, B. Chapman","doi":"10.1109/NYSDS.2017.8085040","DOIUrl":"https://doi.org/10.1109/NYSDS.2017.8085040","url":null,"abstract":"The rise of machine learning and deep learning applications in recent years has resulted in the development of several specialized frameworks to design neural networks, train them and use them in production. The efforts toward scaling and tuning of such frameworks have coincided with the increasing popularity of heterogeneous architectures (e.g. GPUs and accelerators); and developers found that the iterative and highly concurrent nature of machine learning algorithms is a good fit for the offerings of such architectures. As a result, most machine learning and deep learning frameworks now support offloading features and job distribution among heterogeneous processing units. Despite increasing use of deep learning techniques in scientific computing, HPC architectures has not been a first-class requirement for framework designers and is missing in many cases. We have taken a first step toward understanding the behavior of deep learning frameworks in HPC environments by comparing the performance of such frameworks on a regular HPC cluster setup and their compatibility with cluster architecture. We also studied the support for HPC-specific features provided by each of the frameworks. In order to accomplish this, a set of tests to compare deep learning frameworks has been introduced as well. In addition to the performance results, we observed some design conflicts between these frameworks and the traditional HPC tool chain. Launching deep learning framework jobs using common HPC job schedulers is not straightforward. Also, limited HPC-specific hardware support by these frameworks results in scalability issues and high communication overhead when running in multi-node environments. We discuss the idea of adding native support for executing deep learning frameworks to HPC job schedulers as an example of such adjustments in more details.","PeriodicalId":380859,"journal":{"name":"2017 New York Scientific Data Summit (NYSDS)","volume":"10 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114448951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Capturing provenance as a diagnostic tool for workflow performance evaluation and optimization
Line C. Pouchard, A. Malik, H. V. Dam, C. Xie, W. Xu, K. K. van Dam
2017 New York Scientific Data Summit (NYSDS), August 2017. DOI: 10.1109/NYSDS.2017.8085043
Abstract: In extreme-scale computing environments such as the DOE Leadership Computing Facilities, scientific workflows are routinely used to coordinate software processes for the execution of complex computational applications that perform in-silico experiments. Monitoring the performance of workflows without simultaneously tracking provenance is not sufficient to understand variations between runs, configurations, versions of a code, and changes in an implemented stack and systems, i.e., the variability of performance-metrics data in their historical context. We take a provenance-based approach and demonstrate that provenance is useful as a tool for evaluating and optimizing workflow performance in extreme-scale HPC environments. We present Chimbuko, a framework for the analysis and visualization of the provenance of performance. Chimbuko implements a method for evaluating workflow performance from multiple components that enables the exploration of performance-metrics data at scale.
Machine learning aided prediction of family history of depression
Allen Liu, Bryant Liu, Daniel Lee, M. Weissman, J. Posner, Jiook Cha, Shinjae Yoo
2017 New York Scientific Data Summit (NYSDS), August 2017. DOI: 10.1109/NYSDS.2017.8085046
Abstract: The increased risk for psychopathology in the offspring of depressed parents is well documented. The brain may mediate the effects of familial depression risk on the offspring via shared genetic and environmental factors. Conventional brain-imaging studies testing this mediation effect primarily use a priori knowledge to select a subset of imaging-derived features. Despite existing positive results supporting the notion of the brain as an endophenotype for familial depression, no quantitative assessment has been made of the extent to which the complex brain structure contains information about familial depression. To this end, we aim here to predict whether an individual has a history of familial depression. We propose a data-driven, unbiased, and rigorous machine learning approach using multimodal brain features (e.g., grey matter morphometry based on T1-weighted images and structural connectome based on probabilistic diffusion tractography) to capture the complex representations of brain structure. We implemented logistic regression (LR) with regularization, a support vector machine (SVM), and a graph convolutional neural network (GCN). Our models show promising cross-validated classification accuracy: 97.78% (LR), 93.67% (SVM), and 89.58% (GCN). Brain features with the greatest weights in the models include regions previously implicated in the depression literature (e.g., the frontal-limbic emotion regulation circuit) as well as new regions. The results suggest a large impact of familial depression on brain structure and the connectome, and highlight the potential for data-driven prediction of psychopathology risk using cost-effective, simple, ubiquitous brain images.
{"title":"Visualization of Higgs potentials and decays from sources beyond the standard model including dark matter and extra dimensions","authors":"R. Miceli, M. McGuigan","doi":"10.1109/NYSDS.2017.8085051","DOIUrl":"https://doi.org/10.1109/NYSDS.2017.8085051","url":null,"abstract":"Because the Higgs particle interacts with so many different particles, the potential associated with it takes contributions from many different sectors. This makes it very difficult to calculate, even when dealing with a restricted number of components. Another concern is creating useful visualizations of these potentials, as visual inspection is one of the main ways that physicists can gain new insights about them. Our main project involved plotting various Higgs potentials from new physics beyond the Standard Mod-el, in ways that would illustrate their dependence on various parameters such as temperature, energy scale and coupling strength. We also exported these potentials as 3D models with the aim of displaying them in virtual reality. We will calculate and plot new potentials including contributions from sources like dark matter and its interactions which could be visible through astrophysics or LHC experiments. We plotted new types of potentials associated with extra dimensions, dynamical symmetry breaking, and hidden gauge sectors involving undiscovered fermions. Another part of the project concerned the setup and con-figuration of the Visualization Center in the Computational Science Initiative at BNL. The room is equipped with a graphics computer with dual GPUs powering 6 wall mounted televisions and two virtual reality headsets. The televisions are configured to work as a single large unit intended to display large animations and data visualizations. This setup should make it easier for scientists to interact with and draw meaning from data, such as the high energy physics models that we studied.","PeriodicalId":380859,"journal":{"name":"2017 New York Scientific Data Summit (NYSDS)","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124935859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Keyword extraction for document clustering using submodular optimization","authors":"Xi Zhang, K. Mueller, Shinjae Yoo","doi":"10.1109/NYSDS.2017.8085056","DOIUrl":"https://doi.org/10.1109/NYSDS.2017.8085056","url":null,"abstract":"With the rapid growth of information services, enormous amount of text corpus cannot simply be read and understand. Therefore, text clustering and visualization present a direct way to observe the documents as well as understand the topic by corresponding keywords. However, even a short paragraph contains a variety of words, which makes the keyword or topic extraction difficult to achieve. Therefore, we propose an algorithm to extract keywords efficiently and effectively, which makes use of the latent semantic indexing and submodular optimization. The visual layout allows users to simultaneously visualize (1) the overview of the whole dataset, (2) the detailed information in the specific scope of the collection of documents, and (3) the relationships of documents with their keywords.","PeriodicalId":380859,"journal":{"name":"2017 New York Scientific Data Summit (NYSDS)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117043452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}