2016 IEEE International Congress on Big Data (BigData Congress): Latest Articles

DeepSky: Identifying Absorption Bumps via Deep Learning
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.34
Xiaoyong Yuan, Min Li, Sudeep Gaddam, Xiaolin Li, Yinan Zhao, Jingzhe Ma, J. Ge
Abstract: The pervasive interstellar grains provide significant insights that help us understand the formation and evolution of stars, planetary systems, and galaxies, and could potentially lead us to the secret of the origin of life. One of the most effective ways to analyze the dust is via its interaction with, and interference on, background light. The observable extinction curves and spectral features carry information about the size and composition of the dust. Among these features, the broad 2175 Å absorption bump is one of the most significant spectroscopic interstellar extinction features. Traditionally, astronomers apply conventional statistical and signal processing techniques to detect the existence of absorption bumps. These approaches require labor-intensive preprocessing and the co-existence of other reference features to alleviate the influence of noise. Conventional approaches not only involve substantial labor cost in complicated workflows, but also demand well-trained expertise to make subtle and error-prone conditional decisions. In this paper, we propose to leverage deep learning to automate the detection workflow without minute feature engineering. We design and analyze deep convolutional neural networks for detecting absorption bumps. We further propose the framework of deep learning mechanisms and models (collectively called DeepSky) for scientific discovery in astronomy. The prototype of DeepSky demonstrates efficient and effective results using limited labeled data. With well-designed data augmentation, our trained model achieved about 99% accuracy in prediction using the real-world data.
Citations: 2
A NoSQL Data Model for Scalable Big Data Workflow Execution
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.15
Aravind Mohan, M. Ebrahimi, Shiyong Lu, Alexander Kotov
Abstract: While big data workflows have been proposed recently as the next-generation data-centric workflow paradigm to process and analyze data of ever-increasing scale, complexity, and rate of acquisition, a scalable distributed data model that abstracts and automates data distribution, parallelism, and scalable processing is still missing. Meanwhile, although NoSQL data models have emerged as a new category of data models, they are optimized for storing and querying large datasets, not for ad-hoc data analysis where data placement and data movement are necessary for optimized workflow execution. In this paper, we propose a NoSQL data model that: 1) supports high-performance MapReduce-style workflows that automate data partitioning and data-parallel execution (in contrast to the traditional MapReduce framework, our MapReduce-style workflows are fully composable with other workflows, enabling dataflow applications with a richer structure); 2) automates virtual machine provisioning and deprovisioning on demand according to the sizes of input datasets; and 3) enables a flexible framework for workflow executors that take advantage of the proposed NoSQL data model to improve the performance of workflow execution. Our case studies and experiments show the competitive advantages of our proposed data model. The proposed NoSQL data model is implemented in a new release of DATAVIEW, one of the most usable big data workflow systems in the community.
Citations: 13
Do Drivers' Behaviors Reflect Their Past Driving Histories? - Large Scale Examination of Vehicle Recorder Data
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.58
Daisaku Yokoyama, Masashi Toyoda
Abstract: We present a method for analyzing the relationships between driver characteristics and driving behaviors on the basis of large-scale and long-term vehicle recorder data. Previous studies relied on precise data obtained under critical driving situations, which led to overlooking routine driving behaviors. In contrast, we used a dataset that was sparse but large-scale (over 100 fleet drivers), long-term (one year's worth), and covered all driving operations. We focused on classifying drivers by their accident history and examined the correlation between having an accident and driving behavior. We were able to reliably predict whether a driver had recently experienced an accident (f-measure > 86%). This level of performance cannot be achieved using only the drivers' demographic information. We also found that taking into account the driving circumstances improved classification performance and that driving operations at low velocity were more informative. This method can be used, for example, by fleet driver management to classify drivers by their skill level, safety, physical/mental fatigue, aggressiveness, and so on.
Citations: 3
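The f-measure quoted in the abstract above is conventionally the weighted harmonic mean of precision and recall. The following is a minimal sketch of that metric, assuming the standard F-beta definition (the abstract does not say which variant the authors used). The function name `f_measure` and the confusion counts in the usage lines are hypothetical and are not taken from the paper.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """Return the F-beta score from true positives, false positives, false negatives.

    With beta=1.0 this is the usual F1 score, the harmonic mean of
    precision and recall.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for a "recent accident" classifier (illustration only).
score = f_measure(tp=45, fp=5, fn=8)
print(f"F1 = {score:.3f}")
```

Because the harmonic mean is pulled toward the smaller of precision and recall, a high score of this kind requires both to be high at the same time.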
Continual and Cost-Effective Partitioning of Dynamic Graphs for Optimizing Big Graph Processing Systems
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.12
Amir Abdolrashidi, Lakshmish Ramaswamy
Abstract: Recently, several cluster computing frameworks have been proposed for scalable and efficient processing of big graphs. The manner in which graph data is partitioned and placed on the compute nodes has a significant impact on cluster performance. While most existing graph partitioning and placement strategies have been designed for static graphs, the graphs in many modern applications are dynamic (time-evolving). In this paper, we propose a unique, continuous and multi-cost sensitive approach for partitioning dynamic graphs. Our approach incorporates novel cost functions that take into account major factors that impact the performance of big graph processing clusters. We also present incremental algorithms to efficaciously handle various types of graph dynamics. Our algorithms are unique in that they work by locally adjusting the partitions, thus avoiding massive repartitioning. This paper reports a series of experiments to demonstrate the effectiveness of the proposed algorithms in maximizing the performance of big graph processing systems on dynamic graphs.
Citations: 12
Semi-clustering That Scales: An Empirical Evaluation of GraphX
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.51
J. S. Andersen, O. Zukunft
Abstract: GraphX is a distributed graph processing framework built on top of Spark Core. This work investigates two questions: whether GraphX is an appropriate environment for the implementation of graph algorithms, and how the computation of graph algorithms based on GraphX scales. This paper examines a graph algorithm for semi-clustering as used in social network analysis. We describe the implementation process of this algorithm, beginning with a graph-oriented modeling tailored for GraphX up to an executable program. Based on our implementation, we have performed empirical evaluations regarding the scalability of our implementation and the GraphX platform. The experiments show that different kinds of graph algorithms are supported by GraphX and that the execution of our algorithm can scale almost linearly when properly designed.
Citations: 4
Workflow Transformation for Real-Time Big Data Processing
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-01-15 DOI: 10.1109/BigDataCongress.2016.47
Yuji Ishizuka, Wuhui Chen, Incheon Paik
Abstract: With the explosion of big data, processing and analyzing large numbers of continuous data streams in real time, such as social media streams, sensor data streams, log streams, and stock exchange streams, has become a crucial requirement for many scientific and industrial applications in recent years. The increased volume of streaming data, as well as the demand for more complex real-time analytics, requires the execution of processing pipelines among heterogeneous event processing engines as a workflow. In this paper, we propose a workflow transformation for cost minimization in real-time big data processing on heterogeneous systems. We first give the definition of a stream-based workflow. We then define eight different patterns as rules for workflow transformation. Next, we give our workflow transformation algorithm based on our designed rules. Finally, our experiment shows that our proposed workflow transformation method can reduce the communication and computation cost effectively.
Citations: 11
A Case Study of Optimizing Big Data Analytical Stacks Using Structured Data Shuffling
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.19
Dixin Tang, Taoying Liu, Rubao Lee, Hong Liu, Wei Li
Abstract: Current major big data analytical stacks often consist of a general-purpose, multi-staged cluster computation framework (e.g. Hadoop) and a SQL query execution system (e.g. Hive) on top of it. In such stacks, a key factor of query execution performance is the efficiency of data shuffling between two execution stages (e.g. Map/Reduce). However, current stacks often execute data shuffling in a data-oblivious way, which means that for structured data processing, various useful information about the shuffled data and the queries on the data is simply wasted. Specifically, this problem causes two optimization opportunities to be lost: i) unnecessary records cannot be filtered in advance, and ii) column-oriented compression algorithms cannot be applied. To solve the problem, in this paper, we have designed and implemented a novel data shuffling mechanism in Hadoop, called Structured Data Shuffling (S-Shuffle), which avoids the low efficiencies of traditional data shuffling by carefully leveraging the rich information in data and queries provided by Hive. Our experimental results with the industry-standard TPC-H benchmark show that by using S-Shuffle, the performance of SQL query processing on Hadoop can be improved by up to 2.4x.
Citations: 3
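The two lost optimization opportunities named in the S-Shuffle abstract above (filtering unnecessary records before the shuffle, and compressing column by column) can be illustrated with a small toy. This is only a sketch under stated assumptions, not the authors' Hadoop implementation: the function names, the sample rows, and the query are all hypothetical, and generic `zlib` compression over a column-wise layout stands in for a real column-oriented compression algorithm.

```python
import json
import zlib


def oblivious_shuffle(rows):
    """Data-oblivious baseline: serialize every row in full, row by row."""
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def structured_shuffle(rows, predicate, columns):
    """Query-aware variant: drop rows and columns the query does not need,
    then lay the remaining values out column by column before compressing."""
    kept = [r for r in rows if predicate(r)]
    by_column = {c: [r[c] for r in kept] for c in columns}
    return zlib.compress(json.dumps(by_column).encode("utf-8"))


def read_structured(payload):
    """Rebuild row dictionaries from the column-wise payload on the receiving side."""
    by_column = json.loads(zlib.decompress(payload).decode("utf-8"))
    names = list(by_column)
    count = len(by_column[names[0]]) if names else 0
    return [{c: by_column[c][i] for c in names} for i in range(count)]


# Hypothetical map-side output: 1000 order rows with four fields each.
rows = [
    {"order_id": i,
     "status": "F" if i % 4 == 0 else "O",
     "price": 100 + i % 7,
     "comment": "note %d" % i}
    for i in range(1000)
]

# Hypothetical query: SELECT order_id, price ... WHERE status = 'F'
wanted = lambda r: r["status"] == "F"
needed_columns = ["order_id", "price"]

plain = oblivious_shuffle(rows)
structured = structured_shuffle(rows, wanted, needed_columns)
print(len(plain), len(structured))
```

In this toy the saving comes mostly from shipping fewer rows and fewer columns; how much a real system gains depends on the query mix and the data.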