2016 IEEE International Congress on Big Data (BigData Congress): Latest Papers

Modeling the Location Selection of Mirror Servers in Content Delivery Networks
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.68
Peter Hillmann, Tobias Uhlig, G. Rodosek, O. Rose
Abstract: For a provider of a Content Delivery Network (CDN), the location selection of mirror servers is a complex optimization problem. Generally, the objective is to place the nodes centrally so that all customers have convenient access to the service according to their demands. It is an instance of the k-center problem, which is proven to be NP-hard. Determining reasonable server locations directly influences run-time effects and future service costs. We model, simulate, and optimize the properties of a content delivery network. Specifically, we consider the server locations in a network infrastructure with prioritized customers and weighted connections. A simulation model for the servers is necessary to analyze the caching behavior in accordance with the targeted customer requests. We analyze the problem and compare different optimization strategies. For our simulation, we employ various realistic scenarios and evaluate several performance indicators. Our new optimization approach shows a significant improvement. The presented results are generally applicable to other domains with k-center problems, e.g., the placement of military bases, the planning and placement of facility locations, or data mining.
Citations: 7
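The k-center problem named in the abstract above can be illustrated with a short sketch. This is not the authors' optimization approach, which the listing does not describe; it is the textbook farthest-first greedy heuristic, a 2-approximation for metric k-center. The function name `greedy_k_center`, the sample coordinates, and the use of unweighted Euclidean distance are all assumptions made for illustration (the paper itself considers prioritized customers and weighted connections).

```python
import math

def greedy_k_center(points, k):
    """Farthest-first traversal: pick k centers so that the largest
    distance from any point to its nearest center stays small."""
    centers = [points[0]]  # arbitrary first center (assumption)
    # dist[i] = distance from points[i] to its nearest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # the next center is the point currently served worst
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers, max(dist)

# Hypothetical customer locations on a plane
points = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 8)]
centers, radius = greedy_k_center(points, 3)
print(centers, radius)
```

Here `radius` is the worst-case distance from a customer to its nearest mirror server, which is the quantity the k-center objective minimizes.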
Design and Implementation of a Multidimensional Data Retrieval Sorting Optimization Model
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.38
Danfeng Yan, Liying Zhang, Xuan Zhao
Abstract: Currently, how to accurately and quickly locate required information in massive network data, especially in the currently popular social network data, is the focus of data retrieval services. Based on traditional data retrieval sorting technology, this paper proposes a multi-dimensional data retrieval sorting optimization model that considers the characteristics of data, users, and applications. This paper also implements the model in a financial microblog data retrieval system. It enables the retrieval system to sort the results according to the characteristics of the microblog data, users' real query intentions, and the financial tendency of the system. Finally, this paper shows the basic test results and discusses future research.
Citations: 1
Open Source Big Data Analytics Frameworks Written in Scala
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.61
J. Miller, Casey N. Bowman, V. Harish, Shannon P. Quinn
Abstract: Frameworks for big data arguably began with Google's use of MapReduce. Since then, a huge amount of progress has been made in the development of big data frameworks, many of which have been released as open source. Further, to increase portability and ease of set-up, many are coded in a Java Virtual Machine (JVM) based language, e.g., Java or Scala. In addition, processing of big data involves the flow of data and, of course, the processing of data as it flows. This computational paradigm is a natural fit for functional programming. Furthermore, the map, reduce, and combiner have analogs in functional programming. There has been a trend in the last few years toward developing open source big data frameworks written in Scala to support big data analytics. Scala is a modern JVM language that supports both object-oriented and functional programming paradigms.
Citations: 16
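The abstract above notes that map, reduce, and combiner have analogs in functional programming. A minimal word-count sketch of that correspondence follows, written in Python rather than Scala and not tied to any framework the paper surveys. The helper names `map_phase`, `combine`, and `merge`, and the sample partitions, are hypothetical.

```python
from collections import Counter
from functools import reduce

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def combine(pairs):
    """Combiner: pre-aggregate pairs locally, within one partition."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts

def merge(a, b):
    """Reduce step: merge two partial counts into one."""
    return a + b

# Two hypothetical partitions, as if stored on two worker nodes
partitions = [["big data flows", "data flows fast"], ["big data"]]
partials = [
    combine(pair for line in part for pair in map_phase(line))
    for part in partitions
]
word_counts = reduce(merge, partials, Counter())
print(word_counts)
```

Because `merge` is associative, the partial counts can be folded in any grouping, which is the property that lets frameworks run the reduce step in parallel.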
Complex Quality of Service Lifecycle Assessment Methodology
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.71
R. Maule
Abstract: Large-scale systems engineering projects involving hundreds of independent systems with complex systems integration requirements and high levels of security necessitate specialized analytics methodology to ensure systems readiness across their operational lifecycle. This includes assessment of systems, components, processes and services over time, and in the range of technical, operational and environmental contexts in which the service will operate. This paper presents a quality of service audit method for assessment of complex integrated services.
Citations: 0
Privacy-Aware Big Data Warehouse Architecture
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.53
Karthik Navuluri, R. Mukkamala, Aftab Ahmad
Abstract: Along with the ever-increasing growth in data collection and mining, there is an increasing fear of compromising individual and population privacy. Several techniques have been proposed in the literature to preserve the privacy of collected data during storage and processing. In this paper, we propose a privacy-aware architecture for storing and processing data in a Big Data warehouse. In particular, we propose a flexible, extendable, and adaptable architecture that enforces user-specified privacy requirements in the form of Embedded Privacy Agreements. The paper discusses the architecture along with some implementation details.
Citations: 8
Identification as a Service: Large-Scale Cloud Service Discovery over the World Wide Web
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.74
Abdullah Alfazi, Quan Z. Sheng, W. Zhang, Lina Yao, Talal H. Noor
Abstract: Cloud computing is provisioned with high flexibility with regard to on-demand infrastructures, platforms, and software as services through the Internet. The unique characteristics of cloud services, such as dynamic and diverse service offerings at different levels and the lack of standardized description, are becoming important challenges in efficiently discovering cloud services for customers. In this paper, we propose a cloud service search engine that can automatically identify cloud services, aiming to improve accuracy when searching for cloud services in real environments. Our search engine can detect cloud services effectively from Web sources. Furthermore, we focus on learning cloud service features, such as similarity function, semantic ontology, and cloud service components, to identify cloud services. We use a real cloud service dataset to build an identifier. Our cloud service identifier can be used to automatically determine, with high accuracy, whether a given Web source is a cloud service.
Citations: 6
Towards an Efficient Top-K Trajectory Similarity Query Processing Algorithm for Big Trajectory Data on GPGPUs
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.33
Eleazar Leal, L. Gruenwald, Jianting Zhang, Simin You
Abstract: Through the use of location-sensing devices, it has been possible to collect very large datasets of trajectories. These datasets make it possible to issue spatio-temporal queries with which users can gather information about the characteristics of the movements of objects, derive patterns from that information, and understand the objects themselves. Among the spatio-temporal queries that can be issued is the top-K trajectory similarity query. This query finds many applications, such as bird migration analysis in ecology and trajectory sharing in social networks. However, the large size of the trajectory query sets and databases poses significant computational challenges. In this work, we propose a parallel GPGPU algorithm, Top-KaBT, that is specifically designed to reduce the size of the candidate set generated while processing these queries, and in doing so strives to address these computational challenges. The experiments show that the state-of-the-art top-K trajectory similarity query processing algorithm on GPGPUs, TKSimGPU, achieves a 6.44X speedup in query processing time when combined with our algorithm and a 13X speedup over a GPGPU algorithm that uses exhaustive search.
Citations: 2
Don't Fire Me, a Kernel Autoregressive Hybrid Model for Optimal Layoff Plan
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.72
Zhiling Luo, Ying Li, Ruisheng Fu, Jianwei Yin
Abstract: Job cutting occurs when a modern service enterprise reduces its labour cost by laying off some staff. Making an appropriate layoff plan is always quite difficult, since a bad job cut has a serious impact not only on the organization but also on the execution efficiency of the business process. Therefore, in this paper, we address the problem of making an optimal layoff plan with the least influence on the execution of the business process. The key challenge is estimating the process throughput under a layoff plan. We overcome this challenge in two steps: regressing the activity throughput on the staff number, and inferring process throughput with the maximum flow or minimum cut algorithm on the Directed Acyclic Graph of the process. In the regression step, a kernel autoregressive hybrid model is proposed, whose MSE is 30% lower than that of SVM. After that, an augmenting-path-based algorithm is introduced to make an optimal layoff plan. To evaluate the accuracy of our model, we conduct an external experiment on a real dataset from the workflow system employed in the government of Hangzhou City in China, which yields 9750969 logs from 2050 activities and 16295 employees over two years.
Citations: 1
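The second step described in the abstract above, inferring process throughput by maximum flow on the process DAG, can be sketched with a standard augmenting-path method. The code below is the generic Edmonds-Karp algorithm, not the authors' layoff-planning algorithm. The activity names, the capacity values, and the choice to attach throughput to edges rather than to activities are assumptions for illustration only.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting paths."""
    # Build the residual graph, adding zero-capacity reverse edges.
    residual = {u: dict(vs) for u, vs in capacity.items()}
    for u, vs in capacity.items():
        for v in vs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path is left
        # Find the bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount and update residual capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical process DAG: cases per day each hand-off can sustain
capacity = {
    "start": {"review": 40, "audit": 30},
    "review": {"approve": 25},
    "audit": {"approve": 35},
    "approve": {"end": 60},
    "end": {},
}
throughput = max_flow(capacity, "start", "end")
print(throughput)
```

In this toy graph the limit comes from the `review -> approve` and `start -> audit` edges, so lowering a capacity elsewhere (for example through a staffing change) would leave the overall throughput unchanged until it becomes the new bottleneck.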
Geelytics: Enabling On-Demand Edge Analytics over Scoped Data Sources
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.21
Bin Cheng, Apostolos Papageorgiou, M. Bauer
Abstract: Large-scale Internet of Things (IoT) systems typically consist of a large number of sensors and actuators distributed geographically in a physical environment. To react quickly to real-time situations, it is often necessary to bridge sensors and actuators via real-time stream processing close to IoT devices. Existing stream processing platforms like Apache Storm and S4 are designed for intensive stream processing in a cluster or in the Cloud, but they are unsuitable for large-scale IoT systems in which processing tasks are expected to be triggered by actuators on demand and then be allocated and performed in a Cloud-Edge environment. To fill this gap, we designed and implemented a new system called Geelytics, which can enable on-demand edge analytics over scoped data sources via IoT-friendly interfaces to sensors and actuators. This paper presents its design, implementation, interfaces, and core algorithms. Three example applications have been built to showcase the potential of Geelytics in enabling advanced IoT edge analytics. Our preliminary evaluation results demonstrate that we can reduce the bandwidth cost by 99% in a face detection example, achieve less than 10 milliseconds reaction latency and about 1.5 seconds startup latency in an outlier detection example, and save 65% of duplicated computation cost by sharing intermediate results in a data aggregation example.
Citations: 35
Infra: SLO Aware Elastic Auto-scaling in the Cloud for Cost Reduction
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.25
Subhajit Sidhanta, S. Mukhopadhyay
Abstract: Enterprises often host applications and services on clusters of virtual machine instances provided by cloud service providers, like Amazon, Rackspace, Microsoft, etc. Users pay a cloud usage cost on the basis of the hourly usage [1] of the virtual machine instances composing the cluster. A cluster composition refers to the number of virtual machine instances of each type (from a predefined list of types) comprising a cluster. We present Infra, a cloud provisioning framework that can predict an (ϵ, δ)-minimum cluster composition required to run a given application workload on a cloud under an SLO (i.e., Service Level Objective) deadline. This paper does not present a new approximation algorithm; instead, we provide a tool that applies existing machine learning techniques to predict an (ϵ, δ)-minimum cluster composition. An (ϵ, δ)-minimum cluster composition is a cluster composition whose cost approximates that of the minimum cluster composition (i.e., the cluster composition that incurs the minimum cloud usage cost needed to execute a given application under an SLO deadline); the approximation bounds the error to a predefined threshold ϵ with a degree of confidence 100 * (1 - δ)%. The degree of confidence 100 * (1 - δ)% specifies that the probability of failing to achieve the error threshold ϵ for the above approximation is at most δ. For ϵ = 0.1 and δ = 0.02, we experimentally demonstrate that an (ϵ, δ)-minimum cluster composition predicted by Infra successfully approximates the minimum cluster composition, i.e., the accuracy of prediction of the minimum cluster composition ranges from 93.1% to 97.99% (the error is bounded by the error threshold of 0.1) with a 98% degree of confidence, since 100 * (1 - δ) = 98%. Auto scaling refers to the process of automatically adding cloud instances to a cluster to adapt to an increase in application workload (increased request rate), and deleting instances from a cluster when there is a decrease in workload (reduced request rate). However, state-of-the-art auto scaling techniques have the following disadvantages: A) they require explicit policy definition for changing the cluster configuration and therefore lack the ability to automatically adapt a cluster to a changing workload; B) they do not compute the appropriate size of resources required, and therefore do not result in an "optimal" cluster composition. Infra provides an auto scaler that automatically adapts a cloud infrastructure to a changing application workload, scaling the cluster up or down based on predictions from the Infra provisioning tool.
Citations: 2
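The (ϵ, δ) guarantee described in the abstract above can be written compactly. The symbols $C^{*}$ (cost of the minimum cluster composition) and $\hat{C}$ (cost of the predicted composition) are introduced here for illustration, and measuring the error relative to $C^{*}$ is an assumption; the abstract does not state how the error is normalized.

```latex
% Probability of staying within the error threshold is at least 1 - delta
\Pr\!\left[\frac{\lvert \hat{C} - C^{*} \rvert}{C^{*}} \le \epsilon\right] \ge 1 - \delta
% With the values reported in the abstract:
\epsilon = 0.1,\ \delta = 0.02
\;\Longrightarrow\;
\Pr\!\left[\frac{\lvert \hat{C} - C^{*} \rvert}{C^{*}} \le 0.1\right] \ge 0.98
```

Under this reading, the reported accuracy range of 93.1% to 97.99% corresponds to errors between roughly 2% and 7%, all within the threshold of 0.1.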