2016 IEEE International Congress on Big Data (BigData Congress): Latest Papers

An OWL Ontology Representation for Machine-Learned Functions Using Linked Data
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.48
Jingyuan Xu, Hao Wang, Henry Trimbach
Abstract: This paper proposes a method to represent classifiers or learned regression functions using an OWL ontology. Also proposed are methods for finding an appropriate learned function to answer a simple query. The ontology standardizes variable names and dependence properties, so that feature values can be given by users or found on the semantic web.
Citations: 1
Predictive Modeling in a Big Data Distributed Setting: A Scalable Bias Correction Approach
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.17
Gianluca Bontempi, Y. Borgne
Abstract: Massive datasets are becoming pervasive in computational sciences. Though this opens new perspectives for discovery and an increasing number of processing and storage solutions are available, how to transpose machine learning and statistical procedures to distributed settings remains an open issue. Big datasets do not guarantee optimal modeling, since they do not automatically solve the issues of model design, validation and selection. At the same time, conventional techniques of cross-validation and model assessment become computationally prohibitive as the size of the dataset explodes. This paper claims that the main benefit of a massive dataset is related not to the size of the training set but to the possibility of assessing the properties of the learner itself (e.g. bias and variance) in an accurate and scalable manner. Accordingly, the paper proposes a scalable implementation of a bias correction strategy to improve the accuracy of learning techniques for regression in a big data setting. An analytical derivation and an experimental study show the potential of the approach.
Citations: 0
Evaluation of Data Mining Tools for Telecommunication Monitoring Data Using Design of Experiment
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.43
Samneet Singh, Yan Liu, Wayne Ding, Zheng Li
Abstract: Telecommunication monitoring data requires the automation of data analysis workflows. A data mining tool provides data workflow management systems to process and perform analysis tasks. This paper presents an evaluation of two example data mining tools following the principles of design of experiment (DOE) to run forecasting and clustering workflows for telecom monitoring data. We conduct both quantitative and qualitative evaluation on datasets collected from a trial mobile network. The datasets cover time frames of one month, six months, one year and two years, and record the average number of connected users per cell on base stations. The observations from this evaluation provide insights into each data mining tool in the context of data analysis workflows. This documented design of experiment will further facilitate replicating this evaluation study and evaluating other data mining tools.
Citations: 6
Evaluation and Analysis of In-Memory Key-Value Systems
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.13
Wenqi Cao, S. Sahin, Ling Liu, Xianqiang Bao
Abstract: This paper presents an in-depth measurement study of in-memory key-value systems. We examine in-memory data placement and processing techniques, including data structures, caching, the performance of read/write operations, and the effects of different in-memory data structures on the throughput of big data workloads. Based on the analysis of our measurement results, we attempt to answer a number of challenging and frequently asked questions regarding in-memory key-value systems: How do in-memory key-value systems respond to big data workloads that exceed the capacity of physical memory or the pre-configured size of in-memory data structures? How do in-memory key-value systems maintain persistency and manage the overhead of supporting persistency? Why do different in-memory key-value systems show different throughput performance? What types of overheads are the key performance indicators? We conjecture that this study will benefit both consumers and providers of big data services and help big data system designers and users to make more informed decisions on the configuration and management of key-value systems and on parameter tuning for speeding up the execution of their big data applications.
Citations: 11
Optimizing Hadoop Framework for Solid State Drives
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.11
Jae-Ki Hong, Liang Li, Chihye Han, Bingxu Jin, Qichao Yang, Zilong Yang
Abstract: Solid state drives (SSDs) have been widely used in Hadoop clusters ever since their introduction to the big data industry. However, the current Hadoop framework is not optimized to take full advantage of SSDs. In this paper, we introduce architectural improvements in the core Hadoop components to fully exploit the performance benefits of SSDs for data- and compute-intensive workloads. The improved architecture features: a simplified data handling algorithm that utilizes the SSD's high random IOPS to store and shuffle the map output data, an accurate pre-read model for HDFS based on libaio to reduce read latency and improve request parallelism, a record-size-based reduce scheduler to overcome the data skew problem in the reduce phase, and a new block placement policy for HDFS based on disk wear information to manage the SSDs' lifetime. The simplified map output collector and the pre-read model of HDFS show 30% and 18% performance improvements with the Terasort and DFSIO benchmarks, respectively. The modified reduce scheduler shows 12% faster execution time with a real MapReduce application. To extend these results, we affirm that the modified structure also achieves a 21% performance improvement on Samsung's MicroBrick-based hyperscale system.
Citations: 7
Key Update as a Service (KAAS): An Agent-Based Modeling for Cloud-Based Access Control
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.67
S. Fugkeaw, Hiroyuki Sato
Abstract: Changes (add, update or revoke) of attributes in attribute-based access control (ABAC) require users whose keys contain the changed attributes to update their keys. In the ABAC setting, the attribute authority or data owner has to re-generate the keys and re-distribute them to the affected users. This imposes computation and communication costs as well as the administrative cost of handling the attribute change. In this paper, we propose a key update scheme to support attribute changes in access control based on ciphertext-policy attribute-based encryption (CP-ABE). We introduce a key update algorithm as a part of the access control service that is specifically aimed at optimizing the user key update processing cost in a multi-authority cloud. To this end, we employ a multi-agent system (MAS) to perform the access control functions, including user authentication, key update handling, and authorization. To support the key update process, the agents execute the key update algorithm by updating all users' keys containing changed attributes on behalf of the attribute authority (AA). In addition, we provide a security proof of our key updating scheme in the general security model. Finally, a performance evaluation is provided to substantiate the efficiency of our proposed scheme.
Citations: 0
A Scalable Probabilistic Change Detection Algorithm for Very High Resolution (VHR) Satellite Imagery
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BIGDATACONGRESS.2016.42
Seokyong Hong, Ranga Raju Vatsavai
Abstract: Detecting landscape changes using very high-resolution multispectral imagery demands an accurate and scalable algorithm that is robust to geometric and atmospheric errors. Existing pixel-based change detection approaches, however, have several drawbacks, which render them ineffective for VHR imagery analysis. A recent probabilistic change detection framework provides more accurate assessment of changes than traditional approaches by analyzing image patches rather than pixels. However, this patch (grid)-based approach produces coarse-resolution (patch size) changes. In this work we present a sliding window based approach that produces changes at the native image resolution. The increased computational demand of the sliding window based approach is addressed through thread-level parallelization on shared memory architectures. Our experimental evaluation showed a 91% performance improvement compared to its sequential counterpart on a sq. KM aerial image with varying window sizes on a 16-core (32 virtual threads) Intel Xeon processor.
Citations: 1
Privacy Preserving Predictive Analytics with Smart Meters
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.31
Biruk K. Habtemariam, A. Miranskyy, A. Miri, Saeed Samet, M. Davison
Abstract: Smart meter data analysis provides key insights about energy demand and usage patterns for efficient operation of power generation and distribution companies. The increase in modern communication bandwidth enables smart meters to transmit the data to a corresponding utility company at hourly update rates or faster. Analysing such a large amount of data often requires a high performance cloud computing environment. However, using such an environment may lead to exposure of the energy consumption patterns of individual households, with the potential consequence of damaging privacy breaches. To mitigate the risk of a privacy breach, this paper proposes a secure linear regression model for smart meter data analytics, based on a Partially Homomorphic Encryption algorithm. In the proposed method, the primary variable (here, the power reading) is encrypted. The statistical coefficients are then computed directly from the ciphertext using integer mappings. With this approach, a computationally feasible linear regression is achievable without compromising a detailed household energy usage profile. Simulation experiments are conducted that demonstrate the performance of the proposed method with respect to accuracy and computational complexity.
Citations: 3
Green Cabs vs. Uber in New York City
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.35
L. K. Poulsen, D. Dekkers, N. Wagenaar, Wesley Snijders, Ben Lewinsky, R. Mukkamala, Ravikiran Vatrapu
Abstract: This paper reports on the process and outcomes of big data analytics of ride records for Green cabs and Uber in the outer boroughs of New York City (NYC), USA. Uber is a new entrant to the taxi market in NYC and is rapidly eating away market share from the NYC Taxi & Limousine Commission's (NYCTLC) Yellow and Green cabs. The problem investigated revolves around where exactly Green cabs are losing market share to Uber outside Manhattan and what, if any, measures can be taken to preserve market share. Two datasets were analyzed, comprising all rides of Green cabs and Uber, respectively, from April to September 2014 in New York, excluding Manhattan and NYC's two airports. Tableau was used as the visual analytics tool, and PostgreSQL in combination with PostGIS was used as the data processing engine. Our findings show that the performance of Green cabs in isolated zip codes differs significantly, and that Uber is growing faster than Green cabs in general and especially in the areas close to Manhattan. We discuss meaningful facts from the analysis, outline actionable insights, list valuable outcomes and mention some of the study limitations.
Citations: 18
Distributed Top-k Keyword Search over Very Large Databases with MapReduce
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.55
Ziqiang Yu, Xiaohui Yu, Yuehui Chen, Kun Ma
Abstract: In the last decade, keyword search over relational databases has been extensively studied because it promises to allow users who lack knowledge of structured query languages or are unaware of the database schema to query the database in an intuitive way. Existing work on keyword search over databases has proposed many approaches and achieved remarkable results. However, most of these approaches are designed for the centralized setting, where keyword search is processed by only a single server. In reality, the scale of databases is increasing sharply and centralized methods can hardly handle keyword queries over these large databases. Moreover, processing keyword search over relational databases is a very time-consuming task, and the efficiency of the existing centralized approaches degrades notably because a single server cannot provide enough computation power for keyword search over very large databases. To address these challenges, we propose a distributed keyword search (DKS) approach with MapReduce, which can be deployed on a cluster of servers to process keyword search over large databases in parallel.
Citations: 4