2018 IEEE International Congress on Big Data (BigData Congress): Latest Publications

An Architecture for Cost Optimization in the Processing of Big Geospatial Data in Public Cloud Providers
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00032
João Bachiega, M. Reis, M. Holanda, Aleteia P. F. Araujo
Abstract: Cloud computing is a suitable platform for running applications that process big data. With the growing volume of geographic and spatial data, conceptualized as Big Geospatial Data, a variety of tools have been developed to process this data efficiently. The index applied to the dataset is an important aspect of this processing. This paper presents an architecture, supported by a Knowledge Base and an Inference Engine, for processing big geospatial data in public cloud providers with the ultimate goal of optimizing costs. The tests executed demonstrated that the rules created can reduce the total cost of processing large geospatial data by up to 71%.
Citations: 3
Learning a Joint Low-Rank and Gaussian Model in Matrix Completion with Spectral Regularization and Expectation Maximization Algorithm
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00035
Gang Wu, Ratnesh Kumar
Abstract: Completing a partially known matrix is an important problem in data science, with many applications, e.g., collaborative filtering for recommendation systems and global positioning in large-scale sensor networks. Low-rank and Gaussian models are two popular classes of models used in matrix completion, and both have proven successful. In this paper, we introduce a single model that leverages the features of both low-rank and Gaussian models. We develop a novel method based on Expectation Maximization (EM) that involves spectral regularization (for the low-rank part) as well as likelihood maximization (for learning the Gaussian parameters). We also test our framework on real-world movie rating data and compare the results with some of the common methods used for matrix completion.
Citations: 0
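The matrix-completion abstract above relies on spectral regularization for the low-rank part, which is commonly implemented as soft-thresholding of singular values. The sketch below is a generic SoftImpute-style iteration, assuming NumPy; the function name `soft_impute` and the parameters `lam` and `n_iters` are illustrative, and it omits the paper's Gaussian component and EM updates entirely.

```python
import numpy as np

def soft_impute(M, mask, lam, n_iters=100):
    """Estimate missing entries of M (where mask is False) by iterative
    singular-value soft-thresholding.

    Generic spectral-regularization sketch, not the paper's joint
    low-rank + Gaussian EM method.
    """
    M = np.asarray(M, dtype=float)
    Z = np.where(mask, M, 0.0)  # start with zeros in the missing positions
    for _ in range(n_iters):
        # Keep observed entries; use the current estimate for missing ones.
        filled = np.where(mask, M, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Shrink every singular value by lam (proximal step of the nuclear norm).
        s_shrunk = np.maximum(s - lam, 0.0)
        Z = (U * s_shrunk) @ Vt  # scale columns of U, then rebuild the matrix
    return Z
```

A larger `lam` drives more singular values to zero and therefore yields a lower-rank estimate; with `lam=0` and a fully observed matrix the input is returned unchanged.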
Useful ToPIC: Self-Tuning Strategies to Enhance Latent Dirichlet Allocation
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00012
Stefano Proto, Evelina Di Corso, F. Ventura, T. Cerquitelli
Abstract: ToPIC (Tuning of Parameters for Inference of Concepts) is a distributed self-tuning engine that clusters collections of textual data into correlated groups of documents through a topic modeling methodology (i.e., LDA). ToPIC includes automatic strategies that relieve the end user of the burden of selecting proper parameter values for the overall analytics process. ToPIC's current implementation runs on Apache Spark, a state-of-the-art distributed computing framework. As a case study, ToPIC has been validated on three real collections of textual documents characterized by different distributions. The experimental results show the effectiveness and efficiency of the proposed solution in analyzing collections of documents without tuning algorithm parameters and in discovering cohesive, well-separated groups of documents with similar topics.
Citations: 8
Graph-Based Data Relevance Estimation for Large Storage Systems
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00040
V. Venkatesan, Taras Lehinevych, G. Cherubini, A. Glybovets, M. Lantz
Abstract: In storage systems, the relevance of files to users can be taken into account to determine storage control policies to reduce cost, while retaining high reliability and performance. The relevance of a file can be estimated by applying supervised learning and using the metadata as features. However, supervised learning requires many training samples to achieve an acceptable estimation accuracy. In this paper, we propose a novel graph-based learning system for the relevance estimation of files using a small training set. First, files are grouped into different file-sets based on the available metadata. Then a parameterized similarity metric among files is introduced for each file-set using the knowledge of the metadata. Finally, message passing over a bipartite graph is applied for relevance estimation. The proposed system is tested on various datasets and compared with logistic regression.
Citations: 2
IEEE BigData Congress 2018 Organizing Committee
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/bigdatacongress.2018.00006
Citations: 0
Exploration of Bi-Level PageRank Algorithm for Power Flow Analysis Using Graph Database
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00026
Chen Yuan, Yi Lu, Kewen Liu, Guangyi Liu, Renchang Dai, Zhiwei Wang
Abstract: Compared with a traditional relational database, a graph database (GDB) is a natural representation of most real-world systems. Each node in the GDB is not only a storage unit but also a logic operation unit that implements local computation in parallel. This paper first explores the feasibility of power system modeling using a GDB. Then a brief introduction to the PageRank algorithm and an analysis of the feasibility of applying it in a GDB are presented. The proposed GDB-based bi-level PageRank algorithm is then developed from the PageRank algorithm and the Gauss-Seidel method to realize high-performance parallel computation. The MP 10790 case and its extensions, MP 10790*10 and MP 10790*100, are tested to verify the proposed method and investigate its parallelism in the GDB. In addition, a provincial system, the FJ case, which includes 1425 buses and 1922 branches, is included in the case study to further demonstrate the proposed algorithm's effectiveness in the real world.
Citations: 6
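The bi-level PageRank abstract above combines PageRank with the Gauss-Seidel method. As an illustration of that combination only, the sketch below computes standard PageRank by Gauss-Seidel sweeps over the linear system (I - d P^T) x = ((1 - d)/n) 1, reusing each freshly updated score within the same sweep. It is not the paper's bi-level, graph-database power-flow algorithm; the function name, the damping factor `d`, the tolerance, and the requirement that every node has at least one out-link are assumptions of this sketch.

```python
def pagerank_gauss_seidel(out_links, d=0.85, tol=1e-10, max_sweeps=1000):
    """Standard PageRank solved with Gauss-Seidel sweeps (illustrative sketch).

    out_links maps each node to the list of nodes it links to. Every link
    target must also be a key, and every node needs at least one out-link
    (no dangling nodes) so that the linear system stays sparse.
    """
    nodes = list(out_links)
    n = len(nodes)
    in_links = {v: [] for v in nodes}
    for u, targets in out_links.items():
        if not targets:
            raise ValueError(f"dangling node {u!r} is not supported in this sketch")
        for v in targets:
            in_links[v].append(u)

    x = {v: 1.0 / n for v in nodes}
    teleport = (1.0 - d) / n
    for _ in range(max_sweeps):
        max_change = 0.0
        for i in nodes:
            rank_in = 0.0  # contributions from other nodes, using the newest values
            self_w = 0.0   # weight of a self-loop i -> i, moved to the left-hand side
            for j in in_links[i]:
                w = 1.0 / len(out_links[j])
                if j == i:
                    self_w += w
                else:
                    rank_in += w * x[j]
            new_xi = (teleport + d * rank_in) / (1.0 - d * self_w)
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi  # in-place update: later nodes in this sweep see it at once
        if max_change < tol:
            break
    return x
```

Because each update reuses values already refreshed in the same sweep, Gauss-Seidel often needs fewer sweeps than plain power iteration; that same sequential dependency is what a parallel variant has to work around.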
A Personalized Travel Recommendation System Using Social Media Analysis
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00046
Joseph Coelho, Paromita Nitu, P. Madiraju
Abstract: Personalization of recommender systems enables customized services for users. Social media is one resource that aids personalization. This study explores the use of Twitter data to personalize travel recommendations. A machine learning classification model is used to identify travel-related tweets. The travel tweets are then used to personalize recommendations regarding places of interest for the user. Places of interest are categorized as historical buildings, museums, parks, and restaurants. To better personalize the model, travel tweets of the user's friends and followers are also mined. Volunteer Twitter users were asked to provide their Twitter handle and to rank their travel category preferences in a survey. We evaluated our model by comparing its predictions with the users' choices in the survey. The evaluations show 68% prediction accuracy. The accuracy can be improved with a better travel-tweet training dataset as well as a better machine-learning technique for identifying travel categories. The travel categories can be extended to include items such as sports venues, musical events, and entertainment, thereby fine-tuning the recommendations. The proposed model lists 'n' places of interest from each category in proportion to the travel category score generated by the model.
Citations: 19
Autoencoder Evaluation and Hyper-Parameter Tuning in an Unsupervised Setting
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00034
Ellie Ordway-West, P. Parveen, Austin Henslee
Abstract: This paper aims to introduce a new methodology for evaluating autoencoder performance and to shorten time spent on heuristic analysis during hyper-parameter tuning. Existing methodologies for evaluating hyper-parameter tuning focus on finding known anomalies in a labeled set or minimizing the average per row reconstruction error as a method of model selection. This paper focuses on anomaly detection in a completely unsupervised setting, where labels are not known during model training or evaluation. This approach uses the approximate Full Width Half Max (FWHM) of the histogram of the per row reconstruction error in conjunction with the average per row reconstruction error and the number of anomalies found to define a new method of model selection that aims to maximize the FWHM while minimizing the average per row reconstruction error. This methodology simplifies and speeds up model evaluation by presenting model results in an intuitive manner and simplifies the heuristic analysis needed to determine the "best" model.
Citations: 3
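The autoencoder abstract above scores models by the approximate Full Width Half Max (FWHM) of the histogram of per-row reconstruction errors, together with the average per-row error. A minimal NumPy sketch of both quantities follows. The use of mean squared error per row, the bin count, and measuring the width between the outermost bins that reach half the peak are assumptions, since the abstract does not specify them; the sketch also does not reproduce how the paper combines these metrics (and the anomaly count) into a selection rule.

```python
import numpy as np

def per_row_reconstruction_error(X, X_hat):
    """Mean squared error of each row (sample) between input X and reconstruction X_hat."""
    return np.mean((np.asarray(X, dtype=float) - np.asarray(X_hat, dtype=float)) ** 2, axis=1)

def approx_fwhm(errors, bins=50):
    """Approximate full width at half maximum of a histogram of errors.

    The width is measured between the centres of the first and last bins
    whose count is at least half of the tallest bin.
    """
    counts, edges = np.histogram(errors, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    half_max = counts.max() / 2.0
    at_or_above = np.nonzero(counts >= half_max)[0]
    return float(centers[at_or_above[-1]] - centers[at_or_above[0]])

def summarize_model(X, X_hat, bins=50):
    """Report the average per-row error and the FWHM for one candidate model."""
    err = per_row_reconstruction_error(X, X_hat)
    return {"mean_row_error": float(err.mean()), "fwhm": approx_fwhm(err, bins=bins)}
```

Calling `summarize_model` for each hyper-parameter setting gives the two numbers the abstract trades off: a larger FWHM and a smaller mean row error.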
Message from the IEEE BigData Congress 2018 Chairs
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/bigdatacongress.2018.00005
Citations: 0
Latency Measurement of Fine-Grained Operations in Benchmarking Distributed Stream Processing Frameworks
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00043
Giselle van Dongen, Bram Steurtewagen, D. V. Poel
Abstract: This paper describes a benchmark for stream processing frameworks allowing accurate latency benchmarking of fine-grained individual stages of a processing pipeline. By determining the latency of distinct common operations in the processing flow instead of the end-to-end latency, we can form guidelines for efficient processing pipeline design. Additionally, we address the issue of defining time in distributed systems by capturing time on one machine and defining the baseline latency. We validate our benchmark for Apache Flink using a processing pipeline comprising common stream processing operations. Our results show that joins are the most time consuming operation in our processing pipeline. The latency incurred by adding a join operation is 4.5 times higher than for a parsing operation, and the latency gradually becomes more dispersed after adding additional stages.
Citations: 6