2016 IEEE International Congress on Big Data (BigData Congress): Latest Publications

Big Datasets for Research: A Survey on Flagship Conferences
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.62
Yi Wei, Shijun Liu, Jiao Sun, Li-zhen Cui, Li Pan, Lei Wu
{"title":"Big Datasets for Research: A Survey on Flagship Conferences","authors":"Yi Wei, Shijun Liu, Jiao Sun, Li-zhen Cui, Li Pan, Lei Wu","doi":"10.1109/BigDataCongress.2016.62","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.62","url":null,"abstract":"It is obvious that big data can bring us new opportunities to discover valuable information. Apparently, corresponding big datasets are powerful tools for scholars, which connect theoretical studies to reality. They can help scholars to evaluate their achievements and find new problems. In recent years, there has been a significant growth in research data repositories and registries. However, these infrastructures are fragmented across institutions, countries and research domains. As such, finding research datasets is not a trivial task for many researchers. Thus we investigated 195 papers regarding big data on some notable international conferences in recent 3 years, and also gathered 285 datasets mentioned in them. In this paper, we present and analyze our survey results in terms of the status quo of big data research and datasets from different aspects. In particular, we propose two different taxonomies of big datasets and classify our surveyed datasets into them. In addition, we also give a brief introduction about 7 widely accepted data collections online. 
Finally, some basic principles for scholars in choosing and using big datasets are given.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"16 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129343962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Mobile Network Traffic Prediction Using MLP, MLPWD, and SVM
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.63
A. Nikravesh, S. Ajila, Chung-Horng Lung, Wayne Ding
{"title":"Mobile Network Traffic Prediction Using MLP, MLPWD, and SVM","authors":"A. Nikravesh, S. Ajila, Chung-Horng Lung, Wayne Ding","doi":"10.1109/BigDataCongress.2016.63","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.63","url":null,"abstract":"Mobile networks are critical for today's social mobility and the Internet. More and more people are subscribing to mobile networks, which has led to substantial demands. The network operators need to find ways of meeting the huge demands. Since mobile network resources, such as spectrum, are expensive, there is a need for efficient management of network resources as well as finding a way to predict future use for network management and planning. Network planning is crucial for network operators to provide services that are both cost effective and have high degree of quality of service (QoS). The aim of this research is to apply data analysis techniques to support network operators to maximize the resource usage for network operators, that is, to prevent both under-provisioning and over-provisioning. Therefore, this paper investigates the prediction accuracy of machine learning techniques - Multi-Layer Perceptron (MLP), Multi-Layer Perceptron with Weight Decay (MLPWD), and Support Vector Machines (SVM) - using a dataset from a commercial trial mobile network. The experimental results show that SVM outperforms MLP and MLPWD in predicting the multidimensionality of the real-life network traffic data, while MLPWD has better accuracy in predicting the unidimensional data. 
Our experimental results can help network operators predict future demands and facilitate provisioning and placement of mobile network resources for effective resource management.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"180 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126619266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 74
Catching Social Butterflies: Identifying Influential Users of an Event-Based Social Networking Service
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.32
Jonathan Popa, Kusha Nezafati, Y. Gel, John Zweck, G. Bobashev
{"title":"Catching Social Butterflies: Identifying Influential Users of an Event-Based Social Networking Service","authors":"Jonathan Popa, Kusha Nezafati, Y. Gel, John Zweck, G. Bobashev","doi":"10.1109/BigDataCongress.2016.32","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.32","url":null,"abstract":"Online social media information is often used as a proxy for unavailable or partially observed data on networks of offline contacts. This, in turn, requires an understanding of how close the proxy online structure is to the \"true\" offline social network. Social media tools such as Meetup that collect information about both online networks and their offline counterparts are of particularly importance as they shed more light on the (dis)similarity of online and offline contacts and highlight its potential causes. In this paper we examine structural (dis)similarities of the Meetup online and offline data, with a particular focus on geographical differences. We introduce a new measure called the event score to assess connections made by the most socially active individuals, or social butterflies. We apply the new social activity metric to determine which sorts of events are attended most by social butterflies and to evaluate how this aspect of the network structure differs across US cities.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126776635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Effective Multi-stream Joining in Apache Samza Framework
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.41
Zhenyun Zhuang, Tao Feng, Yi Pan, H. Ramachandra, B. Sridharan
{"title":"Effective Multi-stream Joining in Apache Samza Framework","authors":"Zhenyun Zhuang, Tao Feng, Yi Pan, H. Ramachandra, B. Sridharan","doi":"10.1109/BigDataCongress.2016.41","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.41","url":null,"abstract":"Increasing adoption of Big Data in business environments have driven the needs of stream joining in realtime fashion. Multi-stream joining is an important stream processing type in today's Internet companies, and it has been used to generate higher-quality data in business pipelines. Multi-stream joining can be performed in two models: (1) All-In-One (AIO) Joining and (2) Step-By-Step (SBS) Joining. Both models have advantages and disadvantages with regard to memory footprint, joining latency, deployment complexity, etc. In this work, we analyze the performance tradeoffs associated with these two models using Apache Samza.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114166284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Improving the Visualization of WordNet Large Lexical Database through Semantic Tag Clouds
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.14
E. G. Caldarola, A. M. Rinaldi
{"title":"Improving the Visualization of WordNet Large Lexical Database through Semantic Tag Clouds","authors":"E. G. Caldarola, A. M. Rinaldi","doi":"10.1109/BigDataCongress.2016.14","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.14","url":null,"abstract":"In the Big Data era, the visualization of large data sets is becoming an increasingly relevant task due to the great impact that data have from a human perspective. Since the visualization is the closer phase to the users within the data life cycles phases, there is no doubt that an effective, efficient and impressive representation of the analyzed data may result as important as the analytic process itself. Starting from previous experiences in importing, querying and visualizing WordNet database within Neo4J and Cytoscape, this work aims at improving the WordNet Graph visualization by exploiting the features and concepts behind tag clouds. The objective of this study is twofold: firstly, we argue that the proposed visualization strategy is able to put order in the messy and dense structure of nodes and edges of large knowledge bases as WordNet, showing as much as possible information from this knowledge source and in a clearer way; secondly, we think that the tag cloud approach applied to the synonyms rings reinforces the human cognition in recognizing the different usages of words in natural languages like English. In this regard, we also propose a formal strategy in order to evaluate the information perception in the use of our methodology by means of a questionnaire asked to a group of users. 
Finally, we compare these results with those resulting from the adoption of well known representations of WordNet within existing GUIs.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114865614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
Analytics Toolkit for Business Big Data
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.22
Fan Liang, W. Du
{"title":"Analytics Toolkit for Business Big Data","authors":"Fan Liang, W. Du","doi":"10.1109/BigDataCongress.2016.22","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.22","url":null,"abstract":"As large amount of data is increasing at high velocity, companies are searching for scalable and effective solutions for storing and mining their data. Moreover, modeling data as networks is of great interest in business applications. Social network analysis (SNA) measures the relationships and structures with a set of metrics by building graphs for capturing influential actors and patterns. In this paper, to analyze a large volume of business data using graph models, we propose a software system which combines the big data analytics and social network analysis techniques. The system's workflow consists of data collection, graph generation, graph reuse, network property calculation, SNA result interpretation and application integration. The system operations are executable in a Hadoop-based distributed cluster with high throughput on large-scale data.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123358229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
QDrill: Query-Based Distributed Consumable Analytics for Big Data
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.23
Shadi Khalifa, Patrick Martin, Dan Rope, Mike McRoberts, Craig Statchuk
{"title":"QDrill: Query-Based Distributed Consumable Analytics for Big Data","authors":"Shadi Khalifa, Patrick Martin, Dan Rope, Mike McRoberts, Craig Statchuk","doi":"10.1109/BigDataCongress.2016.23","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.23","url":null,"abstract":"Consumable analytics attempt to address the shortage of skilled data analysts in many organizations by offering analytic functionality in a form more familiar to in-house expertise. Providing consumable analytics for Big Data faces three main challenges. The first challenge is making the analytics algorithms run in a distributed fashion in order to analyze Big Data in a timely manner. The second challenge is providing an easy interface to allow in-house expertise to run these algorithms in a distributed fashion while minimizing the learning cycle and existing code rewrites. The third challenge is running the analytics on data of different formats stored on heterogeneous data stores. In this paper, we address these challenges in the proposed QDrill. We introduce the Analytics Adaptor extension for Apache Drill, a schema-free SQL query engine for non-relational storage. The Analytics Adaptor introduces the Distributed Analytics Query Language for invoking data mining algorithms from within the Drill standard SQL query statements. The adaptor allows using any sequential single-node data mining library (e.g. WEKA) and makes its algorithms run in a distributed fashion without having to rewrite them. We evaluate QDrill against Apache Mahout. The evaluation shows that QDrill outperforms Mahout in Updatable model training and scoring phase while almost keeping the same performance for Non-Updatable model training. 
QDrill is more scalable and offers an easier interface, no storage overhead and the whole algorithms repository of WEKA, with the ability to extend to use algorithms from other data mining libraries.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131880353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
An Hybrid Approach to Quality Evaluation across Big Data Value Chain
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.65
M. Serhani, Hadeel T. El Kassabi, Ikbal Taleb, Al Ramzana Nujum
{"title":"An Hybrid Approach to Quality Evaluation across Big Data Value Chain","authors":"M. Serhani, Hadeel T. El Kassabi, Ikbal Taleb, Al Ramzana Nujum","doi":"10.1109/BigDataCongress.2016.65","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.65","url":null,"abstract":"While the potential benefits of Big Data adoption are significant, and some initial successes have already been realized, there remain many research and technical challenges that must be addressed to fully realize this potential. The Big Data processing, storage and analytics, of course, are major challenges that are most easily recognized. However, there are additional challenges related for instance to Big Data collection, integration, and quality enforcement. This paper proposes a hybrid approach to Big Data quality evaluation across the Big Data value chain. It consists of assessing first the quality of Big Data itself, which involve processes such as cleansing, filtering and approximation. Then, assessing the quality of process handling this Big Data, which involve for example processing and analytics process. We conduct a set of experiments to evaluate Quality of Data prior and after its pre-processing, and the Quality of the pre-processing and processing on a large dataset. Quality metrics have been measured to access three Big Data quality dimensions: accuracy, completeness, and consistency. The results proved that combination of data-driven and process-driven quality evaluation lead to improved quality enforcement across the Big Data value chain. 
Hence, we recorded high prediction accuracy and low processing time after we evaluate 6 well-known classification algorithms as part of processing and analytics phase of Big Data value chain.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132109225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Energy Consumption Prediction with Big Data: Balancing Prediction Accuracy and Computational Resources
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.27
Katarina Grolinger, Miriam A. M. Capretz, Luke Seewald
{"title":"Energy Consumption Prediction with Big Data: Balancing Prediction Accuracy and Computational Resources","authors":"Katarina Grolinger, Miriam A. M. Capretz, Luke Seewald","doi":"10.1109/BigDataCongress.2016.27","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.27","url":null,"abstract":"In recent years, advances in sensor technologies and expansion of smart meters have resulted in massive growth of energy data sets. These Big Data have created new opportunities for energy prediction, but at the same time, they impose new challenges for traditional technologies. On the other hand, new approaches for handling and processing these Big Data have emerged, such as MapReduce, Spark, Storm, and Oxdata H2O. This paper explores how findings from machine learning with Big Data can benefit energy consumption prediction. An approach based on local learning with support vector regression (SVR) is presented. Although local learning itself is not a novel concept, it has great potential in the Big Data domain because it reduces computational complexity. The local SVR approach presented here is compared to traditional SVR and to deep neural networks with an H2O machine learning platform for Big Data. Local SVR outperformed both SVR and H2O deep learning in terms of prediction accuracy and computation time. 
Especially significant was the reduction in training time, local SVR training was an order of magnitude faster than SVR or H2O deep learning.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128239130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 42
JVM Configuration Management and Its Performance Impact for Big Data Applications
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.64
S. Sahin, Wenqi Cao, Qi Zhang, Ling Liu
{"title":"JVM Configuration Management and Its Performance Impact for Big Data Applications","authors":"S. Sahin, Wenqi Cao, Qi Zhang, Ling Liu","doi":"10.1109/BigDataCongress.2016.64","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.64","url":null,"abstract":"Big data applications are typically programmed using garbage collected languages, such as Java, in order to take advantage of garbage collected memory management, instead of explicit and manual management of application memory, e.g., dangling pointers, memory leaks, dead objects. However, application performance in Java like garbage collected languages is known to be highly correlated with the heap size and performance of language runtime such as Java Virtual Machine (JVM). Although different heap resizing techniques and garbage collection algorithms are proposed, most of existing solutions require modification to JVM, guest OS kernel, host OS kernel or hypervisor. In this paper, we evaluate and analyze the effects of tuning JVM heap structure and garbage collection parameters on application performance, without requiring any modification to JVM, guest OS, host OS and hypervisor. Our extensive measurement study shows a number of interesting observations: (i) Increasing heap size may not increase application performance for all cases and at all times, (ii) Heap space error may not necessarily indicate that heap is full, (iii) Heap space errors can be resolved by tuning heap structure parameters without enlarging heap, and (iv) JVM of small heap sizes may achieve the same application performance by tuning JVM heap structure and GC parameters without any modification to JVM, VM and OS kernel. 
We conjecture that these results can help software developers of big data applications to achieve high performance big data computing by better management and configuration of their JVM runtime.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134483571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11