2016 IEEE International Congress on Big Data (BigData Congress): Latest Publications

Analyzing Future Nodes in a Knowledge Network
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.57
Sukhwan Jung, T. Lai, Aviv Segev
Abstract: The paper proposes new methods for knowledge prediction using network analytics and introduces pEgonet, sub-networks within knowledge networks consisting of to-be neighbors of new knowledge. Preliminary results show that it is feasible to predict how future knowledge is added in the knowledge network by utilizing basic properties of pEgonet. The paper presents initial work which will be expanded to derive a method to predict labelled future knowledge, with its impact and structures.
Citations: 5
An Empirical Investigation of Mobile Network Traffic Data for Resource Management
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.44
Man Si, Chung-Horng Lung, S. Ajila, Wayne Ding
Abstract: Since the emergence of mobile networks, the number of mobile subscriptions has continued to increase year after year. To efficiently assign mobile network resources such as spectrum (which is expensive), the network operator needs to process and analyze information and statistics about each base station and the traffic that passes through it. This paper presents an application of data analytics by focusing on processing and analyzing two datasets from a commercial trial mobile network. A detailed description that uses Apache Hadoop and the Mahout machine learning library to process and analyze the datasets is presented. The analysis provides insights about the resource usage of network devices. This information is of great importance to network operators for efficient and effective management of resources and for supporting high-quality of user experience. Furthermore, an investigation has been conducted that evaluates the impact of executing the Mahout clustering algorithms with various system and workload parameters on a Hadoop cluster. The results demonstrate the value of performance data analysis. Specifically, the execution time can be significantly reduced using data pre-processing and some machine learning techniques, and Hadoop. The investigation provides useful information for the network operators for future real-time data analytics.
Citations: 2
An Improved Single-Pass Algorithm for Chinese Microblog Topic Detection and Tracking
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.39
Danfeng Yan, Enzheng Hua, Bo Hu
Abstract: Microblogs are a very popular social platform and a source of news and popular information dissemination. Detection and tracking of hot topics through Microblog research has attracted the attention of domestic and foreign scholars. This paper therefore focuses on financial-domain topic detection and tracking for Chinese Microblogs. We propose an incremental TF-IWF-IDF weight calculation method that takes the part-of-speech and position of terms into account. This weight calculation method addresses the problem that the IDF in TF-IDF is a constant value and cannot change dynamically with the dataset. The traditional feature vector does not consider the semantics and context of terms. The paper proposes a new feature vector representation method to solve this problem by incorporating IWF into TF-IDF. This text representation method is called Word vector based on an incremental TF-IWF-IDF of terms part-of-speech and position. This paper proposes Two Steps of Single-Pass based on Multi Topic Centers (MC-TSP) to overcome the shortcomings of the traditional Single-Pass algorithm. In experimental comparison, the improved algorithm performs better than the traditional Single-Pass algorithm. With the improved algorithm, a financial hot topic detection and tracking model is designed and implemented. The application of this model in the financial domain improved the accuracy of topic detection and tracking.
Citations: 17
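For readers unfamiliar with the baseline named in the abstract above, the following is a minimal sketch of the traditional Single-Pass clustering idea, not the paper's MC-TSP method. It assumes raw term counts in place of the TF-IWF-IDF weights the authors propose, and the names `cosine`, `single_pass`, and the default `threshold=0.3` are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dict-like)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold=0.3):
    """Assign each tokenized document to its most similar topic, or open a new one.

    Documents are visited exactly once, in arrival order.
    Returns one topic index per document.
    """
    centers = []  # one summed term-count vector per topic (illustrative weighting)
    labels = []
    for tokens in docs:
        vec = Counter(tokens)
        best, best_sim = -1, 0.0
        for i, center in enumerate(centers):
            sim = cosine(vec, center)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            centers[best].update(vec)  # fold the document into the topic center
            labels.append(best)
        else:
            centers.append(Counter(vec))  # no topic is close enough: start a new one
            labels.append(len(centers) - 1)
    return labels
```

Because each document is compared against a single center per topic and the result depends on arrival order, this baseline is sensitive to the threshold and to early documents, which is the kind of shortcoming the abstract says the multi-center, two-step variant is meant to address.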
TwigStack-MR: An Approach to Distributed XML Twig Query Using MapReduce
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.79
Hongjie Fan, Han Yang, Zhiyi Ma, Junfei Liu
Abstract: Twig pattern query is the core operation of XML processing, which directly affects the efficiency of XML data query. It is a challenge to manipulate massive XML data, especially on a distributed cluster: for example, how to effectively ensure the completeness and correctness of the query results and minimize communication costs between the various machines. In this paper, we present TwigStack-MR, which simultaneously processes several twig pattern queries for a massive volume of XML data based on the MapReduce framework. We first split the large-scale XML data file into file-splits as input to the distributed storage system. Then we present the distributed twig algorithm, processing different subtrees of the document tree in parallel. Finally, we use the MapReduce framework, which has the full characteristics of distributed environments, to process twig queries efficiently. The experimental results show that our approach is efficient and scalable on this issue.
Citations: 3
SD-HDFS: Secure Deletion in Hadoop Distributed File System
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.30
B. Agrawal, R. Hansen, Chunming Rong, T. Wiktorski
Abstract: Sensitive information that is stored in Hadoop clusters can potentially be retrieved without permission or access granted. In addition, the ability to recover deleted data from Hadoop clusters represents a major security threat. Hadoop clusters are used to manage large amounts of data both within and outside of organizations. As a result, it has become important to be able to locate and remove data effectively and efficiently. In this paper, we propose Secure Delete, a holistic framework that propagates file information to the block management layer via an auxiliary communication path. The framework tracks down undeleted data blocks and modifies the normal deletion operation in the Hadoop Distributed File System (HDFS). We introduce CheckerNode, which generates a summary report from all DataNodes and compares the block information with the metadata from the NameNode. If the metadata do not contain the entries for the data blocks, unsynchronized blocks are automatically deleted. However, deleted data could still be recovered using digital forensics tools. We also describe a novel secure deletion technique in HDFS that generates a random pattern and writes multiple times to the disk location of the data block.
Citations: 6
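The overwrite step mentioned at the end of the abstract above can be illustrated with a small sketch. This is not the authors' HDFS implementation: it operates on an ordinary local file rather than an HDFS data block, and the function name `overwrite_and_delete` and its parameters `passes` and `chunk_size` are hypothetical.

```python
import os


def overwrite_and_delete(path, passes=3, chunk_size=1 << 20):
    """Overwrite a local file in place with random bytes `passes` times, then unlink it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))  # fresh random pattern for every chunk
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push this pass to the device
    os.remove(path)
```

An in-place overwrite like this only reaches the original physical blocks when the storage stack writes in place. On SSDs with wear levelling, or on journaling and copy-on-write file systems, old copies of the data may survive, so the sketch should be read as an illustration of the idea rather than a guarantee of unrecoverability.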
Towards Data Analytics of Pathogen-Host Protein-Protein Interaction: A Survey
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BIGDATACONGRESS.2016.60
Huaming Chen, Jun Shen, Lei Wang, Jiangning Song
Abstract: "Big Data" is immersed in many disciplines, including computer vision, economics, online resources, bioinformatics and so on. An increasing amount of research is being conducted on data mining and machine learning for uncovering and predicting related domain knowledge. Protein-protein interaction is one of the main areas in bioinformatics, as it is the basis of biological functions. However, most pathogen-host protein-protein interactions, which could reveal much more about infection mechanisms between pathogen and host, still require further investigation. Considering a decent feature representation of pathogen-host protein-protein interactions (PHPPI), there is currently no well-structured database for research purposes, not even for infection mechanism studies for different species of pathogens. In this paper, we survey PHPPI research and construct a public PHPPI dataset ourselves for future research. The result is a very large and imbalanced dataset with high dimensionality and large quantity. Several machine learning methodologies are also discussed to suggest possible analytics solutions in the near future. This paper contributes to a new, yet challenging, research area in applying data analytic technologies in bioinformatics, by learning and predicting pathogen-host protein-protein interactions.
Citations: 10
Edge-Computing-Aware Deployment of Stream Processing Tasks Based on Topology-External Information: Model, Algorithms, and a Storm-Based Prototype
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.40
Apostolos Papageorgiou, Ehsan Poormohammady, Bin Cheng
Abstract: Stream Processing Frameworks (SPF, e.g., Apache Storm) are solutions that facilitate and manage the execution of processing topologies that consist of multiple parallelizable steps (or tasks) and involve continuous data exchange among these tasks. Stemming from the world of Cloud-centric Big Data processing, SPFs often fail to address certain requirements of Internet-of-Things systems. For example, existing deployment solutions ignore the fact that topology tasks can also be involved in other interactions and data-intensive communication flows, which are not taking place between the tasks, but between a task and another Internet-of-Things entity, such as an actuator, a database, or a user. This paper describes SPF extensions for taking these interactions into account. The extensions are described both generically and as extensions of Apache Storm. In a simple evaluation upon a topology which involves topology-external interactions, we demonstrate how our solution can eliminate latency requirements violations and reduce Cloud-to-edge bandwidth consumption to 1/3 compared to Apache Storm.
Citations: 23
Cloud Computing Intelligent Data-Driven Model: Connecting the Dots to Combat Global Terrorism
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.69
G. Goteng, Xueyu Tao
Abstract: We used a directed graph to come up with an interconnected network of terrorists' activities based on data obtained from the Global Terrorism Database (GTD) from 2005 to 2015. We developed an analytical model called CloudTerrorAlert (CTA) and implemented it within a cloud-based environment that analyzes GTD data to aid collaboration and decision making by counter-terrorist security agents around the world. Our CTA algorithm compares three sets of data for prediction: communication (emails and phone calls), transaction (money transfers and arms purchases), and transportation (movements across boundaries and countries), using a proposed probability threshold that is at least 0.3 to make a decision on whether or not a terror attack is about to occur. A prototype of the model, which connects the distributed data on terrorists' activities, proved as implemented that the system could have a very significant impact on using cloud-based technologies in the fight against global terrorism.
Citations: 2
From Big Data to Great Services
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.28
Jianwei Yin, Yan Tang, Wei Lo, Zhaohui Wu
Abstract: Big Data is increasingly adopted by a wide range of service industries to improve the quality and value of their services, e.g., inventory that matches well the supply and demand, and pricing that reflects well the market needs. Customers benefit from higher quality of service enabled by Big Data. Service providers get higher profits from more precise control of costs and accurate knowledge of customer needs. In this paper, we define the next generation high quality services as Great Services, characterized by 4P Quality-of-Service (QoS) dimensions: Panorama, Penetration, Prediction and Personalization, which go much further than current services. The transformation of Big Data into Great Services would be difficult and expensive without methodical techniques and software tools. We call the intermediate step Deep Knowledge, which is generated by Big Data (with the 4V challenges - Volume, Velocity, Variety, and Veracity) and used in the creation of Great Services. Deep Knowledge is distinguished from traditional Big Data by 4C properties (Complexity, Cross-domain, Customization, and Convergence). In order to achieve the 4P QoS dimensions of Great Services, we need Deep Knowledge with 4C properties. In this paper, we describe an informal characterization of Great Services with 4P QoS dimensions with examples, and outline the techniques and tools that facilitate the transformation of Big Data into Deep Knowledge with 4C properties, and then the use of Deep Knowledge in Great Services.
Citations: 7
Marketing Campaigns in Twitter Using a Pattern Based Diffusion Policy
2016 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2016-06-01 DOI: 10.1109/BigDataCongress.2016.24
E. Kafeza, C. Makris, Pantelis Vikatos
Abstract: In this paper we introduce a novel methodology to achieve information diffusion within a social graph that activates a realistic number of users. Our approach combines the predicted patterns of diffusion for each node with propagation heuristics in order to achieve an effective cover of the graph. The novelty of our methodology is based on the use of history information to predict users' diffusion patterns and on our proposed PBD heuristics for achieving a realistic information spread. Moreover, we use a methodology for calculating the actual diffusion of a message in a social media graph. To validate our approach we present a set of experimental results. Our methodology is useful to marketers who are interested in using social influence and running effective marketing campaigns.
Citations: 4