{"title":"Urban Area Vehicle Number Estimation Based on RTMS Data","authors":"Yue Hu, Yuanchao Shu, Peng Cheng, Jiming Chen","doi":"10.1109/BigDataCongress.2016.59","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.59","url":null,"abstract":"Along with the increase in vehicle ownership, traffic problems have a serious impact on people's daily lives. Not only traffic congestion but also parking problems trouble daily urban travel. Therefore, it is important to estimate parking demand to help the government make rational decisions on traffic planning and management. This paper focuses on estimating the vehicle number in a certain area (i.e., the space surrounded by arterial roads) in each time slot to analyze the area's parking demand, using RTMS (Remote Traffic Microwave Sensor) data. We first propose a basic method to calculate the AVN (Area Vehicle Number) based on the inflow and outflow traffic of the area. To correct the error caused by minor roads without RTMS data, we propose an advanced method that improves estimation accuracy by exploiting road traffic correlation from a network perspective. A comprehensive evaluation is conducted to verify our design using a large amount of RTMS data collected in Hangzhou over one month. The estimation results also reveal interesting human behaviors across various urban areas.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132356373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting Vertex-Cut Partitioning for Streaming Graphs","authors":"Hooman Peiro Sajjad, A. H. Payberah, Fatemeh Rahimian, Vladimir Vlassov, Seif Haridi","doi":"10.1109/BigDataCongress.2016.10","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.10","url":null,"abstract":"While algorithms for streaming graph partitioning have proven promising, they fall short of creating timely partitions when applied to large graphs. For example, it takes 415 seconds for a state-of-the-art partitioner to work on a social network graph with 117 million edges. We introduce an efficient platform for boosting streaming graph partitioning algorithms. Our solution, called HoVerCut, is Horizontally and Vertically scalable. That is, it can run as a multi-threaded process on a single machine, or as a distributed partitioner across multiple machines. Our evaluations, on both real-world and synthetic graphs, show that HoVerCut speeds up the process significantly without degrading the quality of partitioning. For example, HoVerCut partitions the aforementioned social network graph with 117 million edges in 11 seconds, which is about 37 times faster.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122248189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research of Clouding Based Pulse Monitoring and Data Analysis Framework for Efficient Physical Training","authors":"Hongliang Yuan, Jun Wang, Jun Liu, Shiliang Li","doi":"10.1109/BigDataCongress.2016.70","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.70","url":null,"abstract":"To address the inability to continuously monitor abnormal heart-rate conditions, a cloud-based pulse monitoring and data analysis framework has been proposed. The framework is composed of multiple ZigBee-based pulse monitoring sensors, customized gateways, and a back-end system. Individuals' pulse information is collected by the sensors and passed to the back-end cloud system to support big data analysis of training conditions. To guarantee the collection of effective pulse signals, we have researched photoelectric dynamic and continuous heart-rate monitoring methods as well as comprehensive anti-jamming methods. Finally, using corresponding big data analysis methods, we have built training models according to factors such as age and mood. Results show that the system can be used to improve the level of physical training, accumulate individuals' training data, and support more efficient and scientific training plans.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125937723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Driver Unique Acceleration Behaviours and Stability over Two Years","authors":"Bruce Wallace, R. Goubran, F. Knoefel, S. Marshall, M. Porter, Andrew Smith","doi":"10.1109/BigDataCongress.2016.36","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.36","url":null,"abstract":"The identification of characteristic individual driving behaviours is an emerging challenge that arises in longitudinal studies of drivers, where different drivers of a shared vehicle must be distinguished. It also has applications in the insurance industry, where insurance risk and the associated owner premium depend on the diversity, or lack thereof, of drivers for a vehicle, e.g., whether or not a vehicle is driven by secondary drivers with higher-risk driving behaviours. Lastly, emerging self-driving vehicles could allow owners to personalize the vehicle's behaviour to drive more like them, increasing owner acceptance of the technology. In this paper, a big data set of driving data for 14 drivers is analyzed; a single year of data includes over 250,000 km and almost 5000 hours of driving for the 14 drivers. Analytics methods are presented that identify acceleration events within each driver's data, and a two-phase relationship model for these events is proposed that is indicative of each driver's unique behaviour. The results show that the two-phase acceleration relationship for maximum and mean acceleration allows 84.6% and 80.2%, respectively, of the 91 driver pairs that can be formed from the 14 drivers to be distinguished (p<5%). The paper shows the stability of the two-phase acceleration and deceleration relationships for the 14 drivers, as the second year of events for each of the 14 drivers has a mean correlation with the first-year relationships of 0.971 or higher.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122816514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fast and Efficient Entity Resolution Approach for Preserving Privacy in Mobile Data","authors":"Ioannis Boutsis, V. Kalogeraki","doi":"10.1109/BigDataCongress.2016.29","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.29","url":null,"abstract":"With the advent of mobile networking and the widespread adoption of smartphone devices, a number of location-based services have emerged, where users actively participate by sharing and receiving mobility data. However, the collection and analysis of user mobility data, such as user location information and trajectory data, especially when exploited together with external sources, such as social networks that often provide rich and publicly available information, can reveal sensitive user information. This paper proposes an approach based on entity resolution which enables users to disclose their mobility information without compromising their privacy, even if these data are linked with external publicly available information. We present detailed experimental results using four real datasets to illustrate that our approach is practical, efficient and effectively preserves privacy by eliminating potential links among the data.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130297812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GS1 Connected Car Using EPCIS-ONS System","authors":"Bongjin Sohn, Sungpil Woo, Jiyong Han, Hyeeun Cho, Jaewook Byun, Daeyoung Kim","doi":"10.1109/BigDataCongress.2016.66","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.66","url":null,"abstract":"Recently, a number of car companies have focused on collecting and analyzing data from connected cars to provide accurate and varied integrated information services to users. To provide integrated connected car services, however, existing systems still face two main problems. First, there are no well-defined standard car-related data models for data sharing, which is a basic prerequisite of integrated services. Second, each company stores its data on a closed server and blocks access, so the data are available only inside the respective data silos. For these reasons, it is hard to combine various data to provide integrated information services. To solve these problems, we propose two solutions: 1) define events of the car life cycle and data models for data sharing; 2) establish an open system for data sharing and service registration based on the GS1 standard. Based on the proposed models and system, we have implemented an example service, traffic light control, as a third-party service to show the feasibility of our system. Therefore, this research provides an open system that enables car companies and third-party service providers to create services associated with the connected car.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"196 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132053983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Method to Price Your Information Asset in the Information Market","authors":"D. Rao, W. Ng","doi":"10.1109/BigDataCongress.2016.46","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.46","url":null,"abstract":"Big Data is not just a buzzword anymore. It is an important asset for organizations that can profit from the use and analysis of this big data. This has given a major impetus to data brokers and marketing research firms to trawl and collate all this information from often unsuspecting users and sell this big data in neat packages to major organizations, all for a price. Everyone ends up making money off this big data except the users to whom this information originally belongs. To make users understand the importance of their information, we first need to make them aware of its value. Once they realize this, they can then monetize their information if they choose to, eliminating middlemen such as data broker firms. Towards this, we present our idea of valuing information based on Shannon's information value. In order to protect user privacy, we allow users to introduce distortion to their data, which represents the 'risk' parameter from the buyer's point of view. We then use the Sharpe ratio to compare the values of the information with and without the risk (in this case, the distortion introduced), thus giving the buyer the flexibility to choose whether to invest in that information. In support of our idea, we also present preliminary results detailing the working of the model and conclude with a discussion of the potential of valuing information.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132714267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed Incremental Graph Analysis","authors":"Upa Gupta, L. Fegaras","doi":"10.1109/BigDataCongress.2016.18","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.18","url":null,"abstract":"Distributed frameworks, such as MapReduce and Spark, have been developed by industry and research groups to analyze the vast amount of data that is being generated on a daily basis. Many graphs of interest, such as the Web graph and social networks, grow daily at an unprecedented scale and rate. To cope with this vast amount of data, researchers have been using distributed processing frameworks to analyze these graphs extensively. Most of these graph algorithms are iterative in nature. In our previous work, we introduced an efficient design pattern to handle a family of iterative graph algorithms in a distributed framework. Unfortunately, in most of these iterative algorithms, such as PageRank, if the graph is modified with the addition or deletion of edges or vertices, the PageRank has to be recomputed from scratch. In this paper, we introduce an improved design pattern for such algorithms to handle graph updates in an incremental fashion. Our method is to separate the graph topology from the graph analysis results. At each iteration step, each node participating in the graph analysis task reads, in addition to a single graph partition, all the current analysis results from the distributed file system (DFS). These results are correlated with the local graph partition using a special merge-join, and the new improved analysis results are calculated and stored to the DFS, one partition from each worker node. To handle continuous updates, an update function collects the changes to the graph and applies them to the graph partitions in a streaming fashion. Once the changes are made, the iterative algorithm is resumed to process the new updated data. Since a large part of the graph analysis task has already been completed on the existing data, the new updates require fewer iterations to compute the new graph analysis results, as the iterative algorithm will converge faster.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132236136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genetic Information Privacy in the Age of Data-Driven Medicine","authors":"Jingquan Li","doi":"10.1109/BigDataCongress.2016.45","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.45","url":null,"abstract":"As the costs for genomic sequencing continue to decrease, genetic testing is increasingly being used to confirm or assist in detection and treatment of many diseases. Companies such as 23andMe and Navigenics offer genetic tests using genome-wide technology direct to consumers over the Internet. As sources of genetic information proliferate, issues of privacy protection are increasingly problematic in relation to the use and disclosure of genetic information. This paper aims to identify the most important privacy threats to genetic information, and explain how to use privacy techniques and policies to mitigate the threats. This paper first describes the problem of genetic information privacy in the age of data-driven medicine. It then identifies the most important threats to genetic information and presents a case study that demonstrates how these threats might be intrinsic to genetic testing entities. Since existing privacy protection approaches are inadequate to address the threats, we develop a comprehensive privacy and security framework that integrates policy considerations and innovative data anonymization technologies and sketch countermeasures which can be taken to protect genetic privacy. We conclude with a discussion of the implications of the study and directions of future research.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131587595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepSky: Identifying Absorption Bumps via Deep Learning","authors":"Xiaoyong Yuan, Min Li, Sudeep Gaddam, Xiaolin Li, Yinan Zhao, Jingzhe Ma, J. Ge","doi":"10.1109/BigDataCongress.2016.34","DOIUrl":"https://doi.org/10.1109/BigDataCongress.2016.34","url":null,"abstract":"Pervasive interstellar grains provide significant insights to help us understand the formation and evolution of stars, planetary systems, and galaxies, and could potentially lead us to the secret of the origin of life. One of the most effective ways to analyze the dust is via its interaction and interference with some background light. The observable extinction curves and spectral features carry information about the size and composition of the dust. Among these features, the broad 2175 Å absorption bump is one of the most significant spectroscopic interstellar extinction features. Traditionally, astronomers apply conventional statistical and signal processing techniques to detect the existence of absorption bumps. These approaches require labor-intensive preprocessing and the co-existence of some other reference features to alleviate the influence of noise. Conventional approaches not only involve substantial labor cost in complicated workflows, but also demand well-trained expertise to make subtle and error-prone conditional decisions. In this paper, we propose to leverage deep learning to automate the detection workflow without minute feature engineering. We design and analyze deep convolutional neural networks for detecting absorption bumps. We further propose a framework of deep learning mechanisms and models (collectively called DeepSky) for scientific discovery in astronomy. The prototype of DeepSky demonstrates efficient and effective results using limited labeled data. With well-designed data augmentation, our trained model achieved about 99% prediction accuracy on real-world data.","PeriodicalId":407471,"journal":{"name":"2016 IEEE International Congress on Big Data (BigData Congress)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123178513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}