{"title":"coSense","authors":"Stephan Schmeißer, Gregor Schiele","doi":"10.1145/3395233","DOIUrl":"https://doi.org/10.1145/3395233","url":null,"abstract":"We present coSense—the collaborative, fault-tolerant, and adaptive sensing middleware for the Internet-of-Things (IoT). By actively harnessing the greatest asset of the IoT, the sheer number of devices, coSense is able to provide easy data acquisition with quality-of-service-based data cleaning by combining unsupervised learning and information fusion. It can also greatly improve sensor accuracy and fault tolerance to produce measurements specifically tailored for modern data-driven IoT empowered applications. In this article, we focus on the general concepts behind coSense and evaluate the accuracy gain based on a real-world dataset.","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"125 1","pages":"1 - 21"},"PeriodicalIF":0.0,"publicationDate":"2020-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76111713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HandiText","authors":"Liming Fang, Hongwei Zhu, Boqing Lv, Zhe Liu, W. Meng, Yu Yu, S. Ji, Zehong Cao","doi":"10.1145/3385189","DOIUrl":"https://doi.org/10.1145/3385189","url":null,"abstract":"The Internet of Things (IoT) is a new manifestation of data science. To ensure the credibility of data about IoT devices, authentication has gradually become an important research topic in the IoT ecosystem. However, traditional graphical passwords and text passwords can impose a serious memory burden on users. Therefore, a convenient method for determining user identity is needed. In this article, we propose a handwriting recognition authentication scheme named HandiText based on behavioral and biometric features. When people write a word by hand, HandiText captures their static biological features and dynamic behavior features during the writing process (writing speed, pressure, etc.). The features are related to habits, which makes them difficult for attackers to imitate. We also carry out algorithm comparisons and experimental evaluations to demonstrate the reliability of our scheme. The experimental results show that Long Short-Term Memory achieves the best classification accuracy, reaching 99% while keeping relatively low false-positive and false-negative rates. We also test on other datasets, where the average accuracy of HandiText reaches 98%, indicating strong generalization ability. In addition, the 324 users we surveyed indicated that they are willing to use this scheme on IoT devices.","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"38 1","pages":"1 - 18"},"PeriodicalIF":0.0,"publicationDate":"2020-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80780363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mapping Road Safety Features from Streetview Imagery","authors":"Arpan Man Sainju, Zhe Jiang","doi":"10.1145/3362069","DOIUrl":"https://doi.org/10.1145/3362069","url":null,"abstract":"Each year, an average of around 6 million car accidents occur in the United States. Road safety features (e.g., concrete barriers, metal crash barriers, rumble strips) play an important role in preventing or mitigating vehicle crashes. Accurate maps of road safety features are an important component of safety management systems for federal or state transportation agencies, helping traffic engineers identify locations to invest in safety infrastructure. In current practice, mapping road safety features is largely done manually (e.g., observations on the road or visual interpretation of streetview imagery), which is both expensive and time-consuming. In this article, we propose a deep learning approach to automatically map road safety features from streetview imagery. Unlike existing convolutional neural networks that classify each image individually, we propose to further add a recurrent neural network (long short-term memory) to capture the geographic context of images (spatial autocorrelation effect along linear road network paths). Evaluations on real-world streetview imagery show that our proposed model outperforms several baseline methods.","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"117 1","pages":"1 - 20"},"PeriodicalIF":0.0,"publicationDate":"2020-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90703243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GeoMatch","authors":"Ayman Zeidan, Eemil Lagerspetz, Kai Zhao, P. Nurmi, S. Tarkoma, H. Vo","doi":"10.1145/3402904","DOIUrl":"https://doi.org/10.1145/3402904","url":null,"abstract":"We develop GeoMatch as a novel, scalable, and efficient big-data pipeline for large-scale map matching on Apache Spark. GeoMatch improves existing spatial big-data solutions by utilizing a novel spatial partitioning scheme inspired by Hilbert space-filling curves. Thanks to its partitioning scheme, GeoMatch can effectively balance operations across different processing units and achieve significant performance gains. GeoMatch also incorporates a dynamically adjustable error-correction technique that provides robustness against positioning errors. We demonstrate the effectiveness of GeoMatch through rigorous and extensive empirical benchmarks that consider large-scale urban spatial datasets ranging from 166,253 to 3.78B location measurements. We separately assess execution performance and accuracy of map matching and develop a benchmark framework for evaluating large-scale map matching. Results of our evaluation show up to 27.25-fold performance improvements compared to previous works while achieving better processing accuracy than current solutions. We also showcase the practical potential of GeoMatch with two urban management applications. 
GeoMatch and our benchmark framework are open-source.","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"15 1","pages":"1 - 30"},"PeriodicalIF":0.0,"publicationDate":"2020-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75206470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to the Special Issue on Urban Computing and Smart Cities","authors":"Yanhua Li, Jie Bao, Zhi-Li Zhang, S. Benjaafar","doi":"10.1145/3412392","DOIUrl":"https://doi.org/10.1145/3412392","url":null,"abstract":"In recent years, urban network infrastructure has undergone a fast expansion and increasingly generates large amounts of data, such as human mobility data, human transaction data, regional weather and air quality data, and social connection data. These heterogeneous data sources convey rich information about a city and can enable intelligent solutions to various urban challenges, such as urban facility planning, air pollution, and so on. While, on one hand, these big urban data can help us tackle big urban challenges, on the other hand, it is challenging to manage, analyze, and make sense of them. The Urban Data Sciences special issue aims to publish work on multidisciplinary research across the areas of computer science, electrical engineering, environmental science, urban planning and development, social sciences, operations research, and industrial engineering on technologies, case studies, novel approaches, and visionary ideas related to data science solutions and data-driven applications to address real-world challenges for enabling smart cities. The objective of this special issue is to publish leading work in urban data science and present future challenges in this area. This special issue received 22 high-quality submissions, and 9 of them were accepted. As a result, the acceptance ratio is approximately 41%. The topics of the accepted articles are briefly introduced below. The article titled “Mapping Road Safety Features from Streetview Imagery: A Deep Learning Approach” focuses on the problem of road safety feature mapping. 
The authors utilize Google Streetview imagery as the data source, using a CNN to extract semantic features from individual images and an LSTM to model the linear spatial autocorrelation effect between those images along a road network path. The authors validate the proposed framework on a Streetview imagery dataset from Alabama, where it outperforms various baselines. In the article titled “User and Entity Behavior Analysis under Urban Big Data,” the authors propose a malicious behavior detection mechanism, as well as a prediction method, based on multi-dimensional historical data and deep learning approaches. The article titled “A Unified Framework for Robust and Efficient Hotspot Detection in Smart Cities” presents a unified framework for spatial hotspot detection that integrates a nondeterministic normalization-based scan statistic and the likelihood ratio-based framework. The proposed approach is capable of addressing the two limitations of traditional spatial scan statistics-based approaches, including the effect of spatial non-determinism and robustness against false positives. Extensive experiments are conducted to demonstrate the effectiveness and efficiency of the proposed approach. 
The article titled “Task Allocation in Hybrid Big Data Analytics for Urban IoT Applications” investigates the task allocation problem for the Internet-of-Things (IoT) environment related to transportation b","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"329 1","pages":"1 - 2"},"PeriodicalIF":0.0,"publicationDate":"2020-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75473648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovery of Spatio-Temporal Patterns in Multivariate Spatial Time Series","authors":"Gene P. K. Wu, Keith C. C. Chan","doi":"10.1145/3374748","DOIUrl":"https://doi.org/10.1145/3374748","url":null,"abstract":"With the advancement of computing technology and its wide range of applications, collecting large sets of multivariate time series at multiple geographical locations introduces the problem of identifying interesting spatio-temporal patterns. We consider a new spatial structure of the data in the pattern discovery process due to the dependent nature of the data. This article presents an information-theoretic approach to detect temporal patterns in multivariate time series at multiple locations. Based on the occurrences of the discovered temporal patterns, we propose a method to identify interesting spatio-temporal patterns by a statistical significance test. Furthermore, the identified spatio-temporal patterns can be used for clustering and classification. To evaluate performance, a simulated dataset is tested to validate the quality of the identified patterns and to compare with other approaches. The results indicate that the approach can effectively identify useful patterns that characterize the dataset for further analysis and achieve good clustering quality. Furthermore, experiments on real-world datasets and case studies have been conducted to illustrate the applicability and practicability of the proposed approach.","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"80 1","pages":"1 - 22"},"PeriodicalIF":0.0,"publicationDate":"2020-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86498568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"kD-STR: A Method for Spatio-Temporal Data Reduction and Modelling","authors":"L. Steadman, N. Griffiths, S. Jarvis, M. Bell, Shaun Helman, Caroline Wallbank","doi":"10.1145/3439334","DOIUrl":"https://doi.org/10.1145/3439334","url":null,"abstract":"Analysing and learning from spatio-temporal datasets is an important process in many domains, including transportation, healthcare and meteorology. In particular, data collected by sensors in the environment allows us to understand and model the processes acting within the environment. Recently, the volume of spatio-temporal data collected has increased significantly, presenting several challenges for data scientists. Methods are therefore needed to reduce the quantity of data that needs to be processed in order to analyse and learn from spatio-temporal datasets. In this article, we present the k-Dimensional Spatio-Temporal Reduction method (kD-STR) for reducing the quantity of data used to store a dataset whilst enabling multiple types of analysis on the reduced dataset. kD-STR uses hierarchical partitioning to find spatio-temporal regions of similar instances, and models the instances within each region to summarise the dataset. We demonstrate the generality of kD-STR with three datasets exhibiting different spatio-temporal characteristics and present results for a range of data modelling techniques. Finally, we compare kD-STR with other techniques for reducing the volume of spatio-temporal data. Our results demonstrate that kD-STR is effective in reducing spatio-temporal data and generalises to datasets that exhibit different properties.","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"2 1","pages":"1 - 31"},"PeriodicalIF":0.0,"publicationDate":"2020-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3439334","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42708027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithmic Techniques for Necessary and Possible Winners","authors":"Vishal Chakraborty, Théo Delemazure, B. Kimelfeld, Phokion G. Kolaitis, Kunal Relia, Julia Stoyanovich","doi":"10.1145/3458472","DOIUrl":"https://doi.org/10.1145/3458472","url":null,"abstract":"We investigate the practical aspects of computing the necessary and possible winners in elections over incomplete voter preferences. In the case of the necessary winners, we show how to implement and accelerate the polynomial-time algorithm of Xia and Conitzer. In the case of the possible winners, where the problem is NP-hard, we give a natural reduction to Integer Linear Programming (ILP) for all positional scoring rules and implement it in a leading commercial optimization solver. Further, we devise optimization techniques to minimize the number of ILP executions and, oftentimes, avoid them altogether. We conduct a thorough experimental study that includes the construction of a rich benchmark of election data based on real and synthetic data. Our findings suggest that, the worst-case intractability of the possible winners notwithstanding, the algorithmic techniques presented here scale well and can be used to compute the possible winners in realistic scenarios.","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"2 1","pages":"1 - 23"},"PeriodicalIF":0.0,"publicationDate":"2020-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3458472","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49346304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TRIPDECODER: Study Travel Time Attributes and Route Preferences of Metro Systems from Smart Card Data","authors":"Xiancai Tian, Baihua Zheng, Yazhe Wang, Hsiao-Ting Huang, Chih-Chieh Hung","doi":"10.1145/3430768","DOIUrl":"https://doi.org/10.1145/3430768","url":null,"abstract":"In this article, we aim to recover the exact routes taken by commuters inside a metro system, which are not captured by an Automated Fare Collection (AFC) system and hence remain unknown. We propose two inference tasks to handle the recovery: one infers the travel time of each travel link that contributes to the total duration of any trip inside a metro network, and the other infers the route preferences based on historical trip records and the travel time of each travel link inferred in the first task. Because these two inference tasks are interrelated, most existing works perform them simultaneously. However, our solution TripDecoder adopts a different approach. TripDecoder exploits the fact that some trips inside a metro system have only one practical route available. It decouples the two inference tasks by taking only those trip records with one practical route as the input for the first inference task of travel time and feeding the inferred travel time to the second inference task as an additional input, which not only improves the accuracy but also effectively reduces the complexity of both inference tasks. Two case studies have been performed based on city-scale real trip records captured by the AFC systems in Singapore and Taipei to compare the accuracy and efficiency of TripDecoder and its competitors. As expected, TripDecoder achieves the best accuracy on both datasets, and it also demonstrates superior efficiency and scalability.","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"2 1","pages":"1 - 21"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3430768","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45298469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}