{"title":"Topic-aware Neural Linguistic Steganography Based on Knowledge Graphs","authors":"Yamin Li, Jun Zhang, Zhongliang Yang, Ru Zhang","doi":"10.1145/3418598","DOIUrl":"https://doi.org/10.1145/3418598","url":null,"abstract":"The core challenge of steganography is always how to improve the hidden capacity and the concealment. Most current generation-based linguistic steganography methods only consider the probability distribution between text characters, and the emotion and topic of the generated steganographic text are uncontrollable. Especially for long texts, generating several sentences related to a topic and displaying overall coherence and discourse-relatedness can ensure better concealment. In this article, we address the problem of generating coherent multi-sentence texts for better concealment, and a topic-aware neural linguistic steganography method that can generate a steganographic paragraph with a specific topic is presented. We achieve a topic-controllable steganographic long text generation by encoding the related entities and their relationships from Knowledge Graphs. Experimental results illustrate that the proposed method can guarantee both the quality of the generated steganographic text and its relevance to a specific topic. The proposed model can be widely used in covert communication, privacy protection, and many other areas of information security.","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":" ","pages":"1 - 13"},"PeriodicalIF":0.0,"publicationDate":"2021-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3418598","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46673255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-granularity Deep Local Representations for Irregular Scene Text Recognition","authors":"Hongchao Gao, Yujia Li, Jiao Dai, Xi Wang, Jizhong Han, Ruixuan Li","doi":"10.1145/3446971","DOIUrl":"https://doi.org/10.1145/3446971","url":null,"abstract":"Recognizing irregular text from natural scene images is challenging due to the unconstrained appearance of text, such as curvature, orientation, and distortion. Recent recognition networks regard this task as a text sequence labeling problem and most networks capture the sequence only from a single-granularity visual representation, which to some extent limits the performance of recognition. In this article, we propose a hierarchical attention network to capture multi-granularity deep local representations for recognizing irregular scene text. It consists of several hierarchical attention blocks, and each block contains a Local Visual Representation Module (LVRM) and a Decoder Module (DM). Based on the hierarchical attention network, we propose a scene text recognition network. The extensive experiments show that our proposed network achieves the state-of-the-art performance on several benchmark datasets including IIIT-5K, SVT, CUTE, SVT-Perspective, and ICDAR datasets under shorter training time.","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"2 1","pages":"1 - 18"},"PeriodicalIF":0.0,"publicationDate":"2021-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3446971","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48504907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to the Special Issue on Urban Computing and Smart Cities","authors":"Yanhua Li, Jie Bao, Zhi-Li Zhang, S. Benjaafar","doi":"10.1145/3441679","DOIUrl":"https://doi.org/10.1145/3441679","url":null,"abstract":"In recent years, the urban networks infrastructure has undergone a fast expansion, which increasingly generates a large amount of data, such as human mobility data, human transactions data, regional weather and air quality data, and social connection data. These heterogeneous data sources convey rich information about a city and can enable intelligent solutions to solve various urban challenges, such as urban facility planning, air pollution, and so on. While on the one hand, these big urban data can help us to tackle big urban challenges, on the other hand, it is challenging to manage, analyze, and make sense of the big urban data. The Urban Data Sciences special issue aims to publish work on multidisciplinary research across the areas of computer science, electrical engineering, environmental science, urban planning and development, social sciences, operation research, and industrial engineering on technologies, case studies, novel approaches, and visionary ideas related to data science solutions and data-driven applications to address real-world challenges for enabling smart cities. The objective of this special issue is to publish leading work in urban data science and present future challenges in this area. This special issue received 22 high-quality submissions, and 4 of them were accepted. The topics of the accepted articles are briefly introduced below. In the article titled “Transfer Urban Human Mobility via POI Embedding over Multiple Cities,” the authors proposed an embedding mechanism to fuse human mobility data and city POI data to improve the prediction performance with limited training data. Moreover, a deep learning architecture is proposed that combines CNN with LSTM to simultaneously capture both the spatiotemporal and geographical information from the enriched trajectories. The proposed method is evaluated with four citywide datasets. The article titled “Empty Vehicle Redistribution with Time Windows in Autonomous Taxi Systems” addresses the topic of autonomous vehicle reservation strategies. The proposed approach is dynamic management of the vehicles using an Index-Based Redistribution Time Limited algorithm. The proposed algorithm improves existing algorithms by incorporating expected passenger arrivals and predicted waiting times limitations. In the article titled “Scalable Belief Updating for Urban Air Quality Modeling and Prediction,” the authors propose a scalable belief updating framework to predict future air quality and a nonparametric approach for statistical model learning. The proposed prediction model enables iterative updates for large-scale data. The authors analyzed the distribution of various pollutants and the influences of meteorology. Moreover, in the last article of the special issue, entitled “WattScale: A Data-driven Approach for Energy Efficiency Analytics of Buildings at Scale,” the authors presented a data-driven approach to identify the least energy-efficient buildings from a large population of building","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"51 1","pages":"1 - 2"},"PeriodicalIF":0.0,"publicationDate":"2021-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80935421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WattScale","authors":"Srinivasan Iyengar, Stephen Lee, David E. Irwin, P. Shenoy, B. Weil","doi":"10.1145/3406961","DOIUrl":"https://doi.org/10.1145/3406961","url":null,"abstract":"Buildings consume over 40% of the total energy in modern societies, and improving their energy efficiency can significantly reduce our energy footprint. In this article, we present WattScale, a data-driven approach to identify the least energy-efficient buildings from a large population of buildings in a city or a region. Unlike previous methods such as least-squares that use point estimates, WattScale uses Bayesian inference to capture the stochasticity in the daily energy usage by estimating the distribution of parameters that affect a building. Further, it compares them with similar homes in a given population. WattScale also incorporates a fault detection algorithm to identify the underlying causes of energy inefficiency. We validate our approach using ground truth data from different geographical locations, which showcases its applicability in various settings. WattScale has two execution modes—(i) individual and (ii) region-based, which we highlight using two case studies. For the individual execution mode, we present results from a city containing >10,000 buildings and show that more than half of the buildings are inefficient in one way or another indicating a significant potential from energy improvement measures. Additionally, we provide probable cause of inefficiency and find that 41%, 23.73%, and 0.51% homes have poor building envelope, heating, and cooling system faults, respectively. For the region-based execution mode, we show that WattScale can be extended to millions of homes in the U.S. due to the recent availability of representative energy datasets.","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"21 1","pages":"1 - 25"},"PeriodicalIF":0.0,"publicationDate":"2021-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86047985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Belief Updating for Urban Air Quality Modeling and Prediction","authors":"Xiuming Liu, E. Ngai, D. Zachariah","doi":"10.1145/3402903","DOIUrl":"https://doi.org/10.1145/3402903","url":null,"abstract":"Air pollution is one of the major concerns in global urbanization. Data science can help to understand the dynamics of air pollution and build reliable statistical models to forecast air pollution levels. To achieve these goals, one needs to learn the statistical models which can capture the dynamics from the historical data and predict air pollution in the future. Furthermore, the large size and heterogeneity of today’s big urban data pose significant challenges on the scalability and flexibility of the statistical models. In this work, we present a scalable belief updating framework that is able to produce reliable predictions, using over millions of historical hourly air pollutant and meteorology records. We also present a non-parametric approach to learn the statistical model which reveals interesting periodical dynamics and correlations of the dataset. Based on the scalable belief update framework and the non-parametric model learning approach, we propose an iterative update algorithm to accelerate Gaussian process, which is notorious for its prohibitive computation with large input data. Finally, we demonstrate how to integrate information from heterogeneous data by regarding the beliefs produced by other models as the informative prior. Numerical examples and experimental results are presented to validate the proposed method.","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"116 1","pages":"1 - 19"},"PeriodicalIF":0.0,"publicationDate":"2021-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76656827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transfer Urban Human Mobility via POI Embedding over Multiple Cities","authors":"Renhe Jiang, Xuan Song, Z. Fan, Tianqi Xia, Zhaonan Wang, Quanjun Chen, Z. Cai, R. Shibasaki","doi":"10.1145/3416914","DOIUrl":"https://doi.org/10.1145/3416914","url":null,"abstract":"Rapidly developing location acquisition technologies provide a powerful tool for understanding and predicting human mobility in cities, which is very significant for urban planning, traffic regulation, and emergency management. However, with the existing methodologies, it is still difficult to accurately predict millions of peoples’ mobility in a large urban area such as Tokyo, Shanghai, and Hong Kong, especially when collected data used for model training are often limited to a small portion of the total population. Obviously, human activities in a city are closely linked with point-of-interest (POI) information, which can reflect the semantic meaning of human mobility. This motivates us to fuse human mobility data and city POI data to improve the prediction performance with limited training data, but current fusion technologies can hardly handle these two heterogeneous data. Therefore, we propose a unique POI-embedding mechanism that aggregates the regional POIs by categories to generate an artificial POI-image for each urban grid and enriches each trajectory snippet to a four-dimensional tensor in an analogous manner to a short video. Then, we design a deep learning architecture combining CNN with LSTM to simultaneously capture both the spatiotemporal and geographical information from the enriched trajectories. Furthermore, transfer learning is employed to transfer mobility knowledge from one city to another, so that we can fully utilize other cities’ data to train a stronger model for the target city with only limited data available. Finally, we achieve satisfactory performance of human mobility prediction at the citywide level using a limited amount of trajectories as training data, which has been validated over five urban areas of different types and scales.","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"32 1","pages":"1 - 26"},"PeriodicalIF":0.0,"publicationDate":"2021-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87831178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Empty Vehicle Redistribution with Time Windows in Autonomous Taxi Systems","authors":"T. Babicheva, Matej Cebecauer, D. Barth, W. Burghout, L. Kloul","doi":"10.1145/3416915","DOIUrl":"https://doi.org/10.1145/3416915","url":null,"abstract":"In this article, we investigate empty vehicle redistribution algorithms with time windows for personal rapid transit or autonomous station-based taxi services, from a passenger service perspective. We present an Index Based Redistribution Time Limited algorithm that improves upon existing algorithms by incorporating expected passenger arrivals and predicted waiting times limitations. We evaluate 17 variations of algorithms on a test case in Stockholm, Sweden. The results show that the combination of Send The Nearest and Index Based Redistribution Time Limited algorithms provides promising results for both Poisson arrivals and real demand, outperforming the other tested methods, in terms of passenger waiting time and number of passengers not served within their time windows.","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"80 1","pages":"1 - 22"},"PeriodicalIF":0.0,"publicationDate":"2021-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88167152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STAR: Summarizing Timed Association Rules","authors":"Cristian Molinaro, Chiara Pulice, Anja Subasic, Abigail Bartolome, V. S. Subrahmanian","doi":"10.1145/3419107","DOIUrl":"https://doi.org/10.1145/3419107","url":null,"abstract":"Timed association rules (TARs) generalize classical association rules (ARs) so that we can express temporal dependencies of the form “If X is true at time t, then Y will likely be true at time (t + τ).” As with ARs, solving the TAR mining problem can generate huge numbers of rules. We show that methods to summarize ARs cannot work directly with TARs, and we develop two notions—strong and weak summaries—to summarize a set of TARs. We show that the problems of finding strong/weak summaries are NP-hard, and we provide polynomial-time approximation algorithms. We show experimentally that the coverage provided by our summarization methods is very high. Both technical measures based on coverage and human experiments on six World Bank datasets using 100 subjects from Mechanical Turk and a separate experiment with terrorism experts on a terrorism dataset show that while both summarization methods perform well, weak summaries are preferred, despite their taking more time to compute than strong summaries.","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"23 1","pages":"6:1-6:36"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73262083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Anomaly Detectors to Detect Intrusions in the Current Threat Landscape","authors":"T. Zoppi, A. Ceccarelli, Tommaso Capecchi, A. Bondavalli","doi":"10.1145/3441140","DOIUrl":"https://doi.org/10.1145/3441140","url":null,"abstract":"Anomaly detection aims at identifying unexpected fluctuations in the expected behavior of a given system. It is acknowledged as a reliable answer to the identification of zero-day attacks, to such an extent that several ML algorithms suited to binary classification have been proposed over the years. However, an experimental comparison of a wide pool of unsupervised algorithms for anomaly-based intrusion detection against a comprehensive set of attack datasets has not yet been investigated. To fill this gap, we exercise 17 unsupervised anomaly detection algorithms on 11 attack datasets. Results allow elaborating on a wide range of arguments, from the behavior of the individual algorithm to the suitability of the datasets to anomaly detection. We conclude that algorithms such as Isolation Forests, One-Class Support Vector Machines, and Self-Organizing Maps are more effective than their counterparts for intrusion detection, while clustering algorithms represent a good alternative due to their low computational complexity. Further, we detail how attacks with unstable, distributed, or non-repeatable behavior such as Fuzzing, Worms, and Botnets are more difficult to detect. Ultimately, we digress on capabilities of algorithms in detecting anomalies generated by a wide pool of unknown attacks, showing that achieved metric scores do not vary with respect to identifying single attacks.","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":" ","pages":"1 - 26"},"PeriodicalIF":0.0,"publicationDate":"2020-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3441140","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42462511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to the Special Issue on Retrieving and Learning from Internet of Things Data","authors":"Haibo Hu, Rik Sarkar, Zhengzhang Chen","doi":"10.1145/3426368","DOIUrl":"https://doi.org/10.1145/3426368","url":null,"abstract":"","PeriodicalId":93404,"journal":{"name":"ACM/IMS transactions on data science","volume":"26 1","pages":"1 - 1"},"PeriodicalIF":0.0,"publicationDate":"2020-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80786481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}