{"title":"Practical Fully-Decentralized Secure Aggregation for Personal Data Management Systems","authors":"Julien Mirval, Luc Bouganim, I. S. Popa","doi":"10.1145/3468791.3468821","DOIUrl":"https://doi.org/10.1145/3468791.3468821","url":null,"abstract":"Personal Data Management Systems (PDMS) are flourishing, boosted by legal and technical means like smart disclosure, data portability and data altruism. A PDMS allows its owner to easily collect, store and manage data, directly generated by her devices, or resulting from her interactions with companies or administrations. PDMSs unlock innovative usages by crossing multiple data sources from one or many users, thus requiring aggregation primitives. Indeed, aggregation primitives are essential to compute statistics on user data, but are also a fundamental building block for machine learning algorithms. This paper proposes a protocol allowing for secure aggregation in a massively distributed PDMS environment, which adapts to selective participation and PDMSs characteristics, and is reliable with respect to failures, with no compromise on accuracy. Preliminary experiments show the effectiveness of our protocol which can adapt to several contexts with varying PDMSs characteristics in terms of communication speed or CPU resources and can adjust the aggregation strategy to the estimated selective participation.","PeriodicalId":312773,"journal":{"name":"33rd International Conference on Scientific and Statistical Database Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126486671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local Gaussian Process Model Inference Classification for Time Series Data","authors":"Fabian Berns, Joschka Hannes Strueber, C. Beecks","doi":"10.1145/3468791.3468839","DOIUrl":"https://doi.org/10.1145/3468791.3468839","url":null,"abstract":"One of the prominent types of time series analytics is classification, which entails identifying expressive class-wise features for determining class labels of time series data. In this paper, we propose a novel approach for time series classification called Local Gaussian Process Model Inference Classification (LOGIC). Our idea consists in (i) approximating the latent, class-wise characteristics of given time series data by means of Gaussian processes and (ii) aggregating these characteristics into a feature representation to (iii) provide a model-agnostic interface for state-of-the-art feature classification mechanisms. By making use of a fully-connected neural network as classification model, we show that the LOGIC model is able to compete with state-of-the-art approaches.","PeriodicalId":312773,"journal":{"name":"33rd International Conference on Scientific and Statistical Database Management","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132052469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Anomaly Detection for Time Series with Outlier Exposure","authors":"Jiaming Feng, Zheng Huang, Jie Guo, Weidong Qiu","doi":"10.1145/3468791.3468793","DOIUrl":"https://doi.org/10.1145/3468791.3468793","url":null,"abstract":"It is of great practical significance to accurately model and analyze abnormal events in time series. For example, the identification of anomaly patterns on infrastructure sensor curves helps locate equipment failures. In this paper, we propose an unsupervised anomaly detection approach for time series, which can comprehensively consider both point anomalies and subsequence anomalies. We innovatively introduce RNN into the architecture of Adversarial Autoencoder to better analyze anomaly events based on overall relationship of time series. In addition, we innovatively apply the Outlier Exposure technique for the performance optimization of anomaly detector. Meanwhile, a WGAN-based method is utilized to generate anomaly datasets through normal distribution learning. Finally, we apply the proposed method for fraud detection on a financial statement dataset and intrusion detection on a network traffic dataset. Experimental results demonstrate that our model can comprehensively consider different anomaly types in time series, and achieve promising detection performance overall. In the experiment of fraud detection, the LSTM integrated AAE model achieves an F1 score of 0.810, while the Outlier Exposure enhanced model achieves an F1 score of 0.894. This indicates that our method can improve the performance of current audit systems and facilitate discovering malicious behaviors.","PeriodicalId":312773,"journal":{"name":"33rd International Conference on Scientific and Statistical Database Management","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131094464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating Depth-First Traversal by Graph Ordering","authors":"Qiuyi Lyu, M. Sha, Bin Gong, Kuangda Lyu","doi":"10.1145/3468791.3468796","DOIUrl":"https://doi.org/10.1145/3468791.3468796","url":null,"abstract":"Cache efficiency is an important factor in the performance of graph processing due to the irregular memory access patterns caused by the sparse nature of graphs. To increase the cache hit rate, prior studies proposed a variety of preprocessing approaches based on the reordering, which permutes the vertexes’ labels to improve the locality of graph structures. However, the locality enhancement of existing reordering approaches does not bring much performance benefit in depth-first traversal, which is widely adopted in a majority of graph processing applications. Furthermore, the state-of-the-art reordering approach suffers from an obvious overhead on preprocessing which will greatly limit the application of their approach. In this paper, we propose SeqDFS, a depth-first graph traversal method that optimizes the cache efficiency by adjusting the order of vertexes visited and can be further extended to dynamic scenarios. We conduct extensive experiments on 16 real-world datasets and 3 representative depth-first graph applications, of which the results show that our proposal achieves a significant speed-up on both directed and undirected graphs.","PeriodicalId":312773,"journal":{"name":"33rd International Conference on Scientific and Statistical Database Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116435360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Tensor-Relational Algebra, and Other Ideas in Machine Learning System Design","authors":"C. Jermaine","doi":"10.1145/3468791.3472262","DOIUrl":"https://doi.org/10.1145/3468791.3472262","url":null,"abstract":"ACM Reference Format: Chris Jermaine. 2021. The Tensor-Relational Algebra, and Other Ideas in Machine Learning System Design. In 33rd International Conference on Scientific and Statistical Database Management, July 06–07, 2021, Tampa, FL, USA. ACM, New York, NY, USA, 1 page. https://doi.org/10.1145/3468791.3472262 Systems for machine learning such as TensorFlow and PyTorch have greatly increased the complexity of the models that can be prototyped, tested, and moved into production, as well as reducing the time and effort required to do this. However, the systems have significant limitations. In these systems, a matrix multiplication (or a 2-D convolution, or any of the operations offered by the system) is a black-box operation that must actually be executed somewhere. As such, if there are multiple GPUs available to execute the multiplication the system cannot “figure out” how to automatically distribute the multiplication over them. It has to run an available matrix multiply somewhere, on some hardware. If there is one GPU available but the inputs are too large to fit in the GPU RAM, the system cannot automatically decompose the operation to perform the computation in stages, moving parts of the matrices on and off of the GPU as needed, to stay within the available memory budget. In this talk, I will argue that relations make a compelling implementation abstraction for building ML systems. Modern ML computations often manipulate matrices and tensors. A tensor can be decomposed into a binary relation between (key, payload) pairs, where key identifies the sub-tensor stored in payload (payload could be a scalar value, but more likely, it is a multidimensional array). Such a simple binary relation allows many (or perhaps all) common ML computations to be expressed relationally. For example, consider two 2 × 10^4 by 2 × 10^4 matrices, decomposed into relations having 400 tuples each:","PeriodicalId":312773,"journal":{"name":"33rd International Conference on Scientific and Statistical Database Management","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125163901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frequent Itemsets Mining with a Guaranteed Local Differential Privacy in Small Datasets","authors":"Sharmin Afrose, T. Hashem, Mohammed Eunus Ali","doi":"10.1145/3468791.3468807","DOIUrl":"https://doi.org/10.1145/3468791.3468807","url":null,"abstract":"In this paper, we propose an iterative approach to estimate the frequent itemsets with high accuracy while satisfying the local differential privacy (LDP). The key component behind the improved accuracy of the estimated frequent itemsets by our approach is our novel two-level randomization technique for guaranteeing the LDP. Our randomization technique exploits the correlation of the presence of items in a user’s itemset, which has not been considered before. We present a mathematical proof that shows that our approach satisfies the LDP constraint. Extensive experiments are performed to validate the effectiveness and efficiency of our proposed algorithms using real datasets.","PeriodicalId":312773,"journal":{"name":"33rd International Conference on Scientific and Statistical Database Management","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126901608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MoParkeR: Multi-objective Parking Recommendation","authors":"M. Rahaman, Wei Shao, F. Salim, A. Turky, A. Song, Jeffrey Chan, Junliang Jiang, D. Bradbrook","doi":"10.1145/3468791.3468810","DOIUrl":"https://doi.org/10.1145/3468791.3468810","url":null,"abstract":"Existing parking recommendation solutions mainly focus on finding and suggesting parking spaces based on the unoccupied options only. However, there are other factors associated with parking spaces that can influence someone’s choice of parking such as fare, parking rule, walking distance to destination, travel time, likelihood to be unoccupied at a given time. More importantly, these factors may change over time and conflict with each other which makes the recommendations produced by current parking recommender systems ineffective. In this paper, we propose a novel problem called multi-objective parking recommendation. We present a solution by designing a multi-objective parking recommendation engine called MoParkeR that considers various conflicting factors together. Specifically, we utilise a non-dominated sorting technique to calculate a set of Pareto-optimal solutions, consisting of recommended trade-off parking spots. We conduct extensive experiments using two real-world datasets to show the applicability of our multi-objective recommendation methodology.","PeriodicalId":312773,"journal":{"name":"33rd International Conference on Scientific and Statistical Database Management","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123726914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sub-trajectory Similarity Join with Obfuscation","authors":"Yanchuan Chang, Jianzhong Qi, E. Tanin, Xingjun Ma, H. Samet","doi":"10.1145/3468791.3468822","DOIUrl":"https://doi.org/10.1145/3468791.3468822","url":null,"abstract":"User trajectory data is becoming increasingly accessible due to the prevalence of GPS-equipped devices such as smartphones. Many existing studies focus on querying trajectories that are similar to each other in their entirety. We observe that trajectories partially similar to each other contain useful information about users’ travel patterns which should not be ignored. Such partially similar trajectories are critical in applications such as epidemic contact tracing. We thus propose to query trajectories that are within a given distance range from each other for a given period of time. We formulate this problem as a sub-trajectory similarity join query named as the STS-Join. We further propose a distributed index structure and a query algorithm for STS-Join, where users retain their raw location data and only send obfuscated trajectories to a server for query processing. This helps preserve user location privacy which is vital when dealing with such data. Theoretical analysis and experiments on real data confirm the effectiveness and the efficiency of our proposed index structure and query algorithm.","PeriodicalId":312773,"journal":{"name":"33rd International Conference on Scientific and Statistical Database Management","volume":"91 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129983071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic View Selection in Graph Databases","authors":"Chao Zhang, Jiaheng Lu, Qingsong Guo, Xinyong Zhang, Xiaochun Han, Minqi Zhou","doi":"10.1145/3468791.3468794","DOIUrl":"https://doi.org/10.1145/3468791.3468794","url":null,"abstract":"Recently, several works have studied the problem of view selection in graph databases. However, existing methods cannot fully exploit the graph properties of views, e.g., supergraph views and common subgraph views, which leads to a low view utility and duplicate view content. To address the problem, we propose an extended graph view that persists all the edge-induced subgraphs to answer the subgraph and supergraph queries simultaneously. Furthermore, we present the graph gene algorithm (GGA), which relies on a set of view transformations to reduce the view space and optimize the view benefit. Extensive experiments on real-life and synthetic datasets demonstrated GGA outperformed other selection methods in both effectiveness and efficiency.","PeriodicalId":312773,"journal":{"name":"33rd International Conference on Scientific and Statistical Database Management","volume":"188 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114554656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SCHeMa: Scheduling Scientific Containers on a Cluster of Heterogeneous Machines","authors":"Thanasis Vergoulis, Konstantinos Zagganas, Loukas Kavouras, M. Reczko, S. Sartzetakis, Theodore Dalamagas","doi":"10.1145/3468791.3468813","DOIUrl":"https://doi.org/10.1145/3468791.3468813","url":null,"abstract":"In the era of data-driven science, conducting computational experiments that involve analysing large datasets using heterogeneous computational clusters, is part of the everyday routine for many scientists. Moreover, to ensure the credibility of their results, it is very important for these analyses to be easily reproducible by other researchers. Although various technologies, that could facilitate the work of scientists in this direction, have been introduced in the recent years, there is still a lack of open-source platforms that combine them to this end. In this work, we describe and demonstrate SCHeMa, an open-source platform that facilitates the execution and reproducibility of computational analysis on heterogeneous clusters, leveraging containerization, experiment packaging, workflow management, and machine learning technologies.","PeriodicalId":312773,"journal":{"name":"33rd International Conference on Scientific and Statistical Database Management","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129174858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}