{"title":"Stargazer: A Deep Learning Approach for Estimating the Performance of Edge-Based Clustering Applications","authors":"Breno Dantas Cruz, A. Paul, Z. Song, E. Tilevich","doi":"10.1109/SMDS49396.2020.00009","DOIUrl":"https://doi.org/10.1109/SMDS49396.2020.00009","url":null,"abstract":"As a solution to the sensor data deluge, edge computing processes sensor data by means of local devices. Many of these devices are resource-scarce in terms of the available processing capabilities and battery power. To achieve the required design trade-offs of edge applications, developers must be able to understand the performance and resource utilization of data processing algorithms. An increasing number of edge-based applications use machine learning (ML) as their key functionality. However, the performance and resource utilization of ML algorithms remain poorly understood, thus hindering the system design of edge-based ML applications. In addition, developers often cannot access real-world edge-based test beds during the design phase. To address this problem, we present an approach for estimating the performance of edge-based ML applications, with a particular application to clustering. To that end, we first comprehensively evaluate the performance and resource utilization of widely used clustering algorithms deployed in a representative edge environment. Second, we identify which properties of these algorithms are correlated with their performance and resource utilization. Finally, we apply our findings to create Stargazer, a Deep Neural Network that, given a clustering algorithm's computational load and input data size, estimates how this algorithm would perform and utilize resources in an edge-based application. 
Our tool provides viable decision-making support for addressing the multifaceted design challenges of edge-based ML applications.","PeriodicalId":385149,"journal":{"name":"2020 IEEE International Conference on Smart Data Services (SMDS)","volume":"149 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122783546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"M2NN: Rare Event Inference through Multi-variate Multi-scale Attention","authors":"Manjusha Ravindranath, K. Candan, M. Sapino","doi":"10.1109/SMDS49396.2020.00014","DOIUrl":"https://doi.org/10.1109/SMDS49396.2020.00014","url":null,"abstract":"With the increasing availability of sensory data, inferring the existence of relevant events in the observations is becoming a critical task for smart data service delivery in applications that rely on such data sources. Yet, existing solutions tend to fail when the events that are being inferred are rare, for instance when one attempts to infer seizure events in electroencephalogram (EEG) data. In this paper, we note that multi-variate time series often carry robust localized multi-variate temporal features that could, at least in theory, help identify these events; however, the lack of sufficient data to train for these events makes it impossible for neural architectures to identify and make use of these features. To tackle this challenge, we propose an LSTM-based neural architecture, M2NN, with an attention mechanism that leverages robust multi-variate temporal features that are extracted a priori and fed into the NN as side information. In particular, multi-variate temporal features are extracted by simultaneously considering, at multiple scales, temporal characteristics of the time series along with external knowledge, including variate relationships that are known a priori. We then show that a single layer LSTM with dual-layer attention that leverages these multi-scale, multi-variate features provides significant gains in rare seizure detection on EEG data. In addition, in order to illustrate the broader applicability (and reproducibility) of M2NN, we also evaluate it in other publicly available rare event detection tasks, such as anomaly detection in manufacturing. 
We further show that the proposed M2NN technique is beneficial in tackling more traditional inference problems, such as travel-time prediction, where rare accident events can cause congestion.","PeriodicalId":385149,"journal":{"name":"2020 IEEE International Conference on Smart Data Services (SMDS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130737372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conflict-Free Replicated Relations for Multi-Synchronous Database Management at Edge","authors":"Weihai Yu, C. Ignat","doi":"10.1109/SMDS49396.2020.00021","DOIUrl":"https://doi.org/10.1109/SMDS49396.2020.00021","url":null,"abstract":"In a cloud-edge environment, edge devices may not always be connected to the network. Still, applications may need to access the data on edge devices even when they are not connected. With support for multi-synchronous access, data on an edge device are kept synchronous with the data in the cloud as long as the device is online. When the device is offline, the application can still access the data on the device, asynchronously with concurrent data updates either in the cloud or on other edge devices. Conflict-free Replicated Data Types (CRDTs) emerged as a technology for multi-synchronous data access. CRDTs guarantee that when all sites have applied the same set of updates, the replicated data converge. However, CRDTs have not been successfully applied to relational databases (RDBs) for multi-synchronous access. In this paper, we present Conflict-free Replicated Relations (CRRs) that apply CRDTs to RDBs for support of multi-synchronous data access. With CRR, existing RDB applications, with very little modification, can be enhanced with multi-synchronous access. 
We also present a prototype implementation of CRR with some preliminary performance results.","PeriodicalId":385149,"journal":{"name":"2020 IEEE International Conference on Smart Data Services (SMDS)","volume":"44 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133453940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BC-Sketch: A Simple Reversible Sketch for Detecting Network Anomalies","authors":"Feng Wang, Yongning Tang, Lixin Gao, Guang Cheng","doi":"10.1109/SMDS49396.2020.00012","DOIUrl":"https://doi.org/10.1109/SMDS49396.2020.00012","url":null,"abstract":"As 5G/IoT networks constantly grow and evolve, proliferating network traffic brings an unprecedented challenge to detecting and identifying flow anomalies, such as heavy hitters, heavy changes and superspreaders. Many flow data analytics have been proposed to tackle the problem. Sketch-based approaches are the most commonly used flow analytics service, in which a compressed data structure is used to keep a summary of the original data and estimate traffic statistics such as flow size for all traffic flows. However, those approaches either induce information losses due to sampling or incur computational and space overheads for key recovery. In this paper, we propose a new lightweight traffic analytics service, called BC-sketch, for faster and more accurate detection of heavy keys using a very small number of counters. BC-sketch provides a reversible sketch using an extensible data structure designed to accommodate different sketch-based solutions. BC-sketch can be efficiently provisioned as a traffic analytics service in resource-constrained IoT devices, or integrated into various virtual network environments as a virtual service to detect heavy hitters, superspreaders, and heavy changes. To demonstrate its effectiveness, we use BC-sketch to detect heavy hitters, superspreaders, and heavy changes. 
Both theoretical analysis and experimental evaluations show that BC-sketch can provide higher precision for identifying those traffic anomalies with low memory and computational overheads.","PeriodicalId":385149,"journal":{"name":"2020 IEEE International Conference on Smart Data Services (SMDS)","volume":"183 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115074832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CNN Approaches to Classify Multivariate Time Series Using Class-specific Features","authors":"Yifan Hao, H. Cao, Erick Draayer","doi":"10.1109/SMDS49396.2020.00008","DOIUrl":"https://doi.org/10.1109/SMDS49396.2020.00008","url":null,"abstract":"Many smart data services (e.g., smart energy, smart homes) collect and utilize time series data (e.g., energy production and consumption, human body movement) to conduct data analysis. Among such analysis tasks, classification is a widely utilized technique to provide data-driven solutions. Most existing classification methods extract a single set of features from the data and use this feature set for classification across multiple classes. This often ignores the reality that different and class-specific subsets of the initial feature set may better facilitate classification. In this paper, we propose two convolutional neural network (CNN) models using class-specific variables to solve the multi-class classification problem over multivariate time series (MTS) data. A new loss function is introduced for training the CNN models. We compare our proposed methods with 13 baseline approaches using 14 real datasets. The extensive experimental results show that our new approaches can not only outperform other methods on classification accuracy, but also successfully identify important class-specific variables.","PeriodicalId":385149,"journal":{"name":"2020 IEEE International Conference on Smart Data Services (SMDS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130674046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SMDS 2020 Organizing Committee","authors":"Mohand-Saïd Hacid, Zhaohui Wu, Laurence T. Yang, Hongbing Wang, Nabil El Ioini, Kenneth Fletcher, M. Gergatsoulis, Daniel Grosu, Jin-Kao Hao, M. Sapino, Huasong Shan, Kurt Tutschku, Sudharshan S. Vazhkudai, M. A. Vega-Rodríguez, S. Ventura, Jian Wang, Haohua Wang, Jianwu Wang, Shangguang Wang","doi":"10.1109/smds49396.2020.00025","DOIUrl":"https://doi.org/10.1109/smds49396.2020.00025","url":null,"abstract":"Program Committee Amani Abu Jabal, Purdue University Jacky Akoka, CEDRIC-CNAM & IMT-TEM Mohsen Amini Salehi, University of Louisiana Lafayette Rui Araujo, University of Coimbra Claudio Ardagna, Universita' degli Studi di Milano Mohan Baruwal Chhetri, CSIRO Paolo Bellavista, University of Bologna Nik Bessis, Edge Hill University Frank Blaauw, University of Groningen Luca Cagliero, Politecnico di Torino Jian Cao, Shanghai Jiao Tong University Chia-Hui Chang, National Central University Feng Chen, Louisiana State University Tao Chen, Loughborough University Yong Chen, Tianjin University Shizhan Chen, Tianjin University Lisi Chen, Hong Kong Baptist University Bo Cheng, Beijing University of Posts & Telecommunications Lizhen Cui, Shandong University Edward Curry, NUI Galway Harshad Deshmukh, Google Sheng Di, ANL Zhijun Ding, Tongji University Weilong Ding, North China University of Technology Mario Jose Divan, UNLPam Schahram Dustdar, Vienna University of Technology Nabil El Ioini Kenneth Fletcher, University of Massachusetts Boston Matthew Forshaw, Newcastle University Mohamed Gaber, Birmingham City University Mikel Galar, Universidad Pública de Navarra Mengmeng Ge, Deakin University","PeriodicalId":385149,"journal":{"name":"2020 IEEE International Conference on Smart Data Services (SMDS)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125610472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"S3QLRDF: Property Table Partitioning Scheme for Distributed SPARQL Querying of large-scale RDF data","authors":"Mahmudul Hassan, S. Bansal","doi":"10.1109/SMDS49396.2020.00023","DOIUrl":"https://doi.org/10.1109/SMDS49396.2020.00023","url":null,"abstract":"The proliferation of the semantic web in the form of Resource Description Framework (RDF) demands efficient, scalable, and distributed storage along with a highly available and fault-tolerant parallel processing strategy. More precisely, the rapid growth of RDF data raises the need for an efficient partitioning strategy over distributed data management systems to improve SPARQL query performance regardless of its pattern shape with minimized pre-processing time. In this context, we propose a new relational partitioning scheme called Property Table Partitioning (PTP) for RDF data, which further partitions the existing Property Table into multiple tables based on distinct properties (comprising all subjects with non-null values for those distinct properties) in order to minimize input data and join operations. In this paper, we introduce a distributed RDF data management system called S3QLRDF, which is built on top of Spark and utilizes SQL to execute SPARQL queries over the PTP schema. We perform an extensive experimental evaluation with respect to preprocessing costs and query performance, using Lehigh University Benchmark (LUBM) and Waterloo SPARQL Diversity Test Suite (WatDiv) datasets with up to 1.4 billion triples. 
Our results demonstrate that S3QLRDF outperforms state-of-the-art distributed RDF management systems.","PeriodicalId":385149,"journal":{"name":"2020 IEEE International Conference on Smart Data Services (SMDS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114368927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time System for Short- and Long-Term Prediction of Vehicle Flow","authors":"S. Bilotta, P. Nesi, I. Paoli","doi":"10.1109/SMDS49396.2020.00019","DOIUrl":"https://doi.org/10.1109/SMDS49396.2020.00019","url":null,"abstract":"Nowadays, traffic management and sustainable mobility are becoming central topics for intelligent transportation systems (ITS). Thanks to today's technologies, it is possible to collect real-time data to monitor the traffic situation in some specific areas. An important challenge in ITS is the ability to predict road traffic variables. Short-term prediction of traffic aspects is a complex nonlinear task that has been the subject of many research efforts in the past few decades. Access to precise traffic flow data is mandatory for a large number of applications that have to guarantee a high level of service, such as traffic flow reconstruction, which in turn is used to perform what-if analysis, conditioned routing, etc. They have to be reliable and precise for sending rescue teams and fire brigades. This paper proposes a solution for short- and long-term traffic flow prediction by using and comparing a number of machine learning approaches. 
The solution has been developed in the context of the Sii-Mobility smart city mobility and transport national project and is in use in other EC projects and solutions such as Snap4City PCP EC and TRAFAIR CEF, as well as for REPLICATE H2020 SCC1 and the control room in the Florence area.","PeriodicalId":385149,"journal":{"name":"2020 IEEE International Conference on Smart Data Services (SMDS)","volume":"406 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123067928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Latent Feelings-aware RNN Model for User Churn Prediction with only Behaviour data","authors":"Meng Xi, Zhiling Luo, Naibo Wang, Jianrong Tao, Ying Li, Jianwei Yin","doi":"10.1109/SMDS49396.2020.00011","DOIUrl":"https://doi.org/10.1109/SMDS49396.2020.00011","url":null,"abstract":"User Churn Prediction is a cutting-edge research area in the web service industry; it is key to managing users in the virtual world and provides feedback information for improving the corresponding web service. At present, most of the relevant work designs a questionnaire to collect data on users' characteristics and feelings and then develops a general model by finding relevance. However, such methods require considerable time and manpower, and most web services can only obtain logs of users' behaviours and have no access to users' feature data. Therefore, it is a big challenge to conduct user churn prediction with only behaviour data and to infer users' latent feelings from their action data in order to improve the accuracy of churn prediction. In this paper, a novel Latent Feelings-aware RNN model, namely LaFee, is proposed to solve the user churn prediction problem by using only behaviour data. The latent feelings, proven to be satisfaction and aspiration, can be estimated through the intermediate variable of the trained LaFee. 
We also designed experiments on a real dataset and the results show that our methods outperform the baselines.","PeriodicalId":385149,"journal":{"name":"2020 IEEE International Conference on Smart Data Services (SMDS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130177995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EdgeInfer: Robust Truth Inference under Data Poisoning Attack","authors":"Farnaz Tahmasebian, Li Xiong, Mani Sotoodeh, V. Sunderam","doi":"10.1109/SMDS49396.2020.00013","DOIUrl":"https://doi.org/10.1109/SMDS49396.2020.00013","url":null,"abstract":"As crowdsourcing is becoming more widely used for annotating data from a large group of users, attackers have strong incentives to manipulate the system. Deriving the true answer of tasks in crowdsourcing systems based on user-provided data is susceptible to data poisoning attacks, whereby malicious users may intentionally or strategically report incorrect information to mislead the system into inferring the wrong truth for a set of tasks. Recent work has proposed several attacks on crowdsourcing systems and showed that existing truth inference methods may be vulnerable to such attacks. In this paper, we propose solutions to enhance the robustness of existing truth inference methods. Our solutions are based on 1) detecting and augmenting the answers for the boundary tasks in which users could not reach a strong consensus and hence are subject to potential manipulation, and 2) enhancing the inference method with a stronger prior. We empirically evaluate these defense mechanisms by designing attack scenarios that aim to decrease the accuracy of the system. 
Experiments show that our method is effective and significantly improves the robustness of the system under attack.","PeriodicalId":385149,"journal":{"name":"2020 IEEE International Conference on Smart Data Services (SMDS)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126600721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}