{"title":"Visited Websites May Reveal Users’ Demographic Information and Personality","authors":"Cheng-You Lien, Guo-Jhen Bai, Hung-Hsuan Chen","doi":"10.1145/3350546.3352525","DOIUrl":"https://doi.org/10.1145/3350546.3352525","url":null,"abstract":"This study shows that simple supervised learning algorithms can easily predict a user’s personality and demographic information based on features derived from the user’s browsing logs, even when the logs are not recorded at the finest granularity (i.e., each visited URL of a user). This differs from the analytical formula of Cambridge Analytica (CA), which reportedly needs to know each user’s detailed liked objects (e.g., articles, pages, etc.) on Facebook at a fine granularity (i.e., CA needs to know the liked articles, not only the types of the articles) to predict user information. In contrast, we used only the categories of visited websites to predict a user’s gender, age, relationship status, and big six personality scores, an authoritative index that represents an individual’s personality in six dimensions. We also show that applying simple clustering as a preprocessing step enhances the predictive power. As a result, data collectors, even when storing the users’ visited URLs only at a coarse granularity, may leverage such information to identify a user’s preferences/tastes and her/his private information without notifying users.","PeriodicalId":171168,"journal":{"name":"2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123868736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cenote: A Big Data Management and Analytics Infrastructure for the Web of Things","authors":"Kyriakos C. Chatzidimitriou, Michail D. Papamichail, Napoleon-Christos I. Oikonomou, Dimitrios Lampoudis, A. Symeonidis","doi":"10.1145/3350546.3352531","DOIUrl":"https://doi.org/10.1145/3350546.3352531","url":null,"abstract":"In the era of Big Data, Cloud Computing and the Internet of Things, most existing integrated solutions that attempt to address their challenges are either proprietary, limit functionality to a predefined set of requirements, or hide the way data are stored and accessed. In this work we propose Cenote, an open-source Big Data management and analytics infrastructure for the Web of Things that overcomes these limitations. Cenote is built on component-based software engineering principles and provides an all-inclusive solution based on components that work well individually. CCS CONCEPTS • Software and its engineering → Data flow architectures.","PeriodicalId":171168,"journal":{"name":"2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129063750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SemKeyphrase: An Unsupervised Approach to Keyphrase Extraction from MOOC Video Lectures","authors":"A. Albahr, D. Che, M. Albahar","doi":"10.1145/3350546.3352535","DOIUrl":"https://doi.org/10.1145/3350546.3352535","url":null,"abstract":"Massive Open Online Courses (MOOCs) have emerged as a great resource for learners. Numerous challenges remain to be addressed to make MOOCs more useful and convenient for learners. One such challenge is how to automatically extract from MOOC video lectures a set of keyphrases that help students quickly identify relevant knowledge without spending too much time, expediting their learning process. In this paper, we propose SemKeyphrase, an unsupervised cluster-based approach for keyphrase extraction from MOOC video lectures. SemKeyphrase incorporates a new ranking algorithm, called PhaseRank, that ranks candidate keyphrases in two phases. Experimental results on a real-world dataset of MOOC video lectures show that our proposed approach outperforms the state-of-the-art methods by 16% or more in terms of F1 score.","PeriodicalId":171168,"journal":{"name":"2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"168 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128614282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Dynamic Mixed Membership Stochastic Blockmodel","authors":"Zheng Yu, M. Pietrasik, M. Reformat","doi":"10.1145/3350546.3352511","DOIUrl":"https://doi.org/10.1145/3350546.3352511","url":null,"abstract":"Latent community models are successful at statistically modeling network data by assigning network entities to communities and modeling entity relations as the relations of their communities. In this paper, we describe the limitation of these models in inferring relations between two communities when the entity relations between these communities are unobserved. We propose a solution to this problem by factorizing the community relations matrix into two community feature matrices, thereby adding a dependency between community relations. We introduce the deep dynamic mixed membership stochastic blockmodel based network (DDBN) to demonstrate the feasibility of such an approach. Our model marries the mixed membership stochastic blockmodel (MMSB) with deep neural networks for rich feature extraction and introduces a temporal dependency in latent features using a long short-term memory unit for dynamic network modeling. We evaluate our model on the link prediction task in static and dynamic networks and find that it achieves results comparable to state-of-the-art methods. CCS CONCEPTS • Computing methodologies → Neural networks; Learning in probabilistic graphical models; Factorization methods.","PeriodicalId":171168,"journal":{"name":"2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116393345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FIF: A NLP-based Feature Identification Framework for Data Warehouses","authors":"A. Prabhune, Ashish Chouhan","doi":"10.1145/3350546.3352530","DOIUrl":"https://doi.org/10.1145/3350546.3352530","url":null,"abstract":"In a data warehouse, selecting the relevant features is an iterative process that is laborious, time-consuming, and error-prone due to selection bias introduced by either the data expert or the data analyst. To address this challenge, this paper introduces FIF, a Feature Identification Framework that uses Natural Language Processing (NLP) to analyze the hypotheses, identify the relevant feature space and predict the appropriate data mining task and model. FIF is designed on the principles of the microservices architecture pattern and comprises five core groups of microservices: (a) NLP Pre-processor, (b) Attribute Identifier, (c) Feature Identifier, (d) Topic Modeller, and (e) Data Mining Task Evaluator. Finally, FIF is evaluated with five hypotheses against our data warehouse. CCS CONCEPTS • Information systems → Data warehouses; Wrappers (data mining); Document topic models; Similarity measures; • Computing methodologies → Feature selection; Natural language processing.","PeriodicalId":171168,"journal":{"name":"2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"216 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122389860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Which machine learning paradigm for fake news detection?","authors":"Dimitrios Katsaros, G. Stavropoulos, Dimitrios Papakostas","doi":"10.1145/3350546.3352552","DOIUrl":"https://doi.org/10.1145/3350546.3352552","url":null,"abstract":"Fake news detection/classification is gradually becoming of paramount importance to our society, in order to avoid the so-called reality vertigo and, in particular, to protect less educated people. Various machine learning techniques have been proposed to address this issue. This article presents a comprehensive performance evaluation of eight machine learning algorithms for fake news detection/classification. CCS CONCEPTS • General and reference → Evaluation; • Human-centered computing → Collaborative and social computing design and evaluation methods; Social network analysis.","PeriodicalId":171168,"journal":{"name":"2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116662053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear Scheduling of Big Data Streams on Multiprocessor Sets in the Cloud","authors":"Nicoleta Tantalaki, S. Souravlas, M. Roumeliotis, S. Katsavounis","doi":"10.1145/3350546.3352507","DOIUrl":"https://doi.org/10.1145/3350546.3352507","url":null,"abstract":"Nowadays, there is an accelerating need to handle large amounts of continuously arriving data efficiently and in a timely manner. Streams of big data led to the emergence of Distributed Stream Processing Systems (DSPS) that assign processing tasks to the available resources (dynamically or not) and route streaming data between them. Efficient scheduling of the processing tasks of data flows can reduce application latencies and eliminate network congestion. In this work, we propose a linear-complexity scheme for the task allocation and scheduling problem that improves system performance, load balancing and memory efficiency in applications that require heavy (all-to-all) communication between the tasks assigned to pairs of components.","PeriodicalId":171168,"journal":{"name":"2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129480962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-parameter streaming outlier detection","authors":"Theodoros Toliopoulos, A. Gounaris","doi":"10.1145/3350546.3352520","DOIUrl":"https://doi.org/10.1145/3350546.3352520","url":null,"abstract":"Distance-based outlier detection techniques are a widespread methodology for anomaly detection. Despite their effectiveness, a main limitation is that they rely heavily on the dataset and on the chosen parameters to establish the correct status of each data point. These parameters typically include, but are not limited to, the neighborhood radius and threshold. In continuous streaming environments, the need for real-time analysis does not permit an algorithm to be restarted multiple times with different parameters until the right combination is found. This gives rise to the need for a single technique that combines an arbitrary number of parameterizations with the use of minimal yet sufficient computing resources. In this work we both compare the state-of-the-art techniques for handling multiple queries in distance-based outlier detection algorithms and propose a novel technique for multi-parameter distance-based outlier detection tailored to distributed continuous streaming environments, such as Spark and Flink. CCS CONCEPTS • Information systems → Data stream mining; • Computing methodologies → Anomaly detection; Massively parallel algorithms.","PeriodicalId":171168,"journal":{"name":"2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133525559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EpiRep: Learning Node Representations through Epidemic Dynamics on Networks","authors":"B. Shi, Jianan Zhong, Qing Bao, Hongjun Qiu, Jiming Liu","doi":"10.1145/3350546.3360738","DOIUrl":"https://doi.org/10.1145/3350546.3360738","url":null,"abstract":"Understanding the dynamic properties of epidemic spreading on complex social networks is essential to making effective and efficient public health policies for epidemic prevention and control. In recent years, the concept of network embedding, whose purpose is to encode relationships or information of networked elements into a low-dimensional vector space, has attracted much attention for various network analytic tasks. However, most existing embedding methods have focused mainly on preserving static network information, such as structural proximity, node/edge attributes, and labels. In contrast, in this paper we focus on the embedding problem of preserving the dynamic characteristics of epidemic spreading on social networks. We propose a novel embedding method, namely EpiRep, to learn node representations of a network by maximizing the likelihood of preserving groups of infected nodes due to epidemics starting from every single node on the network. Specifically, the Susceptible-Infectious model is adopted to simulate the epidemic dynamics on networks, and the Continuous Bag-of-Words model with negative sampling is used to obtain node representations. Experimental results show that the EpiRep method outperforms two benchmark random-walk-based embedding methods in terms of node clustering and classification on several synthetic and real-world networks. The proposed method and findings in this paper may offer new insight for source identification and infection prevention in the face of epidemic spreading on social networks. CCS CONCEPTS • Computer systems organization → Embedded systems; Redundancy; Robotics; • Networks → Network reliability.","PeriodicalId":171168,"journal":{"name":"2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131926958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning Based Web-Traffic Analysis for Detection of Fraudulent Resource Consumption Attack in Cloud","authors":"Rishabh Rustogi, Abhishek Agarwal, Ayush Prasad, S. Saurabh","doi":"10.1145/3350546.3352567","DOIUrl":"https://doi.org/10.1145/3350546.3352567","url":null,"abstract":"Attackers can orchestrate a fraudulent resource consumption (FRC) attack by wittingly consuming the metered resources of cloud servers over a long period of time. The skillful over-consumption of resources places a significant financial burden on the client. These attacks differ in intent but not in content, hence they are hard to detect. In this paper, we propose a novel scheme for detecting the FRC attack on a cloud-based web server. We first divide the web pages into a number of quantiles based on their popularity index. Next, we compute the number of requests per hour for each of these quantiles. A Discrete Wavelet Transform is then applied to these quantiles to remove any high-frequency anomaly and smooth the time-series data. The n-tuple data from these quantiles, along with their label (attack or normal), is used to train an Artificial Neural Network model. For a low FRC attack rate (5%), our trained model obtained an accuracy of 98.51%, with a precision of 0.983 and a recall of 0.987, in detecting the FRC attack. CCS CONCEPTS • Security and privacy → Intrusion/anomaly detection and malware mitigation; • Computing methodologies → Supervised learning by classification.","PeriodicalId":171168,"journal":{"name":"2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131460242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}