{"title":"Exploiting a Determinant-Based Metric to Evaluate a Word-Embeddings Matrix of Items","authors":"Ludovico Boratto, S. Carta, G. Fenu, Roberto Saia","doi":"10.1109/ICDMW.2016.0143","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0143","url":null,"abstract":"In order to generate effective results, it is essential for a recommender system to model the information about the user interests (user profiles). A profile usually contains preferences that reflect the recommendation technique, so collaborative systems represent a user with the ratings given to items, while content-based approaches assign a score to semantic/text-based features of the evaluated items. Even though semantic technologies are rapidly evolving and word embeddings (i.e., vector representations of the words in a corpus) are effective in numerous information filtering tasks, at the moment collaborative approaches (such as SVD) still generate more accurate recommendations. However, this might happen because, by employing classic profiles in form of vectors that collect all the preferences of a user, the power of word embeddings at modeling texts could be affected. In this paper we represent a profile as a matrix of word-embedding vectors of the items a user evaluated, and present a novel determinant-based metric that measures the similarity between an unevaluated item and those in the matrix-based user profile, in order to generate effective content-based recommendations. Experiments performed on three datasets show the capability of our approach to perform a better ranking of the items w.r.t. collaborative filtering, both when compared to a latent-factor-based approach (SVD) and to a classic neighborhood user-based system.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131365253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TENET: A Machine Learning-Based System for Target Characterization in Signaling Networks","authors":"Huey-Eng Chua, S. Bhowmick, L. Tucker-Kellogg, C. Dewey","doi":"10.1109/ICDMW.2016.0186","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0186","url":null,"abstract":"Target characterization of a biological network identifies characteristics that distinguish targets (nodes that can serve as molecular targets of drugs) from other nodes. In this demonstration, we present TENET (Target charactErization using NEtwork Topology), a software that facilitates topological features-based characterization of known targets in signaling networks modelling dynamic interactions within biological systems. TENET is based on a support vector machine (SVM)-based approach and generates a characterization model. These models specify topological features that can discriminate known targets and how these features are combined to quantify the likelihood of a node being a target. Hence, TENET can be used for prioritizing targets and for identifying novel candidate targets that share similar characteristics with known targets. The interactive user interface that TENET provides facilitates users' study and understanding of topological characteristics of targets in signaling networks.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121348517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Score Look-Alike Audiences","authors":"Qiang Ma, Eeshan Wagh, Jiayi Wen, Zhen Xia, Róbert Ormándi, Datong Chen","doi":"10.1109/ICDMW.2016.0097","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0097","url":null,"abstract":"Look-alike models, which are efficient tools for finding similar users from a smaller user set, are quickly revolutionizing the online programmatic advertising industry. The datasets in these contexts exhibit extremely sparse feature spaces on a massive scale, so traditionally, the state-of-the-art look-alike models have used pairwise similarities to construct these similar user sets. One of the key challenges of the similarity-based models is that they do not provide a way to measure the potential value of the users to an advertiser, which is crucial in an advertising context. We propose methods to score users within the expanded audience in a way which relates directly to the business metric that the advertiser wants to optimize. We present three scoring models and show that, through empirical evaluation using real-world, large-scale data, by incorporating the potential value of a user to an advertiser into our scoring model, we can significantly improve the performance of the look-alike models over methods which only use pairwise similarities of users.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121427798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Smart Phone User Behaviour Characterization Based on Autoencoders and Self Organizing Maps","authors":"Deepthi Rajashekar, A. N. Zincir-Heywood, M. Heywood","doi":"10.1109/ICDMW.2016.0052","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0052","url":null,"abstract":"Building applications that are cognizant of temporal and spatial changes in human behaviour under a one-class learning restriction represents a requirement for many user centric systems. We are particularly motivated to demonstrate the utility of algorithms for the self identification of smart phones. A framework is designed to quantify: (i) the dissimilarity in behaviours among any two users, (ii) the exclusivity of each user's behaviour (inclass) from the world (outclass). A central element of the proposed framework is to first identify a discriminating representation for each user. To this end, an autoencoder is employed in which the goal is to identify an encoding that rebuilds the original data with maximum accuracy/least loss. The hypothesis of this work is that such an autoencoding step provides an effective mechanism for discovering good data representations prior to the application of a data description technique, such as clustering. Both the autoencoder and the clustering steps are performed relative to a single user. We construct a user specific behavioural model using the most frequently used applications, cell towers and websites. We demonstrate that relative to the most up-to-date publicly available smart phone data set, the resulting behavioural models are capable of uniquely identifying each user under a one-class learning constraint.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127593001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-sentiment Modeling with Scalable Systematic Labeled Data Generation via Word2Vec Clustering","authors":"Dhruv Mayank, Kanchana Padmanabhan, K. Pal","doi":"10.1109/ICDMW.2016.0139","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0139","url":null,"abstract":"Social networks are now a primary source for news and opinions on topics ranging from sports to politics. Analyzing opinions with an associated sentiment is crucial to the success of any campaign (product, marketing, or political). However, there are two significant challenges that need to be overcome. First, social networks produce large volumes of data at high velocities. Using traditional (semi-) manual methods to gather training data is, therefore, impractical and expensive. Second, humans express more than two emotions, therefore, the typical binary good/bad or positive/negative classifiers are no longer sufficient to address the complex needs of the social marketing domain. This paper introduces a hugely scalable approach to gathering training data by using emojis as proxy for user sentiments. This paper also introduces a systematic Word2Vec based clustering method to generate emoji clusters that arguably represent different human emotions (multi-sentiment). Finally, this paper also introduces a threshold-based formulation to predicting one or two class labels (multi-label) for a given document. Our scalable multi-sentiment multi-label model produces a cross-validation accuracy of 71.55% (± 0.22%). To compare against other models in the literature, we also trained a binary (positive vs. negative) classifier. It produces a cross-validation accuracy of 84.95% (± 0.17%), which is arguably better than several results reported in literature thus far.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125285052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WEFEST: Word Embedding Feature Extension for Short Text Classification","authors":"Lei Sang, Fei Xie, Xiaojian Liu, Xindong Wu","doi":"10.1109/ICDMW.2016.0101","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0101","url":null,"abstract":"Short text classification is a crucial task for information retrieval, social media text categorization, and many other applications. In reality, due to the inherent sparsity and the limited information available in the short texts, learning and classifying short texts is a significant challenge. In this paper, we propose a new framework, WEFEST, which expands short texts using word embedding for classification. WEFEST is rooted on the deep language model, which learns a new word embedding space, by using word correlations, such that semantically related words also have close feature vectors in the new space. By using word embedding features to help expand the short texts, WEFEST can enrich the word density in the short texts for effective learning, by following three major steps. First, each short text in the training dataset is enriched by using pre-trained word feature embedding. Then the semantic similarity between two short texts is calculated by using the statistical frequency information retrieved from the trained model. Finally, we use the nearest neighbor algorithm to achieve short text classification. Experimental results on Chinese news title dataset validate the effectiveness of the proposed method.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125309763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Event Detection for Urban Dynamic Data Streams","authors":"S. Nechifor, Ioana Stefan, Marten Fischer, D. Puiu","doi":"10.1109/ICDMW.2016.0016","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0016","url":null,"abstract":"This paper presents a framework for processing the data generated by Smart City sensors and IoT data streams in real-time. The scope of processing is to detect various event patterns from the raw data. The framework is extensible because at any moment new data sources can be registered or new specific event detection mechanisms can be deployed. The framework offers an HTTP interface which can be used to provide details about each data stream. In order to connect to the heterogeneous data source end points and fetch the observations, a concept of simple adaptable data wrappers is introduced. Having the streams registered into the framework, the domain expert can deploy (using a Java API) the event detection mechanism. The domain expert (maybe with some help from an application developer) has only to develop the data wrappers and event detection modules. Once the modules are developed, they can be deployed at any time and in any number for different sensors of the same type, or for similar events to be detected.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"44 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117095051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Deep Convolution Belief Networks for Subjectivity Detection","authors":"I. Chaturvedi, E. Cambria, Soujanya Poria, Rajiv Bajpai","doi":"10.1109/ICDMW.2016.0134","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0134","url":null,"abstract":"Subjectivity detection aims to distinguish natural language as either opinionated (positive or negative) or neutral. In word vector based convolutional neural network models, a word meaning is simply a signal that helps to classify larger entities such as a document. Previous works do not usually consider prior distribution when using sliding windows to learn word embeddings and, hence, they are unable to capture higher-order and long-range features in text. In this paper, we employ dynamic Gaussian Bayesian networks to learn significant network motifs of words and concepts. These motifs are used to pre-train the convolutional neural network and capture the dynamics of discourse across several sentences.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121929356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ID-Link, an Enabler for Medical Data Marketplace","authors":"Ryuji Ito","doi":"10.1109/ICDMW.2016.0117","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0117","url":null,"abstract":"Business value can be derived from data exchange, and individual skill is indispensable for recognizing new ideas that come from combining different data and that bring benefit to new markets. Based on this concept model, a data marketplace has been discussed in the area of commercial vehicles in Japan, aimed at more efficient commercial distribution. Similar to that model, this paper introduces a data marketplace scheme in the healthcare industry, called \"ID-Link\", that is already in operation. The scheme is now developing a new business model to expand B2C business on the basis of ID-Link. The authors believe ID-Link could be a reference model for further types of data marketplace for business.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124034044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query-Based Evolutionary Graph Cuboid Outlier Detection","authors":"Ayushi Dalmia, Manish Gupta, Vasudeva Varma","doi":"10.1109/ICDMW.2016.0020","DOIUrl":"https://doi.org/10.1109/ICDMW.2016.0020","url":null,"abstract":"Graph-OLAP is an online analytical framework which allows us to obtain various projections of a graph, each of which helps us view the graph along multiple dimensions and multiple levels. Given a series of snapshots of a temporal heterogeneous graph, we aim to find interesting projections of the graph which have anomalous evolutionary behavior. Detecting anomalous projections in a series of such snapshots can be helpful for an analyst to understand the regions of interest from the temporal graph. Identifying such semantically related regions in the graph allows the analyst to derive insights from temporal graphs which enables her in making decisions. While most of the work on temporal outlier detection is performed on nodes, subgraphs and communities, we are the first to propose detection of evolutionary graph cuboid outliers. Further, we perform this detection in a query sensitive manner. Thus, an evolutionary graph cuboid outlier is a projection (or cuboid) of a snapshot of the temporal graph such that it contains an unexpected number of matches for the query with respect to other cuboids both in the same snapshot as well as in the other snapshots. Identifying such outliers is challenging because (1) the number of cuboids per snapshot could be large, and (2) number of snapshots could itself be large. We model the problem by predicting the outlier score for each cuboid in each snapshot. We propose to build subspace ensemble regression models to learn (a) the behavior of a cuboid across different snapshots, and (b) the behavior of all the cuboids in a given snapshot. Experimental results on both synthetic and real datasets show the effectiveness of the proposed algorithm in discovering evolutionary graph cuboid outliers.","PeriodicalId":373866,"journal":{"name":"2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131247471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}