2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW): Latest Papers, Page 9

Exploiting a Determinant-Based Metric to Evaluate a Word-Embeddings Matrix of Items

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) Pub Date : 2016-12-01 DOI: 10.1109/ICDMW.2016.0143

Ludovico Boratto, S. Carta, G. Fenu, Roberto Saia

Abstract: To generate effective results, a recommender system must model information about user interests (user profiles). A profile usually contains preferences that reflect the recommendation technique: collaborative systems represent a user by the ratings given to items, while content-based approaches assign a score to semantic or text-based features of the evaluated items. Even though semantic technologies are evolving rapidly and word embeddings (i.e., vector representations of the words in a corpus) are effective in numerous information filtering tasks, collaborative approaches (such as SVD) currently still generate more accurate recommendations. This might happen because classic profiles, in the form of vectors that collect all the preferences of a user, limit the power of word embeddings at modeling texts. In this paper we represent a profile as a matrix of word-embedding vectors of the items a user evaluated, and present a novel determinant-based metric that measures the similarity between an unevaluated item and those in the matrix-based user profile, in order to generate effective content-based recommendations. Experiments performed on three datasets show that our approach ranks items better than collaborative filtering, both when compared to a latent-factor-based approach (SVD) and to a classic neighborhood user-based system.

Citations: 0

TENET: A Machine Learning-Based System for Target Characterization in Signaling Networks

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) Pub Date : 2016-12-01 DOI: 10.1109/ICDMW.2016.0186

Huey-Eng Chua, S. Bhowmick, L. Tucker-Kellogg, C. Dewey

Citations: 0

Score Look-Alike Audiences

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) Pub Date : 2016-12-01 DOI: 10.1109/ICDMW.2016.0097

Qiang Ma, Eeshan Wagh, Jiayi Wen, Zhen Xia, Róbert Ormándi, Datong Chen

Citations: 16

Smart Phone User Behaviour Characterization Based on Autoencoders and Self Organizing Maps

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) Pub Date : 2016-12-01 DOI: 10.1109/ICDMW.2016.0052

Deepthi Rajashekar, A. N. Zincir-Heywood, M. Heywood

Abstract: Building applications that are cognizant of temporal and spatial changes in human behaviour under a one-class learning restriction is a requirement for many user-centric systems. We are particularly motivated to demonstrate the utility of algorithms for the self-identification of smart phones. A framework is designed to quantify: (i) the dissimilarity in behaviours between any two users, and (ii) the exclusivity of each user's behaviour (inclass) from the world (outclass). A central element of the proposed framework is to first identify a discriminating representation for each user. To this end, an autoencoder is employed in which the goal is to identify an encoding that rebuilds the original data with maximum accuracy (least loss). The hypothesis of this work is that such an autoencoding step provides an effective mechanism for discovering good data representations prior to the application of a data description technique, such as clustering. Both the autoencoder and the clustering steps are performed relative to a single user. We construct a user-specific behavioural model using the most frequently used applications, cell towers and websites. We demonstrate that, relative to the most up-to-date publicly available smart phone data set, the resulting behavioural models are capable of uniquely identifying each user under a one-class learning constraint.

Citations: 19

Multi-sentiment Modeling with Scalable Systematic Labeled Data Generation via Word2Vec Clustering

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) Pub Date : 2016-12-01 DOI: 10.1109/ICDMW.2016.0139

Dhruv Mayank, Kanchana Padmanabhan, K. Pal

Abstract: Social networks are now a primary source for news and opinions on topics ranging from sports to politics. Analyzing opinions with an associated sentiment is crucial to the success of any campaign (product, marketing, or political). However, there are two significant challenges that need to be overcome. First, social networks produce large volumes of data at high velocities. Using traditional (semi-)manual methods to gather training data is, therefore, impractical and expensive. Second, humans express more than two emotions; therefore, the typical binary good/bad or positive/negative classifiers are no longer sufficient to address the complex needs of the social marketing domain. This paper introduces a hugely scalable approach to gathering training data by using emojis as a proxy for user sentiments. This paper also introduces a systematic Word2Vec-based clustering method to generate emoji clusters that arguably represent different human emotions (multi-sentiment). Finally, this paper also introduces a threshold-based formulation for predicting one or two class labels (multi-label) for a given document. Our scalable multi-sentiment multi-label model produces a cross-validation accuracy of 71.55% (± 0.22%). To compare against other models in the literature, we also trained a binary (positive vs. negative) classifier. It produces a cross-validation accuracy of 84.95% (± 0.17%), which is arguably better than several results reported in the literature thus far.
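
The abstract above mentions a threshold-based formulation that yields one or two class labels per document but does not state the rule. The following is a minimal illustrative sketch only, not the authors' formulation: it assumes per-class scores are already available, and the function name `predict_labels` and the `margin` parameter are hypothetical. The assumed rule keeps the top-scoring class and adds the runner-up when its score is close enough to the top one.

```python
from typing import Dict, List


def predict_labels(scores: Dict[str, float], margin: float = 0.1) -> List[str]:
    """Return one or two labels from per-class scores.

    Assumed rule (hypothetical, not taken from the paper): always keep the
    top-scoring class, and add the runner-up only when its score is within
    `margin` of the top score.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    # Rank classes from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    labels = [ranked[0][0]]
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] <= margin:
        labels.append(ranked[1][0])
    return labels
```

With this rule, a document whose two best emotion clusters score almost equally receives both labels, while a document with one clearly dominant cluster receives a single label. The value of `margin` would have to be tuned on held-out data.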

Citations: 4

WEFEST: Word Embedding Feature Extension for Short Text Classification

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) Pub Date : 2016-12-01 DOI: 10.1109/ICDMW.2016.0101

Lei Sang, Fei Xie, Xiaojian Liu, Xindong Wu

Abstract: Short text classification is a crucial task for information retrieval, social media text categorization, and many other applications. In practice, due to the inherent sparsity and the limited information available in short texts, learning and classifying short texts is a significant challenge. In this paper, we propose a new framework, WEFEST, which expands short texts using word embedding for classification. WEFEST is rooted in the deep language model, which learns a new word embedding space by using word correlations, such that semantically related words also have close feature vectors in the new space. By using word embedding features to help expand the short texts, WEFEST can enrich the word density in the short texts for effective learning, by following three major steps. First, each short text in the training dataset is enriched by using pre-trained word feature embedding. Then the semantic similarity between two short texts is calculated by using the statistical frequency information retrieved from the trained model. Finally, we use the nearest neighbor algorithm to achieve short text classification. Experimental results on a Chinese news title dataset validate the effectiveness of the proposed method.
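
The three steps named in the abstract above (expand each short text, compute a similarity between texts, classify with nearest neighbor) can be sketched roughly as follows. This is an illustration under stated assumptions, not the WEFEST implementation: the embedding table is a plain dictionary, expansion appends the `k` most cosine-similar vocabulary words, the similarity is cosine over term counts (a stand-in for the paper's frequency-based measure, which the abstract does not specify), and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def expand(tokens: List[str], emb: Dict[str, Vector], k: int = 1) -> List[str]:
    """Step 1 (assumed form): append each known token's k nearest words."""
    out = list(tokens)
    for t in tokens:
        if t not in emb:
            continue  # out-of-vocabulary tokens are kept but not expanded
        neighbours = sorted(
            (w for w in emb if w != t),
            key=lambda w: cosine(emb[t], emb[w]),
            reverse=True,
        )
        out.extend(neighbours[:k])
    return out


def similarity(a: List[str], b: List[str]) -> float:
    """Step 2 (assumed form): cosine similarity of term-count vectors."""
    ca, cb = Counter(a), Counter(b)
    vocab = sorted(set(ca) | set(cb))
    return cosine([ca[w] for w in vocab], [cb[w] for w in vocab])


def classify(
    tokens: List[str],
    train: List[Tuple[List[str], str]],
    emb: Dict[str, Vector],
    k: int = 1,
) -> str:
    """Step 3: label of the single most similar expanded training text."""
    query = expand(tokens, emb, k)
    best = max(train, key=lambda ex: similarity(query, expand(ex[0], emb, k)))
    return best[1]
```

For example, with toy embeddings in which "match" lies close to "goal", the query `["match"]` shares no word with the training text `["goal"]`, so their unexpanded similarity is zero; after expansion both texts contain both words and the query is assigned that text's label.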

Citations: 11

Event Detection for Urban Dynamic Data Streams

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) Pub Date : 2016-12-01 DOI: 10.1109/ICDMW.2016.0016

S. Nechifor, Ioana Stefan, Marten Fischer, D. Puiu

Citations: 1

Bayesian Deep Convolution Belief Networks for Subjectivity Detection

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) Pub Date : 2016-12-01 DOI: 10.1109/ICDMW.2016.0134

I. Chaturvedi, E. Cambria, Soujanya Poria, Rajiv Bajpai

Citations: 17

ID-Link, an Enabler for Medical Data Marketplace

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) Pub Date : 2016-12-01 DOI: 10.1109/ICDMW.2016.0117

Ryuji Ito

Citations: 3

Query-Based Evolutionary Graph Cuboid Outlier Detection

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) Pub Date : 2016-12-01 DOI: 10.1109/ICDMW.2016.0020

Ayushi Dalmia, Manish Gupta, Vasudeva Varma

Abstract: Graph-OLAP is an online analytical framework which allows us to obtain various projections of a graph, each of which helps us view the graph along multiple dimensions and multiple levels. Given a series of snapshots of a temporal heterogeneous graph, we aim to find interesting projections of the graph which have anomalous evolutionary behavior. Detecting anomalous projections in a series of such snapshots can help an analyst understand the regions of interest in the temporal graph. Identifying such semantically related regions in the graph allows the analyst to derive insights from temporal graphs, which helps her make decisions. While most of the work on temporal outlier detection is performed on nodes, subgraphs and communities, we are the first to propose detection of evolutionary graph cuboid outliers. Further, we perform this detection in a query-sensitive manner. Thus, an evolutionary graph cuboid outlier is a projection (or cuboid) of a snapshot of the temporal graph such that it contains an unexpected number of matches for the query with respect to other cuboids, both in the same snapshot and in the other snapshots. Identifying such outliers is challenging because (1) the number of cuboids per snapshot could be large, and (2) the number of snapshots could itself be large. We model the problem by predicting the outlier score for each cuboid in each snapshot. We propose to build subspace ensemble regression models to learn (a) the behavior of a cuboid across different snapshots, and (b) the behavior of all the cuboids in a given snapshot. Experimental results on both synthetic and real datasets show the effectiveness of the proposed algorithm in discovering evolutionary graph cuboid outliers.

Citations: 3