2017 IEEE International Conference on Data Mining Workshops (ICDMW): Latest Papers, Page 6

Robust Projective Dictionary Learning by Joint Label Embedding and Classification

2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.72

Weiming Jiang, Zhao Zhang, Jie Qin, Mingbo Zhao, Fanzhang Li, Shuicheng Yan

Abstract: In this paper, we propose a new discriminative dictionary learning framework, called robust Label Embedding Projective Dictionary Learning (LE-PDL), for data classification. LE-PDL learns a discriminative dictionary and block-diagonal representations without l0-norm or l1-norm sparsity regularization, because the l0-norm or l1-norm constraint that existing DL methods place on the coding coefficients makes the training phase time-consuming. To enhance performance, we also use the label information of the dictionary atoms during learning to encourage intra-class atoms to deliver similar profiles and to enforce a block-diagonal coefficient matrix. In addition, LE-PDL learns an underlying projection that bridges the data and their coefficients by extracting special features from the given data. A classifier is then trained on the extracted features, so that classification and representation power are considered jointly. As a result, classification with our model is efficient: it avoids the extra, time-consuming sparse reconstruction with the trained dictionary that most existing DL methods perform for each new test sample. Finally, a robust l2,1-norm regularizer is imposed on the classifier, and a non-negativity constraint is placed on the coding coefficients to enhance performance. Experimental results show the effectiveness of our formulation.
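
The test-time efficiency claimed in the abstract (a projection followed by a classifier, with no per-sample sparse coding) can be illustrated with a small sketch. This is not the authors' implementation: the projection `P`, the classifier weights `W`, and the toy inputs are hand-picked, hypothetical values rather than learned ones, and the helper names are assumptions made for the example.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def classify(x, P, W):
    """Project the sample, then score it with a linear classifier.

    No dictionary-based sparse reconstruction is solved here: the
    features come from a single matrix-vector product.
    """
    features = matvec(P, x)        # stand-in for the learned projection
    scores = matvec(W, features)   # one score per class
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical projection: 4 feature rows (2 per class) over 3 input dimensions.
P = [
    [0.5, 0.5, 0.0],
    [0.4, 0.6, 0.0],
    [0.0, 0.1, 0.9],
    [0.0, 0.0, 1.0],
]
# Hypothetical classifier: class 0 reads rows 0-1, class 1 reads rows 2-3.
W = [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]

label_a = classify([1.0, 0.8, 0.1], P, W)
label_b = classify([0.1, 0.2, 1.0], P, W)
print(label_a, label_b)
```

The cost per test sample is two matrix-vector products, which is the property the abstract contrasts with methods that solve a sparse coding problem for every new sample.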

Citations: 6

A New Method for Stock Price Prediction Based on MRFs and SSVM

2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.113

Lin Lai, Chang Li, Wen Long

Citations: 8

RESTRAC: REference Sequence Based Space TRAnsformation for Clustering

2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.66

A. T. Islam, S. Pramanik, Vahid Mirjalili, S. Sural

Abstract: Effective mining of the large number of DNA and RNA fragments obtained from next-generation sequencing technologies depends on the availability of efficient analytical tools to process them. One important aspect of this analysis, given the huge number of fragments, is partitioning them by their level of similarity. In this paper, we propose a clustering approach based on space transformation to achieve this partitioning. We use a set of reference sequences to transform each sequence into a point in a multidimensional vector space and perform the clustering in that vector space. We show through extensive analysis that the proposed transformation very closely preserves the clustering properties of the sequences under edit distance. The time for this transformation is linear in the number of sequences. The time saving for clustering is significant because edit distance calculations between two sequences are replaced by vector distance calculations between the two corresponding feature vectors. We used agglomerative hierarchical clustering with single and average linkage because these are frequently used by the bioinformatics community. Agglomerative hierarchical clustering runs in time quadratic in the number of sequences, so clustering in the edit space can be prohibitive for a large number of sequences. Greedy heuristic methods exist that cluster much faster, but at the cost of significantly reduced cluster quality. We have applied our method to 16S rRNA fragment datasets obtained from different environmental samples. In these experiments, RESTRAC achieves up to a five-hundred-fold speed-up for single linkage and up to a five-fold speed-up for average linkage while preserving good cluster quality.
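
The transformation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy sequences, the choice of two reference sequences, and the helper names (`edit_distance`, `transform`, `single_linkage`) are assumptions made for the example, and the naive clustering loop stands in for whatever optimized routine the paper used.

```python
def edit_distance(a, b):
    """Levenshtein distance using the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def transform(sequences, references):
    """Map each sequence to its vector of edit distances to the references.

    This needs len(sequences) * len(references) edit-distance computations,
    so it is linear in the number of sequences for a fixed reference set.
    """
    return [[edit_distance(s, r) for r in references] for s in sequences]


def euclidean(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5


def single_linkage(points, k):
    """Naive agglomerative single-linkage clustering down to k clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy data: two families of short fragments (hypothetical, not from the paper).
sequences = ["ACGTACGT", "ACGTACGA", "ACGAACGT",
             "TTGGCCAA", "TTGGCCAT", "TTGCCCAA"]
references = ["ACGTACGT", "TTGGCCAA"]  # assumed reference set

vectors = transform(sequences, references)
clusters = single_linkage(vectors, k=2)
print(vectors)
print(clusters)
```

Once the vectors are built, every pairwise comparison inside the clustering loop is a cheap vector distance rather than a dynamic-programming edit distance, which is where the abstract locates the time saving.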

Citations: 1

Feature Selection in Learning Using Privileged Information

2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.131

R. Izmailov, Blerta Lindqvist, Peter Lin

Abstract: The paper considers the problem of feature selection in learning using privileged information (LUPI), where some of the features (referred to as privileged ones) are available only for training and are absent from test data. In the latest implementation of LUPI, these privileged features are approximated using regressions constructed on standard data features, but this approach could pollute the data with poorly constructed and/or noisy features. This paper proposes a privileged feature selection method that addresses some of these issues. Because few LUPI datasets are currently available in open access, while calibrating the parameters of the proposed method requires testing it on a wide variety of datasets, a modified version of the method for the traditional machine learning paradigm (i.e., without privileged features) was also studied. This led to a novel mechanism of error rate reduction by constructing and selecting additional regression-based features that capture mutual relationships among standard features. The results demonstrate the efficacy of the proposed feature selection method both for standard classification problems (tested on multiple calibration datasets) and for LUPI (for several datasets described in the literature).
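
The idea of approximating a privileged feature by a regression on standard features can be sketched with a one-variable least-squares fit. This is only an illustration under stated assumptions: the data are made up, the helper names are hypothetical, and the R-squared cut-off `MIN_R2` is a placeholder filter for discarding poorly fitted features, not the selection criterion proposed in the paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance of ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Training data: the privileged feature is observed only here (toy values).
train_standard = [1.0, 2.0, 3.0, 4.0]
train_privileged = [2.1, 3.9, 6.2, 7.8]

slope, intercept = fit_line(train_standard, train_privileged)
fit_quality = r_squared(train_standard, train_privileged, slope, intercept)

# Test data: only the standard feature exists, so the privileged one is estimated.
test_standard = [2.5, 5.0]
approx_privileged = [slope * x + intercept for x in test_standard]

MIN_R2 = 0.9  # hypothetical threshold, not taken from the paper
keep_feature = fit_quality >= MIN_R2
print(approx_privileged, fit_quality, keep_feature)
```

A poorly fitted regression would yield a noisy approximated feature at test time, which is the pollution risk the abstract describes; a quality filter of some kind is one simple way to picture what a selection step guards against.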

Citations: 5

Controversy Detection Using Reactions on Social Media

2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.121

Allaparthi Sriteja, Prakhar Pandey, Vikram Pudi

Citations: 9

Global Distribution of Watches: A Network Analysis of Trade Relations

2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.86

P. Donzé, Ken Ishibashi, Bo Wu, Yuta Kaneko, Kei Miyazaki, Keiji Takai

Citations: 1

High-Dimensional Density Estimation for Data Mining Tasks

2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.74

Alexander P. Kuleshov, A. Bernstein, Y. Yanovich

Citations: 2

Employer Industry Classification Using Job Postings

2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.30

Mahak Goindani, Qiaoling Liu, Josh Chao, V. Jijkoun

Abstract: In the recruitment domain, knowing the employer industry of jobs is important for gaining insight into the demand in each industry. The existing system at CareerBuilder uses an employer name normalization system and an employer knowledge base to infer the employer industry of a job. However, errors may occur when computing the job employer and when constructing the employer knowledge base with its industry attributes. Because the knowledge base is huge, the errors cannot be detected manually. Therefore, in this paper we use machine learning techniques to detect the errors automatically. Observing that the main jobs posted by an employer often relate to the employer's industry (e.g., truck driver jobs often correspond to employers in the transportation industry), we develop a system that classifies the industry of an employer using job posting data. We aggregate the job postings from an employer and use job titles and employer names as features for predicting the industry of the employer. We used two classification models, (1) Support Vector Machine (SVM) and (2) Gradient Boosted Decision Trees (GBDT), and observed that while both models perform similarly in classifying job employers that were correctly computed, GBDT is more effective than SVM in identifying job employers that were wrongly computed. We also show the utility of our system in detecting normalization errors and knowledge base errors.
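
The aggregation step described in the abstract (pooling an employer's postings and deriving features from job titles and the employer name) can be sketched as a bag-of-words count. This is an assumption-laden illustration: the employer names, postings, tokenization, and the `name=` / `title=` feature prefixes are all hypothetical, and the resulting counts would still need to be fed to a classifier such as the SVM or GBDT models the abstract mentions.

```python
from collections import Counter


def employer_features(postings):
    """Aggregate (employer, job_title) pairs into per-employer token counts."""
    features = {}
    for employer, title in postings:
        if employer not in features:
            # Employer-name tokens are added once per employer.
            features[employer] = Counter(
                "name=" + tok for tok in employer.lower().split()
            )
        # Job-title tokens accumulate across all of the employer's postings.
        features[employer].update(
            "title=" + tok for tok in title.lower().split()
        )
    return features


# Hypothetical postings; the employers are invented for the example.
postings = [
    ("Acme Freight", "Truck Driver"),
    ("Acme Freight", "Regional Truck Driver"),
    ("City Hospital", "Registered Nurse"),
]

features = employer_features(postings)
print(features["Acme Freight"])
```

Pooling postings in this way lets frequent title tokens, such as those from repeated driver postings, dominate an employer's feature vector, which matches the abstract's observation that an employer's main jobs tend to reflect its industry.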

Citations: 5

Evaluation of Non-linearity in MIR Spectroscopic Data for Compressed Learning

2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.77

Dixon Vimalajeewa, D. Berry, Eric Robson, C. Kulatunga

Abstract: Mid-infrared (MIR) spectroscopy has emerged as the most economically viable technology for determining milk values and for identifying a set of animal phenotypes related to health, feeding, well-being, and environment. However, Fourier-transform MIR spectra contain a significant amount of redundant data. This creates critical issues, such as increased learning complexity, when performing fog- and cloud-based data analytics in smart farming. These issues can be resolved by compressing the data with unsupervised techniques such as PCA and performing analytics in the compressed domain, i.e., without decompression. Compression algorithms should preserve any non-linearity present in MIRS data, since emerging advanced learning algorithms can use it to improve their prediction accuracy. This study investigates the non-linearity between the feature variables in the measurement domain as well as in two compressed domains obtained with standard linear PCA and kernel PCA. The non-linearity between the feature variables and the commonly used target milk quality parameters (protein, lactose, fat) is also analyzed. The study evaluates prediction accuracy using PLS as the linear predictive model and LS-SVM as the nonlinear one.

Citations: 3

Measuring Network Structure Metrics as a Proxy for Socio-Political Activity in Social Media

2017 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2017-11-01 DOI: 10.1109/ICDMW.2017.120

Selvas Mwanza, H. Suleman

Citations: 2