2018 IEEE International Conference on Data Mining Workshops (ICDMW): Latest Papers, Page 4

Robust Binary Classification via ℓ0-SVM

2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00180

Jianxiong Tang, N. Zhang, Qia Li

Abstract: Binary classification is one of the fundamental problems in data mining, and the support vector machine (SVM) has been used successfully for it. In many applications, data sets are polluted by label noise, especially when human experts are involved. The performance of the classical SVM may not be satisfactory on data sets with noisy labels. From the viewpoint of maximum likelihood estimation, the 0-1 loss is an appropriate loss function for data with label noise. However, the existence of minimizers of the corresponding optimization problem is not guaranteed. In this paper, we combine the ideas of the 0-1 loss and the hinge loss and propose the ℓ0-norm hinge loss. The value of the ℓ0-norm hinge loss is 1 when the product of the label and the projection of a sample is less than some small positive number; otherwise, it is 0. Based on the ℓ0-norm hinge loss, we first propose the linear ℓ0-SVM and then design the nonlinear ℓ0-SVM by introducing a kernel function. Compared with the classical SVM, the piecewise-constant property of the ℓ0-norm hinge loss makes it robust to label noise. The optimization problems in both the linear and nonlinear ℓ0-SVMs are guaranteed to have minimizers. To solve the corresponding optimization problem, we first use the penalty method to separate the ℓ0-norm from the corresponding linear mapping. Then the block coordinate descent method, which converges in the sense that the objective function value decreases and converges, can be adapted to solve the penalty problem. Experiments show that the proposed ℓ0-SVM performs well in applications.

Citations: 2
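
The ℓ0-norm hinge loss described in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes the "projection of a sample" is the linear decision value w·x + b, and the threshold `eps`, the function names, and the toy data are all hypothetical. It only contrasts how the classical hinge loss and a piecewise-constant loss react to one sample whose label was flipped by noise.

```python
from typing import Sequence


def projection(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear decision value w.x + b (assumed form of the 'projection')."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge_loss(w, b, x, y) -> float:
    """Classical hinge loss: grows linearly with the margin violation."""
    return max(0.0, 1.0 - y * projection(w, b, x))


def l0_hinge_loss(w, b, x, y, eps: float = 1e-3) -> float:
    """1 if y * projection < eps, else 0 (eps is a hypothetical threshold)."""
    return 1.0 if y * projection(w, b, x) < eps else 0.0


# Hypothetical 1-D classifier: predict +1 when x > 0.
w, b = [1.0], 0.0
clean = ([2.0], +1)      # correctly labelled sample
flipped = ([50.0], -1)   # far from the boundary, label flipped by noise

# Each tuple holds (hinge loss, l0-norm hinge loss) for one sample.
clean_losses = (hinge_loss(w, b, *clean), l0_hinge_loss(w, b, *clean))
flipped_losses = (hinge_loss(w, b, *flipped), l0_hinge_loss(w, b, *flipped))
```

Under these assumptions the flipped sample contributes 51.0 to the hinge objective but only 1.0 to the piecewise-constant one, which is the intuition behind the robustness claim: a single mislabelled point cannot dominate the objective.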

Online and Semi-Online Vector Scheduling on A Single Machine with Rejection

2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00141

Qianna Cui, Haiwei Pan

Citations: 0

Semi-Supervised Psychometric Scoring of Document Collections

2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00194

Burak Suyunu, Gonul Ayci, Mine Ögretir, A. Cemgil, S. Uskudarli, Hamza Zeytinoglu, Bülent Özel, Arman Boyaci

Abstract: We describe a generic computational approach that can be used in developing methods for psychometric profiling. Our approach is based on semi-supervised analysis of document collections using topic modeling. The method depends on a supervisor providing a set of seed documents, grouped by abstract themes such as Schwartz values or personality traits, and possibly a separate background document corpus. Instead of casting the problem into a standard classification framework, we interpret the group labels as a guide for finding distinguishing features. During training, we model each group of documents associated with a theme separately, using nonnegative matrix factorization to obtain theme-specific topic distributions. In the analysis, we decompose a new document using the model learned during training to arrive at the theme scores. We demonstrate our approach on two psychometric profiling theories (Schwartz and Big Five), evaluate our Schwartz scores with leave-one-out cross-validation, and compare our Big Five scores to independent surveys, which are much more costly to carry out.

Citations: 0

A Network-Based Approach to Enhance Electricity Load Forecasting

2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00046

Etienne Gael Tajeuna, M. Bouguessa, Shengrui Wang

Abstract: In the field of energy analysis, time series forecasting techniques are widely used to predict customer electricity consumption. To enhance forecasting accuracy, current approaches first apply clustering techniques to identify groups of customers exhibiting the same electricity load profile, from which a representative consumption pattern can be extracted. This pattern is later used to predict customers' subsequent electricity consumption. In the vast majority of clustering approaches, authors use the entire data set as input to identify customer consumption groups. However, electricity load data vary extremely rapidly and can thus be dominated by outdated historical information, which may influence the effective cluster status at a given time-stamp. To overcome this constraint, instead of using the entire data set, we propose an adaptive process which involves tracking the evolution of identified customer consumption groups at different time-stamps. A network structure is used to model the interrelation between customer electricity load profiles. The network is then split into subnetworks that are treated as customer electricity consumption clusters. Representative subseries, called master subseries, are extracted to track the evolution of clusters over time. Finally, the master subseries are used as a knowledge base for forecasting customers' electricity consumption at later time-stamps and automatically predicting future cluster status. The load forecasting is done using a seasonal autoregressive integrated moving average model, which is compared to multi-layer perceptron, support vector regression, lasso regression, Bayesian ridge regression, and K-nearest neighbor regression models.

Citations: 3

LiSense: Monitoring City Street Lighting During Night using Smartphone Sensors

2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00092

Munshi Yusuf Alam, Shahrukh Imam, H. Anurag, Sujoy Saha, S. Nandi, M. Saha

Citations: 5

Parallel k-Means Clustering of Geospatial Data Sets Using Manycore CPU Architectures

2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00118

R. Mills, Vamsi Sripathi, J. Kumar, S. Sreepathi, F. Hoffman, W. Hargrove

Abstract: The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery and mining of weather, climate, ecological, and other geoscientific data sets fused from disparate sources. Many of the standard tools used on individual workstations are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of parallelism available in state-of-the-art high-performance computing platforms can enable such analysis. Here, we describe pKluster, an open-source tool we have developed for accelerated k-means clustering of geospatial and geospatiotemporal data, and discuss algorithmic modifications and code optimizations we have made to enable it to effectively use parallel machines based on novel CPU architectures (such as the Intel Knights Landing Xeon Phi and Skylake Xeon processors) with many cores and hardware threads, and employing significant single instruction, multiple data (SIMD) parallelism. We outline some applications of the code in ecology and climate science contexts and present a detailed discussion of the performance of the code for one such application, LiDAR-derived vertical vegetation structure classification.

Citations: 3
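
For readers unfamiliar with the baseline that the pKluster abstract above accelerates, the following is a minimal, serial sketch of standard k-means (Lloyd's iteration) in plain Python. It is not pKluster's code and shows none of its threading or SIMD optimizations; the function names, the toy points, and the initial centroids are hypothetical. The assignment step is the part that parallelizes naturally, since each point's nearest centroid is computed independently of every other point.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Assignment step: index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def update(points: List[Point], labels: List[int],
           centroids: List[Point]) -> List[List[float]]:
    """Update step: mean of each cluster; an empty cluster keeps its centroid."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centroids.append(list(old))
    return new_centroids


def kmeans(points: List[Point], centroids: List[Point],
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Alternate update and assignment until the labels stop changing."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical toy data: two well-separated groups in 2-D.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centroids = kmeans(points, centroids=[[0.0, 0.0], [10.0, 10.0]])
```

On this toy input the labels settle after a single update, with the first two points in one cluster and the last two in the other.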

Hardening Encrypted Patient Names Against Cryptographic Attacks Using Cellular Automata

2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00082

R. Schnell, C. Borgs

Citations: 7

High Dimensional Clustering: A Strongly Connected Component Clustering Solution (SCCC)

2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00159

Mihir Shekhar, Lini T. Thomas, K. Karlapalem

Citations: 1

Preliminary Case Study on Data Utilization and Collaboration on the Web

2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00033

Daiji Iwasa

Abstract: Recently, data holders have been collecting various kinds of data thanks to improvements in Internet of Things (IoT) technologies. At the same time, data analysts can examine even large data sets thanks to the spread of analysis methods and tools and to strong computing power. The critical point for accelerating data utilization is communication among data stakeholders. Data analysts should consider the purpose for which data holders collect data. However, communication among stakeholders is hard when some of them are not familiar with data. Innovators Marketplace on Data Jackets (IMDJ) [5] is a workshop method that tackles this problem. In an IMDJ workshop, participants state their requirements and create scenarios for meeting these requirements based on Data Jackets. A Data Jacket (DJ) [4] is a framework for describing structured information about data in natural language, which enables those who are not familiar with data to take part in data-based discussion. In this paper, we introduce a platform called Web-IMDJ for conducting IMDJ workshops on the web. Web-IMDJ not only reduces the burden of running workshops but also enables remote participation. By conducting a workshop on Web-IMDJ as a case study, we found that the number of ideas is as high as in previous IMDJ workshops and that the participant capacity of Web-IMDJ is higher.

Citations: 0

Mining Connections Between Domains through Latent Space Mapping

2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00157

Yingjing Lu

Abstract: Exploring ways to connect data is crucial to building knowledge graphs that associate data from different domains. Humans, for example, can learn to associate flour with bread because bread is made of flour, so they can recall information about flour given a piece of bread even though bread and flour have few common features. In data mining, this ability translates into a way to connect images, text, and audio from different classes or domains. Most works so far assume shared feature representations between the domains to be connected. Another limitation is that for each defined mapping scheme, a new model often has to be trained end-to-end on all sample data, which is often expensive. In this work, we present a model that aims to address both limitations simultaneously. We use unconditionally trained Variational Autoencoders (VAEs) to project high-dimensional data into the latent space and present a novel generative model that transfers the latent representation of data from one domain to another according to any custom schema. The model makes no assumption of any shared representation among different domains. The VAEs that encode entire datasets, which account for the largest training overhead in this model, can be reused to support any new mapping schema without retraining.

Citations: 0