2018 IEEE International Conference on Data Mining Workshops (ICDMW): Latest Papers

Robust Binary Classification via ℓ0-SVM
2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00180
Jianxiong Tang, N. Zhang, Qia Li
Abstract: Binary classification is one of the fundamental problems in data mining, and the support vector machine (SVM) has been successfully used in binary classification problems. In many applications, the data sets are often polluted by label noise, especially when human experts are involved. The performance of the classical SVM may not be satisfactory on data sets with noisy labels. From the viewpoint of maximum likelihood estimation, the 0-1 loss function is an appropriate loss function for data with label noise. However, the existence of minimizers of the corresponding optimization problem is not guaranteed. In this paper, we combine the ideas of the 0-1 loss and the hinge loss and propose the ℓ0-norm hinge loss. The value of the ℓ0-norm hinge loss is 1 when the product of the label and the projection of a sample is less than some small positive number; otherwise, the value is 0. Based on the ℓ0-norm hinge loss, we first propose the linear ℓ0-SVM and then design the nonlinear ℓ0-SVM by introducing the kernel function. Compared with the classical SVM, the piecewise constant property of the ℓ0-norm hinge loss makes it robust to label noise. The optimization problems in both linear and nonlinear ℓ0-SVMs are guaranteed to have minimizers. To solve the corresponding optimization problem, we first use the penalty method to decompose the ℓ0-norm and the corresponding linear mapping. Then, the block coordinate descent method, which converges in the sense that the objective function value decreases and converges, can be adapted to solve the penalty problem. Experiments show that the proposed ℓ0-SVM performs well in applications.
Citations: 2
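The loss described in the ℓ0-SVM abstract above can be sketched in a few lines. This is a minimal illustration of the stated definition only, not the authors' implementation: the function names and the threshold name `eps` (standing in for the abstract's "small positive number") are assumptions, and the default value of `eps` is arbitrary.

```python
def hinge_loss(y, score):
    """Classical hinge loss: grows linearly with the margin violation."""
    return max(0.0, 1.0 - y * score)


def l0_hinge_loss(y, score, eps=1e-3):
    """Piecewise-constant loss: 1 if y * score < eps, otherwise 0.

    `eps` is a hypothetical name and value for the abstract's
    "small positive number"; it is not taken from the paper.
    """
    return 1.0 if y * score < eps else 0.0


# A point whose label disagrees strongly with the classifier's score,
# as can happen when the label itself is wrong.
y, score = -1, 5.0
print(hinge_loss(y, score))     # penalty scales with the distance from the boundary
print(l0_hinge_loss(y, score))  # penalty is capped at 1
```

Because the ℓ0-norm hinge loss never exceeds 1, a single mislabeled sample cannot dominate the objective the way it can under the hinge loss, which is one way to read the robustness claim in the abstract.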
Online and Semi-Online Vector Scheduling on A Single Machine with Rejection
2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00141
Qianna Cui, Haiwei Pan
Abstract: In this paper, we design an online algorithm for vector scheduling on a single machine with rejection whose competitive ratio is d, where d is the dimension of the vectors. In addition, we consider two versions of semi-online vector scheduling on a single machine with rejection. The first version, semi-online with rearrangement, allows at most one job to be reassigned after all jobs have been scheduled; for this version we show a semi-online algorithm with competitive ratio 1/2 d + 2 for d > 3. The second version is semi-online with a rejection buffer of length 1, which can hold one job. For d > 3, we also give an algorithm with competitive ratio 1/2 d + 2.
Citations: 0
Semi-Supervised Psychometric Scoring of Document Collections
2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00194
Burak Suyunu, Gonul Ayci, Mine Ögretir, A. Cemgil, S. Uskudarli, Hamza Zeytinoglu, Bülent Özel, Arman Boyaci
Abstract: We describe a generic computational approach that can be used in developing methods for psychometric profiling. Our approach is based on semi-supervised analysis of document collections using topic modeling. The method depends on a supervisor providing a set of seed documents, grouped by abstract themes such as Schwartz values or personality traits, and possibly a separate background document corpus. Instead of casting the problem into a standard classification framework, we interpret the group labels as a guide for finding distinguishing features. During training, we train each group of documents associated with a theme separately, using nonnegative matrix factorization to obtain theme-specific topic distributions. In the analysis, we decompose a new document using the model learned during training to arrive at the theme scores. We demonstrate our approach on two psychometric profiling theories (Schwartz and Big Five), evaluate our Schwartz scores with the leave-one-out cross-validation method, and compare Big Five scores to independent surveys, which are much more costly to carry out.
Citations: 0
A Network-Based Approach to Enhance Electricity Load Forecasting
2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00046
Etienne Gael Tajeuna, M. Bouguessa, Shengrui Wang
Abstract: In the field of energy analysis, time series forecasting techniques are widely used to predict customer electricity consumption. To enhance forecasting accuracy, current approaches first apply clustering techniques to identify groups of customers exhibiting the same electricity load profile, from which a representative consumption pattern can be extracted. This pattern is later used to predict customers' subsequent electricity consumption. In the vast majority of clustering approaches, authors use the entire data set as input to identify customer consumption groups. However, electricity load data vary extremely rapidly and can thus be dominated by outdated historical information, which may influence the effective cluster status at a given time-stamp. To overcome this constraint, instead of using the entire data set, we propose an adaptive process that tracks the evolution of identified customer consumption groups at different time-stamps. A network structure is used to model the interrelation between customer electricity load profiles. The network is then split into subnetworks that are treated as customer electricity consumption clusters. Representative subseries, called master subseries, are extracted to track the evolution of clusters over time. Finally, the master subseries are used as a knowledge base for forecasting customers' electricity consumption at later time-stamps and automatically predicting future cluster status. The load forecasting is done using a seasonal autoregressive integrated moving average model, which is compared to multi-layer perceptron, support vector regression, lasso regression, Bayesian ridge regression, and K-nearest neighbor regression models.
Citations: 3
LiSense: Monitoring City Street Lighting During Night using Smartphone Sensors
2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00092
Munshi Yusuf Alam, Shahrukh Imam, H. Anurag, Sujoy Saha, S. Nandi, M. Saha
Abstract: Adequate illumination of city streets during night hours is essential to ensure road safety. However, even in developed cities, monitoring streetlights remains a tedious task that relies on manual inspection reports. Existing systems mostly rely on vehicle-mounted cameras or sensors fitted at every light post, which is neither cost-effective nor scalable. In contrast, in this paper we develop LiSense, a novel cost-effective system that monitors the illumination levels of street lights and detects and localizes malfunctioning light posts. The system utilizes ambient light and GPS sensors and uses crowdsourcing. Sensor trails collected by our app from two-wheeler rides covering 160 km of suburban city roads detect malfunctioning street lights with more than 96% accuracy and a mean localization error of 6 meters. To the best of our knowledge, this is the first approach of its kind to monitoring street light condition that is cost-effective, scalable, and suitable for developing regions.
Citations: 5
Parallel k-Means Clustering of Geospatial Data Sets Using Manycore CPU Architectures
2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00118
R. Mills, Vamsi Sripathi, J. Kumar, S. Sreepathi, F. Hoffman, W. Hargrove
Abstract: The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery and mining of weather, climate, ecological, and other geoscientific data sets fused from disparate sources. Many of the standard tools used on individual workstations are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of parallelism available in state-of-the-art high-performance computing platforms can enable such analysis. Here, we describe pKluster, an open-source tool we have developed for accelerated k-means clustering of geospatial and geospatiotemporal data, and discuss algorithmic modifications and code optimizations we have made to enable it to effectively use parallel machines based on novel CPU architectures (such as the Intel Knights Landing Xeon Phi and Skylake Xeon processors) with many cores and hardware threads, and employing significant single instruction, multiple data (SIMD) parallelism. We outline some applications of the code in ecology and climate science contexts and present a detailed discussion of the performance of the code for one such application, LiDAR-derived vertical vegetation structure classification.
Citations: 3
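For readers unfamiliar with the baseline algorithm that pKluster accelerates, the following is a minimal serial sketch of standard k-means (Lloyd's algorithm). It is not pKluster's code and contains none of the parallel or SIMD optimizations mentioned in the abstract; all function names and the toy data are illustrative assumptions.

```python
def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:  # an empty cluster keeps its previous centroid
            new_centroids.append(old)
            continue
        dim = len(old)
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        )
    return new_centroids


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update until the labels stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update(points, labels, centroids)
    return labels, centroids


# Toy data: two well-separated pairs of 2-D points (hypothetical values).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids)
```

Each point's nearest-centroid search in `assign` is independent of every other point's, so that loop is the natural place to distribute work across cores in a parallel implementation.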
Hardening Encrypted Patient Names Against Cryptographic Attacks Using Cellular Automata
2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00082
R. Schnell, C. Borgs
Abstract: Linking information across different databases enables new research in the medical sciences. Recent EU privacy regulations recommend encrypting personal identifiers used for linking. In this contribution, a new method for hardening such a privacy-preserving record linkage (PPRL) technique against attacks is presented. The new hardening method prevents re-identifications and cryptographic attacks while still delivering acceptable linkage quality. Using real-world mortality data, we compare clear-text and several current PPRL methods with our newly proposed method. While all PPRL methods have to balance security and quality, the use of a cellular automata transformation to protect against attacks decreases the linkage quality only slightly, while preventing all currently known methods of decrypting Bloom filter-based private linkage keys.
Citations: 7
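The two ingredients named in the abstract above, a Bloom filter-based linkage key and a cellular automaton transformation, can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' method: it assumes the common bigram-based Bloom filter encoding of names, and the filter length, number of hash functions, hashing scheme, and automaton rule number are all hypothetical choices that are not taken from the paper.

```python
import hashlib


def bigrams(name):
    """Split a padded, lower-cased name into overlapping 2-character pieces."""
    padded = f"_{name.lower()}_"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def bloom_encode(name, length=64, num_hashes=4):
    """Set `num_hashes` bit positions per bigram in a `length`-bit filter.

    The seeded SHA-256 hashing is an illustrative choice only.
    """
    bits = [0] * length
    for gram in bigrams(name):
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % length] = 1
    return bits


def ca_step(bits, rule):
    """One step of an elementary cellular automaton with wrap-around edges.

    `rule` is a Wolfram rule number (0-255): the 3-bit neighborhood
    (left, center, right) selects which bit of `rule` is the new cell value.
    """
    n = len(bits)
    out = []
    for i in range(n):
        neighborhood = (bits[i - 1] << 2) | (bits[i] << 1) | bits[(i + 1) % n]
        out.append((rule >> neighborhood) & 1)
    return out


filt = bloom_encode("Ann")
hardened = ca_step(filt, rule=30)  # rule 30 is an arbitrary choice for illustration
print(sum(filt), sum(hardened))
```

Similar names share bigrams and therefore share set bits, which is what makes linkage possible; the automaton step mixes neighboring bits so that individual positions no longer correspond directly to individual bigrams.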
High Dimensional Clustering: A Strongly Connected Component Clustering Solution (SCCC)
2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00159
Mihir Shekhar, Lini T. Thomas, K. Karlapalem
Abstract: High-dimensional data is often challenging to cluster because the curse of dimensionality makes clusters hard to identify. The key challenge in high-dimensional clustering is to develop a solution that identifies clusters that are as complete as they can be, while not merging well-separated clusters. We propose core points, which represent local compact regions. The strongly connected components of the k-nearest neighbor graph of core points provide groups of points that are strongly mutually connected. These mutually connected regions represent the core structure of the clusters. Our empirical analysis and experimental results present the rationale behind our solution and validate the goodness of the clusters against state-of-the-art high-dimensional clustering algorithms. The novelty of our solution is the use of the concept of reverse nearest neighbors to generate natural clusters in high dimensions.
Citations: 1
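The central graph operation in the SCCC abstract above, taking strongly connected components of a directed k-nearest neighbor graph, can be sketched as below. This is a simplified illustration, not the SCCC algorithm: it skips the paper's core-point selection and runs on raw points, it uses Kosaraju's algorithm as one standard way to find the components, and the function names, data, and value of k are assumptions.

```python
def knn_graph(points, k):
    """Directed graph: an edge i -> j means j is one of i's k nearest neighbors."""
    graph = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])))
        graph[i] = others[:k]
    return graph


def strongly_connected_components(graph):
    """Kosaraju's algorithm; returns a list of components (sorted node lists)."""
    # Pass 1: record nodes in order of depth-first search completion.
    order, seen = [], set()
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbors = stack[-1]
            for nxt in neighbors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:  # all neighbors explored: this node is finished
                order.append(node)
                stack.pop()

    # Pass 2: search the reversed graph in decreasing finish order.
    reverse = {node: [] for node in graph}
    for node, neighbors in graph.items():
        for nxt in neighbors:
            reverse[nxt].append(node)

    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = [], [root]
        while stack:
            node = stack.pop()
            component.append(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(sorted(component))
    return components


# Two tight groups plus one point in between (hypothetical 2-D data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
components = strongly_connected_components(knn_graph(points, k=2))
print(sorted(components))
```

In this toy example the in-between point lists two members of the first group as its nearest neighbors, but no point lists it in return, so it ends up in a component of its own rather than bridging the two groups. That asymmetry is the intuition behind requiring mutual connectivity.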
Preliminary Case Study on Data Utilization and Collaboration on the Web
2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00033
Daiji Iwasa
Abstract: Recently, data holders have been collecting various kinds of data owing to improvements in Internet of Things (IoT) technologies. At the same time, data analysts can examine even large data sets owing to the spread of analysis methods and tools and to strong computing power. The critical point for accelerating data utilization is communication among data stakeholders. Data analysts should consider the purpose for which data holders collect data. However, communication among stakeholders is hard when some of them are not familiar with data. Innovators Marketplace on Data Jackets (IMDJ) [5] is a workshop method to tackle this problem. In an IMDJ workshop, participants state their requirements and create a scenario for meeting these requirements based on Data Jackets. Data Jacket (DJ) [4] is a framework for describing structured information about data in natural language, which enables those who are not familiar with data to hold data-based discussions. In this paper, we introduce a platform called Web-IMDJ for conducting IMDJ workshops on the web. Web-IMDJ not only reduces the burden of workshops but also enables remote participation. By conducting a workshop on Web-IMDJ as a case study, we found that the number of ideas is comparable to that of previous IMDJ workshops and that the participant capacity is higher in Web-IMDJ.
Citations: 0
Mining Connections Between Domains through Latent Space Mapping
2018 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2018-11-01 DOI: 10.1109/ICDMW.2018.00157
Yingjing Lu
Abstract: Exploring ways to connect data is crucial to building knowledge graphs that associate data from different domains. Humans, for example, can learn to associate flour with bread because bread is made of flour, so they can recall information about flour given a piece of bread even though bread and flour have few common features. In data mining, this ability translates to a way of connecting images, text, and audio from different classes or domains. Most works so far assume shared feature representations between the domains to be connected. Another limitation yet to be addressed is that for each defined mapping scheme, we often have to train a new model end-to-end on all sample data, which is often expensive. In this work, we present a model that aims to address both limitations simultaneously. We use unconditionally trained Variational Autoencoders (VAEs) to project high-dimensional data into the latent space and present a novel generative model that transfers latent representations of data from one domain to another by any custom schema. The model makes no assumption of any shared representation among different domains. The VAEs that encode entire datasets, being the largest training overhead in this model, can be reused to support any new mapping schema without any retraining.
Citations: 0