2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM): Latest Papers

Clustering categorical data: A stability analysis framework
2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949452
I. Jarman, T. Etchells, P. Lisboa, Charlene Beynon, J. Martín-Guerrero
Abstract: Clustering to identify inherent structure is an important first step in data exploration. The k-means algorithm is a popular choice, but k-means is not generally appropriate for categorical data. A specific extension of k-means for categorical data is the k-modes algorithm. Both of these partition clustering methods are sensitive to the initialization of prototypes, which makes it difficult to select the best solution for a given problem. In addition, selecting the number of clusters can be an issue. Further, the k-modes method is especially prone to instability when presented with ‘noisy’ data, since the calculation of the mode lacks the smoothing effect inherent in the calculation of the mean. This is often the case with real-world datasets, for instance in the domain of Public Health, resulting in solutions that can be radically different depending on the initialization and can therefore lead to different interpretations. This paper presents two methodologies. The first addresses sensitivity to initialization using a generic landscape mapping of k-modes solutions. The second uses the landscape map to stabilize the partition clusters for discrete data by drawing a consensus sample in order to separate signal from noise components. Results are presented for the benchmark soybean disease dataset, an artificially generated dataset, and a case study involving Public Health data.
Citations: 3
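The k-modes algorithm mentioned above replaces the cluster mean with a per-attribute mode and Euclidean distance with a simple matching dissimilarity. The following is a minimal sketch of a standard k-modes loop. It assumes rows of categorical values and random initial prototypes, and the helper names (`matching_dissimilarity`, `column_modes`, `k_modes`) are hypothetical. It does not implement the paper's landscape-mapping or consensus methods.

```python
import random
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Per-attribute mode of categorical rows (ties go to the value seen first)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, k, seed=0, max_iter=100):
    """Minimal k-modes returning (labels, modes).

    Initial modes are k randomly chosen rows, so different seeds can
    converge to different partitions (the instability discussed above).
    """
    rng = random.Random(seed)
    modes = [tuple(r) for r in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest mode.
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: recompute each cluster's mode; keep the old mode if a cluster is empty.
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes
```

One simple way to observe the initialization sensitivity described in the abstract is to run `k_modes(data, k, seed=s)` for several seeds and compare the resulting partitions.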
KB-CB-N classification: Towards unsupervised approach for supervised learning
2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949435
Z. Abdallah, M. Gaber
Abstract: Data classification has attracted considerable research attention in computational statistics and data mining due to its wide range of applications. K Best Cluster Based Neighbour (KB-CB-N) is our novel classification technique, based on the integration of three different similarity measures for cluster-based classification. The basic principle is to apply unsupervised learning to the instances of each class in the dataset and then use the output as input to the classification algorithm, which finds the K best cluster neighbours from the density, gravity and distance perspectives. Clustering is applied as an initial step within each class to find the inherent in-class grouping in the dataset. Different data clustering techniques use different similarity measures, and each measure has its own strengths and weaknesses. Combining the three measures therefore draws on the strengths of each and overcomes problems encountered when using an individual measure. Extensive experimental results on eight real datasets show that our new technique typically achieves improved or equivalent performance compared with other existing state-of-the-art classification methods.
Citations: 12
Online autoregressive prediction in time series with delayed disclosure
2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949440
J. Andreoli, Marie-Luise Schneider
Abstract: We propose a supervised machine learning method to automate the classification of events within time series in a monitoring context. It is based on a generative stochastic model of the time series that combines a probabilistic autoregressive classifier, which determines the class label of each event, with a hidden Markov model, which captures the production of the events. Events can be described by arbitrary combinations of discrete and continuous features. At training time (offline), the class labels of all events are assumed to be known. At inference time (online), when a prediction is to be made for an event, the class labels of the preceding events are not assumed to be known, which makes prediction more complex because the model is autoregressive. Instead, we make and exploit a “delayed disclosure” assumption, namely that the class labels of all events are eventually revealed, but the occurrence of an event and the revelation of its class are asynchronous. We report experimental results obtained by applying this approach to the monitoring of a fleet of distributed devices.
Citations: 0
Partially supervised k-harmonic means clustering
2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949424
T. Runkler
Abstract: A popular algorithm for finding clusters in unlabeled data optimizes the k-means clustering model. This algorithm converges quickly but is sensitive to initialization. Two ways to overcome this drawback are fuzzification and harmonic means. We show that k-harmonic means is a special case of reformulated fuzzy k-means. The main focus of this paper is partially supervised clustering, which finds clusters in data sets that contain both unlabeled and labeled data. We review partially supervised k-means and partially supervised fuzzy k-means, and introduce a partially supervised extension of k-harmonic means. Experiments with four benchmark data sets indicate that partially supervised k-harmonic means inherits the advantages of its completely unsupervised variant: it is significantly less sensitive to initialization than partially supervised k-means.
Citations: 10
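For context on the entry above, the standard k-harmonic means objective replaces the k-means term for each point (its squared distance to the nearest center) with k divided by the sum of inverse p-th power distances to all centers, that is, KHM(X, C) = Σ_i k / Σ_j ‖x_i − c_j‖^(−p). The sketch below compares the two objectives. The function names, the default p = 2, and the `eps` guard are illustrative assumptions, and the code does not implement the paper's partially supervised extension. Every center contributes to each point's term, so the objective is smoother than the hard minimum in k-means. This is the usual intuition for why KHM is less sensitive to initialization.

```python
import math


def khm_objective(points, centers, p=2.0, eps=1e-12):
    """k-harmonic means objective: for each point, k divided by the sum of
    1 / ||x - c_j||^p over all centers, summed over points.
    eps (an assumption) avoids division by zero when a point sits on a center."""
    k = len(centers)
    total = 0.0
    for x in points:
        inv_sum = sum(1.0 / max(math.dist(x, c) ** p, eps) for c in centers)
        total += k / inv_sum
    return total


def kmeans_objective(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(math.dist(x, c) ** 2 for c in centers) for x in points)
```

With a single center and p = 2 the two objectives coincide. With several centers, the harmonic mean of the distances is never smaller than their minimum, so the KHM value is at least the k-means value.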
Increased classification accuracy and speedup through pair-wise feature selection for support vector machines
2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949457
K. Kramer, Dmitry Goldgof, L. Hall, A. Remsen
Abstract: Support vector machines are binary classifiers that can implement multi-class classifiers by creating a classifier for each possible pair of classes, or one classifier per class using a one-versus-all strategy. Feature selection algorithms often search for a single set of features to be used by each of the binary classifiers. This ignores the fact that features that are good discriminators for two particular classes might not do well for other class combinations. As a result, the feature selection process may not include these features in the common set used by all support vector machines. It is shown that by selecting features for each binary class combination, overall classification accuracy can be improved (by as much as 2.1%), feature selection time can be significantly reduced (a 3.2-fold speedup), and the time required to train a multi-class support vector machine is reduced. Another benefit of this approach is that considerably less time is required for feature selection when additional classes are added to the training data. The features selected for the existing class combinations remain valid, so feature selection only needs to be run for the newly created class combinations.
Citations: 7
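The incremental benefit described above follows from the pairwise (one-versus-one) structure. With n classes there are n(n−1)/2 binary classifiers, each with its own feature subset, and adding one class creates only n new pairs. The sketch below computes which pairs need fresh feature selection. The helper names are hypothetical, and the feature selection step itself is not shown.

```python
from itertools import combinations


def class_pairs(classes):
    """All unordered class pairs: one binary SVM (and one feature subset) per pair."""
    return set(combinations(sorted(classes), 2))


def pairs_needing_selection(old_classes, new_classes):
    """Pairs created by adding new classes. Feature subsets already chosen
    for the existing pairs can be reused unchanged."""
    all_classes = set(old_classes) | set(new_classes)
    return class_pairs(all_classes) - class_pairs(old_classes)
```

For example, 10 classes give 45 pairwise classifiers. Adding an 11th class raises this to 55, but feature selection only has to run for the 10 new pairs.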
Partial generalized correlation for hyperspectral data
2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949422
M. Strickert, B. Labitzke, V. Blanz
Abstract: A variational approach is proposed for the unsupervised assessment of attribute variability in high-dimensional data, given a differentiable similarity measure. The key question addressed is how much each data attribute contributes to an optimum transformation of vectors for reaching maximum similarity. This question is formalized and solved in a mathematically rigorous optimization framework for each data pair of interest. Trivially, for the Euclidean metric, minimization to zero distance induces the highest vector similarity, whereas for the linear Pearson correlation measure the highest similarity of one is desired. During optimization, the not necessarily symmetric trajectories between two vectors are recorded and analyzed in terms of attribute changes and line integrals. The proposed formalism makes it possible to assess partial covariance and correlation characteristics of data attributes for vectors compared by any differentiable similarity measure. Its potential for generating alternative and localized views, such as for contrast enhancement, is demonstrated on hyperspectral images from the remote sensing domain.
Citations: 1
Periodic quick test for classifying long-term activities
2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949426
Pekka Siirtola, Heli Koskimäki, J. Röning
Abstract: A novel method to classify long-term human activities is presented in this study. The method consists of two parts: a quick test and periodic classification. The quick test uses temporal information to improve recognition accuracy, while the periodic classification is based on the assumption that the recognized activities are long-term. Periodic quick test (PQT) classification was tested on a data set of six long-term sports exercises. The data were collected from six persons wearing a two-dimensional accelerometer on their wrist. The results show that the presented method is not only faster but also more accurate than a conventional method that does not use temporal information and does not assume that activities are long-term. The results were compared with a conventional sliding window technique, which divides the signal into smaller sequences and classifies each sequence into one of the six classes. The classification accuracy of the conventional method was around 84%, while the recognition rate of PQT was over 90%. In addition, the conventional method classified over six times as many sequences as PQT.
Citations: 11
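The conventional baseline described above splits the signal into smaller sequences and classifies each one. Its segmentation step can be sketched as follows. The window width and step are hypothetical parameters (the abstract does not state them), the per-window classifier is omitted, and none of this implements PQT itself.

```python
def sliding_windows(signal, width, step):
    """Split a 1-D sequence into fixed-length windows starting every `step` samples.
    Trailing samples that do not fill a complete window are dropped."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    return [signal[start:start + width]
            for start in range(0, len(signal) - width + 1, step)]


def n_windows(n, width, step):
    """Number of complete windows produced for a signal of n samples."""
    return max(0, (n - width) // step + 1)
```

When the step is smaller than the width, the windows overlap. The number of windows to classify then grows roughly as n / step, which is the per-sequence classification cost that the baseline incurs.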
FGMAC: Frequent subgraph mining with Arc Consistency
2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949436
Brahim Douar, M. Liquiere, C. Latiri, Y. Slimani
Abstract: With the significant growth in the need to analyze large amounts of structured data such as chemical compounds, protein structures and XML documents, to cite but a few, graph mining has become an attractive track and a real challenge in the data mining field. Among the various kinds of graph patterns, frequent subgraphs seem to be relevant for characterizing graph sets, discriminating between different groups of sets, and classifying and clustering graphs. Because subgraph isomorphism testing is NP-complete and the search space is huge, fragment miners are exponential in runtime and/or memory consumption. In this paper we study a new polynomial projection operator named AC-Projection, based on a key technique of constraint programming, namely Arc Consistency (AC). It is intended to replace the use of exponential subgraph isomorphism. We study the relevance of frequent AC-reduced graph patterns for classification and prove that a significant performance gain can be achieved with no, or only non-significant, loss in the quality of the discovered patterns.
Citations: 5
Multiple query-dependent RankSVM aggregation for document retrieval
2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949420
Yang Wang, Min Lu, X. Pang, Maoqiang Xie, Yalou Huang
Abstract: This paper is concerned with supervised rank aggregation, which aims to improve ranking performance by combining the outputs of multiple rankers. However, previous rank aggregation approaches have two main shortcomings. First, the learned weights for base rankers do not distinguish the differences among queries. This is suboptimal, since queries vary significantly in terms of ranking. Second, most current aggregation functions are unsupervised, whereas a supervised aggregation function could further improve ranking performance. In this paper, the significant differences among queries are taken into consideration, and a supervised rank aggregation approach is proposed. As a case study, we employ the RankSVM model to aggregate the base rankers, an approach referred to as Q.D.RSVM, and prove that Q.D.RSVM can set up query-dependent weights for different base rankers. Experimental results on benchmark datasets show that our approach outperforms conventional ranking approaches.
Citations: 0
A GPU-based interactive bio-inspired visual clustering
2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949300
U. Erra, Bernardino Frola, V. Scarano
Abstract: In this work, we present an interactive visual clustering approach for the exploration and analysis of vast volumes of data. Our approach is a bio-inspired collective behavioral model to be used in a 3D graphics environment. The paper presents an extension of the behavioral model for clustering and a parallel implementation that uses Compute Unified Device Architecture (CUDA) to exploit the computational power of Graphics Processing Units (GPUs). The advantage of our approach is that, as data enters the environment, the user is directly involved in the data mining process. Our experiments illustrate the effectiveness and efficiency of our approach when applied to a number of real and synthetic data sets.
Citations: 2