2009 IEEE International Conference on Data Mining Workshops: Latest Publications

Spatio-temporal Multi-dimensional Relational Framework Trees
2009 IEEE International Conference on Data Mining Workshops, Pub Date: 2009-12-06, DOI: 10.1109/ICDMW.2009.95
Matthew Bodenhamer, Samuel Bleckley, Daniel Fennelly, A. Fagg, A. McGovern
Abstract: The real world is composed of sets of objects that move and morph in both space and time. Useful concepts can be defined in terms of the complex interactions between the multi-dimensional attributes of subsets of these objects and of the relationships that exist between them. In this paper, we present Spatiotemporal Multi-dimensional Relational Framework (SMRF) Trees, a new data mining technique that extends the successful Spatiotemporal Relational Probability Tree models. From a set of labeled, multi-object examples of a target concept, our algorithm infers both the set of objects that participate in the concept and the key object and relation attributes that describe the concept. In contrast to other relational model approaches, SMRF trees do not rely on pre-defined relations between objects; instead, our algorithm infers the relations from the continuous attributes. In addition, our approach explicitly acknowledges the multi-dimensional nature of attributes such as position, orientation, and color. Our method performs well in exploratory experiments, demonstrating its viability as a relational learning approach.
Citations: 7
Efficient Dense Structure Mining Using MapReduce
Pub Date: 2009-12-06, DOI: 10.1109/ICDMW.2009.48
Shengqi Yang, Bai Wang, Haizhou Zhao, Bin Wu
Abstract: Structure mining plays an important role in research on biology, physics, the Internet, and telecommunications within the recently emerged field of network science. As a central task in this area, structure mining on graphs has attracted much interest and has been studied from various angles in prior work. However, most of that work relies on the computational capacity of a single machine and is constrained to local optimization, so these methods cannot process massive graphs. In this paper, we propose a unified distributed method for solving several critical graph-mining problems on a cluster system using MapReduce: graph transformation, subgraph partitioning, maximal clique enumeration, connected-component finding, and community detection. All of these methods are implemented to fully exploit the MapReduce execution model, namely the iterative "map-reduce" process. Moreover, with a view to offering our algorithms as a future "cloud" service, we use several large-scale datasets to demonstrate the efficiency and scalability of our solutions.
Citations: 26
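The paper's distributed formulations are not reproduced above, but one of the listed problems, connected-component finding, illustrates the iterative "map-reduce" process well. The sketch below is a hypothetical single-machine simulation of minimum-label propagation, where each round plays the role of one MapReduce job; the function names and toy graph are illustrative, not taken from the paper.

```python
from collections import defaultdict

def map_phase(adjacency, labels):
    """Map: each node emits its current label to itself and to every neighbour."""
    emitted = defaultdict(list)
    for node, neighbours in adjacency.items():
        emitted[node].append(labels[node])
        for n in neighbours:
            emitted[n].append(labels[node])
    return emitted

def reduce_phase(emitted):
    """Reduce: each node keeps the minimum label it received."""
    return {node: min(vals) for node, vals in emitted.items()}

def connected_components(adjacency):
    """Iterate map/reduce rounds until labels stabilize; nodes sharing a
    final label belong to the same connected component."""
    labels = {node: node for node in adjacency}  # init: each node labels itself
    while True:
        new_labels = reduce_phase(map_phase(adjacency, labels))
        if new_labels == labels:
            return labels
        labels = new_labels

# two components {0,1,2} and {3,4}, plus the isolated node 5
graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}
components = connected_components(graph)  # {0: 0, 1: 0, 2: 0, 3: 3, 4: 3, 5: 5}
```

In a real MapReduce deployment each round is a separate job and the label table is shuffled between mappers and reducers by key; the loop above only mimics that dataflow in one process.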
Probabilistic Labeled Semi-supervised SVM
Pub Date: 2009-12-06, DOI: 10.1109/ICDMW.2009.14
Mingjie Qian, F. Nie, Changshui Zhang
Abstract: Semi-supervised learning has received increasing attention and is widely used in fields such as data mining, information retrieval, and knowledge management, since it can exploit both labeled and unlabeled data. Laplacian SVM (LapSVM) is a classical method whose effectiveness has been validated by a large number of experiments. However, LapSVM is sensitive to the labeled data and incurs cubic computational complexity, which limits its application in large-scale scenarios. In this paper, we propose a multi-class method called Probabilistic Labeled Semi-supervised SVM (PLSVM), in which the optimal decision surface is learned from probabilistic labels of all the training data, both labeled and unlabeled. We then propose a kernel-version dual coordinate descent method to efficiently solve the dual problems of PLSVM and reduce its memory requirements. Experiments on synthetic data and several real-world benchmark datasets show that PLSVM is less sensitive to labeling and outperforms traditional methods such as SVM, LapSVM, and Transductive SVM (TSVM).
Citations: 6
HOCT: A Highly Scalable Algorithm for Training Linear CRF on Modern Hardware
Pub Date: 2009-12-06, DOI: 10.1109/ICDMW.2009.69
Tianyuan Chen, Lei Chang, Jianqing Ma, Wei Zhang, Feng Gao
Abstract: This paper proposes an efficient algorithm, HOCT, for CRF training on modern computer architectures. First, software prefetching techniques are used to hide cache-miss latency. Second, we exploit SIMD to process data in parallel. Third, when dealing with large datasets, we let HOCT, rather than the operating system, manage swapping operations. Our experiments on various real datasets show that HOCT yields a fourfold speedup when the data fit in memory, and over a 30-fold speedup when the memory requirement exceeds physical memory.
Citations: 0
Nonsmooth Bilevel Programming for Hyperparameter Selection
Pub Date: 2009-12-06, DOI: 10.1109/ICDMW.2009.74
Gregory M. Moore, Charles Bergeron, Kristin P. Bennett
Abstract: We propose a nonsmooth bilevel programming method for training linear learning models with hyperparameters optimized via $T$-fold cross-validation (CV). The algorithm scales well in the sample size and handles loss functions with embedded maxima, such as those in support vector machines. Current practice constructs models over a predefined grid of hyperparameter combinations and selects the best one, an inefficient heuristic. Innovating over previous bilevel CV approaches, this paper represents an advance towards the goal of self-tuning supervised data mining as well as a significant innovation in scalable bilevel programming algorithms. In the bilevel CV formulation, the lower-level problems are treated as unconstrained optimization problems and replaced with their optimality conditions; the resulting nonlinear program is nonsmooth and nonconvex. We develop a novel bilevel programming algorithm to solve this class of problems, and apply it to linear least-squares support vector regression with hyperparameters $C$ (tradeoff) and $\epsilon$ (loss insensitivity). The new approach outperforms grid search and prior smooth bilevel CV methods in terms of modeling performance, and its increased speed points towards modeling with a larger number of hyperparameters.
Citations: 14
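As a point of reference for the grid-search heuristic the paper argues against, the following sketch shows $T$-fold cross-validated grid search in numpy. Ridge regression stands in for least-squares support vector regression purely to keep the example self-contained and closed-form; the grid, synthetic data, and function names are assumptions for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_error(X, y, lam, T=5):
    """Mean squared validation error over T contiguous folds."""
    n = len(y)
    errs = []
    for val_idx in np.array_split(np.arange(n), T):
        train_idx = np.setdiff1d(np.arange(n), val_idx)
        w = ridge_fit(X[train_idx], y[train_idx], lam)
        errs.append(np.mean((X[val_idx] @ w - y[val_idx]) ** 2))
    return float(np.mean(errs))

# synthetic linear data with small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100)

# the heuristic: evaluate every grid point, keep the lowest CV error
grid = [0.01, 0.1, 1.0, 10.0]
best_lam = min(grid, key=lambda lam: cv_error(X, y, lam))
```

The cost of this heuristic grows multiplicatively with each added hyperparameter dimension, which is exactly the scaling problem the bilevel formulation is meant to avoid.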
Compressed Spectral Clustering
Pub Date: 2009-12-06, DOI: 10.1109/ICDMW.2009.22
Bin Zhao, Changshui Zhang
Abstract: Compressed sensing has received much attention in both the data mining and signal processing communities. In this paper, we provide theoretical results showing that compressed spectral clustering, separating data samples into clusters directly in the compressed measurement domain, is possible. Specifically, we provide theoretical bounds guaranteeing that if the data are measured directly in the compressed domain, spectral clustering on the compressed data works almost as well as in the original data domain. Moreover, we show that for a family of well-known compressed sensing matrices, compressed spectral clustering is universal, i.e., clustering in the measurement domain works provided the data are sparse in some, possibly unknown, basis. Finally, experimental results on both toy and real-world datasets demonstrate that compressed spectral clustering achieves clustering performance comparable to traditional spectral clustering in the data domain, at a much lower computational cost.
Citations: 4
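To make the idea concrete, here is a hypothetical numpy sketch of the compressed pipeline: measure the data with a random Gaussian sensing matrix, then run ordinary spectral clustering (RBF affinity, normalized Laplacian, k-means in the spectral embedding) on the low-dimensional measurements. This is a generic illustration under assumed parameters (`sigma`, the toy blobs, the 10-measurement sensing matrix), not the paper's construction.

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, iters=50):
    """Plain spectral clustering: RBF affinity, normalized Laplacian,
    k smallest eigenvectors, then a tiny deterministic k-means."""
    n = len(X)
    d2 = np.sum((X[:, None] - X[None]) ** 2, axis=2)
    W = np.exp(-d2 / (2 * sigma**2))
    D = W.sum(axis=1)
    L = np.eye(n) - W / np.sqrt(np.outer(D, D))  # normalized Laplacian
    _, vecs = np.linalg.eigh(L)                  # eigh sorts ascending
    U = vecs[:, :k]                              # k smallest eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    idx = [0]                                    # farthest-point center init
    for _ in range(k - 1):
        d = ((U[:, None] - U[idx][None]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(np.argmax(d)))
    centers = U[idx]
    for _ in range(iters):                       # k-means on embedding rows
        labels = ((U[:, None] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        centers = np.array([U[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

# two well-separated blobs in 50-D, clustered in a compressed 10-D domain
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 50)), rng.normal(3.0, 0.3, (20, 50))])
Phi = rng.normal(size=(10, 50)) / np.sqrt(10)   # Gaussian sensing matrix
labels = spectral_clustering(X @ Phi.T, k=2, sigma=2.0)
```

The affinity matrix here is built over 10-dimensional measurements rather than 50-dimensional points, which is where the computational saving in the compressed setting comes from.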
A Study of Language Model for Image Retrieval
Pub Date: 2009-12-06, DOI: 10.1109/ICDMW.2009.114
Bo Geng, Linjun Yang, Chao Xu
Abstract: Various language model approaches have recently been proposed in information retrieval, showing promising performance in document and Web-page retrieval applications. Building on these achievements, we investigate whether language model approaches can be adapted to content-based image retrieval (CBIR), based on the "bag of visual words" image representation. A critical element of language model estimation is smoothing, which adjusts the maximum likelihood estimate to overcome the data sparseness problem. We therefore perform extensive studies of different smoothing methods, strategies, and parameters, showing their impact on retrieval performance. Experiments are performed on two popular image retrieval databases, and we draw conclusions to facilitate the adaptation of language model approaches to CBIR.
Citations: 16
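As an illustration of the kind of smoothing being studied, the sketch below ranks images with the standard Dirichlet-prior query-likelihood model from text retrieval, applied to bag-of-visual-words histograms. The toy index, the visual-word ids, and the choice mu=10 are assumptions for the example, not the paper's configuration.

```python
import math
from collections import Counter

def dirichlet_score(query_words, doc_words, collection_counts, total, mu=2000):
    """log P(query | doc) with Dirichlet-prior smoothing:
    p(w|d) = (c(w,d) + mu * p(w|C)) / (|d| + mu)."""
    doc = Counter(doc_words)
    dlen = len(doc_words)
    score = 0.0
    for w in query_words:
        p_c = collection_counts[w] / total  # collection (background) model
        score += math.log((doc[w] + mu * p_c) / (dlen + mu))
    return score

# toy "visual word" index: each image is a bag of quantized descriptor ids
images = {"img1": [1, 1, 2, 3], "img2": [2, 2, 4], "img3": [1, 3, 3, 3]}
coll = Counter(w for bag in images.values() for w in bag)
total = sum(coll.values())

query = [1, 3]
ranking = sorted(images,
                 key=lambda d: dirichlet_score(query, images[d], coll, total, mu=10),
                 reverse=True)  # ["img3", "img1", "img2"]
```

Varying mu trades off the document evidence against the background model; sweeping such parameters is exactly the kind of study the abstract describes.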
TubeTagger - YouTube-based Concept Detection
Pub Date: 2009-12-06, DOI: 10.1109/ICDMW.2009.41
A. Ulges, Markus Koch, Damian Borth, T. Breuel
Abstract: We present TubeTagger, a concept-based video retrieval system that exploits web video as an information source. The system performs visual learning on YouTube clips (i.e., it trains detectors for semantic concepts such as "soccer" or "windmill") and semantic learning on the associated tags (i.e., relations between concepts such as "swimming" and "water" are discovered). In this way, text-based video search free of manual indexing is realized. We present a quantitative study of web-based concept detection, comparing several features and statistical models on a large-scale dataset of YouTube content. Beyond this, we report several key findings on concept learning from YouTube and its generalization to different domains, and illustrate characteristics of YouTube-learned concepts, such as focus of interest and redundancy. To give a hands-on impression of web-based concept detection, we invite researchers and practitioners to test our web demo.
Citations: 19
Kernel K-means Based Framework for Aggregate Outputs Classification
Pub Date: 2009-12-06, DOI: 10.1109/ICDMW.2009.33
Shuo Chen, Bin Liu, Mingjie Qian, Changshui Zhang
Abstract: Aggregate outputs learning is a recently proposed setting in data mining and machine learning. It differs from classical supervised learning in that training samples are packed into bags, with only the aggregate outputs (labels for classification, real values for regression) provided; the setting arises in several application contexts. In this paper we focus on the aggregate outputs classification problem and set up a framework based on kernel K-means to solve it. We propose two concrete algorithms within this framework, each of which can handle both binary and multi-class scenarios. The experimental results suggest that our algorithms outperform the state-of-the-art technique. We also propose a new setting for patch extraction in the content-based image retrieval procedure using the algorithm.
Citations: 24
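The framework builds on kernel K-means, which clusters using only a Gram matrix, never the explicit feature vectors. Below is a minimal numpy sketch of plain kernel K-means (the building block, not the paper's aggregate-outputs extension); the RBF kernel and toy blobs are assumptions for the demo.

```python
import numpy as np

def kernel_kmeans(K, k, iters=30, seed=0):
    """Kernel k-means: feature-space distances computed from the Gram matrix K:
    ||phi(x_i) - m_j||^2 = K_ii - (2/|C_j|) sum_{l in C_j} K_il
                           + (1/|C_j|^2) sum_{l,m in C_j} K_lm
    """
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, n)              # random initial assignment
    for _ in range(iters):
        dist = np.zeros((n, k))
        for j in range(k):
            mask = labels == j
            nj = mask.sum()
            if nj == 0:                          # empty cluster: never chosen
                dist[:, j] = np.inf
                continue
            dist[:, j] = (np.diag(K)
                          - 2 * K[:, mask].sum(axis=1) / nj
                          + K[np.ix_(mask, mask)].sum() / nj**2)
        new = dist.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels

# two well-separated 2-D blobs under an RBF kernel
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.2, (15, 2)), rng.normal(2.0, 0.2, (15, 2))])
d2 = ((X[:, None] - X[None]) ** 2).sum(axis=2)
K = np.exp(-d2)
labels = kernel_kmeans(K, 2)
```

Because assignments depend only on K, the same loop works for any positive semidefinite kernel, which is what makes it a natural foundation for a bag-level framework.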
Improved Multi Label Classification in Hierarchical Taxonomies
Pub Date: 2009-12-06, DOI: 10.1109/ICDMW.2009.110
Kunal Punera, Suju Rajan
Abstract: Hierarchical taxonomies are used to organize and retrieve information in many domains, especially those dealing with large and rapidly growing amounts of information, and in many of these domains the data are multi-label in nature. In this paper, we consider the problem of automated text classification in such scenarios. We present a post-processing approach that smooths the output of an underlying one-vs-all ensemble. To do so, we formulate a Regularized Unimodal Regression problem and give an exact algorithm to solve it. We evaluate our approach on several real-world, large-scale, multi-label hierarchical taxonomies and demonstrate that the proposed method provides significant gains over related approaches.
Citations: 3
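The paper's exact algorithm for Regularized Unimodal Regression is not reproduced here, but its unregularized core, least-squares unimodal regression, has a simple exact O(n^2) solution: for every split point, fit the prefix non-decreasing and the suffix non-increasing with pool-adjacent-violators, and keep the lowest-error combination. A hypothetical numpy sketch, with illustrative function names:

```python
import numpy as np

def pava_increasing(y):
    """Pool Adjacent Violators: least-squares non-decreasing fit."""
    blocks = [[float(v), 1] for v in y]          # [block mean, block size]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] > blocks[i + 1][0]:      # violator: merge the pair
            m1, n1 = blocks[i]
            m2, n2 = blocks[i + 1]
            blocks[i] = [(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2]
            del blocks[i + 1]
            i = max(i - 1, 0)                    # merged block may violate left
        else:
            i += 1
    return np.concatenate([[m] * n for m, n in blocks])

def unimodal_fit(y):
    """Exact least-squares unimodal fit: try every split into a non-decreasing
    prefix and a non-increasing suffix, keep the best."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    best, best_err = None, np.inf
    for split in range(n + 1):
        left = pava_increasing(y[:split]) if split > 0 else np.array([])
        right = (pava_increasing(y[split:][::-1])[::-1]
                 if split < n else np.array([]))
        fit = np.concatenate([left, right])
        err = float(((fit - y) ** 2).sum())
        if err < best_err:
            best, best_err = fit, err
    return best
```

In the hierarchical setting, y would be the one-vs-all scores along a root-to-leaf path, so the fit enforces that class confidence rises to some depth in the taxonomy and falls after it.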