2008 IEEE International Conference on Data Mining Workshops: Latest Papers

Parameter Tuning for Differential Mining of String Patterns
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.118
J. Besson, C. Rigotti, I. Mitasiunaite, Jean-François Boulicaut
Abstract: Constraint-based mining has been proven to be extremely useful for supporting actionable pattern discovery. However, useful conjunctions of constraints that support domain driven mining tasks generally need to set several parameter values and how to tune these parameters remains fairly open. We study this problem for substring pattern discovery, when using a conjunction of maximal frequency, minimal frequency and size constraints. We propose a method, based on pattern space sampling, to estimate the number of patterns that satisfy such conjunctions. This permits the user to probe the parameter space in many points, and then to choose some initial promising parameter settings. Our empirical validation confirms that we efficiently obtain good approximations of the number of patterns that will be extracted.
Citations: 12
Web Query Prediction by Unifying Model
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.53
Ning Liu, Jun Yan, Shuicheng Yan, Weiguo Fan, Zheng Chen
Abstract: Recently, many commercial products, such as Google Trends and Yahoo! Buzz, are released to monitor the past search engine query frequency trend. However, little research has been devoted for predicting the upcoming query trend, which is of great importance in providing guidelines for future business planning. In this paper, a unified solution is presented for such a purpose. Besides the classical time series model, we propose to integrate the cosine signal hidden periodicities model to capture periodic information of query time series. Motivated by the fact that these models cannot capture the external accidental event factors which could significantly influence the query frequency, the query correlation model is also modified and integrated for predicting the upcoming query trend. Finally linear regression is utilized for model unification. Experiments based on 15,511,531 queries from a commercial search engine query log ranging within 283 days well validate the effectiveness of our proposed unified algorithm.
Citations: 8
A New Graph-Based Algorithm for Clustering Documents
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.69
Airel Pérez Suárez, José Francisco Martínez Trinidad, J. A. Carrasco-Ochoa, J. Medina-Pagola
Abstract: In this paper a new algorithm, called CStar, for document clustering is presented. This algorithm improves recently developed algorithms like generalized star (GStar) and ACONS algorithms, originally proposed for reducing some drawbacks presented in previous Star-like algorithms. The CStar algorithm uses the condensed star-shaped sub-graph concept defined by ACONS, but defines a new heuristic that allows to construct a new cover of the thresholded similarity graph and to reduce the drawbacks presented in GStar and ACONS algorithms. The experimentation over standard document collections shows that our proposal outperforms previously defined algorithms and other related algorithms used to document clustering.
Citations: 6
Multiple-Instance Regression with Structured Data
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.31
K. Wagstaff, T. Lane, A. Roper
Abstract: We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents. Unlike previous MIR methods, MI-ClusterRegress can operate on bags that are structured in that they contain items drawn from a number of distinct (but unknown) distributions. MI-ClusterRegress simultaneously learns a model of the bag's internal structure, the relevance of each item, and a regression model that accurately predicts labels for new bags. We evaluated this approach on the challenging MIR problem of crop yield prediction from remote sensing data. MI-ClusterRegress provided predictions that were more accurate than those obtained with non-multiple-instance approaches or MIR methods that do not model the bag structure.
Citations: 31
Speeding up Array Query Processing by Just-In-Time Compilation
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.73
C. Jucovschi, P. Baumann, Sorin Stancu-Mara
Abstract: Interpreted languages frequently suffer from higher processing times as compared to compiled approaches. Typically this happens when complex computations are performed. Array DBMSs, which extend database functionality with multidimensional array modeling and query support, find themselves in exactly this situation: queries often involve a large number of operations, and each such operation is applied to a large number of array elements. In this paper, we propose just-in-time compilation as an optimization method for an interpreted array query language. This is achieved by grouping suitable query nodes into complex operation nodes, for which C code is generated, compiled, and loaded during runtime. We present our approach based on the array DBMS rasdaman, discuss its benefits and its embedding into the rasdaman query evaluation, and show initial, rather promising benchmark results.
Citations: 12
Actionable Knowledge Discovery for Threats Intelligence Support Using a Multi-dimensional Data Mining Methodology
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.78
Olivier Thonnard, M. Dacier
Abstract: This paper describes a multi-dimensional knowledge discovery and data mining (KDD) methodology that aims at discovering actionable knowledge related to Internet threats, taking into account domain expert guidance and the integration of domain-specific intelligence during the data mining process. The objectives are twofold: i) to develop global indicators for assessing the prevalence of certain malicious activities on the Internet, and ii) to get insights into the modus operandi of new emerging attack phenomena, so as to improve our understanding of threats. In this paper, we first present the generic aspects of a domain-driven graph-based KDD methodology, which is based on two main components: a clique-based clustering technique and a concepts synthesis process using cliques' intersections. Then, to evaluate the applicability of this approach to our application domain, we use a large dataset of real-world attack traces collected since 2003. Our experimental results show that significant insights can be obtained into the domain of threat intelligence by using this multi-dimensional knowledge discovery method.
Citations: 23
Hunting for Coherent Co-clusters in High Dimensional and Noisy Datasets
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.20
Meghana Deodhar, Joydeep Ghosh, Gunjan Gupta, Hyuk Cho, I. Dhillon
Abstract: Clustering problems often involve datasets where only a part of the data is relevant to the problem, e.g., in microarray data analysis only a subset of the genes show cohesive expressions within a subset of the conditions/features. The existence of a large number of non-informative data points and features makes it challenging to hunt for coherent and meaningful clusters from such datasets. Additionally, since clusters could exist in different subspaces of the feature space, a co-clustering algorithm that simultaneously clusters objects and features is often more suitable as compared to one that is restricted to traditional "one-sided" clustering. We propose Robust Overlapping Co-clustering (ROCC), a scalable and very versatile framework that addresses the problem of efficiently mining dense, arbitrarily positioned, possibly overlapping co-clusters from large, noisy datasets. ROCC has several desirable properties that make it extremely well suited to a number of real life applications. Through extensive experimentation we show that our approach is significantly more accurate in identifying biologically meaningful co-clusters in microarray data as compared to several other prominent approaches that have been applied to this task. We also point out other interesting applications of the proposed framework in solving difficult clustering problems.
Citations: 3
Research on Methodology of Classification Mining for Tumor Markers
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.74
Wei Jiang, Min Yao, Jiekai Yu
Abstract: Reliability is one of the key issues in data mining. In the case of massive protein mass spectrum data from SELDI-TOF-MS, this paper proposes an effective and reliable method to extract tumor markers. First of all, an adaptive threshold approach based on wavelet transformation is put forward to eliminate the noise in raw data so as to furnish reliable foundation for tumor markers extraction. Then a kind of genetic algorithm based on SVM is designed to construct discriminating model in order to find the optimal combination of distinct protein peaks and obtain tumor markers. Finally, the method proposed in this paper is applied to extract tumor markers from the protein mass spectrum data that come from normal mouse serums and induced pancreatic cancer mouse serums to verify the feasibility and reliability of our method.
Citations: 0
Co-training by Committee: A New Semi-supervised Learning Framework
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.27
Mohamed Farouk Abdel Hady, F. Schwenker
Abstract: For many data mining applications, it is necessary to develop algorithms that use unlabeled data to improve the accuracy of the supervised learning. Co-Training is a popular semi-supervised learning algorithm. It assumes that each example is represented by two or more redundantly sufficient sets of features (views) and these views are independent given the class. However, these assumptions are not satisfied in many real-world application domains. Therefore, we present a framework called co-training by committee (CoBC), in which a set of diverse classifiers are used to learn each other. The framework is a simple, general single-view semi-supervised learner that can use any ensemble learner to build diverse committees. Experimental studies on CoBC using bagging, AdaBoost and the random subspace method (RSM) as ensemble learners demonstrate that error diversity among classifiers leads to an effective co-training that requires neither redundant and independent views nor different learning algorithms.
Citations: 47
Towards Combining Structured Pattern Mining and Graph Kernels
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.125
Fabrizio Costa, Björn Bringmann
Abstract: This paper presents a novel approach to feature construction for structured data in order to enhance graph prediction classification performance. To this end we combine graph mining techniques with graph kernel based classifiers. The main idea is to employ efficient mining techniques to extract a set of patterns correlated with the target concept and use these, or a selected subset of these, to annotate the original graph structures. A decomposition kernel is then defined on the enriched structured data instances. Experimental results on carcinogenic and toxicological activity prediction tasks for small molecules show that the proposed technique significantly increases classification performance.
Citations: 3