2008 IEEE International Conference on Data Mining Workshops: Latest Papers

A Comparative Study of Data Sampling and Cost Sensitive Learning
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.119
Chris Seiffert, T. Khoshgoftaar, J. V. Hulse, Amri Napolitano
Abstract: Two common challenges data mining and machine learning practitioners face in many application domains are unequal classification costs and class imbalance. Most traditional data mining techniques attempt to maximize overall accuracy rather than minimize cost. When data is imbalanced, such techniques result in models that highly favor the overrepresented class, the class which typically carries a lower cost of misclassification. Two techniques that have been used to address both of these issues are cost sensitive learning and data sampling. In this work, we investigate the performance of two cost sensitive learning techniques and four data sampling techniques for minimizing classification costs when data is imbalanced. We present a comprehensive suite of experiments, utilizing 15 datasets with 10 cost ratios, which have been carefully designed to ensure conclusive, significant and reliable results.
Citations: 54
Distributed Linear Programming and Resource Management for Data Mining in Distributed Environments
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.137
Haimonti Dutta, H. Kargupta
Abstract: Advances in computing and communication have resulted in very large scale distributed environments in recent years. They are capable of storing large volumes of data and often have multiple compute nodes. However, the inherent heterogeneity of data components, the dynamic nature of distributed systems, the need for information synchronization and data fusion over a network, and security and access control issues make the problem of resource management and monitoring a tremendous challenge. In particular, centralized algorithms for management of resources and data may not be sufficient to manage complex distributed systems. In this paper, we present a distributed algorithm for resource and data management which builds on the traditional simplex algorithm used for solving linear optimization problems. Our distributed algorithm is an exact one, meaning its results are identical to those obtained if run in a centralized setting. We provide extensive analytical results and experiments on simulated data to demonstrate the performance of our algorithm.
Citations: 19
Remarks to Logical Aspects of Measures of Interestingness of Association Rules
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.45
J. Rauch
Abstract: Relations of logical calculi of association rules to measures of interestingness of association rules are studied. Logical calculi of association rules, 4ft-quantifiers and important classes of association rules are briefly introduced. New 4ft-quantifiers and association rules are defined by applying suitable thresholds to several known measures of interestingness. It is proved that some of the new 4ft-quantifiers constitute rules that belong to known classes of rules. It is shown that new interesting classes of rules can be defined on the basis of additional new 4ft-quantifiers. Some additional results concerning new classes of rules are proved. Open problems are introduced.
Citations: 1
G-REX: A Versatile Framework for Evolutionary Data Mining
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.117
Rikard König, U. Johansson, L. Niklasson
Abstract: This paper presents G-REX, a versatile data mining framework based on genetic programming. What distinguishes G-REX from other GP frameworks is that it does not strive to be a general purpose framework. This allows G-REX to include more functionality specific to data mining, such as preprocessing, evaluation and optimization methods, but also a multitude of predefined classification and regression models. Examples of predefined models are decision trees, decision lists, k-NN with attribute weights, hybrid kNN-rules, fuzzy-rules and several different regression models. The main strength is, however, the flexibility, making it easy to modify, extend and combine all of the predefined functionality. G-REX is, in addition, available in a special Weka package adding useful evolutionary functionality to the standard data mining tool Weka.
Citations: 30
Semantic Full-Text Search with ESTER: Scalable, Easy, Fast
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.101
H. Bast, Fabian M. Suchanek, Ingmar Weber
Abstract: We present a demo of ESTER, a search engine that combines the ease of use, speed and scalability of full-text search with the powerful semantic capabilities of ontologies. ESTER supports full-text queries, ontological queries and combinations of these, yet its interface is as easy as can be: a standard search field with semantic information provided interactively as one types. ESTER works by reducing all queries to two basic operations, prefix search and join, which can be implemented very efficiently in terms of both processing time and index space. We demonstrate the capabilities of ESTER on a combination of the English Wikipedia with the Yago ontology, with response times below 100 milliseconds for most queries, and an index size of about 4 GB. The system can be run both stand-alone and as a Web application.
Citations: 7
Stream-Close: Fast Mining of Closed Frequent Itemsets in High Speed Data Streams
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.51
Ranganath B. N., M. Murty
Abstract: With the emergence of large-volume and high-speed streaming data, the recent techniques for stream mining of CFIs (closed frequent itemsets) will become inefficient. When concept drift occurs at a slow rate in high speed data streams, the rate of change of information across different sliding windows will be negligible. So the user won't be deprived of changes in information if we slide the window by multiple transactions at a time. Therefore, we propose a novel approach for mining CFIs cumulatively with a sliding width (≥1) over high speed data streams. However, it is nontrivial to mine CFIs cumulatively over a stream, because such growth may lead to the generation of an exponential number of candidates for closure checking. In this study, we develop an efficient algorithm, stream-close, for mining CFIs over a stream by exploring some interesting properties. Our performance study reveals that stream-close achieves good scalability and has promising results.
Citations: 4
Graph-Based Data Mining in Dynamic Networks: Empirical Comparison of Compression-Based and Frequency-Based Subgraph Mining
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.68
C. You, L. Holder, D. Cook
Abstract: We propose a dynamic graph-based relational mining approach using graph-rewriting rules to learn patterns in networks that structurally change over time. A dynamic graph containing a sequence of graphs over time represents dynamic properties as well as structural properties of the network. Our approach discovers graph-rewriting rules, which describe the structural transformations between two sequential graphs over time, and also learns description rules that generalize over the discovered graph-rewriting rules. The discovered graph-rewriting rules show how networks change over time, and the description rules in the graph-rewriting rules show temporal patterns in the structural changes. We apply our approach to biological networks to understand how the biosystems change over time. Our compression-based discovery of the description rules is compared with the frequent subgraph mining approach using several evaluation metrics.
Citations: 25
The Set Classification Problem and Solution Methods
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.113
Xia Ning, G. Karypis
Abstract: This paper focuses on developing classification algorithms for problems in which there is a need to predict the class based on multiple observations (examples) of the same phenomenon (class). These problems give rise to a new classification problem, referred to as set classification, that requires the prediction of a set of instances given the prior knowledge that all the instances of the set belong to the same unknown class. This problem falls under the general class of problems whose instances have class label dependencies. Four methods for solving the set classification problem are developed and studied. The first is based on a straightforward extension of the traditional classification paradigm, whereas the other three are designed to explicitly take into account the known dependencies among the instances of the unlabeled set during learning or classification. A comprehensive experimental evaluation of the various methods and their underlying parameters shows that some of them lead to significant gains in performance.
Citations: 24
Discovering Implicit Redundancies in Network Communications for Detecting Inconsistent Values
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.15
B. Nassu, T. Nanya, Hiroshi Nakamura
Abstract: Detecting inconsistent values received in a communication is a challenging problem faced in networked systems. Inconsistent values occur when a message contains incorrect data, even though the syntax is correct and there is no corruption due to transmission errors. In many cases, traditional schemes based on voting protocols or error detection codes cannot be used. An alternative is discovering implicit redundancies, or patterns that model a correct communication, and using these patterns to detect inconsistent values. However, existing techniques do not cover the inputs and sequential patterns needed by this problem. In this paper, we propose a novel technique that considers messages with multiple types and attributes, events involving variables, and a heuristic for reducing redundant information. Experiments show that the discovered redundancies can achieve reasonable error detection coverage in fields where sequential relations exist, without incurring a large number of false alarms or a high latency.
Citations: 0
A Semi-supervised Learning Algorithm for Recognizing Sub-classes
2008 IEEE International Conference on Data Mining Workshops Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.129
Ranga Raju Vatsavai, S. Shekhar, B. Bhaduri
Abstract: In many practical situations it is not feasible to collect labeled samples for all available classes in a domain. Especially in supervised classification of remotely sensed images, it is impossible to collect ground truth information over large geographic regions for all thematic classes. As a result, analysts often collect labels for aggregate classes (e.g., Forest, Agriculture, Urban). In this paper we present a novel learning scheme that automatically learns sub-classes (e.g., Hardwood, Conifer) from the user-given aggregate classes. We model each aggregate class as a finite Gaussian mixture instead of the classical assumption of a unimodal Gaussian per class. The number of components in each finite Gaussian mixture is automatically estimated. Semi-supervised learning is then used to recognize sub-classes by utilizing very few labeled samples per sub-class and a large number of unlabeled samples. Experimental results on real remotely sensed image classification showed not only improved accuracy in aggregate class classification but also that the proposed method recognized sub-classes accurately.
Citations: 6