{"title":"Fuzzy-Granular Gene Selection from Microarray Expression Data","authors":"Yuanchen He, Yuchun Tang, Yanqing Zhang, Rajshekhar Sunderraman","doi":"10.1109/ICDMW.2006.84","DOIUrl":"https://doi.org/10.1109/ICDMW.2006.84","url":null,"abstract":"Selecting informative and discriminative genes from huge microarray gene expression data is an important and challenging bioinformatics research topic. This paper proposes a fuzzy-granular method for the gene selection task. Firstly, genes are grouped into different function granules with the fuzzy C-means algorithm (FCM). And then informative genes in each cluster are selected with the signal to noise metric (S2N). With fuzzy granulation, information loss in the process of gene selection is decreased. As a result, more informative genes for cancer classification are selected and more accurate classifiers can be modeled. The simulation results on two publicly available microarray expression datasets show that the proposed method is more accurate than traditional algorithms for cancer classification. And hence we expect that genes being selected can be more helpful for further biological studies","PeriodicalId":291862,"journal":{"name":"Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115616800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncovering Potential Attribute Relevance via MIA-Processing in Data Mining","authors":"S. Chao, Yiping Li","doi":"10.1109/ICDMW.2006.162","DOIUrl":"https://doi.org/10.1109/ICDMW.2006.162","url":null,"abstract":"The purpose of a classification learning algorithm is to accurately and efficiently map an input instance to an output class label, according to a set of labeled instances. In which data preprocessing, especially feature selection (FS) and continuous feature discretization (CFD), are considered as the significant issues. Since the quality of the data highly affects the result of a learning problem. Especially in medical domain, symptoms are interacted with each other; a compound symptom always could reveal more accurate diagnostic results. Therefore, a useless attribute by itself may become potentially relevant by providing hidden supportive information to other attributes. In this paper, our MIA-processing methods focus on uncovering hidden attributes relevance during FS and CFD. Our methods hence minimize the uncertainty and at the same time maximize the final classification accuracy. The empirical results demonstrate a comparison of performance of various classification algorithms on several real-life datasets from UCI repository","PeriodicalId":291862,"journal":{"name":"Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124286224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering Workflow Requirements Using Compression Dissimilarity Measure","authors":"Li Wei, J. Handley, Nathaniel Martin, Tong Sun, Eamonn J. Keogh","doi":"10.1109/ICDMW.2006.44","DOIUrl":"https://doi.org/10.1109/ICDMW.2006.44","url":null,"abstract":"Xerox offers a bewildering array of printers and software configurations to satisfy the need of production print shops. A configuration tool in the hands of sales analysts elicits requirements from customers and recommends a list of product configurations. This tool generates special question and answer case logs that provide useful historical data. Given the unusual semi-structured question and answer format, this data is not amenable to any standard document clustering method. The authors discovered that a hierarchical agglomerative approach using a compression-based dissimilarity measure (CDM) provided readily interpretable clusters. The authors compared this method empirically to two reasonable alternatives, latent semantic analysis and probabilistic latent semantic analysis, and conclude that CDM offers an accurate and easily implemented approach to validate and augment our configuration tool","PeriodicalId":291862,"journal":{"name":"Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)","volume":"183 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121035456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Crypto-Based Approach to Privacy-Preserving Collaborative Data Mining","authors":"J. Zhan, S. Matwin","doi":"10.1109/ICDMW.2006.3","DOIUrl":"https://doi.org/10.1109/ICDMW.2006.3","url":null,"abstract":"To conduct data mining, we often need to collect data from various parties. Privacy concerns may prevent the parties from directly sharing the data and some types of information about the data. How multiple parties collaboratively conduct data mining without breaching data privacy presents a challenge. In this paper, we propose a formal definition of privacy, develop a solution for privacy-preserving k-nearest neighbor classification which is one of data mining tasks, and show that our solution preserves data privacy according to our definition","PeriodicalId":291862,"journal":{"name":"Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121256093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting Communities from Complex Networks by the k-dense Method","authors":"Kazumi Saito, Takeshi Yamada, K. Kazama","doi":"10.1093/ietfec/e91-a.11.3304","DOIUrl":"https://doi.org/10.1093/ietfec/e91-a.11.3304","url":null,"abstract":"To understand the structural and functional properties of large-scale complex networks, it is crucial to efficiently extract a set of cohesive subnetworks as communities. There have been proposed several such community extraction methods in the literature, including the classical k-core decomposition method and, more recently, the k-clique based community extraction method. The k-core method, although computationally efficient, is often not powerful enough for uncovering a detailed community structure and it produces only coarse-grained and loosely connected communities. The k-clique method, on the other hand, can extract fine-grained and tightly connected communities but requires a substantial amount of computational load for large-scale complex networks. In this paper, we present a new notion of a subnetwork called k-dense, and propose an efficient algorithm for extracting k-dense communities. We applied our method to the two different types of networks assembled from real data, namely, from blog trackbacks and word associations, demonstrated that the k-dense method could extract communities almost as efficiently as the k-core method, while the qualities of the extracted communities are comparable to those obtained by the k-clique method","PeriodicalId":291862,"journal":{"name":"Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)","volume":"617 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123268767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Many Sorted Observational Calculi for Multi-Relational Data Mining","authors":"J. Rauch","doi":"10.1109/ICDMW.2006.106","DOIUrl":"https://doi.org/10.1109/ICDMW.2006.106","url":null,"abstract":"Observational calculi approved as a tool for study logical properties of association rules. They are defined by modifications of classical predicate calculi. Many sorted observational calculi are introduced as a modification of classical many sorted predicate calculi. It is argued that such defined calculi are suitable for relational data mining. Results on decidability of these calculi are presented. Further research directions are outlined","PeriodicalId":291862,"journal":{"name":"Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115207205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised and Semi-Supervised Two-class Support Vector Machines","authors":"Zhao Kim, Yingjie Tian, Nai-yang Deng","doi":"10.1109/ICDMW.2006.164","DOIUrl":"https://doi.org/10.1109/ICDMW.2006.164","url":null,"abstract":"Support vector machines have been a dominant learning technique for almost ten years, moreover they have been applied to supervised learning problems. Recently two-class unsupervised and semi-supervised classification problems based on bounded c-support vector machines are relaxed to semi-definite programming (B.L. Xu et al., 2004). In this paper the authors present another version to two-class unsupervised and semi-supervised classification problems based on bounded v-support vector machines, which trained by convex relaxation of the training criterion: find a labeling that yield a maximum margin on the training data. But the problems have difficulty to compute, we will find their semi-definite relaxations that can approximate them well. Experimental results show that our new unsupervised and semi-supervised classification algorithms often obtain more accurate results than other unsupervised and semi-supervised methods","PeriodicalId":291862,"journal":{"name":"Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)","volume":"325 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116826523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The River-Rafting System for Knowledge Discovery Related to Persuasion Process Conversation Logs","authors":"W. Sunayama, K. Yada","doi":"10.1109/ICDMW.2006.156","DOIUrl":"https://doi.org/10.1109/ICDMW.2006.156","url":null,"abstract":"The purpose of this research is to develop a framework to represent the content and process of persuasion communications for overdue payment collection, thus making it possible to examine how the skilled operators have used theme related keywords concerning motivations to pay, the payment methods and the payment confirmation in their negotiation to achieve higher collection success. This paper describes a basis for modeling a persuasion process. There has been no research or methods for dealing with large amounts of conversation logs for discovering useful knowledge about persuasion processes. In this paper, we report our successful efforts in discovering a part of the distinctive features of skilled worker techniques as indicated in their conversations related to overdue payment collection and the application of our methods to communication data related to a Japanese telecommunications company","PeriodicalId":291862,"journal":{"name":"Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129832445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Global Biclustering of Microarray Data","authors":"Thomas Wolf, B. Brors, Thomas Hofmann, Elisabeth Georgii","doi":"10.1109/ICDMW.2006.88","DOIUrl":"https://doi.org/10.1109/ICDMW.2006.88","url":null,"abstract":"We consider the problem of simultaneously clustering genes and conditions of a gene expression data matrix. A bicluster is defined as a subset of genes that show similar behavior within a subset of conditions. Finding biclusters can be useful for revealing groups of genes involved in the same molecular process as well as groups of conditions where this process takes place. Previous work either deals with local, bicluster-based criteria or assumes a very specific structure of the data matrix (e.g. checkerboard or block-diagonal) (Ryan et al., 2005). In contrast, our goal is to find a set of flexibly arranged biclusters which is optimal in regard to a global objective function. As this is a NP-hard combinatorial problem, we describe several techniques to obtain approximate solutions. We benchmarked our approach successfully on the Alizadeh B-cell lymphoma data set (Alizadeh et al., 2000)","PeriodicalId":291862,"journal":{"name":"Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130572081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Approach to Outsourcing Data Mining Tasks while Protecting Business Intelligence and Customer Privacy","authors":"Ling Qiu, Yingjiu Li, Xintao Wu","doi":"10.1109/ICDMW.2006.26","DOIUrl":"https://doi.org/10.1109/ICDMW.2006.26","url":null,"abstract":"Data mining is playing an important role in decision making. It is beneficial to outsource data mining tasks if an organization does not have required expertise in-house. However, the organization may lose business intelligence and customer privacy during this outsourcing process. In this paper, we present a Bloom filter based solution to enable organizations to outsource their tasks of mining association rules while protecting their business intelligence and customer privacy. Our approach can achieve high precision in data mining by trading-off storage requirements","PeriodicalId":291862,"journal":{"name":"Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130626489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}