Sixth International Conference on Machine Learning and Applications (ICMLA 2007): Latest Articles

Clustering Categorical Data Based on Maximal Frequent Itemsets
Dadong Yu, Dongbo Liu, Rui Luo, Jianxin Wang
{"title":"Clustering Categorical Data Based on Maximal Frequent Itemsets","authors":"Dadong Yu, Dongbo Liu, Rui Luo, Jianxin Wang","doi":"10.1109/ICMLA.2007.11","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.11","url":null,"abstract":"Clustering categorical data has received more attention in recent years, but several aspects of the existing algorithms, such as the interpretability of the found clusters and the impact of data selection order, are not well solved. A novel categorical data clustering algorithm called CLUBMIS is proposed in this paper, which can effectively find the interesting clusters. In addition, the clusters can be easily interpreted by the maximal frequent itemsets used in the clustering process. Unlike most hierarchical clustering algorithms, CLUBMIS clusters datasets based on summarized information, i.e. maximal frequent itemsets, and thus eliminates the effect of different data selection orders.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115989283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Using evolutionary sampling to mine imbalanced data
D. J. Drown, T. Khoshgoftaar, R. Narayanan
{"title":"Using evolutionary sampling to mine imbalanced data","authors":"D. J. Drown, T. Khoshgoftaar, R. Narayanan","doi":"10.1109/ICMLA.2007.73","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.73","url":null,"abstract":"Class imbalance tends to cause inferior performance in data mining learners. Evolutionary sampling is a technique which seeks to counter this problem by using genetic algorithms to evolve a reduced sample of a complete dataset to train a classification model. Evolutionary sampling works to remove noisy and duplicate instances so that the sampled training data will produce a superior classifier. We propose this novel technique as a method to handle severe class imbalance in data mining. This paper presents our research into the use of evolutionary sampling with C4.5 decision trees and compares the technique's performance with random undersampling.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129680582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
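For readers unfamiliar with the baseline named in the abstract above, here is a minimal sketch of plain random undersampling. It is not the authors' evolutionary sampling method; the function name, the `ratio` parameter, and the toy data are assumptions made for illustration only.

```python
import random
from collections import Counter

def random_undersample(rows, labels, minority_label, ratio=1.0, seed=0):
    """Keep every minority example and a random subset of the other examples.

    ratio is a hypothetical parameter: the number of majority examples kept
    per minority example (1.0 gives an even class distribution).
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    keep = min(len(majority), int(round(ratio * len(minority))))
    chosen = sorted(minority + rng.sample(majority, keep))
    return [rows[i] for i in chosen], [labels[i] for i in chosen]

# Toy data (assumed): 3 positive examples among 20.
rows = list(range(20))
labels = [1 if i % 7 == 0 else 0 for i in range(20)]
sub_rows, sub_labels = random_undersample(rows, labels, minority_label=1)
print(Counter(sub_labels))
```

The majority examples are discarded blindly here, which is the behaviour that a guided sampling scheme would try to improve on.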
Modifying kernels using label information improves SVM classification performance
Martin Renqiang Min, A. Bonner, Zhaolei Zhang
{"title":"Modifying kernels using label information improves SVM classification performance","authors":"Martin Renqiang Min, A. Bonner, Zhaolei Zhang","doi":"10.1109/ICMLA.2007.84","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.84","url":null,"abstract":"Kernel learning methods based on kernel alignment with semidefinite programming (SDP) are often memory intensive and computationally expensive, thus often impractical for problems with large-size dataset. We propose a method using label information to modify kernels based on SVD and a linear mapping. As a result, the new kernel matrix reflects the label-dependent separability of the data in a better way than the original kernel matrix. In addition, our experimental results on USPS handwritten digits and the SCOP dataset, show that the SVM classifier based on the improved kernels has better performance than the SVM classifier based on the original kernels; moreover, SVM based on the improved profile kernel with pull-in homologs (see experiment section for explanations) produced the best results for remote homology detection on the SCOP dataset compared to the published results.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128766721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
Text Mining and Ontology Applications in Bioinformatics and GIS
S. Navathe
{"title":"Text Mining and Ontology Applications in Bioinformatics and GIS","authors":"S. Navathe","doi":"10.1109/ICMLA.2007.122","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.122","url":null,"abstract":"Informatics and computers have not yet become as pervasive in chemistry as they have in physics and biology. Drawing analogies from bioinformatics, key ingredients for progress in chemoinformatics are the availability of large, annotated databases of compounds and reactions, data structures and algorithms to efficiently search these databases, and computational methods to predict the physical, chemical, and biological properties of new compounds and reactions. We will describe the development of: (1) a large public database of compounds and reactions (ChemDB); (2) machine learning kernel methods to predict molecular properties; and (3) the applications of these methods to drug screening/design problems and the identification of new drug leads against a major disease. More broadly, we will discuss some of the challenges and opportunities for computer science, AI, and machine learning in chemistry. Abstract: This talk will present some general problem areas and solutions in two fields of applications of machine learning: bioinformatics and Geographic Information Systems (GIS). The bioinformatics arena is very broad and encompasses many problems such as gene finding in sequences, molecular pathway construction, protein structure prediction etc. We will outline our research on finding important keywords from the biomedical literature by statistical analysis and some natural language analysis. We have also incorporated ontologies such as UMLS (Unified Medical Language System) to determine relationships among biological and medical concepts. The primary goal of this work has been to interpret the long lists of genes that are derived in microarray experiments used to understand and treat diseases. We are able to cluster genes based on their functional similarity. We have also used lists of keywords as feature vectors to drive SVM models for a classification of literature. In particular, we have dealt with the classification of relevant literature for public health at the CDC (Centers for Disease Control). We will briefly explain the discovery of biomarkers for cancer using a technique that combines SVM and gene ontology.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127788365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
Sparsity regularization path for semi-supervised SVM
G. Gasso, Karina Zapien Arreola, S. Canu
{"title":"Sparsity regularization path for semi-supervised SVM","authors":"G. Gasso, Karina Zapien Arreola, S. Canu","doi":"10.1109/ICMLA.2007.81","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.81","url":null,"abstract":"Using unlabeled data to unravel the structure of the data to leverage the learning process is the goal of semi supervised learning. A common way to represent this underlying structure is to use graphs. Flexibility of the maximum margin kernel framework allows to model graph smoothness and to build kernel machine for semi supervised learning such as Laplacian SVM [1]. But a common complaint of the practitioner is the long running time of these kernel algorithms for classification of new points. We provide an efficient way of alleviating this problem by using a L1 penalization term and a regularization path algorithm to efficiently compute the solution. Empirical evidence shows the benefit of the algorithm.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126423083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Biomarker Identification by Knowledge-Driven Multi-Level ICA and Motif Analysis
Li Chen, J. Xuan, Chen Wang, Y. Wang, I. Shih, Tian-Li Wang, Zhen Zhang, R. Clarke, E. Hoffman
{"title":"Biomarker Identification by Knowledge-Driven Multi-Level ICA and Motif Analysis","authors":"Li Chen, J. Xuan, Chen Wang, Y. Wang, I. Shih, Tian-Li Wang, Zhen Zhang, R. Clarke, E. Hoffman","doi":"10.1109/ICMLA.2007.58","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.58","url":null,"abstract":"Many statistical methods often fail to identify biologically meaningful biomarkers related to a specific disease under study from expression data alone. In this paper, we develop a novel strategy, namely knowledge-driven multi-level independent component analysis (ICA), to infer regulatory signals and identify biologically relevant biomarkers from microarray data. Specifically, based on multi-level clustering results and partial prior knowledge, we apply ICA to find stable disease specific linear regulatory modes and then extract associated biomarker genes. A statistical test is designed to evaluate the significance of transcription factor enrichment for extracted gene set based on motif information. The experimental results on an Rsf-1 induced microarray data set show that our knowledge-driven method can extract more biologically meaningful biomarkers with significant enrichment of transcription factors related to ovarian cancer compared to other gene selection methods with/without prior knowledge.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127142413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
An optimization method for selecting parameters in support vector machines
Yulin Dong, Manghui Tu, Zhonghang Xia, Guangming Xing
{"title":"An optimization method for selecting parameters in support vector machines","authors":"Yulin Dong, Manghui Tu, Zhonghang Xia, Guangming Xing","doi":"10.1109/ICMLA.2007.38","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.38","url":null,"abstract":"It has been shown that the cost parameters and kernel parameters are critical in the performance of support vector machines (SVMs). A standard parameter selection method compares parameters among a discrete set of values, called the candidate set, and picks the one which has the best classification accuracy. As a result, the choice of parameters strongly depends on the pre-defined candidate set. In this paper, we formulate the selection of the cost parameter and kernel parameter as a two-level optimization problem, in which the values of parameters vary continuously and thus optimization techniques can be applied to select ideal parameters. Due to the non-smoothness of the objective function in our model, a genetic algorithm has been presented. Numerical results show that the two-level approach can significantly improve the performance of SVM classifier in terms of classification accuracy.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127273312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
An incremental viterbi algorithm
J. Bobbin
{"title":"An incremental viterbi algorithm","authors":"J. Bobbin","doi":"10.1109/ICMLA.2007.49","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.49","url":null,"abstract":"This paper describes an incremental version of the Viterbi dynamic programming algorithm. The incremental algorithm is shown to dramatically reduce memory usage in long state sequence problems compared with the standard Viterbi algorithm while having no measurable impact on the algorithm's runtime. In addition, the set of problems to which the Viterbi algorithm can be applied is extended by the incremental algorithm to include problems of finding optimal paths in realtime domains. The Viterbi algorithm is widely used to find optimal paths in hidden Markov models (HMM), and HMMs are frequently applied to both streaming data problems where realtime solutions can be of interest, and to large state sequence problems in areas like bioinformatics. In this paper we apply the incremental algorithm to finding optimal paths in a variant of the burst detection HMM applied to the novel problem of detecting user activity levels in digital evidence data derived from hard drives.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134631436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
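As background for the entry above, here is a minimal sketch of the standard (non-incremental) Viterbi algorithm that the abstract uses as its point of comparison. It is not the paper's incremental variant. The two-state toy HMM, its probabilities, and all names are assumptions chosen for illustration.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state path for obs under an HMM (log space)."""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one back-pointer table per time step after the first
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + log_trans[p][s])
            score[s] = prev[best] + log_trans[best][s] + log_emit[s][o]
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}

# Hypothetical activity-level HMM: two states, two observation symbols.
states = ["idle", "busy"]
log_start = logs({"idle": 0.5, "busy": 0.5})
log_trans = {"idle": logs({"idle": 0.9, "busy": 0.1}),
             "busy": logs({"idle": 0.1, "busy": 0.9})}
log_emit = {"idle": logs({"low": 0.9, "high": 0.1}),
            "busy": logs({"low": 0.2, "high": 0.8})}

path = viterbi(["low", "low", "high", "high", "low"],
               states, log_start, log_trans, log_emit)
print(path)
```

In this sketch the `back` list grows by one table per observation and the path is only recovered after the last observation has been seen. Those are presumably the memory and real-time limitations of the standard algorithm that the abstract refers to.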
Memory-based context-sensitive spelling correction at web scale
Andrew Carlson, Ian Fette
{"title":"Memory-based context-sensitive spelling correction at web scale","authors":"Andrew Carlson, Ian Fette","doi":"10.1109/ICMLA.2007.50","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.50","url":null,"abstract":"We study the problem of correcting spelling mistakes in text using memory-based learning techniques and a very large database of token n-gram occurrences in web text as training data. Our approach uses the context in which an error appears to select the most likely candidate from words which might have been intended in its place. Using a novel correction algorithm and a massive database of training data, we demonstrate higher accuracy on correcting real-word errors than previous work, and very high accuracy at a new task of ranking corrections to non-word errors given by a standard spelling correction package.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115246701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 75
Learning with limited minority class data
T. Khoshgoftaar, Chris Seiffert, J. V. Hulse, Amri Napolitano, A. Folleco
{"title":"Learning with limited minority class data","authors":"T. Khoshgoftaar, Chris Seiffert, J. V. Hulse, Amri Napolitano, A. Folleco","doi":"10.1109/ICMLA.2007.76","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.76","url":null,"abstract":"A practical problem in data mining and machine learning is the limited availability of data. For example, in a binary classification problem it is often the case that examples of one class are abundant, while examples of the other class are in short supply. Examples from one class, typically the positive class, can be limited due to the financial cost or time required to collect these examples. This work presents a comprehensive empirical study of learning when examples from one class are extremely rare, but examples of the other class(es) are plentiful. Specifically, we address the issue of how many examples from the abundant class should be used when training a classifier on data where one class is very rare. Nearly one million classifiers were built and evaluated to generate the results presented in this work. Our results demonstrate that the often used 'even distribution' is not optimal when dealing with such rare events.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123897272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 103