{"title":"A hierarchical feature decomposition clustering algorithm for unsupervised classification of document image types","authors":"Dean Curtis, V. Kubushyn, E. Yfantis, M. Rogers","doi":"10.1109/ICMLA.2007.13","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.13","url":null,"abstract":"In a system where medical paper document images have been converted to a digital format by a scanning operation, understanding the document types that exists in this system could provide for vital data indexing and retrieval. In a system where millions of document images have been scanned, it is infeasible to expect a supervised based algorithm or a tedious (human based) effort to discover the document types. The most sensible and practical way is an unsupervised algorithm. Many clustering techniques have been developed for unsupervised classification. Many rely on all data being presented at once, the number of clusters to be known, or both. The algorithm presented in this paper is a two-threshold based technique relying on a hierarchical decomposition of the features. On a subset of document images, it discovered document types at an acceptable level and confidentially classified unknown document images.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130419819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extraction of minimum decision algorithm using rough sets and genetic algorithms","authors":"M. Hirokane, Shusaku Kouno, Y. Nomura","doi":"10.1109/ICMLA.2007.51","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.51","url":null,"abstract":"In civil engineering, it is crucial to reuse knowledge which has been accumulated through the experience of engineers, etc. For this purpose, it is necessary to establish a method for knowledge acquisition and a method for explicit representation of the acquired knowledge. This paper applies the genetic algorithm to the process of deriving a decision algorithm from instances by using rough sets, and proposes a method of deriving a simple and useful decision algorithm with a relatively small amount of computation. A decision algorithm is actually derived from the data on accident instances at actual construction sites, and the recognition rate and other performance measures are investigated by the k-fold cross validation method.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123704741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An application of a rule-based model in software quality classification","authors":"Lofton A. Bullard, T. Khoshgoftaar, Kehan Gao","doi":"10.1109/ICMLA.2007.69","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.69","url":null,"abstract":"A new rule-based classification model (RBCM) and rule-based model selection technique are presented. The RBCM utilizes rough set theory to significantly reduce the number of attributes, discretation to partition the domain of attribute values, and Boolean predicates to generate the decision rules that comprise the model. When the domain values of an attribute are continuous and relatively large, rough set theory requires that they be discretized. The subsequent discretized domain must have the same characteristics as the original domain values. However, this can lead to a large number of partitions of the attribute's domain space, which in turn leads to large rule sets. These rule sets tend to form models that over-fit. To address this issue, the proposed rule-based model adopts a new model selection strategy that minimizes over-fitting for the RBCM. Empirical validation of the RBCM is accomplished through a case study on a large legacy telecommunications system. The results demonstrate that the proposed RBCM and the model selection strategy are effective in identifying the classification model that minimizes over-fitting and high cost classification errors.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125016296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding Challenges in Preserving and Reconstructing Computer-Assisted Medical Decision Processes","authors":"Sang-Chul Lee, Peter Bajcsy","doi":"10.1109/ICMLA.2007.92","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.92","url":null,"abstract":"This paper addresses the problem of understanding preservation and reconstruction requirements for computer- aided medical decision-making. With an increasing number of computer-aided decisions having a large impact on our society, the motivation for our work is not only to document these decision processes semi-automatically but also to understand the preservation cost and related computational requirements. Our objective is to support computer-assisted creation of medical records, to guarantee authenticity of records, as well as to allow managers of electronic medical records (EMR), archivists and other users to explore and evaluate computational costs (e.g., storage and processing time) depending on several key characteristics of appraised records. Our approach to this problem is based on designing an exploratory simulation framework for investigating preservation tradeoffs and assisting in appraisals of electronic records. We have a prototype simulation framework called image provenance to learn (IP2Learn) to support computer-aided medical decisions based on visual image inspection. The current software enables to explore some of the tradeoffs related to (1) information granularity (category and level of detail), (2) representation of provenance information, (3) compression, (4) encryption, (5) watermarking and steganography, (6) information gathering mechanism, and (7) final medical report content (level of detail) and its format. We illustrate the novelty of IP2Learn by performing example studies and the results of tradeoff analyses for a specific image inspection task.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126728805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bias-variance tradeoff in hybrid generative-discriminative models","authors":"Guillaume Bouchard","doi":"10.1109/ICMLA.2007.85","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.85","url":null,"abstract":"Given any generative classifier based on an inexact density model, we can define a discriminative counterpart that reduces its asymptotic error rate, while increasing the estimation variance. An optimal bias-variance balance might be found using hybrid generative-discriminative (HGD) approaches. In these paper, these methods are defined in a unified framework. This allow us to find sufficient conditions under which an improvement in generalization performances is guaranteed. Numerical experiments illustrate the well fondness of our statements.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131864643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparison of two algorithms for predicting the condition number","authors":"Guénaël Cabanes, Younès Bennani","doi":"10.1109/ICMLA.2007.8","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.8","url":null,"abstract":"We present experimental results of comparing the modified K-nearest neighbor (MkNN) algorithm with support vector machine (SVM) in the prediction of condition numbers of sparse matrices. Condition number of a matrix is an important measure in numerical analysis and linear algebra. However, the direct computation of the condition number of a matrix is very expensive in terms of CPU and memory cost, and becomes prohibitive for large size matrices. We use data mining techniques to estimate the condition number of a given sparse matrix. In our previous work, we used support vector machine (SVM) to predict the condition numbers. While SVM is considered a state-of- the-art classification/regression algorithm, kNN is usually used for collaborative filtering tasks. Since prediction can also be interpreted as a classification/regression task, virtually any supervised learning algorithm (such as kNN) can also be applied. Experiments are performed on a publicly available dataset. We conclude that modified kNN (MkNN) performs much better than SVM on this particular dataset.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125997635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improvement of Bayesian Network Inference Using a Relaxed Gene Ordering","authors":"D. Zhu, Hua Li","doi":"10.1109/ICMLA.2007.68","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.68","url":null,"abstract":"Bayesian network structural learning from high throughput data has become a powerful tool in reconstructing signaling pathways. Recent bioinformatics research advocates the notion that signaling networks in the living cell are likely to be hierarchically organized. Genes resident in hierarchical layers constitute biological constraint, which can be readily used by many network structural learning algorithms to reduce the computational complexity. Based on the hierarchical constraint constructed by using breadth-first-search(BFS) on a manually assembled transcriptional regulation network in Saccharomyces cerevisiae, we propose a new constrained Bayesian network structural learning algorithm that solves the NP-hard computational problem in a heuristic manner. We demonstrate the utility of our algorithm in constructing two important signaling pathways.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"237-240 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130748529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LEONARDO - The computational intelligence (CI) model selection wizard","authors":"Thanh-Nghi Do, Jean-Daniel Fekete","doi":"10.1109/ICMLA.2007.57","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.57","url":null,"abstract":"The need for tools to aid the selection of the CI models that lie at the heart of many AI systems has never been greater, due to the mainstreaming of data mining and other AI applications. LEONARDO -our contribution to this process- is a recommender system that selects and ranks applicable CI models for a given problem based on the peculiarities of the domain as determined by the user's preferences and dataset characteristics. Leonardo's recommendations are based on two knowledge bases. One contains the description of 65 CI models and provides the Meta knowledge for pruning the space of all CI models to only those applicable to the current task. The second KB contains the performance results of over 200 datasets on the applicable CI models. LEONARDO's ranking is achieved by using the performance information of the k entries, from this KB, nearest in similarity to the new domain dataset.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114586165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large Scale Classification with Support Vector Machine Algorithms","authors":"Thanh-Nghi Do, Jean-Daniel Fekete","doi":"10.1109/icmla.2007.25","DOIUrl":"https://doi.org/10.1109/icmla.2007.25","url":null,"abstract":"Inductive transfer is applying knowledge learned on one set of tasks to improve the performance of learning a new task. Inductive transfer is being applied in improving the generalization performance on a classification task using the models learned on some related tasks. In this paper, we show a method of making inductive transfer for text classification more effective using Wikipedia. We map the text documents of the different tasks to a feature space created using Wikipedia, thereby providing some background knowledge of the contents of the documents. It has been observed here that when the classifiers are built using the features generated from Wikipedia they become more effective in transferring knowledge. An evaluation on the daily classification task on the Reuters RCV1 corpus shows that our method can significantly improve the performance of inductive transfer. Our method was also able to successfully overcome a major obstacle observed in a recent work on a similar setting.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125373793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supervised reinforcement learning using behavior models","authors":"Víctor Uc Cetina","doi":"10.1109/icmla.2007.14","DOIUrl":"https://doi.org/10.1109/icmla.2007.14","url":null,"abstract":"We introduce a supervised reinforcement learning (SRL) architecture for robot control problems with high dimensional state spaces. Based on such architecture two new SRL algorithms are proposed. In our algorithms, a behavior model learned from examples is used to dynamically reduce the set of actions available from each state during the early reinforcement learning (RL) process. The creation of such subsets of actions leads the agent to exploit relevant parts of the action space, avoiding the selection of irrelevant actions. Once the agent has exploited the information provided by the behavior model, it keeps improving its value function without any help, by selecting the next actions to be performed from the complete action space. Our experimental work shows clearly how this approach can dramatically speed up the learning process.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132715791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}