{"title":"Improving gene expression programming performance by using differential evolution","authors":"Qiongyun Zhang, Chi Zhou, Weimin Xiao, P. Nelson","doi":"10.1109/ICMLA.2007.62","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.62","url":null,"abstract":"Gene Expression Programming (GEP) is an evolutionary algorithm that incorporates both the idea of a simple, linear chromosome of fixed length used in Genetic Algorithms (GAs) and the tree structure of different sizes and shapes used in Genetic Programming (GP). As with other GP algorithms, GEP has difficulty finding appropriate numeric constants for terminal nodes in the expression trees. In this work, we describe a new approach of constant generation using Differential Evolution (DE), a real-valued GA robust and efficient at parameter optimization. Our experimental results on two symbolic regression problems show that the approach significantly improves the performance of the GEP algorithm. The proposed approach can be easily extended to other Genetic Programming variations.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126708583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing a Biomarkers List via Mathematical Programming: Application to Gene Signatures to Detect Time-Dependent Hypoxia in Cancer","authors":"Glenn Fung, R. Seigneuric, Sriram Krishnan, R. B. Rao, B. Wouters, P. Lambin","doi":"10.1109/ICMLA.2007.61","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.61","url":null,"abstract":"In biology and medical sciences, highly parallel biological assays spurred a revolution leading to the emergence of the '-omics' era. Dimensionality reduction techniques are necessary to be able to analyze, interpret, validate and take advantage of the tremendous wealth of high-dimensional data they provide. This paper is based on a DNA microarray study providing gene signatures for hypoxia. These gene signatures were tested on a large breast cancer data set for assessing their prognostic power by means of Kaplan-Meier survival, univariate, and multivariate analyses. We explore the use of several mathematical programming-based techniques that aim to reduce the gene signature sizes as much as possible while maintaining the key characteristics of the original signature, more precisely: the signature's prognostic and diagnostic significance. The proposed signature reduction techniques have very interesting potential uses. Indeed, by downsizing the relevant data to a manageable size, one can then patent the core set of biomarkers and also create a dedicated assay (e.g., on a customized array) for routine applications (e.g., in the clinical setup) leading to individualized medicine capabilities. Our experiments show that the reduced hypoxia signatures behaved qualitatively and quantitatively in a way similar to the original ones.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121073562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning in Biomedicine and Bioinformatics Using Affinity Propagation","authors":"B. Frey","doi":"10.1109/ICMLA.2007.127","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.127","url":null,"abstract":"Data sets arising in biomedicine and bioinformatics are often huge and consist of quite different types of data (e.g., sequence data and microarray measurements). Consequently, standard machine learning techniques usually cannot be directly applied. In this talk, I will describe an algorithm called affinity propagation and discuss why it offers flexibility in analyzing the kinds of data sets arising in bioinformatics and biomedicine. I'll describe applications in the areas of whole-genome transcript detection using microarrays, image segmentation, text analysis and motif discovery. Affinity propagation can be implemented in a couple dozen lines of MATLAB or C and is suitable for distributed computing environments, making it attractive for high-throughput computations. Research for new biomarkers usually begins with a literature review to identify the mechanisms of action and to define a set of biomarkers that can jointly be used as a panel to characterize the type and stage of a disease. However, the manual search for biomarkers is an increasingly difficult task, since the number of publications is steadily increasing in volume and broadening in terms of complexity and diversity. The PubMed database of publications in biomedical science lists more than 6 million articles from the last 10 years. Currently more than 600k publications are added to the knowledge base every year, making a manual search for information a time-consuming task. Even for a single disease, like lung cancer, several thousand related publications are published every year (i.e., in 2007, more than 300 per month on average for lung cancer). To address this challenging task, we have developed a system that can identify structural and longitudinal patterns in the biomedical literature data that support the understanding of trends and relationships between diseases and biomarkers over time. We believe that temporal information is important, since it helps in tracking (i) when a biomarker has been discovered and how important it has become for the understanding of the disease over time, (ii) whether a biomarker has been \"replaced\" or complemented by another, more informative biomarker, and (iii) at what time we can see an emerging biomarker that will become relevant for a disease on a broader basis.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122171263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2IBGSOM: interior and irregular boundaries growing self-organizing maps","authors":"T. Ayadi, T. M. Hamdani, A. Alimi, M. A. Khabou","doi":"10.1109/ICMLA.2007.89","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.89","url":null,"abstract":"In this paper, we introduce a new variant of growing self-organizing maps (GSOM) based on Alahakoon's algorithm for SOM training, called 2IBGSOM (interior and irregular boundaries growing self-organizing maps). It is a dynamically evolving structure for SOM that allocates map size and shape during the unsupervised training process. 2IBGSOM starts with a small number of initial nodes and generates new nodes from the boundary and the interior of the network. 2IBGSOM aims to represent the structure of the training data as accurately as possible. Our proposed method was tested on real-world databases and showed better performance than the classical SOM and the growing grid (GG) algorithms. Three criteria were used to compare the above algorithms with our proposed method and to assess the accuracy of the produced structure: the quantization error, the topological error, and the labeling error. Results show that 2IBGSOM estimates the training data very well on all three criteria.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127842748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning for Information Management: Some Promising Directions","authors":"William W. Cohen","doi":"10.1109/ICMLA.2007.123","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.123","url":null,"abstract":"Management of personal information such as email messages, calendar entries, to-do items, and workstation documents is one of the most highly visible current uses of computer technology. I will present experimental evidence that machine learning techniques can be effectively used to improve personal information management tools in two ways. First, machine learning can be used to improve performance on certain types of difficult searches, notably searches that require some awareness of context. Second, machine learning can be used to reduce the chance of certain high-cost errors. One type of high-cost error we consider is the “dropped ball”—i.e., losing track of a task that has been delegated, in part or whole, to others. The second type of high-cost error is an “email leak”—i.e., mistakenly sending a sensitive email message to the wrong recipient.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"401 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128519418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying Functional Binding Motifs of Tumor Protein p53 Using Support Vector Machines","authors":"Amit U. Sinha, Mukta Phatak, Raj Bhatnagar, Anil G. Jegga","doi":"10.1109/ICMLA.2007.46","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.46","url":null,"abstract":"Identification of transcription factor binding sites in DNA sequences is a frequently performed task in bioinformatics. However, current search methods produce a large number of false positives because these motifs are short and degenerate. We propose an implicit model of cooperative binding of transcription factors. We hypothesize that the flanking regions of binding sites have a different composition compared to regions that do not contain that binding site. Using statistically significant motifs in the flanking regions of true binding sites as features, we design an SVM classifier for discriminating true binding sites from false positives. We demonstrate the effectiveness of our method on a data set of experimentally verified p53 binding sites. We were able to obtain an overall accuracy of 80% and 76% on cross-validation and an independent test set, respectively. By analyzing the features, we identified known as well as potentially new binding partners of p53.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126609423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Manifold clustering via energy minimization","authors":"Qiyong Guo, Hongyu Li, Wenbin Chen, I-Fan Shen, Jussi Parkkinen","doi":"10.1109/ICMLA.2007.43","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.43","url":null,"abstract":"Manifold clustering aims to partition a set of input data into several clusters each of which contains data points from a separate, simple low-dimensional manifold. This paper presents a novel solution to this problem. The proposed algorithm begins by randomly selecting some neighboring orders of the input data and defining an energy function that is described by geometric features of underlying manifolds. By minimizing such energy using the tabu search method, an approximately optimal sequence could be found with ease, and further different manifolds are separated by detecting some crucial points, boundaries between manifolds, along the optimal sequence. We have applied the proposed method to both synthetic data and real image data and experimental results show that the method is feasible and promising in manifold clustering.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122356389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evolutionary Sound Matching: A Test Methodology and Comparative Study","authors":"Thomas J. Mitchell, David P. Creasey","doi":"10.1109/ICMLA.2007.34","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.34","url":null,"abstract":"With the ever-increasing complexity of sound synthesisers, there is a growing demand for automated parameter estimation and sound space navigation techniques. Recent research in this domain has focused on the application of general-purpose evolutionary algorithms to match specific types of target sounds. However, it is difficult to establish whether success or failure of a particular match is due to the inefficiency of the optimisation engine, or the limitations of the matching synthesiser. In this paper the distinction between optimiser inefficiency and synthesiser limitations is elucidated with a contrived target test methodology that enables the performance of different optimisation techniques to be measured and compared. The methodology is applied to a Frequency Modulation synthesiser, in order to compare the performance of different Evolution Strategy-based algorithms. The algorithm producing the best results with contrived targets is then used to match a non-contrived acoustic instrument tone.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127294643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning complex problem solving expertise from failures","authors":"Cristina Boicu, G. Tecuci, Mihai Boicu","doi":"10.1109/ICMLA.2007.42","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.42","url":null,"abstract":"Our research addresses the issue of developing knowledge-based agents that capture and use the problem solving knowledge of subject matter experts from diverse application domains. This paper emphasizes the use of negative examples in agent learning by presenting several strategies for capturing an expert's knowledge when the agent fails to correctly solve a problem. These strategies have been implemented in the Disciple learning agent shell and used in complex application domains such as intelligence analysis, center of gravity determination, and emergency response planning.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116731730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SEBINI-CABIN: An Analysis Pipeline for Biological Network Inference, with a Case Study in Protein-Protein Interaction Network Reconstruction","authors":"Ronald C. Taylor, Mudita Singhal, D. S. Daly, Kelly Domico, Amanda M. White, D. Auberry, K. Auberry, Brian Hooker, Gregory B. Hurst, Jason E. McDermott, W. H. McDonald, Dale A. Pelletier, Denise Schmoyer, William R. Cannon","doi":"10.1109/ICMLA.2007.63","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.63","url":null,"abstract":"The Software Environment for Biological Network Inference (SEBINI) has been created to provide an interactive environment for the deployment and testing of network inference algorithms that use high-throughput expression data. Networks inferred from the SEBINI software platform can be further analyzed using the Collective Analysis of Biological Interaction Networks (CABIN), software that allows integration and analysis of protein-protein interaction and gene-to-gene regulatory evidence obtained from multiple sources. In this paper, we present a case study on the SEBINI and CABIN tools for protein-protein interaction network reconstruction. Incorporating the Bayesian Estimator of Protein-Protein Association Probabilities (BEPro) algorithm into the SEBINI toolkit, we have created a pipeline for structural inference and supplemental analysis of protein-protein interaction networks from sets of mass spectrometry bait-prey experiment data.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122277233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}