{"title":"Designing artificial organisms for use in biological simulations","authors":"W. Ashlock, D. Ashlock","doi":"10.1109/CIBCB.2011.5948463","DOIUrl":"https://doi.org/10.1109/CIBCB.2011.5948463","url":null,"abstract":"In this paper we investigate two types of artificial organism which have the potential to be useful in biological simulations at the genomic level, such as simulations of speciation or gene interaction. Biological problems of this type are usually studied either with simulations using artificial genes that are merely evolving strings with no phenotype, ignoring the possibly crucial contribution of natural selection, or with real biological data involving so much complexity that it is difficult to sort out the important factors. This research provides a middle ground. The artificial organisms are: gridwalkers (GWs), a variation on the self-avoiding walk problem, and plus-one-recall-store (PORS), a simple genetic programming maximum problem implemented with a context free grammar. Both are known to have rugged multimodal fitness landscapes. We define a new variation operator, a kind of aligned crossover for variable length strings, which we call Smith-Waterman crossover. The problems, using Smith-Waterman crossover, size-neutral crossover (a kind of non-aligned crossover defined in [3]), mutation only, and horizontal gene transfer (such as occurs in biology with retroviruses) are explored. We define a measure called fitness preservation to quantify the differences in their fitness landscapes and to provide guidance to researchers in determining which problem/variation operator set is best for their simulation.","PeriodicalId":395505,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127300972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting coding region candidates in the DNA sequence based on visualization without training","authors":"Bo Chen, P. Ji","doi":"10.1109/CIBCB.2011.5948454","DOIUrl":"https://doi.org/10.1109/CIBCB.2011.5948454","url":null,"abstract":"Identifying the protein coding regions in the DNA sequence is an active issue in computational biology. Presently, there are many outstanding methods in predicting the coding regions with extreme high accuracy, after conducting preceding training process. However, the training dependence may reduce adaptability of the methods, particularly for new sequences from unknown organisms with no or small training sets. In this paper, we firstly present a Self Adaptive Spectral Rotation (SASR) approach, which was first introduced in a previous work published in Nucleic Acids Research. This approach is adopted to visualize the Triplet Periodicity (TP) property, which is a simple and universal coding related property. After that, we use a segmentation technique to computationally analyze the visualization and provide a numerical prediction of the coding region candidates in the DNA sequence. This approach does not require any training process, so it can work before any extra information is available, especially is helpful when dealing with new sequences from unknown organisms. Hence, it could be an efficient tool for coding region prediction in the early stage study.","PeriodicalId":395505,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124338439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regularized linear discriminant analysis and its recursive implementation for gene subset selection","authors":"K. Mao, Feng Yang, W. Tang","doi":"10.1109/CIBCB.2011.5948468","DOIUrl":"https://doi.org/10.1109/CIBCB.2011.5948468","url":null,"abstract":"Although mostly used for pattern classification, linear discriminant analysis (LDA) may also be used for feature selection. When employed to select genes for microarray data, which has high dimensionality and small sample size, LDA encounters three problems, including singularity of scatter matrix, overfitting and prohibitive computational complexity. In this study, we propose a new regularization technique to address the singularity and overfitting problem. In addition, we develop a recursive implementation for LDA to reduce computational overhead. Experimental studies on 5 gene microarray problems show that the regularized linear discriminant analysis (RLDA) and its recursive implementation produce gene subsets with excellent classification performance.","PeriodicalId":395505,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124788294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating kernel clustering for biomedical data analysis","authors":"A. Gisbrecht, B. Hammer, Frank-Michael Schleif, Xibin Zhu","doi":"10.1109/CIBCB.2011.5948460","DOIUrl":"https://doi.org/10.1109/CIBCB.2011.5948460","url":null,"abstract":"The increasing size and complexity of modern data sets turns modern data mining techniques to indispensable tools when inspecting biomedical data sets. Thereby, dedicated data formats and detailed information often cause the need for problem specific similarities or dissimilarities instead of the standard Euclidean norm. Therefore, a number of clustering techniques which rely on similarities or dissimilarities only have recently been proposed. In this contribution, we review some of the most popular dissimilarity based clustering techniques and we discuss possibilities how to get around the usually squared complexity of the models due to their dependency on the full dissimilarity matrix. We evaluate the techniques on two benchmarks from the biomedical domain.","PeriodicalId":395505,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115358679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Organizational texts classification using artificial immune recognition systems","authors":"N. Forouzideh, M. Mahmoudi, K. Badie","doi":"10.1109/CIBCB.2011.5948456","DOIUrl":"https://doi.org/10.1109/CIBCB.2011.5948456","url":null,"abstract":"This paper outlines the use of Artificial Immune Recognition System (AIRS) within the field of text/document classification. Various versions of AIRS including AIRS1, AIRS2, Parallel AIRS and Modified AIRS with Fuzzy KNN are applied to classify the mode of a text's content which is organized for helping users with their organizational tasks. In this regard, 7 major features as inputs with 3 nominal values of Low, Medium, and High are chosen to classify texts into 6 organizational functionality classes. Results of experimentation on a dataset including 540 data show the fact that different versions of AIRS, performs better compared to multi-layer perceptron and radial basis function as simple neural approaches. Due to the high performance of this approach, it is expected to be successfully applicable to a wide range of content mode classification issues in decision support environment.","PeriodicalId":395505,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125214886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An incremental method for mosaicing of optical microscope imagery","authors":"L. Carozza, A. Bevilacqua, F. Piccinini","doi":"10.1109/CIBCB.2011.5948458","DOIUrl":"https://doi.org/10.1109/CIBCB.2011.5948458","url":null,"abstract":"Digital imaging is nowadays widely employed in the field of optical microscopy. One of the most apparent benefits consists in the possibility for the researcher to see the whole biological sample in one image, achieved by collecting all the parts being inspected. Common approaches work in batch mode and rely on known motorized x–y stage offsets of the microscope holder. Or alternatively, the methods are conceived just to provide visually pleasant mosaics off-line, that are often built by altering the photometric values or the geometric properties of the original component images. This work presents an incremental mosaicing method for optical microscopy imagery, compliant with on-line requirements and suitable even for non-motorized microscopes. The resulting mosaics are very accurate and preserve the consistency of the original images so to be used for further global measurement steps. Nevertheless, the mosaics are visually pleasant so to be used for visual inspection as well. The experimental results obtained in different biological examinations confirm the efficacy of our approach.","PeriodicalId":395505,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126332340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A simulation of bacterial communities","authors":"D. Ashlock, Andrew McEachern","doi":"10.1109/CIBCB.2011.5948465","DOIUrl":"https://doi.org/10.1109/CIBCB.2011.5948465","url":null,"abstract":"This study constructs and tests an agent-based model of bacterial communities with the goal of modeling the observation that the majority of bacteria in nature cannot be cultured. The new field of metagenomics, the direct, mass sequencing of DNA recovered from the environment, is the source of this observation. The hypothesis tested is that bacteria form interdependent communities so that viable levels of energy production are rare in bacteria when they are grown in monoculture. A new game, the metabolism game is introduced. Agents produce energy by playing this game with one another. Studies are run with different number of bacterial species in the simulation. The energy level for viability is set by running simulations with a single bacterial species and then the hypothesis is tested in simulations with multiple bacterial species. Multiple bacterial species are evolved in a novel type of multi-population evolutionary algorithm called a multiple worlds algorithm. The fraction of culturable bacterial agents recovered from the simulation is larger than that found in nature but still quite low, supporting the hypothesis that bacteria may not be culturable because they require the presence of partner species.","PeriodicalId":395505,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126351326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combined covariance model for non-coding RNA gene finding","authors":"Wenbo Jiang, K. Wiese","doi":"10.1109/CIBCB.2011.5948474","DOIUrl":"https://doi.org/10.1109/CIBCB.2011.5948474","url":null,"abstract":"The use of covariance models in finding non-coding RNA gene members in genome sequence databases has been shown quite effective in many studies. However, it has a significant drawback, which is the very large computational burden. A combined covariance model is proposed to reduce the search complexity when a genome sequence is searched for more than one ncRNA gene family. The covariance models that are combined are selected using a hierarchical clustering algorithm. This study shows that when a small number of original covariance models are combined, the combined covariance model can find members from all original ncRNA families thus successfully reducing the search time.","PeriodicalId":395505,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133997135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative analysis of machine learning techniques for the prediction of logP","authors":"Edward W. Lowe, Mariusz Butkiewicz, Matthew Spellings, A. Omlor, J. Meiler","doi":"10.1109/CIBCB.2011.5948478","DOIUrl":"https://doi.org/10.1109/CIBCB.2011.5948478","url":null,"abstract":"Several machine learning techniques were evaluated for the prediction of logP. The algorithms used include artificial neural networks (ANN), support vector machines (SVM) with the extension for regression, and kappa nearest neighbor (k-NN). Molecules were described using optimized feature sets derived from a series of scalar, two- and three-dimensional descriptors including 2-D and 3-D autocorrelation, and radial distribution function. Feature optimization was performed as a sequential forward feature selection. The data set contained over 25,000 molecules with experimentally determined logP values collected from the Reaxys and MDDR databases, as well as data mining through SciFinder. LogP, the logarithm of the equilibrium octanol-water partition coefficient for a given substance is a metric of the hydrophobicity. This property is an important metric for drug absorption, distribution, metabolism, and excretion (ADME). In this work, models were built by systematically optimizing feature sets and algorithmic parameters that predict logP with a root mean square deviation (rmsd) of 0.86 for compounds in an independent test set. This result presents a substantial improvement over XlogP, an incremental system that achieves a rmsd of 1.41 over the same dataset. The final models were 5-fold cross-validated. These fully in silico models can be useful in guiding early stages of drug discovery, such as virtual library screening and analogue prioritization prior to synthesis and biological testing. These models are freely available for academic use.","PeriodicalId":395505,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133106427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Derivation of minimum best sample size from microarray data sets: A Monte Carlo approach","authors":"Chengpeng Bi, M. Becker, J. Leeder","doi":"10.1109/CIBCB.2011.5948461","DOIUrl":"https://doi.org/10.1109/CIBCB.2011.5948461","url":null,"abstract":"NCBI has been accumulating a large repository of microarray data sets, namely Gene Expression Omnibus (GEO). GEO is a great resource enabling one to pursue various biological and pathological questions. The question we ask here is: given a set of gene signatures and a classifier, what is the best minimum sample size in a clinical microarray research that can effectively distinguish different types of patient responses to a therapeutic drug. It is difficult to answer the question since the sample size for most microarray experiments stored in GEO is very limited. This paper presents a Monte Carlo approach to simulating the best minimum microarray sample size based on the available data sets. Support Vector Machine (SVM) is used as a classifier to compute prediction accuracy for different sample size. Then, a logistic function is applied to fit the relationship between sample size and accuracy whereby a theoretic minimum sample size can be derived.","PeriodicalId":395505,"journal":{"name":"2011 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132830978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}