{"title":"A Novel Bioinformatics Technique For Predicting Condition-Specific Transcription Factor Binding Sites","authors":"V. Desai, P. Khatri, A. Done, Aviva Friedman, M. Tainsky, S. Drăghici","doi":"10.1109/CIBCB.2005.1594918","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594918","url":null,"abstract":"The advent of high throughput sequencing and DNA microarray technologies, along with the advances in bioinformatics, has revolutionized biological research in recent years. However, the precise mechanisms that control gene expression are largely unknown despite the numerous efforts to understand them. We describe a bioinformatics technique that can potentially identify condition-specific transcription factor binding sites. We applied our technique to a cellular immortalization data set. Our analysis revealed similarities in upstream regions of the CXCL gene family that explain condition-specific differential expression of genes CXCL1 and CXCL2, versus gene CXCL3.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"169 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126746666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-Phase EA/k-NN for Feature Selection and Classification in Cancer Microarray Datasets","authors":"Thorhildur Juliusdottir, D. Corne, E. Keedwell, A. Narayanan","doi":"10.1109/CIBCB.2005.1594891","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594891","url":null,"abstract":"Efficient and reliable methods that can find a small subset of informative genes amongst thousands are of great importance. In this area, much research is investigating the combination of advanced search strategies (to find subsets of features) and classification methods. We investigate a simple evolutionary algorithm/classifier combination on two microarray cancer datasets, where this combination is applied twice – once for feature selection, and once for further selection and classification. Our contributions are: (further) demonstration that a simple EA/classifier combination is capable of good feature discovery and classification performance with no initial dimensionality reduction; demonstration that a simple repeated EA/k-NN approach is capable of competitive or better performance than methods using more sophisticated preprocessing and classifier methods; new and challenging results on two public datasets with clear explanation of experimental setup; review material on the EA/kNN area; and specific identification of genes that our work suggests are significant regarding colon cancer and prostate cancer.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115815733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
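The two-phase EA/k-NN idea in the record above can be sketched as a wrapper search: an evolutionary algorithm evolves feature-subset bitmasks, and each mask is scored by leave-one-out k-NN accuracy on the selected features. This is a minimal sketch under stated assumptions, not the authors' configuration: the mutation-only EA, binary tournaments, elitism, k=3, the tie-break toward smaller subsets, and the toy data are all illustrative choices. The second phase is modelled simply as re-running the same search on the columns kept by the first.

```python
import random
from collections import Counter

def knn_predict(train_x, train_y, query, mask, k=3):
    """Majority vote among the k nearest training points, using squared
    Euclidean distance over the features switched on in `mask`."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b, m in zip(x, query, mask) if m), y)
        for x, y in zip(train_x, train_y))
    return Counter(y for _, y in dists[:k]).most_common(1)[0][0]

def loo_accuracy(xs, ys, mask, k=3):
    """Leave-one-out k-NN accuracy of a feature mask; an empty mask scores 0."""
    if not any(mask):
        return 0.0
    hits = 0
    for i in range(len(xs)):
        rest_x, rest_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        hits += knn_predict(rest_x, rest_y, xs[i], mask, k) == ys[i]
    return hits / len(xs)

def evolve_mask(xs, ys, n_features, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Mutation-only EA with binary tournaments and elitism over feature bitmasks.
    Fitness: LOO accuracy, ties broken in favour of fewer selected features."""
    rng = random.Random(seed)

    def fitness(mask):
        return (loo_accuracy(xs, ys, mask), -sum(mask))

    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the current best over unchanged
        while len(children) < pop_size:
            parent = max(rng.sample(pop, 2), key=fitness)
            children.append([(not g) if rng.random() < p_mut else g for g in parent])
        pop = children
        best = max(pop, key=fitness)
    return best

# Toy data: feature 0 separates the classes; features 1-3 are small noise.
X = [[0, 2, 1, 0], [1, 0, 2, 1], [0, 1, 0, 2], [1, 2, 2, 0],
     [9, 0, 1, 2], [10, 1, 0, 0], [9, 2, 2, 1], [10, 0, 1, 1]]
Y = [0, 0, 0, 0, 1, 1, 1, 1]

phase1 = evolve_mask(X, Y, n_features=4)                    # phase 1: all features
keep = [j for j, on in enumerate(phase1) if on]
X2 = [[row[j] for j in keep] for row in X]
phase2 = evolve_mask(X2, Y, n_features=len(keep), seed=1)   # phase 2: reselect
```

Using a fixed seed keeps runs reproducible; on real microarray data the fitness evaluation dominates the cost, which is why a cheap classifier such as k-NN suits this wrapper design.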
{"title":"Feature Selection for Microarray Data Using Least Squares SVM and Particle Swarm Optimization","authors":"E. Tang, P. N. Suganthan, X. Yao","doi":"10.1109/CIBCB.2005.1594892","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594892","url":null,"abstract":"Feature selection is an important preprocessing technique for many pattern recognition problems. When the number of features is very large while the number of samples is relatively small as in the micro-array data analysis, feature selection is even more important. This paper proposes a novel feature selection method to perform gene selection from DNA microarray data. The method originates from the least squares support vector machine (LSSVM). The particle swarm optimization (PSO) algorithm is also employed to perform optimization. Experimental results clearly demonstrate good and stable performance of the proposed method.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127680157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
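The least squares SVM named in the record above trains by solving one linear system instead of a quadratic program. The sketch assumes the common function-estimation form of LS-SVM with ±1 targets and a linear kernel, and ranks genes by |w_j| with w = Σ α_i x_i; that ranking rule and gamma=10 are assumptions, not the paper's stated criterion, and the particle swarm optimization step is not shown. A stdlib Gaussian elimination stands in for a linear-algebra library.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def lssvm_train(xs, ys, gamma=10.0):
    """LS-SVM dual system with a linear kernel K[i][j] = x_i . x_j:
        [0  1^T        ] [b    ]   [0]
        [1  K + I/gamma] [alpha] = [y]"""
    n = len(xs)
    top = [0.0] + [1.0] * n
    rows = [[1.0] + [dot(xs[i], xs[j]) + (1.0 / gamma if i == j else 0.0)
                     for j in range(n)] for i in range(n)]
    sol = solve([top] + rows, [0.0] + [float(y) for y in ys])
    return sol[1:], sol[0]  # alpha, b

def lssvm_predict(xs, alpha, b, x):
    """Sign of sum_i alpha_i K(x_i, x) + b."""
    return 1 if sum(a * dot(xi, x) for a, xi in zip(alpha, xs)) + b >= 0 else -1

def rank_features(xs, alpha):
    """Linear kernel only: w = sum_i alpha_i x_i, features sorted by |w_j|."""
    w = [sum(a * xi[j] for a, xi in zip(alpha, xs)) for j in range(len(xs[0]))]
    return sorted(range(len(w)), key=lambda j: -abs(w[j]))

# Toy data: feature 0 carries the class, feature 1 is noise.
X = [[2.0, 0.1], [1.5, -0.2], [-2.0, 0.3], [-1.5, -0.1]]
Y = [1, 1, -1, -1]
alpha, b = lssvm_train(X, Y)
```

In a PSO-driven search, a ranking or validation score like this would typically serve as the fitness of each candidate gene subset.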
{"title":"Improving Protein Function Prediction using the Hierarchical Structure of the Gene Ontology","authors":"Roman Eisner, B. Poulin, D. Szafron, P. Lu, R. Greiner","doi":"10.1109/CIBCB.2005.1594940","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594940","url":null,"abstract":"High-performance and accurate protein function prediction is an important problem in molecular biology. Many contemporary ontologies, such as Gene Ontology (GO), have a hierarchical structure that can be exploited to improve the prediction accuracy, and lower the computational cost, of protein function prediction. We leverage the hierarchical structure of the ontology in two ways. First, we present a method of creating hierarchy-aware training sets for machine-learned classifiers and we show that, in the case of GO molecular function, it is the most accurate method compared to not considering the hierarchy during training. Second, we use the hierarchy to reduce the computational cost of classification. We also introduce a sound methodology for evaluating hierarchical classifiers using global cross-validation. Biologists often use sequence similarity (e.g. BLAST) to identify a \"nearest neighbor\" sequence and use the database annotations of this neighbor to predict protein function. In these cases, we use the hierarchy to improve accuracy by a small amount. When no similar sequences can be found (which is true for up to 40% of some common proteomes), our technique can improve accuracy by a more significant amount. Although this paper focuses on a specific important application (protein function prediction for the GO hierarchy), the techniques may be applied to any classification problem over a hierarchical ontology.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129812511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
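The abstract above does not spell out how the hierarchy-aware training sets are built. One plausible scheme, shown here purely as an assumption, propagates annotations up the hierarchy (the true-path rule) and then, for each term, takes as positives the proteins in that term's subtree and as negatives only the proteins under the term's parent that fall outside it. The sketch assumes a tree with one parent per term; GO itself is a DAG, so a real implementation would have to handle multiple parents. All term and protein names are hypothetical.

```python
def ancestors(term, parent):
    """The term itself plus every ancestor up to the root (tree assumption)."""
    out = set()
    while term is not None:
        out.add(term)
        term = parent.get(term)
    return out

def propagate(annotations, parent):
    """True-path rule: a protein annotated to a term is also annotated to its ancestors."""
    return {p: set().union(*(ancestors(t, parent) for t in terms))
            for p, terms in annotations.items()}

def training_set(term, annotations, parent):
    """Positives: proteins in `term`'s subtree. Negatives: proteins under the
    parent of `term` but outside its subtree (every other protein at the root)."""
    full = propagate(annotations, parent)
    pos = {p for p, ts in full.items() if term in ts}
    up = parent.get(term)
    neg = {p for p, ts in full.items() if term not in ts and (up is None or up in ts)}
    return pos, neg

# Hypothetical fragment of a molecular-function hierarchy (child -> parent).
PARENT = {"binding": "MF", "catalytic": "MF",
          "DNA binding": "binding", "protein binding": "binding"}
ANNOTATIONS = {"P1": {"DNA binding"}, "P2": {"protein binding"},
               "P3": {"catalytic"}, "P4": {"binding"}}
```

For "DNA binding", P3 never reaches "binding", so it is excluded from the negatives; restricting negatives this way keeps each classifier focused on distinguishing a term from its siblings.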
{"title":"Extracting Features from Protein Sequences Using Chinese Segmentation Techniques for Subcellular Localization","authors":"Yang Yang, Bao-Liang Lu","doi":"10.1109/CIBCB.2005.1594931","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594931","url":null,"abstract":"This paper proposes a new method for extracting features from protein sequences to deal with the problem of protein subcellular localization. The idea behind the method arises from Chinese segmentation techniques. We regard the amino acid sequences as text and segment them into words in a non-overlapping way. The words are predefined in a dictionary, which includes valuable words according to some criteria. Every word in the dictionary will be assigned a weight, and a matching strategy called maximum weight product is adopted for segmentation. By recording word frequencies, a given sequence can be converted into a feature vector. To evaluate the effectiveness of the proposed feature extraction method, two different kinds of classifiers are used to predict protein subcellular locations. The experimental results show that our method is superior to existing approaches in classification accuracy and reduces the number of dimensions of feature space at the same time.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114201678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
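The maximum-weight-product segmentation described above fits naturally into a dynamic program over sequence positions. In this sketch, under stated assumptions, the dictionary and its weights are hypothetical (the abstract does not give the selection criteria), weights must be positive, and the product is maximized as a sum of logarithms to avoid underflow. The feature vector counts each dictionary word in the chosen segmentation, in sorted vocabulary order.

```python
import math
from collections import Counter

def segment(seq, weights):
    """Split seq into non-overlapping dictionary words maximizing the product of
    their weights (computed as a sum of logs). Weights must be positive."""
    max_len = max(len(w) for w in weights)
    n = len(seq)
    best = [-math.inf] * (n + 1)  # best[i]: best log-score for seq[:i]
    back = [0] * (n + 1)          # back[i]: start of the last word in that split
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = seq[j:i]
            if word in weights and best[j] > -math.inf:
                score = best[j] + math.log(weights[word])
                if score > best[i]:
                    best[i], back[i] = score, j
    if best[n] == -math.inf:
        raise ValueError("sequence cannot be segmented with this dictionary")
    words, i = [], n
    while i > 0:
        words.append(seq[back[i]:i])
        i = back[i]
    return words[::-1]

def feature_vector(seq, weights):
    """Word-frequency vector over the dictionary, in sorted vocabulary order."""
    counts = Counter(segment(seq, weights))
    return [counts[w] for w in sorted(weights)]

# Hypothetical dictionary: single residues plus a few weighted multi-residue words.
WEIGHTS = {"A": 0.5, "K": 0.5, "L": 0.5, "AK": 0.9, "KL": 0.6, "AKL": 0.3}
```

For "AKL" the candidates score 0.125 (A|K|L), 0.45 (AK|L), 0.3 (A|KL) and 0.3 (AKL), so the program returns ["AK", "L"]. Including every single residue in the dictionary guarantees that any sequence over those residues can be segmented.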
{"title":"Neural Spike Communication under Noisy Environments","authors":"N. Homma, K. Fuchigami, M. Gupta","doi":"10.1109/CIBCB.2005.1594913","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594913","url":null,"abstract":"In this paper, we analyze neural spike dynamics of a double feedback neural unit (DFNU) and its networks. The DFNU emphasizes not only simple formulations that can provide quantitative analytic results, but also physiologically plausible dynamics comparable to those of the Hodgkin-Huxley model. The results suggest that proportional coding of neural information by firing frequency may not always be reasonable, owing to sensitivity to noisy inputs, especially for low-frequency firing. On the other hand, high-frequency firing is a relatively appropriate carrier of neural information, owing to its reliability and robustness to noisy inputs. Simulation studies also demonstrate that noisy inputs can enhance dynamic neural performance.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127218118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Algorithm for Finding Weak Motifs","authors":"X. Yang, Jagath Rajapakse","doi":"10.1109/CIBCB.2005.1594948","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594948","url":null,"abstract":"The Challenge Problem posed by Pevzner et al. showed that special algorithms are needed to detect weak motifs in bio-sequences, where the classical approaches, such as MEME and Gibbs Sampler, fail. Though several algorithms have since been developed to solve the weak motif recognition problem, their focus has been on exact datasets and their performance shows poor tolerance to noisy datasets, i.e., datasets containing sequences without any motif instances. We propose a novel approach to find weak motifs that is robust to noise in the datasets. The experiments with synthetic datasets show that our algorithm has shorter running time and higher accuracy in detecting weak motifs than the existing approaches and is more robust to the presence of noise. The application of the algorithm to some promoter datasets from yeast genomes found previously proven binding sites.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126704786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
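The record does not describe the new algorithm, but the weak-motif setting it addresses can be made concrete. In the (l, d) planted-motif formulation associated with the Pevzner et al. challenge, a motif instance is any length-l window within Hamming distance d of the motif. On noisy datasets some sequences contain no instance at all, so this sketch scores a candidate by the fraction of sequences it hits rather than requiring every sequence to match. The brute-force search is only a baseline for tiny l, not the authors' method, and the min_support threshold is an assumption.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def has_instance(motif, seq, d):
    """True if some length-l window of seq is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))

def support(motif, seqs, d):
    """Fraction of sequences containing at least one (l, d)-instance of motif."""
    return sum(has_instance(motif, s, d) for s in seqs) / len(seqs)

def exhaustive_search(seqs, l, d, min_support):
    """Score all 4**l DNA words and keep those hitting at least min_support of
    the sequences, best first. Exponential in l, so only usable for tiny l."""
    hits = []
    for letters in product("ACGT", repeat=l):
        motif = "".join(letters)
        s = support(motif, seqs, d)
        if s >= min_support:
            hits.append((s, motif))
    return sorted(hits, key=lambda t: (-t[0], t[1]))

# Toy data: the third sequence plays the role of noise (no instance of ACGT).
SEQS = ["TTACGTTT", "GGACCTGG", "TTTTTTTT"]
```

With short motifs and a generous d, many random words also pass the threshold, which is exactly why weak motifs such as (15, 4) are hard to separate from background.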
{"title":"A Novel Variation Operator for More Rapid Evolution of DNA Error Correcting Codes","authors":"D. Ashlock, S. Houghten","doi":"10.1109/CIBCB.2005.1594898","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594898","url":null,"abstract":"Error correcting codes over the edit metric have been used as embedded DNA markers in at least one sequencing project. The algorithm used to construct those codes was an evolutionary algorithm with a fitness function with exponential time complexity. Presented here is a substantially faster evolutionary algorithm for locating error correcting codes over the edit metric that exhibits either equivalent or only slightly inferior performance on test cases. The new algorithm can produce codes for parameters where the run-time of the earlier algorithm was prohibitive. The new algorithm is a novel type of evolutionary algorithm using a greedy algorithm to implement a variation operator. This variation operator is the sole variation operator used and has unary, binary, and k-ary forms. The unary and binary forms are compared, with the binary form being found superior. Population size and the rate of introduction of random material by the variation operator are also studied. A high rate of introduction of random material and a small population size are found to be the best.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126898925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
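A code over the edit metric requires every pair of codewords to be at least d apart in Levenshtein distance. A greedy pass can turn any ordering of words into such a code by accepting each word that is far enough from all words already accepted. The binary variation operator below is a hypothetical reading of the abstract, not the authors' operator: the name greedy_variation, the pooling of both parents, and the p_random replacement of words with random material are all assumptions made for illustration. Code size would serve as fitness.

```python
import random
from itertools import product

def edit_distance(a, b):
    """Levenshtein distance (unit-cost insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def greedy_closure(words, d):
    """Scan words in order, keeping each one at edit distance >= d from all kept words."""
    code = []
    for w in words:
        if w not in code and all(edit_distance(w, c) >= d for c in code):
            code.append(w)
    return code

def greedy_variation(parent_a, parent_b, d, p_random, rng, alphabet="ACGT"):
    """Hypothetical binary operator: shuffle the parents' pooled words, swap a
    fraction p_random of them for random words, then rebuild greedily."""
    length = len(parent_a[0])
    pool = list(parent_a) + list(parent_b)
    rng.shuffle(pool)
    pool = ["".join(rng.choice(alphabet) for _ in range(length))
            if rng.random() < p_random else w
            for w in pool]
    return greedy_closure(pool, d)

ALL4 = ["".join(t) for t in product("ACGT", repeat=4)]
```

Because every child passes through greedy_closure, each offspring is a valid code by construction; the search only has to improve its size.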
{"title":"An Efficient Sequential RBF Network for Gene Expression-Based Multi-category classification","authors":"Runxuan Zhang, N. Sundararajan, G. Huang, P. Saratchandran","doi":"10.1109/CIBCB.2005.1594925","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594925","url":null,"abstract":"This paper presents a fast and efficient sequential learning method for RBF networks that can perform classification directly for multi-category cancer diagnosis problems based on microarray data. The recently developed algorithm, referred to as Fast Growing And Pruning-RBF (FGAP-RBF), can learn incrementally from new data directly, without retraining on all previously seen data. This characteristic reduces the learning complexity, improves the learning efficiency, and is highly desirable in a real implementation of a gene expression-based cancer diagnosis system. We have evaluated the FGAP-RBF algorithm on a benchmark multi-category cancer diagnosis problem based on microarray data, namely the GCM dataset. The results indicate that, compared with the results available in the literature, the FGAP-RBF algorithm produces a higher classification accuracy with reduced training time and implementation complexity.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114478365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Segment and Combine Approach for Biological Sequence Classification","authors":"P. Geurts, Antia Blanco Cuesta, L. Wehenkel","doi":"10.1109/CIBCB.2005.1594917","DOIUrl":"https://doi.org/10.1109/CIBCB.2005.1594917","url":null,"abstract":"This paper presents a new algorithm, based on the segment and combine paradigm, for automatic classification of biological sequences. It classifies sequences by aggregating the information about their subsequences predicted by a classifier derived by machine learning from a random sample of training subsequences. This generic approach is combined with decision tree based ensemble methods, scalable both with respect to sample size and vocabulary size. The method is applied to three families of problems: DNA sequence recognition, splice junction detection, and gene regulon prediction. With respect to standard approaches based on n-grams, it appears competitive in terms of accuracy, flexibility, and scalability. The paper also highlights the possibility of exploiting the resulting models to identify interpretable patterns specific to a given class of biological sequences.","PeriodicalId":330810,"journal":{"name":"2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129791394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
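A minimal sketch of the segment-and-combine scheme, under stated assumptions: fixed-length training subsequences are drawn at random and inherit their parent sequence's label, a classifier labels random segments of a new sequence, and the segment predictions are combined by majority vote. The paper uses decision-tree ensembles; here a 1-nearest-neighbour rule on Hamming distance is swapped in to keep the example dependency-free, and the segment length, segment counts, and toy sequences are illustrative.

```python
import random
from collections import Counter

def random_segments(seq, length, count, rng):
    """Draw `count` random windows of a fixed length from seq (len(seq) >= length)."""
    return [seq[s:s + length]
            for s in (rng.randrange(len(seq) - length + 1) for _ in range(count))]

def build_training_segments(seqs, labels, length, per_seq, rng):
    """Each sampled segment inherits the label of the sequence it came from."""
    data = []
    for seq, y in zip(seqs, labels):
        data.extend((seg, y) for seg in random_segments(seq, length, per_seq, rng))
    return data

def segment_classifier(train, segment):
    """Stand-in for the paper's tree ensembles: 1-NN under Hamming distance."""
    nearest = min(train, key=lambda t: sum(a != b for a, b in zip(t[0], segment)))
    return nearest[1]

def classify(seq, train, length, n_segments, rng):
    """Combine step: majority vote over predictions for random segments of seq."""
    votes = Counter(segment_classifier(train, seg)
                    for seg in random_segments(seq, length, n_segments, rng))
    return votes.most_common(1)[0][0]

rng = random.Random(0)
TRAIN = build_training_segments(
    ["AAAAAAAAAA", "AAAACAAAAA", "TTTTTTTTTT", "TTTGTTTTTT"], [0, 0, 1, 1],
    length=4, per_seq=5, rng=rng)
```

Averaging class-probability estimates instead of hard votes is a natural variant when the segment classifier outputs probabilities, as tree ensembles do.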