{"title":"Computing for the DOE genomes to life program","authors":"E. Uberbacher, David Thomassen, A. Patrinos, Gary Johnson, C. Oliver, M. Frazier","doi":"10.1109/CSB.2003.1227298","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227298","url":null,"abstract":"A key goal of the DOE Office of Science's systems biology program, Genomes to Life (GTL), is to achieve, over the next 10 to 20 years, a basic understanding of thousands of environmental microbes and microbial systems in their native environments. This goal demands that we develop new models for scientific discovery that integrate methods and systems which tightly couple advanced computing, mathematics, algorithms, and data-management technologies with large-scale experimental data generation.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127041310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D structural homology detection via unassigned residual dipolar couplings","authors":"C. Langmead, B. Donald","doi":"10.1109/CSB.2003.1227320","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227320","url":null,"abstract":"Recognition of a protein's fold provides valuable information about its function. While many sequence-based homology prediction methods exist, an important challenge remains: two highly dissimilar sequences can have similar folds - how can we detect this rapidly, in the context of structural genomics? High-throughput NMR experiments, coupled with novel algorithms for data analysis, can address this challenge. We report an automated procedure for detecting 3D structural homologies from sparse, unassigned protein NMR data. Our method identifies the 3D structural models in a protein structural database whose geometries best fit the unassigned experimental NMR data. It does not use sequence information and is thus not limited by sequence homology. The method can also be used to confirm or refute structural predictions made by other techniques such as protein threading or sequence homology. The algorithm runs in O(pnk³) time, where p is the number of proteins in the database, n is the number of residues in the target protein, and k is the resolution of a rotation search. The method requires only uniform ¹⁵N-labelling of the protein and processes unassigned ᴴN-¹⁵N residual dipolar couplings, which can be acquired in a couple of hours. Our experiments on NMR data from 5 different proteins demonstrate that the method identifies closely related protein folds, despite low sequence homology between the target protein and the computed model.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114171162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimum redundancy feature selection from microarray gene expression data","authors":"C. Ding, Hanchuan Peng","doi":"10.1109/CSB.2003.1227396","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227396","url":null,"abstract":"Selecting a small subset of genes out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their differential expression among phenotypes and pick the top-ranked genes. We observe that feature sets so obtained have a certain redundancy and study methods to minimize it. Feature sets obtained through the minimum redundancy - maximum relevance framework represent a broader spectrum of characteristics of phenotypes than those obtained through standard ranking methods; they are more robust, generalize well to unseen data, and lead to significantly improved classifications in extensive experiments on 5 gene expression data sets.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126940033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Substrate recognition by enzymes: a theoretical study","authors":"Kaori Ueno-Noto, K. Takano, M. Hara-Yokoyama","doi":"10.1109/CSB.2003.1227373","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227373","url":null,"abstract":"We previously reported that a series of gangliosides inhibited the activity of the enzyme NAD glycohydrolase (CD38), and that those with tandem sialic acid residues in the sugar chain had a strong inhibitory effect. We describe the results of computer simulations of the three-dimensional and electronic structures of gangliosides, carried out to clarify the factors behind the differences in inhibitory effect and the recognition mechanisms of the enzyme. Using conformational analyses and molecular orbital (MO) calculations, we found that dipole moments and the HOMO were correlated with the inhibitory effect. CD38 is likely to recognize the two carboxyl groups in tandem sialic acid residues of gangliosides, as well as the phosphate groups in NAD. A strong correlation was found between the HOMO orbital energies obtained from the MO calculations and the extent of the inhibitory effect. Solvation effects were also considered in interpreting the substrate recognition mechanisms in the biological system, and they supported the above results.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126762218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Preliminary wavelet analysis of genomic sequences","authors":"Jianchang Ning, Charles N. Moore, J. Nelson","doi":"10.1109/CSB.2003.1227391","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227391","url":null,"abstract":"Large genome-sequencing projects have made urgent the development of accurate methods for annotation of DNA sequences. Existing methods combine ab initio pattern searches with knowledge gathered from comparison with sequence databases or from training sets of known genes. However, the accuracy of these methods is still far from satisfactory. In the present study, wavelet algorithms in combination with an entropy method are being developed as an alternative way to determine gene locations in genomic DNA sequences. Wavelet methods seek periodicity present in sequences. A promising advantage of wavelets is their adaptivity to varying lengths of coding/noncoding regions. Moreover, wavelet methods integrated with an entropy method examine only the information content of the sequences and do not need to be trained. The preliminary results, based on a sample of genomic DNA sequences, show that the wavelet approach is feasible and may be better than some knowledge-dependent approaches.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124516394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A query language to support scientific discovery","authors":"B. Eckman, K. Deutsch, M. Janer, Z. Lacroix, L. Raschid","doi":"10.1109/CSB.2003.1227340","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227340","url":null,"abstract":"Traditional data management approaches need to be leveraged to support scientific discovery. A query language that supports biological science must be capable of expressing and implementing biological investigations. The biological query language (BQL) presented in this paper aims to enhance the scientist's querying ability by: (1) providing an intermediate query language between scientific workflows and traditional query languages such as SQL, (2) expressing operators such as ranking and validating, which traditional query languages do not make directly available and which are often difficult to express (requiring complex queries), and (3) constraining the evaluation of these operators by various semantics. This paper shows step by step how the overall problem of identifying genes related to a disease may be translated into a succession of BQL queries.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133668330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gene selection for cancer classification using bootstrapped genetic algorithms and support vector machines","authors":"Xue-wen Chen","doi":"10.1109/CSB.2003.1227389","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227389","url":null,"abstract":"Gene expression data obtained from microarrays have proven useful in cancer classification. DNA microarray data have extremely high dimensionality compared to the small number of available samples. In this paper, we propose a novel system for selecting a set of genes for cancer classification. This system is based on a linear support vector machine and a genetic algorithm. To overcome the problem of the small number of training samples, bootstrap methods are incorporated into the genetic search. Two databases are considered: the colon cancer database and the leukemia database. Our experimental results show that the proposed method is capable of finding genes that discriminate between normal cells and cancer cells and that it generalizes well.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134318847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical issues in the analysis of the array CGH data","authors":"J. Fridlyand, A. Snijders, D. Pinkel, D. Albertson, Ajay N. Jain","doi":"10.1109/CSB.2003.1227347","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227347","url":null,"abstract":"The development of solid tumors is associated with acquisition of complex genetic alterations, indicating that failures in the mechanisms that maintain the integrity of the genome contribute to tumor evolution. Thus, one expects that the particular types of genomic derangement seen in tumors reflect underlying failures in maintenance of genetic stability, as well as selection for changes that provide growth advantage. In order to investigate genomic alterations we are using microarray-based comparative genomic hybridization (array CGH). The computational task is to map and characterize the number and types of copy number alterations present in the tumors, and so define copy number phenotypes as well as to associate them with known biological markers. To utilize the spatial coherence between nearby clones, we use an unsupervised hidden Markov model approach. The clones are partitioned into states that represent the underlying copy number of the group of clones. The method is demonstrated on two cell line datasets, one of which has known copy number alterations. The biological conclusions drawn from the analyses are discussed.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131327918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification of nonrandom patterns in structural and mutational data: the case of prion protein","authors":"I. B. Kuznetsov, S. Rackovsky","doi":"10.1109/CSB.2003.1227420","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227420","url":null,"abstract":"Prion diseases (mad cow disease, CJD, etc.) are a group of fatal neurodegenerative disorders associated with structural conversion of a normal, mostly α-helical cellular prion protein (PrP) into a pathogenic β-sheet-rich conformation. Little is known about which parts of PrP undergo conformational transition and how disease-associated mutations facilitate this transition. In this work, we utilize a computational statistical approach to detect unusual patterns in prion protein. (i) We construct a novel entropic index which provides a quantitative measure of context-dependent conformational flexibility of a sequence fragment. This index is used to study conformational flexibility of PrP fragments. (ii) We identify PrP fragments that show unusual intrinsic structural propensities. (iii) We estimate the statistical significance of clusters of disease-associated PrP mutations using a stochastic model of mutational process with unequal substitution rates and context-dependent mutational hot spots.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133838240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CTSS: a robust and efficient method for protein structure alignment based on local geometrical and biological features","authors":"Tolga Can, Yuan-fang Wang","doi":"10.1109/CSB.2003.1227316","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227316","url":null,"abstract":"We present a new method for conducting protein structure similarity searches, which improves on the accuracy, robustness, and efficiency of some existing techniques. Our method is grounded in the theory of differential geometry on 3D space curve matching. We generate shape signatures for proteins that are invariant, localized, robust, compact, and biologically meaningful. To improve matching accuracy, we smooth the noisy raw atomic coordinate data with spline fitting. To improve matching efficiency, we adopt a hierarchical coarse-to-fine strategy. We use an efficient hashing-based technique to screen out unlikely candidates and perform detailed pairwise alignments only for the small number of candidates that survive the screening process. Unlike other hashing-based techniques, our technique employs domain-specific information (not just geometric information) in constructing the hash key, and hence is better tuned to the domain of biology. Furthermore, the invariance, localization, and compactness of the shape signatures allow us to utilize a well-known local sequence alignment algorithm for aligning two protein structures. One measure of the efficacy of the proposed technique is that we were able to discover new, meaningful motifs that were not reported by other structure alignment methods.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131242646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}