{"title":"New computational methods for electrostatics in macromolecular simulation","authors":"I. Tsukerman","doi":"10.1109/CSB.2003.1227371","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227371","url":null,"abstract":"Computer simulation of long-range electrostatic forces in macromolecular simulation is quite challenging due to a large number of charges involved, varying dielectric constants, and ionic interactions in the solvent. The paper introduces new difference schemes that can incorporate any desired analytical approximation of the electrostatic potential (e.g., its singular Coulombic or dipole terms) exactly, and with little computational overhead. Numerical experiments for explicit solvent models show 1-2 orders of magnitude higher accuracy in the computed energy and force, as compared to conventional Ewald summation methods with comparable parameters.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116482665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Riptide: fast protein identification from mass spectrometer data","authors":"R. Carter","doi":"10.1109/CSB.2003.1227344","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227344","url":null,"abstract":"The biotech firm Target Discovery Incorporated (TDI) has developed a relatively fast and inexpensive method for protein identification. The final step in their approach involves an algorithm to deduce the terminal amino acid sequence of an unknown intact protein from its fragmentation mass spectrum. TDI's web-published algorithm was taken as a starting point for further research. The algorithm Riptide was developed that matches the output of TDI's algorithm, but demonstrates a 193X speed improvement on a 6-deep sequencing.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121647706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple protein structure alignment by deterministic annealing","authors":"Luonan Chen","doi":"10.1109/CSB.2003.1227421","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227421","url":null,"abstract":"In this paper, we propose a novel method for solving the multiple structure alignment problem, based on a mean field annealing technique. We define the structure alignment as a mixed integer-programming (MIP) problem with the inter-atomic distances between two or more structures as an objective function[1]. The integer variables represent the matchings among structures whereas the continuous variables are translation vectors and rotation matrices with each protein structure as a rigid body. By exploiting the special structure of the continuous partial problem, we transform the MIP into a nonlinear optimization problem (NOP) with a nonlinear objective function and linear constraints, based on mean field equations. To optimize the NOP, a mean field annealing procedure is adopted with a modified Potts spin model[2]. Since all linear constraints are embedded in the mean field equations, we do not need to add any penalty terms of the constraints to the error function. In other words, there is no \"soft constraint\" in our mean field model and all constraints are automatically satisfied during the annealing process, thereby not only making the optimization more efficient but also eliminating unnecessary penalty parameters that usually require careful problem-dependent tuning.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"PP 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126356120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two operative concepts for the post-genomic era: the \"memoire vive\" of the cell and a molecular algebra","authors":"S. Bentolila","doi":"10.1109/CSB.2003.1227310","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227310","url":null,"abstract":"The first successes in cloning experiments and stem cell \"reprogramming\" have already demonstrated the primordial role of cellular working-space memory and regulatory mechanisms, which use the knowledge stored in the DNA database in read mode. We present an analogy between living systems and informatics systems by considering: 1) the cell cytoplasm as a memory device accessible as read/write; 2) the mechanisms of regulation as a programming language defined by a grammar, a molecular algebra; 3) biological processes as volatile programs which are executed without being written; 4) DNA as a database in read only mode. We also present applications to two biological algorithms: the immune response and glycogen metabolism.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126471039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On gene prediction by cross-species comparative sequence analysis","authors":"Rong Chen, H. Ali","doi":"10.1109/CSB.2003.1227366","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227366","url":null,"abstract":"Sequencing of large fragments of genomic DNA makes it possible to perform comparisons of genomic sequences for identification of protein-coding regions. We have conducted a comparative analysis of homologous genomic sequences of organisms with different evolutionary distances and determined the degree of conservation of the noncoding regions between closely related organisms. In contrast, more distantly related organisms show much less intron similarity but less conservation of the exon structures. Based on this finding and training of data sets, we proposed a model by which coding sequences could be identified by comparing sequences of multiple species, both close and approximately distant. The reliability of the proposed method is evaluated in terms of sensitivity and specificity, and results are compared to those obtained by other popular gene prediction programs. Provided sequences can be found from other species at appropriate evolutionary distances, this approach could be applied in newly sequenced organisms where no species-dependent statistical models are available.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128081465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GenericBioMatch: A novel generic pattern match algorithm for biological sequences","authors":"Youlian Pan, Fazel Famili","doi":"10.1109/CSB.2003.1227408","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227408","url":null,"abstract":"GenericBioMatch is a novel algorithm for exact matching in biological sequences. It allows the sequence motif pattern to contain one or more wild card letters (e.g., Y, R, W in DNA sequences) and one or more gaps of any number of bases. GenericBioMatch is a relatively fast algorithm as compared to probabilistic algorithms, and has very little computational overhead. It is able to perform exact matching of protein motifs as well as DNA motifs. This algorithm can serve as a quick validation tool for implementation of other algorithms, and can also serve as a supporting tool for probabilistic algorithms in order to reduce computational overhead. This algorithm has been implemented in the BioMiner software (http://iit-iti.nrc-cnrc.gc.ca/biomine e.trx), a suite of Java tools for integrated data mining in genomics. It has been tested successfully with DNA sequences from human, yeast, and Arabidopsis.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125679134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local minima-based exploration for off-lattice protein folding","authors":"E. Santos, K. Kim, Eunice E. Santos","doi":"10.1109/CSB.2003.1227424","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227424","url":null,"abstract":"We present a new and simple algorithmic approach to help predict protein structures from amino acid sequences based on energy minimization. In the search for the minimal energy conformation, we analyze and exploit the protein structures found at the various local minima to direct the search toward the global minimum. As such, we explore the energy landscape efficiently by considering only the space of local minima instead of the whole feasible space of conformations. Our specific algorithmic approach comprises two different elements: local minimization and operators from genetic algorithms. Unlike existing hybrid approaches where the local optimization is used to fine-tune the solutions, we focus primarily on the local optimization and employ stochastic sampling through genetic operators for diversification. Our empirical results indicate that each local minimum is representative of the substructures contained in the set of solutions surrounding the local minima. We applied our approach to determining the minimal energy conformation of proteins from the protein data bank (PDB) using the CHARMM and UNRES energy models. We compared against standard genetic algorithms and Monte Carlo approaches as well as the conformations found in the PDB as the baseline. In all cases, our new approach computed the lowest energy conformation.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131929741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying gene and protein names from biological texts","authors":"Weijian Xuan, S. Watson, H. Akil, F. Meng","doi":"10.1109/CSB.2003.1227431","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227431","url":null,"abstract":"Extracting and identifying gene and protein names from literature is a critical step for mining functional information of genes and proteins. While extensive efforts have been devoted to this important task, most of them have aimed at extracting gene/protein names per se without paying much attention to associating the extracted names with existing gene and protein database entries. We developed a simple and efficient method to identify gene and protein names in literature using a combination of heuristic and statistical strategies. Our approach maps the extracted names to individual LocusLink entries, thus enabling the seamless integration of literature information with existing gene/protein databases. Evaluation on a test corpus shows that our method can achieve both high recall and precision. Our method exhibits good performance and can be used as a building block for large biomedical literature mining systems.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132911222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SNP analysis system for detecting complex disease associated sites","authors":"Yoko Higashi, Hirotaka Higuchi, T. Kido, Hirohito Matsumine, Masanori Baba, Toshihiko Morimoto, Masaaki Muramatsu","doi":"10.1109/CSB.2003.1227368","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227368","url":null,"abstract":"We developed a system that supports disease association studies to detect genes that may cause complex diseases. The main function of the system is to examine the possibility of each polymorphism being associated with a disease. Another important function is to perform linkage disequilibrium analysis and combine SNPs (single nucleotide polymorphisms) together into LDblocks (linkage-disequilibrium-blocks) to improve statistical power for association study. Those analyses can be efficiently performed using an analysis pipeline of the new system with handy tools for eliminating the inadequate data and so on. Consequently, the number of SNPs the system can analyze is about 30 to 50 times higher than by the standard manual procedures per unit of time. The new system also has a sophisticated visualization tool. The main viewer displays the genomic structure and is linked to another main viewer showing the in-depth analysis result. These viewers let the user easily check and make an interpretation of the results. The new system should provide significant assistance for the genome research of complex diseases.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133806663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genomic sequence analysis using gap sequences and pattern filtering","authors":"Shih-Chieh Su, C. Yeh, C.-C. Jay Kuo","doi":"10.1109/CSB.2003.1227401","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227401","url":null,"abstract":"A new pattern filtering technique is developed to analyze the genomic sequence in this research based on gap sequences, in which the distance of the same symbol is recorded consecutively as a sequence of integers. Sequence alignment and similarity testing can be performed on a family of gap sequences over selected patterns. The gap sequence offers a new way for sequence structural analysis. The match between the gap sequences is considered as a frame match while a true match requires both frame and stuffing match. Simulation results show that the extension of gap match indicates the corresponding segment extension in the original genomic sequence. Thus, we are able to generalize the conventional alignment and scoring methods in a more adaptive way.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115548480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}