{"title":"Improving operon prediction in E. coli","authors":"P. Dam, V. Olman, Ying Xu","doi":"10.1109/CSBW.2005.76","DOIUrl":"https://doi.org/10.1109/CSBW.2005.76","url":null,"abstract":"In bacteria, genes working in the same pathway or interacting with each other are often organized into operons. Currently, the prediction accuracy for operon/boundary gene pairs is fairly good in Escherichia coli; however, such a high level of success in recognizing a gene pair as a boundary or operon pair does not automatically translate into a high level of accuracy in predicting the boundary of operons. We found that for several operon prediction programs, prediction is often less accurate when the intergenic region of a gene pair is between 40 and 250 base pairs. In our approach, multiple features of the intergenic region, gene length and available microarray data in E. coli were used to improve the accuracy of the operon prediction programs in general and of gene pairs in the above intergenic region in particular. These features were scored according to a log likelihood formula, and the result suggests that we can gain up to an 8% increase in the accuracy level for gene pairs with an intergenic distance of 40-250 base pairs. For other regions, the newly added features also give a moderate improvement in prediction accuracy. Furthermore, the accuracy in predicting transcript boundary is also improved, compared to methods using the intergenic distance and functional annotation alone. We are currently fine-tuning our program to predict all operons in E. coli, and applying this method to predict operons in other organisms.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131243215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint genomic and metabolomic analysis of toxic dose-response experiments","authors":"G. Jahns, N. DelRaso, Mark P. Westrick, Victor Chan, N. Reo, T. Zacharewski","doi":"10.1109/CSBW.2005.81","DOIUrl":"https://doi.org/10.1109/CSBW.2005.81","url":null,"abstract":"A methodology has been implemented for analyzing microarray and NMR spectral data obtained from the same set of toxic-exposure dose-response experiments. The NMR spectra additionally track the time course of exposure. Analyses consist of screening the data to eliminate variates with insignificant signal, normalization appropriate to the experimental design, principal components analysis, and nonlinear classification using a support vector machine. It is found that exposure at subtoxic levels can be detected.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115944131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An approach to generating and verifying complex scripts and procedures","authors":"J. Rash, M. Hinchey, D. Gračanin","doi":"10.1109/CSBW.2005.21","DOIUrl":"https://doi.org/10.1109/CSBW.2005.21","url":null,"abstract":"Currently available tools and methods for system development that start with a formal model of a system and mechanically produce a provably equivalent implementation are valuable but not sufficient. The \"gap\" that such tools and methods leave unfilled is that the formal models cannot be proven to be equivalent to the system requirements as originated by the customer. For the classes of complex systems whose behavior can be described as a finite (but significant) set of scenarios, we offer a method for mechanically transforming requirements expressed in restricted natural language, or appropriate graphical notations, into a provably equivalent formal model that can be used as the basis for code generation and other transformations. The same approach may be applied to address computer science aspects of bioinformatics problems. Many software tools for bioinformatics have been developed using scripting languages such as Perl and Python. Scripts are developed based on a set of requirements that can be expressed using English-like statements. Using our approach, these may be used to automatically generate and validate scripts rather than write them from scratch.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115969963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying orthologs: cycle splitting on the breakpoint graph","authors":"K. M. Swenson, N. Pattengale, Bernard M. E. Moret","doi":"10.1109/CSBW.2005.73","DOIUrl":"https://doi.org/10.1109/CSBW.2005.73","url":null,"abstract":"Gene rearrangements have successfully been used in phylogenetic reconstruction and comparative genomics, but usually under the assumption that all genomes have the same gene content and that no gene is duplicated. While searching for orthologies is a common task in computational biology, it is usually done using sequence data. This paper approaches that problem using gene rearrangement data. Identifying a single gene within each family on the basis of a parsimonious criterion and discarding all others. Steps were taken to remedy this problem by providing an optimization framework derived from the breakpoint graph. The basic structure describing a pair of genomes with no duplicates and equal gene content is the breakpoint graph.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116333099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aligning peaks across multiple mass spectrometry data sets using a scale-space based approach","authors":"Weichuan Yu, Xiaoye Li, Hongyu Zhao","doi":"10.1109/CSBW.2005.19","DOIUrl":"https://doi.org/10.1109/CSBW.2005.19","url":null,"abstract":"We proposed a scale-space approach to automatically align multiple MS peak sets without manual parameter determination. It is more robust against noise than the hierarchical clustering method. In addition, it is possible to embed intensity information into the alignment framework, thus generalizing current approaches that use only the m/z information during the alignment of peaks. Our tests showed that this generalization brought some extra advantages for peak alignment, although we did not show concrete examples here due to the space limitation.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126408186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Protein secondary structure prediction using support vector machine with a PSSM profile and an advanced tertiary classifier","authors":"Hae-Jin Hu, P. Tai, R. Harrison, Jieyue He, Yi Pan","doi":"10.1109/CSBW.2005.114","DOIUrl":"https://doi.org/10.1109/CSBW.2005.114","url":null,"abstract":"In this study, the support vector machine (SVM) is applied as a learning machine for secondary structure prediction. As an encoding scheme for training the SVM, the position-specific scoring matrix (PSSM) is adopted. To improve the prediction accuracy, three optimization processes (encoding scheme, sliding window size and parameter optimization) are performed. For the multi-class classification, the results of three one-versus-one binary classifiers (H/E, E/C and C/H) are combined using our new tertiary classifier called SVM_Represent. By applying this new tertiary classifier, the Q3 prediction accuracy reaches 89.6% on the RS126 dataset and 90.1% on the CB513 dataset. Also the Segment Overlap Measure (SOV) is 85.0% on the RS126 dataset and 85.7% on the CB513 dataset. Compared with the existing best prediction methods, our new prediction algorithm improves the accuracy by about 13% in terms of Q3 and SOV, the two most commonly used accuracy measures.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128943951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A fast shotgun assembly heuristic","authors":"C. Wilks, S. Khuri","doi":"10.1109/CSBW.2005.7","DOIUrl":"https://doi.org/10.1109/CSBW.2005.7","url":null,"abstract":"Genome sequencing opened a new era in genetics, allowing the study of genomes at the nucleotide level. However, the chosen method of sequencing produced large numbers of nucleotide fragments which had to be re-assembled. The re-assembly of string fragments is known to be NP-hard. We report the results of our fast heuristic implementation for reassembling DNA fragments based on a unique approach to the problem called \"A Structured Pattern Matching Approach to Shotgun Sequence Assembly\" (AMASS), created by Sun Kim. The algorithm's main idea is taken from the biological concept of probe hybridization, where certain strands of nucleic acids are identified by short, unique sequences of bases that are contained within much longer DNA strands.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130453388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient sampling of protein folding pathways using HMMSTR and probabilistic roadmaps","authors":"Y. Girdhar, C. Bystroff, Srinivas Akella","doi":"10.1109/CSBW.2005.59","DOIUrl":"https://doi.org/10.1109/CSBW.2005.59","url":null,"abstract":"We present a method for constructing thousands of compact protein conformations from fragments and then connecting these structures to form a network of physically plausible folding pathways. This is the first attempt to merge the previous successes in fragment assembly methods with probabilistic roadmap (PRM) methods. Previous PRM methods have used the knowledge of the true structure to sample conformational space. Our method uses only the amino acid sequence to bias the conformational sampling. Conformational sampling is done using HMMSTR, a hidden Markov model for local sequence-structure correlations. We then build a PRM graph and find paths that have the lowest energy climb. We find that favored folding pathways exist, corresponding to deep valleys in the energy landscape. We describe the pathways for three small proteins with different secondary structure content in the context of a folding funnel model.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117045172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mapping SNP association results into type 2 DM pathways-metabolic syndrome as a robust system","authors":"T. Kido, Masanori Baba, Guijin Ji, Hidenori Satoh, Masaaki Muramatsu","doi":"10.1109/CSBW.2005.87","DOIUrl":"https://doi.org/10.1109/CSBW.2005.87","url":null,"abstract":"We have mapped the results of a large-scale search of SNPs for type 2 DM susceptibility genes in a Japanese population into a molecular interaction map for metabolic syndrome. This computer-readable mapping may yield important insights in the ongoing endeavor to understand the complex web of SNPs and gene interactions in metabolic syndrome.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131071768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inverse design of large molecules using linear diophantine equations","authors":"Shawn Martin, W. M. Brown, J. Faulon, Derick C. Weis, D. Visco","doi":"10.1109/CSBW.2005.79","DOIUrl":"https://doi.org/10.1109/CSBW.2005.79","url":null,"abstract":"We have previously developed a method for the inverse design of small ligands. This method can be used to design novel compounds with optimized properties (such as drugs) and has been applied successfully to the design of small peptide antagonists to leukocyte functional antigen-1 (LFA-1) and its intercellular adhesion molecule (ICAM-1). A key step in our method involves computing the Hilbert basis of a system of linear Diophantine equations. In our previous application, the ligands considered were small peptide rings, so that the resulting system of Diophantine equations was relatively small and easy to solve. When considering larger molecules, however, the Diophantine system is larger and more difficult to solve. In this work we present a method for reducing the system of Diophantine equations before they are solved, allowing the inverse design of larger compounds. We present this reduction on our original LFA-1/ICAM-1 dataset, where we were able to reduce a system with 24 equations and 49 variables to an equivalent system with 11 equations and 34 variables, giving a 10 times speedup in performance. We also present the results of our reduction on two new datasets, neither of which we could solve previously: a set of 27 conazole fungicides and a set of 61 γ-secretase inhibitors.","PeriodicalId":123531,"journal":{"name":"2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132506379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}