{"title":"Group testing with DNA chips: generating designs and decoding experiments","authors":"Alexander Schliep, D. Torney, S. Rahmann","doi":"10.1109/CSB.2003.1227307","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227307","url":null,"abstract":"DNA microarrays are a valuable tool for massively parallel DNA-DNA hybridization experiments. Currently, most applications rely on the existence of sequence-specific oligonucleotide probes. In large families of closely related target sequences, such as different virus subtypes, the high degree of similarity often makes it impossible to find a unique probe for every target. Fortunately, this is unnecessary. We propose a microarray design methodology based on a group testing approach. While probes might bind to multiple targets simultaneously, a properly chosen probe set can still unambiguously distinguish the presence of one target set from the presence of a different target set. Our method is the first one that explicitly takes cross-hybridization and experimental errors into account while accommodating several targets. The approach consists of three steps: (1) Pre-selection of probe candidates, (2) Generation of a suitable group testing design, and (3) Decoding of hybridization results to infer presence or absence of individual targets. Our results show that this approach is very promising, even for challenging data sets and experimental error rates of up to 5%. On a data set of 28S rDNA sequences we were able to identify 660 sequences, a substantial improvement over a prior approach using unique probes which only identified 408 sequences.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128328474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development and assessment of bioinformatics tools for species conservation and habitat management","authors":"M. Sutton, Lori Deneke, J. Eme, W. Bennett, F. Wray","doi":"10.1109/CSB.2003.1227435","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227435","url":null,"abstract":"This project represents an interdisciplinary approach to integrating computational methods into the knowledge-discovery process associated with understanding biological systems impacted by the loss or destruction of sensitive habitats. We specifically developed bioinformatics tools for the study of (1) beach mouse communities and (2) marginal fish habitats. Data mining was used in these projects to intelligently query databases and to elucidate broad patterns that facilitate overall data interpretation. Visualization techniques that were developed present mined data in ways where context, perceptual cues, and spatial reasoning skills can be applied to uncover significant trends in behavioral patterns, habitat use, species diversity, and community composition.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"162 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132952050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The SCP and compressed domain analysis of biological sequences","authors":"D. Adjeroh, Jianan Feng","doi":"10.1109/CSB.2003.1227416","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227416","url":null,"abstract":"We introduce the SCP - the sorted common prefix, and study some of its properties. Based on the internal representations used by a class of new compression schemes, we show how the SCP table can be constructed using an O(u + |Σ|K_max) number of comparisons on average, and O(u|Σ|) worst case, where u is the size of the sequence, |Σ| is the number of symbols, and K_max is the maximum SCP value. We describe one application of the SCP to the problem of anchor points in multiple sequence alignment.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"210 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121211191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementing parallel hmm-pfam on the EARTH multithreaded architecture","authors":"Weirong Zhu, Yanwei Niu, Jizhu Lu, G. Gao","doi":"10.1109/CSB.2003.1227404","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227404","url":null,"abstract":"Hmmpfam is a widely used computation-intensive bioinformatics software for sequence classification. This poster describes a new parallel implementation of hmmpfam on EARTH, which is an event-driven fine-grain multithreaded programming execution model. The comparison results of the original PVM implementation and our implementation shows notable improvements on absolute speedup and scalability. On a cluster of 128 dual-CPU nodes, the execution time of a representative testbench is reduced from 15.9 hours to 4.3 minutes.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129054904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating the drug design process through parallel inductive logic programming data mining","authors":"J. Graham, David Page, A. H. Kamal","doi":"10.1109/CSB.2003.1227345","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227345","url":null,"abstract":"This paper presents a new system for parallel inductive logic search for pharmacophores which can potentially accelerate the chemical evaluation phase of the drug design process. This system has been tested on a Beowulf cluster and an IBM SP2 supercomputer with promising results.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125675726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of singular value decomposition and functional clustering to analyzing gene expression profiles of renal cell carcinoma","authors":"Z. Duan, L. Liou, T. Shi, J. DiDonato","doi":"10.1109/CSB.2003.1227341","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227341","url":null,"abstract":"Microarray gene expression profiles of both renal cell carcinoma (RCC) tissues and a RCC cell line were analyzed using singular value decomposition (SVD) and functional clustering. The SVD projections of the expression profiles revealed significant differences between the profiles of RCC tissues and a RCC cell line. Based on the biological processes, selected genes were annotated and clustered into functional groups. The analysis of each functional group revealed remarkable gene expression alterations in the biological pathways in RCC and provided insights into understanding the molecular mechanism of renal cell carcinogenesis.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127493243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What makes IgG binding domain of protein L fold up to native state: a simulation study with physical oriented energy functions coupled to topology induced terms","authors":"Seung Yup Lee, Y. Fujitsuka, S. Takada, Do Hyun Kim","doi":"10.1109/CSB.2003.1227370","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227370","url":null,"abstract":"The folding pathways and mechanisms of IgG binding domain of protein L composed of 62 residues are simulated by an over-damped Langevin dynamics with a coarse-grained chain representation. Physical oriented effective energy functions (EEFs) are employed for sequence-specific interactions as well as topology induced energies to bias overall energies to native basin. We observed the preferential formation of N terminal hairpin and the break of structural symmetry during folding. In the free energy profile calculated from equilibrium sampling and histogram method, it clearly shows two state folding scenario with transition state (TS). In the TS regime, N terminal hairpin already forms whereas C terminal hairpin and alpha helix are not structured yet. The predicted results are fully consistent with experimental data. Moreover, we found that hydrophobicity and secondary local propensity among many physical interactions determine the overall folding routes significantly by reduced model studies.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124484057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying regulatory signals in DNA-sequences with a nonstatistical approximation approach","authors":"Cun-Quan Zhang, Yunkai Liu, Elaine M. Eschen, Keqiang Wu","doi":"10.1109/CSB.2003.1227417","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227417","url":null,"abstract":"The identification of regulatory signals is one of the most challenging tasks in bioinformatics. The development of gene-profiling technologies now makes it possible to obtain vast data on gene expression in a particular organism under various conditions. This has created the opportunity to identify and analyze the parts of the genome believed to be responsible for transcription control-the transcription factor DNA-binding motifs (TFBMs). Developing a practical and efficient computational tool to identify TFBMs will enable us to better understand the interplay among thousands of genes in a complex eukaryotic organism. This problem, which is mathematically formulated as the motif finding problem in computer science, has been studied extensively in recent years. We develop a new mathematical model and approximation technique for motif searching. Based on the graph theoretic and geometric properties of this approach, we propose a nonstatistical approximation algorithm to find motifs in a set of genome sequences.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123815882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prokaryote phylogeny without sequence alignment: from avoidance signature to composition distance","authors":"Bailin Hao, J. Qi","doi":"10.1109/CSB.2003.1227338","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227338","url":null,"abstract":"A new and essentially simple method to reconstruct prokaryotic phylogenetic trees from their complete genome data without using sequence alignment is proposed. It is based on the appearance frequency of oligopeptides of a fixed length (up to K=6) in their proteomes. This is a method without fine adjustment and choice of genes. It can incorporate the effect of lateral gene transfer to some extent and leads to results comparable with the bacteriologists' systematics as reflected in the latest 2001 edition of Bergey's Manual of Systematic Bacteriology. A key point in our approach is subtraction of a random background by using a Markovian model of order K-1 from the composition vectors to highlight the shaping role of natural selection.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131808814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Can we identify cellular pathways implicated in cancer using gene expression data?","authors":"N. Shah, J. Lepre, Y. Tu, G. Stolovitzky","doi":"10.1109/CSB.2003.1227308","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227308","url":null,"abstract":"The cancer state of a cell is characterized by alterations of important cellular processes such as cell proliferation, apoptosis, DNA-damage repair, etc. The expression of genes associated with cancer related pathways, therefore, may exhibit differences between the normal and the cancerous states. We explore various means to find these differences. We analyze 6 different pathways (p53, Ras, Brca, DNA damage repair, NF-κB and β-catenin) and 4 different types of cancer: colon, pancreas, prostate and kidney. Our results are found to be mostly consistent with existing knowledge of the involvement of these pathways in different cancers. Our analysis constitutes proof of principle that it may be possible to predict the involvement of a particular pathway in cancer or other diseases by using gene expression data. Such method would be particularly useful for the types of diseases where biology is poorly understood.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"252 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133949893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}