Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003: Latest Articles

Fast and sensitive probe selection for DNA chips using jumps in matching statistics
S. Rahmann
{"title":"Fast and sensitive probe selection for DNA chips using jumps in matching statistics","authors":"S. Rahmann","doi":"10.1109/CSB.2003.1227304","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227304","url":null,"abstract":"The design of large scale DNA microarrays is a challenging problem. So far, probe selection algorithms must trade the ability to cope with large scale problems for a loss of accuracy in the estimation of probe quality. We present an approach based on jumps in matching statistics that combines the best of both worlds. This article consists of two parts. The first part is theoretical. We introduce the notion of jumps in matching statistics between two strings and derive their properties. We estimate the frequency of jumps for random strings in a nonuniform Bernoulli model and present a new heuristic argument to find the center of the length distribution of the longest substring that two random strings have in common. The results are generalized to near-perfect matches with a small number of mismatches. In the second part, we use the concept of jumps to improve the accuracy of the longest common factor approach for probe selection by moving from a string-based to an energy-based specificity measure, while only slightly more than doubling the selection time.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115664916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 31
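The abstract above builds on matching statistics and on the longest common factor of two strings. As a rough illustration only, the sketch below computes both with a naive quadratic scan. It is not the paper's algorithm and does not implement its notion of jumps or its energy-based specificity measure. The function names are hypothetical, and the definition used (for each position i of s, the length of the longest substring of s starting at i that also occurs in t) is the standard textbook one, assumed here rather than taken from the paper.

```python
def matching_statistics(s: str, t: str) -> list[int]:
    """Return ms, where ms[i] is the length of the longest substring of s
    starting at position i that also occurs somewhere in t.

    Naive illustration; practical tools use suffix trees or suffix arrays.
    """
    ms = []
    for i in range(len(s)):
        length = 0
        # Extend the match one character at a time while it still occurs in t.
        while i + length < len(s) and s[i:i + length + 1] in t:
            length += 1
        ms.append(length)
    return ms


def longest_common_factor_length(s: str, t: str) -> int:
    """Length of the longest substring shared by s and t (0 if none)."""
    return max(matching_statistics(s, t), default=0)


if __name__ == "__main__":
    probe, background = "ACGTAC", "GTACG"
    print(matching_statistics(probe, background))
    print(longest_common_factor_length(probe, background))
```

In a string-based specificity measure of the kind the abstract mentions, a long common factor between a candidate probe and a non-target sequence would suggest a risk of cross-hybridization.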
Using natural language processing and the gene ontology to populate a structured pathway database
David Dehoney, R. Harte, Yan Lu, Daniel Chin
{"title":"Using natural language processing and the gene ontology to populate a structured pathway database","authors":"David Dehoney, R. Harte, Yan Lu, Daniel Chin","doi":"10.1109/CSB.2003.1227433","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227433","url":null,"abstract":"Reading literature is one of the most time consuming tasks a busy scientist has to contend with. As the volume of literature continues to grow there is a need to sort through this information in a more efficient manner. Mapping the pathways of genes and proteins of interest is one goal that requires frequent reference to the literature. Pathway databases can help here and scientists currently have a choice between buying access to externally curated pathway databases or building their own in house. However such databases are either expensive to license or slow to populate manually. Building upon easily available, open-source tools we have developed a pipeline to automate the collection, structuring and storage of gene and protein interaction data from the literature. As a team of both biologists and computer scientists we integrated our natural language processing (NLP) software with the gene ontology (GO) to collect and translate unstructured text data into structured interaction data. For NLP we used a machine learning approach with a rule induction program, RAPIER (http://www.cs.utexas.edu/users/ml/rapier.html). RAPIER was modified to learn rules from tagged documents, and then it was trained on a corpus tagged by expert curators. The resulting rules were used to extract information from a test corpus automatically. Extracted genes and proteins were mapped onto Locuslink, and extracted interactions were mapped onto GO. Once information was structured in this way it was stored in a pathway database and this formal structure allowed us to perform advanced data mining and visualization.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115065647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Efficient reconstruction of phylogenetic networks with constrained recombination
D. Gusfield, Satish Eddhu, C. Langley
{"title":"Efficient reconstruction of phylogenetic networks with constrained recombination","authors":"D. Gusfield, Satish Eddhu, C. Langley","doi":"10.1109/CSB.2003.1227337","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227337","url":null,"abstract":"A phylogenetic network is a generalization of a phylogenetic tree, allowing structural properties that are not treelike. With the growth of genomic data, much of which does not fit ideal tree models, there is greater need to understand the algorithmics and combinatorics of phylogenetic networks. We consider the problem of determining whether the sequences can be derived on a phylogenetic network where the recombination cycles are node disjoint. In this paper, we call such a phylogenetic network a \"galled-tree\". By more deeply analysing the combinatorial constraints on cycle-disjoint phylogenetic networks, we obtain an efficient algorithm that is guaranteed to be both a necessary and sufficient test for the existence of a galled-tree for the data. If there is a galled-tree, the algorithm constructs one and obtains an implicit representation of all the galled trees for the data, and can create these in linear time for each one. We also note two additional results related to galled trees: first, any set of sequences that can be derived on a galled tree can be derived on a true tree (without recombination cycles), where at most one back mutation is allowed per site; second, the site compatibility problem (which is NP-hard in general) can be solved in linear time for any set of sequences that can be derived on a galled tree. The combinatorial constraints we develop apply (for the most part) to node-disjoint cycles in any phylogenetic network (not just galled-trees), and can be used for example to prove that a given site cannot be on a node-disjoint cycle in any phylogenetic network. Perhaps more important than the specific results about galled-trees, we introduce an approach that can be used to study recombination in phylogenetic networks that go beyond galled-trees.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122175422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 93
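The abstract above refers to site compatibility. For binary sequences, the classical pairwise check is the four-gamete test: two sites are incompatible with any tree free of recurrent mutation exactly when all four combinations 00, 01, 10 and 11 occur at that pair of sites. The sketch below shows only that pairwise test. It is not the galled-tree algorithm of the paper, and the function names and input layout (equal-length strings over '0' and '1') are assumptions made for illustration.

```python
from itertools import combinations


def sites_compatible(seqs: list[str], i: int, j: int) -> bool:
    """Four-gamete test for binary sites i and j.

    Assumes every sequence is a string over {'0', '1'} of the same length.
    The pair is compatible unless all four gametes 00, 01, 10, 11 appear.
    """
    gametes = {(s[i], s[j]) for s in seqs}
    return len(gametes) < 4


def incompatible_pairs(seqs: list[str]) -> list[tuple[int, int]]:
    """List every pair of sites (i, j), i < j, that fails the four-gamete test."""
    if not seqs:
        return []
    n_sites = len(seqs[0])
    return [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if not sites_compatible(seqs, i, j)
    ]


if __name__ == "__main__":
    data = ["000", "010", "100", "110"]
    print(incompatible_pairs(data))
```

Pairs that fail this test are the ones that force either recombination or recurrent mutation, which is why such conflicts are a natural starting point when reasoning about recombination cycles.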
Bridging paradigm gaps between biology and engineering
Jehoshua Bruck
{"title":"Bridging paradigm gaps between biology and engineering","authors":"Jehoshua Bruck","doi":"10.1109/CSB.2003.1227290","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227290","url":null,"abstract":"Computing and communications are well understood topics in engineering. However, we are very much at the beginning of the road to understanding those mechanisms in biological systems. I'll argue that progress in biology will require better understanding of biologically inspired paradigms for computing and communications. In particular, I'll discuss some initial results related to asynchronous circuits with feedback and to delay insensitive communications.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124647563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Automated protein NMR resonance assignments
Xiang Wan, Dong Xu, C. Slupsky, Guohui Lin
{"title":"Automated protein NMR resonance assignments","authors":"Xiang Wan, Dong Xu, C. Slupsky, Guohui Lin","doi":"10.1109/CSB.2003.1227319","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227319","url":null,"abstract":"NMR resonance peak assignment is one of the key steps in solving an NMR protein structure. The assignment process links resonance peaks to individual residues of the target protein sequence, providing the prerequisite for establishing intra- and inter-residue spatial relationships between atoms. The assignment process is tedious and time-consuming, which could take many weeks. Though there exist a number of computer programs to assist the assignment process, many NMR labs are still doing the assignments manually to ensure quality. This paper presents (1) a new scoring system for mapping spin systems to residues, (2) an automated adjacency information extraction procedure from NMR spectra, and (3) a very fast assignment algorithm based on our previous proposed greedy filtering method and a maximum matching algorithm to automate the assignment process. The computational tests on 70 instances of (pseudo) experimental NMR data of 14 proteins demonstrate that the new score scheme has much better discerning power with the aid of adjacency information between spin systems simulated across various NMR spectra. Typically, with automated extraction of adjacency information, our method achieves nearly complete assignments for most of the proteins. The experiment shows very promising perspective that the fast automated assignment algorithm together with the new score scheme and automated adjacency extraction may be ready for practical use.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127229192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Estimating recombination rate distribution by optimal quantization
Mingzhou Song, S. Boissinot, R. Haralick, I. T. Phillips
{"title":"Estimating recombination rate distribution by optimal quantization","authors":"Mingzhou Song, S. Boissinot, R. Haralick, I. T. Phillips","doi":"10.1109/CSB.2003.1227346","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227346","url":null,"abstract":"We obtain recombination rate distribution functions for all human chromosomes using an optimal quantization method. This nonparametric method allows us to control over-/under-fitting. The piece-wise constant recombination rate distribution functions are convenient to store and retrieve. Our experimental results showed more abrupt distribution functions than two recently published results. In the previous results, the over-/under-fitting issues were not addressed explicitly. Our estimation had greater log likelihood over a previous result using Parzen window. It suggests that the optimal quantization technique might be of great advantage for estimation of other genomic feature distributions.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129083541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Analysis of phylogenetic profiles using Bayesian decomposition
Ghislain Bidaut, K. Suhre, J. Claverie, M. Ochs
{"title":"Analysis of phylogenetic profiles using Bayesian decomposition","authors":"Ghislain Bidaut, K. Suhre, J. Claverie, M. Ochs","doi":"10.1109/CSB.2003.1227380","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227380","url":null,"abstract":"Antibiotic resistance together with the side effects of broad spectrum antibacterials make development of targeted antibiotics of great interest. To meet the problem of identifying potential targets specific to some genuses, a dataset comprising a series of phylogenetic profiles was built for a series of pathogenic bacteria of interest. The profiles are the highest BLAST scores for genes compared to selected genes of E. coli and M. tuberculosis. The dataset reflects the past evolution of those genes due to adaptation to specific niches, marked by lateral gene transfer, duplication and mutation of existing genes, or merging of existing genes. Genes that function together will be constrained to evolve together, to maintain viability in the organism. However, a given gene may have a role in multiple functional groups through the evolutionary process. Analysis using Bayesian decomposition helps to retrieve those relationships by retrieving fundamental patterns related to the evolutionary retained functions.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125677548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A flexible pipeline for experimental design, processing, and analysis of microarray data
Stephen Osborn, S. Kennedy, Daniel Chin
{"title":"A flexible pipeline for experimental design, processing, and analysis of microarray data","authors":"Stephen Osborn, S. Kennedy, Daniel Chin","doi":"10.1109/CSB.2003.1227349","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227349","url":null,"abstract":"We created a web-based microarray data analysis pipeline for managing the volumes of data created by production microarray experiments. Experiments are formalized by grouping array data into hierarchies based on types such as 'dye swap' or 'replicate'. Grouping determines the analysis to be performed and enables the tool to automatically generate reports and charts appropriate to the experiment results. Subsets of data across arrays may also be hierarchically grouped into types such as 'gene' or 'list'. The group hierarchy is similar to a document object model (DOM), which enables queries to be posed in an XPath or XQuery language. Analyzer modules provide the complicated statistical processing and may be custom written or implemented as wrappers around existing tools. For speculative data analysis or publication, the results may be exported to a standard format.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132999346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
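The abstract above describes a group hierarchy similar to a document object model that can be queried with XPath. The sketch below illustrates the idea with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The XML layout, the element and attribute names (`experiment`, `group`, `array`, `type`, `name`, `id`) and the helper functions are all hypothetical; the abstract does not specify the tool's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical hierarchy: a dye-swap group containing two replicate groups.
EXPERIMENT_XML = """
<experiment>
  <group type="dye swap" name="ds1">
    <group type="replicate" name="r1">
      <array id="a1"/>
      <array id="a2"/>
    </group>
    <group type="replicate" name="r2">
      <array id="a3"/>
    </group>
  </group>
</experiment>
"""


def groups_of_type(root: ET.Element, group_type: str) -> list[str]:
    """Names of all groups, at any depth, whose type attribute matches."""
    return [g.get("name") for g in root.findall(f".//group[@type='{group_type}']")]


def arrays_in_group(root: ET.Element, group_name: str) -> list[str]:
    """IDs of the arrays that are direct children of the named group."""
    return [a.get("id") for a in root.findall(f".//group[@name='{group_name}']/array")]


if __name__ == "__main__":
    root = ET.fromstring(EXPERIMENT_XML)
    print(groups_of_type(root, "replicate"))
    print(arrays_in_group(root, "r1"))
```

Because the grouping is encoded in the tree itself, a report generator of the kind the abstract describes could pick the analysis to run from the `type` of each group it visits.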
CoMRI: a compressed multiresolution index structure for sequence similarity queries
Hong Sun, Ozgur Ozturk, H. Ferhatosmanoğlu
{"title":"CoMRI: a compressed multiresolution index structure for sequence similarity queries","authors":"Hong Sun, Ozgur Ozturk, H. Ferhatosmanoğlu","doi":"10.1109/CSB.2003.1227406","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227406","url":null,"abstract":"In this paper, we present CoMRI, compressed multiresolution index, our system for fast sequence similarity search in DNA sequence databases. We employ virtual bounding rectangle (VBR) concept to build a compressed, grid style index structure. An advantage of grid format over trees is subsequence location information is given by the order of corresponding VBR in the VBR list. Taking advantage of VBRs, our index structure fits into a reasonable size of memory easily. Together with a new optimized multiresolution search algorithm, the query speed is improved significantly. Extensive performance evaluations on human chromosome sequence data show that VBRs save 80%-93% index storage size compared to MBRs (minimum bounding rectangles) and new search algorithm prunes almost all unnecessary VBRs which guarantees efficient disk I/O and CPU cost. According to the results of our experiments, the performance of CoMRI is at least 100 times faster than MRS which is another grid index structure introduced very recently.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115814725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
A new approach for gene annotation using unambiguous sequence joining
A. Tchourbanov, Daniel J. Quest, H. Ali, M. Pauley, R. Norgren
{"title":"A new approach for gene annotation using unambiguous sequence joining","authors":"A. Tchourbanov, Daniel J. Quest, H. Ali, M. Pauley, R. Norgren","doi":"10.1109/CSB.2003.1227336","DOIUrl":"https://doi.org/10.1109/CSB.2003.1227336","url":null,"abstract":"The problem addressed by this paper is accurate and automatic gene annotation following precise identification/annotation of exon and intron boundaries of biologically verified nucleotide sequences using the alignment of human genomic DNA to curated mRNA transcripts. We provide a detailed description of a new cDNA/DNA homology gene annotation algorithm that combines the results of BLASTN searches and spliced alignments. Compared to other programs currently in use, annotation quality is significantly increased through the unambiguous junction of genomic DNA sequences. We also address gene annotation with both noncanonic splice sites and short exons. The approach has been tested on the genie learning subset as well as full-scale human RefSeq, and has demonstrated performance as high as 97%.","PeriodicalId":147883,"journal":{"name":"Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003","volume":"167 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121803049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6