Proceedings. IEEE Computer Society Bioinformatics Conference: Latest Articles

Accelerating approximate subsequence search on large protein sequence databases
Proceedings. IEEE Computer Society Bioinformatics Conference Pub Date : 2002-08-14 DOI: 10.1109/CSB.2002.1039343
Jiong Yang, Wei Wang, Yi Xia, Philip S. Yu
{"title":"Accelerating approximate subsequence search on large protein sequence databases","authors":"Jiong Yang, Wei Wang, Yi Xia, Philip S. Yu","doi":"10.1109/CSB.2002.1039343","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039343","url":null,"abstract":"In this paper, we study the problem on how to build a persistent index structure for protein sequences to support approximate match. The suffix tree has been proposed as a solution to index sequence database and has been deployed on organizing DNA sequences (Hunt et al. (2001)). Unfortunately, it suffers from the problem of \"memory bottleneck\" that prevents it from being applied efficiently to a large database. The performance even degrades further for protein database due to a larger fanout at each node. Here, we employ an indexing structure, called BASS-tree, to support approximate match in sublinear time on a large protein database. We call this indexing method the sequence approximate match index method. The search of approximate matches can be properly directed to the portion in the database with a high potential of matching quickly. It is demonstrated in our experiments that the potential performance improvement is in an order of magnitude over alternative methods such as the BLAST algorithm and the suffix tree.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"207-216"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039343","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
HIV protease structural database
Proceedings. IEEE Computer Society Bioinformatics Conference Pub Date : 2002-08-14 DOI: 10.1109/CSB.2002.1039363
V. Ravichandran, J. Vondrášek, G. Gilliland, T. Bhat, A. Wlodawer
{"title":"HIV protease structural database","authors":"V. Ravichandran, J. Vondrášek, G. Gilliland, T. Bhat, A. Wlodawer","doi":"10.1109/CSB.2002.1039363","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039363","url":null,"abstract":"HIV Protease Database (HIVdb) is a repository for those structures of HIV protease that have never been released or deposited to the Protein Data Bank (PDB). Together with the official PDB data, HIVdb provided a unique source of information in a statistical sense. The database contains 207 structures; 148 taken from PDB, and 59 that are unique entries in HIVdb. Query tools in terms of the creation of ensembles for statistical analysis were designed. We present a new form, location, tools and data form of the HIV Protease Database. The new tools utilize a standard PDB user interface, but provide extra capabilities connected exclusively with this one protein and its ligands. We also present a design strategy for a specific subset or sub-database of the PDB with the aim of pointing out the statistical dimension of the problem related to a single protein structure. We are currently annotating the ligands in order to include their chemical properties. This approach emphasises large scale databases and scalability.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"340-"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039363","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Towards automatic clustering of protein sequences
Proceedings. IEEE Computer Society Bioinformatics Conference Pub Date : 2002-08-14 DOI: 10.1109/CSB.2002.1039340
Jiong Yang, Wei Wang
{"title":"Towards automatic clustering of protein sequences","authors":"Jiong Yang, Wei Wang","doi":"10.1109/CSB.2002.1039340","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039340","url":null,"abstract":"Analyzing protein sequence data becomes increasingly important recently. Most previous work on this area has mainly focused on building classification models. In this paper we investigate in the problem of automatic clustering of unlabeled protein sequences. As a widely recognized technique in statistics and computer science, clustering has been proven very useful in detecting unknown object categories and revealing hidden correlations among objects. One difficulty, that prevents clustering from being performed directly on protein sequence is the lack of an effective similarity measure that can be computed efficiently. Therefore, we propose a novel model for protein sequence cluster by exploring significant statistical properties possessed by the sequences. The concept of imprecise probabilities are introduced to the original probabilistic suffix tree to monitor the convergence of the empirical measurement and to guide the clustering process. It is demonstrated that the proposed method can successfully discover meaningful families without the necessity of learning models of different families from pre-labeled \"training data\".","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"175-186"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039340","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
Complexity and application of pedigree analysis programme GTree
Proceedings. IEEE Computer Society Bioinformatics Conference Pub Date : 2002-08-14 DOI: 10.1109/CSB.2002.1039358
D. Ogino, S. Mori, M. Nose, Hideki Sawada
{"title":"Complexity and application of pedigree analysis programme GTree","authors":"D. Ogino, S. Mori, M. Nose, Hideki Sawada","doi":"10.1109/CSB.2002.1039358","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039358","url":null,"abstract":"A novel recombinant congenic mouse strain, McRA1lpr/lpr, which was established by the intercrosses between MRL/Mp/-lpr/lpr and C3H/HeJ-lpr/lpr strains throughout more than F50 generations by means of selection based on swelling of ankle joints, manifested severe arthritis, followed by ankylosis, pathologically resembling rheumatoid arthritis in humans. To clarify the genetic mechanisms on the development of arthritis in this strain, we newly prepared \"GTree\" for analyzing the pedigree of pathological phenotypes of arthritis, splenomegaly and lymphadenopathy based on the collected data from over 700 McRA1-lpr/lpr mice collected and arranged by Shiro MORI. The data themselves are now dealt with by a PostgreSQL Linux server administered by Hideki SAWADA. We explain the algorithm of the program and its complexity, and show the pathological peculiarity of spleens and axillary lymph nodes which appear only in the group of RA mice.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"333-335"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039358","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
A reference database for Medicago truncatula genes, proteins, and metabolites
Proceedings. IEEE Computer Society Bioinformatics Conference Pub Date : 2002-08-14 DOI: 10.1109/CSB.2002.1039366
D. Guo, Xingjing Li, A. Kamal, O. Brazhnik, P. Mendes
{"title":"A reference database for Medicago truncatula genes, proteins, and metabolites","authors":"D. Guo, Xingjing Li, A. Kamal, O. Brazhnik, P. Mendes","doi":"10.1109/CSB.2002.1039366","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039366","url":null,"abstract":"Summary form only given. As a model plant for legumes as well as a rich source of natural products (such as flavonoids, isoflavonoids and triterpenes), Medicago truncatula (Mt) is one of the subjects of current major US genomics initiatives. Nevertheless, data sources of gene, protein, and metabolite in relation to Mt are very limited in publicly available biological databases. Information about genes, proteins, and metabolites is usually distributed among multiple databases. Retrieval and organization of this information can be a laborious task. We present a relational database, B-Net, that is intended to gather information from multiple sources representing genes, proteins, metabolites, and biochemical reactions of Mt. This database represents known facts about the biochemistry of Mt, classified according to the Gene Ontology. We anticipate this new resource to be particularly useful as a reference data set but also a qualitative proteome and metabolome database.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"102 1","pages":"343-"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039366","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DNA sequence compression using the Burrows-Wheeler Transform
Proceedings. IEEE Computer Society Bioinformatics Conference Pub Date : 2002-08-14 DOI: 10.1109/CSB.2002.1039352
D. Adjeroh, Yong Zhang, A. Mukherjee, M. Powell, T. Bell
{"title":"DNA sequence compression using the Burrows-Wheeler Transform","authors":"D. Adjeroh, Yong Zhang, A. Mukherjee, M. Powell, T. Bell","doi":"10.1109/CSB.2002.1039352","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039352","url":null,"abstract":"We investigate off-line dictionary oriented approaches to DNA sequence compression, based on the Burrows-Wheeler Transform (BWT). The preponderance of short repeating patterns is an important phenomenon in biological sequences. Here, we propose off-line methods to compress DNA sequences that exploit the different repetition structures inherent in such sequences. Repetition analysis is performed based on the relationship between the BWT and important pattern matching data structures, such as the suffix tree and suffix array. We discuss how the proposed approach can be incorporated in the BWT compression pipeline.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"303-313"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039352","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 59
Visualization techniques for genomic data
Proceedings. IEEE Computer Society Bioinformatics Conference Pub Date : 2002-08-14 DOI: 10.1109/CSB.2002.1039354
A. Loraine, G. Helt
{"title":"Visualization techniques for genomic data","authors":"A. Loraine, G. Helt","doi":"10.1109/CSB.2002.1039354","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039354","url":null,"abstract":"In order to take full advantage of the newly available public human genome sequence data and associated annotations, biologists require visualization tools that can accommodate the high frequency of alternative splicing in human genes and other complexities. We describe techniques for presenting human genomic sequence data and annotations in an interactive, graphical format, with the aim of providing developers with a guide to what features are most likely to meet biologists' needs. These techniques include: one-dimensional semantic zooming to show sequence data alongside gene structures; moveable, adjustable tiers; visual encoding of the translation frame to show how alternative transcript structure affects encoded proteins; and display of protein domains in the context of genomic sequence to show how alternative splicing impacts protein structure and function.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"321-326"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039354","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Rapid large-scale oligonucleotide selection for microarrays
Proceedings. IEEE Computer Society Bioinformatics Conference Pub Date : 2002-08-14 DOI: 10.1109/CSB.2002.1039329
S. Rahmann
{"title":"Rapid large-scale oligonucleotide selection for microarrays","authors":"S. Rahmann","doi":"10.1109/CSB.2002.1039329","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039329","url":null,"abstract":"We present the first algorithm that selects oligonucleotide probes (e.g. 25-mers) for microarray experiments on a large scale. For example, oligos for human genes can be found within 50 hours. This becomes possible by using the longest common substring as a specificity measure for candidate oligos. We present an algorithm based on a suffix array with additional information that is efficient both in terms of memory usage and running time to rank all candidate oligos according to their specificity. We also introduce the concept of master sequences to describe the sequences from which oligos are to be selected. Constraints such as oligo length, melting temperature, and self-complementarity are incorporated in the master sequence at a preprocessing stage and thus kept separate from the main selection problem. As a result, custom oligos can now be designed for any sequenced genome, just as the technology for on-site chip synthesis is becoming increasingly mature.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"54-63"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039329","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 57
An efficient branch-and-bound algorithm for the assignment of protein backbone NMR peaks
Proceedings. IEEE Computer Society Bioinformatics Conference Pub Date : 2002-08-14 DOI: 10.1109/CSB.2002.1039339
Guohui Lin, Dong Xu, Zhi-Zhong Chen, Tao Jiang, Jianjun Wen, Ying Xu
{"title":"An efficient branch-and-bound algorithm for the assignment of protein backbone NMR peaks","authors":"Guohui Lin, Dong Xu, Zhi-Zhong Chen, Tao Jiang, Jianjun Wen, Ying Xu","doi":"10.1109/CSB.2002.1039339","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039339","url":null,"abstract":"NMR resonance assignment is one of the key steps in solving an NMR protein structure. The assignment process links resonance peaks to individual residues of the target protein sequence, providing the prerequisite for establishing intra- and inter-residue spatial relationships between atoms. The assignment process is tedious and time-consuming, which could take many weeks. Though there exist a number of computer programs to assist the assignment process, many NMR labs are still doing the assignments manually to ensure quality. This paper presents a new computational method based on our recent work towards automating the assignment process, particularly the process of backbone resonance peak assignment. We formulate the assignment problem as a constrained weighted bipartite matching problem. While the problem, in the most general situation, is NP-hard, we present an efficient solution based on a branch-and-bound algorithm with effective bounding techniques and a greedy filtering algorithm for reducing the search space. Our experimental results on 70 instances of (pseudo) real NMR data derived from 14 proteins demonstrate that the new solution runs much faster than a recently introduced (exhaustive) two-layer algorithm and recovers more correct peak assignments than the two-layer algorithm.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"165-174"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039339","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
A bi-recursive neural network architecture for the prediction of protein coarse contact maps
Proceedings. IEEE Computer Society Bioinformatics Conference Pub Date : 2002-08-14 DOI: 10.1109/CSB.2002.1039341
A. Vullo, P. Frasconi
{"title":"A bi-recursive neural network architecture for the prediction of protein coarse contact maps","authors":"A. Vullo, P. Frasconi","doi":"10.1109/CSB.2002.1039341","DOIUrl":"https://doi.org/10.1109/CSB.2002.1039341","url":null,"abstract":"Prediction of contact maps may be seen as a strategic step towards the solution of fundamental open problems in structural genomics. In this paper we focus on coarse grained maps that describe the spatial neighborhood relation between secondary structure elements (helices, strands, and coils) of a protein. We introduce a new machine learning approach for scoring candidate contact maps. The method combines a specialized noncausal recursive connectionist architecture and a heuristic graph search algorithm. The network is trained using candidate graphs generated during search. We show how the process of selecting and generating training examples is important for tuning the precision of the predictor.","PeriodicalId":87204,"journal":{"name":"Proceedings. IEEE Computer Society Bioinformatics Conference","volume":"1 1","pages":"187-196"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CSB.2002.1039341","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62214307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8