A. T. Islam, S. Pramanik, Xinge Ji, J. Cole, Qiang Zhu
Title: Back translated peptide K-mer search and local alignment in large DNA sequence databases using BoND-SD-tree indexing
Venue: 2015 IEEE 15th International Conference on Bioinformatics and Bioengineering (BIBE)
Published: 2015-11-02
DOI: 10.1109/BIBE.2015.7367638 (https://doi.org/10.1109/BIBE.2015.7367638)
Citations: 6
Abstract
In the past, genome sequence databases used main-memory indexing, such as the suffix tree, for fast sequence searches. With next-generation sequencing technologies, the volume of sequence data being generated is so large that main-memory indexing is limited by the amount of memory available. K-mer based techniques are increasingly used in genome sequence database applications such as local alignment. K-mers can also provide an excellent basis for efficient disk-based indexing. In this paper, we propose a k-mer based database search and local alignment tool that uses box queries on BoND-SD-tree indexing. The BoND-tree is efficient for indexing and searching in a Non-ordered Discrete Data Space (NDDS). We have conducted experiments on searching DNA sequence databases with back-translated protein query sequences and have compared the results with existing methods. We have also implemented local alignment of back-translated protein query sequences against large DNA sequence databases using this index-based k-mer search. The performance of this local alignment approach has been compared with that of NCBI's tblastn. The results are promising and support the significance of the proposed approach.
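To make the idea of a back-translated box query concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard genetic code and represents a box query as one set of allowed nucleotides per DNA position. A linear scan stands in for the BoND-SD-tree index, whose structure the abstract does not describe. The function names (`back_translate_box`, `in_box`, `search`) are hypothetical.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def back_translate_box(peptide):
    """Return a box query: one set of allowed nucleotides per DNA position."""
    box = []
    for aa in peptide:
        codons = [codon for codon, residue in CODON_TABLE.items() if residue == aa]
        if not codons:
            raise ValueError(f"unknown amino acid: {aa!r}")
        for pos in range(3):
            box.append({codon[pos] for codon in codons})
    return box


def in_box(dna, box):
    """True if every nucleotide falls inside the allowed set for its position."""
    return len(dna) == len(box) and all(n in allowed for n, allowed in zip(dna, box))


def translate(dna):
    """Translate a DNA string whose length is a multiple of 3."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def search(database, peptide):
    """Return start offsets of DNA windows that encode the peptide k-mer.

    Assumption: a linear scan replaces the disk-based index here. The box
    test is a coarse filter; translating the window removes false positives.
    """
    box = back_translate_box(peptide)
    width = len(box)
    hits = []
    for start in range(len(database) - width + 1):
        window = database[start:start + width]
        if in_box(window, box) and translate(window) == peptide:
            hits.append(start)
    return hits
```

The per-position box is a superset of the true codon set. For leucine, for example, the box admits `TTT`, which encodes phenylalanine, so this sketch verifies each candidate by translating it back. Whether the paper handles such cases with tighter boxes or with a verification step is not stated in the abstract.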