Acyclic Identification of Aptamer from Over-Represented Libraries Using Hash Functions

Yiou Xiao, K. Mehrotra, C. Mohan, P. Borer, D. Allis
Published in: 2013 39th Annual Northeast Bioengineering Conference
Publication date: 2013-04-05
DOI: 10.1109/NEBEC.2013.2 (https://doi.org/10.1109/NEBEC.2013.2)
Citations: 0

Abstract

Fast sequencing technology has driven rapid growth in genomic databases in recent years, and researchers in bioinformatics need faster and more accurate tools to analyze these very large data sets. In aptamer search, the goal is to find DNA sequences that are over-represented relative to random background libraries on the same chip. Hash functions are widely used in substring comparison, sequence alignment, and clustering tools. We have developed a lightweight tool that uses hash functions to reduce the size of the genomic data and conducts k-neighbor searches on the centroid sequence, which greatly improves search efficiency compared with the existing tool. Computing the hash values of the k-neighbors also reduces the overhead of searching for mutants. On a data set of 1 million sequences, the program accurately counted the frequency of the human alpha-thrombin sequence and found the mutant versions of the target sequence in less than 40 seconds, whereas the existing method takes 8280 seconds (over 2 hours).
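The abstract does not specify the hash function or define "k-neighbor" precisely, so the following is only a minimal sketch of the general idea under stated assumptions: reads are packed into integers at two bits per base (a simple stand-in for the hash), and a k-neighbor is taken to mean any sequence within Hamming distance k of the target (substitutions only). The names `encode`, `count_sequences`, `neighbors`, and `neighbor_counts` are hypothetical and do not come from the authors' tool.

```python
from collections import Counter
from itertools import combinations, product

BASES = "ACGT"
BASE_CODE = {base: code for code, base in enumerate(BASES)}


def encode(seq):
    """Pack a DNA string into an integer, two bits per base.

    The leading 1 bit keeps sequences of different lengths distinct.
    Assumption: reads contain only A, C, G, T (no ambiguity codes).
    """
    value = 1
    for base in seq:
        value = (value << 2) | BASE_CODE[base]
    return value


def count_sequences(reads):
    """Build a table mapping each encoded read to its number of occurrences."""
    return Counter(encode(read) for read in reads)


def neighbors(seq, k):
    """Yield every sequence within Hamming distance k of seq, including seq."""
    for distance in range(k + 1):
        for positions in combinations(range(len(seq)), distance):
            # At each chosen position, try only bases that differ from the original,
            # so each neighbor is generated exactly once.
            choices = [[b for b in BASES if b != seq[p]] for p in positions]
            for replacement in product(*choices):
                mutated = list(seq)
                for p, b in zip(positions, replacement):
                    mutated[p] = b
                yield "".join(mutated)


def neighbor_counts(reads, target, k):
    """Count the target and its k-neighbors in one pass over the reads."""
    table = count_sequences(reads)
    return {
        cand: table[encode(cand)]
        for cand in neighbors(target, k)
        if encode(cand) in table
    }


# Toy reads, invented for illustration only.
reads = ["ACGT", "ACGT", "ACGA", "TTTT", "ACGT", "CCGT"]
result = neighbor_counts(reads, "ACGT", 1)
print(result)
```

In this sketch the reads are scanned once to build the table, and each candidate mutant then costs a single table lookup instead of a comparison against every read. That is one plausible way for precomputed neighbor hash values to reduce the mutant-search overhead the abstract mentions.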