Towards the Development of Tandem Repeat Analyzer for Genome Sequence Data

2010 Second International Conference on Computer Engineering and Applications. Published: 2010-03-19. DOI: 10.1109/ICCEA.2010.285

Eesha Ingle, Abhiram Bhise


Citations: 0

Abstract

A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human diseases, may play a variety of regulatory and evolutionary roles, and are important laboratory and analytical tools. Broad knowledge of tandem repeats, such as their pattern sizes and mutational histories, has been limited by the difficulty of detecting them in genome sequence data. In this paper, we present an algorithm for finding tandem repeats that does not require the user to specify either the pattern or the pattern size. We model tandem repeats by the percent identity and the frequency of insertions and deletions (indels) between adjacent pattern copies, and we use statistics-based recognition criteria: detection is based on this stochastic model rather than on a minimum alignment score. Finally, the program aligns the repeat copies against a consensus sequence, revealing patterns of common mutations. These patterns yield insight into the history of the duplications that produced the tandem repeats, making the program a potentially valuable research tool.
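The abstract does not spell out the detection statistics, so what follows is only a minimal Python sketch of the general idea under simplifying assumptions, not the authors' program. It tries every period up to a hypothetical `max_period`, so the caller supplies neither the pattern nor its size. It scores a region by the percent identity between adjacent copies against a fixed hypothetical threshold `min_identity` in place of the paper's stochastic model, and it ignores indels entirely. It then builds an ungapped majority-vote consensus and lists where each copy differs from it. The names `find_tandem_repeats`, `consensus_and_mismatches`, `Repeat`, and all parameters are illustrative assumptions.

```python
# Illustrative sketch only: hypothetical names and thresholds; indels are NOT
# modeled, and a fixed identity cutoff stands in for the paper's statistics.
from collections import Counter
from dataclasses import dataclass


@dataclass
class Repeat:
    start: int       # 0-based start of the repeat region
    end: int         # exclusive end of the repeat region
    period: int      # estimated pattern size (length of one copy)
    identity: float  # fraction of comparisons s[i] == s[i + period] in the region


def _runs(flags):
    """Yield half-open (start, end) intervals where flags is True."""
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            yield start, i
            start = None
    if start is not None:
        yield start, len(flags)


def find_tandem_repeats(seq, max_period=10, min_identity=0.8, min_copies=2.0):
    """Scan every period up to max_period; no pattern or pattern size is given."""
    seq = seq.upper()
    n = len(seq)
    candidates = []
    for p in range(1, max_period + 1):
        if 2 * p > n:
            break
        # match[i] == 1 when base i agrees with the base one period downstream.
        match = [1 if seq[i] == seq[i + p] else 0 for i in range(n - p)]
        # A window of p comparisons spans two adjacent copies of the pattern.
        ok = []
        total = sum(match[:p])
        for i in range(len(match) - p + 1):
            if i > 0:
                total += match[i + p - 1] - match[i - 1]
            ok.append(total / p >= min_identity)
        for a, b in _runs(ok):
            start, end = a, b - 1 + 2 * p
            if (end - start) / p < min_copies:
                continue
            compared = match[a:b - 1 + p]
            candidates.append(Repeat(start, end, p, sum(compared) / len(compared)))
    # Prefer longer regions, then smaller periods, so a period-2 repeat wins
    # over the same region read at period 4.
    candidates.sort(key=lambda r: (-(r.end - r.start), r.period, r.start))
    accepted = []
    for cand in candidates:
        if all(cand.end <= r.start or cand.start >= r.end for r in accepted):
            accepted.append(cand)
    return sorted(accepted, key=lambda r: r.start)


def consensus_and_mismatches(seq, rep):
    """Majority-vote consensus of the copies and mismatch offsets per copy."""
    region = seq[rep.start:rep.end].upper()
    p = rep.period
    copies = [region[i:i + p] for i in range(0, len(region), p)]
    consensus = "".join(
        Counter(c[j] for c in copies if j < len(c)).most_common(1)[0][0]
        for j in range(p)
    )
    # Ungapped comparison only: handling indels would need a real alignment step.
    mismatches = [[j for j, base in enumerate(c) if base != consensus[j]] for c in copies]
    return consensus, mismatches
```

For example, on `GATCCGATCCGTTCCGATCC` this sketch reports a single period-5 repeat spanning the whole string, with consensus `GATCC` and one mismatch at offset 1 of the third copy. Modeling insertions and deletions, as the paper's approach does, would require replacing the fixed-offset comparison with a gapped alignment against the consensus.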



