Classification of Tandem Repeats in the Human Genome

Yupu Liang, Dina Sokol, Sarah Zelikovitz, Sarah Ita Levitan
Int. J. Knowl. Discov. Bioinform. (journal article), published 2012-07-01. DOI: 10.4018/jkdb.2012070101 (https://doi.org/10.4018/jkdb.2012070101)
Citations: 0

Abstract

Tandem repeats in DNA sequences are highly relevant to biological phenomena and to diagnostic tools. Computational programs that discover these tandem repeats generate a huge volume of data, which is often difficult to interpret without further organization. In this paper, the authors describe a new method for post-processing tandem repeats through clustering and classification. Their work presents multiple ways of representing tandem repeats using the n-gram model, combined with different clustering distance measures. Analysis of the clusters for the tandem repeats in the human genome shows that the method yields a well-defined grouping in which similarity among repeats is apparent. The authors' new, alignment-free method facilitates the analysis of the myriad tandem repeats that occur in the human genome, and they believe that this work will lead to new discoveries about the roles, origins, and significance of tandem repeats.
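The abstract does not give the details of the n-gram representation or the distance measures, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes overlapping trigrams and cosine distance as one possible alignment-free measure; the helper names (`ngram_profile`, `cosine_distance`) and the toy sequences are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import sqrt


def ngram_profile(seq: str, n: int = 3) -> Counter:
    """Count overlapping n-grams in a DNA string (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty profile shares nothing with any other profile.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


# Toy repeat sequences, invented for illustration only.
repeats = {
    "r1": "CAGCAGCAGCAGCAG",
    "r2": "CAGCAGCAACAGCAG",  # r1 with a single substitution
    "r3": "ATATATATATATAT",   # unrelated dinucleotide repeat
}

profiles = {name: ngram_profile(s) for name, s in repeats.items()}

# Pairwise distances computed without aligning the sequences.
d12 = cosine_distance(profiles["r1"], profiles["r2"])
d13 = cosine_distance(profiles["r1"], profiles["r3"])
```

In this sketch the two CAG-based repeats end up closer to each other than either is to the AT repeat, which is the kind of signal a clustering step could then use to group similar repeats. Other values of n or other distance measures could be substituted without changing the overall structure.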