Classification of Tandem Repeats in the Human Genome

Int. J. Knowl. Discov. Bioinform. Pub Date: 2012-07-01. DOI: 10.4018/jkdb.2012070101

Yupu Liang, Dina Sokol, Sarah Zelikovitz, Sarah Ita Levitan

Abstract

Tandem repeats in DNA sequences play an important role in biological phenomena and are used in diagnostic tools. Computational programs that detect these tandem repeats generate a large volume of data, which is often difficult to interpret without further organization. In this paper, the authors describe a new method for post-processing tandem repeats through clustering and classification. Their work presents multiple ways of representing tandem repeats using the n-gram model, combined with different clustering distance measures. Analysis of the clusters for the tandem repeats in the human genome shows that the method yields well-defined groups in which the similarity among repeats is apparent. The authors' new, alignment-free method facilitates the analysis of the many tandem repeats that occur in the human genome, and they believe that this work will lead to new discoveries about the roles, origins, and significance of tandem repeats.
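
To make the alignment-free idea concrete, the following is a minimal sketch of one way an n-gram representation and a distance measure could be combined. It is not the authors' implementation: the abstract does not specify the n-gram length, the weighting, or the distance measures used, so the choice of n = 3, raw counts, and cosine distance are all assumptions, as are the function names `ngram_profile` and `cosine_distance` and the three example sequences.

```python
from collections import Counter
from math import sqrt


def ngram_profile(seq: str, n: int = 3) -> Counter:
    """Count overlapping n-grams in a DNA string (assumed n=3, raw counts)."""
    seq = seq.upper()
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity between two sparse n-gram count vectors."""
    # Counter returns 0 for missing keys, so only shared n-grams contribute.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty profile as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical repeats, made up for illustration only (not data from the paper).
repeats = {
    "r1": "CAGCAGCAGCAGCAG",   # perfect CAG repeat
    "r2": "CAGCAGCAACAGCAG",   # CAG repeat with one substitution
    "r3": "ATATATATATATAT",    # unrelated AT repeat
}
profiles = {name: ngram_profile(s) for name, s in repeats.items()}

d12 = cosine_distance(profiles["r1"], profiles["r2"])
d13 = cosine_distance(profiles["r1"], profiles["r3"])
print(f"d(r1, r2) = {d12:.3f}, d(r1, r3) = {d13:.3f}")
```

Under these assumptions, the two CAG-based repeats come out closer to each other than either is to the AT repeat, without any sequence alignment. A pairwise distance matrix built this way could then be passed to a standard clustering routine.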
