A species clustering method based on variation of molecular data with the aid of variance proportion

2015 IEEE 2nd International Conference on Recent Trends in Information Systems (ReTIS). Pub Date: 2015-07-09. DOI: 10.1109/ReTIS.2015.7232869

Abolfazl Ghavidel, Amin Rezaeian, M. Rezaee



Abstract

To infer evolutionary relationships and reconstruct phylogenetic trees, evolutionists generally employ two approaches: character-based and distance-based. Because character-based methods can be computationally very expensive, researchers must rely on estimation methods with practical run times. In this context, distance-based methods are considerably faster because they operate on distance matrices. In computational biology, sequence comparison, which seeks to find similar sequences, is of fundamental importance. Many techniques have been developed to calculate a suitable distance measure among DNA sequences; however, they are used almost exclusively to build distance matrices, and they usually do not use models of evolution. In this paper, a novel technique based on variance calculation is proposed to show how similar the gene sequences in a group are to one another. In this strategy, we use the mathematical formula for variance to obtain the average of the differences among all sequences of a specific set (called a cluster). Ultimately, all sequences with variation lower than a predefined variance are clustered into groups, each of which contains a phylogenetic tree. We believe that our method, despite its simple design, could serve as a logical criterion for clustering DNA sequences and could also prove useful as a simple technique for building distance-based phylogenetic networks, especially when there are a large number of input sequences.
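The abstract gives no explicit formula, so the following is only an illustrative sketch of a variance-thresholded grouping, not the authors' algorithm. It rests on several assumptions: sequences are pre-aligned and of equal length, the pairwise distance is the simple proportion of mismatched sites, "variation" is taken as the mean squared pairwise distance within a group, and sequences are assigned greedily. The names `p_distance`, `group_variation`, `greedy_cluster`, and `max_variation` are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of mismatched sites between two aligned sequences (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def group_variation(seqs: list[str]) -> float:
    """Mean squared pairwise distance within a group (assumed stand-in for the paper's variance)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0  # a single sequence has no internal variation
    return sum(p_distance(a, b) ** 2 for a, b in pairs) / len(pairs)


def greedy_cluster(seqs: list[str], max_variation: float) -> list[list[str]]:
    """Place each sequence in the first cluster whose variation stays within the threshold."""
    clusters: list[list[str]] = []
    for s in seqs:
        for cluster in clusters:
            if group_variation(cluster + [s]) <= max_variation:
                cluster.append(s)
                break
        else:
            # No existing cluster can absorb s without exceeding the threshold.
            clusters.append([s])
    return clusters


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "GGGG", "GGGC"]
    print(greedy_cluster(toy, max_variation=0.1))
```

Under these assumptions, a smaller `max_variation` yields more, tighter clusters. Each resulting cluster could then be passed to any distance-based tree builder, which is the stage at which the abstract says each group contains a phylogenetic tree. The greedy pass is order-dependent and is chosen here only for brevity.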
