iK-means: an improvement of the iterative k-means partitioning algorithm

Thu Kim Le, L. Vinh, Dong Do Due, Bui Ngoc Thang, Thao Thi Phuong Nguyen
Published in: 2020 12th International Conference on Knowledge and Systems Engineering (KSE), 2020-11-12
DOI: 10.1109/KSE50997.2020.9287221 (https://doi.org/10.1109/KSE50997.2020.9287221)

Abstract

Evolutionary processes vary among the sites of an alignment, and this variation strongly affects the accuracy of phylogenetic tree reconstruction. An appropriate strategy is to partition the alignment into sub-alignments such that the evolutionary processes at sites within the same sub-alignment are highly similar. Gene features can serve as reasonable indicators for partitioning an alignment. However, gene feature information is not always available or effective. Computational partitioning methods such as iterative k-means have been proposed to automatically partition sites into groups based on the similarity of their evolutionary rates. Despite obtaining compelling results in terms of AICc and BIC scores, the k-means method forms a group consisting of all and only the invariant sites, which results in biased or incorrect phylogenetic trees. In this paper, we improve the k-means algorithm by re-classifying invariant sites into different sub-alignments based on their likelihood values. Experimental results on simulated and empirical DNA datasets show that the new method, called iK-means, overcomes this pitfall of the k-means method and consequently helps improve the quality of the resulting sub-alignments. We recommend using the iK-means method to improve the accuracy of phylogenetic tree inference.
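The re-classification step described in the abstract can be illustrated with a small sketch. The abstract does not say how the per-site likelihoods are computed or how ties are broken, so this is only a toy version under stated assumptions: every name here (`is_invariant`, `reassign_invariant_sites`, `site_loglik`) is hypothetical, the per-site log-likelihoods under each candidate subset's model are assumed to be precomputed by some phylogenetic software, and alignment columns are assumed to contain no gaps or ambiguity codes.

```python
from typing import Dict, List, Sequence


def is_invariant(column: str) -> bool:
    """A site is treated as invariant when every sequence has the same character.

    Assumption: the column contains no gaps or ambiguity codes.
    """
    return len(set(column)) == 1


def reassign_invariant_sites(
    columns: Sequence[str],
    assignment: Sequence[int],
    site_loglik: Sequence[Dict[int, float]],
) -> List[int]:
    """Move each invariant site to the candidate subset with the highest log-likelihood.

    columns      -- one string per alignment site (one character per sequence)
    assignment   -- current subset id of each site (e.g. from a k-means step)
    site_loglik  -- for each site, a hypothetical precomputed mapping
                    {candidate subset id: log-likelihood of the site under
                    that subset's model}; the all-invariant group is assumed
                    to be excluded from the candidates
    Variable sites keep their original subset.
    """
    new_assignment = list(assignment)
    for i, column in enumerate(columns):
        if is_invariant(column):
            scores = site_loglik[i]
            # Pick the candidate subset whose model explains this site best.
            new_assignment[i] = max(scores, key=scores.get)
    return new_assignment


if __name__ == "__main__":
    # Toy data: subset 2 is the group holding all and only invariant sites.
    columns = ["AAAA", "ACGT", "CCCC", "AACC", "GGGG"]
    assignment = [2, 0, 2, 1, 2]
    site_loglik = [
        {0: -3.2, 1: -1.1},
        {0: -5.0, 1: -6.0},
        {0: -0.9, 1: -2.4},
        {0: -4.0, 1: -3.5},
        {0: -1.5, 1: -1.4},
    ]
    print(reassign_invariant_sites(columns, assignment, site_loglik))
```

In this toy input the three invariant sites are spread across subsets 0 and 1, so no subset consists solely of invariant sites after the step.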