MetaConClust - Unsupervised Binning of Metagenomics Data using Consensus Clustering.

Dipro Sinha, Anu Sharma, Dwijesh Chandra Mishra, Anil Rai, Shashi Bhushan Lal, Sanjeev Kumar, Moh Samir Farooqi, Krishna Kumar Chaturvedi
DOI: 10.2174/1389202923666220413114659
Published: 2022-06-10 (journal article)
PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/8c/3c/CG-23-137.PMC9878838.pdf

Abstract

Background: Binning of metagenomic reads is an active area of research, and many unsupervised machine learning techniques have been used for taxonomy-independent binning of metagenomic reads.

Objective: To find the optimal number of clusters and to develop an efficient pipeline for deciphering the complexity of microbial genomes.

Methods: Unsupervised clustering techniques for binning require the number of clusters to be chosen beforehand, which is a difficult task. This paper describes a novel method, MetaConClust, which uses coverage information to group contigs and a consensus-based clustering approach to find the optimal number of clusters automatically when binning metagenomics data. The coverage of contigs in a metagenomics sample has been observed to be directly proportional to the abundance of the species in the sample, and MetaConClust uses it to group the data in the first phase. In the second phase, the Partitioning Around Medoids (PAM) method clusters the data to generate bins, with the initial number of clusters determined automatically through a consensus-based method.

Results: The quality of the obtained bins is assessed using the silhouette index, Rand index, recall, precision, and accuracy. MetaConClust is compared with recent methods and tools on benchmark low-complexity simulated and real metagenomic datasets; it performs better than the unsupervised methods and comparably to the hybrid methods.

Conclusion: These results suggest that consensus-based clustering is a promising approach for automatically finding the number of bins in metagenomics data.
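To make the clustering and evaluation steps concrete, the following is a minimal sketch and not the authors' implementation. It runs a small PAM-style k-medoids on toy contig features, scores each candidate number of clusters with the silhouette index, and compares the chosen partition with known labels using the Rand index. Several things here are assumptions for illustration only: the toy (coverage, GC fraction) values, the function names `pam`, `silhouette` and `rand_index`, and above all the use of the best silhouette score to pick the number of clusters, which stands in for the consensus-based selection that the abstract names but does not specify.

```python
import itertools

def pam(points, k):
    """Small PAM-style k-medoids: greedy BUILD, then SWAP until no gain."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    def cost(medoids):
        # Total distance from every point to its nearest medoid.
        return sum(min(d[i][m] for m in medoids) for i in range(n))

    # BUILD: repeatedly add the point that lowers the total cost the most.
    medoids = []
    while len(medoids) < k:
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: cost(medoids + [i]))
        medoids.append(best)

    # SWAP: exchange a medoid for a non-medoid while that lowers the cost.
    improved = True
    while improved:
        improved = False
        for mi, cand in itertools.product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:mi] + [cand] + medoids[mi + 1:]
            if cost(trial) < cost(medoids) - 1e-12:
                medoids, improved = trial, True

    labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
    return labels, d

def silhouette(labels, d):
    """Mean silhouette index; singleton clusters contribute 0."""
    n = len(labels)
    clusters = {c: [i for i in range(n) if labels[i] == c] for c in set(labels)}
    total = 0.0
    for i in range(n):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue
        a = sum(d[i][j] for j in own) / (len(own) - 1)
        b = min(sum(d[i][j] for j in members) / len(members)
                for c, members in clusters.items() if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree."""
    pairs = list(itertools.combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)

# Hypothetical contigs: (coverage, GC fraction), with made-up true labels.
points = [(10.0, 0.40), (11.0, 0.42), (9.5, 0.41),
          (30.0, 0.55), (31.0, 0.53), (29.0, 0.56),
          (60.0, 0.35), (61.5, 0.36), (59.0, 0.34)]
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# Stand-in for the consensus step: keep the k with the best silhouette.
scores = {}
for k in range(2, 6):
    labels, d = pam(points, k)
    scores[k] = (silhouette(labels, d), labels)
best_k = max(scores, key=lambda k: scores[k][0])
best_labels = scores[best_k][1]
print(best_k, round(scores[best_k][0], 3), rand_index(best_labels, truth))
```

Because coverage dominates the Euclidean distance in this toy data, the groups separate mainly by abundance, which mirrors the role the abstract gives to coverage. Real data would need feature scaling and a far more efficient PAM implementation than this quadratic-memory sketch.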
