A Deep Clustering-based Novel Approach for Binning of Metagenomics Data.

Sharanbasappa D Madival, Dwijesh Chandra Mishra, Anu Sharma, Sanjeev Kumar, Arpan Kumar Maji, Neeraj Budhlakoti, Dipro Sinha, Anil Rai
DOI: 10.2174/1389202923666220928150100 (https://doi.org/10.2174/1389202923666220928150100)
Journal (as listed): Accounts of Chemical Research
Publication date: 2022-11-18
Publication type: Journal Article
PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/72/5e/CG-23-353.PMC9878855.pdf

Abstract

Background: One major challenge in binning metagenomics data is the limited availability of reference datasets, as only 1% of the total microbial population has been cultured so far. This has increased the importance of unsupervised binning methods, which do not require any reference dataset.

Objective: To develop a deep clustering-based binning approach for metagenomics data and to evaluate its results with suitable measures.

Methods: In this study, a deep learning-based approach is used to bin metagenomics data. The results are validated on different datasets using features such as tetranucleotide frequency (TNF), hexanucleotide frequency (HNF), and GC content. A convolutional autoencoder is used for feature extraction, and the K-means clustering method is used for binning.
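
The feature and clustering steps named in the Methods can be illustrated with a minimal sketch. It assumes plain overlapping k-mer counts over A/C/G/T (no merging of reverse complements) and concatenates TNF, HNF, and GC content into one vector; the abstract does not say how the features are combined, so that choice is an assumption. The convolutional autoencoder is omitted here: K-means runs directly on the raw features, which is a simplification and not the paper's pipeline. All function names are hypothetical.

```python
from itertools import product


def kmer_frequencies(seq, k=4):
    """Normalized counts of every k-mer over A/C/G/T, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows with N or other ambiguity codes are skipped
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def feature_vector(seq):
    """Assumed layout: 256 TNF values + 4096 HNF values + 1 GC value."""
    return kmer_frequencies(seq, 4) + kmer_frequencies(seq, 6) + [gc_content(seq)]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm; centroids are seeded from the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy contigs: two AT-rich and two GC-rich sequences (made up for illustration).
contigs = [
    "ATATATATATATATATATAT",
    "GCGCGCGCGCGCGCGCGCGC",
    "ATATTATATATATATATAAT",
    "GCGCGGCGCGCGCGCCGCGC",
]
bins = kmeans([feature_vector(c) for c in contigs], k=2)
```

In this toy run the AT-rich contigs end up in one bin and the GC-rich contigs in the other. Seeding centroids from the first k points keeps the sketch deterministic; a practical run would use a better initialization and real contigs of much greater length.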

Results: In most cases, the Silhouette index and the Rand index exceed 0.5 and 0.8, respectively, which indicates that the proposed approach gives satisfactory results. The performance of the developed approach is compared with current methods and tools using benchmarked low-complexity simulated and real metagenomic datasets. It performs better than unsupervised methods and on par with semi-supervised methods.
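
The two evaluation measures can be sketched as follows. This assumes the plain (unadjusted) Rand index and a Euclidean-distance Silhouette index; the abstract does not state which variants were used, and the function names are hypothetical.

```python
from itertools import combinations
from math import dist


def rand_index(true_labels, pred_labels):
    """Fraction of point pairs on which the two labelings agree.

    A pair agrees when both labelings put it in the same cluster,
    or both put it in different clusters.
    """
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def silhouette_index(points, labels):
    """Mean silhouette over all points; requires at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two well-separated groups in the plane (made up for illustration).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = [0, 0, 1, 1]
bad = [0, 1, 0, 1]
```

The Rand index needs reference labels, so it applies to simulated or benchmark data where the true bins are known, while the Silhouette index uses only the feature vectors and the predicted bins. On the toy data, the `good` labeling clears both thresholds quoted in the Results (0.5 and 0.8) and the `bad` labeling clears neither.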

Conclusion: An unsupervised advanced learning-based approach for binning has been proposed, and the developed method shows promising results on various datasets. It is a novel way to address the lack of reference data for binning in metagenomics.
