Sharanbasappa D. Madival, Girish Kumar Jha, Dwijesh Chandra Mishra, Sunil Kumar, Neeraj Budhlakoti, Anu Sharma, Krishna Kumar Chaturvedi, S. Kabilan, Mohammad Samir Farooqi, Sudhir Srivastava
DOI: https://doi.org/10.1007/s13562-024-00911-2. Publication date: 2024-08-16.
A novel deep contrastive convolutional autoencoder based binning approach for taxonomic independent metagenomics data
In this study, we present a binning approach for metagenomics data that combines Natural Language Processing (NLP) with a Deep Contrastive Convolutional Autoencoder (DCAE). We use NLP techniques for feature extraction: tetranucleotide frequency (TNF) is computed with CountVec and Term Frequency-Inverse Document Frequency (TF-IDF), and GC content is appended to each resulting feature matrix. The DCAE, built from convolutional layers and trained with a contrastive loss function, captures intricate patterns in the data and provides a latent representation for binning. We then apply k-means clustering to the latent representations obtained from the DCAE. To assess the method, we used three standard benchmark metagenomics datasets: 10s, 25s, and Sharon. Across all datasets, we observed Silhouette Indices exceeding 0.6 and Rand Indices exceeding 0.8. Compared with existing methods, our approach surpasses current unsupervised methods on both the Rand Index and the Silhouette Index and performs on par with semi-supervised methods across datasets, which underscores its effectiveness and versatility in metagenomics analysis.
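The feature-extraction and evaluation steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The function names (`tnf_counts`, `gc_content`, `tfidf`, `feature_matrix`, `rand_index`) are hypothetical, and the smoothed TF-IDF formula with L2 row normalisation is one common variant assumed here, since the abstract does not specify the weighting. The DCAE and the k-means step are not reproduced.

```python
from collections import Counter
from itertools import combinations, product
import math

# All 256 tetranucleotides in a fixed order (the feature "vocabulary").
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tnf_counts(seq):
    """Count overlapping 4-mers; 4-mers containing non-ACGT symbols are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    return [counts.get(t, 0) for t in TETRAMERS]


def gc_content(seq):
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def tfidf(count_rows):
    """Smoothed TF-IDF with L2 row normalisation (an assumed variant)."""
    n_docs = len(count_rows)
    n_terms = len(count_rows[0])
    # Document frequency: number of sequences in which each 4-mer occurs.
    df = [sum(1 for row in count_rows if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    out = []
    for row in count_rows:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0
        out.append([v / norm for v in weighted])
    return out


def feature_matrix(seqs, use_tfidf=True):
    """TNF features (raw counts or TF-IDF) with GC content as a final column."""
    counts = [tnf_counts(s) for s in seqs]
    rows = tfidf(counts) if use_tfidf else counts
    return [list(r) + [gc_content(s)] for r, s in zip(rows, seqs)]


def rand_index(labels_true, labels_pred):
    """Share of sample pairs on which two labelings agree (same bin vs. different bin)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Calling `feature_matrix(contigs)` on a list of contig strings gives one 257-dimensional row per contig (256 TNF features plus GC content). In the pipeline described above, such a matrix would be the input to the autoencoder, and `rand_index` would compare the resulting cluster labels with the known bins.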