Refining biome labeling for large-scale microbial community samples: Leveraging neural networks and transfer learning

Impact factor: 14 · Zone 1 (Environmental Science and Ecology) · JCR Q1 (Environmental Sciences)
Nan Wang, Teng Wang, Kang Ning
Journal: Environmental Science and Ecotechnology, Volume 17, Article 100304
Published: 2023-07-26
DOI: 10.1016/j.ese.2023.100304
Article page: https://www.sciencedirect.com/science/article/pii/S2666498423000698
PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/13/89/main.PMC10457426.pdf
Citations: 0

Abstract

Microbiome research has generated an extensive amount of data, resulting in a wealth of publicly accessible samples. Accurate annotation of these samples is crucial for effectively utilizing microbiome data across scientific disciplines. However, a notable challenge arises from the lack of essential annotations, particularly regarding collection location and sample biome information, which significantly hinders environmental microbiome research. In this study, we introduce Meta-Sorter, a novel approach utilizing neural networks and transfer learning, to enhance biome labeling for thousands of microbiome samples in the MGnify database that have incomplete information. Our findings demonstrate that Meta-Sorter achieved a remarkable accuracy rate of 96.7% in classifying samples among the 16,507 lacking detailed biome annotations. Notably, Meta-Sorter provides precise classifications for representative environmental samples that were previously ambiguously labeled as “Marine” in MGnify, thereby elucidating their specific origins in benthic and water column environments. Moreover, Meta-Sorter effectively distinguishes samples derived from human-environment interactions, enabling clear differentiation between environmental and human-related studies. By improving the completeness of biome label information for numerous microbial community samples, our research facilitates more accurate knowledge discovery across diverse disciplines, with particular implications for environmental research.
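The abstract names neural networks and transfer learning but gives no architectural detail, so the following is only a minimal sketch of the general idea and not Meta-Sorter's implementation. It assumes toy data: four made-up taxa, a source task with coarse labels (marine vs. human), and a target task with refined labels (benthic, water column, human). All names (`PROTOTYPES`, `pretrain`, `fine_tune_head`, and so on) are hypothetical. A small network is pretrained on the coarse labels, its hidden layer is then frozen, and only a new output layer is fitted on the refined labels.

```python
import math
import random

random.seed(0)

# Hypothetical class prototypes: relative abundances of four made-up taxa.
PROTOTYPES = {
    0: [0.6, 0.2, 0.1, 0.1],  # fine label 0: marine, benthic
    1: [0.2, 0.6, 0.1, 0.1],  # fine label 1: marine, water column
    2: [0.1, 0.1, 0.4, 0.4],  # fine label 2: human-associated
}
COARSE = {0: 0, 1: 0, 2: 1}  # coarse labels: 0 = marine, 1 = human

def make_samples(counts):
    """Return noisy copies of the prototypes and their fine labels."""
    X, y = [], []
    for label, n in counts.items():
        for _ in range(n):
            X.append([v + random.uniform(-0.05, 0.05) for v in PROTOTYPES[label]])
            y.append(label)
    return X, y

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def features(x, W):
    """Hidden tanh layer followed by a constant 1.0 that acts as a bias input."""
    return [math.tanh(dot(row, x)) for row in W] + [1.0]

def pretrain(X, y, n_hidden, n_classes, lr=0.5, epochs=300):
    """Train hidden layer W and a source head U with per-sample SGD; return only W."""
    W = [[random.uniform(-0.5, 0.5) for _ in X[0]] for _ in range(n_hidden)]
    U = [[0.0] * (n_hidden + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, t in zip(X, y):
            h = features(x, W)
            p = softmax([dot(row, h) for row in U])
            dz = [p[k] - (1.0 if k == t else 0.0) for k in range(n_classes)]
            # Backpropagate to the hidden units before the head is updated.
            dh = [sum(dz[k] * U[k][j] for k in range(n_classes)) for j in range(n_hidden)]
            for k in range(n_classes):
                for j in range(n_hidden + 1):
                    U[k][j] -= lr * dz[k] * h[j]
            for j in range(n_hidden):
                da = dh[j] * (1.0 - h[j] ** 2)  # derivative of tanh
                for i in range(len(x)):
                    W[j][i] -= lr * da * x[i]
    return W  # the source head U is discarded

def mean_loss(H, y, V):
    """Mean cross-entropy of head V on precomputed features H."""
    total = 0.0
    for h, t in zip(H, y):
        p = softmax([dot(row, h) for row in V])
        total -= math.log(max(p[t], 1e-12))
    return total / len(H)

def fine_tune_head(H, y, n_classes, lr=0.1, epochs=200):
    """Fit a new output layer on frozen features with full-batch gradient descent."""
    n_feat = len(H[0])
    V = [[0.0] * n_feat for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_feat for _ in range(n_classes)]
        for h, t in zip(H, y):
            p = softmax([dot(row, h) for row in V])
            for k in range(n_classes):
                err = p[k] - (1.0 if k == t else 0.0)
                for j in range(n_feat):
                    grad[k][j] += err * h[j] / len(H)
        for k in range(n_classes):
            for j in range(n_feat):
                V[k][j] -= lr * grad[k][j]
    return V

# Source task: plentiful samples with coarse labels only.
X_src, y_src_fine = make_samples({0: 6, 1: 6, 2: 6})
W = pretrain(X_src, [COARSE[t] for t in y_src_fine], n_hidden=4, n_classes=2)

# Target task: a few samples with refined labels; W stays frozen.
X_tgt, y_tgt = make_samples({0: 5, 1: 4, 2: 3})
W_frozen = [row[:] for row in W]
H_tgt = [features(x, W) for x in X_tgt]
loss_before = mean_loss(H_tgt, y_tgt, [[0.0] * 5 for _ in range(3)])
V = fine_tune_head(H_tgt, y_tgt, n_classes=3)
loss_after = mean_loss(H_tgt, y_tgt, V)
print(f"target loss: {loss_before:.3f} -> {loss_after:.3f}")
```

In this sketch, freezing the hidden layer means the target task has to fit only a 3 x 5 output matrix, which is one reason transfer learning can be useful when refined labels are scarce. A real pipeline would differ in input features, network size, and evaluation on held-out samples.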

Source journal: Environmental Science & Ecotechnology
CiteScore: 20.40
Self-citation rate: 6.30%
Articles published: 11
Review time: 18 days
Journal description: Environmental Science & Ecotechnology (ESE) is an international, open-access journal publishing original research in environmental science, engineering, ecotechnology, and related fields. Authors publishing in ESE can immediately, permanently, and freely share their work. They have license options and retain copyright. Published by Elsevier, ESE is co-organized by the Chinese Society for Environmental Sciences, Harbin Institute of Technology, and the Chinese Research Academy of Environmental Sciences, under the supervision of the China Association for Science and Technology.