Identification of human pathogens in soil by virulence gene-based machine learning method.

Impact factor (IF): 17.6
Eco-Environment & Health | Pub Date: 2025-07-24 | eCollection Date: 2025-09-01 | DOI: 10.1016/j.eehl.2025.100171
Shengchun Qi, Shuyan Wang, Yu Xia, Songcan Chen, Huijie Lu
Eco-Environment & Health, volume 4(3), article 100171. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12355066/pdf/
Citations: 0

Abstract

Identification of human pathogens in soil by virulence gene-based machine learning method.

Soils are important reservoirs of human pathogenic bacteria that can spread to humans through various pathways. Metagenomics enables high-throughput pathogen identification by mapping sequencing reads to known pathogen genomes. However, this approach has several limitations, e.g., sequence assembly is time-consuming, and reliance on reference databases may overlook potential pathogens lacking close genomic matches. Here, we developed a novel virulence factor (VF)-based machine learning method using the K-Nearest Neighbors model (VF-KNN) for identifying human pathogenic bacteria from soil metagenomes. By learning the VF features of pathogenic and non-pathogenic bacteria, VF-KNN performed well in soil pathogen identification (AUC: 0.95, accuracy: 0.85). Model prediction accuracy (0.95) was further validated using 61 pathogenic strains isolated from soil. For the 15 most frequent soil pathogens, the prediction accuracy was >0.90 at 0.4X-1.0X genome coverage. VFs contributing significantly to pathogen identification were associated with functions such as regulation, effector delivery, and motility. As estimated with VF-KNN, the average abundance of total potential pathogens in topsoils across China was 0.44% (n = 336), predominantly concentrated in the eastern coastal provinces. Compared with the conventional method based on a predefined pathogen list, VF-KNN identified 28% more potential pathogenic species, including some that are newly reported but absent from the predefined list (e.g., Mycolicibacterium cosmeticum). Agricultural land exhibited significantly higher pathogen abundance and diversity than the other land types. This newly developed VF-KNN method is applicable to pathogen identification in broader environments.
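
The general idea of classifying genomes from virulence-factor features with a K-Nearest Neighbors model, and scoring the result by AUC and accuracy, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' VF-KNN pipeline: the data are synthetic, and the feature encoding (binary VF presence/absence), the carriage probabilities, k = 5, the distance metric, and the function names make_synthetic_vf_matrix and train_and_score are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def make_synthetic_vf_matrix(n_genomes=400, n_vfs=50, seed=0):
    """Return (X, y): binary VF presence/absence rows and 0/1 pathogen labels.

    Purely synthetic. Genomes labelled as pathogens carry each VF with a
    higher probability than non-pathogens; no real genomes are involved.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_genomes)
    # Hypothetical per-VF carriage probabilities: 0.6 (pathogen) vs 0.2 (other).
    p = np.where(y == 1, 0.6, 0.2)[:, None]
    X = (rng.random((n_genomes, n_vfs)) < p).astype(int)
    return X, y


def train_and_score(X, y, k=5, seed=0):
    """Fit a KNN classifier on a train split and return (AUC, accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Default Euclidean distance; on 0/1 vectors it ranks neighbours the
    # same way as the Hamming distance (count of differing VFs).
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    # Fraction of the k neighbours that are pathogens, used as the score.
    scores = model.predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, scores)
    acc = accuracy_score(y_test, model.predict(X_test))
    return auc, acc


if __name__ == "__main__":
    X, y = make_synthetic_vf_matrix()
    auc, acc = train_and_score(X, y)
    print(f"AUC: {auc:.2f}  accuracy: {acc:.2f}")
```

Because the synthetic classes are deliberately well separated, the scores printed here say nothing about real soil metagenomes; the sketch only shows how the two metrics quoted in the abstract are computed for a KNN classifier.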

Source journal
Eco-Environment & Health (category: Environmental Science and Ecology; Ecology, Environment and Health)
CiteScore: 11.00
Self-citation rate: 0.00%
Articles published: 18
Review time: 22 days
About the journal: Eco-Environment & Health (EEH) is an international, multidisciplinary, peer-reviewed journal for publications on the frontiers of ecology, environment, and health, as well as their related disciplines. EEH focuses on the concept of "One Health" to promote green and sustainable development, dealing with the interactions among ecology, environment, and health, and the underlying mechanisms and interventions. Its stated mission is to be one of the most important flagship journals in the field of environmental health.
Scope: EEH covers a variety of research areas, including but not limited to ecology and biodiversity conservation, environmental behaviors and bioprocesses of emerging contaminants, human exposure and health effects, and evaluation, management, and regulation of environmental risks.
Key topics:
1) Ecology and Biodiversity Conservation: biodiversity; ecological restoration; ecological safety; protected areas
2) Environmental and Biological Fate of Emerging Contaminants: environmental behaviors; environmental processes; environmental microbiology
3) Human Exposure and Health Effects: environmental toxicology; environmental epidemiology; environmental health risk; food safety
4) Evaluation, Management and Regulation of Environmental Risks: chemical safety; environmental policy; health policy; health economics; environmental remediation