Decoding oxygen preference: Machine learning discovers functional genes in Bacteria.
Authors: Siqi Wan, Haida Liu, Geyi Zhu, Yuanming Geng, Wenhao Li, Lijuan Chen, Yunhua Zhang, Guomin Han
Journal: Genomics, 111095 (impact factor 3.0; JCR Q2, Biotechnology & Applied Microbiology)
Published: 2025-09-01 (Epub 2025-08-06)
DOI: 10.1016/j.ygeno.2025.111095 (https://doi.org/10.1016/j.ygeno.2025.111095)
Citation count: 0
Abstract
Predicting bacterial oxygen preference and identifying the genes associated with it are critical tasks in microbiology. This study developed a machine learning model that uses genomic features to predict bacterial oxygen preference and to discover potential functional genes. Trained on a dataset of 1813 bacterial genomes, a Random Forest model achieved 90.62% accuracy in predicting oxygen preference, outperforming prior methods. Feature analysis pinpointed key protein domains and candidate genes. Experimental overexpression of model-identified genes (encoding SOD, radical SAM enzyme, GCV-T, and FDH domains) in Escherichia coli enhanced growth under aerobic conditions, validating their role in oxygen adaptation. Applying the model to rumen metagenomes revealed a predominantly anaerobic community. This work establishes machine learning as an effective strategy for predicting bacterial oxygen preference and identifying functional genes, and it offers a tool for understanding bacterial oxygen adaptation mechanisms, discovering key functional genes, and efficiently exploring uncultured microbial resources.
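The general idea in the abstract (train a tree ensemble on genomic features, then read off which features drive the prediction) can be illustrated with a small sketch. This is not the authors' pipeline: the data are synthetic, the labelling rule is a toy assumption rather than a biological claim, the `Noise_A` and `Noise_B` columns are hypothetical, and the model is a bagged ensemble of one-split trees with random feature subsets, a deliberately simplified stand-in for a full Random Forest.

```python
import random
from collections import Counter

# Hypothetical feature columns: presence (1) or absence (0) of a protein domain.
DOMAINS = ["SOD", "Radical_SAM", "GCV_T", "FDH", "Noise_A", "Noise_B"]


def make_toy_genomes(n, rng, flip=0.1):
    """Synthetic rows; label 1 = aerobic, 0 = anaerobic (toy rule, not biology)."""
    rows, labels = [], []
    for _ in range(n):
        row = [rng.randint(0, 1) for _ in DOMAINS]
        # Assumption for illustration: the label follows the first column,
        # with a fraction `flip` of labels inverted as noise.
        label = row[0] if rng.random() > flip else 1 - row[0]
        rows.append(row)
        labels.append(label)
    return rows, labels


def gini(labels):
    """Gini impurity of a list of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(rows, labels, features):
    """Pick the feature whose presence/absence split lowers Gini impurity most."""
    parent = gini(labels)
    default = Counter(labels).most_common(1)[0][0]
    best = None
    for f in sorted(features):
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        gain = parent - child
        if best is None or gain > best[1]:
            # Each leaf predicts its majority label (fallback: overall majority).
            leaves = tuple(
                Counter(side).most_common(1)[0][0] if side else default
                for side in (left, right)
            )
            best = (f, gain, leaves)
    return best


def fit_forest(rows, labels, n_trees=51, max_features=4, seed=0):
    """Bootstrap sampling plus random feature subsets, one split per tree."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feats = rng.sample(range(len(rows[0])), max_features)
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        )
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = sum(leaves[row[f]] for f, _, leaves in forest)
    return int(votes * 2 > len(forest))


def importances(forest, n_features):
    """Mean impurity decrease credited to each feature across the ensemble."""
    totals = [0.0] * n_features
    for f, gain, _ in forest:
        totals[f] += gain
    return [t / len(forest) for t in totals]


rng = random.Random(42)
train_x, train_y = make_toy_genomes(200, rng)
test_x, test_y = make_toy_genomes(100, rng)
forest = fit_forest(train_x, train_y)
accuracy = sum(predict(forest, x) == y for x, y in zip(test_x, test_y)) / len(test_y)
ranked = sorted(zip(DOMAINS, importances(forest, len(DOMAINS))), key=lambda t: -t[1])
print(f"held-out accuracy: {accuracy:.2f}")
print("top domain:", ranked[0][0])
```

Because the toy labels depend only on the first column, that column should dominate the importance ranking. With real genomes, the ranking would only nominate candidates, which is why the abstract pairs feature analysis with experimental overexpression.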
About the journal:
Genomics is a forum for describing the development of genome-scale technologies and their application to all areas of biological investigation.
As a journal that has evolved with the field that carries its name, Genomics focuses on the development and application of cutting-edge methods, addressing fundamental questions with potential interest to a wide audience. Our aim is to publish the highest quality research and to provide authors with rapid, fair and accurate review and publication of manuscripts falling within our scope.