Prediction of carbon and nitrogen source preferences in microbial metabolism using protein sequence data
Zhenfeng Wang, Shuzhen Li, Haixia Pan, Yunlong Li, Xue Wang, Hao Zhou, Jiajia Shan
Journal of Microbiological Methods, vol. 238, Article 107266 (journal article, published 2025-09-16). DOI: 10.1016/j.mimet.2025.107266
https://www.sciencedirect.com/science/article/pii/S0167701225001824
Microbial precision cultivation technology has significant practical value for environmental pollutant remediation. Precise quantification of microbial carbon and nitrogen requirements is critical for optimizing culture conditions and enhancing microbial growth and productivity. This study uses deep learning algorithms to explore the intrinsic relationship between microbial protein sequences and specific nutritional requirements, such as the types of carbon and nitrogen sources and the optimal carbon-to-nitrogen (C/N) ratio. A total of 432 microbial species and 61 culture media formulations were collected from authoritative databases, including Ensembl Bacteria, DSMZ, and NCBI Protein. Microbial protein sequences were converted into high-dimensional numerical feature matrices using the Position-Specific Scoring Matrix (PSSM) and the Pseudo Position-Specific Scoring Matrix (PsePSSM), followed by dimensionality reduction. Multiple machine learning algorithms were then used to build predictive models of microbial C/N source utilization. Among the classification tasks, C/N ratio prediction performed best (accuracy 99.60 %), followed by carbon source prediction (82.76 %) and nitrogen source prediction (70.05 %), suggesting a strong correlation between microbial C/N requirements and the associated protein sequences. Model interpretability was further enhanced by using the SHapley Additive exPlanations (SHAP) framework to analyze feature contributions. The primary contribution of this study is an integrated framework that combines protein function annotation with sequence-based feature extraction to predict microbial nutritional requirements, offering new insights for optimizing microbial culture conditions.
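The abstract does not spell out how a variable-length PSSM becomes a fixed-length PsePSSM vector, so the following is a minimal sketch of the usual construction, not the authors' code. It assumes a precomputed L × 20 PSSM (for example, exported from PSI-BLAST) and makes its own choices where the abstract is silent: raw scores are squashed with a logistic (sigmoid) function, and the maximum lag λ is a free parameter (2 in the example). The names `sigmoid` and `psepssm` and the toy matrix are hypothetical. The output holds 20 column means plus 20 mean squared lag differences per lag, so a protein of any length maps to 20 + 20λ features.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping any score into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def psepssm(pssm: Sequence[Sequence[float]], max_lag: int = 2) -> List[float]:
    """Hypothetical helper: turn an L x 20 PSSM into a 20 + 20 * max_lag vector."""
    length = len(pssm)
    if length == 0 or any(len(row) != 20 for row in pssm):
        raise ValueError("expected a non-empty L x 20 PSSM")
    if not 0 <= max_lag < length:
        raise ValueError("max_lag must satisfy 0 <= max_lag < L")

    # Assumption: sigmoid normalization of the raw log-odds scores (the
    # abstract does not say which normalization the study used).
    p = [[sigmoid(v) for v in row] for row in pssm]

    # Part 1: mean of each of the 20 columns over all L residues.
    features = [sum(p[i][j] for i in range(length)) / length for j in range(20)]

    # Part 2: for each lag, the mean squared difference between residue i and
    # residue i + lag in every column; this encodes sequence-order information.
    for lag in range(1, max_lag + 1):
        for j in range(20):
            total = sum((p[i][j] - p[i + lag][j]) ** 2 for i in range(length - lag))
            features.append(total / (length - lag))
    return features


# Toy 5-residue "PSSM" with integer scores in [-3, 3]; illustrative, not real data.
toy = [[(i * 20 + j) % 7 - 3 for j in range(20)] for i in range(5)]
vec = psepssm(toy, max_lag=2)
print(len(vec))  # 60
```

Because the output length depends only on `max_lag`, vectors from proteins of different lengths can be stacked into one matrix before dimensionality reduction and classification. How the study combines many proteins into one feature vector per species is not stated in the abstract.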
About the journal:
The Journal of Microbiological Methods publishes scholarly and original articles, notes and review articles. These articles must include novel and/or state-of-the-art methods, or significant improvements to existing methods. Novel and innovative applications of current methods that are validated and useful will also be published. JMM strives for scholarship, innovation and excellence. This demands scientific rigour, the best available methods and technologies, correctly replicated experiments/tests, the inclusion of proper controls, calibrations, and the correct statistical analysis. The presentation of the data must support the interpretation of the method/approach.
All aspects of microbiology are covered, except virology. These include agricultural microbiology, applied and environmental microbiology, bioassays, bioinformatics, biotechnology, biochemical microbiology, clinical microbiology, diagnostics, food monitoring and quality control microbiology, microbial genetics and genomics, geomicrobiology, microbiome methods regardless of habitat, high-throughput sequencing methods and analysis, microbial pathogenesis and host responses, metabolomics, metagenomics, metaproteomics, microbial ecology and diversity, microbial physiology, microbial ultrastructure, microscopic and imaging methods, molecular microbiology, mycology, novel mathematical microbiology and modelling, parasitology, plant-microbe interactions, protein markers/profiles, proteomics, pyrosequencing, public health microbiology, radioisotopes applied to microbiology, robotics applied to microbiological methods, rumen microbiology, microbiological methods for space missions and extreme environments, sampling methods and samplers, soil and sediment microbiology, transcriptomics, veterinary microbiology, sero-diagnostics and typing/identification.