Computational pipeline for sustainable enzyme discovery through (re)use of metagenomic data

Journal of Environmental Management Pub Date : 2025-04-18 DOI:10.1016/j.jenvman.2025.125381

Karol Ciuchcinski , Anna-Karina Kaczorowska , Daria Biernacka , Sebastian Dorawa , Tadeusz Kaczorowski , Younginn Park , Karol Piekarski , Michal Stanowski , Takao Ishikawa , Runar Stokke , Ida Helene Steen , Lukasz Dziewit

Volume 382, Article 125381. Publisher page: https://www.sciencedirect.com/science/article/pii/S030147972501357X

Abstract

Enzymes derived from extremophilic organisms, also known as extremozymes, offer sustainable and efficient solutions for industrial applications. Valued for their resilience and low environmental impact, extremozymes are used as catalysts in processes ranging from dairy production to pharmaceutical manufacturing. However, the discovery of novel extremozymes is often hindered by culturing difficulties, underrepresentation of extreme environments in reference databases, and the limitations of traditional sequence-based screening methods. In this work, we present a computational pipeline designed to discover novel enzymes in metagenomic data derived from extreme environments. The pipeline is a versatile and sustainable approach that promotes reuse and recycling of existing datasets and minimises the need for additional environmental sampling. At its core, the algorithm integrates traditional bioinformatic techniques with recent advances in structural prediction, enabling rapid and accurate identification of enzymes. By design, however, the algorithm relies heavily on existing databases, which can limit its effectiveness where reference data are scarce or when it encounters novel protein families. As a proof of concept, we applied the pipeline to metagenomic data from deep-sea hydrothermal vents, focusing on β-galactosidases. The pipeline identified 11 candidate proteins, 10 of which showed in vitro activity. One of the selected enzymes, βGal_UW07, showed strong potential for industrial applications: it exhibited optimal activity at 70 °C and was exceptionally resistant to high pH and to the presence of metal ions and reducing agents. Overall, our results indicate that the pipeline is highly accurate and can play a key role in sustainable bioprospecting by leveraging existing metagenomic datasets and minimising in situ interventions in pristine regions.
