Sampling Strategies for Experimentally Mapping Molecular Fitness Landscapes Using High-Throughput Methods.

IF 2.1; Zone 3 (Biology); JCR Q4, BIOCHEMISTRY & MOLECULAR BIOLOGY
Journal of Molecular Evolution. Pub Date: 2024-08-01; Epub Date: 2024-06-17; DOI: 10.1007/s00239-024-10179-8
Steven K Chen, Jing Liu, Alexander Van Nynatten, Benjamin M Tudor-Price, Belinda S W Chang

Abstract

Empirical studies of the genotype-phenotype-fitness maps of proteins are fundamental to understanding the evolutionary process, because they elucidate the space of possible genotypes accessible through mutation within a landscape of phenotypes and fitness effects. Yet comprehensively mapping molecular fitness landscapes remains challenging, since all possible combinations of amino acid substitutions at even a few protein sites are encoded by an enormous genotype space. High-throughput mapping of genotype space can be achieved using large-scale screening experiments known as multiplexed assays of variant effect (MAVEs). However, to accommodate multi-mutational studies, MAVEs have grown to the point where sampling requirements must be determined a priori. To address this problem, we propose calculations and simulation methods to approximate minimum sampling requirements for multi-mutational MAVEs, which we combine with a new library construction protocol to experimentally validate our approximation approaches. Analysis of our simulated data reveals how sampling trajectories differ between simulations of nucleotide versus amino acid variants and among mutagenesis schemes. In particular, we show quantitatively that, when sampling for nucleotide sequences rather than their encoded amino acid equivalents, marginal gains in sampling efficiency demand increasingly greater sampling effort. We present a new library construction protocol that efficiently maximizes sequence variation, and using ultradeep sequencing we demonstrate that the library encodes virtually all possible combinations of mutations within the experimental design. The insights from our analyses, together with the methodological advances reported herein, are immediately applicable to pooled experimental screens of arbitrary design, enabling further assay upscaling and expanded testing of genotype space.
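The abstract does not give the authors' calculations. As a rough illustration of why sampling requirements need to be estimated a priori, the following minimal sketch combines the classical coupon-collector approximation with a small simulation. It assumes an idealized, unbiased library in which every codon is equally likely. It also assumes a hypothetical NNK codon scheme (N = A/C/G/T, K = G/T) at each randomized site. Neither assumption comes from the paper, and all function names are illustrative.

```python
import math
import random
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical mutagenesis scheme for this sketch: NNK codons (32 codons that
# encode 20 amino acids plus one stop codon, TAG).
NNK_CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "GT"]
NNK_SYMBOLS = sorted({CODON_TABLE[c] for c in NNK_CODONS})  # 21 symbols incl. "*"


def expected_draws_full_coverage(n_variants: int) -> float:
    """Coupon-collector expectation N * H_N: draws needed to observe every one
    of n_variants equally likely variants at least once."""
    return n_variants * sum(1.0 / i for i in range(1, n_variants + 1))


def expected_fraction_seen(n_variants: int, n_draws: int) -> float:
    """Expected fraction of n_variants equally likely variants seen after n_draws."""
    return 1.0 - (1.0 - 1.0 / n_variants) ** n_draws


def draws_for_fraction(n_variants: int, fraction: float) -> int:
    """Smallest n_draws whose expected coverage reaches `fraction`.
    Requires n_variants > 1 and 0 <= fraction < 1."""
    return math.ceil(math.log(1.0 - fraction) / math.log(1.0 - 1.0 / n_variants))


def simulate_coverage(n_sites: int, n_draws: int, seed: int = 0):
    """Draw n_draws random NNK variants over n_sites codons (uniform codon
    frequencies assumed) and return the observed fractions of the nucleotide
    space and of the encoded amino-acid space (stop counted as a symbol)."""
    rng = random.Random(seed)
    seen_nt, seen_aa = set(), set()
    for _ in range(n_draws):
        codons = tuple(rng.choice(NNK_CODONS) for _ in range(n_sites))
        seen_nt.add(codons)
        seen_aa.add(tuple(CODON_TABLE[c] for c in codons))
    nt_space = len(NNK_CODONS) ** n_sites
    aa_space = len(NNK_SYMBOLS) ** n_sites
    return len(seen_nt) / nt_space, len(seen_aa) / aa_space


if __name__ == "__main__":
    sites = 2  # 32**2 = 1024 nucleotide variants, 21**2 = 441 amino-acid variants
    print("E[draws] for full nucleotide coverage:",
          round(expected_draws_full_coverage(32 ** sites)))
    print("Draws for 95% expected nucleotide coverage:",
          draws_for_fraction(32 ** sites, 0.95))
    for n in (500, 1000, 2000, 4000, 8000):
        nt, aa = simulate_coverage(sites, n, seed=1)
        print(f"{n:>5} draws: nucleotide {nt:.3f}  amino acid {aa:.3f}")
```

Under these assumptions, each amino-acid variant is encoded by one or more equally likely codon combinations. Its probability is therefore never below that of a single nucleotide variant, so the expected amino-acid coverage is at least the nucleotide coverage for the same number of draws. This is in line with the qualitative contrast the abstract draws between nucleotide and amino-acid sampling. A real library with codon or synthesis bias would require replacing the uniform-probability assumption with measured variant frequencies.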

Source journal
Journal of Molecular Evolution (Biology: Evolutionary Biology)
CiteScore: 5.50
Self-citation rate: 2.60%
Articles published: 36
Review time: 3 months
Journal description: Journal of Molecular Evolution covers experimental, computational, and theoretical work aimed at deciphering features of molecular evolution and the processes bearing on these features, from the initial formation of macromolecular systems through their evolution at the molecular level, the co-evolution of their functions in cellular and organismal systems, and their influence on organismal adaptation, speciation, and ecology. Topics addressed include the evolution of informational macromolecules and their relation to more complex levels of biological organization, including populations and taxa, as well as the molecular basis for the evolution of ecological interactions of species and the use of molecular data to infer fundamental processes in evolutionary ecology. This coverage accommodates such subfields as new genome sequences, comparative structural and functional genomics, population genetics, the molecular evolution of development, the evolution of gene regulation and gene interaction networks, in vitro evolution of DNA and RNA, molecular evolutionary ecology, and the development of methods and theory that enable molecular evolutionary inference, including, but not limited to, phylogenetic methods.