{"title":"利用蛋白质语言模型进行序列设计的强化学习","authors":"Jithendaraa Subramanian, Shivakanth Sujit, Niloy Irtisam, Umong Sain, Derek Nowrouzezahrai, Samira Ebrahimi Kahou, Riashat Islam","doi":"arxiv-2407.03154","DOIUrl":null,"url":null,"abstract":"Protein sequence design, determined by amino acid sequences, are essential to\nprotein engineering problems in drug discovery. Prior approaches have resorted\nto evolutionary strategies or Monte-Carlo methods for protein design, but often\nfail to exploit the structure of the combinatorial search space, to generalize\nto unseen sequences. In the context of discrete black box optimization over\nlarge search spaces, learning a mutation policy to generate novel sequences\nwith reinforcement learning is appealing. Recent advances in protein language\nmodels (PLMs) trained on large corpora of protein sequences offer a potential\nsolution to this problem by scoring proteins according to their biological\nplausibility (such as the TM-score). In this work, we propose to use PLMs as a\nreward function to generate new sequences. Yet the PLM can be computationally\nexpensive to query due to its large size. To this end, we propose an\nalternative paradigm where optimization can be performed on scores from a\nsmaller proxy model that is periodically finetuned, jointly while learning the\nmutation policy. We perform extensive experiments on various sequence lengths\nto benchmark RL-based approaches, and provide comprehensive evaluations along\nbiological plausibility and diversity of the protein. Our experimental results\ninclude favorable evaluations of the proposed sequences, along with high\ndiversity scores, demonstrating that RL is a strong candidate for biological\nsequence design. Finally, we provide a modular open source implementation can\nbe easily integrated in most RL training loops, with support for replacing the\nreward model with other PLMs, to spur further research in this domain. The code\nfor all experiments is provided in the supplementary material.","PeriodicalId":501022,"journal":{"name":"arXiv - QuanBio - Biomolecules","volume":"11 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Reinforcement Learning for Sequence Design Leveraging Protein Language Models\",\"authors\":\"Jithendaraa Subramanian, Shivakanth Sujit, Niloy Irtisam, Umong Sain, Derek Nowrouzezahrai, Samira Ebrahimi Kahou, Riashat Islam\",\"doi\":\"arxiv-2407.03154\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Protein sequence design, determined by amino acid sequences, are essential to\\nprotein engineering problems in drug discovery. Prior approaches have resorted\\nto evolutionary strategies or Monte-Carlo methods for protein design, but often\\nfail to exploit the structure of the combinatorial search space, to generalize\\nto unseen sequences. In the context of discrete black box optimization over\\nlarge search spaces, learning a mutation policy to generate novel sequences\\nwith reinforcement learning is appealing. Recent advances in protein language\\nmodels (PLMs) trained on large corpora of protein sequences offer a potential\\nsolution to this problem by scoring proteins according to their biological\\nplausibility (such as the TM-score). In this work, we propose to use PLMs as a\\nreward function to generate new sequences. Yet the PLM can be computationally\\nexpensive to query due to its large size. 
To this end, we propose an\\nalternative paradigm where optimization can be performed on scores from a\\nsmaller proxy model that is periodically finetuned, jointly while learning the\\nmutation policy. We perform extensive experiments on various sequence lengths\\nto benchmark RL-based approaches, and provide comprehensive evaluations along\\nbiological plausibility and diversity of the protein. Our experimental results\\ninclude favorable evaluations of the proposed sequences, along with high\\ndiversity scores, demonstrating that RL is a strong candidate for biological\\nsequence design. Finally, we provide a modular open source implementation can\\nbe easily integrated in most RL training loops, with support for replacing the\\nreward model with other PLMs, to spur further research in this domain. The code\\nfor all experiments is provided in the supplementary material.\",\"PeriodicalId\":501022,\"journal\":{\"name\":\"arXiv - QuanBio - Biomolecules\",\"volume\":\"11 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-07-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - QuanBio - Biomolecules\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2407.03154\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - QuanBio - Biomolecules","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.03154","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Reinforcement Learning for Sequence Design Leveraging Protein Language Models
Protein sequence design, determined by amino acid sequences, is essential to protein engineering problems in drug discovery. Prior approaches have resorted to evolutionary strategies or Monte-Carlo methods for protein design, but they often fail to exploit the structure of the combinatorial search space or to generalize to unseen sequences. In the context of discrete black-box optimization over large search spaces, learning a mutation policy to generate novel sequences with reinforcement learning (RL) is appealing.
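For concreteness, the sketch below lays out the discrete search space such a mutation policy acts over: the state is the current amino-acid sequence and an action is a (position, residue) point mutation. This framing and the helper names are assumptions for illustration; the paper's exact formulation may differ.

```python
# Hypothetical sketch of the combinatorial action space for a mutation policy:
# each action rewrites one position of the sequence to one of the 20 residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical amino acids

def action_space(sequence: str):
    """All single point mutations reachable from `sequence`."""
    return [(pos, aa)
            for pos in range(len(sequence))
            for aa in AMINO_ACIDS
            if aa != sequence[pos]]

def apply_action(sequence: str, action) -> str:
    """Apply one (position, residue) mutation to the sequence."""
    pos, aa = action
    return sequence[:pos] + aa + sequence[pos + 1:]
```

A sequence of length L has 19·L single-step mutations, so the reachable space grows combinatorially with the number of steps, which is why a learned policy is attractive compared with exhaustive search.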
Recent advances in protein language models (PLMs) trained on large corpora of protein sequences offer a potential solution to this problem by scoring proteins according to their biological plausibility (such as the TM-score). In this work, we propose to use PLMs as a reward function to generate new sequences. Yet the PLM can be computationally expensive to query due to its large size.
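As an illustration of how a PLM could serve as a sequence-level reward, the sketch below scores a sequence with the pseudo-log-likelihood of a masked protein language model. The ESM-2 checkpoint, the function name, and the choice of pseudo-log-likelihood (rather than, e.g., a TM-score predictor) are assumptions made for this example and are not taken from the paper.

```python
# Hypothetical sketch: scoring a protein's plausibility with a masked PLM
# (ESM-2 via Hugging Face transformers is assumed here for illustration).
import torch
from transformers import AutoTokenizer, EsmForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
model = EsmForMaskedLM.from_pretrained("facebook/esm2_t6_8M_UR50D").eval()

def pll_reward(sequence: str) -> float:
    """Pseudo-log-likelihood: mask each residue in turn and sum the model's
    log-probability of the true amino acid. Higher means more plausible."""
    ids = tokenizer(sequence, return_tensors="pt")["input_ids"]
    total = 0.0
    with torch.no_grad():
        for i in range(1, ids.shape[1] - 1):        # skip BOS/EOS tokens
            masked = ids.clone()
            masked[0, i] = tokenizer.mask_token_id
            logits = model(input_ids=masked).logits
            logp = torch.log_softmax(logits[0, i], dim=-1)
            total += logp[ids[0, i]].item()
    return total
```

Note that even this small checkpoint requires one forward pass per residue, which illustrates why querying a large PLM at every policy step is costly and motivates the proxy-model scheme described next.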
To this end, we propose an alternative paradigm in which optimization is performed on scores from a smaller proxy model that is finetuned periodically, jointly with learning the mutation policy.
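A minimal sketch of this alternating scheme is given below, assuming generic `policy`, `proxy`, and `plm_score` objects; the names, update rules, and finetuning schedule are illustrative placeholders rather than the paper's actual implementation.

```python
# Illustrative sketch: optimize a mutation policy against a cheap proxy reward
# while periodically refitting the proxy to the expensive PLM's scores.
# `policy`, `proxy`, and `plm_score` are hypothetical stand-ins.

def mutate(seq: str, pos: int, residue: str) -> str:
    """Apply a single point mutation."""
    return seq[:pos] + residue + seq[pos + 1:]

def design(seq, policy, proxy, plm_score, steps=10_000, finetune_every=500):
    replay = []                                     # (sequence, PLM score) pairs
    for step in range(1, steps + 1):
        pos, residue = policy.sample_action(seq)    # propose a point mutation
        seq = mutate(seq, pos, residue)
        reward = proxy.score(seq)                   # cheap proxy query
        policy.update(seq, (pos, residue), reward)  # e.g. a policy-gradient step
        if step % finetune_every == 0:              # occasional expensive queries
            replay.append((seq, plm_score(seq)))
            proxy.finetune(replay)                  # keep the proxy close to the PLM
    return seq
```

The design intent is that the expensive PLM is only queried every few hundred steps, while the policy receives dense feedback from the proxy in between.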
We perform extensive experiments on various sequence lengths to benchmark RL-based approaches, and provide comprehensive evaluations along the axes of biological plausibility and diversity of the designed proteins. Our experimental results
include favorable evaluations of the proposed sequences, along with high
diversity scores, demonstrating that RL is a strong candidate for biological
sequence design. Finally, we provide a modular open-source implementation that can be easily integrated into most RL training loops, with support for replacing the reward model with other PLMs, to spur further research in this domain. The code
for all experiments is provided in the supplementary material.