Protein Folding Structure Prediction using Reinforcement Learning with Application to Both 2D and 3D Environments
Jason Lu
Proceedings of the 5th International Conference on Computer Science and Software Engineering, 2022-10-21
DOI: 10.1145/3569966.3570102 (https://doi.org/10.1145/3569966.3570102)
Abstract
Proteins are essential to life. They not only make up 10%-35% of our body tissues, but can also be used to understand the structures of different viruses, which in turn helps us explore effective vaccines. Hence, predicting new protein structures is very important for human health. However, protein structure is complicated, and exploring it through manual experiments is costly. Recently, artificial intelligence (AI) techniques such as imitation learning and reinforcement learning (RL) have developed rapidly and significantly improved efficiency in many domains. In this project, we use RL to address the protein folding structure prediction problem. First, we adopt the PH structure as a relatively simple representation of protein structure, in which peptides are categorized into two types: P (hydrophilic) and H (hydrophobic). The goal of protein folding is to form as many H pairs as possible during the folding process. We then formulate protein folding as a reinforcement learning process: whenever a new H pair is formed during folding, we collect a reward of -1. This reward is designed based on the protein dataset (Protein Data Bank). Finally, we implement three RL algorithms, 1) Q-learning, 2) Deep Q-learning, and 3) Double Deep Q-learning (DDQN), and compare them in terms of accuracy and efficiency. We find that all three algorithms can accurately predict the structures of simple proteins. As protein structures become more complicated, DDQN performs better.
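
To make the formulation above concrete, here is a minimal sketch of tabular Q-learning on a PH sequence. It is not the paper's implementation. Everything beyond the abstract is an assumption: the 2D square lattice, the state encoding (the tuple of coordinates placed so far), the four-move action set, the hyperparameters, and the hypothetical names `count_h_pairs`, `valid_actions`, `step`, and `q_learning`. Because the abstract assigns -1 for each new H pair, the sketch treats the accumulated reward as an energy to be minimized, so the update uses `min` where textbook Q-learning uses `max`. That sign handling is also an assumption.

```python
import random
from collections import defaultdict

# Unit moves on a 2D square lattice (an assumed action set).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def count_h_pairs(sequence, coords):
    """Count H-H pairs that are lattice neighbours but not chain neighbours."""
    index_at = {pos: i for i, pos in enumerate(coords)}
    pairs = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                pairs += 1
    return pairs


def valid_actions(coords):
    """Moves from the last placed residue that land on an empty lattice site."""
    occupied = set(coords)
    x, y = coords[-1]
    return [a for a, (dx, dy) in MOVES.items() if (x + dx, y + dy) not in occupied]


def step(sequence, coords, action):
    """Place the next residue; the reward is -1 per newly formed H pair."""
    dx, dy = MOVES[action]
    x, y = coords[-1]
    new_coords = coords + ((x + dx, y + dy),)
    new_pairs = count_h_pairs(sequence, new_coords) - count_h_pairs(sequence, coords)
    return new_coords, -new_pairs


def q_learning(sequence, episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning that minimizes summed reward (energy).

    Assumes len(sequence) >= 2. Returns (best_energy, best_fold).
    """
    rng = random.Random(seed)
    q = defaultdict(float)  # q[(state, action)]; state = coordinates placed so far
    best_energy, best_fold = 0, None
    for _ in range(episodes):
        coords = ((0, 0), (1, 0))  # fix the first bond to remove symmetric copies
        energy = 0
        while len(coords) < len(sequence):
            actions = valid_actions(coords)
            if not actions:
                break  # dead end: the chain trapped itself, abandon the episode
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = min(actions, key=lambda a: q[(coords, a)])  # exploit
            nxt, reward = step(sequence, coords, action)
            future = 0.0
            if len(nxt) < len(sequence):
                nxt_actions = valid_actions(nxt)
                if nxt_actions:
                    future = min(q[(nxt, a)] for a in nxt_actions)
            target = reward + gamma * future
            q[(coords, action)] += alpha * (target - q[(coords, action)])
            coords, energy = nxt, energy + reward
        complete = len(coords) == len(sequence)
        if complete and (best_fold is None or energy < best_energy):
            best_energy, best_fold = energy, coords
    return best_energy, best_fold


if __name__ == "__main__":
    energy, fold = q_learning("HPHPPHHPHH")
    print("lowest energy found:", energy)
    print("fold:", fold)
```

The table is keyed on the full partial fold, so it grows quickly with sequence length. That growth is one plausible reason to replace the table with a neural network, as Deep Q-learning and DDQN do. The sketch also does not penalize dead ends; it simply discards those episodes.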