Neural Bandits for Protein Sequence Optimization
Authors: Chenyu Wang, Joseph Kim, Le Cong, Mengdi Wang
Venue: 2022 56th Annual Conference on Information Sciences and Systems (CISS)
Publication date: 2022-03-09
DOI: 10.1109/CISS53076.2022.9751154 (https://doi.org/10.1109/CISS53076.2022.9751154)
Citations: 0
Abstract
Protein design involves searching over a large combinatorial sequence space. Evaluating the fitness of new protein sequences often requires wet-lab experiments that are costly and time-consuming. In this paper, we propose a neural bandit algorithm that uses a modified upper-confidence-bound (UCB) rule to accelerate the search for optimal designs. The algorithm makes adaptive queries guided by kernelized neural bandits. We test it on two public protein fitness datasets, the GB1 and WW domains. On both datasets, the algorithm consistently identifies top-fitness protein sequences. Notably, it finds a diverse and rich class of high-fitness proteins using substantially fewer design queries than a range of alternative methods.
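The adaptive-query idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the modified UCB rule or the neural/kernelized model, so the sketch swaps in a plain linear (ridge-regression) UCB over one-hot features. The alphabet, sequence length, `make_toy_oracle`, `beta`, and `ridge` are all hypothetical choices for illustration, and the oracle is a synthetic stand-in for a wet-lab fitness measurement.

```python
import itertools
import math
import random

ALPHABET = "ACDE"  # hypothetical reduced amino-acid alphabet
LENGTH = 3         # hypothetical toy sequence length


def one_hot(seq):
    """Flatten a sequence into a one-hot feature vector."""
    vec = [0.0] * (len(seq) * len(ALPHABET))
    for pos, aa in enumerate(seq):
        vec[pos * len(ALPHABET) + ALPHABET.index(aa)] = 1.0
    return vec


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


def make_toy_oracle(seed=0, noise=0.1):
    """Synthetic additive fitness with Gaussian noise (assumption, not real data)."""
    rng = random.Random(seed)
    weights = {(p, aa): rng.gauss(0.0, 1.0)
               for p in range(LENGTH) for aa in ALPHABET}

    def oracle(seq):
        clean = sum(weights[(p, aa)] for p, aa in enumerate(seq))
        return clean + rng.gauss(0.0, noise)

    return oracle


def ucb_search(candidates, oracle, budget, beta=1.0, ridge=1.0):
    """Linear UCB: query the unqueried sequence with the highest optimistic score."""
    budget = min(budget, len(candidates))
    d = len(one_hot(candidates[0]))
    # a_inv tracks the inverse of (ridge * I + sum of x x^T); b tracks sum of y * x.
    a_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(d)]
             for i in range(d)]
    b = [0.0] * d
    history, queried = [], set()
    for _ in range(budget):
        theta = mat_vec(a_inv, b)  # ridge-regression estimate of the weights
        best_seq, best_score = None, -math.inf
        for seq in candidates:
            if seq in queried:
                continue
            x = one_hot(seq)
            width = math.sqrt(max(dot(x, mat_vec(a_inv, x)), 0.0))
            score = dot(theta, x) + beta * width  # mean plus exploration bonus
            if score > best_score:
                best_seq, best_score = seq, score
        y = oracle(best_seq)  # the costly "experiment"
        history.append((best_seq, y))
        queried.add(best_seq)
        # Sherman-Morrison rank-one update of the inverse.
        x = one_hot(best_seq)
        ax = mat_vec(a_inv, x)
        denom = 1.0 + dot(x, ax)
        for i in range(d):
            for j in range(d):
                a_inv[i][j] -= ax[i] * ax[j] / denom
        for i in range(d):
            b[i] += y * x[i]
    return history


candidates = ["".join(p) for p in itertools.product(ALPHABET, repeat=LENGTH)]
history = ucb_search(candidates, make_toy_oracle(seed=0), budget=20)
best_seq, best_fitness = max(history, key=lambda pair: pair[1])
print(best_seq, round(best_fitness, 3))
```

The exploration bonus `beta * width` shrinks for sequences that resemble those already measured, so the loop spends its limited query budget on candidates that are either predicted to be fit or still poorly understood. In a neural-bandit variant, the linear model and its confidence width would presumably be replaced by quantities derived from a neural network.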