{"title":"Convolutions are competitive with transformers for protein sequence pretraining.","authors":"Kevin K Yang, Nicolo Fusi, Alex X Lu","doi":"10.1016/j.cels.2024.01.008","DOIUrl":null,"url":null,"abstract":"<p><p>Pretrained protein sequence language models have been shown to improve the performance of many prediction tasks and are now routinely integrated into bioinformatics tools. However, these models largely rely on the transformer architecture, which scales quadratically with sequence length in both run-time and memory. Therefore, state-of-the-art models have limitations on sequence length. To address this limitation, we investigated whether convolutional neural network (CNN) architectures, which scale linearly with sequence length, could be as effective as transformers in protein language models. With masked language model pretraining, CNNs are competitive with, and occasionally superior to, transformers across downstream applications while maintaining strong performance on sequences longer than those allowed in the current state-of-the-art transformer models. Our work suggests that computational efficiency can be improved without sacrificing performance, simply by using a CNN architecture instead of a transformer, and emphasizes the importance of disentangling pretraining task and model architecture. A record of this paper's transparent peer review process is included in the supplemental information.</p>","PeriodicalId":93929,"journal":{"name":"Cell systems","volume":" ","pages":"286-294.e2"},"PeriodicalIF":0.0000,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cell systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1016/j.cels.2024.01.008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/2/29 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Pretrained protein sequence language models have been shown to improve the performance of many prediction tasks and are now routinely integrated into bioinformatics tools. However, these models largely rely on the transformer architecture, which scales quadratically with sequence length in both run-time and memory. Therefore, state-of-the-art models have limitations on sequence length. To address this limitation, we investigated whether convolutional neural network (CNN) architectures, which scale linearly with sequence length, could be as effective as transformers in protein language models. With masked language model pretraining, CNNs are competitive with, and occasionally superior to, transformers across downstream applications while maintaining strong performance on sequences longer than those allowed in the current state-of-the-art transformer models. Our work suggests that computational efficiency can be improved without sacrificing performance, simply by using a CNN architecture instead of a transformer, and emphasizes the importance of disentangling pretraining task and model architecture. A record of this paper's transparent peer review process is included in the supplemental information.