Deep learning architectures achieve state-of-the-art (SOTA) accuracy in protein secondary structure prediction.
Authors: Zahra Nikfarjam, Majid Jafari, Farshid Zargari
Journal: Molecular Diversity (Impact Factor 3.8; JCR Q2, Chemistry, Applied)
Publication date: 2026-05-03
Publication type: Journal Article
DOI: 10.1007/s11030-026-11570-x (https://doi.org/10.1007/s11030-026-11570-x)
Protein secondary structure prediction represents an important intermediate step between a protein's linear amino acid sequence and its three-dimensional structure, with broad implications for synthetic biology, drug development, and disease research. Although experimental techniques such as X-ray crystallography provide highly accurate structural information, they are labor-intensive, time-consuming, and costly, which has motivated the development of computational alternatives. Early machine-learning approaches to this problem were limited in their ability to capture complex sequence-structure relationships. The introduction of convolutional and recurrent neural networks improved hierarchical feature extraction, and predictive performance advanced further with transformer-based architectures such as AlphaFold2. This review outlines recent advances in hybrid model design, benchmark datasets, and evaluation metrics for protein secondary structure prediction. We also discuss current methodological limitations, including data dependency and dataset bias, and outline future directions such as cross-species validation, uncertainty-aware modeling, and the still-emerging potential of incorporating heterogeneous biological data into next-generation PSSP frameworks.
About the journal:
Molecular Diversity is a new publication forum for the rapid publication of refereed papers dedicated to describing the development, application and theory of molecular diversity and combinatorial chemistry in basic and applied research and drug discovery. The journal publishes both short and full papers, perspectives, news and reviews dealing with all aspects of the generation of molecular diversity, application of diversity for screening against alternative targets of all types (biological, biophysical, technological), analysis of results obtained and their application in various scientific disciplines/approaches including:
combinatorial chemistry and parallel synthesis;
small molecule libraries;
microwave synthesis;
flow synthesis;
fluorous synthesis;
diversity oriented synthesis (DOS);
nanoreactors;
click chemistry;
multiplex technologies;
fragment- and ligand-based design;
structure/function/SAR;
computational chemistry and molecular design;
chemoinformatics;
screening techniques and screening interfaces;
analytical and purification methods;
robotics, automation and miniaturization;
targeted libraries;
display libraries;
peptides and peptoids;
proteins;
oligonucleotides;
carbohydrates;
natural diversity;
new methods of library formulation and deconvolution;
directed evolution, origin of life and recombination;
search techniques, landscapes, random chemistry and more.