Updated benchmarking of variant effect predictors using deep mutational scanning.
Authors: Benjamin J Livesey, Joseph A Marsh
Journal: Molecular Systems Biology, volume 19, issue 8, e11474
Published: 2023-08-08 (Epub 2023-06-13)
DOI: https://doi.org/10.15252/msb.202211474
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10407742/pdf/
The assessment of variant effect predictor (VEP) performance is fraught with biases introduced by benchmarking against clinical observations. In this study, building on our previous work, we use independently generated measurements of protein function from deep mutational scanning (DMS) experiments for 26 human proteins to benchmark 55 different VEPs, while introducing minimal data circularity. Many top-performing VEPs are unsupervised methods including EVE, DeepSequence and ESM-1v, a protein language model that ranked first overall. However, the strong performance of recent supervised VEPs, in particular VARITY, shows that developers are taking data circularity and bias issues seriously. We also assess the performance of DMS and unsupervised VEPs for discriminating between known pathogenic and putatively benign missense variants. Our findings are mixed, demonstrating that some DMS datasets perform exceptionally at variant classification, while others are poor. Notably, we observe a striking correlation between VEP agreement with DMS data and performance in identifying clinically relevant variants, strongly supporting the validity of our rankings and the utility of DMS for independent benchmarking.
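As a rough illustration of what benchmarking predictors against DMS data can look like, the sketch below ranks predictors by how well their scores track DMS measurements for one protein. It assumes Spearman rank correlation as the agreement measure and takes its absolute value so that predictors with opposite score directions stay comparable; the abstract itself does not state the metric. The predictor names and all numbers are made up for the example.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def rank_predictors(dms_scores, predictions):
    """Sort predictors by absolute Spearman correlation with the DMS scores."""
    agreement = {
        name: abs(spearman(dms_scores, scores))
        for name, scores in predictions.items()
    }
    return sorted(agreement.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: DMS functional scores for five variants of one protein
# (higher = more functional) and two invented predictors (higher = more damaging).
dms = [0.95, 0.10, 0.60, 0.30, 0.80]
preds = {
    "predictor_a": [0.1, 0.9, 0.4, 0.7, 0.2],
    "predictor_b": [0.5, 0.6, 0.1, 0.9, 0.3],
}
ranking = rank_predictors(dms, preds)
for name, rho in ranking:
    print(f"{name}: |rho| = {rho:.2f}")
```

In a benchmark spanning many proteins, one plausible design is to compute this per protein and then aggregate the per-protein ranks, so that no single large dataset dominates the overall ordering.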
About the journal:
Systems biology is a field that aims to understand complex biological systems by studying their components and how they interact. It is an integrative discipline that seeks to explain the properties and behavior of these systems.
Molecular Systems Biology is a peer-reviewed scholarly journal that publishes high-quality research in systems biology, synthetic biology, and systems medicine. It is an open access journal, so its content is freely available to readers.