Variant Effect Prediction in the Age of Machine Learning

IF 6.9 · Zone 2 (Biology) · JCR Q1 (Cell Biology)
Yana Bromberg, R. Prabakaran, Anowarul Kabir, Amarda Shehu
Cold Spring Harbor Perspectives in Biology. Journal Article. Published 2024-04-15. DOI: 10.1101/cshperspect.a041467 (https://doi.org/10.1101/cshperspect.a041467)

Abstract

Over the years, many computational methods have been created to analyze the impact of single amino acid substitutions resulting from single-nucleotide variants in genome coding regions. Historically, all methods have been supervised and thus limited by the inadequate sizes of experimentally curated data sets and by the lack of a standardized definition of variant effect. The emergence of unsupervised, deep learning (DL)-based methods raised an important question: Can machines learn the language of life from unannotated protein sequence data well enough to identify significant errors in the protein “sentences”? Our analysis suggests that some unsupervised methods perform as well as or better than existing supervised methods. Unsupervised methods are also faster and can thus be useful in large-scale variant evaluations. For all other methods, however, performance varies both by evaluation metric and by the type of variant effect being predicted. We also note that method performance has still not been adequately evaluated on less-studied, nonhuman proteins, where unsupervised methods hold the most promise.
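The point that performance varies by evaluation metric can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the article: the six toy variants, their measured effects, the binary labels, and the two hypothetical predictors `pred_a` and `pred_b`. The sketch compares a classification metric (ROC AUC) with a ranking metric (Spearman correlation) and shows that the two can disagree about which predictor is better.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: measured effect per variant and a binary label
# (1 = effect, 0 = no effect) obtained by thresholding at 0.5.
measured = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]

# Hypothetical predictor A: separates the classes, misorders within each class.
pred_a = [0.3, 0.2, 0.1, 0.9, 0.8, 0.7]
# Hypothetical predictor B: near-perfect ordering, one swap across the threshold.
pred_b = [0.1, 0.2, 0.4, 0.3, 0.8, 0.9]

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(f"{name}: AUC={roc_auc(labels, pred):.3f} "
          f"Spearman={spearman(measured, pred):.3f}")
# A: AUC=1.000 Spearman=0.543
# B: AUC=0.889 Spearman=0.943
```

On this toy data, predictor A wins on AUC while predictor B wins on Spearman correlation, so the ranking of methods depends on whether the task is framed as classifying variants or as ordering them by effect size.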
Source journal: Cold Spring Harbor Perspectives in Biology
CiteScore: 15.00
Self-citation rate: 1.40%
Articles published: 56
Review time: 3-8 weeks
Journal description: Cold Spring Harbor Perspectives in Biology is an online publication in the molecular life sciences. It features in-depth reviews spanning molecular, cell, and developmental biology, genetics, neuroscience, immunology, cancer biology, and molecular pathology.