Variant Classification Discordance: Contributing Factors and Predictive Models

IF 3.4, Zone 3 (Medicine), JCR Q1 (Pathology)
Hamid Ghaedi, Scott K. Davey, Harriet Feilotter
Journal of Molecular Diagnostics, published 2023-11-24
DOI: 10.1016/j.jmoldx.2023.11.002
Article page: https://www.sciencedirect.com/science/article/pii/S1525157823002738
Citations: 0

Abstract


An ever-growing catalog of human variants is hosted in the ClinVar database. In this database, submissions on a variant are combined into a multisubmitter record; when submitters disagree on the variant's classification, the record is labeled as conflicting. The current study used ClinVar data to identify characteristics that make a variant more likely to fall into the conflicting class. Furthermore, the Extreme Gradient Boosting (XGBoost) algorithm was used to train classifier models that predict classification discordance for single-submission variants in the ClinVar database. Population allele frequency, the gene harboring the variant, variant type, consequence on protein, variant deleteriousness score, first submitter identity, and submission count were associated with conflict in variant classification. Using these features, the optimized classifier achieved an accuracy of 88% on the test set, with weighted-average precision, recall, and F1-score of 0.84, 0.88, and 0.85, respectively. There were pronounced associations between variant classification discordance and allele frequency, gene type, and the identity of the first submitter. The study provides the predicted discordance status for single-submitter variants deposited in ClinVar. This approach can be used to assess whether single-submitter variants are likely to be supported by, or in conflict with, future entries; this knowledge may help laboratories with clinical variant assessment.
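
The abstract reports a test accuracy of 88% alongside a weighted-average recall of 0.88. The match is expected: when each class's recall is weighted by its support, the weighted recall reduces to overall accuracy. The sketch below illustrates this with plain Python. The labels are invented toy data, not ClinVar records or results from the study, and the function name `weighted_metrics` is a hypothetical helper introduced only for this illustration.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n

    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        weight = support[label] / n  # class share of the true labels
        totals["precision"] += weight * precision
        totals["recall"] += weight * recall
        totals["f1"] += weight * f1

    return {"accuracy": accuracy, **totals}


# Toy labels (assumed for illustration only): 8 concordant, 2 conflicting.
y_true = ["concordant"] * 8 + ["conflicting"] * 2
y_pred = ["concordant"] * 7 + ["conflicting"] + ["conflicting", "concordant"]

metrics = weighted_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because the classes are imbalanced in this kind of task, weighted averages are dominated by the majority class, so per-class precision and recall for the conflicting class are worth inspecting separately.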

Source journal
CiteScore: 8.10
Self-citation rate: 2.40%
Articles published: 143
Review time: 43 days
About the journal: The Journal of Molecular Diagnostics, the official publication of the Association for Molecular Pathology (AMP), co-owned by the American Society for Investigative Pathology (ASIP), seeks to publish high-quality original papers on scientific advances in the translation and validation of molecular discoveries in medicine into the clinical diagnostic setting, and the description and application of technological advances in the field of molecular diagnostic medicine. The editors welcome for review articles that contain: novel discoveries or clinicopathologic correlations including studies in oncology, infectious diseases, inherited diseases, predisposition to disease, clinical informatics, or the description of polymorphisms linked to disease states or normal variations; the application of diagnostic methodologies in clinical trials; or the development of new or improved molecular methods which may be applied to diagnosis or monitoring of disease or disease predisposition.