Efficient HLA imputation from sequential SNPs data by transformer

IF 2.6 | Zone 3, Biology | Q2 GENETICS & HEREDITY
Kaho Tanaka, Kosuke Kato, Naoki Nonaka, Jun Seita
Journal of Human Genetics, vol. 69, no. 10, pp. 533-540 (published 2024-08-02)
DOI: 10.1038/s10038-024-01278-x
URL: https://www.nature.com/articles/s10038-024-01278-x
Citations: 0

Abstract

Human leukocyte antigen (HLA) genes are associated with a variety of diseases, yet direct typing of HLA alleles is both time-consuming and costly. Consequently, various imputation methods that leverage sequential single nucleotide polymorphism (SNP) data have been proposed. These employ either statistical models or deep learning models, such as DEEP*HLA, which is based on a convolutional neural network (CNN). However, these methods show limited imputation performance for infrequent alleles and require a large reference dataset. We therefore developed a Transformer-based model for HLA allele imputation, named “HLA Reliable IMputatioN by Transformer (HLARIMNT)”, which is designed to exploit the sequential nature of SNP data. We evaluated HLARIMNT’s performance using two distinct reference panels: the Pan-Asian reference panel (n = 530) and the Type 1 Diabetes Genetics Consortium (T1DGC) reference panel (n = 5225). We also used a combined panel (n = 1060). HLARIMNT achieved higher accuracy than DEEP*HLA across several indices, particularly for infrequent alleles. Furthermore, we examined how the size of the training data affects imputation accuracy and found that HLARIMNT consistently outperformed DEEP*HLA across all data sizes. These findings suggest that Transformer-based models can efficiently impute not only HLA types but potentially other gene types from sequential SNP data.
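The abstract does not specify HLARIMNT's architecture. What follows is therefore only a minimal sketch of the general idea it names. A window of SNPs is treated as an ordered token sequence, and self-attention relates every SNP position to every other before candidate HLA alleles are scored. Everything in the sketch is an assumption for illustration:

- The class `ToySNPAttentionClassifier` and its parameters (`n_alleles`, `d_model`, `seed`) are hypothetical.
- SNPs are assumed to be biallelic and coded 0/1.
- The weights are random rather than trained on a reference panel.
- A real Transformer would add multiple heads, feed-forward sublayers, residual connections and layer normalization.

```python
import math
import random


def positional_encoding(position: int, d_model: int) -> list[float]:
    """Sinusoidal positional encoding for one SNP position in the sequence."""
    enc = []
    for i in range(d_model):
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def matvec(matrix: list[list[float]], vec: list[float]) -> list[float]:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> list[list[float]]:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToySNPAttentionClassifier:
    """Hypothetical toy model, NOT HLARIMNT: allele embedding + positional encoding,
    one single-head self-attention layer, mean pooling, and a linear layer that
    scores n_alleles candidate HLA alleles. Weights are random (untrained)."""

    def __init__(self, n_alleles: int, d_model: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.d = d_model
        self.embed = random_matrix(2, d_model, rng)  # row 0: allele 0, row 1: allele 1
        self.w_q = random_matrix(d_model, d_model, rng)
        self.w_k = random_matrix(d_model, d_model, rng)
        self.w_v = random_matrix(d_model, d_model, rng)
        self.w_out = random_matrix(n_alleles, d_model, rng)

    def forward(self, snps: list[int]) -> list[float]:
        """Return a probability for each candidate allele given a 0/1 SNP sequence."""
        # Token = embedding of the observed allele + encoding of its position.
        tokens = [
            [e + p for e, p in zip(self.embed[a], positional_encoding(pos, self.d))]
            for pos, a in enumerate(snps)
        ]
        q = [matvec(self.w_q, t) for t in tokens]
        k = [matvec(self.w_k, t) for t in tokens]
        v = [matvec(self.w_v, t) for t in tokens]
        scale = math.sqrt(self.d)
        attended = []
        for qi in q:
            # Scaled dot-product attention: this SNP attends to every SNP in the window.
            weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
            attended.append(
                [sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(self.d)]
            )
        # Average over positions to get one vector for the whole SNP window.
        pooled = [sum(col) / len(attended) for col in zip(*attended)]
        return softmax(matvec(self.w_out, pooled))


if __name__ == "__main__":
    model = ToySNPAttentionClassifier(n_alleles=4)
    probs = model.forward([0, 1, 1, 0, 1, 0, 0, 1])
    best = max(range(len(probs)), key=probs.__getitem__)
    print(f"predicted allele index: {best}, probabilities: {probs}")
```

The positional encoding is what carries SNP order into this sketch. Self-attention is permutation-equivariant and mean pooling is permutation-invariant. Without the encoding, any reordering of the same SNPs would therefore yield identical allele probabilities.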


Source journal: Journal of Human Genetics (Biology: Genetics)
CiteScore: 7.20
Self-citation rate: 0.00%
Articles published: 101
Review time: 4-8 weeks
About the journal: The Journal of Human Genetics is an international journal publishing articles on human genetics, including medical genetics and human genome analysis. It covers all aspects of human genetics, including molecular genetics, clinical genetics, behavioral genetics, immunogenetics, pharmacogenomics, population genetics, functional genomics, epigenetics, genetic counseling and gene therapy. Articles on the following areas are especially welcome: genetic factors of monogenic and complex disorders, genome-wide association studies, genetic epidemiology, cancer genetics, personal genomics, genotype-phenotype relationships and genome diversity.