Huvariome: a web server resource of whole genome next-generation sequencing allelic frequencies to aid in pathological candidate gene selection.

Andrew Stubbs, Elizabeth A McClellan, Sebastiaan Horsman, Saskia D Hiltemann, Ivo Palli, Stephan Nouwens, Anton Hj Koning, Frits Hoogland, Joke Reumers, Daphne Heijsman, Sigrid Swagemakers, Andreas Kremer, Jules Meijerink, Diether Lambrechts, Peter J van der Spek
Journal of Clinical Bioinformatics, volume 2, issue 1, article 19. Published 2012-11-19. DOI: 10.1186/2043-9113-2-19 (https://doi.org/10.1186/2043-9113-2-19)
Citations: 20

Abstract

Background: Next generation sequencing (NGS) provides clinical research scientists with a direct readout of innumerable variants, including personal, pathological and common benign variants. The aim of resequencing studies is to determine the candidate pathogenic variants from individual genomes, or from family-based or tumor/normal genome comparisons. Whilst appropriate controls within the experimental design will minimize the number of false positive variations selected, this number can be reduced further by using high quality whole genome reference data to filter out false positive variants prior to candidate gene selection. In addition, platform-related sequencing error models can help in the recovery of ambiguous genotypes from lower coverage data.

Description: We have developed Huvariome, a whole genome database of human genetic variation determined from deep whole genome sequencing data with high coverage and low error rates. The database was designed to be independent of sequencing technology but is currently populated with 165 individual whole genomes, consisting of small pedigrees and matched tumor/normal samples, sequenced on the Complete Genomics platform. Common variants have been determined for a Benelux population cohort and are represented as genotypes alongside the results of two sets of control data (73 of the 165 genomes): Huvariome Core, which comprises 31 healthy individuals from the Benelux region, and the Diversity Panel, which consists of 46 healthy individuals representing 10 different populations and 21 samples in three pedigrees. Users can query the database by gene or position via a web interface, and the results are displayed as the frequency of the variations detected in the datasets. We demonstrate that Huvariome can provide accurate reference allele frequencies to disambiguate sequencing inconsistencies produced in resequencing experiments. Huvariome has been used to support the selection of candidate cardiomyopathy-related genes which have a homozygous genotype in the reference cohorts. The database allows users to see which selected variants are common variants (> 5% minor allele frequency) in the Huvariome Core samples, thus aiding the selection of potentially pathogenic variants by filtering out common variants that are not listed in one of the other public genomic variation databases. The no-call rate and the accuracy of allele calling in Huvariome give the user the possibility of identifying platform-dependent errors associated with specific regions of the human genome.
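
The common-variant criterion described above (> 5% minor allele frequency) can be illustrated with a minimal sketch. This is not Huvariome's implementation or interface: the genotype representation (pairs of alleles, with `None` for a no-call), the function names and the example cohorts are all assumptions made for illustration only.

```python
from collections import Counter

# Threshold taken from the abstract: common means > 5% minor allele frequency.
COMMON_MAF_THRESHOLD = 0.05


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency at one site.

    `genotypes` is a list of diploid calls, each a 2-tuple of alleles.
    A no-call allele is represented as None (an assumption of this sketch)
    and is excluded from both numerator and denominator.
    """
    counts = Counter(a for gt in genotypes for a in gt if a is not None)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        # No called alleles, or only one allele observed: nothing is "minor".
        return 0.0
    # The minor allele is taken to be the second most frequent allele.
    minor_count = counts.most_common(2)[1][1]
    return minor_count / total


def is_common(genotypes, threshold=COMMON_MAF_THRESHOLD):
    """True if the site would be treated as a common variant."""
    return minor_allele_frequency(genotypes) > threshold


# Two hypothetical sites in a 31-sample cohort (invented data).
rare_site = [("A", "A")] * 28 + [("A", "G")] * 2 + [(None, None)]
common_site = [("A", "A")] * 25 + [("A", "G")] * 5 + [("G", "G")]

print(is_common(rare_site))    # MAF = 2/60
print(is_common(common_site))  # MAF = 7/62
```

Excluding no-calls from the denominator is one possible design choice; a site with many no-calls would then yield a frequency based on few alleles, which is one reason a no-call rate is worth reporting alongside the frequency.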

Conclusion: Huvariome is a simple-to-use resource for validating resequencing results obtained by NGS experiments. The high sequence coverage and low error rates give scientists the ability to remove false positive results from pedigree studies. Results are returned via a web interface that displays location-based genetic variation frequency, impact on protein function, association with known genetic variations and a quality score of the variation base derived from Huvariome Core and the Diversity Panel data. These results may be used to identify and prioritize rare variants that, for example, might be disease relevant. In testing the accuracy of the Huvariome database, alleles of a selection of ambiguously called coding single nucleotide variants were successfully predicted in all cases. Data protection of individuals is ensured by restricting access to patient-derived genomes from the host institution, which is relevant for future molecular diagnostics.
