Effective primer design for genotype and subtype detection of highly divergent viruses in large scale genome datasets.

IF 3.3 | Zone 3, Biology | JCR Q2, BIOCHEMICAL RESEARCH METHODS
Burak Demiralay, Tolga Can
BMC Bioinformatics, 2025, 26(1): 223. Published 2025-09-01. DOI: 10.1186/s12859-025-06251-9
Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12400757/pdf/
Citations: 0

Abstract

Identification of microorganisms in a biological sample is a crucial step in diagnostics, pathogen screening, biomedical research, evolutionary studies, agriculture, and biological threat assessment. While progress has been made in studying larger organisms, there is still a need for an efficient, scalable method that can handle thousands of whole genomes from organisms with high mutation rates and high genetic diversity, such as single-stranded viruses. In this study, we developed a novel method that identifies subsequences suitable for detecting a given species or subspecies in a (meta)genomic sample by the polymerase chain reaction (PCR). Species detection in any analysis depends heavily on the measurement method, and because thermodynamic interactions are critical in PCR, thermodynamics is the main driving force of the proposed methodology. Our method is parallelized across multiple steps and begins by extracting all oligonucleotides from the target genomes. We then locate the target sites of each oligonucleotide using a suffix array and local alignment, followed by an assessment of thermodynamic interactions. An important requirement for subspecies identification is to avoid amplifying non-target genomes, and our method addresses this. We applied our method to three highly divergent viruses: (1) hepatitis C virus (HCV), whose subtypes differ at 31-33% of nucleotide sites on average; (2) human immunodeficiency virus (HIV), which shows 25-35% between-subtype and 15-20% within-subtype variation; and (3) dengue virus, whose genomes (DENV 1-4 only) share 60% sequence identity with one another. Using our method, we selected oligonucleotides that identify, in silico, 99.9% of 1,657 HCV genomes, 99.7% of 11,838 HIV genomes, and 95.4% of 4,016 dengue genomes. We also demonstrate subspecies identification on HCV genotypes 1-6 and dengue virus genotypes 1-4 with, on average, a true positive rate above 99.5% and a false positive rate below 0.05%. None of the state-of-the-art methods can produce oligonucleotides with this specificity and sensitivity for highly divergent viral genomes such as those studied in this article.
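The abstract outlines a pipeline: enumerate every oligonucleotide in the target genomes, locate its binding sites with a suffix array, and discard candidates that would also amplify non-target genomes. The Python sketch below illustrates only that skeleton, under simplifying assumptions of our own. Exact string matching stands in for the authors' local alignment and thermodynamic assessment, single oligos are treated as probes rather than forward/reverse primer pairs, and the greedy set-cover selection and all function names (`extract_kmers`, `build_index`, `find_sites`, `genomes_hit`, `greedy_select`) are hypothetical, not taken from the paper.

```python
from bisect import bisect_right

DNA = set("ACGT")


def extract_kmers(genomes, k):
    """Collect every length-k substring made only of A/C/G/T (candidate oligos)."""
    kmers = set()
    for g in genomes:
        for i in range(len(g) - k + 1):
            kmer = g[i:i + k]
            if set(kmer) <= DNA:
                kmers.add(kmer)
    return kmers


def build_index(genomes):
    """Join genomes with '$' separators and build a suffix array over the result.

    Returns (text, suffix_array, genome_start_offsets). The naive sort is
    O(n^2 log n) and only suitable for toy inputs.
    """
    text = "$".join(genomes) + "$"
    starts, pos = [], 0
    for g in genomes:
        starts.append(pos)
        pos += len(g) + 1  # +1 skips the '$' separator
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return text, sa, starts


def find_sites(text, sa, pattern):
    """Return sorted start positions of exact matches of `pattern` in `text`.

    Suffixes that begin with `pattern` form one contiguous block of the
    suffix array, located here with two binary searches.
    """
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound: first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:  # upper bound: first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


def genomes_hit(text, sa, starts, pattern):
    """Map each exact-match site back to the index of the genome containing it."""
    return {bisect_right(starts, p) - 1 for p in find_sites(text, sa, pattern)}


def greedy_select(targets, non_targets, k, max_oligos=10):
    """Hypothetical greedy set cover: pick k-mers covering the most still-uncovered
    target genomes while matching no non-target genome. Returns (oligos, coverage)."""
    t_index = build_index(targets)
    n_index = build_index(non_targets)
    candidates = {}
    for kmer in extract_kmers(targets, k):
        if genomes_hit(*n_index, kmer):
            continue  # would also detect a non-target genome: reject
        candidates[kmer] = genomes_hit(*t_index, kmer)
    chosen, covered = [], set()
    while candidates and len(chosen) < max_oligos and len(covered) < len(targets):
        # Ties are broken by the k-mer string itself so the result is deterministic.
        best = max(candidates, key=lambda s: (len(candidates[s] - covered), s))
        gain = candidates.pop(best) - covered
        if not gain:
            break
        chosen.append(best)
        covered |= gain
    coverage = len(covered) / len(targets) if targets else 0.0
    return chosen, coverage
```

For example, `greedy_select(["AAAACCCC", "GGGGCCCC"], ["TTTTCCCC"], k=4)` rejects `CCCC`, which both targets share but which also occurs in the non-target genome, and instead returns one genome-specific 4-mer per target (`["GGGG", "ACCC"]`) with coverage 1.0. A realistic implementation would replace the naive suffix sort with a linear-time construction such as SA-IS and score approximate binding sites thermodynamically rather than requiring exact matches.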

Source journal: BMC Bioinformatics (Biology, Biochemical Research Methods)
CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months
About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.