有效的引物设计用于在大规模基因组数据集中检测高度分化病毒的基因型和亚型。

IF 3.3 3区生物学 Q2 BIOCHEMICAL RESEARCH METHODS

BMC Bioinformatics Pub Date : 2025-09-01 DOI:10.1186/s12859-025-06251-9

Burak Demiralay, Tolga Can

{"title":"有效的引物设计用于在大规模基因组数据集中检测高度分化病毒的基因型和亚型。","authors":"Burak Demiralay, Tolga Can","doi":"10.1186/s12859-025-06251-9","DOIUrl":null,"url":null,"abstract":"Identification of microorganisms in a biological sample is a crucial step in diagnostics, pathogen screening, biomedical research, evolutionary studies, agriculture, and biological threat assessment. While progress has been made in studying larger organisms, there is a need for an efficient and scalable method that can handle thousands of whole genomes for organisms with high mutation rates and genetic diversity such as single stranded viruses. In this study, we developed a novel method to identify subsequences for detection of a given species/subspecies in a (meta)genomic sample using the Polymerase Chain Reaction (PCR) method. Species detection in any analysis depends highly on the measurement method and since thermodynamic interactions are critical in PCR, thermodynamics is the main driving force in the proposed methodology. Our method is parallelized in multiple steps and involves extracting all oligonucleotides from target genomes. We then locate the target sites for each oligonucleotide using the constructed suffix array and local alignment followed by thermodynamic interaction assessment. An important requirement for subspecies identification is to avoid amplifying a non-target set of genomes and our method addresses this. We applied our method to three highly divergent viruses; (1) Hepatitis C virus (HCV), where the subtypes differ in 31-33% of nucleotide sites on average, (2) Human immunodeficiency virus (HIV), for which, 25-35% between-subtype and 15-20% within-subtype variation is observed, and (3) the Dengue virus, whose respective genomes (only DENV 1-4) share 60% sequence identity to each other. Using our method, we were able to select oligonucleotides that can identify in silico 99.9% of 1657 HCV genomes, 99.7% of 11,838 HIV genomes, and 95.4% of 4016 Dengue genomes. We also show subspecies identification on genotypes 1-6 of HCV and genotypes 1-4 of the Dengue virus with more than 99.5% true positive and less than 0.05% false positive rate, on average. None of the state-of-the-art methods can produce oligonucleotides with this specificity and sensitivity on highly divergent viral genomes like the ones studied in this article.","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"223"},"PeriodicalIF":3.3000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12400757/pdf/","citationCount":"0","resultStr":"{\"title\":\"Effective primer design for genotype and subtype detection of highly divergent viruses in large scale genome datasets.\",\"authors\":\"Burak Demiralay, Tolga Can\",\"doi\":\"10.1186/s12859-025-06251-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Identification of microorganisms in a biological sample is a crucial step in diagnostics, pathogen screening, biomedical research, evolutionary studies, agriculture, and biological threat assessment. While progress has been made in studying larger organisms, there is a need for an efficient and scalable method that can handle thousands of whole genomes for organisms with high mutation rates and genetic diversity such as single stranded viruses. In this study, we developed a novel method to identify subsequences for detection of a given species/subspecies in a (meta)genomic sample using the Polymerase Chain Reaction (PCR) method. Species detection in any analysis depends highly on the measurement method and since thermodynamic interactions are critical in PCR, thermodynamics is the main driving force in the proposed methodology. Our method is parallelized in multiple steps and involves extracting all oligonucleotides from target genomes. We then locate the target sites for each oligonucleotide using the constructed suffix array and local alignment followed by thermodynamic interaction assessment. An important requirement for subspecies identification is to avoid amplifying a non-target set of genomes and our method addresses this. We applied our method to three highly divergent viruses; (1) Hepatitis C virus (HCV), where the subtypes differ in 31-33% of nucleotide sites on average, (2) Human immunodeficiency virus (HIV), for which, 25-35% between-subtype and 15-20% within-subtype variation is observed, and (3) the Dengue virus, whose respective genomes (only DENV 1-4) share 60% sequence identity to each other. Using our method, we were able to select oligonucleotides that can identify in silico 99.9% of 1657 HCV genomes, 99.7% of 11,838 HIV genomes, and 95.4% of 4016 Dengue genomes. We also show subspecies identification on genotypes 1-6 of HCV and genotypes 1-4 of the Dengue virus with more than 99.5% true positive and less than 0.05% false positive rate, on average. None of the state-of-the-art methods can produce oligonucleotides with this specificity and sensitivity on highly divergent viral genomes like the ones studied in this article.\",\"PeriodicalId\":8958,\"journal\":{\"name\":\"BMC Bioinformatics\",\"volume\":\"26 1\",\"pages\":\"223\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2025-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12400757/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMC Bioinformatics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s12859-025-06251-9\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12859-025-06251-9","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

摘要

鉴定生物样品中的微生物是诊断、病原体筛选、生物医学研究、进化研究、农业和生物威胁评估的关键步骤。虽然在研究大型生物体方面取得了进展，但仍需要一种有效和可扩展的方法，能够处理具有高突变率和遗传多样性的生物体（如单链病毒）的数千个全基因组。在这项研究中，我们开发了一种新的方法来识别子序列，用于检测（meta）基因组样本中的给定物种/亚种，使用聚合酶链反应（PCR）方法。任何分析中的物种检测都高度依赖于测量方法，并且由于热力学相互作用在PCR中至关重要，因此热力学是所提出方法的主要驱动力。我们的方法在多个步骤中并行化，并涉及从目标基因组中提取所有寡核苷酸。然后，我们使用构建的后缀阵列和局部比对以及热力学相互作用评估来定位每个寡核苷酸的目标位点。亚种鉴定的一个重要要求是避免扩增非目标基因组集，我们的方法解决了这个问题。我们将这种方法应用于三种高度分化的病毒；(1)丙型肝炎病毒（HCV），其亚型之间的核苷酸位点平均差异为31-33%；(2)人类免疫缺陷病毒（HIV），其亚型之间的差异为25-35%，亚型内的差异为15-20%；(3)登革热病毒，其各自的基因组（仅DENV 1-4）具有60%的序列一致性。使用我们的方法，我们能够选择能够在计算机上识别99.9%的1657个HCV基因组，99.7%的11838个HIV基因组和95.4%的4016个登革热基因组的寡核苷酸。HCV基因型1-6和登革病毒基因型1-4的亚种鉴定结果显示，平均真阳性率大于99.5%，假阳性率小于0.05%。没有一种最先进的方法可以像本文所研究的那样，在高度分化的病毒基因组上产生具有这种特异性和敏感性的寡核苷酸。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Effective primer design for genotype and subtype detection of highly divergent viruses in large scale genome datasets.

查看原文本刊更多论文

Effective primer design for genotype and subtype detection of highly divergent viruses in large scale genome datasets.

Identification of microorganisms in a biological sample is a crucial step in diagnostics, pathogen screening, biomedical research, evolutionary studies, agriculture, and biological threat assessment. While progress has been made in studying larger organisms, there is a need for an efficient and scalable method that can handle thousands of whole genomes for organisms with high mutation rates and genetic diversity such as single stranded viruses. In this study, we developed a novel method to identify subsequences for detection of a given species/subspecies in a (meta)genomic sample using the Polymerase Chain Reaction (PCR) method. Species detection in any analysis depends highly on the measurement method and since thermodynamic interactions are critical in PCR, thermodynamics is the main driving force in the proposed methodology. Our method is parallelized in multiple steps and involves extracting all oligonucleotides from target genomes. We then locate the target sites for each oligonucleotide using the constructed suffix array and local alignment followed by thermodynamic interaction assessment. An important requirement for subspecies identification is to avoid amplifying a non-target set of genomes and our method addresses this. We applied our method to three highly divergent viruses; (1) Hepatitis C virus (HCV), where the subtypes differ in 31-33% of nucleotide sites on average, (2) Human immunodeficiency virus (HIV), for which, 25-35% between-subtype and 15-20% within-subtype variation is observed, and (3) the Dengue virus, whose respective genomes (only DENV 1-4) share 60% sequence identity to each other. Using our method, we were able to select oligonucleotides that can identify in silico 99.9% of 1657 HCV genomes, 99.7% of 11,838 HIV genomes, and 95.4% of 4016 Dengue genomes. We also show subspecies identification on genotypes 1-6 of HCV and genotypes 1-4 of the Dengue virus with more than 99.5% true positive and less than 0.05% false positive rate, on average. None of the state-of-the-art methods can produce oligonucleotides with this specificity and sensitivity on highly divergent viral genomes like the ones studied in this article.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

BMC Bioinformatics 生物-生化研究方法

CiteScore

5.70

自引率

3.30%

发文量

506

审稿时长

4.3 months

期刊介绍： BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.