NAR Genomics and Bioinformatics: Latest Articles

Benchmarking genetic interaction scoring methods for identifying synthetic lethality from combinatorial CRISPR screens.
IF 2.8
NAR Genomics and Bioinformatics, 7(3): lqaf129. Pub date: 2025-09-26; eCollection date: 2025-09-01; DOI: 10.1093/nargab/lqaf129
Hamda Ajmal, Sutanu Nandi, Narod Kebabci, Colm J Ryan
Abstract: Synthetic lethality (SL) is an extreme form of negative genetic interaction, where simultaneous disruption of two non-essential genes causes cell death. SL can be exploited to develop cancer therapies that target tumour cells with specific mutations, potentially limiting toxicity. Pooled combinatorial CRISPR screens, where two genes are simultaneously perturbed and the resulting impacts on fitness estimated, are now widely used for the identification of SL targets in cancer. Various scoring methods have been developed to infer SL genetic interactions from these screens, but there has been no systematic comparison of these approaches. Here, we performed a comprehensive analysis of five scoring methods for SL detection using five combinatorial CRISPR datasets. We assessed the performance of each algorithm on each screen dataset using two different benchmarks of paralog SL. We find that no single method performs best across all screens but identify two methods that perform well across most datasets. Of these two scores, Gemini-Sensitive has an available R package that can be applied to most screen designs, making it a reasonable first choice.
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12464814/pdf/
Citations: 0
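Combinatorial screens are scored by comparing the observed fitness effect of a double perturbation with the effect expected from the two single perturbations. A minimal sketch of that idea, assuming log fold changes (LFCs) are already computed and using the simplest expectation (additive on the log scale, i.e. a multiplicative fitness model). This is a generic illustration, not Gemini-Sensitive or any other score benchmarked in the study; the `PairLFC` type, gene labels and the -1.0 cut-off are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PairLFC:
    """Log2 fold changes for one gene pair (hypothetical input format)."""
    gene_a: str
    gene_b: str
    lfc_a: float   # single perturbation of gene A
    lfc_b: float   # single perturbation of gene B
    lfc_ab: float  # simultaneous perturbation of A and B


def gi_score(pair: PairLFC) -> float:
    """Observed minus expected double-perturbation LFC.

    Under a multiplicative fitness model the expected log-scale effect is
    lfc_a + lfc_b; negative scores mean a stronger-than-expected defect.
    """
    expected = pair.lfc_a + pair.lfc_b
    return pair.lfc_ab - expected


def call_synthetic_lethal(pairs, threshold=-1.0):
    """Return (gene_a, gene_b, score) for pairs scoring below an arbitrary cut-off."""
    hits = []
    for p in pairs:
        score = gi_score(p)
        if score < threshold:
            hits.append((p.gene_a, p.gene_b, score))
    return hits
```

Real scoring methods add guide-level modelling, normalisation and error estimates that this sketch leaves out.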
Proteins need extra attention: improving the predictive power of protein language models on mutational datasets with hint tokens.
IF 2.8
NAR Genomics and Bioinformatics, 7(3): lqaf128. Pub date: 2025-09-26; eCollection date: 2025-09-01; DOI: 10.1093/nargab/lqaf128
Xinning Li, Ryann M Perez, Sam Giannakoulias, E James Petersson
Abstract: In this computational study, we address the challenge of predicting protein functions following mutations by fine-tuning protein language models (PLMs) using a novel tokenization strategy, hint token learning (HTL). To evaluate the effectiveness of HTL, we benchmarked this approach across four pretrained models with varying architectures and sizes on four diverse protein mutational datasets. Our results showed significant improvements in weighted F1 scores in most cases when HTL was applied. To understand how HTL enhances protein mutational predictions, we trained sparse autoencoders on embeddings derived from the fine-tuned PLMs. Analysis of the latent spaces revealed that the number of activated residues within functional protein domains increased by PLM training with HTL. These findings indicate that PLMs fine-tuned with HTL may capture more biologically relevant representations of proteins. Our study highlights the potential of HTL to advance protein function prediction and provides insights into how HTL enables PLMs to capture mutational impacts at the functional level. All data and code are available at: https://github.com/ejp-lab/EJPLab_Computational_Projects/tree/master/HintTokenLearning.
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12464817/pdf/
Citations: 0
nf-core/detaxizer: a benchmarking study for decontamination from human sequences.
IF 2.8
NAR Genomics and Bioinformatics, 7(3): lqaf125. Pub date: 2025-09-23; eCollection date: 2025-09-01; DOI: 10.1093/nargab/lqaf125
Jannik Seidel, Camill Kaipf, Daniel Straub, Sven Nahnsen
Abstract: Privacy is paramount in health data, particularly in human genetics, where information extends beyond individuals to their relatives. Metagenomic datasets contain substantial human genetic material, necessitating careful handling to mitigate data leakage risks when sharing or publishing. The same applies to genetic datasets from the environment or datasets from contaminated laboratory samples, although to a lesser extent. Completely removing human sequence data while retaining unbiased nonhuman reads is not achievable currently, but several tools exist. To address these topics, we developed nf-core/detaxizer, a nextflow-based pipeline that employs Kraken2 and bbmap/bbduk for taxonomic classification, identifying and optionally filtering Homo sapiens reads. Due to its generalized design, other taxa can also be classified and filtered. We benchmark its filtering efficacy for human reads against Hostile and CLEAN, demonstrating its utility for secure data preprocessing. The comparison showed that the choice of tool and database can result in differences of up to an order of magnitude in both the amount of human data not removed and the amount of microbial data mistakenly removed. As part of the nf-core initiative, nf-core/detaxizer adheres to best practices, leveraging containerized dependencies for streamlined installation. The source code is openly available under the MIT license: https://github.com/nf-core/detaxizer.
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12455401/pdf/
Citations: 0
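The detaxizer benchmark compares tools on two error quantities: human reads left in the output, and microbial reads removed by mistake. A minimal sketch of how those two rates could be computed when each read's true origin is known (for example in a simulated mixture) and the set of read IDs a filter discarded is available. The function name and input layout are hypothetical, and the exact definitions used in the study may differ.

```python
def filtering_error_rates(
    human_ids: set[str], microbial_ids: set[str], removed_ids: set[str]
) -> dict[str, float]:
    """Leakage and over-removal rates for a human-read filter (hypothetical helper).

    human_ids / microbial_ids: ground-truth origin of each read.
    removed_ids: reads the filter discarded as human.
    """
    if not human_ids or not microbial_ids:
        raise ValueError("need at least one human and one microbial read")
    human_left = human_ids - removed_ids         # human reads that leaked through
    microbes_lost = microbial_ids & removed_ids  # nonhuman reads wrongly discarded
    return {
        "human_not_removed": len(human_left) / len(human_ids),
        "microbial_removed": len(microbes_lost) / len(microbial_ids),
    }
```

Comparing these two fractions across tool and database combinations is one way to see the order-of-magnitude differences the abstract describes.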
IMGT® analysis of the human IGH locus: unveiling novel polymorphisms and copy number variations in 15 genome assemblies from diverse ancestral backgrounds.
IF 2.8
NAR Genomics and Bioinformatics, 7(3): lqaf127. Pub date: 2025-09-17; eCollection date: 2025-09-01; DOI: 10.1093/nargab/lqaf127
Ariadni Papadaki, Maria Georga, Joumana Jabado-Michaloud, Géraldine Folch, Guilhem Zeitoun, Patrice Duroux, Véronique Giudicelli, Sofia Kossida
Abstract: Unraveling the genetic complexity of the human immunoglobulin heavy (IGH) chain locus provides valuable insights into the mechanisms underlying the efficacy and specificity of the adaptive immune response. Despite its crucial role, the IGH locus remains insufficiently characterized, with its allelic diversity and polymorphisms inadequately investigated. In this study, we present an analysis of the human IGH locus, incorporating 15 human genome assemblies from diverse ancestries, including African, European, Asian, Saudi, and mixed backgrounds. Through our examination of both maternal and paternal assemblies, we uncover novel IGH alleles, copy number variations (CNV), and polymorphisms, particularly within the variable (IGHV) region. Our findings reveal extensive and previously uncharacterized genetic variability in the constant (IGHC) region and distinct IMGT CNV forms across individuals. This research contributes to a significant enrichment of the IMGT® IGH reference directory, databases, tools and web resources, and lays the groundwork for an IMGT® haplotype database which can be progressively enriched as additional datasets become available. Such a resource promises to propel personalized immunogenomics forward, with exciting applications in cancer immunotherapy, COVID-19, and other immune-related diseases.
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448898/pdf/
Citations: 0
araCNA: somatic copy number profiling using long-range sequence models.
IF 2.8
NAR Genomics and Bioinformatics, 7(3): lqaf124. Pub date: 2025-09-09; eCollection date: 2025-09-01; DOI: 10.1093/nargab/lqaf124
Ellen Visscher, Christopher Yau
Abstract: Somatic copy number alterations (CNAs) are hallmarks of cancer. Current algorithms that call CNAs from whole-genome sequenced (WGS) data have not exploited deep learning methods owing to computational scaling limitations. Here, we present a novel deep-learning approach, araCNA, trained only on simulated data that can accurately predict CNAs in real WGS cancer genomes. araCNA uses novel transformer alternatives (e.g. Mamba) to handle genomic-scale sequence lengths (∼1M) and learn long-range interactions. Results are extremely accurate on simulated data, and this zero-shot approach is on par with existing methods when applied to 50 WGS samples from the Cancer Genome Atlas. Notably, our approach requires only a tumour sample and not a matched normal sample, has fewer markers of overfitting, and performs inference in only a few minutes. araCNA demonstrates how domain knowledge can be used to simulate training sets that harness the power of modern machine learning in biological applications.
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12418177/pdf/
Citations: 0
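araCNA's central idea is that domain knowledge can generate labelled training data. A toy sketch of that idea, assuming a simple generative model that is not araCNA's actual simulator: a piecewise-constant tumour copy-number profile is drawn, and binned read counts are sampled as Poisson counts whose mean mixes tumour copy number with diploid normal-cell contamination (purity). All function names, parameter names and default values are hypothetical.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's Poisson sampler; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_profile(n_bins=1000, n_segments=5, depth_per_copy=15.0,
                     purity=0.7, max_cn=5, seed=0):
    """Return (copy_numbers, read_counts) for one toy tumour genome.

    Assumption: normal cells are diploid, so the expected count in a bin is
    depth_per_copy * (purity * CN + (1 - purity) * 2).
    """
    rng = random.Random(seed)
    # Pick segment boundaries, then one integer copy number per segment.
    cuts = sorted(rng.sample(range(1, n_bins), n_segments - 1))
    bounds = [0] + cuts + [n_bins]
    copy_numbers = []
    for start, end in zip(bounds, bounds[1:]):
        copy_numbers.extend([rng.randint(0, max_cn)] * (end - start))
    read_counts = [
        poisson(rng, depth_per_copy * (purity * cn + (1 - purity) * 2))
        for cn in copy_numbers
    ]
    return copy_numbers, read_counts
```

Each simulated pair provides an input (read counts) and a label (copy numbers); a realistic simulator would also model GC bias, mappability, subclonality and allele-specific signals.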
Cell-free DNA as a potential alternative to genomic DNA in genetic studies.
IF 2.8
NAR Genomics and Bioinformatics, 7(3): lqaf119. Pub date: 2025-09-09; eCollection date: 2025-09-01; DOI: 10.1093/nargab/lqaf119
Jingyu Zeng, Huanhuan Zhu, Yu Wang, Guodan Zeng, Panhong Liu, Rijing Ou, Xianmei Lan, Yuhui Zheng, Chenhui Zhao, Linxuan Li, Haiqiang Zhang, Jianhua Yin, Mingzhi Liao, Yan Zhang, Xin Jin
Abstract: Next-generation sequencing has greatly advanced genomics, enabling large-scale studies of population genetics and complex traits. Genomic DNA (gDNA) from white blood cells has traditionally been the main data source, but cell-free DNA (cfDNA), found in bodily fluids as fragmented DNA, is increasingly recognized as a valuable biomarker in clinical and genetic studies. However, a direct comparison between cfDNA and gDNA has not been fully explored. In this study, we analyzed cfDNA and gDNA from 186 healthy individuals, using the same sequencing platform. We compared sequencing quality, variant detection, allele frequencies (AF), genotype concordance, population structure, and genomic association results (genome-wide association study and expression quantitative trait locus). While cfDNA showed higher duplication rates and lower effective sequencing depth, both DNA types displayed similar quality metrics at the same depth. We also observed that significant depth differences between cfDNA and gDNA were mainly found in centromeric regions. While gDNA identified more variants with more uniform coverage, AF spectra, population structure, and genomic associations were largely consistent between the two DNA types. This study provides a detailed comparison of cfDNA and gDNA, highlighting the potential of cfDNA as an alternative to gDNA in genomic research. Our findings could serve as a reference for future studies on cfDNA and gDNA.
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12408905/pdf/
Citations: 0
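Genotype concordance is one of the cfDNA versus gDNA comparisons listed. A minimal sketch of one common definition: the fraction of sites called in both samples where the unphased genotypes agree. It assumes genotypes are stored per site as VCF-style strings such as "0/1". The helper name and data layout are hypothetical, and the study may define concordance differently (for example, by restricting to non-reference sites).

```python
def genotype_concordance(gdna: dict[str, str], cfdna: dict[str, str]) -> tuple[float, int]:
    """Return (concordance, n_shared_sites) for two per-site genotype maps.

    Keys are site IDs (e.g. "chr1:12345"); values are VCF-style genotypes.
    Sites with any missing allele (".") in either sample are skipped.
    """
    def norm(gt: str) -> tuple[str, ...]:
        # Treat "0|1", "1/0" and "0/1" as the same unphased genotype.
        return tuple(sorted(gt.replace("|", "/").split("/")))

    shared = [
        site for site in gdna.keys() & cfdna.keys()
        if "." not in gdna[site] and "." not in cfdna[site]
    ]
    if not shared:
        return float("nan"), 0
    matches = sum(norm(gdna[s]) == norm(cfdna[s]) for s in shared)
    return matches / len(shared), len(shared)
```

Reporting the number of shared sites alongside the ratio matters because lower effective depth in cfDNA reduces how many sites are called in both samples.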
Large-scale composite hypothesis testing procedure for omics data analyses.
IF 2.8
NAR Genomics and Bioinformatics, 7(3): lqaf118. Pub date: 2025-09-05; eCollection date: 2025-09-01; DOI: 10.1093/nargab/lqaf118
Annaïg De Walsche, Franck Gauthier, Nathalie Boissot, Alain Charcosset, Tristan Mary-Huard
Abstract: Composite hypothesis testing using summary statistics is a well-established approach for assessing the effect of a single marker or gene across multiple traits or omics levels. Numerous procedures have been developed for this task and have been successfully applied to identify complex patterns of association between traits, conditions, or phenotypes. However, existing methods often struggle with scalability in large datasets or fail to account for dependencies between traits or omics levels, limiting their ability to control false positives effectively. To overcome these challenges, we present the qch_copula approach, which integrates mixture models with a copula function to capture dependencies between traits or omics and provides rigorously defined P-values for any composite hypothesis. Through a comprehensive benchmark against eight state-of-the-art methods, we demonstrate that qch_copula controls Type I error rates effectively while enhancing the detection of joint association patterns. Compared to other mixture model-based approaches, our method notably reduces memory usage during the EM algorithm, allowing the analysis of up to 20 traits and 10^5 to 10^6 markers. The effectiveness of qch_copula is further validated through two application cases in human and plant genetics. The method is available in the R package qch, accessible on CRAN.
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12412788/pdf/
Citations: 0
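To show what a composite null looks like in practice, here is a deliberately simpler technique than qch_copula: the Bonferroni-based partial conjunction p-value of Benjamini and Heller, which tests "the marker is associated with at least u of the Q traits". With u = Q it reduces to the largest p-value. Because it combines p-values with Bonferroni, it should stay valid without modelling dependence between traits, but it is conservative. It does not use mixture models or copulas, and the function name is hypothetical.

```python
def partial_conjunction_pvalue(pvalues: list[float], u: int) -> float:
    """Bonferroni-based partial conjunction p-value (Benjamini & Heller, 2008).

    Tests H0: fewer than u of the Q traits are associated with the marker.
    With p_(u) the u-th smallest p-value: p = min(1, (Q - u + 1) * p_(u)).
    """
    q = len(pvalues)
    if not 1 <= u <= q:
        raise ValueError("u must be between 1 and the number of traits")
    p_sorted = sorted(pvalues)
    return min(1.0, (q - u + 1) * p_sorted[u - 1])
```

Applied to each of many markers, the resulting p-values would still need a multiple-testing correction across markers. Model-based procedures such as qch_copula aim to gain power over this kind of conservative bound.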
proRate: an R package to infer gene transcription rates with a novel least sum of squares method.
IF 2.8
NAR Genomics and Bioinformatics, 7(3): lqaf123. Pub date: 2025-09-05; eCollection date: 2025-09-01; DOI: 10.1093/nargab/lqaf123
Yu Liu, Fadhl Alakwaa
Abstract: The dynamics of transcriptional elongation influence many biological activities, such as RNA splicing, polyadenylation, and nuclear export. To quantify the elongation rate, a typical method is to treat cells with drugs that inhibit RNA polymerase II (Pol II) from entering the gene body and then track Pol II using Pro-seq or Gro-seq. However, the downstream data analysis is challenged by the problem of identifying the transition point between the gene regions inhibited by the drug and not, which is necessary to calculate the transcription rate. Although the traditional hidden Markov model (HMM) can be used to solve it, this method is complicated with its hidden variable and many parameters to be estimated. Hence, we developed the R package proRate, which identifies the transition point with a novel least sum of squares (LSS) method and calculates the elongation rate accordingly. In addition, proRate also covers other functions frequently used in transcription dynamic study, including metagene plotting, pause index calculation, gene structure analysis, etc. The effectiveness of this package is proved by its performance on three Pro-seq or Gro-seq datasets, showing higher accuracy than HMM. proRate is freely available at https://github.com/yuabrahamliu/proRate or https://github.com/FADHLyemen/proRate.
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12412782/pdf/
Citations: 0
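The abstract does not spell out proRate's LSS procedure, so the following is only a generic least-squares changepoint sketch, not the package's algorithm. Given a per-bin signal along a gene (for example Pro-seq coverage after drug treatment, where the drug-cleared region near the TSS has low signal), it picks the split that minimises the total within-segment sum of squared deviations. It then converts the split position into a rate by dividing distance by treatment time. The bin size, time and function names are assumptions.

```python
def best_split(signal: list[float]) -> int:
    """Index k minimising the SSE of signal[:k] and signal[k:] around their own means.

    Runs in O(n) using prefix sums of x and x**2.
    """
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two bins")
    s, s2 = [0.0], [0.0]
    for x in signal:
        s.append(s[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i: int, j: int) -> float:
        # Sum of squared deviations of signal[i:j] from its mean.
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    return min(range(1, n), key=lambda k: sse(0, k) + sse(k, n))


def elongation_rate_kb_per_min(signal, bin_size_bp, minutes):
    """Hypothetical helper: distance from the TSS to the transition point over time."""
    k = best_split(signal)
    return k * bin_size_bp / 1000 / minutes
```

For example, with 500 bp bins, a 10 minute treatment and a transition detected after 4 bins, the estimate is 2 kb / 10 min = 0.2 kb/min.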
A high-resolution meiotic crossover map from single-nucleus ATAC-seq reveals insights into the recombination landscape in mammals.
IF 2.8
NAR Genomics and Bioinformatics, 7(3): lqaf122. Pub date: 2025-09-03; eCollection date: 2025-09-01; DOI: 10.1093/nargab/lqaf122
Stevan Novakovic, Caitlin Harris, Ruijie Liu, Davis J McCarthy, Wayne Crismani
Abstract: Meiotic crossovers promote correct chromosome segregation and the shuffling of genetic diversity. However, the measurement of crossovers remains challenging, impeding our ability to decipher the molecular mechanisms that are necessary for their formation and regulation. Here we demonstrate a novel repurposing of the single-nucleus Assay for Transposase Accessible Chromatin with sequencing (snATAC-seq) as a simple and high-throughput method to identify and characterize meiotic crossovers from haploid testis nuclei. We first validate the feasibility of obtaining genome-wide coverage from snATAC-seq by using ATAC-seq on bulk haploid mouse testis nuclei, ensuring adequate variant detection for haplotyping. Subsequently, we adapt droplet-based snATAC-seq for crossover detection, revealing >25 000 crossovers in F1 hybrid mice. Comparison between the wild type and a hyper-recombinogenic Fancm-deficient mutant mouse model confirmed an increase in crossover rates in this genotype, however with a distribution which was unchanged. We also find that regions with the highest rate of crossover formation are enriched for PRDM9. Our findings demonstrate the utility of snATAC-seq as a robust and scalable tool for high-throughput crossover detection, offering insights into meiotic crossover dynamics and elucidating the underlying molecular mechanisms. It is possible that the research presented here with snATAC-seq of haploid post-meiotic nuclei could be extended into fertility-related diagnostics.
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12408907/pdf/
Citations: 0
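In the simplest case, haplotyping a haploid nucleus from an F1 hybrid comes down to labelling each informative SNP by the parental strain its allele came from, then looking for switches along a chromosome. A minimal sketch under that assumption, with allele calls already assigned to parent "A" or "B" and sorted by position. It uses a hypothetical `min_run` filter that drops short runs as probable genotyping errors. This is not the pipeline used in the study; sparse single-nucleus data would normally call for a probabilistic model such as an HMM.

```python
from itertools import groupby


def call_crossovers(calls, min_run=3):
    """Return (left_pos, right_pos) intervals bracketing each crossover.

    calls: list of (position, parent) tuples sorted by position, parent in {"A", "B"}.
    Runs of fewer than min_run consecutive SNPs from one parent are discarded.
    """
    runs = []
    for parent, group in groupby(calls, key=lambda c: c[1]):
        group = list(group)
        if len(group) >= min_run:
            runs.append((parent, group[0][0], group[-1][0]))
    # Dropping a short run can leave neighbouring runs from the same parent; merge them.
    merged = []
    for parent, start, end in runs:
        if merged and merged[-1][0] == parent:
            merged[-1] = (parent, merged[-1][1], end)
        else:
            merged.append((parent, start, end))
    # Each crossover lies between the last SNP of one run and the first SNP of the next.
    return [(left[2], right[1]) for left, right in zip(merged, merged[1:])]
```

The width of each returned interval is set by the spacing of informative SNPs, which is why adequate variant detection matters for resolution.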
GViNC: an innovative framework for genome graph comparison reveals hidden patterns in the genetic diversity of human populations.
IF 2.8
NAR Genomics and Bioinformatics, 7(3): lqaf121. Pub date: 2025-09-03; eCollection date: 2025-09-01; DOI: 10.1093/nargab/lqaf121
Venkatesh Kamaraj, Ayam Gupta, Karthik Raman, Manikandan Narayanan, Himanshu Sinha
Abstract: Genome graphs provide a powerful reference structure for representing genetic diversity. Their structure emphasizes the polymorphic regions in a collection of genomes, enabling network-based comparisons of population-level variation. However, current tools are limited in their ability to quantify and compare structural features across large genome graphs. We introduce GViNC, Genome graph Visualization, Navigation, and Comparison, a novel framework that enables partitioning genome graphs into interpretable subgraphs, mapping linear coordinates to graph nodes, and summarizing both local and global structural variation using new metrics for variability, hypervariability, and graph distances. We applied GViNC to multiple pan-genomic and population-specific genome graphs constructed with over 85M variants in 2504 individuals from the 1000 Genomes Project. We found that genomic complexity varied by ancestry and across chromosomes, with rare variants increasing variability by 10-fold and hypervariability by 50-fold. GViNC highlighted key regions of the human genome, such as Human Leukocyte Antigen and DEFB loci, and many previously unreported high-diversity regions, some with population-specific signatures in protein-coding and regulatory genes. By bridging sequence-level variation and graph-level topology, GViNC enables scalable, quantitative exploration of genome structure across populations. GViNC's versatility can aid researchers in extensively investigating the genetic diversity of different cohorts, populations, or species of interest.
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12408910/pdf/
Citations: 0
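One of GViNC's listed features is mapping linear coordinates to graph nodes. A minimal sketch of that operation, assuming the reference is stored as an ordered path of node IDs with known sequence lengths (as in a GFA path line). Cumulative start offsets plus binary search give O(log n) lookups. The class and method names are hypothetical, and this is not GViNC's implementation.

```python
from bisect import bisect_right
from itertools import accumulate


class ReferencePathIndex:
    """Map 0-based linear reference coordinates onto nodes of a genome-graph path."""

    def __init__(self, node_ids: list[str], node_lengths: list[int]):
        if not node_ids or len(node_ids) != len(node_lengths):
            raise ValueError("node_ids and node_lengths must be non-empty and equal in length")
        self.node_ids = node_ids
        # starts[i] is the linear coordinate of the first base of node i.
        self.starts = [0] + list(accumulate(node_lengths))[:-1]
        self.total = sum(node_lengths)

    def locate(self, pos: int) -> tuple[str, int]:
        """Return (node_id, offset_within_node) for a 0-based reference position."""
        if not 0 <= pos < self.total:
            raise IndexError("position outside the reference path")
        i = bisect_right(self.starts, pos) - 1
        return self.node_ids[i], pos - self.starts[i]
```

With such an index, a linear interval (for example a gene annotation) can be translated into the set of nodes it touches, which is a natural starting point for extracting a subgraph around it.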