Exploration of chaos game representation and integrative deep learning approaches for whole-genome sequencing-based grapevine genetic testing.

Impact factor: 2.8 · JCR Q2 (Mathematical & Computational Biology)

Bioinformatics Advances · Publication date: 2025-09-01 · eCollection date: 2025-01-01 · DOI: 10.1093/bioadv/vbaf193

Andrew Vu, Brendan Park, Yifeng Li, Ping Liang

Bioinformatics Advances, volume 5, issue 1, article vbaf193. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12449056/pdf/

Citations: 0

Abstract

Motivation: The identification of grapevine species, cultivars, and clones associated with desired traits is an important component of viticulture. True-to-type identification is very challenging for grapevine due to the existence of a large number of cultivars and clones and the historical issues of synonyms and homonyms. DNA-based identification, superior to morphology-based methods, has been used as the current standard true-to-type method for grapevine, but not without shortcomings, such as the limited number of biomarkers and accessibility of services.

Results: To overcome some of the limitations of traditional microsatellite-marker-based genetic testing, we explored a whole-genome-sequencing (WGS)-based approach to achieve the best accuracy at an affordable cost. To address the challenge posed by the extremely high dimensionality of WGS data, we examined the effectiveness of using chaos game representation (CGR) to represent the genome sequence data and of using deep learning for species and cultivar identification. CGR images provide a meaningful way to capture patterns for visual analysis, with the best results reaching 99% balanced accuracy in classifying five species and 80% balanced accuracy in predicting 41 cultivars. Our preliminary research highlights the potential of CGR and deep learning as a complementary tool for WGS-based species- and cultivar-level classification.
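
For readers unfamiliar with CGR, the following is a minimal sketch of the general technique, not the authors' pipeline (which is in the repository linked below). It assumes a common corner convention (A bottom-left, C top-left, G top-right, T bottom-right), skips ambiguous bases such as N, and bins the resulting points into a 2^k × 2^k count grid that could serve as an image-like input to a classifier. The function names `cgr_points` and `cgr_image` and the parameter `k` are hypothetical, chosen only for illustration.

```python
# Corner assignment is a widely used convention; it is an assumption here,
# not something stated in the abstract.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the chaos game trajectory of a DNA sequence in the unit square."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            # Ambiguous symbol (e.g. N): skipped without resetting the position.
            continue
        # Move halfway from the current point toward the base's corner.
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        points.append((x, y))
    return points


def cgr_image(seq, k=3):
    """Bin the CGR trajectory into a 2**k x 2**k grid of counts.

    Every point is binned, including the first k - 1 points, which do not
    yet encode a full k-mer; for long sequences this effect is negligible.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    # Toy sequence for demonstration only; real input would be WGS-derived.
    for row in cgr_image("ACGTTGCAACGT", k=2):
        print(row)
```

Because each step halves the distance to a corner, the cell a point lands in at resolution k depends only on the last k bases, so the grid acts as a fixed-size summary of k-mer content regardless of sequence length. That fixed size is what makes the representation convenient for image-based deep learning models.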

Availability and implementation: Our implementation, including the pipeline for data processing and the four predictive models, is available at https://github.com/pliang64/CGR.


Source journal

Bioinformatics advances

CiteScore

1.60

Self-citation rate

0.00%

Articles published