{"title":"Exploration of chaos game representation and integrative deep learning approaches for whole-genome sequencing-based grapevine genetic testing.","authors":"Andrew Vu, Brendan Park, Yifeng Li, Ping Liang","doi":"10.1093/bioadv/vbaf193","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>The identification of grapevine species, cultivars, and clones associated with desired traits is an important component of viticulture. True-to-type identification is very challenging for grapevine due to the existence of a large number of cultivars and clones and the historical issues of synonyms and homonyms. DNA-based identification, superior to morphology-based methods, has been used as the current standard true-to-type method for grapevine, but not without shortcomings, such as the limited number of biomarkers and accessibility of services.</p><p><strong>Results: </strong>To overcome some of the limitations of traditional microsatellite-marker-based genetic testing, we explored a whole-genome-sequencing (WGS)-based approach to achieve the best accuracy at an affordable cost. To address the challenges of the extreme high dimensionality of the WGS data, we examined the effectiveness of using chaos game representation (CGR) to represent the genome sequence data and using deep learning for species and cultivar identification. CGR images provide a meaningful way to capture patterns for use with visual analysis, with the best results showing a 99% balanced accuracy in classifying five species, and a 80% balanced accuracy in predicting 41 cultivars. Our preliminary research highlights the potential for CGR and deep learning as a complementary tool for WGS-based species- and cultivar-level classification.</p><p><strong>Availability and implementation: </strong>Our implementation, including the pipeline for data processing and the four predictive models, is available at https://github.com/pliang64/CGR.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf193"},"PeriodicalIF":2.8000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12449056/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics advances","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioadv/vbaf193","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Motivation: The identification of grapevine species, cultivars, and clones associated with desired traits is an important component of viticulture. True-to-type identification is very challenging for grapevine due to the existence of a large number of cultivars and clones and the historical issues of synonyms and homonyms. DNA-based identification, superior to morphology-based methods, has been used as the current standard true-to-type method for grapevine, but not without shortcomings, such as the limited number of biomarkers and accessibility of services.
Results: To overcome some of the limitations of traditional microsatellite-marker-based genetic testing, we explored a whole-genome-sequencing (WGS)-based approach to achieve the best accuracy at an affordable cost. To address the challenges of the extreme high dimensionality of the WGS data, we examined the effectiveness of using chaos game representation (CGR) to represent the genome sequence data and using deep learning for species and cultivar identification. CGR images provide a meaningful way to capture patterns for use with visual analysis, with the best results showing a 99% balanced accuracy in classifying five species, and a 80% balanced accuracy in predicting 41 cultivars. Our preliminary research highlights the potential for CGR and deep learning as a complementary tool for WGS-based species- and cultivar-level classification.
Availability and implementation: Our implementation, including the pipeline for data processing and the four predictive models, is available at https://github.com/pliang64/CGR.