The case for bovine pangenome
Wai Yee Low
Animal Research and One Health, volume 2, issue 4, pages 363-365. Published 2024-09-30.
DOI: 10.1002/aro2.86
https://onlinelibrary.wiley.com/doi/10.1002/aro2.86
Author contributions: Wai Yee Low: conceptualization; funding acquisition; writing - review & editing; writing - original draft.
The author declares no conflicts of interest.
Citation count: 0
Abstract
A single reference genome assembly has been shown to be insufficient for capturing the full spectrum of genetic variation. This inadequacy is well documented in human genomics [1], and the solution is to create a pangenome reference. A pangenome reference is a comprehensive genomic representation that captures the full genetic diversity within a species by incorporating multiple individual genomes. In agricultural genomics, a bovine pangenome is important for designing or selecting animal genomes that are better adapted to climate change, capable of reducing methane emissions, and conducive to producing healthy food for a growing global population. The Bovine Pangenome Consortium (BPC) [2], which has over 60 members across 20 countries, was established to coordinate global efforts in this area. At present, the BPC has collected more than 100 long-read-based genome assemblies representing ∼60 unique breeds/species. The primary goal is to construct a pangenome that enables accurate detection of genetic variation, including single nucleotide polymorphisms (SNPs) and structural variants (SVs), in bovine species, especially cattle.
The BPC uses a collaborative open-science model and requires samples and expertise from multiple laboratories worldwide. The project focuses on global cattle breeds, including both taurine and indicine subspecies. Beyond cattle, the BPC aims to include other members of the Bovini tribe, such as water buffalo, yak, and bison, in the pangenome. For water buffalo, a species-specific pangenome is planned as part of the 1000 Buffalo Genomes Project [17]. Including bovine species other than cattle will facilitate comparative genomic analysis and improve the understanding of evolutionary processes and potential introgression events [3].
Current genetic variant detection tools are highly sensitive to the quality and representativeness of the reference genome, which often results in reference bias [4]. Identification of SVs and copy number variants depends on the specific reference genome chosen [5]. Detection of epigenetic markers such as DNA methylation is also sensitive to the choice of reference genome [6]. In highly polymorphic and repetitive regions, such as the major histocompatibility complex [7], a single linear reference is expected to represent the genetic variants at the locus poorly. These issues are among the reasons the BPC was formed: to create a bovine pangenome that improves the accuracy of genetic analyses.
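The limitation of a single linear reference can be illustrated with a toy sequence graph. The sketch below is only an illustration, not a method from the cited studies: the node sequences, path names, and the helper functions `spell` and `nodes_missing_from` are all hypothetical. It shows a haplotype carrying an inserted segment that the linear reference path never visits, so the segment is invisible to any analysis that uses the reference alone.

```python
# Toy sequence graph: node id -> sequence. All sequences are invented.
nodes = {1: "ACGT", 2: "TTAGG", 3: "CCA"}

# Each genome is a walk through the graph (hypothetical names).
paths = {
    "reference": [1, 3],                    # linear reference skips node 2
    "haplotype_with_insertion": [1, 2, 3],  # carries the inserted segment
}


def spell(walk):
    """Concatenate node sequences along a walk to recover the genome."""
    return "".join(nodes[node_id] for node_id in walk)


def nodes_missing_from(ref_name):
    """For every non-reference path, list nodes the reference never visits."""
    ref_nodes = set(paths[ref_name])
    return {
        name: [node_id for node_id in walk if node_id not in ref_nodes]
        for name, walk in paths.items()
        if name != ref_name
    }


if __name__ == "__main__":
    for name, walk in paths.items():
        print(name, spell(walk))
    print(nodes_missing_from("reference"))
```

In a graph of this kind both walks are kept, so sequence unique to one haplotype stays addressable instead of being discarded as non-reference.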
Building pangenome graphs can be computationally challenging, especially when many genomes are included (e.g., >100), so determining the best way to construct these references is crucial. There are at least three main methods for building a pangenome: reference-guided construction [8], assembly of reads that failed to align to the reference [9], and use of multiple whole-genome assemblies [10, 11]. Besides the construction method, researchers should also consider which assemblies to include. Factors to consider include the number of individuals per breed, whether haplotype-resolved assemblies are necessary, and minimum assembly metrics such as quality value, contig N50, and BUSCO score. While some researchers have used short reads [9] to create a cattle pangenome, most recent efforts have used long reads [12]. Notably, one study showed that the representation of SVs in a bovine pangenome is consistent regardless of sequencing platform, choice of assembler, or sequencing coverage [12]. A less investigated area is the potential impact of including telomere-to-telomere (T2T) chromosomes as input for pangenome construction. Centromeric sequences are expected to be better represented when T2T chromosomes are used; however, these sequences are hard to map, and centromeres are prone to mis-assemblies [13].
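Two of the assembly metrics mentioned above can be computed directly from simple inputs. The following is a minimal sketch under stated assumptions: contig N50 is taken as the length of the contig at which the cumulative sum of contigs, sorted from longest to shortest, first reaches half of the total assembly length, and quality value (QV) is taken as the Phred-scaled per-base error rate, QV = -10 log10(error rate). The function names and the threshold arguments are hypothetical; the source does not state which cutoffs the BPC applies, and BUSCO scoring requires the external BUSCO tool, so it is not covered here.

```python
import math


def contig_n50(lengths):
    """Return the contig N50 for a list of contig lengths (in bases)."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def phred_qv(error_rate):
    """Convert a per-base error rate (0 < rate <= 1) to a Phred-scaled QV."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


def passes_thresholds(lengths, error_rate, min_n50, min_qv):
    """Check an assembly against caller-supplied (hypothetical) minimums."""
    return contig_n50(lengths) >= min_n50 and phred_qv(error_rate) >= min_qv


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6]  # invented lengths, for illustration only
    print(contig_n50(toy_contigs))
    print(phred_qv(1e-4))
```

A filter like `passes_thresholds` would only be a first pass; decisions such as how many individuals to include per breed, or whether haplotype-resolved assemblies are needed, cannot be reduced to a numeric cutoff.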
Some cattle breeds and bovine species are common in low-income countries, and the availability of their reference genomes will help researchers in these countries apply genomic tools to improve these animals. For example, the swamp buffalo reference genome is part of the BPC, and this species is important to Asian economies such as that of the Philippines [14]. Other cattle breeds and bovine species to be included in the BPC are rare. Genomic analysis of rare breeds or species can aid in preserving disappearing or threatened animals by documenting their unique genetic contributions [15]. Comparing bovine genome assemblies will provide insights into conserved loci underlying phenotypic diversity and help in understanding domestication signals [16].
Enhancing the pangenome reference has significant economic implications, as it influences the accuracy of all genetic variant calls made using it. For instance, bovine species such as cattle contribute to a multi-billion-dollar export industry in Australia. Even a minor improvement in the genomic prediction accuracy of production traits such as marbling can lead to considerable economic gains.
The construction of a bovine pangenome is motivated by the need to improve SNP and SV calling accuracy. A well-constructed pangenome should replace breed-specific assemblies and facilitate multibreed comparisons, ultimately advancing both agricultural and evolutionary genomics research.