P. Castro, A. Carmona, A. Perez-Rial, T. Millan, J. Rubio, J. Gil, J. V. Die
{"title":"就参考基因组达成共识:蚕豆案例研究","authors":"P. Castro, A. Carmona, A. Perez-Rial, T. Millan, J. Rubio, J. Gil, J. V. Die","doi":"10.1002/leg3.224","DOIUrl":null,"url":null,"abstract":"<p>Chickpea (<i>Cicer arietinum</i> L.) is the second most important grain legume in the world, grown on about 15 million hectares worldwide. The 1990s marked a significant turning point in genetic research on chickpea. In 1991, researchers at Muenster University unveiled the mRNA sequence responsible for an isoflavone oxidoreductase, which was the first sequence available for this species (X60755; Genbank, NCBI). As the new century unfolded, the nucleotide database accumulated over 265 accessions for chickpea. The availability of these new sequences was closely linked to the development of genetic maps. Throughout the 1990s and early 2000s, numerous studies explored populations resulting from crosses between cultivated <i>C. arietinum</i> and wild-sampled accessions of <i>C. reticulatum</i> and <i>C. echinospermum</i> (Benko-Iseppon et al. <span>2003</span>; Gaur and Slinkard <span>1990</span>; Gaur and Stinkard <span>1990</span>; Kazan et al. <span>1993</span>; Pfaff and Kahl <span>2003</span>; Radhika et al. <span>2007</span>; Rakshit et al. <span>2003</span>; Ratnaparkhe, Tekeoglu, and Muehlbauer <span>1998</span>; Santra et al. <span>2000</span>; Simon and Muehibauer <span>1997</span>; Tekeoglu, Santra, et al. <span>2000</span>; Tekeoglu, Tullu, et al., <span>2000</span>; Tekeoglu, Rajesh, and Muehlbauer <span>2002</span>; Winter et al. <span>1999</span>, <span>2000</span>).</p><p>The following advance in genetic maps was represented by those primarily constructed using narrow crosses, focusing on two distinct chickpea types: “desi” and “kabuli”. Molecular markers had played a crucial role in uncovering that kabuli and desi types possessed contrasting genetic backgrounds (Chowdhury, Vandenberg, and Warkentin <span>2002</span>; Iruela et al. <span>2002</span>). As a result, the majority of genetic maps developed during this period were derived from crosses between kabuli and desi chickpea cultivars (Cho et al. <span>2002</span>; Cho, Chen, and Muehlbauer <span>2004</span>; Cobos et al. <span>2005</span>, <span>2007</span>; Iruela et al. <span>2006</span>, <span>2007</span>; Lichtenzveig et al. <span>2006</span>; Millan et al. <span>2003</span>; Sharma et al. <span>2004</span>; Tar'an et al. <span>2007</span>; Udupa and Baum <span>2003</span>).</p><p>The development of microsatellite markers (SSR) expedited the identification of markers closely linked to traits of interest (Choudhary et al. <span>2006</span>, <span>2009</span>; Hüttel et al. <span>1999</span>; Lichtenzveig et al. <span>2005</span>; Sethy, Choudhary, et al. <span>2006</span>; Sethy, Shokeen, et al. <span>2006</span>; Winter et al. <span>1999</span>). However, the valuable information and resources provided by these maps could only be fully utilized when direct comparisons were made using common SSR markers. Although the marker-linkage group assignments in different populations generally agreed, discrepancies between maps arose due to variations in population type and size, marker density at specific genomic regions of interest and software processing. These discrepancies hindered breeders' ability to select appropriate segregating plant materials containing desirable genes. In 2010, an international consortium of leading researchers constructed a consensus genetic map of chickpea based on multiple populations (Millan et al. <span>2010</span>). This consensus map became a valuable practical tool to assist breeders to accurately select suitable markers tightly linked to agronomically important genomic regions for marker-assisted selection.</p><p>Following the consensus genetic map, the first completion of the chickpea genome sequencing (CDC Frontier, a kabuli type) was released in 2013 (Varshney et al. <span>2013</span>). The scientific breakthrough was announced by political representatives of the Indian government (Varshney <span>2016</span>). Using illumina technology, 87.65-Gb of high-quality sequence data were assembled into 530-Mb of genomic sequence scaffolds representing 74% of 740-Mb chickpea genome. More than 25 K out of 28 K non-redundant predicted gene models could be functionally annotated. Since then, the NCBI assembly GCF_000331145.1 has become the de facto reference genome for the kabuli genotype (Jain et al. <span>2022</span>). In another effort to sequence the chickpea genome, ICC4958 (desi genotype) was targeted for generating a draft genome assembly using NGS platforms along with BAC end sequences and a genetic map (Jain et al. <span>2013</span>). Shortly after, an improved version of cultivar ICC 4958 with 2.7-fold increase in the length of pseudomolecules was reported (Parween et al. <span>2015</span>). The desi assembly is currently available in GenBank under the accession ASM34727v4.</p><p>However, after these important achievements, other improved and curated sequences have become available, but the hosting of these sequences is outside the NCBI reference database. Thus, based on the analysis of recombination patterns, the kabuli genome was improved (Bayer et al. <span>2015</span>) and made available in 2016 in the repository <i>CyVerse Data Commons</i>, (dataset Kabuli_UWA-v.2.6.3; Edwards <span>2016</span>). More recently, using in situ Hi-C data, an improved chromosome-length genome assembly of chickpea was developed. The dataset is available under Cicar.CDCFrontier_v2.0 in the online repository <i>Legumepedia</i> (Garg et al. <span>2022</span>).</p><p>There is no doubt that the availability of several genomes will have a massive impact in a variety of ways, including diversity assessment, genome structure validation, and gene–trait association. One of the overwhelming uses of genomes is in the availability of high-density molecular markers which can be used to quickly map agronomically desirable traits and to identify candidate genes within a region of interest. The issue, however, emerges when different results are shown based solely on the specific genome sequences each research group employs. Variations in the sequence of a genome can lead to substantial differences in the organization of genetic information. For example, GWAS analyses may reveal the existence of SNPs in significant <i>loci</i>. An insightful approach involves examining the gene expression profiles of genes located in the vicinity of these identified <i>loci</i>. Such an analysis can shed light on the functional roles of these genes, demonstrating their potential as modulators of specific traits. However, when variations in a genomic sequence are solely based on different sequence versions, the notion of mapping genes to the vicinity of a <i>locus</i>, the promoter sequences analysis, or the identification of cis-acting variants, may lose its biological relevance, ultimately yielding misleading results (Figure 1).</p><p>In essence, this situation replicates the problem encountered in the 1990s with genetic maps, where findings were not readily transferable between studies. It is not uncommon for reviewers of present-day scientific manuscripts to inquire about the choice of one genome sequence over another. In our latest study, we successfully identified genomic blocks associated with Ascochyta blight resistance utilizing the reference CDC Frontier. During the peer-review process, we were requested to employ a different sequence as the reference, necessitating us to map the markers from one sequence to another (Carmona et al. <span>2023</span>). It is our opinion that the selection of a reference sequence should not be left to the discretion of reviewers, editors, or even researchers. The NCBI Reference Sequence Database offers a comprehensive, integrated, non-redundant, and generally well-annotated collection of reference sequences. Occasionally, even when an improved version is released, raw data supporting the new assemblies may be accessible as a BioProject in NCBI. Moreover, certain repositories hosting the new assemblies provide useful analytical resources such as BLAST, which serves as NCBI's flagship alignment tool. The question is: Would it not be beneficial to have the improved versions of genomes available on NCBI itself? The centralized availability of sequences would provide us with convenient updates on the latest genome versions, eliminating the need to navigate through multiple web pages or become familiar with the local setting tools of individual repositories.</p><p>The selection of the reference is open to discussion. There are several forums where this meeting could take place. Both, the International Legume Society Conference and the International Congress on Legume Genetics and Genomics, with a mission to bring together scientists who work on research aspects of legume biology using genetic and genomic tools, with those working on applied aspects and breeding of crop and pasture species is an excellent opportunity for this debate to reach a conclusion. However, the debate must be addressed by the research community in advance in an open and constructive way. Plant breeding is an ever-evolving scientific field, constantly adapting to the advancements in technology. The emergence of genomics and the availability of genome sequences have proven to be an invaluable asset to plant breeders, empowering them to harness the vast diversity of plant species. A well-established pipeline, supported by widely accepted tools and genome references, enables the development of novel cultivars with enhanced traits. This equips us to effectively tackle both current and future challenges.</p><p><b>P. Castro:</b> Conceptualization; Writing – original draft; Writing – review and editing. <b>A. Carmona:</b> Conceptualization; Formal analysis; Writing – review and editing. <b>A. Perez-Rial:</b> Conceptualization; Writing – review and editing. <b>T. Millan:</b> Conceptualization; Writing – review and editing. <b>J. Rubio:</b> Conceptualization; Funding acquisition; Writing – review and editing. <b>J. Gil:</b> Conceptualization; Writing – review and editing. <b>J. V. Die:</b> Conceptualization; Funding acquisition; Writing – original draft; Writing – review and editing.</p><p>The authors declare no conflicts of interest.</p>","PeriodicalId":17929,"journal":{"name":"Legume Science","volume":"6 2","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/leg3.224","citationCount":"0","resultStr":"{\"title\":\"Finding Consensus on the Reference Genomes : A Chickpea Case Study\",\"authors\":\"P. Castro, A. Carmona, A. Perez-Rial, T. Millan, J. Rubio, J. Gil, J. V. Die\",\"doi\":\"10.1002/leg3.224\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Chickpea (<i>Cicer arietinum</i> L.) is the second most important grain legume in the world, grown on about 15 million hectares worldwide. The 1990s marked a significant turning point in genetic research on chickpea. In 1991, researchers at Muenster University unveiled the mRNA sequence responsible for an isoflavone oxidoreductase, which was the first sequence available for this species (X60755; Genbank, NCBI). As the new century unfolded, the nucleotide database accumulated over 265 accessions for chickpea. The availability of these new sequences was closely linked to the development of genetic maps. Throughout the 1990s and early 2000s, numerous studies explored populations resulting from crosses between cultivated <i>C. arietinum</i> and wild-sampled accessions of <i>C. reticulatum</i> and <i>C. echinospermum</i> (Benko-Iseppon et al. <span>2003</span>; Gaur and Slinkard <span>1990</span>; Gaur and Stinkard <span>1990</span>; Kazan et al. <span>1993</span>; Pfaff and Kahl <span>2003</span>; Radhika et al. <span>2007</span>; Rakshit et al. <span>2003</span>; Ratnaparkhe, Tekeoglu, and Muehlbauer <span>1998</span>; Santra et al. <span>2000</span>; Simon and Muehibauer <span>1997</span>; Tekeoglu, Santra, et al. <span>2000</span>; Tekeoglu, Tullu, et al., <span>2000</span>; Tekeoglu, Rajesh, and Muehlbauer <span>2002</span>; Winter et al. <span>1999</span>, <span>2000</span>).</p><p>The following advance in genetic maps was represented by those primarily constructed using narrow crosses, focusing on two distinct chickpea types: “desi” and “kabuli”. Molecular markers had played a crucial role in uncovering that kabuli and desi types possessed contrasting genetic backgrounds (Chowdhury, Vandenberg, and Warkentin <span>2002</span>; Iruela et al. <span>2002</span>). As a result, the majority of genetic maps developed during this period were derived from crosses between kabuli and desi chickpea cultivars (Cho et al. <span>2002</span>; Cho, Chen, and Muehlbauer <span>2004</span>; Cobos et al. <span>2005</span>, <span>2007</span>; Iruela et al. <span>2006</span>, <span>2007</span>; Lichtenzveig et al. <span>2006</span>; Millan et al. <span>2003</span>; Sharma et al. <span>2004</span>; Tar'an et al. <span>2007</span>; Udupa and Baum <span>2003</span>).</p><p>The development of microsatellite markers (SSR) expedited the identification of markers closely linked to traits of interest (Choudhary et al. <span>2006</span>, <span>2009</span>; Hüttel et al. <span>1999</span>; Lichtenzveig et al. <span>2005</span>; Sethy, Choudhary, et al. <span>2006</span>; Sethy, Shokeen, et al. <span>2006</span>; Winter et al. <span>1999</span>). However, the valuable information and resources provided by these maps could only be fully utilized when direct comparisons were made using common SSR markers. Although the marker-linkage group assignments in different populations generally agreed, discrepancies between maps arose due to variations in population type and size, marker density at specific genomic regions of interest and software processing. These discrepancies hindered breeders' ability to select appropriate segregating plant materials containing desirable genes. In 2010, an international consortium of leading researchers constructed a consensus genetic map of chickpea based on multiple populations (Millan et al. <span>2010</span>). This consensus map became a valuable practical tool to assist breeders to accurately select suitable markers tightly linked to agronomically important genomic regions for marker-assisted selection.</p><p>Following the consensus genetic map, the first completion of the chickpea genome sequencing (CDC Frontier, a kabuli type) was released in 2013 (Varshney et al. <span>2013</span>). The scientific breakthrough was announced by political representatives of the Indian government (Varshney <span>2016</span>). Using illumina technology, 87.65-Gb of high-quality sequence data were assembled into 530-Mb of genomic sequence scaffolds representing 74% of 740-Mb chickpea genome. More than 25 K out of 28 K non-redundant predicted gene models could be functionally annotated. Since then, the NCBI assembly GCF_000331145.1 has become the de facto reference genome for the kabuli genotype (Jain et al. <span>2022</span>). In another effort to sequence the chickpea genome, ICC4958 (desi genotype) was targeted for generating a draft genome assembly using NGS platforms along with BAC end sequences and a genetic map (Jain et al. <span>2013</span>). Shortly after, an improved version of cultivar ICC 4958 with 2.7-fold increase in the length of pseudomolecules was reported (Parween et al. <span>2015</span>). The desi assembly is currently available in GenBank under the accession ASM34727v4.</p><p>However, after these important achievements, other improved and curated sequences have become available, but the hosting of these sequences is outside the NCBI reference database. Thus, based on the analysis of recombination patterns, the kabuli genome was improved (Bayer et al. <span>2015</span>) and made available in 2016 in the repository <i>CyVerse Data Commons</i>, (dataset Kabuli_UWA-v.2.6.3; Edwards <span>2016</span>). More recently, using in situ Hi-C data, an improved chromosome-length genome assembly of chickpea was developed. The dataset is available under Cicar.CDCFrontier_v2.0 in the online repository <i>Legumepedia</i> (Garg et al. <span>2022</span>).</p><p>There is no doubt that the availability of several genomes will have a massive impact in a variety of ways, including diversity assessment, genome structure validation, and gene–trait association. One of the overwhelming uses of genomes is in the availability of high-density molecular markers which can be used to quickly map agronomically desirable traits and to identify candidate genes within a region of interest. The issue, however, emerges when different results are shown based solely on the specific genome sequences each research group employs. Variations in the sequence of a genome can lead to substantial differences in the organization of genetic information. For example, GWAS analyses may reveal the existence of SNPs in significant <i>loci</i>. An insightful approach involves examining the gene expression profiles of genes located in the vicinity of these identified <i>loci</i>. Such an analysis can shed light on the functional roles of these genes, demonstrating their potential as modulators of specific traits. However, when variations in a genomic sequence are solely based on different sequence versions, the notion of mapping genes to the vicinity of a <i>locus</i>, the promoter sequences analysis, or the identification of cis-acting variants, may lose its biological relevance, ultimately yielding misleading results (Figure 1).</p><p>In essence, this situation replicates the problem encountered in the 1990s with genetic maps, where findings were not readily transferable between studies. It is not uncommon for reviewers of present-day scientific manuscripts to inquire about the choice of one genome sequence over another. In our latest study, we successfully identified genomic blocks associated with Ascochyta blight resistance utilizing the reference CDC Frontier. During the peer-review process, we were requested to employ a different sequence as the reference, necessitating us to map the markers from one sequence to another (Carmona et al. <span>2023</span>). It is our opinion that the selection of a reference sequence should not be left to the discretion of reviewers, editors, or even researchers. The NCBI Reference Sequence Database offers a comprehensive, integrated, non-redundant, and generally well-annotated collection of reference sequences. Occasionally, even when an improved version is released, raw data supporting the new assemblies may be accessible as a BioProject in NCBI. Moreover, certain repositories hosting the new assemblies provide useful analytical resources such as BLAST, which serves as NCBI's flagship alignment tool. The question is: Would it not be beneficial to have the improved versions of genomes available on NCBI itself? The centralized availability of sequences would provide us with convenient updates on the latest genome versions, eliminating the need to navigate through multiple web pages or become familiar with the local setting tools of individual repositories.</p><p>The selection of the reference is open to discussion. There are several forums where this meeting could take place. Both, the International Legume Society Conference and the International Congress on Legume Genetics and Genomics, with a mission to bring together scientists who work on research aspects of legume biology using genetic and genomic tools, with those working on applied aspects and breeding of crop and pasture species is an excellent opportunity for this debate to reach a conclusion. However, the debate must be addressed by the research community in advance in an open and constructive way. Plant breeding is an ever-evolving scientific field, constantly adapting to the advancements in technology. The emergence of genomics and the availability of genome sequences have proven to be an invaluable asset to plant breeders, empowering them to harness the vast diversity of plant species. A well-established pipeline, supported by widely accepted tools and genome references, enables the development of novel cultivars with enhanced traits. This equips us to effectively tackle both current and future challenges.</p><p><b>P. Castro:</b> Conceptualization; Writing – original draft; Writing – review and editing. <b>A. Carmona:</b> Conceptualization; Formal analysis; Writing – review and editing. <b>A. Perez-Rial:</b> Conceptualization; Writing – review and editing. <b>T. Millan:</b> Conceptualization; Writing – review and editing. <b>J. Rubio:</b> Conceptualization; Funding acquisition; Writing – review and editing. <b>J. Gil:</b> Conceptualization; Writing – review and editing. <b>J. V. Die:</b> Conceptualization; Funding acquisition; Writing – original draft; Writing – review and editing.</p><p>The authors declare no conflicts of interest.</p>\",\"PeriodicalId\":17929,\"journal\":{\"name\":\"Legume Science\",\"volume\":\"6 2\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/leg3.224\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Legume Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/leg3.224\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Agricultural and Biological Sciences\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Legume Science","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/leg3.224","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Agricultural and Biological Sciences","Score":null,"Total":0}
Finding Consensus on the Reference Genomes : A Chickpea Case Study
Chickpea (Cicer arietinum L.) is the second most important grain legume in the world, grown on about 15 million hectares worldwide. The 1990s marked a significant turning point in genetic research on chickpea. In 1991, researchers at Muenster University unveiled the mRNA sequence responsible for an isoflavone oxidoreductase, which was the first sequence available for this species (X60755; Genbank, NCBI). As the new century unfolded, the nucleotide database accumulated over 265 accessions for chickpea. The availability of these new sequences was closely linked to the development of genetic maps. Throughout the 1990s and early 2000s, numerous studies explored populations resulting from crosses between cultivated C. arietinum and wild-sampled accessions of C. reticulatum and C. echinospermum (Benko-Iseppon et al. 2003; Gaur and Slinkard 1990; Gaur and Stinkard 1990; Kazan et al. 1993; Pfaff and Kahl 2003; Radhika et al. 2007; Rakshit et al. 2003; Ratnaparkhe, Tekeoglu, and Muehlbauer 1998; Santra et al. 2000; Simon and Muehibauer 1997; Tekeoglu, Santra, et al. 2000; Tekeoglu, Tullu, et al., 2000; Tekeoglu, Rajesh, and Muehlbauer 2002; Winter et al. 1999, 2000).
The following advance in genetic maps was represented by those primarily constructed using narrow crosses, focusing on two distinct chickpea types: “desi” and “kabuli”. Molecular markers had played a crucial role in uncovering that kabuli and desi types possessed contrasting genetic backgrounds (Chowdhury, Vandenberg, and Warkentin 2002; Iruela et al. 2002). As a result, the majority of genetic maps developed during this period were derived from crosses between kabuli and desi chickpea cultivars (Cho et al. 2002; Cho, Chen, and Muehlbauer 2004; Cobos et al. 2005, 2007; Iruela et al. 2006, 2007; Lichtenzveig et al. 2006; Millan et al. 2003; Sharma et al. 2004; Tar'an et al. 2007; Udupa and Baum 2003).
The development of microsatellite markers (SSR) expedited the identification of markers closely linked to traits of interest (Choudhary et al. 2006, 2009; Hüttel et al. 1999; Lichtenzveig et al. 2005; Sethy, Choudhary, et al. 2006; Sethy, Shokeen, et al. 2006; Winter et al. 1999). However, the valuable information and resources provided by these maps could only be fully utilized when direct comparisons were made using common SSR markers. Although the marker-linkage group assignments in different populations generally agreed, discrepancies between maps arose due to variations in population type and size, marker density at specific genomic regions of interest and software processing. These discrepancies hindered breeders' ability to select appropriate segregating plant materials containing desirable genes. In 2010, an international consortium of leading researchers constructed a consensus genetic map of chickpea based on multiple populations (Millan et al. 2010). This consensus map became a valuable practical tool to assist breeders to accurately select suitable markers tightly linked to agronomically important genomic regions for marker-assisted selection.
Following the consensus genetic map, the first completion of the chickpea genome sequencing (CDC Frontier, a kabuli type) was released in 2013 (Varshney et al. 2013). The scientific breakthrough was announced by political representatives of the Indian government (Varshney 2016). Using illumina technology, 87.65-Gb of high-quality sequence data were assembled into 530-Mb of genomic sequence scaffolds representing 74% of 740-Mb chickpea genome. More than 25 K out of 28 K non-redundant predicted gene models could be functionally annotated. Since then, the NCBI assembly GCF_000331145.1 has become the de facto reference genome for the kabuli genotype (Jain et al. 2022). In another effort to sequence the chickpea genome, ICC4958 (desi genotype) was targeted for generating a draft genome assembly using NGS platforms along with BAC end sequences and a genetic map (Jain et al. 2013). Shortly after, an improved version of cultivar ICC 4958 with 2.7-fold increase in the length of pseudomolecules was reported (Parween et al. 2015). The desi assembly is currently available in GenBank under the accession ASM34727v4.
However, after these important achievements, other improved and curated sequences have become available, but the hosting of these sequences is outside the NCBI reference database. Thus, based on the analysis of recombination patterns, the kabuli genome was improved (Bayer et al. 2015) and made available in 2016 in the repository CyVerse Data Commons, (dataset Kabuli_UWA-v.2.6.3; Edwards 2016). More recently, using in situ Hi-C data, an improved chromosome-length genome assembly of chickpea was developed. The dataset is available under Cicar.CDCFrontier_v2.0 in the online repository Legumepedia (Garg et al. 2022).
There is no doubt that the availability of several genomes will have a massive impact in a variety of ways, including diversity assessment, genome structure validation, and gene–trait association. One of the overwhelming uses of genomes is in the availability of high-density molecular markers which can be used to quickly map agronomically desirable traits and to identify candidate genes within a region of interest. The issue, however, emerges when different results are shown based solely on the specific genome sequences each research group employs. Variations in the sequence of a genome can lead to substantial differences in the organization of genetic information. For example, GWAS analyses may reveal the existence of SNPs in significant loci. An insightful approach involves examining the gene expression profiles of genes located in the vicinity of these identified loci. Such an analysis can shed light on the functional roles of these genes, demonstrating their potential as modulators of specific traits. However, when variations in a genomic sequence are solely based on different sequence versions, the notion of mapping genes to the vicinity of a locus, the promoter sequences analysis, or the identification of cis-acting variants, may lose its biological relevance, ultimately yielding misleading results (Figure 1).
In essence, this situation replicates the problem encountered in the 1990s with genetic maps, where findings were not readily transferable between studies. It is not uncommon for reviewers of present-day scientific manuscripts to inquire about the choice of one genome sequence over another. In our latest study, we successfully identified genomic blocks associated with Ascochyta blight resistance utilizing the reference CDC Frontier. During the peer-review process, we were requested to employ a different sequence as the reference, necessitating us to map the markers from one sequence to another (Carmona et al. 2023). It is our opinion that the selection of a reference sequence should not be left to the discretion of reviewers, editors, or even researchers. The NCBI Reference Sequence Database offers a comprehensive, integrated, non-redundant, and generally well-annotated collection of reference sequences. Occasionally, even when an improved version is released, raw data supporting the new assemblies may be accessible as a BioProject in NCBI. Moreover, certain repositories hosting the new assemblies provide useful analytical resources such as BLAST, which serves as NCBI's flagship alignment tool. The question is: Would it not be beneficial to have the improved versions of genomes available on NCBI itself? The centralized availability of sequences would provide us with convenient updates on the latest genome versions, eliminating the need to navigate through multiple web pages or become familiar with the local setting tools of individual repositories.
The selection of the reference is open to discussion. There are several forums where this meeting could take place. Both, the International Legume Society Conference and the International Congress on Legume Genetics and Genomics, with a mission to bring together scientists who work on research aspects of legume biology using genetic and genomic tools, with those working on applied aspects and breeding of crop and pasture species is an excellent opportunity for this debate to reach a conclusion. However, the debate must be addressed by the research community in advance in an open and constructive way. Plant breeding is an ever-evolving scientific field, constantly adapting to the advancements in technology. The emergence of genomics and the availability of genome sequences have proven to be an invaluable asset to plant breeders, empowering them to harness the vast diversity of plant species. A well-established pipeline, supported by widely accepted tools and genome references, enables the development of novel cultivars with enhanced traits. This equips us to effectively tackle both current and future challenges.
P. Castro: Conceptualization; Writing – original draft; Writing – review and editing. A. Carmona: Conceptualization; Formal analysis; Writing – review and editing. A. Perez-Rial: Conceptualization; Writing – review and editing. T. Millan: Conceptualization; Writing – review and editing. J. Rubio: Conceptualization; Funding acquisition; Writing – review and editing. J. Gil: Conceptualization; Writing – review and editing. J. V. Die: Conceptualization; Funding acquisition; Writing – original draft; Writing – review and editing.