Genome-wide identification of SSR markers from coding regions for endangered Argania spinosa L. skeels and construction of SSR database: AsSSRdb.

IF 3.4 | Zone 4, Biology | Q1, MATHEMATICAL & COMPUTATIONAL BIOLOGY
Karim Rabeh, Najoua Mghazli, Fatima Gaboun, Abdelkarim Filali-Maltouf, Laila Sbabou, Bouchra Belkadi
Journal: Database: The Journal of Biological Databases and Curation, vol. 2024
Publication date: 2024-11-27
Publication type: Journal Article
DOI: 10.1093/database/baae118 (https://doi.org/10.1093/database/baae118)
Citations: 0

Abstract

Microsatellites [simple sequence repeats (SSRs)] are one of the most widely used sources of genetic markers and are particularly prevalent in plants. Despite their importance in various applications, a comprehensive genome-wide identification of coding sequence (CDS)-associated SSR markers in the Argania spinosa L. genome has yet to be conducted. In this study, 66 280 A. spinosa L. CDSs were screened, and 5351 SSRs were identified within 4535 of them. Among these, tri-nucleotide motifs (58.96%) were the most common, followed by hexa-nucleotide (15.71%) and di-nucleotide motifs (13.32%). The predominant SSR motif in the tri-nucleotide category was AAG (24.4%), while AG (94.1%) was the most abundant among di-nucleotide repeats. Furthermore, the extracted CDSs containing SSRs were subjected to functional annotation; 3396 CDSs (74.88%) exhibited homology with known proteins, 3341 CDSs (73.7%) were assigned Gene Ontology terms, 1004 CDSs were annotated with Enzyme Commission numbers, and 832 (18.3%) were annotated with KEGG pathways. A total of 3475 primer pairs were designed, of which 3264 were successfully validated in silico against the A. spinosa L. genome, with 99.6% representing high-resolution markers yielding no more than three products. Additionally, in silico verification in two species within the Sapotaceae family showed a low rate of transferability for the SSR markers. Furthermore, we developed an online database, the "Argania spinosa L. SSR database: https://as-fmmdb.shinyapps.io/asssrdb/" (AsSSRdb), to provide access to the CDS-associated SSRs identified in this study. Overall, this research provides valuable marker resources for DNA fingerprinting, genetic studies, and molecular breeding in argan and related species. Database URL: https://as-fmmdb.shinyapps.io/asssrdb/.
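
The abstract does not state which tool or thresholds were used to detect SSRs, so the following is only a minimal sketch of how perfect SSRs in a coding sequence can be located and tallied by motif length. The function names (`find_ssrs`, `motif_class_counts`) and the minimum repeat counts in `MIN_REPEATS` are assumptions made for illustration, not the authors' pipeline or parameters.

```python
import re
from collections import Counter

# Hypothetical minimum repeat counts per motif length (di- to hexa-nucleotide).
# These thresholds are assumed for illustration; the abstract does not give them.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for size, min_rep in sorted(min_repeats.items()):
        # One motif of `size` bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. AAA or AGAG),
            # so each SSR is reported once, under its shortest motif.
            if any(
                motif == motif[:k] * (size // k)
                for k in range(1, size)
                if size % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // size))
    hits.sort()
    return hits


def motif_class_counts(hits):
    """Count SSRs per motif length (2 = di-nucleotide, 3 = tri-nucleotide, ...)."""
    return Counter(len(motif) for _, motif, _ in hits)


if __name__ == "__main__":
    # Toy CDS-like sequence, invented for this example.
    cds = "ATGCCT" + "AAG" * 6 + "CCT" + "AG" * 7 + "TGA"
    ssrs = find_ssrs(cds)
    print(ssrs)
    print(motif_class_counts(ssrs))
```

Dividing each class count by the total number of SSRs gives motif-class percentages of the kind reported in the abstract. A real analysis would also need to handle compound and imperfect repeats, which this sketch ignores.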

Source journal
Database: The Journal of Biological Databases and Curation (MATHEMATICAL & COMPUTATIONAL BIOLOGY)
CiteScore: 9.00
Self-citation rate: 3.40%
Publication volume: 100
Review time: >12 weeks
About the journal: Huge volumes of primary data are archived in numerous open-access databases, and with new generation technologies becoming more common in laboratories, large datasets will become even more prevalent. The archiving, curation, analysis and interpretation of all of these data are a challenge. Database development and biocuration are at the forefront of the endeavor to make sense of this mounting deluge of data. Database: The Journal of Biological Databases and Curation provides an open access platform for the presentation of novel ideas in database research and biocuration, and aims to help strengthen the bridge between database developers, curators, and users.