Xuanzeng Liu, Lina Zhao, Muhammad Majid, Yuan Huang
Mobile DNA 15(1):5 (published 2024-03-15). DOI: https://doi.org/10.1186/s13100-024-00316-x
Orthoptera-TElib: a library of Orthoptera transposable elements for TE annotation.
Abstract
Transposable elements (TEs) are a major component of eukaryotic genomes and are present in almost all eukaryotic organisms. TEs are highly dynamic between and within species, which significantly affects the general applicability of TE databases. Orthoptera is the only known group in the class Insecta with significantly enlarged genomes (0.93-21.48 Gb). When these large genomes are analyzed with the existing public TE databases, the efficiency of TE annotation is unsatisfactory. Addressing this limitation requires continually updating the available TE resource libraries and, as more insect genomes become publicly available, building an Orthoptera-specific library. Here, we used the complete genome data of 12 Orthoptera species to annotate TEs de novo, then manually re-annotated the unclassified TEs to construct a non-redundant Orthoptera-specific TE library: Orthoptera-TElib. Orthoptera-TElib contains 24,021 TE entries, including the re-annotated results of 13,964 unknown TEs. TE entries in Orthoptera-TElib follow the same naming convention as RepeatMasker and Dfam, encoded in the three-level form "level1/level2-level3". Orthoptera-TElib can be used directly as an input reference database and is compatible with mainstream repetitive sequence analysis software such as RepeatMasker and dnaPipeTE. When analyzing TEs of Orthoptera species, Orthoptera-TElib yields better TE annotation than Dfam and Repbase, whether low-coverage sequencing data or genome assembly data are used. The greatest improvement is for Angaracris rhodopa, whose annotated TE content increased from 7.89% of the genome to 53.28%. Finally, Orthoptera-TElib is stored in SQLite3 for the convenience of data updates and user access.
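The three-level "level1/level2-level3" naming mentioned in the abstract can be illustrated with a small parser. This is only a sketch: the abstract does not specify the exact FASTA header layout of Orthoptera-TElib, so the `>name#classification` form below is assumed from the usual RepeatMasker library convention, and the entry name `TE_0001` and the helper names `TEClass`, `parse_te_class`, and `parse_fasta_header` are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class TEClass(NamedTuple):
    """One classification label split into its three levels."""
    level1: str            # e.g. "DNA", "LINE", "LTR"
    level2: Optional[str]  # e.g. "TcMar", "RTE"; None if absent
    level3: Optional[str]  # e.g. "Tc1", "BovB"; None if absent


def parse_te_class(label: str) -> TEClass:
    """Split a 'level1/level2-level3' label; missing levels become None."""
    level1, _, rest = label.strip().partition("/")
    if not level1:
        raise ValueError(f"empty classification label: {label!r}")
    # Split on the first hyphen only, so level3 keeps any later hyphens.
    level2, _, level3 = rest.partition("-")
    return TEClass(level1, level2 or None, level3 or None)


def parse_fasta_header(header: str) -> Tuple[str, TEClass]:
    """Parse an ASSUMED '>name#level1/level2-level3' header line."""
    first_token = header.lstrip(">").split()[0]  # drop any trailing description
    name, sep, label = first_token.partition("#")
    if not sep:
        raise ValueError(f"no '#classification' in header: {header!r}")
    return name, parse_te_class(label)
```

Calling `parse_fasta_header(">TE_0001#DNA/TcMar-Tc1")` would return the entry name together with a `TEClass` whose levels are `DNA`, `TcMar`, and `Tc1`. Labels that carry fewer than three levels (for example `LTR/Gypsy`) leave the missing levels as `None` rather than raising an error.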
About the journal:
Mobile DNA is an online, peer-reviewed, open access journal that publishes articles providing novel insights into DNA rearrangements in all organisms, ranging from transposition and other types of recombination mechanisms to patterns and processes of mobile element and host genome evolution. In addition, the journal will consider articles on the utility of mobile genetic elements in biotechnological methods and protocols.