Discovery of a novel viroid-like circular RNA in colorectal cancer
Meini Wu, Wenliang Li, Ningzhu Hu, Changning Liu, Jianfang Li, Yanhan Li, Ning Xu, Jiandong Shi, Jing Sun, Jing Li, Yunzhang Hu
Cancer Communications 45(1): 46-50. Published 2024-11-09.
DOI: 10.1002/cac2.12626 (https://onlinelibrary.wiley.com/doi/10.1002/cac2.12626)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11758353/pdf/
Abstract
Viroids, the smallest known infectious agents, were initially discovered in plants [1], and have caused significant agricultural diseases [2-4]. Recently, viroids have been identified in fungi [5-7] and bacteria [8], but none have been identified in animals. To date, no studies have explored the presence of viroids in colorectal cancer (CRC). We employed a reference-free computational method to identify a novel viroid-like circular RNA (circRNA) in CRC patients. Our study suggests that a broader class of viroids may exist in living systems.
We used an algorithm developed by Qingfa Wu's team [9, 10], which is unique in its homology independence. It combines the splitting longer reads into shorter fragments (SLS) technique with progressive filtering of overlapping small RNAs 2 (PFOR2), enabling the detection of novel viroid-like circRNAs from deep RNA sequencing data. SLS segments long RNA sequences into 21-nt virtual small RNAs; PFOR2 then retains only the 21-nt virtual internal small RNAs (ISRs) for circRNA assembly [9]. Our goal was to use this reference-free approach to investigate whether viroids exist in CRC tissues and to gain new insights into the pathogenesis and treatment of CRC.
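The splitting step described above can be illustrated with a minimal sketch. This is not the SLS program itself: the function name, the example read, and the meaning of `step` (taken here to be the offset between successive 21-nt windows) are assumptions made for illustration.

```python
from typing import Iterator


def split_into_virtual_small_rnas(read: str, size: int = 21, step: int = 1) -> Iterator[str]:
    """Yield fixed-length windows of `read`, advancing `step` bases each time.

    Hypothetical helper: a sliding-window reading of the SLS idea, not the
    published implementation.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    # Reads shorter than `size` yield no windows at all.
    for start in range(0, len(read) - size + 1, step):
        yield read[start:start + size]


# Hypothetical 40-nt read built from four 10-mers, for illustration only.
read = "AUGGCUACGU" + "UAGCCGAUAG" + "CUAGGCUAAC" + "GUAGCUAGCA"
fragments = list(split_into_virtual_small_rnas(read, size=21, step=8))
for fragment in fragments:
    print(fragment)
```

A larger `step` produces fewer windows per read, which is one plausible reason for raising it when computational cost matters.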
Through deep sequencing, we compiled whole transcriptome data from 12 clinical pairs of CRC samples (Supplementary Table S1). The data analysis flowchart is depicted in Figure 1A. After low-quality bases and irrelevant reads were excluded, the clean, high-quality reads were aligned to the human genome (hg38) using TopHat, and matched reads were discarded. The remaining reads were transformed into virtual small RNAs using the SLS program, followed by contig assembly using PFOR2. To enhance computational efficiency, the “step” parameter in the SLS program was set to 8. This process assembled over 10,000 contigs. These contigs were compared to the human genome (hg38) using BLAST, and highly homologous contigs were removed, leaving only non-homologous ones. After screening, 5,235 contigs remained, with a GC content of 51.35% and a total length of 1,114,562 bp. The length distribution is illustrated in Figure 1B, with most contig lengths clustered between 100 bp and 200 bp. To eliminate contigs derived from the human genome more accurately, the 5,235 contigs were individually aligned to the human genome under more stringent conditions, leaving 130 contigs (Supplementary Table S2). Throughout the analysis, we systematically discarded sequencing data homologous to the human genome whenever possible and used the non-homologous data as the input dataset for PFOR2.
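As a rough illustration of how summary figures like those above (contig count, total length, GC content) can be derived, here is a minimal sketch. The function name and the toy contigs are hypothetical; only the two totals in the final line come from the text.

```python
def summarize_contigs(contigs: list[str]) -> dict[str, float]:
    """Return count, total length, mean length and GC percentage for a contig set."""
    total = sum(len(c) for c in contigs)
    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    return {
        "n_contigs": len(contigs),
        "total_length": total,
        "mean_length": total / len(contigs) if contigs else 0.0,
        "gc_percent": 100.0 * gc / total if total else 0.0,
    }


# Toy contigs, for illustration only (not data from the study).
toy = ["GGCC", "AATT", "GCAT"]
summary = summarize_contigs(toy)
print(summary)

# Mean contig length implied by the reported totals: 1,114,562 bp over 5,235 contigs.
reported_mean = 1_114_562 / 5_235
print(f"{reported_mean:.1f} bp")
```

The reported totals imply a mean contig length of roughly 213 bp, slightly above the 100 bp to 200 bp range where most lengths cluster, which suggests a tail of longer contigs.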
A total of 130 primer pairs were designed based on these sequences (Supplementary Table S3) and tested by PCR and Sanger sequencing in CRC tissues, which confirmed three contigs. However, only one of the three contigs tested positive in the in situ hybridization analysis using the BaseScope™ Detection Reagent Kit v2-RED, leading to its identification as a novel viroid-like circRNA in CRC patients. We propose naming it colorectal cancer-associated viroid (CCAV).
The nucleotide sequence has been deposited in GenBank under accession number OR538373 (https://www.ncbi.nlm.nih.gov/nuccore/OR538373). CCAV spans 114 nt and has a GC content of 42.11% (Figure 1C). Because sequences homologous to the human genome were removed during data processing, we argue that CCAV is not a splicing product of the human genome. Comparison with genomes of other species revealed no homology. The secondary structure of CCAV, predicted using RNAfold, shows a circular ring, distinct from the rod-like conformation with extensive base-pairing that is characteristic of many known viroids (Figure 1C).
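The reported length and GC content can be cross-checked with a short calculation. The sketch below only uses the two figures stated above (114 nt, 42.11%) and searches for the G+C base counts consistent with them; the variable names are illustrative.

```python
length = 114          # reported CCAV length in nt
reported_gc = 42.11   # reported GC content in percent

# Every G+C count that rounds to the reported percentage at two decimals.
matches = [
    k for k in range(length + 1)
    if round(100 * k / length, 2) == reported_gc
]
print(matches)
```

Under these assumptions a single count satisfies the check: 48 G or C bases out of 114, since 48/114 ≈ 42.105%.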
We used the BaseScope™ Detection Reagent Kit v2-RED to examine CCAV in fresh CRC pathological sections from the First Affiliated Hospital of Kunming Medical University. The BaseScope™ Probe for CCAV was designed for the target region 61-13 bp. In situ hybridization confirmed the presence of CCAV in these sections (Figure 1D), although, as the figure shows, its expression in the tissue was not pronounced.
In further validation experiments, CCAV was detected in both tumor and adjacent tissues of 100 CRC patients with stages I, II, III, and IV disease (Supplementary Table S4). For this step, linear RNAs were first digested with RNase R. CCAV expression was significantly higher in cancer tissues than in adjacent tissues (P < 0.05) (Figure 1E, Supplementary Figure S1).
Numerous studies have confirmed an association between bacteria and the occurrence of CRC. We therefore isolated 36 bacterial strains from fresh CRC tissues and tested them for CCAV by qPCR and Sanger sequencing (Supplementary Table S5). CCAV was present in all 36 strains. Further experiments are required to determine whether bacteria act as carriers or hosts of CCAV and to elucidate the nature of their relationship.
A CCAV overexpression vector was constructed to further investigate the role of CCAV in CRC. The vector pCE-RB-Mam-NeoR served as the backbone (Supplementary Figure S2). To induce circularization in vivo, flanking repeat sequences (∼180 bp 5’ flanking and ∼200 bp 3’ flanking sequence) were added on either side of the complete CCAV sequence (Supplementary Figure S2). We confirmed that the CCAV transcript circularized after the overexpression vector was transfected into 293T cells (Figure 1F-H). The CCAV expression level in the overexpression vector-transfected group was significantly higher than in the empty vector control group (Figure 1F-G), and sequencing confirmed the expected splice junction (Figure 1H).
The expression of CCAV was measured at various time points (0h, 1h, 3h, 6h, 9h, 12h, 24h, 36h, and 48h) after transfecting SW480 cells with the CCAV overexpression vector, with peak expression observed at 9h and 12h (*P < 0.05, **P < 0.01) (Figure 1I). Based on these results, SW480 cell samples transfected with the CCAV overexpression vector were collected at 6h, 24h, and 48h for deep sequencing, to further characterize the changes in SW480 cells following CCAV overexpression.
In this analysis, whole transcriptome sequencing of 21 samples was completed, yielding a total of 321.41 GB of clean data. The number of differentially expressed genes (DEGs) was 595, 144, and 235 at 6h, 24h, and 48h, respectively (Figure 1J). Gene Ontology (GO) analysis of differentially expressed mRNAs (Figure 1K) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis (Figure 1L) were performed using the hypergeometric distribution test. The DEGs arising at the different time points after transfection of SW480 cells with the CCAV overexpression vector were mainly associated with viral infection, cellular immune diseases, and tumors (Figure 1L).
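For readers unfamiliar with the hypergeometric test used in GO and KEGG enrichment, the following is a minimal sketch of the upper-tail p-value for a single gene set. It is a generic textbook formulation, not the pipeline used in the study; the function name and all example numbers except the 595 DEGs at 6h are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int, drawn: int, overlap: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(population, annotated, drawn).

    population: background genes; annotated: genes in the term or pathway;
    drawn: DEGs; overlap: DEGs that are also annotated.
    """
    upper = min(annotated, drawn)
    # Sum the probability mass of every outcome at least as extreme as `overlap`.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(population, drawn)


# Hypothetical example: 20,000 background genes, a pathway with 200 genes,
# 595 DEGs (the 6h count reported above), of which 15 fall in the pathway.
p_value = hypergeom_enrichment_p(20_000, 200, 595, 15)
print(p_value)
```

In practice, the p-values across all tested terms would normally be corrected for multiple testing; the text does not state which correction, if any, was applied.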
While substantial experimental evidence is still needed to understand its role in tumor development, this discovery represents, to our knowledge, the first identification of a viroid-like RNA in a mammal. It challenges the traditional understanding that viroids exist only in plants and expands the known range of viroids. This finding opens new avenues for investigating human diseases and warrants further exploration.
Yunzhang Hu, Changning Liu, Meini Wu, and Ningzhu Hu contributed to the study's conceptualization or design. Wenliang Li and Ning Xu enrolled patients, provided study material, and elaborated the clinical information. Meini Wu, Jianfang Li, Yanhan Li, Jing Sun, and Jiandong Shi carried out the experiments and helped collect and assemble the data. Changning Liu and Jing Li performed and interpreted bioinformatic analysis. Meini Wu wrote the manuscript with input from all authors. All authors were involved in the critical review of the manuscript and approved the final version.
This study was supported by grants from the National Natural Science Foundation of China (No. 31500748) and the CAMS Innovation Fund for Medical Sciences (No. 2017-I2M-3-022).
The research was approved by the Ethics Committee of the First Affiliated Hospital of Kunming Medical University (2017L3). Signed informed consent was obtained from each patient.
About the journal:
Cancer Communications is an open access, peer-reviewed online journal that encompasses basic, clinical, and translational cancer research. The journal welcomes submissions concerning clinical trials, epidemiology, molecular and cellular biology, and genetics.