Detection of Fusion Genes from Human Breast Cancer Cell-Line RNA-Seq Data Using Shifted Short Read Clustering

Yoshiaki Sota, S. Seno, Hironori Shigeta, N. Osato, M. Shimoda, S. Noguchi, H. Matsuda
2018 IEEE 18th International Conference on Bioinformatics and Bioengineering (BIBE), published 2018-10-01. DOI: 10.1109/BIBE.2018.00038 (https://doi.org/10.1109/BIBE.2018.00038)

Abstract

Fusion genes are one of the mechanisms of tumorigenesis, and their identification by RNA-Seq has attracted attention. Various methods for detecting fusion genes have been proposed, but their accuracy is not sufficient. One cause of this problem is the relatively short read length of RNA-Seq data. We therefore propose a method based on shifted short-read clustering (SSC) that, before the RNA-Seq data are mapped, identifies shifted reads of the same origin and extends them into representative sequences. We hypothesized that this would increase the percentage of uniquely mapped reads and improve the detection rate of fusion genes. To test these hypotheses, we applied the SSC method to RNA-Seq data from three cell lines (BT-474, MCF-7, and SKBR-3). When only one base was shifted, the average read lengths of BT-474, MCF-7, and SKBR-3 were extended from 201 to 223 bases (111% of the original length), 201 to 214 bases (106%), and 201 to 213 bases (106%), respectively. We further demonstrate the effectiveness of the SSC method by comparing the results of a fusion gene detection tool, STAR-Fusion, on reads processed with and without the SSC method. The percentage of uniquely mapped reads for BT-474, MCF-7, and SKBR-3 improved from 88% to 93%, 88% to 94%, and 92% to 95%, respectively. Finally, the fusion gene detection rates for BT-474, MCF-7, and SKBR-3 increased from 48% to 57%, 49% to 53%, and 50% to 53%, respectively. The SSC method is therefore considered effective not only for improving the percentage of uniquely mapped reads but also for fusion gene detection.
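The abstract does not spell out the SSC algorithm, so the following is only a minimal sketch of the general idea of extending reads that are shifted copies of one another. It assumes exact matching, a fixed shift (one base by default), and a greedy chaining order. The function name `extend_reads` and its parameters are hypothetical and are not taken from the paper.

```python
from typing import List


def extend_reads(reads: List[str], shift: int = 1) -> List[str]:
    """Greedily chain reads that overlap when offset by `shift` bases.

    Illustrative only: exact matching, no sequencing-error tolerance,
    no reverse complements. Not the authors' implementation.
    """
    remaining = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    representatives = []
    while remaining:
        current = remaining.pop(0)
        grew = True
        while grew:
            grew = False
            for i, cand in enumerate(remaining):
                k = len(cand) - shift  # overlap length implied by the shift
                if k <= 0 or k > len(current):
                    continue
                if current[-k:] == cand[:k]:
                    # cand starts `shift` bases to the right: extend the 3' end
                    current += cand[k:]
                elif current[:k] == cand[-k:]:
                    # cand starts `shift` bases to the left: extend the 5' end
                    current = cand[:-k] + current
                else:
                    continue
                del remaining[i]
                grew = True
                break
        representatives.append(current)
    return representatives


# Three 8-base reads offset by one base each, plus one unrelated read.
reads = ["CGTTGCAA", "TTTTCCCC", "ACGTTGCA", "GTTGCAAG"]
print(extend_reads(reads))  # ['ACGTTGCAAG', 'TTTTCCCC']
```

In this toy input, the three shifted reads collapse into one 10-base representative sequence and the unrelated read is left unchanged. A longer representative is what would be expected to map more uniquely.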