An Efficient Algorithm for Identifying Genomic Structural Inversion with Wide-spectrum of Length

2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE), Pub Date: 2017-10-01, DOI: 10.1109/BIBE.2017.00-16

Yu Geng, Zhongmeng Zhao, Xingjian Cui, Rong Zhang, Tian Zheng, Xuanping Zhang, Jiayin Wang



Abstract

Genomic structural inversions are a class of structural variation that has been widely associated with a range of complex traits and diseases. Accurately identifying inversions from high-throughput sequencing data is therefore of great significance for both research and clinical practice. However, detecting inversions is a challenging computational problem. Existing approaches are either limited to detecting inversions within specific length intervals or require a significant distribution of coverage across the candidate interval. In this paper, we propose a novel detection algorithm that accurately identifies inversions across a wide spectrum of lengths. The proposed algorithm consists of two components: a clustering step and a segmentation and extension step. It first clusters the paired-end reads to narrow down the candidate intervals. Then, it uses a contig assembly strategy to reconstruct the candidate intervals. Meanwhile, a segmentation and extension strategy is applied. For each candidate interval, a feature vector is calculated from the characteristic values. Finally, the algorithm combines the comparison verification results to filter out potential false positives and returns the inversion breakpoints at base-pair resolution. We conduct a series of simulation experiments to evaluate the performance of the proposed algorithm and compare it with two widely used approaches, DELLY and Pindel. The results demonstrate that the proposed approach performs better on inversions across a wide spectrum of lengths, especially when short-to-medium-length inversions are present.
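The clustering step can be illustrated with a small sketch. The abstract does not specify how read pairs are selected or grouped, so everything below is an assumption made for illustration: the `ReadPair` record, the `cluster_candidates` function, the `max_gap` and `min_support` parameters, and the greedy position-based grouping are all hypothetical. The sketch relies only on the general observation that read pairs spanning an inversion breakpoint tend to align with both mates on the same strand, and it groups such pairs into candidate intervals.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical minimal record for one aligned read pair."""
    chrom: str
    pos1: int      # mapped position of mate 1
    strand1: str   # '+' or '-'
    pos2: int      # mapped position of mate 2
    strand2: str   # '+' or '-'


def is_inversion_signal(pair: ReadPair) -> bool:
    # Same-strand mates (+/+ or -/-) are the usual paired-end hint of an inversion.
    return pair.strand1 == pair.strand2


def cluster_candidates(
    pairs: Iterable[ReadPair],
    max_gap: int = 500,      # assumed: max distance between neighbouring pairs
    min_support: int = 2,    # assumed: min number of pairs per candidate
) -> List[Tuple[str, int, int, int]]:
    """Greedily group same-strand pairs into (chrom, start, end, support)."""

    def left(p: ReadPair) -> int:
        return min(p.pos1, p.pos2)

    def right(p: ReadPair) -> int:
        return max(p.pos1, p.pos2)

    signals = sorted(
        (p for p in pairs if is_inversion_signal(p)),
        key=lambda p: (p.chrom, left(p)),
    )

    intervals: List[Tuple[str, int, int, int]] = []
    current: List[ReadPair] = []

    def flush() -> None:
        # Emit the current cluster only if enough pairs support it.
        if len(current) >= min_support:
            intervals.append((
                current[0].chrom,
                min(left(p) for p in current),
                max(right(p) for p in current),
                len(current),
            ))

    for p in signals:
        starts_new = current and (
            p.chrom != current[-1].chrom
            or left(p) - left(current[-1]) > max_gap
        )
        if starts_new:
            flush()
            current = []
        current.append(p)
    flush()
    return intervals
```

In this sketch each returned interval is only a coarse candidate region. In the pipeline described above, such regions would then be passed to the assembly, segmentation and extension, and verification stages, which are not modelled here.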


