Robin Jugas, Martin Vítek, K. Sedlář, Helena Skutková
Cross-correlation based detection of contigs overlaps
2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2018-05-21
DOI: 10.23919/MIPRO.2018.8400030 (https://doi.org/10.23919/MIPRO.2018.8400030)
Citations: 2
Abstract
Increasing demand for genomic data drives the development of new sequencing techniques and assembly methods. While sequencing techniques are the biologist's domain, genome assembly is a bioinformatics task, and the development of new assembly algorithms responds to new sequencing methods. The final part of the assembly process is merging the contigs and finding their positions in the genome. Contigs are almost the final product, but they can contain errors and features introduced by the preceding assembly process. Current methods use string algorithms based on dynamic programming that operate on the characters (A, C, G, T) representing nucleotides; when applied to long sequences such as contigs, they tend to be time-consuming. We apply a different approach, based on genomic signal processing, to evaluate possible merging of and overlaps between contigs. Converting a DNA sequence to a genomic signal can reveal hidden features of the sequence and allows digital signal processing methods to be applied. In addition, the computational complexity of the task can be reduced by massive downsampling. We use our own implementation of cross-correlation based on the Pearson correlation coefficient to detect possible overlaps between contigs: a high positive correlation indicates a possible shared region of the contigs and also gives the position of that region, without alignment.
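The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the numeric signal representation, the downsampling scheme, or how the correlation is scanned, so the integer mapping in `NUC_VALUES`, the block-mean `downsample`, and the suffix-versus-prefix scan in `best_overlap` are all assumptions made for illustration, as are the function and parameter names.

```python
from math import sqrt

# Hypothetical numeric mapping; the abstract does not name the representation used.
NUC_VALUES = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}


def to_signal(seq):
    """Convert a nucleotide string into a list of numbers."""
    return [NUC_VALUES[ch] for ch in seq.upper()]


def downsample(signal, factor, align_end=False):
    """Replace each block of `factor` samples by its mean.

    An incomplete block is dropped from the end of the signal, or from the
    start when `align_end` is True (so that blocks line up with the suffix).
    """
    n = (len(signal) // factor) * factor
    trimmed = signal[len(signal) - n:] if align_end else signal[:n]
    return [sum(trimmed[i:i + factor]) / factor for i in range(0, n, factor)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) * (p - mx) for p in x)
    syy = sum((q - my) * (q - my) for q in y)
    denom = sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def best_overlap(contig_a, contig_b, factor=1, min_len=20):
    """Find the best suffix(A) / prefix(B) overlap by Pearson correlation.

    Returns (overlap length in nucleotides, correlation). Candidate overlaps
    shorter than `min_len` nucleotides are ignored to limit chance matches.
    """
    a = downsample(to_signal(contig_a), factor, align_end=True)
    b = downsample(to_signal(contig_b), factor)
    best_k, best_r = 0, -1.0
    min_k = max(2, min_len // factor)
    for k in range(min_k, min(len(a), len(b)) + 1):
        r = pearson(a[len(a) - k:], b[:k])
        if r > best_r:
            best_k, best_r = k, r
    return best_k * factor, best_r
```

Calling `best_overlap(a, b, factor=4)` scans a signal four times shorter than the original, which is where the reduction in work comes from. In this simplified version the reported overlap length is a multiple of `factor`, and a true overlap whose length is not a multiple of `factor` will not line up with the block boundaries, so a practical method would need a representation that tolerates such shifts.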