Authors: Jie Yang, Yong Cao, Biao-sheng Huang, Youjie Zhao
Published in: 2019 IEEE 5th International Conference on Computer and Communications (ICCC), pp. 188-192, 2019-12-01
DOI: https://doi.org/10.1109/ICCC47050.2019.9064159
A Distributed Algorithm for Quality Assessment of Biological Sequencing Based on MapReduce
DNA sequencing technology, particularly Illumina’s sequencers, plays an important role in the life sciences and is used in a growing number of genomic and transcriptomic projects. Given the huge volume of sequencing data these projects produce, assessing data quality quickly is a challenge. In this paper, we develop a distributed algorithm based on MapReduce that assesses the quality of biological sequencing data in parallel. To validate the algorithm, we tested data sets of different sizes (1G to 20G) on different numbers of computing nodes (1 to 20) on the Hadoop platform. The results show that parallel efficiency improves steadily as the data size and the number of computing nodes increase, and that the algorithm achieves better parallel efficiency when the data size exceeds 5Gb and more than 10 processors are used. This work effectively reduces the time required for quality assessment of biological sequencing data.
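The abstract does not say which quality metrics the algorithm computes or how parallel efficiency is defined, so the following is only an illustrative sketch under stated assumptions. It assumes FASTQ input with Phred+33 quality encoding, uses mean quality per read position as a stand-in metric, and takes the conventional definition of parallel efficiency, E = T1 / (p * Tp). All function names are hypothetical, and the map and reduce steps run in a single process rather than on Hadoop.

```python
from collections import defaultdict
from itertools import islice


def parse_fastq(lines):
    """Yield the quality string of each 4-line FASTQ record (assumed layout)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return
        yield record[3].strip()


def map_quality(quality, offset=33):
    """Map step: emit (position, (phred_score, 1)) for every base in a read.

    The Phred+33 offset is an assumption; the abstract does not state the encoding.
    """
    for pos, ch in enumerate(quality):
        yield pos, (ord(ch) - offset, 1)


def reduce_quality(pairs):
    """Reduce step: sum scores and counts per position, then return the means."""
    totals = defaultdict(lambda: [0, 0])
    for pos, (score, count) in pairs:
        totals[pos][0] += score
        totals[pos][1] += count
    return {pos: s / n for pos, (s, n) in sorted(totals.items())}


def mean_quality_per_position(lines):
    """Chain the map and reduce steps in-process (no Hadoop involved)."""
    pairs = (kv for q in parse_fastq(lines) for kv in map_quality(q))
    return reduce_quality(pairs)


def parallel_efficiency(t_serial, t_parallel, nodes):
    """Conventional parallel efficiency: speedup divided by node count."""
    return t_serial / (nodes * t_parallel)
```

Because the map step emits (sum, count) pairs rather than averages, partial results from different input splits can be combined in any order, which is what makes this kind of statistic suitable for MapReduce.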