Hyun-Hwa Choi, Byoung-Seob Kim, Shinyoung Ahn, Seung-Jo Bae
A workflow for parallel and distributed computing of large-scale genomic data
8th International Conference for Internet Technology and Secured Transactions (ICITST-2013)
Published: 2013-12-01
DOI: 10.1109/ICITST.2013.6750194
Abstract
Workflow management systems are emerging as a dominant solution in bioinformatics because they enable researchers to analyze the large volumes of data generated by modern laboratory equipment. The growth of genomic data produced by next-generation sequencing (NGS) creates an increasing need to analyze data on distributed computer clusters. In this paper, we construct a semi-automated workflow system for the analysis of large-scale sequence data sets, describe a pipeline that uses parallel computation to perform the optimal computational steps required to analyze whole-genome sequence data, and report the overall execution time of the pipeline when run on multiple cores across multiple machines.
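The abstract does not specify how the pipeline is parallelized, so the following is only a minimal sketch of one common pattern (scatter, parallel map, gather) with the overall execution time measured, not the authors' implementation. The function names, the chunking scheme, and the base-counting step are all hypothetical stand-ins for a real per-chunk analysis step.

```python
import time
from concurrent.futures import ThreadPoolExecutor

BASES = ("A", "C", "G", "T")


def split_into_chunks(reads, chunk_size):
    """Scatter: cut the read list into independent chunks."""
    return [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]


def count_bases(chunk):
    """Hypothetical per-chunk step: tally A/C/G/T, ignoring other symbols."""
    counts = {base: 0 for base in BASES}
    for read in chunk:
        for base in read:
            if base in counts:
                counts[base] += 1
    return counts


def merge_counts(partials):
    """Gather: combine the per-chunk tallies into one result."""
    total = {base: 0 for base in BASES}
    for partial in partials:
        for base, n in partial.items():
            total[base] += n
    return total


def run_pipeline(reads, chunk_size, executor):
    """Run scatter -> parallel map -> gather; return (result, seconds)."""
    start = time.perf_counter()
    chunks = split_into_chunks(reads, chunk_size)
    partials = list(executor.map(count_bases, chunks))
    result = merge_counts(partials)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    reads = ["ACGT", "GGCA", "TTAN", "CCGG"]  # toy data, not real reads
    with ThreadPoolExecutor(max_workers=2) as pool:
        totals, elapsed = run_pipeline(reads, chunk_size=2, executor=pool)
    print(totals, f"{elapsed:.4f}s")
```

Because `run_pipeline` accepts any `concurrent.futures` executor, a `ProcessPoolExecutor` could be substituted to use several cores on one machine for CPU-bound steps. Spreading work across multiple machines, as the paper reports, would require a cluster scheduler or distributed framework that this sketch does not model.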