Subho Sankar Banerjee, A. Athreya, L. S. Mainzer, C. Jongeneel, Wen-mei W. Hwu, Z. Kalbarczyk, R. Iyer
Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing, June 2016. DOI: 10.1145/2912152.2912156
Efficient and Scalable Workflows for Genomic Analyses
Recent growth in the volume of DNA sequence data, and in the computational cost of extracting meaningful information from it, calls for efficient computational systems at scale. In this work, we propose the Illinois Genomics Execution Environment (IGen), a framework for efficient and scalable genome analyses. The design philosophy of IGen is based on algorithmic analysis and extensive measurements of compute- and data-intensive genomic analysis workflows (such as variant discovery and genotyping analysis) executed on high-performance and cloud computing infrastructures. IGen builds on the advantages of existing designs and proposes new software improvements to overcome the inefficiencies we observe in our measurements. With these composite improvements, we demonstrate that IGen accelerates alignment from 13.1 hours to 10.8 hours (1.2x) and variant calling from 10.1 hours to 1.25 hours (8x) on a single node, and that its modular design scales efficiently in a parallel computing environment.
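The reported speedups are simple ratios of baseline to improved wall-clock time. The sketch below recomputes them from the single-node figures in the abstract. The combined figure is an illustrative assumption only: it treats alignment and variant calling as two stages run back to back with nothing else on the critical path, which the abstract does not state. The names `speedup`, `BASELINE_HOURS`, and `IGEN_HOURS` are hypothetical.

```python
# Single-node wall-clock times in hours, as reported in the abstract.
BASELINE_HOURS = {"alignment": 13.1, "variant_calling": 10.1}
IGEN_HOURS = {"alignment": 10.8, "variant_calling": 1.25}


def speedup(before: float, after: float) -> float:
    """Return the speedup factor before/after (values above 1 mean faster)."""
    return before / after


# Per-stage speedups: roughly 1.2x for alignment and 8x for variant calling.
for stage in BASELINE_HOURS:
    factor = speedup(BASELINE_HOURS[stage], IGEN_HOURS[stage])
    print(f"{stage}: {factor:.2f}x")

# Hypothetical end-to-end estimate, ASSUMING the two stages run sequentially
# and nothing else contributes to total runtime.
total_before = sum(BASELINE_HOURS.values())
total_after = sum(IGEN_HOURS.values())
print(f"combined: {total_before:.2f} h -> {total_after:.2f} h "
      f"({speedup(total_before, total_after):.2f}x)")
```

Under that sequential assumption the combined gain is only about 1.93x, because the less-accelerated alignment stage dominates the remaining runtime once variant calling is sped up.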