David Perez, Ling-Hong Hung, Sonia Xu, K. Y. Yeung, W. Lloyd
Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, published 2020-09-21. DOI: 10.1145/3388440.3414859
An Investigation on Public Cloud Performance Variation for an RNA Sequencing Workflow
Public Infrastructure-as-a-Service (IaaS) clouds abstract away many details of how the resources provided to users are implemented. For example, users are not told the exact physical location of their virtual machines (VMs), the specific hardware used, the number of co-resident VMs sharing their host, or the workloads those co-resident VMs are running. Detecting when VMs underperform can help identify resource contention from co-resident VMs and prompt replacement of the affected VMs. Resource utilization metrics can help classify the performance of individual runs, so that the datasets used to build VM performance models sample the distribution of performance outcomes in the cloud. VM performance models are key to predicting the cost of bioinformatics analyses in the public cloud. This paper investigates the performance variation of running an RNA sequencing workflow in the public cloud. We examine causes of performance variation including VM provisioning, CPU heterogeneity, and resource contention. We leverage Amazon Elastic Compute Cloud (EC2) placement groups, a feature designed to influence VM placement, to examine how VM placement affects performance variation. As a use case, we investigate the performance of a multi-stage bioinformatics RNA sequencing (RNA-seq) analytical workflow consisting of four distinct phases, which executes in 90 minutes on average on 8-core public cloud VMs. In addition, we investigate whether Linux resource utilization metrics collected by profiling workflow runs can help identify the performance implications of these factors.
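The abstract proposes using Linux resource utilization metrics to classify runs, for example to spot contention from co-resident VMs. The snippet below is a minimal sketch of one way to do this, not the authors' profiling tooling. It computes the share of CPU time spent in each state between two snapshots of the aggregate `cpu` line of `/proc/stat`. It then flags a run when the share of steal time (time the hypervisor spent running other guests) exceeds a threshold. The function names and the 5% cutoff are hypothetical illustrations that the paper does not specify.

```python
"""Sketch: classify a run from two /proc/stat snapshots using CPU steal share.

Assumption (hypothetical, not from the paper): a run is flagged as suspected
contention when steal time exceeds STEAL_THRESHOLD of total CPU time.
"""

# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
# guest/guest_nice are already counted inside user/nice, so they are omitted.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

STEAL_THRESHOLD = 0.05  # hypothetical cutoff; would need tuning per instance type


def parse_cpu_line(stat_text: str) -> dict[str, int]:
    """Return cumulative jiffy counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            if len(values) < len(FIELDS):
                raise ValueError("kernel does not report steal time")
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_shares(before: str, after: str) -> dict[str, float]:
    """Fraction of elapsed CPU time spent in each state between two snapshots."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {k: b[k] - a[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("the second snapshot must be taken after the first")
    return {k: v / total for k, v in deltas.items()}


def classify_run(before: str, after: str, threshold: float = STEAL_THRESHOLD) -> str:
    """Label a run by whether its steal share suggests co-resident contention."""
    shares = cpu_shares(before, after)
    return "suspect-contention" if shares["steal"] > threshold else "normal"


# Synthetic snapshots; on a real VM, read open("/proc/stat").read() before and
# after a workflow phase instead.
before = "cpu  100 0 50 800 10 0 0 40 0 0\n"
after = "cpu  200 0 100 1500 20 0 0 180 0 0\n"
print(classify_run(before, after))  # → suspect-contention
```

Steal time is only one signal. Contention on memory bandwidth, disk, or network would show up in other counters, such as `iowait` or per-device statistics, so a fuller classifier would likely combine several metrics.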