Caitlin J. Ross, M. Mubarak, John Jenkins, P. Carns, C. Carothers, R. Ross, Wei Tang, Wolfgang Gerlach, Folker Meyer
Proceedings of the 2016 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation. Published 2016-05-15. DOI: 10.1145/2901378.2901387 (https://doi.org/10.1145/2901378.2901387)
A Case Study in Using Discrete-Event Simulation to Improve the Scalability of MG-RAST
As the cost of DNA sequencing has decreased, computational biology data processing platforms are experiencing an increasingly large volume of data analysis requests. The metagenomics analysis server MG-RAST at Argonne National Laboratory, a computational biology data processing platform, is receiving several terabytes of data submissions per month. However, MG-RAST currently relies on a central object-based data store, Shock, for data access and storage that can become a bottleneck under high data transfer loads, adversely affecting the job response time for end users. In this work, we use a discrete-event simulation approach to explore the use of data proxies and an enhanced, proxy-aware scheduling methodology designed to reduce the movement of the intermediate data generated during workflow processing. In this approach, Shock is supplemented with proxy storage servers, employing solid state drives, to decentralize the management and hence reduce the movement of intermediate workflow results. Discrete-event simulation provides a way to evaluate the performance of MG-RAST with increased workloads without disrupting the production system. For our case study, we extrapolate scientific workflows obtained from MG-RAST to represent future usage trends. We demonstrate that the addition of proxies and the proxy-aware scheduling methodology significantly reduces the data movement overhead by distributing the data plane, leading to substantial improvement in end-user job response time.
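To make the idea concrete, the following is a minimal toy sketch of a discrete-event simulation in Python. It is not the authors' simulator or workload model. It only illustrates why spreading intermediate-data transfers over several storage servers can shorten job response time. Everything in it is an assumption made for illustration: the function name `simulate`, the fixed compute and transfer times, the FIFO service discipline, and the static `job % num_stores` mapping that stands in for proxy-aware placement.

```python
import heapq


def simulate(num_jobs, stages, transfer_time, compute_time, num_stores):
    """Toy discrete-event simulation (hypothetical model, not MG-RAST itself).

    Every job arrives at t=0 and runs `stages` rounds of compute followed by
    a transfer of intermediate data to a FIFO storage server.
    Returns the mean job response time.
    """
    # Each store is modeled only by the time at which it next becomes free.
    store_free_at = [0.0] * num_stores
    events = []  # min-heap of (time, sequence, job_id, stages_left)
    seq = 0      # unique tie-breaker so equal-time events pop in push order
    for job in range(num_jobs):
        heapq.heappush(events, (compute_time, seq, job, stages))
        seq += 1

    finish = {}
    while events:
        # Event: a job finished computing a stage and must move its data.
        now, _, job, left = heapq.heappop(events)
        store = job % num_stores  # assumed static job-to-store mapping
        start = max(now, store_free_at[store])  # wait if the store is busy
        done = start + transfer_time
        store_free_at[store] = done
        if left == 1:
            finish[job] = done  # last transfer completes the job
        else:
            heapq.heappush(events, (done + compute_time, seq, job, left - 1))
            seq += 1
    return sum(finish.values()) / num_jobs


if __name__ == "__main__":
    central = simulate(num_jobs=4, stages=2, transfer_time=1.0,
                       compute_time=1.0, num_stores=1)
    proxied = simulate(num_jobs=4, stages=2, transfer_time=1.0,
                       compute_time=1.0, num_stores=4)
    print("central store:", central)  # 7.5 time units
    print("four stores:  ", proxied)  # 4.0 time units
```

With a single store, the four jobs queue behind one another at every transfer, so the mean response time grows with the number of concurrent jobs. With one store per job, no transfer waits. The study described in the abstract evaluates this effect with extrapolated MG-RAST workflows and a far more detailed model of Shock, the proxy servers, and the scheduler.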