George Teodoro, Tulio Tavares, Renato Ferreira, Tahsin Kurc, Wagner Meira, Dorgival Guedes, Tony Pan, Joel Saltz
{"title":"A Run-time System for Efficient Execution of Scientific Workflows on Distributed Environments.","authors":"George Teodoro, Tulio Tavares, Renato Ferreira, Tahsin Kurc, Wagner Meira, Dorgival Guedes, Tony Pan, Joel Saltz","doi":"10.1007/s10766-007-0068-8","DOIUrl":null,"url":null,"abstract":"<p><p>Scientific workflow systems have been introduced in response to the demand of researchers from several domains of science who need to process and analyze increasingly larger datasets. The design of these systems is largely based on the observation that data analysis applications can be composed as pipelines or networks of computations on data. In this work, we present a runtime support system that is designed to facilitate this type of computation in distributed computing environments. Our system is optimized for data-intensive workflows, in which efficient management and retrieval of data, coordination of data processing and data movement, and check-pointing of intermediate results are critical and challenging issues. Experimental evaluation of our system shows that linear speedups can be achieved for sophisticated applications, which are implemented as a network of multiple data processing components.</p>","PeriodicalId":14313,"journal":{"name":"International Journal of Parallel Programming","volume":"36 2","pages":"250-266"},"PeriodicalIF":0.9000,"publicationDate":"2008-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1007/s10766-007-0068-8","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Parallel Programming","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10766-007-0068-8","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 5
Abstract
Scientific workflow systems have been introduced in response to the demand of researchers from several domains of science who need to process and analyze increasingly larger datasets. The design of these systems is largely based on the observation that data analysis applications can be composed as pipelines or networks of computations on data. In this work, we present a runtime support system that is designed to facilitate this type of computation in distributed computing environments. Our system is optimized for data-intensive workflows, in which efficient management and retrieval of data, coordination of data processing and data movement, and check-pointing of intermediate results are critical and challenging issues. Experimental evaluation of our system shows that linear speedups can be achieved for sophisticated applications, which are implemented as a network of multiple data processing components.
期刊介绍:
International Journal of Parallel Programming is a forum for the publication of peer-reviewed, high-quality original papers in the computer and information sciences, focusing specifically on programming aspects of parallel computing systems. Such systems are characterized by the coexistence over time of multiple coordinated activities. The journal publishes both original research and survey papers. Fields of interest include: linguistic foundations, conceptual frameworks, high-level languages, evaluation methods, implementation techniques, programming support systems, pragmatic considerations, architectural characteristics, software engineering aspects, advances in parallel algorithms, performance studies, and application studies.