A Parallel Runtime Framework for Communication Intensive Stream Applications

S. Muralidharan, Kevin Casey, David Gregg
Published in: 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications
Publication date: 2013-07-16
DOI: 10.1109/TrustCom.2013.142 (https://doi.org/10.1109/TrustCom.2013.142)

Abstract

Stream applications are often limited in their performance by their underlying communication system. A typical implementation relies on the operating system to handle the majority of network operations. In such cases, the communication stack, which was not designed to handle very large volumes of data, acts as a bottleneck and restricts the performance of the application. In this paper, we propose a parallel runtime framework that integrates the communication operations with stream applications, and provides a common parallel processing engine that can execute both the communication and computation operations in parallel on multicore processors. We place an emphasis on the low-level details required to implement such a framework, but also provide some guidelines on how an application programmer can employ the framework. Our runtime system uses a set of operations represented as filters to perform the relevant computations on the data stream. Filters that handle the application-specific operations are categorized as computation filters, and those that transfer data to and from network devices are classified as communication filters. Computation filters are designed by the user and are specific to the application. Communication filters are provided by the runtime system and are built using system software that allows direct access to network hardware. Such system software allows the network operations to be performed by the runtime system in parallel, leading to better communication performance. Applications that are designed for this framework are built by constructing application-specific computation filters and then connecting them to the communication filters provided by the runtime system. This hides the low-level programming of network adapters and protocols from the application developer, making it easier to build stream applications that take advantage of the improved communication performance. Moreover, by dynamically replicating and statically scheduling such filters on the given multicore architecture, it is possible for the runtime system to process multiple data streams in parallel. We are able to parallelize stream applications and achieve speedups of more than a factor of eight in all the applications we tested. The results show that our system scales to as many parallel processes as there are cores on our computer, and achieves speedups of more than a factor of ten in some cases compared to sequential implementations.
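
The filter model described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation or API: every name here (`receive_filter`, `uppercase_filter`, `send_filter`, `run_pipeline`, `run_replicated`) is hypothetical, the communication filters are stand-ins that read from and write to in-memory lists rather than network hardware, and replication is approximated by running one copy of the pipeline per stream on a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List

# A computation filter maps an input stream of items to an output stream.
Filter = Callable[[Iterable[bytes]], Iterator[bytes]]


def receive_filter(packets: List[bytes]) -> Iterator[bytes]:
    """Stand-in communication filter (assumption): yields packets from an
    in-memory list instead of reading them from a network device."""
    yield from packets


def uppercase_filter(stream: Iterable[bytes]) -> Iterator[bytes]:
    """Example user-written computation filter: upper-cases each item."""
    for item in stream:
        yield item.upper()


def send_filter(stream: Iterable[bytes]) -> List[bytes]:
    """Stand-in communication filter (assumption): collects the output
    into a list instead of writing it to a network device."""
    return list(stream)


def run_pipeline(packets: List[bytes], computation: List[Filter]) -> List[bytes]:
    """Connect communication filters to the user's computation filters."""
    stream: Iterable[bytes] = receive_filter(packets)
    for stage in computation:
        stream = stage(stream)
    return send_filter(stream)


def run_replicated(streams: List[List[bytes]],
                   computation: List[Filter],
                   workers: int) -> List[List[bytes]]:
    """Run one replica of the pipeline per input stream on a thread pool.
    Results come back in the same order as the input streams."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: run_pipeline(s, computation), streams))


if __name__ == "__main__":
    data = [[b"ab", b"cd"], [b"ef"]]
    print(run_replicated(data, [uppercase_filter], workers=2))
```

The sketch shows only the structure: the user supplies computation filters, and the runtime wires them between its own communication filters and replicates the resulting pipeline across streams. In CPython, threads will not speed up CPU-bound filters, so this does not reproduce the multicore speedups reported in the abstract.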