Polyhedral fragments: an efficient representation for symbolically generating code for processor arrays

Proceedings of the 17th ACM-IEEE International Conference on Formal Methods and Models for System Design
Pub Date: 2019-10-09
DOI: 10.1145/3359986.3361205

Michael Witterauf, Frank Hannig, J. Teich


Citations: 1

Abstract

To leverage the vast parallelism of loops, embedded loop accelerators often take the form of processor arrays with many simple processing elements. Each processing element executes a subset of a loop's iterations in parallel, exploiting instruction- and data-level parallelism: iterations are tightly scheduled using software pipelining, and instructions are packed into compact, individual programs. However, loop bounds are often unknown until runtime, which complicates the static generation of programs because the bounds influence each program's control flow. Existing solutions, like generating and storing all possible programs or full just-in-time compilation, are prohibitively expensive, especially in embedded systems. As a remedy, we propose a hybrid approach introducing a tree-like program representation, whose generation front-loads all intractable sub-problems to compile time, and from which all concrete program variants can efficiently be stitched together at runtime. The tree consists of so-called polyhedral fragments that represent concrete program parts and are annotated with iteration-dependent conditions. We show that this representation is both space- and time-efficient: it requires polynomial space to store (whereas storing all possibly generated programs is non-polynomial) and polynomial time to evaluate (whereas just-in-time compilation requires solving NP-hard problems). In a case study, we show for a representative loop program that using a tree of polyhedral fragments saves 98.88% of space compared to storing all program variants.
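To make the idea of stitching a program together from a tree of condition-annotated fragments more concrete, here is a minimal Python sketch. It is an illustration only and not the representation defined in the paper: the `Fragment` class, the `stitch` function, the parameter `N`, and the instruction names are all hypothetical, and the conditions are simplified to predicates over a single runtime loop bound.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Params = Dict[str, int]  # runtime values, e.g. loop bounds (hypothetical)


@dataclass
class Fragment:
    """A program part guarded by a condition on runtime parameters.

    Illustrative stand-in only; the paper's fragments carry
    iteration-dependent conditions that are not modelled here.
    """
    condition: Callable[[Params], bool]
    instructions: List[str] = field(default_factory=list)
    children: List["Fragment"] = field(default_factory=list)


def stitch(node: Fragment, params: Params) -> List[str]:
    """Collect instructions of all fragments whose conditions hold.

    A subtree is skipped entirely when its root condition is false,
    so each node is visited at most once per evaluation.
    """
    if not node.condition(params):
        return []
    program = list(node.instructions)
    for child in node.children:
        program.extend(stitch(child, params))
    return program


# A tiny hand-written tree (assumed example, fixed before runtime).
tree = Fragment(
    condition=lambda p: p["N"] >= 1,
    instructions=["prologue"],
    children=[
        Fragment(lambda p: p["N"] >= 4, ["pipelined_kernel"]),
        Fragment(lambda p: p["N"] < 4, ["short_loop_body"]),
        Fragment(lambda p: True, ["epilogue"]),
    ],
)

print(stitch(tree, {"N": 8}))
print(stitch(tree, {"N": 2}))
```

In this sketch the tree is stored once, and each concrete program variant is obtained by a single traversal once `N` is known, rather than by storing one program per possible value of `N` or by compiling from scratch at runtime.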
