A framework for fast and fair evaluation of automata processing hardware

2017 IEEE International Symposium on Workload Characterization (IISWC), Pub Date: 2017-10-01, DOI: 10.1109/IISWC.2017.8167767

Xiaodong Yu, Kaixi Hou, Hao Wang, Wu-chun Feng


Citations: 4

Abstract


Programming Micron's Automata Processor (AP) requires expertise in both automata theory and the AP architecture, because programmers must manually manipulate state transition elements (STEs) and their transitions in the low-level Automata Network Markup Language (ANML). When the number of STEs an application requires exceeds the hardware capacity, multiple reconfigurations are needed. However, most previous AP-based designs limit the dataset size so that it fits on a single AP board and simply neglect the costly overhead of reconfiguration. This results in unfair performance comparisons between the AP and other processors. To address this issue, we propose a framework for the fast and fair evaluation of AP devices. Our framework provides a hierarchical approach that automatically generates automata for large datasets through user-defined paradigms and allows the use of cascadable macros to achieve highly optimized reconfigurations. We highlight the importance of counting the configuration time in the overall AP performance, which in turn can provide better insight into identifying essential hardware features, specifically for large-scale problem sizes. Our framework shows that, in a fair comparison that includes this overhead, the AP can achieve up to a 461x overall speedup over its CPU counterparts.
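The point about counting configuration time can be illustrated with a small back-of-the-envelope model. This is only a sketch under simplifying assumptions and is not the framework described in the abstract: it assumes every pass needs one full configuration plus one streaming run over the input, and all function names, capacities, and timings are hypothetical values chosen for illustration.

```python
import math


def num_passes(required_stes: int, capacity_stes: int) -> int:
    """Number of board configurations needed to cover all required STEs."""
    return math.ceil(required_stes / capacity_stes)


def ap_total_time(required_stes: int, capacity_stes: int,
                  config_time_s: float, stream_time_s: float) -> float:
    """End-to-end AP time, charging one configuration per pass (assumption)."""
    passes = num_passes(required_stes, capacity_stes)
    return passes * (config_time_s + stream_time_s)


def speedup(baseline_time_s: float, accelerated_time_s: float) -> float:
    """Speedup of the accelerated run over the baseline run."""
    return baseline_time_s / accelerated_time_s


# Hypothetical workload: 100,000 STEs on a device that holds 40,000.
passes = num_passes(100_000, 40_000)
fair_time = ap_total_time(100_000, 40_000, config_time_s=0.5, stream_time_s=0.25)
kernel_only_time = passes * 0.25  # ignores reconfiguration entirely

cpu_time = 90.0  # hypothetical CPU baseline, in seconds
print("passes:", passes)
print("speedup ignoring reconfiguration:", speedup(cpu_time, kernel_only_time))
print("speedup counting reconfiguration:", speedup(cpu_time, fair_time))
```

With these made-up numbers the reported speedup falls from 120x to 40x once reconfiguration is charged, which is the kind of gap the abstract describes as an unfair comparison.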
