Maximilien Breughe, Zheng Li, Yang Chen, Stijn Eyerman, O. Temam, Chengyong Wu, L. Eeckhout
{"title":"处理器自定义对工作负载的输入数据集有多敏感?","authors":"Maximilien Breughe, Zheng Li, Yang Chen, Stijn Eyerman, O. Temam, Chengyong Wu, L. Eeckhout","doi":"10.1109/SASP.2011.5941070","DOIUrl":null,"url":null,"abstract":"Hardware customization is an effective approach for meeting application performance requirements while achieving high levels of energy efficiency. Application-specific processors achieve high performance at low energy by tailoring their designs towards a specific workload, i.e., an application or application domain of interest. A fundamental question that has remained unanswered so far though is to what extent processor customization is sensitive to the training workload's input datasets. Current practice is to consider a single or only a few input datasets per workload during the processor design cycle — the reason being that simulation is prohibitively time-consuming which excludes considering a large number of datasets. This paper addresses this fundamental question, for the first time. In order to perform the large number of runs required to address this question in a reasonable amount of time, we first propose a mechanistic analytical model, built from first principles, that is accurate within 3.6% on average across a broad design space. The analytical model is at least 4 orders of magnitude faster than detailed cycle-accurate simulation for design space exploration. Using the model, we are able to study the sensitivity of a workload's input dataset on the optimum customized processor architecture. Considering MiBench benchmarks and 1000 datasets per benchmark, we conclude that processor customization is largely dataset-insensitive. This has an important implication in practice: a single or only a few datasets are sufficient for determining the optimum processor architecture when designing application-specific processors.","PeriodicalId":375788,"journal":{"name":"2011 IEEE 9th Symposium on Application Specific Processors (SASP)","volume":"7 1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"How sensitive is processor customization to the workload's input datasets?\",\"authors\":\"Maximilien Breughe, Zheng Li, Yang Chen, Stijn Eyerman, O. Temam, Chengyong Wu, L. Eeckhout\",\"doi\":\"10.1109/SASP.2011.5941070\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hardware customization is an effective approach for meeting application performance requirements while achieving high levels of energy efficiency. Application-specific processors achieve high performance at low energy by tailoring their designs towards a specific workload, i.e., an application or application domain of interest. A fundamental question that has remained unanswered so far though is to what extent processor customization is sensitive to the training workload's input datasets. Current practice is to consider a single or only a few input datasets per workload during the processor design cycle — the reason being that simulation is prohibitively time-consuming which excludes considering a large number of datasets. This paper addresses this fundamental question, for the first time. In order to perform the large number of runs required to address this question in a reasonable amount of time, we first propose a mechanistic analytical model, built from first principles, that is accurate within 3.6% on average across a broad design space. The analytical model is at least 4 orders of magnitude faster than detailed cycle-accurate simulation for design space exploration. Using the model, we are able to study the sensitivity of a workload's input dataset on the optimum customized processor architecture. Considering MiBench benchmarks and 1000 datasets per benchmark, we conclude that processor customization is largely dataset-insensitive. This has an important implication in practice: a single or only a few datasets are sufficient for determining the optimum processor architecture when designing application-specific processors.\",\"PeriodicalId\":375788,\"journal\":{\"name\":\"2011 IEEE 9th Symposium on Application Specific Processors (SASP)\",\"volume\":\"7 1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-06-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE 9th Symposium on Application Specific Processors (SASP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SASP.2011.5941070\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 9th Symposium on Application Specific Processors (SASP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SASP.2011.5941070","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
How sensitive is processor customization to the workload's input datasets?
Hardware customization is an effective approach for meeting application performance requirements while achieving high levels of energy efficiency. Application-specific processors achieve high performance at low energy by tailoring their designs towards a specific workload, i.e., an application or application domain of interest. A fundamental question that has remained unanswered so far though is to what extent processor customization is sensitive to the training workload's input datasets. Current practice is to consider a single or only a few input datasets per workload during the processor design cycle — the reason being that simulation is prohibitively time-consuming which excludes considering a large number of datasets. This paper addresses this fundamental question, for the first time. In order to perform the large number of runs required to address this question in a reasonable amount of time, we first propose a mechanistic analytical model, built from first principles, that is accurate within 3.6% on average across a broad design space. The analytical model is at least 4 orders of magnitude faster than detailed cycle-accurate simulation for design space exploration. Using the model, we are able to study the sensitivity of a workload's input dataset on the optimum customized processor architecture. Considering MiBench benchmarks and 1000 datasets per benchmark, we conclude that processor customization is largely dataset-insensitive. This has an important implication in practice: a single or only a few datasets are sufficient for determining the optimum processor architecture when designing application-specific processors.