{"title":"Approximate computing for multithreaded programs in shared memory architectures","authors":"Bernard Nongpoh, Rajarshi Ray, A. Banerjee","doi":"10.1145/3359986.3361209","DOIUrl":null,"url":null,"abstract":"In multicore and multicached architectures, cache coherence is ensured with a coherence protocol. However, the performance benefits of caching diminishes due to the cost associated with the protocol implementation. In this paper, we propose a novel technique to improve the performance of multithreaded programs running on shared-memory multicore processors by embracing approximate computing. Our idea is to relax the coherence requirement selectively in order to reduce the cost associated with a cache-coherence protocol, and at the same time, ensure a bounded QoS degradation with probabilistic reliability. In particular, we detect instructions in a multithreaded program that write to shared data, we call them Shared-Write-Access-Points (SWAPs), and propose an automated statistical analysis to identify those which can tolerate coherence faults. We call such SWAPs approximable. Our experiments on 9 applications from the SPLASH 3.0 benchmarks suite reveal that an average of 57% of the tested SWAPs are approximable. To leverage this observation, we propose an adapted cache-coherence protocol that relaxes the coherence requirement on stores from approximable SWAPs. Additionally, our protocol uses stale values for load misses due to coherence, the stale value being the version at the time of invalidation. We observe an average of 15% reduction in CPU cycles and 11% reduction in energy footprint from architectural simulation of the 9 applications using our approximate execution scheme.","PeriodicalId":331904,"journal":{"name":"Proceedings of the 17th ACM-IEEE International Conference on Formal Methods and Models for System Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 17th ACM-IEEE International Conference on Formal Methods and Models for System Design","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3359986.3361209","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
In multicore and multicached architectures, cache coherence is ensured with a coherence protocol. However, the performance benefits of caching diminishes due to the cost associated with the protocol implementation. In this paper, we propose a novel technique to improve the performance of multithreaded programs running on shared-memory multicore processors by embracing approximate computing. Our idea is to relax the coherence requirement selectively in order to reduce the cost associated with a cache-coherence protocol, and at the same time, ensure a bounded QoS degradation with probabilistic reliability. In particular, we detect instructions in a multithreaded program that write to shared data, we call them Shared-Write-Access-Points (SWAPs), and propose an automated statistical analysis to identify those which can tolerate coherence faults. We call such SWAPs approximable. Our experiments on 9 applications from the SPLASH 3.0 benchmarks suite reveal that an average of 57% of the tested SWAPs are approximable. To leverage this observation, we propose an adapted cache-coherence protocol that relaxes the coherence requirement on stores from approximable SWAPs. Additionally, our protocol uses stale values for load misses due to coherence, the stale value being the version at the time of invalidation. We observe an average of 15% reduction in CPU cycles and 11% reduction in energy footprint from architectural simulation of the 9 applications using our approximate execution scheme.