Exploiting approximate value locality for data synchronization on multi-core processors

IEEE International Symposium on Workload Characterization (IISWC'10) Pub Date: 2010-12-02 DOI: 10.1109/IISWC.2010.5650333

Jaswanth Sreeram, S. Pande

Citations: 12

Abstract

This paper shows that for a variety of parallel “soft computing” programs that use optimistic synchronization, the approximate nature of the values produced during execution can be exploited to improve performance significantly. Specifically, mechanisms for imprecise sharing of values between threads reduce contention in these programs, avoiding expensive aborts and improving parallel performance while keeping the program's results within the bounds of an acceptable approximation. This is possible because, for many such programs, a large fraction of the values produced during execution exhibit substantial value locality. We describe how this locality can be exploited using two components: extensions to C/C++ language types that allow limits on the required precision and accuracy to be specified, and a novel value-aware conflict detection scheme that minimizes the number of conflicts while respecting these limits. Our experiments indicate that substantial speedups can be achieved for the programs studied: up to 5.7x over the original program for the same number of threads. We also present experimental evidence that, for the programs studied, the amount of error introduced often grows relatively slowly.
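The abstract gives neither the syntax of the type extensions nor the details of the conflict detection scheme, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a shared value that carries a programmer-chosen absolute tolerance and contrasts conventional version-based validation with a value-aware check. All names here (`ApproxCell`, `ReadEntry`, `observe`, `validate_exact`, `validate_approx`) and the use of an absolute tolerance are hypothetical, and the code is deliberately single-threaded; a real transactional runtime would need atomic accesses and a commit protocol.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical shared cell: a value, a version bumped on every write,
// and an assumed tolerance on how far the value may drift under a reader.
struct ApproxCell {
    double value = 0.0;
    std::uint64_t version = 0;
    double tolerance = 0.0;  // maximum absolute drift a reader accepts

    ApproxCell(double v, double tol) : value(v), tolerance(tol) {
        assert(tol >= 0.0);  // a negative tolerance would be meaningless
    }

    void write(double v) {
        value = v;
        ++version;
    }
};

// What an optimistic transaction remembers about one read.
struct ReadEntry {
    const ApproxCell* cell;
    double seen_value;
    std::uint64_t seen_version;
};

// Record the value and version observed at read time.
inline ReadEntry observe(const ApproxCell& c) {
    return {&c, c.value, c.version};
}

// Conventional validation: any intervening write counts as a conflict.
inline bool validate_exact(const ReadEntry& r) {
    return r.cell->version == r.seen_version;
}

// Value-aware validation: an intervening write is a conflict only if it
// moved the value further than the tolerance from what the reader saw.
inline bool validate_approx(const ReadEntry& r) {
    if (r.cell->version == r.seen_version) {
        return true;  // nothing changed, trivially valid
    }
    return std::fabs(r.cell->value - r.seen_value) <= r.cell->tolerance;
}
```

For example, with a cell created as `ApproxCell c(100.0, 0.5)` and a read recorded by `observe(c)`, a later `c.write(100.25)` makes `validate_exact` fail while `validate_approx` still succeeds, so the reader would not have to abort. A subsequent `c.write(102.0)` exceeds the tolerance and is reported as a conflict by both checks.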
