The Race-Timing Prototype
Andrés Rainiero Hernández Coronado, Wonjun Lee, Wei-Ming Lin
2021 18th International Conference on Privacy, Security and Trust (PST), 2021-12-13
DOI: https://doi.org/10.1109/PST52912.2021.9647804
Abstract
The disclosure of transient execution attacks, such as Meltdown and Spectre, has once again highlighted the threat of cache side-channel configurations. It has now been demonstrated that the cache hierarchy of modern processors functions as a micro-architectural medium through which transient execution may leak private data, in addition to serving as an attack surface that adversaries can use to spy directly on victim programs. To temper the effectiveness of cache side-channel configurations, hardware manufacturers, most notably AMD, have opted to reduce the resolution of the processor’s cycle counter to prevent adversaries from mounting side channels in the higher levels of the cache hierarchy, especially in the L1D cache. This partially works because classic cache side-channel techniques, namely Prime+Probe and Flush+Reload, rely heavily on the processor’s cycle counter to detect changes in the cache state, which is normally done by tracking cache hits and misses. Yet we present The Race-Timing Prototype to show that determined adversaries no longer need to rely on the resolution of the processor’s cycle counter to effectively distinguish cache hits from misses, and that a cache side channel can still be configured using only controlled memory race conditions. Because our implementation is hardware agnostic and does not rely on any proprietary instruction set extension, we demonstrate that our prototype works on processors from both Intel and AMD. Additionally, we propose a new semantic of Out-of-Order Detachment that further allows our Race-Timing measurements to closely match the accuracy of hardware Performance Monitoring Counters in distinguishing fast L1D and L2 cache accesses on Intel processors. Ultimately, this work demonstrates that an adversary can exploit a cache side channel with high-precision timing without using cycle counters at all.
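The intuition behind replacing a cycle counter with a race can be illustrated with a toy model. The sketch below is not the authors' implementation, and the abstract does not describe their mechanism in detail. It assumes a hypothetical "reference path" of known latency that races the probed access, and every latency value, the timer step size, and all function names are illustrative placeholders rather than measurements or identifiers from the paper.

```python
# Toy model only: all latencies and the timer resolution are hypothetical
# placeholders, not values or code from the paper.
L1D_HIT_CYCLES = 4      # assumed L1D hit latency
L2_HIT_CYCLES = 14      # assumed L2 hit latency
TIMER_RESOLUTION = 32   # assumed step size of a coarsened cycle counter
REFERENCE_CYCLES = 9    # assumed latency of a calibrated reference path


def coarse_elapsed(latency, resolution=TIMER_RESOLUTION, start=0):
    """Elapsed cycles as seen through a counter that advances in steps of `resolution`."""
    before = (start // resolution) * resolution
    after = ((start + latency) // resolution) * resolution
    return after - before


def probe_wins_race(latency, reference=REFERENCE_CYCLES):
    """True when the probed access finishes before the reference path."""
    return latency < reference


def classify_by_race(latency, reference=REFERENCE_CYCLES):
    """Label an access from the race outcome alone, with no counter involved."""
    return "L1D" if probe_wins_race(latency, reference) else "L2"


if __name__ == "__main__":
    for name, latency in (("L1D", L1D_HIT_CYCLES), ("L2", L2_HIT_CYCLES)):
        print(name, "coarse timer:", coarse_elapsed(latency),
              "race verdict:", classify_by_race(latency))
```

In this model the coarsened counter reports the same elapsed value for both access types whenever no tick boundary is crossed, while the race outcome still separates them. On real hardware the reference path would have to be calibrated and the result would be noisy, which this sketch ignores.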