Exploiting cache coherence for effective on-the-fly data tracing in multicores
Mounika Ponugoti, A. Milenković
2016 IEEE 34th International Conference on Computer Design (ICCD), 2016-10-01
DOI: 10.1109/ICCD.2016.7753295 (https://doi.org/10.1109/ICCD.2016.7753295)
Citations: 3
Abstract
Software testing and debugging of modern embedded computer systems are becoming increasingly challenging due to growing hardware and software complexity, increased integration and miniaturization, and ever-tightening time-to-market constraints. To find software bugs faster, developers often rely on on-chip trace and debug resources. However, these resources offer limited visibility into the system, increase system cost, and do not scale well with a growing number of processor cores. This paper introduces a new hardware/software mechanism for capturing and filtering load data value traces in multicores that enables complete reconstruction of a parallel program's execution. The proposed mechanism exploits data caches and cache coherence protocol states to minimize the number of trace events that must be streamed out of the target platform to the software debugger. The mechanism relies on a single trace bit per data cache block, thus minimizing the hardware implementation cost. Our experimental evaluation explores the effectiveness of the proposed technique by measuring the required trace port bandwidth as a function of the cache size and the number of processor cores. The results show that the proposed mechanism significantly reduces the required trace port bandwidth compared to Nexus-like load data value tracing. Depending on the data cache size, the improvements range from 9.9 to 23.5 times for single-core configurations and from 18.6 to 37.3 times for octa-core configurations.
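The filtering idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's design: the abstract does not say when the trace bit is set or cleared, so the policy here is an assumption (a load emits a trace event only when its block's bit is clear, then sets the bit; eviction and coherence invalidation drop the block and its bit). The class name `TraceBitCache`, the fully associative LRU organization, and the 64-byte block size are all hypothetical.

```python
from collections import OrderedDict


class TraceBitCache:
    """Toy per-core data cache with one trace bit per block (hypothetical model)."""

    def __init__(self, num_blocks, block_size=64):
        self.num_blocks = num_blocks
        self.block_size = block_size
        # block number -> trace bit; ordering is least recently used first
        self.trace_bit = OrderedDict()
        self.events = 0  # trace events streamed to the debugger so far

    def load(self, addr):
        """Return True if this load must emit a trace event (assumed policy)."""
        block = addr // self.block_size
        if block in self.trace_bit:
            self.trace_bit.move_to_end(block)        # hit: refresh LRU position
        else:
            if len(self.trace_bit) >= self.num_blocks:
                self.trace_bit.popitem(last=False)   # evict the LRU block and its bit
            self.trace_bit[block] = False            # newly filled block starts untraced
        if self.trace_bit[block]:
            return False                             # already reported: filtered out
        self.trace_bit[block] = True
        self.events += 1
        return True

    def invalidate(self, addr):
        """Model a coherence invalidation caused by a write on another core."""
        self.trace_bit.pop(addr // self.block_size, None)


core0 = TraceBitCache(num_blocks=2)
outcomes = [core0.load(0), core0.load(8), core0.load(0)]
core0.invalidate(0)                 # another core writes the block
outcomes.append(core0.load(0))      # the reload has to be reported again
print(outcomes, core0.events)
```

In this model a larger cache keeps more trace bits alive and so filters more loads, which matches the direction of the abstract's claim that the improvement grows with data cache size. The sketch says nothing about what a trace event contains or how the software debugger replays it.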