Data locality optimization for a parallel object detection on embedded multi-core systems
B. Lai, C. Chiang, Guan-Ru Li
2011 IEEE 2nd International Conference on Software Engineering and Service Science, 2011-07-15
DOI: 10.1109/ICSESS.2011.5982381
Object detection is an important application for modern smart embedded devices. It enables a device to recognize its surroundings and to support intelligent applications. However, its heavy computational demands make object detection expensive to run on resource-constrained embedded hardware. Parallel processing on multi-core systems offers a way to boost performance, but the memory bottleneck limits how well that performance scales. Improving the data locality of the on-chip cache has therefore become a critical design concern. This paper analyzes the memory behavior of a parallel Viola-Jones algorithm and proposes a scheme to enhance the data locality of the on-chip cache. In experiments that run a multi-threaded object detection program on a cycle-accurate multi-core simulator, the proposed approach achieves up to 58% better performance than the original parallel program.
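The abstract does not describe the proposed scheme in detail. The following Python sketch is a hedged illustration of the kind of locality concern it targets, not the authors' method. It builds the integral image that Viola-Jones uses for constant-time rectangle sums. It then splits the sliding-window scan into contiguous horizontal strips, one per worker, so that each worker's window evaluations read a compact band of rows rather than the whole image. All names (`integral_image`, `rect_sum`, `scan_strip`, `detect_parallel`) are hypothetical. The single two-rectangle feature and its threshold are placeholders for a trained cascade. Python threads show only the work decomposition; because of the GIL they say nothing about cache behavior or speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Zero-padded integral image: ii[y][x] = sum of img[0..y-1][0..x-1]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y), in four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def scan_strip(ii: Image, y_start: int, y_end: int, win: int,
               threshold: int) -> List[Tuple[int, int]]:
    """Evaluate a placeholder Haar-like feature (top half minus bottom half)
    on every win-by-win window whose top row lies in [y_start, y_end).
    Only rows y_start .. y_end - 1 + win of ii are read, so a worker's
    working set is one contiguous band of the integral image."""
    width = len(ii[0]) - 1
    half = win // 2
    hits = []
    for y in range(y_start, y_end):
        for x in range(width - win + 1):
            top = rect_sum(ii, x, y, win, half)
            bottom = rect_sum(ii, x, y + half, win, win - half)
            if top - bottom > threshold:
                hits.append((x, y))
    return hits


def detect_parallel(img: Image, win: int = 4, threshold: int = 0,
                    workers: int = 4) -> List[Tuple[int, int]]:
    """Partition window rows into contiguous strips, one strip per task."""
    ii = integral_image(img)
    n_rows = len(img) - win + 1  # number of valid window top rows
    if n_rows <= 0:
        return []
    step = -(-n_rows // workers)  # ceiling division
    bounds = [(s, min(s + step, n_rows)) for s in range(0, n_rows, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so hits come out
        # in the same (row, then column) order as a serial scan.
        parts = pool.map(lambda b: scan_strip(ii, b[0], b[1], win, threshold),
                         bounds)
    return [hit for part in parts for hit in part]
```

Contiguous strips are one common way to keep each core's working set compact. A row-cyclic assignment, by contrast, would make every core touch rows spread across the whole integral image. The abstract alone does not say whether this matches the scheme evaluated in the paper.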