{"title":"多媒体应用中芯片多处理器的内核间数据重用和流水线","authors":"L. A. Bathen, Yongjin Ahn, N. Dutt, S. Pasricha","doi":"10.1109/ESTMED.2009.5336815","DOIUrl":null,"url":null,"abstract":"The increasing demand for low power and high performance multimedia embedded systems has motivatedation bandwidth and latency requirements under a tight power budge the need for effective solutions to satisfy applict. As technology scales, it is imperative that applications are optimized to take full advantage of the underlying resources and meet both power and performance requirements. We propose a methodology capable of discovering and enabling parallelism opportunities via code transformations, efficiently distributing the computational load across resources, and minimizing unnecessary data transfers. Our approach decomposes the application's tasks into smaller units of computations called kernels, which are distributed and pipelined across the different processing resources. We exploit the ideas of inter-kernel data reuse to minimize unnecessary data transfers between kernels and early execution edges to drive performance. Our experimental results on a JPEG2000 case study show up to 80% performance improvement and 60% dynamic power reduction over standard application mapping approaches.","PeriodicalId":104499,"journal":{"name":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Inter-kernel data reuse and pipelining on chip-multiprocessors for multimedia applications\",\"authors\":\"L. A. Bathen, Yongjin Ahn, N. Dutt, S. Pasricha\",\"doi\":\"10.1109/ESTMED.2009.5336815\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The increasing demand for low power and high performance multimedia embedded systems has motivatedation bandwidth and latency requirements under a tight power budge the need for effective solutions to satisfy applict. As technology scales, it is imperative that applications are optimized to take full advantage of the underlying resources and meet both power and performance requirements. We propose a methodology capable of discovering and enabling parallelism opportunities via code transformations, efficiently distributing the computational load across resources, and minimizing unnecessary data transfers. Our approach decomposes the application's tasks into smaller units of computations called kernels, which are distributed and pipelined across the different processing resources. We exploit the ideas of inter-kernel data reuse to minimize unnecessary data transfers between kernels and early execution edges to drive performance. Our experimental results on a JPEG2000 case study show up to 80% performance improvement and 60% dynamic power reduction over standard application mapping approaches.\",\"PeriodicalId\":104499,\"journal\":{\"name\":\"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-11-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ESTMED.2009.5336815\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ESTMED.2009.5336815","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Inter-kernel data reuse and pipelining on chip-multiprocessors for multimedia applications
The increasing demand for low power and high performance multimedia embedded systems has motivatedation bandwidth and latency requirements under a tight power budge the need for effective solutions to satisfy applict. As technology scales, it is imperative that applications are optimized to take full advantage of the underlying resources and meet both power and performance requirements. We propose a methodology capable of discovering and enabling parallelism opportunities via code transformations, efficiently distributing the computational load across resources, and minimizing unnecessary data transfers. Our approach decomposes the application's tasks into smaller units of computations called kernels, which are distributed and pipelined across the different processing resources. We exploit the ideas of inter-kernel data reuse to minimize unnecessary data transfers between kernels and early execution edges to drive performance. Our experimental results on a JPEG2000 case study show up to 80% performance improvement and 60% dynamic power reduction over standard application mapping approaches.