Srividya Rajaraman, Pritam S. Sirpotdar, Abhijeet Wavare, A. Patki
DOI: 10.1109/EIC.2015.7230723 (https://doi.org/10.1109/EIC.2015.7230723)
Venue: 2014 International Conference on Advances in Communication and Computing Technologies (ICACACT 2014)
Publication date: 2015-09-03
Citations: 0
Multithreading implementation in a single core TMS320C6713 DSP
Very Long Instruction Word (VLIW) is an architectural breakthrough in DSP design that caters to real-time constraints and efficient algorithm implementation. This paper identifies several shortcomings of such architectures: latency, underutilization of functional units, reliance on NOPs, and cross-path constraints on register file access. It proposes a technique that reduces the delay slots filled by NOPs in the pipeline, thereby reducing both code size and latency. Using the available functional units, thread-level parallelism is introduced to complement the existing instruction-level parallelism, which addresses the underutilization of functional units. These issues are handled through multithreading, a concept usually associated with multi-core DSPs and an RTOS. The paper reports a novel technique that introduces a programming discipline in assembly coding to emulate multithreading on a single-core DSP without an operating system, and a reduction in the number of clock cycles required is observed. Code snippets implemented using Code Composer Studio for the TMS320C6713 illustrate the concepts.
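The idea of filling NOP delay slots with work from a second thread can be illustrated with a toy cycle-count model. This is only a sketch in Python, not the authors' assembly discipline: it assumes a single-issue machine (real C67x execute packets can issue to several functional units at once), assumes each instruction depends on the one before it in the same thread, and uses hypothetical thread contents. The delay-slot counts for `LDW` (4) and `MPY` (1) follow the C67x instruction set; the function names `sequential_cycles` and `interleaved_schedule` are invented for this example.

```python
from typing import List, Tuple

# (mnemonic, delay slots before the result can be used by the next instruction)
Instr = Tuple[str, int]


def sequential_cycles(threads: List[List[Instr]]) -> int:
    """Cycles needed when threads run back to back and every delay slot is a NOP."""
    return sum(1 + delay for thread in threads for _, delay in thread)


def interleaved_schedule(threads: List[List[Instr]]) -> List[str]:
    """Greedy single-issue schedule: each cycle, issue from the first thread
    whose previous result is ready; emit a NOP only if no thread is ready."""
    pc = [0] * len(threads)        # next instruction index per thread
    ready_at = [0] * len(threads)  # first cycle at which each thread may issue
    schedule: List[str] = []
    cycle = 0
    while any(pc[i] < len(t) for i, t in enumerate(threads)):
        slot = "NOP"
        for i, thread in enumerate(threads):
            if pc[i] < len(thread) and ready_at[i] <= cycle:
                name, delay = thread[pc[i]]
                slot = f"T{i}:{name}"
                pc[i] += 1
                ready_at[i] = cycle + 1 + delay
                break
        schedule.append(slot)
        cycle += 1
    # Account for delay slots still pending after the last issue.
    while len(schedule) < max(ready_at, default=0):
        schedule.append("NOP")
    return schedule


if __name__ == "__main__":
    # Hypothetical threads: load, multiply, add, each depending on the previous.
    thread = [("LDW", 4), ("MPY", 1), ("ADD", 0)]
    threads = [thread, list(thread)]
    plan = interleaved_schedule(threads)
    print("sequential cycles :", sequential_cycles(threads))
    print("interleaved cycles:", len(plan))
    print("schedule          :", " | ".join(plan))
```

Under these assumptions, two identical load/multiply/add threads take 16 cycles when run one after the other and 9 cycles when interleaved, with the number of NOPs dropping from 10 to 3. The figures describe only this toy model and say nothing about the cycle counts reported in the paper.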