QrnPro: New Processor Architecture for Accelerating Quran Applications

2013 Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, Pub Date: 2013-12-01, DOI: 10.1109/NOORIC.2013.89

M. Soliman


Citations: 2

Abstract


QrnPro: New Processor Architecture for Accelerating Quran Applications

Quran applications include image and video processing, voice recognition, and data encryption and decryption, all of which are based on data parallelism. These applications are characterized by structured, regular computations on large data sets. This paper proposes a new processor architecture, QrnPro, to accelerate Quran applications. QrnPro exploits the data parallelism found in these applications by adding vector processing to a VLIW architecture. The VLIW architecture processes multiple independent scalar instructions concurrently on parallel execution units, while data parallelism is expressed by vector instructions and processed on the same parallel execution units. Combining VLIW with vector processing exploits resources efficiently even when the proportion of data parallelism is below 100%. An instruction memory of 256×128 bits stores the scalar and vector instructions of Quran applications as 128-bit VLIW instructions. A single register file (8 vectors × 16 elements × 32 bits, or equivalently 128 × 32-bit registers) stores both multi-scalar and vector elements. The control unit feeds the parallel execution units with the required operands (multi-scalar or vector elements), so that up to 4×32-bit results are produced each clock cycle. Scalar and vector loads and stores take place from and to the QrnPro data memory (512×128 bits) at a rate of 128 bits (4×32-bit elements) per clock cycle. Finally, the writeback stage writes up to four 32-bit results per clock cycle, coming from the memory system or from the execution units, into the QrnPro register file. The QrnPro design is implemented in VHDL targeting the Xilinx Virtex-5 FPGA (device XC5VLX110T-3FF1136), and its performance is evaluated.
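The figures quoted in the abstract can be checked with a small back-of-the-envelope model. The Python sketch below is not the author's VHDL design; it only restates the stated sizes and the peak rate of four 32-bit results per cycle. The names `scalar_index`, `cycles_for_vector`, and `vector_add` are hypothetical, and the mapping of vector element `e` of vector register `v` onto scalar register `16*v + e` is an assumption made for illustration, since the abstract only says the same file can be viewed as 8 vectors of 16 elements or as 128 scalar registers.

```python
"""Toy model of the QrnPro figures quoted in the abstract (illustrative only)."""

LANES = 4                  # up to 4 x 32-bit results per clock cycle (from the abstract)
WORD_BITS = 32             # element / register width
VECTOR_REGS = 8            # register file viewed as 8 vector registers ...
ELEMS_PER_VECTOR = 16      # ... of 16 elements each
IMEM_WORDS, IMEM_WIDTH = 256, 128   # instruction memory: 256 x 128-bit
DMEM_WORDS, DMEM_WIDTH = 512, 128   # data memory: 512 x 128-bit


def bits_to_bytes(bits: int) -> int:
    """Convert a size in bits to bytes."""
    return bits // 8


def scalar_index(vreg: int, elem: int) -> int:
    """ASSUMPTION: element `elem` of vector `vreg` aliases scalar register 16*vreg + elem."""
    if not (0 <= vreg < VECTOR_REGS and 0 <= elem < ELEMS_PER_VECTOR):
        raise ValueError("register index out of range")
    return vreg * ELEMS_PER_VECTOR + elem


def cycles_for_vector(length: int, lanes: int = LANES) -> int:
    """Peak-rate cycles to push `length` elements through `lanes` units (ceiling division)."""
    return -(-length // lanes)


def vector_add(regfile, vd, va, vb, length=ELEMS_PER_VECTOR):
    """Element-wise add, LANES elements per simulated cycle; returns the cycle count."""
    cycles = 0
    for start in range(0, length, LANES):
        for e in range(start, min(start + LANES, length)):
            total = regfile[scalar_index(va, e)] + regfile[scalar_index(vb, e)]
            regfile[scalar_index(vd, e)] = total & 0xFFFFFFFF  # wrap to 32 bits
        cycles += 1
    return cycles


if __name__ == "__main__":
    regfile = [0] * (VECTOR_REGS * ELEMS_PER_VECTOR)   # 128 x 32-bit registers
    for e in range(ELEMS_PER_VECTOR):
        regfile[scalar_index(1, e)] = e                # v1 = 0, 1, ..., 15
        regfile[scalar_index(2, e)] = 10               # v2 = 10 in every element
    print("register file bytes:", bits_to_bytes(len(regfile) * WORD_BITS))
    print("instruction memory bytes:", bits_to_bytes(IMEM_WORDS * IMEM_WIDTH))
    print("data memory bytes:", bits_to_bytes(DMEM_WORDS * DMEM_WIDTH))
    print("cycles for one 16-element add:", vector_add(regfile, 0, 1, 2))
```

Under these assumptions the register file holds 512 bytes, the instruction memory 4 KiB, and the data memory 8 KiB, and a full 16-element vector operation occupies the four execution units for four cycles at peak rate. Pipeline latency, memory stalls, and instruction encoding are deliberately left out because the abstract does not specify them.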

