Implementing an efficient vector instruction set in a chip multi-processor using micro-threaded pipelines

Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001 | Pub Date: 2001-01-29 | DOI: 10.1109/ACAC.2001.903363

C. Jesshope

Citations: 29

Abstract

This paper examines the combination of two techniques. The first, a vector instruction set, has a long history dating back to pipelined vector supercomputers such as the Cray 1 and its successors. The second, multi-threading, is also well understood. The novel approach proposed here combines both vertical and horizontal micro-threading with vector instruction descriptors. It is shown that a family of threads can represent a vector instruction, with dependencies between the instances of that family (the iterations). This technique incurs very low overhead in implementing an n-way loop and can tolerate high memory latency. Using micro-threading to handle dependencies between threads makes it possible to trade off instruction-level parallelism against loop parallelism. The paper describes how instruction classes may be instanced as independent parallel micro-threads and illustrates the speed-up obtainable compared with a conventional loop.
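The idea of a family of threads standing in for a loop, with dependencies passed between iterations, can be illustrated with a small software model. The sketch below is not the paper's instruction set or hardware mechanism: `FamilyDescriptor`, `dot_product_family`, and the generator-based scheduler are hypothetical names chosen for illustration. Each iteration is modelled as a Python generator that does its independent work (a multiply) immediately and suspends until its predecessor has produced the running sum, which loosely mimics a context switch on an unsatisfied dependency.

```python
from dataclasses import dataclass
from typing import Dict, Generator, List, Optional, Sequence


@dataclass(frozen=True)
class FamilyDescriptor:
    """Hypothetical descriptor for a family of micro-threads (one per iteration)."""
    start: int
    limit: int
    step: int = 1

    def indices(self) -> range:
        return range(self.start, self.limit, self.step)


def dot_product_family(a: Sequence[int], b: Sequence[int],
                       desc: FamilyDescriptor) -> int:
    """Dot product over the described indices, one micro-thread per iteration."""
    partial: Dict[int, int] = {}  # running sum published by each iteration

    def micro_thread(i: int, prev: Optional[int]) -> Generator[None, None, None]:
        product = a[i] * b[i]  # independent work: needs no other iteration
        # Dependent work: suspend until the predecessor has published its sum.
        while prev is not None and prev not in partial:
            yield
        carry = partial[prev] if prev is not None else 0
        partial[i] = carry + product

    idx: List[int] = list(desc.indices())
    threads = []
    prev: Optional[int] = None
    for i in idx:
        threads.append(micro_thread(i, prev))
        prev = i

    # Deliberately schedule in reverse order: the dependency chain, not the
    # scheduling order, determines the result.
    ready = list(reversed(threads))
    while ready:
        still_waiting = []
        for t in ready:
            try:
                next(t)                  # run until the thread suspends
                still_waiting.append(t)
            except StopIteration:        # thread finished
                pass
        ready = still_waiting

    return partial[idx[-1]] if idx else 0


if __name__ == "__main__":
    a, b = [1, 2, 3, 4], [5, 6, 7, 8]
    family_result = dot_product_family(a, b, FamilyDescriptor(0, 4))
    loop_result = sum(x * y for x, y in zip(a, b))  # conventional loop
    print(family_result == loop_result)
```

In this model the multiplies are free to proceed in any order, while the additions are serialised through the dependency chain, which is one way to picture the trade-off between instruction-level and loop parallelism that the abstract describes.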

