{"title":"A parallel computer based on simple DSP modules","authors":"F. Mayer-Lindenberg","doi":"10.1016/0165-6074(95)00013-E","DOIUrl":"10.1016/0165-6074(95)00013-E","url":null,"abstract":"<div><p>This article reports on an engineering project at the TUHH aimed at providing a massively parallel experimental computer system to support a number of research projects. The computer nicknamed the PENTAGON is an MIMD system containing a number of identical processing elements (PE's) linked via interfaces. The network is a 3-D torus, and the nodes are based on off-the-shelf signal processor chips, namely TMS 320C40's from TI. The design adds to these standard ingredients an engineering discipline to keep things as simple as possible, and a corresponding, quite unusual physical setup of the total system. These make up for a very cost effective system showing how simple it may be to build a powerful parallel machine.</p><p>Although based on a standard architecture, the PENTAGON design takes some special choices, the most important being the complete distribution of I/O capabilities. This provides for an unlimited I/O bandwidth, the support of realtime applications and excellent capabilities of expansion. A graphics interface has been designed to provide direct realtime output from the DSP's. Another recent extension is a set of Power-PC modules on top of the DSP nodes.</p><p>Besides standard commercial compilers for 'C40 networks, the functional language Fifth of the author has been implemented on the PENTAGON. Fifth provides facilities such as distributed objects and the automatical distribution of parallel programs. 
For well parallelizable applications such as the calculation of a Mandelbrot set, high efficiencies in the usage of the processors have been obtained.</p></div>","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 4","pages":"Pages 301-314"},"PeriodicalIF":0.0,"publicationDate":"1995-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(95)00013-E","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126150656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The multi-associative branch target buffer: a cost effective BTB mechanism","authors":"Weili Chu, Stamatis Vassiliadis, José G. Delgado-Frias","doi":"10.1016/0165-6074(95)00009-D","DOIUrl":"10.1016/0165-6074(95)00009-D","url":null,"abstract":"<div><p>A new branch target buffer hardware organization, denoted as the multi-associative branch target buffer (MBTB), for efficient branch handling in pipelined central processing units (CPUs) is presented. The proposed organization consists of multiple different size arrays addressed via a bit selection addressing mechanism. These arrays are used to maintain information pertinent to the branches, including information usually contained within the traditional branch target buffers such as branch instruction address and branch target address. The proposed configuration and its bit extraction mechanism, which is used to increase the hit ratio of the buffers, provide the capability of dynamically increasing the associativity of the branch target buffers. Due to the new organization, i.e. the multiple array structure, along with the new addressing scheme, it is suggested, based on simulation results, that improvements with reduced hardware can be expected when a multi-associative branch target buffer is installed in a CPU implementation.</p></div>","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 3","pages":"Pages 211-225"},"PeriodicalIF":0.0,"publicationDate":"1995-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(95)00009-D","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131263451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Calendar of forthcoming conferences and events","authors":"","doi":"10.1016/0165-6074(95)90004-7","DOIUrl":"https://doi.org/10.1016/0165-6074(95)90004-7","url":null,"abstract":"","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 3","pages":"Pages 261-262"},"PeriodicalIF":0.0,"publicationDate":"1995-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(95)90004-7","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"137290309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modified straight division: A computer implementation of multiple-precision division","authors":"Ranjani Parthasarathi, Ashok Jhunjhunwala","doi":"10.1016/0165-6074(94)00091-N","DOIUrl":"10.1016/0165-6074(94)00091-N","url":null,"abstract":"<div><p>The ‘Straight division’ algorithm is an in-place division technique that has been known in India as a mental computation technique. This technique is found to be suitable for implementing multiple-precision division on computers using existing single-precision operations. This paper presents some modifications carried out on this basic technique to improve the efficiency of the algorithm. It also discusses an implementation of multiple-precision division using this modified straight division technique on two different processor architectures. This is followed by an analysis of these implementations in comparison with other existing division techniques. It is found that the modified straight division is superior in performance to other known methods.</p></div>","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 3","pages":"Pages 193-209"},"PeriodicalIF":0.0,"publicationDate":"1995-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(94)00091-N","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128212719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parset: A language construct for system independent parallel programming on distributed systems","authors":"Rushikesh K. Joshi, D. Janaki Ram","doi":"10.1016/0165-6074(95)00006-A","DOIUrl":"10.1016/0165-6074(95)00006-A","url":null,"abstract":"<div><p>Parallel programming on loosely coupled distributed systems involves many system dependent tasks such as sensing node availability, creating remote processes, programming inter-process communication and synchronization, etc. Very often these system-dependent tasks are handled at the programmer level. This has complicated the process of parallel programming on distributed systems. The portability of these programs is also severely affected. The programmer may also start his remote processes on heavily loaded nodes, thereby degrading the overall performance of the system. To overcome these difficulties, we introduce a language construct called parset at the programming level. Parset captures various kinds of coarse grain parallelism occurring in distributed systems. It also provides scalability to distributed programs. We show that this construct greatly simplifies writing programs on distributed systems providing transparency to various system dependent tasks.</p></div>","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 3","pages":"Pages 245-259"},"PeriodicalIF":0.0,"publicationDate":"1995-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(95)00006-A","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133403334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-dimensional specification of queries in object-oriented databases","authors":"Jae-Cheol Kwak, Songchun Moon","doi":"10.1016/0165-6074(95)00005-9","DOIUrl":"10.1016/0165-6074(95)00005-9","url":null,"abstract":"<div><p>Visual queries based on schema graphs simplify access to databases for technical and non-technical users. Unlike relational databases, in object-oriented databases, the basic entity in a query, i.e. a class, is frequently considered as a compound of several entities to which the query operations may apply, which causes the deficiency in describing an entity of designation. In this paper, we propose a visual query language <em>object query diagram</em> (OQD) for object-oriented databases, where a class is decomposed into a number of <em>object sets</em>, each of which is a set of values of one of the attributes of the other classes. By representing each class and object sets in the class using the well-known Venn diagram in a query, OQD explicitly presents all the entities to which the operations in a query can apply. We describe the syntax and semantics of OQD through a number of illustrative examples.</p></div>","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 3","pages":"Pages 227-244"},"PeriodicalIF":0.0,"publicationDate":"1995-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(95)00005-9","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124731557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient fault tolerant cache memory design","authors":"H.T. Verges, D. Nikolos","doi":"10.1016/0165-6074(95)00004-8","DOIUrl":"10.1016/0165-6074(95)00004-8","url":null,"abstract":"<div><p>In this paper we firstly discuss the consequences of cache memory defects/faults in the operation of the system and we show that cache tag defects/faults compared to cache data defects/faults may cause significantly more serious consequences on the integrity and performance of the system. A possible solution is the use of a single error correcting-double error detecting (SEC/DED) code in the cache tag memory. However, the classical implementation of the SEC/DED code is proved to be inappropriate for the tag memory due to the required silicon area and time delays. In this paper we propose a new way of the SEC/DED code exploitation well-suited to cache tag memories. During fault free operation the proposed technique does not add any delay on the critical path of the cache, while in the case of a single error the delay is so small that the cache access time is increased by at most one CPU cycle. An example design shows the superiority of the proposed technique against the classical one. The application of the proposed scheme to real and virtual addressed caches of one or two levels is also discussed.</p></div>","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 2","pages":"Pages 153-169"},"PeriodicalIF":0.0,"publicationDate":"1995-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(95)00004-8","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127605382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A software-controlled prefetching mechanism for software-managed TLBs","authors":"Jang Suk Park, Gwang Seon Ahn","doi":"10.1016/0165-6074(95)00003-7","DOIUrl":"10.1016/0165-6074(95)00003-7","url":null,"abstract":"<div><p>The TLB (Translation Lookaside Buffer) miss services have been concealed from operating systems, but some new RISC architectures manage the TLB in software. Since software-managed TLBs provide flexibility to an operating system in page translation, they are considered an important factor in the design of microprocessors for open system environments. However, software-managed TLBs suffer from larger miss penalty than hardware-managed TLBs, since they require more extra context switching overhead than hardware-managed TLBs.</p><p>This paper introduces a new technique for reducing the miss penalty of software-managed TLBs by prefetching necessary TLB entries before being used. This technique is not inherently limited to specific applications. The key of this scheme is to perform the prefetch operations to update the TLB entries before first accesses so that TLB misses can be avoided. Using trace-driven simulation and a quantitative analysis, the proposed scheme is evaluated in terms of the miss rate and the total miss penalty. Our results show that the proposed scheme reduces the TLB miss rate by a factor of 6% to 77% due to TLB characteristics and page sizes. 
In addition, it is found that reducing the miss rate by the prefetching scheme reduces the total miss penalty and bus traffics in software-managed TLBs.</p></div>","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 2","pages":"Pages 121-136"},"PeriodicalIF":0.0,"publicationDate":"1995-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(95)00003-7","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124116989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporating job sizes in distributed load balancing","authors":"John G. Vaughan","doi":"10.1016/0165-6074(95)00008-C","DOIUrl":"10.1016/0165-6074(95)00008-C","url":null,"abstract":"<div><p>The load index most frequently used for load balancing in distributed systems is the job queue length. This work examines some of the implications of scheduling jobs according to an additional abstract dimension attribute called job size. The load balancing algorithm is supported by a virtual ring structure which organises the network nodes in groups and defines the information-gathering activities to take place within and between such groups. A two-phase approach to information gathering and decision making is adopted. This enables the selection of jobs for transfer to be delayed until as close as possible to the moment of transfer. The operation of the protocol is described for each phase and synchronisation of the parallel activities in the virtual rings is discussed. The schedule length performance of the distributed algorithm is examined in a series of closed-system tests.</p></div>","PeriodicalId":100927,"journal":{"name":"Microprocessing and Microprogramming","volume":"41 2","pages":"Pages 111-119"},"PeriodicalIF":0.0,"publicationDate":"1995-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/0165-6074(95)00008-C","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131643444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}