Young-Sik Kim, T. Han, Shin-Dug Kim, Sung-Bong Yang
{"title":"An effective memory-processor integrated architecture for computer vision","authors":"Young-Sik Kim, T. Han, Shin-Dug Kim, Sung-Bong Yang","doi":"10.1109/ICPP.1997.622654","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622654","url":null,"abstract":"In this paper an effective memory-processor integrated architecture, called memory based processor array (MPA), for computer vision is proposed. The MPA can be easily attached into any host system via memory interface. In order to measure the impact of the memory interface structure an analytical model is derived. The performance improvement on the proposed model for the memory interface architecture of the MPA system can be 6% to 40% for vision tasks consisting of sequential and data parallel tasks. The asymptotic time complexities of the mapping algorithms are evaluated to verify the cost-effectiveness and the efficiency of the MPA system.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"192 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122811510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Precise call graph construction for OO programs in the presence of virtual functions","authors":"Deepa B. Bairagi, D. Agrawal, Sandeep Kumar","doi":"10.1109/ICPP.1997.622674","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622674","url":null,"abstract":"Several intra- and inter-procedural program analysis techniques form the backbone of an optimizing and parallelizing compiler. The efficacy of these analyses depends upon how precise the call graph is. However, due to lack of exact type information for objects in an object-oriented (OO) program the existing call graph construction algorithms are rendered imprecise. In this paper, we present an algorithm for constructing a more precise call graph by exploiting the static class hierarchy of an OO program. The information collected during the class hierarchy analysis helps in avoiding unnecessary addition of many spurious call graph edges for virtual-function calls. We have implemented our algorithm for handling C++ programs within a restructuring tool, Sage++. With our precise algorithm for call graph construction, the percentage reduction in the number of nodes and edges in the call graphs for the benchmark programs we had selected ranged from 4% to 56% and from 22% to 58%, respectively.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"208 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134237460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A class of fixed-degree Cayley-graph interconnection networks derived by pruning k-ary n-cubes","authors":"D. Kwai, B. Parhami","doi":"10.1109/ICPP.1997.622563","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622563","url":null,"abstract":"We introduce a pruning scheme to reduce the node degree of k-ary n-cube from 2n to 4. The links corresponding to n-2 of the n dimensions are removed from each node. One of the remaining dimensions is common to all nodes and the other is selected periodically from the remaining n-1 dimensions. Despite the removal of a large number of links from the k-ary n-cube, this incomplete version still preserves many of its desirable topological properties. In this paper, we show that this incomplete k-ary n-cube belongs to the class of Cayley graphs, and hence, is node-symmetric. It is 4-connected with diameter close to that of the k-ary n-cube.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133773322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic parallelization and scheduling of programs on multiprocessors using CASCH","authors":"I. Ahmad, Yu-Kwong Kwok, Minyou Wu, W. Shu","doi":"10.1109/ICPP.1997.622657","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622657","url":null,"abstract":"The lack of a versatile software tool for parallel program development has been one of the major obstacles for exploiting the potential of high-performance architectures. In this paper, we describe an experimental software tool called CASCH (Computer Aided SCHeduling) for parallelizing and scheduling applications to parallel processors. CASCH transforms a sequential program to a parallel program with automatic scheduling, mapping, communication, and synchronization. The major strength of CASCH is its extensive library of scheduling and mapping algorithms representing a broad range of state-of-the-art work reported in the recent literature. These algorithms are applied for allocating a parallelized program to the processors, and thus the algorithms can be interactively analyzed, tested and compared using real data on a common platform with various performance objectives. CASCH is useful for both novice and expert programmers of parallel machines, and can serve as a teaching and learning aid for understanding scheduling and mapping algorithms.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128301312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a circuit-switched highly fault-tolerant k-ary n-cube","authors":"B. Izadi, F. Özgüner","doi":"10.1109/ICPP.1997.622664","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622664","url":null,"abstract":"In this paper we present a strongly fault-tolerant design for the k-ary n-cube multiprocessor and examine its reconfigurability. Our design augments the k-ary n-cube with (k/j)^n spare nodes; each set of j^n regular nodes is connected to a spare node and the spare nodes are interconnected as a (k/j)-ary n-cube. Our approach utilizes the circuit-switched capabilities of the communication modules of the spare nodes to tolerate a large number of faulty nodes and faulty links without any performance degradation. Both theoretical and simulation results are presented.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130901962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time job scheduling in hypercube systems","authors":"O-Hoon Kwon, Jong Kim, S. Hong, Sunggu Lee","doi":"10.1109/ICPP.1997.622581","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622581","url":null,"abstract":"In this paper, we present the problem of scheduling real-time jobs in a hypercube system and propose a scheduling algorithm. The goals of the proposed scheduling algorithm are to determine whether all jobs can complete their processing before their fixed deadlines in a hypercube system and to find such a schedule. Each job is associated with a computation time, a deadline, and a dimensional requirement. Determining a schedule such that all jobs meet their respective fixed deadlines in a hypercube system when preemption is not allowed is an NP-complete problem. Hence, we present a heuristic scheduling algorithm for scheduling non-preemptable real-time jobs in a hypercube system. Finally, we evaluate the proposed algorithm using simulation.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114584421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative analysis on caching effect of I-structure data in frame-based multithreaded processing","authors":"Hyong-Shik Kim, S. Ha, C. Jhon","doi":"10.1109/ICPP.1997.622573","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622573","url":null,"abstract":"Since long latency due to remote memory access could be tolerated by rapidly switching to another thread in multithreaded processing, caching I-structure data is expected to have less beneficial effect on the performance than caching ordinary data. In this paper we show that caching I-structure data could improve the overall performance in spite of latency tolerating property of multithreading. Our quantitative analysis reveals that the most important caching effect of I-structure data in frame-based multithreading is the enhancement of frame parallelism. It reduces the idle time due to latency by lowering latency sensitivity and at the same time decreases the thread processing time by exploiting more processors.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114691682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message encoding techniques for efficient array redistribution","authors":"Yeh-Ching Chung, Ching-Hsien Hsu","doi":"10.1109/ICPP.1997.622579","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622579","url":null,"abstract":"In this paper, we present message encoding techniques to improve the performance of BLOCK-CYCLIC(kr) to BLOCK-CYCLIC(r) (and vice versa) array redistribution algorithms. The message encoding techniques are machine independent and could be used with different algorithms. By incorporating the techniques in array redistribution algorithms, one can reduce the computation overheads and improve the overall performance of array redistribution algorithms. To evaluate the performance of the techniques, we have implemented the message encoding techniques into some array redistribution algorithms on an IBM SP2 parallel machine. The experimental results show that the execution time of array redistribution algorithms with the message encoding techniques is 3% to 22% faster than those without the message encoding techniques.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122792915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault-tolerant parallel applications using queues and actions","authors":"J. A. Smith, S. Shrivastava","doi":"10.1109/ICPP.1997.622578","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622578","url":null,"abstract":"There are many techniques supporting execution of large computations over a network of workstations (NOW) but data intensive computations are usually run on high performance parallel machines. A NOW comprising individual user's machines typically has a low performance interconnect and suffers arbitrary changes of availability. Exploiting such resources to execute data intensive computations is difficult but even in a more constrained environment there is an unfulfilled need for fault-tolerance. The structuring approach presented fulfills this need. Performance exceeding 100 Mflop/s is demonstrated for large fault-tolerant out of core examples of matrix multiplication and Cholesky factorisation using five 133 MHz Pentium compute machines.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131733915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tree-based multicasting on wormhole routed multistage interconnection networks","authors":"Vara Varavithya, P. Mohapatra","doi":"10.1109/ICPP.1997.622645","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622645","url":null,"abstract":"In this paper, we propose a tree-based multicasting algorithm for Multistage Interconnection Networks. We first analyze the necessary conditions for deadlocks in MINs. Based on these observations, an asynchronous tree-based multicasting algorithm is developed in which deadlocks are prevented by serializing the initiations of branching operations that have potential for creating deadlocks. The serialization is done using a technique based on grouping of the switching elements. The preliminary simulation results are encouraging as it lowers the latency by almost a factor of 4 when compared with the software multicasting approach proposed earlier.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128928416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}