IEEE Parallel & Distributed Technology: Systems & Applications: Latest Articles

A Unified Trace Environment for IBM SP systems

IEEE Parallel & Distributed Technology: Systems & Applications Pub Date : 1996-06-01 DOI: 10.1109/M-PDT.1996.494613

C. Wu, H. Franke, Yew-Huey Liu

Affiliation: IBM T.J. Watson Research Center

Abstract (excerpt): Distributed parallel processing can increase system computing power beyond the limits of current uniprocessor technology. However, programming such a system under the message-passing model is much more complex than writing sequential programs. To take advantage of the underlying hardware, it is critical to understand the communication behavior of parallel programs and the system's responses to user applications. One common way to monitor a program's behavior is to generate trace events while executing it. The generated events can then be used for other purposes such as debugging and program visualization. However, as we'll see, this method potentially requires source code modification, increases overhead, and causes clock-synchronization problems. To meet these challenges, we developed a Unified Trace Environment (UTE) for IBM SP systems. The user-level UTE trace libraries require only relinking to generate message-passing and system events. With the UTE, users can generate message-passing events with minimal overhead. They can also mark specific portions of the program, such as phases, loops, and routines, for performance analysis and visualization.

Most user-level trace tools for message-passing systems require source code modification to collect message-passing events. More advanced tools such as the Paradyn system [1] require no source code modification; they insert performance-instrumentation code into an application program during execution. However, their instrumentation daemons cause substantial overhead.

Collecting system events is as important as collecting message-passing events. System and I/O events such as process dispatches and page faults can reveal crucial information about system responses to user applications. The trace facility should also be easy to extend to trace activity from other software layers, such as parallel I/O file systems and high-level parallel languages. This extensibility lets the same trace facility trace multiple software systems.

One of the most serious problems in trace analysis for distributed parallel systems is clock synchronization. In such a system, multiple processors generate trace records, and multiple nodes often produce separate streams independently. Because the local clocks disagree, the trace might not preserve the logical order of events. As a result, many trace facilities must do additional work to ensure consistent time stamps, which increases trace overhead. The challenges of trace analysis …
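The abstract notes that separate per-node trace streams may not preserve the logical order of events because local clocks disagree. The sketch below illustrates one common remedy under stated assumptions. It assumes each node's clock follows a linear model, local = global × (1 + drift) + offset, whose parameters have already been estimated somehow. Each timestamp is converted to global time before the streams are merged. The names `Event`, `ClockModel`, `merge_raw` and `merge_streams` are hypothetical, as is the sample data. They are not the UTE API, and this is not necessarily the method the authors use.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    node: int         # node that produced the record
    local_ts: float   # timestamp from that node's own clock (microseconds)
    name: str         # event label, e.g. "send" or "recv"


@dataclass(frozen=True)
class ClockModel:
    """Hypothetical linear clock model: local = global * (1 + drift) + offset."""
    offset: float
    drift: float

    def to_global(self, local_ts: float) -> float:
        # Invert the linear model to map a local timestamp onto global time.
        return (local_ts - self.offset) / (1.0 + self.drift)


def merge_raw(streams):
    """Naive merge of per-node streams by uncorrected local timestamps."""
    return list(heapq.merge(*streams.values(), key=lambda e: e.local_ts))


def merge_streams(streams, models):
    """Merge per-node streams (each sorted by local time) by corrected global time."""
    corrected = [
        [(models[node].to_global(e.local_ts), e) for e in events]
        for node, events in streams.items()
    ]
    # heapq.merge requires each input to be sorted already; the linear model
    # is increasing when drift > -1, so each node's local order is preserved.
    merged = heapq.merge(*corrected, key=lambda pair: pair[0])
    return [e for _, e in merged]


# Hypothetical sample: node 0 sends a message that node 1 receives.
# Node 0's clock runs 50 us ahead of global time; node 1's clock starts
# on time but drifts slightly fast.
streams = {
    0: [Event(0, 60.0, "phase_begin"), Event(0, 150.0, "send")],
    1: [Event(1, 120.0, "recv"), Event(1, 200.0, "phase_end")],
}
models = {
    0: ClockModel(offset=50.0, drift=0.0),
    1: ClockModel(offset=0.0, drift=1e-3),
}

print([e.name for e in merge_raw(streams)])
# ['phase_begin', 'recv', 'send', 'phase_end']  (receive appears before its send)
print([e.name for e in merge_streams(streams, models)])
# ['phase_begin', 'send', 'recv', 'phase_end']
```

Merging with `heapq.merge` avoids re-sorting the whole trace, costing O(n log k) for n events from k nodes. The hard part in practice is estimating each node's `offset` and `drift`, which this sketch takes as given.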

Citations: 8

Index, volume 4, 1996

IEEE Parallel & Distributed Technology: Systems & Applications Pub Date : 1996-01-24 DOI: 10.1109/M-PDT.1996.544443

J. D. Cavin

Citations: 0

Advances in distributed sensor technology

IEEE Parallel & Distributed Technology: Systems & Applications Pub Date : 1996-01-24 DOI: 10.1109/MPDT.1996.7102340

J. D. Cavin

Citations: 81

Parallel architectures: Techniques for compiler-directed cache coherence

IEEE Parallel & Distributed Technology: Systems & Applications Pub Date : 1996-01-24 DOI: 10.1109/M-PDT.1996.544438

L. Choi, Hock-Beng Lim, P. Yew

Citations: 4

Integrating personal computers in a distributed client-server environment

IEEE Parallel & Distributed Technology: Systems & Applications Pub Date : 1996-01-24 DOI: 10.1109/MPDT.1996.7102338

Everett Markowska-Scott

Citations: 2

Topics in advanced scientific computation

IEEE Parallel & Distributed Technology: Systems & Applications Pub Date : 1996-01-24 DOI: 10.1109/M-PDT.1996.544444

J. Zalewski, M. Pernice

Citations: 28

Fault-tolerant computer system design

IEEE Parallel & Distributed Technology: Systems & Applications Pub Date : 1996-01-24 DOI: 10.1109/MPDT.1996.7102341

N. Jha

Citations: 391

Parallel processing in cellular arrays

IEEE Parallel & Distributed Technology: Systems & Applications Pub Date : 1996-01-24 DOI: 10.1109/MPDT.1996.7102337

Albert Y. Zomaya

Citations: 3

Parallel architectures: Cache memories for dataflow systems

IEEE Parallel & Distributed Technology: Systems & Applications Pub Date : 1996-01-24 DOI: 10.1109/88.544436

A. Hurson, K. Kavi, B. Shirazi, Ben Lee

Citations: 3

Parallel computing works!

IEEE Parallel & Distributed Technology: Systems & Applications Pub Date : 1996-01-24 DOI: 10.1109/MPDT.1996.7102339

M. Paprzycki

Citations: 194