Proceedings 11th International Parallel Processing Symposium: Latest Papers

External adjustment of runtime parameters in Time Warp synchronized parallel simulators

Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580905

R. Radhakrishnan, L. Moore, P. Wilsey

Abstract: Several optimizations to the Time Warp synchronization protocol for parallel discrete event simulation have been proposed and studied. Many of these optimizations have included some form of dynamic adjustment (or control) of the simulation's operating parameters (e.g., checkpoint interval, cancellation strategy). Traditionally, dynamic parameter adjustment has been performed at the simulation object level: each simulation object collects measures of its operating behavior (e.g., rollback frequency, rollback length) and uses them to adjust its operating parameters. The performance data collection and parameter adjustment functions are overhead costs incurred in the expectation of higher throughput. The paper presents a method of eliminating some of these overheads by using an external object to adjust the control parameters. That is, instead of inserting code for adjusting simulation parameters into the simulation object, an external control object is defined to periodically analyze each simulation object's performance data and revise that object's operating parameters. An implementation of an external control object in the WARPED Time Warp simulation kernel has been completed. The simulation parameters updated by the implemented control system are the checkpoint interval and the cancellation strategy (lazy or aggressive). A comparative analysis of three test cases shows that the external control mechanism provides speedups of 5% to 17% over the best-performing embedded dynamic adjustment algorithms.

Citations: 8
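The external-control idea in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the WARPED kernel's API: the names `ObjectStats`, `ObjectParams`, and `ExternalController`, the rollback-rate thresholds, and the lazy-cancellation "hit ratio" rule are all hypothetical. The point it shows is the split of responsibilities: simulation objects only increment counters, while a separate controller periodically reads those counters and rewrites each object's checkpoint interval and cancellation strategy.

```python
from dataclasses import dataclass


@dataclass
class ObjectStats:
    """Counters a simulation object records; the object makes no decisions itself."""
    events_processed: int = 0
    rollbacks: int = 0
    lazy_hits: int = 0    # regenerated messages identical to the ones already sent
    lazy_misses: int = 0  # regenerated messages that differ (antimessages needed)


@dataclass
class ObjectParams:
    checkpoint_interval: int = 1      # save state every k events
    cancellation: str = "aggressive"  # or "lazy"


class ExternalController:
    """Hypothetical external control object: all adjustment logic lives here,
    outside the simulation objects, and runs once per control period."""

    def __init__(self, min_interval=1, max_interval=32,
                 high_rollback_rate=0.10, low_rollback_rate=0.02,
                 lazy_threshold=0.5):
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.high_rollback_rate = high_rollback_rate
        self.low_rollback_rate = low_rollback_rate
        self.lazy_threshold = lazy_threshold

    def adjust(self, stats: ObjectStats, params: ObjectParams) -> None:
        if stats.events_processed > 0:
            rate = stats.rollbacks / stats.events_processed
            if rate > self.high_rollback_rate:
                # Frequent rollbacks: checkpoint more often to shorten coast-forward.
                params.checkpoint_interval = max(self.min_interval,
                                                 params.checkpoint_interval // 2)
            elif rate < self.low_rollback_rate:
                # Rare rollbacks: checkpoint less often to cut state-saving cost.
                params.checkpoint_interval = min(self.max_interval,
                                                 params.checkpoint_interval * 2)
        compared = stats.lazy_hits + stats.lazy_misses
        if compared > 0:
            hit_ratio = stats.lazy_hits / compared
            params.cancellation = ("lazy" if hit_ratio >= self.lazy_threshold
                                   else "aggressive")

    def control_period(self, objects: dict) -> None:
        """objects maps a name to an (ObjectStats, ObjectParams) pair."""
        for stats, params in objects.values():
            self.adjust(stats, params)
            # Start a fresh measurement window for the next period.
            stats.events_processed = stats.rollbacks = 0
            stats.lazy_hits = stats.lazy_misses = 0


if __name__ == "__main__":
    objs = {
        "a": (ObjectStats(100, 20, 8, 2), ObjectParams(8)),
        "b": (ObjectStats(1000, 5, 1, 9), ObjectParams(8, "lazy")),
    }
    ExternalController().control_period(objs)
    for name, (_, params) in objs.items():
        print(name, params)
```

Halving and doubling the interval is just one simple policy choice; the sketch keeps it deliberately crude so that the external/embedded distinction stays visible.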

Lower bounds on systolic gossip

Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580949

M. Flammini, S. Pérennes

Citations: 7

Parallel simulated annealing: an adaptive approach

Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580950

J. Knopman, J. S. Aude

Citations: 11

The impact of timing on linearizability in counting networks

Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580978

M. Mavronicolas, M. Papatriantafilou, P. Tsigas

Abstract: Counting networks form a new class of distributed, low-contention data structures made up of interconnected balancers, and are suitable for solving a variety of multiprocessor synchronization problems that can be expressed as counting problems. A linearizable counting network guarantees that the order of the values it returns respects the real-time order in which they were requested. Linearizability significantly raises the capabilities of the network, but possibly at a price in network size or synchronization support. In this paper, we further pursue the systematic study of the impact of timing on linearizability for counting networks, along a research line initiated by Lynch et al. (1996). We consider two basic timing models: the instantaneous balancer model, in which the transition of a token from an input port to an output port of a balancer is modeled as an instantaneous event, and the periodic balancer model, in which balancers send out tokens at a fixed rate. We also consider lower and upper bounds on the delays incurred by the wires connecting the balancers. We present necessary and sufficient conditions for linearizability in the form of precise inequalities that involve timing parameters, and we identify structural parameters of the counting network that may be of more general interest. Our results significantly extend and strengthen previous impossibility and possibility results on linearizability in counting networks (Herlihy et al., 1990; Lynch et al., 1996).

Citations: 19
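To make the terms "balancer" and "counting network" in the abstract above concrete, here is a minimal sequential sketch of the standard width-4 bitonic counting network. The class names and the `get_and_increment` method are hypothetical. Each balancer is a toggle that sends arriving tokens alternately to its top and bottom output wire, and a local counter on output wire i hands out i, i+4, i+8, and so on. Because tokens here traverse one at a time, the network is always quiescent between requests and the returned values are trivially linearizable; the timing effects the paper studies only arise when tokens overlap in a concurrent execution, which this sketch does not model.

```python
class Balancer:
    """Toggle: alternately routes arriving tokens to the top, then bottom, output wire."""

    def __init__(self, top: int, bottom: int):
        self.top, self.bottom = top, bottom
        self.toggle = 0

    def traverse(self) -> int:
        out = self.top if self.toggle == 0 else self.bottom
        self.toggle ^= 1
        return out


class BitonicCounter4:
    """Sequential simulation of a width-4 bitonic counting network."""

    WIDTH = 4
    # Layer 1: two Bitonic[2] balancers; layers 2-3: the Merger[4] stage.
    LAYERS = [[(0, 1), (2, 3)], [(0, 3), (1, 2)], [(0, 1), (2, 3)]]

    def __init__(self):
        self.layers = []
        for pairs in self.LAYERS:
            layer = {}
            for top, bottom in pairs:
                b = Balancer(top, bottom)
                layer[top] = b      # both input wires of a balancer lead to it
                layer[bottom] = b
            self.layers.append(layer)
        # Output wire i hands out i, i + WIDTH, i + 2 * WIDTH, ...
        self.next_value = list(range(self.WIDTH))

    def get_and_increment(self, input_wire: int) -> int:
        wire = input_wire
        for layer in self.layers:
            wire = layer[wire].traverse()
        value = self.next_value[wire]
        self.next_value[wire] += self.WIDTH
        return value


if __name__ == "__main__":
    net = BitonicCounter4()
    print([net.get_and_increment(w) for w in [0, 0, 3, 1, 2, 2, 0, 3, 1, 1]])
```

Whatever input wires the tokens use, the step property of the outputs means the values come out as 0, 1, 2, ... in this one-at-a-time setting, while contention is spread across four balancers instead of a single shared counter.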

Implementation and results of hypothesis testing from the C³I parallel benchmark suite

Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580886

B. V. Voorst, Luiz Pires, R. Jha, Mustafa Muhammad

Citations: 7

Performance prediction for complex parallel applications

Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580884

J. Brehm, P. Worley

Citations: 8

Coherent block data transfer in the FLASH multiprocessor

Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580836

J. Heinlein, K. Gharachorloo, Robert P. Bosch, M. Rosenblum, Anoop Gupta

Abstract: A key goal of the Stanford FLASH project is to explore the integration of multiple communication protocols in a single multiprocessor architecture. To achieve this goal, FLASH includes a programmable node controller called MAGIC, which contains an embedded protocol processor capable of implementing multiple protocols. In this paper we present a specialized protocol for block data transfer integrated with a conventional cache coherence protocol. Block transfer forms the basis for message-passing implementations on top of shared memory, occurs in important workloads such as databases, and is frequently used by the operating system. We discuss the issues that arise in designing a fully integrated protocol and its interactions with cache coherence. Using microbenchmarks, MPI communication primitives, and an application running on the operating system, we compare our protocol with standard bcopy and with bcopy augmented with prefetches. Our results show that integrated block transfer can accelerate communication between nodes while off-loading the task from the main processor, utilizing the network more efficiently, and reducing the associated cache pollution. Given the aggressive support for prefetching in FLASH, prefetched bcopy achieves competitive performance in many cases but lacks the other three advantages of our protocol.

Citations: 17

Multiple templates access of trees in parallel memory systems

Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580980

V. Auletta, A. D. Vivo, V. Scarano

Abstract: This paper studies the problem of mapping the $N$ nodes of a data structure onto $M$ memory modules so that they can be accessed in parallel by templates, i.e., distinct sets of nodes. In the literature, several algorithms are available for arrays (accessed by rows, columns, diagonals, and subarrays) and trees (accessed by subtrees, root-to-leaf paths, etc.). Although some mapping algorithms for arrays allow conflict-free access to several templates at once (e.g., rows and columns), no mapping algorithm is known for efficiently accessing both subtree and root-to-leaf path templates in complete binary trees. We prove that any mapping algorithm that is conflict-free for one of these two templates has $\Omega(M/\log M)$ conflicts on the other. Therefore, no mapping algorithm can be conflict-free on both templates. We give an algorithm for mapping complete binary trees with $N = 2^M - 1$ nodes onto $M$ memory modules in such a way that: (a) the number of conflicts for accessing a subtree template or a root-to-leaf path template is $O(\sqrt{M/\log M})$; (b) the load (i.e., the ratio between the maximum and minimum number of data items mapped to each module) is $1 + o(1)$; and (c) the time complexity for retrieving the module where a given data item is stored is $O(1)$ if a preprocessing phase of space and time complexity $O(\log N)$ is executed, or $O(\log \log N)$ if no preprocessing is allowed.

Citations: 11
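The quantities in the abstract above (templates, conflicts, load) can be made concrete with a small sketch. It does not implement the paper's mapping. Instead it uses a hypothetical baseline, "level mapping" (every node at depth l goes to module l mod M), on a heap-numbered complete binary tree with $N = 2^M - 1$ nodes. It also assumes that the number of conflicts for a template is the largest number of template nodes sharing one module, minus one (the extra memory cycles needed). Level mapping is conflict-free on root-to-leaf paths but performs badly on subtrees and has a very uneven load, which illustrates why the trade-off the paper analyzes is nontrivial.

```python
from collections import Counter


def level(node: int) -> int:
    """Depth of a heap-numbered node (the root, node 1, is at level 0)."""
    return node.bit_length() - 1


def level_mapping(node: int, m: int) -> int:
    """Hypothetical baseline: store every node of level l in module l mod m."""
    return level(node) % m


def root_to_leaf_path(leaf: int) -> list:
    """Nodes on the path from `leaf` up to the root (children of i are 2i and 2i+1)."""
    path = []
    while leaf >= 1:
        path.append(leaf)
        leaf //= 2
    return path


def subtree(root: int, levels: int) -> list:
    """All nodes of the complete subtree rooted at `root` spanning `levels` levels."""
    nodes, frontier = [], [root]
    for _ in range(levels):
        nodes.extend(frontier)
        frontier = [c for n in frontier for c in (2 * n, 2 * n + 1)]
    return nodes


def conflicts(template, mapping, m: int) -> int:
    """Assumed definition: largest per-module count within the template, minus 1."""
    counts = Counter(mapping(n, m) for n in template)
    return max(counts.values()) - 1


def load(n_nodes: int, mapping, m: int) -> float:
    """Ratio of the most-loaded to the least-loaded module over all tree nodes."""
    counts = Counter(mapping(n, m) for n in range(1, n_nodes + 1))
    return max(counts.values()) / min(counts[k] for k in range(m))


if __name__ == "__main__":
    M = 4
    N = 2 ** M - 1
    print(max(conflicts(root_to_leaf_path(leaf), level_mapping, M)
              for leaf in range(2 ** (M - 1), N + 1)))
    print(conflicts(subtree(1, 3), level_mapping, M))
    print(load(N, level_mapping, M))
```

For $M = 4$, every root-to-leaf path touches four distinct modules, while the 7-node subtree at the root puts its 4 bottom nodes in one module, and module 3 holds 8 nodes against module 0's single node.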

Fast parallel computation of the polynomial shift

Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580933

E. Zima

Citations: 5

Parallel 'Go with the winners' algorithms in the LogP model

Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580972

Marcus Peinado, Thomas Lengauer

Citations: 8