Introducing mNUMA: an extended PGAS architecture
Megan Vance, P. Kogge
International Conference on Partitioned Global Address Space Programming Models, 2010-10-12
DOI: 10.1145/2020373.2020379 (https://doi.org/10.1145/2020373.2020379)
Citation count: 2
Abstract
We describe design details of a Light Weight Processing migration-NUMA (LWP-mNUMA) architecture, a novel high-performance system design that provides hardware support for a partitioned global address space, migrating subjects, and word-level synchronization primitives. Using the architectural definition, we show how combinations of structures work together to carry out basic actions such as address translation, migration, in-memory synchronization, and work management. We present results from simulation of microkernels showing that LWP-mNUMA compensates for latency with far greater memory access concurrency than is possible on conventional systems. In particular, several microkernels model tough, irregular access patterns that have limited speedups, in certain problem areas, to dozens of conventional processors. On these microkernels, results show speedup increasing up to 1024 multicore mNUMA processing nodes, running over 1 million threadlets.