P. Koka, T. Suh, M. Smelyanskiy, R. Grzeszczuk, C. Dulong
{"title":"Intel Itanium 2 4路多处理器系统并行内部点求解器的构建与性能表征","authors":"P. Koka, T. Suh, M. Smelyanskiy, R. Grzeszczuk, C. Dulong","doi":"10.1109/WWC.2004.1437402","DOIUrl":null,"url":null,"abstract":"In recent years the interior point method (IPM) has became a dominant choice for solving large convex optimization problems for many scientific, engineering and commercial applications. Two reasons for the success of the IPM are its good scalability on existing multiprocessor systems with a small number of processors and its potential to deliver a scalable performance on systems with a large number of processors. The scalability of a parallel IPM is determined by several key issues such as exploiting parallelism due to sparsity of the problem, reducing communication overhead and proper load balancing. In this paper we present an implementation of a parallel linear programming IPM workload and characterize its scalability on a 4-way Itanium/spl reg/ 2 system. We show a speedup of up to 3-times for some of the datasets. We also present a detailed micro-architectural analysis of the workload using VTune/spl trade/ performance analyzer. Our results suggest that a good IPM implementation is latency-bound. Based on these findings, we make suggestions on how to improve the performance of the IPM workload in the future.","PeriodicalId":240633,"journal":{"name":"IEEE International Workshop on Workload Characterization, 2004. WWC-7. 2004","volume":"11 4","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Construction and performance characterization of parallel interior point solver on 4-way Intel Itanium 2 multiprocessor system\",\"authors\":\"P. Koka, T. Suh, M. Smelyanskiy, R. Grzeszczuk, C. Dulong\",\"doi\":\"10.1109/WWC.2004.1437402\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years the interior point method (IPM) has became a dominant choice for solving large convex optimization problems for many scientific, engineering and commercial applications. Two reasons for the success of the IPM are its good scalability on existing multiprocessor systems with a small number of processors and its potential to deliver a scalable performance on systems with a large number of processors. The scalability of a parallel IPM is determined by several key issues such as exploiting parallelism due to sparsity of the problem, reducing communication overhead and proper load balancing. In this paper we present an implementation of a parallel linear programming IPM workload and characterize its scalability on a 4-way Itanium/spl reg/ 2 system. We show a speedup of up to 3-times for some of the datasets. We also present a detailed micro-architectural analysis of the workload using VTune/spl trade/ performance analyzer. Our results suggest that a good IPM implementation is latency-bound. Based on these findings, we make suggestions on how to improve the performance of the IPM workload in the future.\",\"PeriodicalId\":240633,\"journal\":{\"name\":\"IEEE International Workshop on Workload Characterization, 2004. WWC-7. 2004\",\"volume\":\"11 4\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-10-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE International Workshop on Workload Characterization, 2004. WWC-7. 2004\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WWC.2004.1437402\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Workshop on Workload Characterization, 2004. WWC-7. 2004","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WWC.2004.1437402","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Construction and performance characterization of parallel interior point solver on 4-way Intel Itanium 2 multiprocessor system
In recent years the interior point method (IPM) has became a dominant choice for solving large convex optimization problems for many scientific, engineering and commercial applications. Two reasons for the success of the IPM are its good scalability on existing multiprocessor systems with a small number of processors and its potential to deliver a scalable performance on systems with a large number of processors. The scalability of a parallel IPM is determined by several key issues such as exploiting parallelism due to sparsity of the problem, reducing communication overhead and proper load balancing. In this paper we present an implementation of a parallel linear programming IPM workload and characterize its scalability on a 4-way Itanium/spl reg/ 2 system. We show a speedup of up to 3-times for some of the datasets. We also present a detailed micro-architectural analysis of the workload using VTune/spl trade/ performance analyzer. Our results suggest that a good IPM implementation is latency-bound. Based on these findings, we make suggestions on how to improve the performance of the IPM workload in the future.