Auto-tuning TRSM with an asynchronous task assignment model on multicore, multi-GPU and coprocessor systems
Authors: Clícia Pinto, Marcos E. Barreto, M. Boratto
Published in: 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA), 2016-11-01
DOI: 10.1109/AICCSA.2016.7945637 (https://doi.org/10.1109/AICCSA.2016.7945637)
The growing demand for computing power motivates a continuing search for techniques that reduce the time needed to solve common computational problems. To take advantage of new hybrid parallel architectures composed of multithreaded and multiprocessor hardware, our current efforts involve the design and validation of highly parallel algorithms that efficiently exploit the characteristics of such architectures. In this paper, we propose an automatic tuning methodology that makes it easy to exploit multicore, multi-GPU and coprocessor systems. We present an optimization of an algorithm for solving triangular systems (TRSM), based on block decomposition and asynchronous task assignment, and discuss some results.
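To make the terms in the abstract concrete, the following is a minimal sketch of a blocked triangular solve with asynchronously submitted tasks. It is not the authors' implementation: the abstract gives no algorithmic detail, so the column-panel task split, the thread pool, the timing-based block-size search, and all names (`solve_panel`, `blocked_trsm`, `pick_block_size`, `nb`, `panel_width`) are assumptions chosen for illustration. It solves L X = B for a lower-triangular L in pure Python.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def solve_panel(L, B_panel, nb):
    """Solve L * X = B_panel by blocked forward substitution.

    L is an n x n lower-triangular matrix (list of rows) with a nonzero
    diagonal; B_panel is n x k. Rows are processed in blocks of size nb.
    """
    n, k = len(L), len(B_panel[0])
    X = [row[:] for row in B_panel]  # work on a copy of the panel
    for start in range(0, n, nb):
        end = min(start + nb, n)
        # Diagonal block: small triangular solve by forward substitution.
        for i in range(start, end):
            for c in range(k):
                s = X[i][c]
                for j in range(start, i):
                    s -= L[i][j] * X[j][c]
                X[i][c] = s / L[i][i]
        # Trailing update: remove the solved block's contribution from
        # every row below it (the matrix-multiply part of blocked TRSM).
        for i in range(end, n):
            for c in range(k):
                s = X[i][c]
                for j in range(start, end):
                    s -= L[i][j] * X[j][c]
                X[i][c] = s
    return X


def blocked_trsm(L, B, nb=2, panel_width=2, workers=4):
    """Split B into column panels and solve each panel as an async task."""
    n, m = len(L), len(B[0])
    X = [[0.0] * m for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Column panels are independent, so they can be submitted at once.
        futures = {}
        for c0 in range(0, m, panel_width):
            c1 = min(c0 + panel_width, m)
            panel = [row[c0:c1] for row in B]
            futures[pool.submit(solve_panel, L, panel, nb)] = (c0, c1)
        # Collect results in completion order, not submission order.
        for fut in as_completed(futures):
            c0, c1 = futures[fut]
            X_panel = fut.result()
            for i in range(n):
                X[i][c0:c1] = X_panel[i]
    return X


def pick_block_size(L, B, candidates=(1, 2, 4, 8)):
    """Hypothetical tuner: time each candidate block size, keep the fastest."""
    timings = {}
    for nb in candidates:
        t0 = time.perf_counter()
        blocked_trsm(L, B, nb=nb)
        timings[nb] = time.perf_counter() - t0
    return min(timings, key=timings.get)
```

Because of CPython's global interpreter lock, the thread pool here shows only the task structure and gives no real speedup. On the hardware the paper targets, each task would presumably be dispatched to a CPU core, GPU, or coprocessor, and the tuner would search over more parameters than a single block size.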