Introducing an in-core hybrid LU implementation on heterogeneous systems
Cheng Chen, Canqun Yang
2017 2nd International Conference on Image, Vision and Computing (ICIVC), 2017-06-01
DOI: 10.1109/ICIVC.2017.7984721 (https://doi.org/10.1109/ICIVC.2017.7984721)
Matrix factorization (MF) is employed by many algorithms, such as collaborative filtering, text mining, and deriving hidden features of words. Out-of-core heterogeneous MF implementations have recently been used to take advantage of state-of-the-art architectures and can solve problems larger than the available memory of coprocessors. Because the data set cannot fit into the limited device memory, frequent data transfers take place between the hosts and coprocessors over the costly PCIe bus. With the growth of coprocessors' on-card memory, we introduce an in-core hybrid MF algorithm, here LU factorization, on a CPU-MIC system to minimize such data movement. Validation on the Tianhe-2 supercomputer shows that our in-core implementation is competitive with the highly optimized MKL, which is an out-of-core hybrid LU implementation, and achieves about 5× speedup over the CPU version.
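For readers unfamiliar with the block structure that hybrid LU codes build on, the following is a minimal sketch of a textbook right-looking blocked LU factorization. It is not the authors' CPU-MIC implementation: the abstract does not describe how work is divided, and the function names `blocked_lu` and `multiply_lu` are hypothetical. The sketch assumes a square matrix and omits pivoting for brevity, so it is only safe when no zero pivot occurs; production libraries use partial pivoting.

```python
def blocked_lu(a, block):
    """Factor square matrix `a` in place as L*U, one column panel at a time.

    Assumption: no pivoting, so every pivot must be nonzero.
    On return, the strict lower triangle holds L (unit diagonal implied)
    and the upper triangle holds U.
    """
    n = len(a)
    for k0 in range(0, n, block):
        k1 = min(k0 + block, n)
        # 1) Panel factorization: eliminate columns k0..k1-1,
        #    updating only the columns that lie inside the panel.
        for k in range(k0, k1):
            pivot = a[k][k]
            for i in range(k + 1, n):
                a[i][k] /= pivot  # multiplier, stored where L lives
                for j in range(k + 1, k1):
                    a[i][j] -= a[i][k] * a[k][j]
        # 2) Row block: solve L11 * U12 = A12 by forward substitution.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    a[i][j] -= a[i][k] * a[k][j]
        # 3) Trailing update: A22 -= L21 * U12 (a matrix-matrix product).
        for i in range(k1, n):
            for j in range(k1, n):
                s = 0.0
                for k in range(k0, k1):
                    s += a[i][k] * a[k][j]
                a[i][j] -= s
    return a


def multiply_lu(lu):
    """Rebuild L*U from the packed factors, for checking the result."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]
                out[i][j] += l_ik * lu[k][j]
    return out
```

Step 3 dominates the arithmetic for large matrices, and in hybrid schemes it is commonly the part offloaded to a coprocessor. Whether the matrix stays resident in device memory (in-core) or is streamed in pieces over PCIe (out-of-core) affects only where the blocks live, not the arithmetic shown here.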