2014 IEEE 28th International Parallel and Distributed Processing Symposium: Latest Publications

Heterogeneity-Aware Workload Placement and Migration in Distributed Sustainable Datacenters
2014 IEEE 28th International Parallel and Distributed Processing Symposium | Pub Date: 2014-05-19 | DOI: 10.1109/IPDPS.2014.41
Dazhao Cheng, Changjun Jiang, Xiaobo Zhou
Abstract: While major cloud service operators have taken various initiatives to operate their sustainable datacenters with green energy, it is challenging to effectively utilize the green energy since its generation depends on dynamic natural conditions. Fortunately, the geographical distribution of datacenters provides an opportunity for optimizing system performance by distributing cloud workloads. In this paper, we propose a holistic heterogeneity-aware cloud workload placement and migration approach, sCloud, that aims to maximize the system goodput in distributed self-sustainable datacenters. sCloud adaptively places the transactional workload to distributed datacenters, allocates the available resource to heterogeneous workloads in each datacenter, and migrates batch jobs across datacenters, while taking into account the green power availability and QoS requirements. We formulate the transactional workload placement as a constrained optimization problem that can be solved by nonlinear programming. Then, we propose a batch job migration algorithm to further improve the system goodput when the green power supply varies widely at different locations. We have implemented sCloud in a university cloud testbed with real-world weather conditions and workload traces. Experimental results demonstrate that sCloud can achieve near-to-optimal system performance while being resilient to dynamic power availability. It outperforms a heterogeneity-oblivious approach by 26% in improving system goodput and 29% in reducing QoS violations.
Citations: 42
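The abstract formulates transactional workload placement as a constrained optimization problem solvable by nonlinear programming. Below is a minimal Python sketch of that idea, assuming a made-up concave goodput model, invented capacity numbers, and a generic solver; the paper's actual objective, QoS constraints, and solution method are not described here, so everything in the snippet is illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical inputs (not from the paper): per-site green-power serving
# capacity and the total transactional load to split across sites.
green_capacity = np.array([120.0, 80.0, 60.0])   # requests/s sustainable on green power
total_load = 200.0                               # incoming transactional requests/s

def neg_goodput(x):
    # Concave stand-in for goodput: returns diminish as a site approaches
    # its green capacity (the paper's model also folds in QoS requirements).
    return -np.sum(green_capacity * (1.0 - np.exp(-x / green_capacity)))

# Placed load must add up to the total demand, and each share is non-negative.
constraints = [{"type": "eq", "fun": lambda x: np.sum(x) - total_load}]
bounds = [(0.0, total_load)] * len(green_capacity)
x0 = np.full(len(green_capacity), total_load / len(green_capacity))

res = minimize(neg_goodput, x0, method="SLSQP", bounds=bounds, constraints=constraints)
print("load per datacenter:", np.round(res.x, 1), "estimated goodput:", round(-res.fun, 1))
```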
Communication-Efficient Distributed Variance Monitoring and Outlier Detection for Multivariate Time Series
2014 IEEE 28th International Parallel and Distributed Processing Symposium | Pub Date: 2014-05-19 | DOI: 10.1109/IPDPS.2014.16
Moshe Gabel, A. Schuster, D. Keren
Abstract: Modern scale-out services are composed of thousands of individual machines, which must be continuously monitored for unexpected failures. One recent approach to monitoring is latent fault detection, an adaptive statistical framework for scale-out, load-balanced systems. By periodically measuring hundreds of performance metrics and looking for outlier machines, it attempts to detect subtle problems such as misconfigurations, bugs, and malfunctioning hardware, before they manifest as machine failures. Previous work on a large, real-world Web service has shown that many failures are indeed preceded by such latent faults. Latent fault detection is an offline framework with large bandwidth and processing requirements. Each machine must send all its measurements to a centralized location, which is prohibitive in some settings and requires data-parallel processing infrastructure. In this work we adapt the latent fault detector to provide an online, communication- and computation-reduced version. We utilize stream processing techniques to trade accuracy for communication and computation. We first describe a novel communication-efficient online distributed variance monitoring algorithm that provides a continuous estimate of the global variance within guaranteed approximation bounds. Using the variance monitor, we provide an online distributed outlier detection framework for non-stationary multivariate time series common in scale-out systems. The adapted framework reduces data size and central processing cost by processing the data in situ, making it usable in wider settings. Like the original framework, our adaptation admits different comparison functions, supports non-stationary data, and provides statistical guarantees on the rate of false positives. Simulations on logs from a production system show that we are able to reduce bandwidth by an order of magnitude, with below 1% error compared to the original algorithm.
Citations: 31
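The core of the communication reduction is that each node only reports to the coordinator when its local statistics drift outside a slack region around the last synchronized global state, which is what yields bounded approximation error with far fewer messages. The sketch below is a hypothetical single-node illustration of that pattern (Welford-style local updates plus a drift check); it is not the authors' algorithm, and the class name and threshold are invented.

```python
# Illustrative sketch only: a node tracks its local mean/variance online and
# flags the need to synchronize when it leaves the "safe zone" assigned at
# the last global sync.
class LocalVarianceMonitor:
    def __init__(self, slack=0.1):
        self.slack = slack
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0            # running sum of squared deviations (Welford)
        self.ref_mean = 0.0      # global state at last synchronization
        self.ref_var = 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def local_variance(self):
        return self.m2 / self.n if self.n > 1 else 0.0

    def needs_sync(self):
        # Communicate only if the local mean or variance left the safe zone.
        return (abs(self.mean - self.ref_mean) > self.slack or
                abs(self.local_variance() - self.ref_var) > self.slack)

    def synchronize(self, global_mean, global_var):
        self.ref_mean, self.ref_var = global_mean, global_var
```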
Exploiting Geometric Partitioning in Task Mapping for Parallel Computers
2014 IEEE 28th International Parallel and Distributed Processing Symposium | Pub Date: 2014-05-19 | DOI: 10.1109/IPDPS.2014.15
Mehmet Deveci, S. Rajamanickam, V. Leung, K. Pedretti, Stephen L. Olivier, David P. Bunde, Ümit V. Çatalyürek, K. Devine
Abstract: We present a new method for mapping applications' MPI tasks to cores of a parallel computer such that communication and execution time are reduced. We consider the case of sparse node allocation within a parallel machine, where the nodes assigned to a job are not necessarily located within a contiguous block nor within close proximity to each other in the network. The goal is to assign tasks to cores so that interdependent tasks are performed by "nearby" cores, thus lowering the distance messages must travel, the amount of congestion in the network, and the overall cost of communication. Our new method applies a geometric partitioning algorithm to both the tasks and the processors, and assigns task parts to the corresponding processor parts. We show that, for the structured finite difference mini-app MiniGhost, our mapping method reduced execution time by 34% on average on 65,536 cores of a Cray XE6. In a molecular dynamics mini-app, MiniMD, our mapping method reduced communication time by 26% on average on 6,144 cores. We also compare our mapping with graph-based mappings from the LibTopoMap library and show that our mappings reduced the communication time on average by 15% in MiniGhost and 10% in MiniMD.
Citations: 69
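A toy illustration of the mapping strategy the abstract describes: partition the task coordinates and the allocated-node coordinates with the same geometric method, then assign each task part to the matching node part. The recursive coordinate bisection below is a simplified stand-in (it assumes a power-of-two part count and random coordinates); the paper's partitioner and network-proximity model are more general.

```python
import numpy as np

def rcb(points, ids, parts):
    """Toy recursive coordinate bisection: split along the widest axis until
    the requested number of parts is reached (assumes parts is a power of two)."""
    if parts == 1:
        return [ids]
    axis = np.argmax(points.max(axis=0) - points.min(axis=0))
    order = np.argsort(points[:, axis])
    half = len(ids) // 2
    left, right = order[:half], order[half:]
    return (rcb(points[left], ids[left], parts // 2) +
            rcb(points[right], ids[right], parts // 2))

# Partition tasks and allocated nodes into the same number of parts, then map
# each task part onto the corresponding node part (coordinates are hypothetical).
tasks = np.random.rand(64, 3)        # task centroids in the application's geometry
nodes = np.random.rand(16, 3)        # allocated-node coordinates in the network
task_parts = rcb(tasks, np.arange(len(tasks)), 16)
node_parts = rcb(nodes, np.arange(len(nodes)), 16)
mapping = {int(t): int(node_parts[p][0]) for p, ts in enumerate(task_parts) for t in ts}
print(mapping)
```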
Petascale Application of a Coupled CPU-GPU Algorithm for Simulation and Analysis of Multiphase Flow Solutions in Porous Medium Systems
2014 IEEE 28th International Parallel and Distributed Processing Symposium | Pub Date: 2014-05-19 | DOI: 10.1109/IPDPS.2014.67
J. McClure, Hao Wang, J. Prins, Cass T. Miller, Wu-chun Feng
Abstract: Large-scale simulation can provide a wide range of information needed to develop and validate theoretical models for multiphase flow in porous medium systems. In this paper, we consider a coupled solution in which a multiphase flow simulator is coupled to an analysis approach used to extract the interfacial geometries as the flow evolves. This has been implemented using MPI to target heterogeneous nodes equipped with GPUs. The GPUs evolve the multiphase flow solution using the lattice Boltzmann method while the CPUs compute upscaled measures of the morphology and topology of the phase distributions and their rate of evolution. Our approach is demonstrated to scale to 4,096 GPUs and 65,536 CPU cores to achieve a maximum performance of 244,754 million lattice-node updates per second (MLUPS) in double-precision execution on Titan. In turn, this approach increases the size of systems that can be considered by an order of magnitude compared with previous work and enables detailed in situ tracking of averaged flow quantities at temporal resolutions that were previously impossible. Furthermore, it virtually eliminates the need for post-processing and intensive I/O and mitigates the potential loss of data associated with node failures.
Citations: 18
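A quick arithmetic check of the reported figures, useful for getting a feel for the per-device workload; the per-GPU and per-node breakdowns are our own division of the abstract's numbers, not values reported by the authors.

```python
# Figures taken from the abstract; the breakdown below is illustrative arithmetic only.
total_mlups = 244_754          # million lattice-node updates per second, aggregate
gpus = 4_096
cpu_cores = 65_536

print(f"{total_mlups / gpus:.1f} MLUPS per GPU")               # roughly 59.8 MLUPS/GPU
print(f"{cpu_cores // gpus} CPU analysis cores paired with each GPU")  # 16 cores/GPU
```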
Using Multiple Threads to Accelerate Single Thread Performance
2014 IEEE 28th International Parallel and Distributed Processing Symposium | Pub Date: 2014-05-19 | DOI: 10.1109/IPDPS.2014.104
Zehra Sura, K. O'Brien, J. Brunheroto
Abstract: Computing systems are being designed with an increasing number of hardware cores. To effectively use these cores, applications need to maximize the amount of parallel processing and minimize the time spent in sequential execution. In this work, we aim to exploit fine-grained parallelism beyond the parallelism already encoded in an application. We define an execution model using a primary core and some number of secondary cores that collaborate to speed up the execution of sequential code regions. This execution model relies on cores that are physically close to each other and have fast communication paths between them. For this purpose, we introduce dedicated hardware queues for low-latency transfer of values between cores, and define special "enque" and "deque" instructions to use the queues. Further, we develop compiler analyses and transformations to automatically derive fine-grained parallel code from sequential code regions. We implemented this model for exploiting fine-grained parallelization in the IBM XL compiler framework and in a simulator for the Blue Gene/Q system. We also studied the Sequoia benchmarks to determine code sections where our techniques are applicable. We evaluated our work using these code sections, and observed an average speedup of 1.32 on 2 cores and an average speedup of 2.05 on 4 cores. Since these code sections are otherwise sequentially executed, we conclude that our approach is useful for accelerating single-thread performance.
Citations: 3
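The execution model pairs a primary core with secondary cores that exchange values over dedicated hardware queues via "enque"/"deque" instructions. The sketch below mimics that structure in software only, with Python threads and queue.Queue standing in for the cores and hardware queues; it shows the loop-splitting pattern, not the hardware mechanism or the compiler transformation.

```python
# Software stand-in for the paper's execution model: the primary computes a
# value per iteration and forwards it ("enque"); the secondary receives it
# ("deque") and performs the dependent work concurrently.
import threading, queue

chan = queue.Queue()
data = list(range(1000))
results = []

def primary():
    for x in data:
        chan.put(x * x)        # "enque": forward value to the secondary core
    chan.put(None)             # end-of-stream marker

def secondary():
    while True:
        v = chan.get()         # "deque": receive value from the primary core
        if v is None:
            break
        results.append(v + 1)  # dependent work overlaps with the primary's loop

t1, t2 = threading.Thread(target=primary), threading.Thread(target=secondary)
t1.start(); t2.start(); t1.join(); t2.join()
print(len(results), results[:5])
```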
EDM: An Endurance-Aware Data Migration Scheme for Load Balancing in SSD Storage Clusters
2014 IEEE 28th International Parallel and Distributed Processing Symposium | Pub Date: 2014-05-19 | DOI: 10.1109/IPDPS.2014.86
Jiaxin Ou, J. Shu, Youyou Lu, Letian Yi, Wei Wang
Abstract: Data migration schemes are critical to balance the load in storage clusters for performance improvement. However, as NAND flash-based SSDs are widely deployed in storage systems, extending the lifespan of SSD storage clusters becomes a new challenge for data migration. Prior approaches designed for HDD storage clusters are inefficient due to excessive write amplification during data migration, which significantly decreases the lifespan of SSD storage clusters. To overcome this problem, we propose EDM, an endurance-aware data migration scheme with careful data placement and movement to minimize the data migrated, so as to limit the wear-out of SSDs while improving performance. Based on the observation that performance degradation is dominated by the wear speed of an SSD, which is affected by both the storage utilization and the write intensity, two complementary data migration policies are designed to explore the trade-offs among throughput, response time during migration, and lifetime of SSD storage clusters. Moreover, we design an SSD wear model and quantitatively calculate the amount of data migrated as well as the sources and destinations of the migration, so as to reduce the write amplification caused by migration. Results on a real storage cluster using real-world traces show that EDM performs favorably versus existing HDD-based migration techniques, reducing cluster-wide aggregate erase count by up to 40%. In the meantime, it improves performance by 25% on average over the baseline system, achieving almost the same performance improvement as previous migration techniques.
Citations: 35
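Since the abstract ties wear speed to both storage utilization and write intensity, a natural greedy policy is to migrate a small batch of data from the fastest-wearing SSD toward the slowest-wearing one. The sketch below encodes that intuition with an invented wear model and field names; EDM's actual wear model, migration sizing, and policies are not reproduced here.

```python
# Hypothetical wear-speed model: higher utilization amplifies internal write
# amplification, and write intensity drives erase cycles directly; devices
# with fewer remaining P/E cycles wear out "faster" in relative terms.
def wear_speed(ssd):
    amplification = 1.0 / max(1e-6, 1.0 - ssd["utilization"])
    return ssd["write_mbps"] * amplification / ssd["remaining_pe_cycles"]

def pick_migration(ssds, batch_mb=1024):
    # Move a bounded batch from the fastest-wearing device to the slowest-wearing one.
    ranked = sorted(ssds, key=wear_speed)
    src, dst = ranked[-1], ranked[0]
    return {"from": src["id"], "to": dst["id"], "megabytes": batch_mb}

cluster = [
    {"id": "ssd-a", "utilization": 0.85, "write_mbps": 300, "remaining_pe_cycles": 2.0e3},
    {"id": "ssd-b", "utilization": 0.40, "write_mbps": 120, "remaining_pe_cycles": 2.8e3},
    {"id": "ssd-c", "utilization": 0.55, "write_mbps": 200, "remaining_pe_cycles": 1.5e3},
]
print(pick_migration(cluster))
```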
Optimization of Multi-level Checkpoint Model for Large Scale HPC Applications
2014 IEEE 28th International Parallel and Distributed Processing Symposium | Pub Date: 2014-05-19 | DOI: 10.1109/IPDPS.2014.122
S. Di, M. Bouguerra, L. Bautista-Gomez, F. Cappello
Abstract: The HPC community projects that future extreme-scale systems will be much less stable than current petascale systems, thus requiring sophisticated fault tolerance to guarantee the completion of large-scale numerical computations. Execution failures may occur due to multiple factors at different scales, from transient uncorrectable memory errors localized in processes to massive system outages. Multi-level checkpoint/restart is a promising model that provides an elastic response to tolerate different types of failures. It stores checkpoints at different levels: e.g., local memory, remote memory (using a software RAID), local SSD, remote file system. In this paper, we respond to two open questions: 1) how to optimize the selection of checkpoint levels based on failure distributions observed in a system, and 2) how to compute the optimal checkpoint intervals for each of these levels. The contribution is three-fold. (1) We build a mathematical model to fit the multi-level checkpoint/restart mechanism to large-scale applications with regard to various types of failures. (2) We theoretically optimize the entire execution performance for each parallel application by selecting the best checkpoint level combination and the corresponding checkpoint intervals. (3) We characterize checkpoint overheads at different checkpoint levels in a real cluster environment, and evaluate our optimal solutions using both simulation with millions of cores and a real environment with real-world MPI programs running on hundreds of cores. Experiments show that optimized selections of levels, associated with optimal checkpoint intervals at each level, outperform other state-of-the-art solutions by 5-50 percent.
Citations: 92
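For basic intuition about per-level checkpoint intervals, the classic Young/Daly first-order approximation gives interval ≈ sqrt(2 × checkpoint_cost × MTBF). The paper optimizes a richer model that combines levels and failure types jointly; the snippet below only applies the first-order formula level by level, with made-up checkpoint costs and MTBFs.

```python
# First-order sketch only: Young's approximation per level, with illustrative
# numbers (not measurements from the paper).
import math

levels = {
    "local memory":         {"checkpoint_cost_s": 2,   "mtbf_s": 6 * 3600},
    "remote memory (RAID)": {"checkpoint_cost_s": 15,  "mtbf_s": 24 * 3600},
    "remote file system":   {"checkpoint_cost_s": 120, "mtbf_s": 7 * 24 * 3600},
}

for name, lv in levels.items():
    interval = math.sqrt(2 * lv["checkpoint_cost_s"] * lv["mtbf_s"])  # Young's formula
    print(f"{name}: checkpoint roughly every {interval / 60:.1f} minutes")
```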
Designing Bit-Reproducible Portable High-Performance Applications
2014 IEEE 28th International Parallel and Distributed Processing Symposium | Pub Date: 2014-05-19 | DOI: 10.1109/IPDPS.2014.127
Andrea Arteaga, O. Fuhrer, T. Hoefler
Abstract: Bit-reproducibility has many advantages in the context of high-performance computing. Besides simplifying the process of debugging and testing the code and making it more accurate, it can allow the deployment of applications on heterogeneous systems while maintaining the consistency of the computations. In this work we analyze the basic operations performed by scientific applications and identify the possible sources of non-reproducibility. In particular, we consider the tasks of evaluating transcendental functions and performing reductions using non-associative operators. We present a set of techniques to achieve reproducibility and we propose improvements over existing algorithms to perform reproducible computations in a portable way, at the same time obtaining good performance and accuracy. By applying these techniques to more complex tasks we show that bit-reproducibility can be achieved on a broad range of scientific applications.
Citations: 28
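One standard ingredient for reproducible reductions (not necessarily the authors' exact algorithm) is to fix the association order of the non-associative operator, so the result depends only on the input and never on how work is split across threads or nodes. The sketch below contrasts a fixed binary-tree summation with plain left-to-right summation on a deliberately ill-conditioned input to show why the order matters.

```python
# Fixed-order pairwise reduction: the association pattern depends only on the
# input length, so any parallel decomposition that preserves this tree gives
# bit-identical results.
def fixed_order_sum(values):
    vals = list(values)
    while len(vals) > 1:
        if len(vals) % 2:
            vals.append(0.0)                  # pad to an even count
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0] if vals else 0.0

data = [0.1] * 10 + [1e16, -1e16]
print(fixed_order_sum(data))   # ~1.0: the small terms are combined before the large ones
print(sum(data))               # 0.0: left-to-right order absorbs the small terms into 1e16
```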
Runtime-Guided Cache Coherence Optimizations in Multi-core Architectures
2014 IEEE 28th International Parallel and Distributed Processing Symposium | Pub Date: 2014-05-19 | DOI: 10.1109/IPDPS.2014.71
M. Manivannan, P. Stenström
Abstract: Emerging task-based parallel programming models shield programmers from the daunting task of parallelism management by delegating the responsibility of mapping and scheduling individual tasks to the runtime system. The runtime system can use semantic information about task dependencies supplied by the programmer, and the mapping information of tasks, to enable optimizations like data-flow based execution and locality-aware scheduling of tasks. However, should the cache coherence substrate have access to this information from the runtime system, it would enable aggressive optimizations of prevailing access patterns such as one-to-many producer-consumer sharing and migratory sharing. Such linkage has, however, not been studied before. We present a family of runtime-guided cache coherence optimizations enabled by linking dependency and mapping information from the runtime system to the cache coherence substrate. By making this information available to the cache coherence substrate, we show that optimizations such as downgrading and self-invalidation, which help reduce overheads associated with producer-consumer and migratory sharing, can be supported with reasonable extensions to the baseline cache coherence protocol. Our experimental results establish that each optimization provides significant performance gain in isolation and can provide additional gains when combined. Finally, we evaluate these optimizations in the context of earlier proposed runtime-guided prefetching schemes and show that they can have synergistic effects.
Citations: 21
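A schematic sketch of the linkage the paper argues for: the runtime's task-dependency graph already identifies one-to-many producer-consumer and migratory-style sharing, so it can hand the coherence substrate downgrade or self-invalidation hints rather than waiting for misses to reveal the pattern. The data structures, classification rules, and hint names below are entirely hypothetical and only illustrate the flow of information, not the protocol extensions.

```python
# Hypothetical task graph: each task lists the data regions it writes and reads.
tasks = {
    "t0": {"writes": ["A"], "reads": []},
    "t1": {"writes": ["B"], "reads": ["A"]},   # consumer of A
    "t2": {"writes": [],    "reads": ["A"]},   # another consumer of A
    "t3": {"writes": ["B"], "reads": ["B"]},   # single consumer / handoff of B
}

def coherence_hints(task_graph):
    hints = []
    for producer, t in task_graph.items():
        for region in t["writes"]:
            consumers = [c for c, u in task_graph.items()
                         if c != producer and region in u["reads"]]
            if len(consumers) >= 2:
                # one-to-many producer-consumer sharing: downgrade after the write
                hints.append((producer, region, "downgrade-to-shared"))
            elif len(consumers) == 1:
                # migratory-style handoff: self-invalidate at the producer
                hints.append((producer, region, "self-invalidate"))
    return hints

print(coherence_hints(tasks))
```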
MIC-SVM: Designing a Highly Efficient Support Vector Machine for Advanced Modern Multi-core and Many-Core Architectures
2014 IEEE 28th International Parallel and Distributed Processing Symposium | Pub Date: 2014-05-19 | DOI: 10.1109/IPDPS.2014.88
Yang You, S. Song, H. Fu, A. Márquez, M. Dehnavi, K. Barker, K. Cameron, A. Randles, Guangwen Yang
Abstract: Support Vector Machine (SVM) has been widely used in data-mining and Big Data applications as modern commercial databases attach increasing importance to analytic capabilities. In recent years, SVM was adapted to the field of High Performance Computing for power/performance prediction, auto-tuning, and runtime scheduling. However, even at the risk of losing prediction accuracy due to insufficient runtime information, researchers can only afford to apply offline model training to avoid significant runtime training overhead. Advanced multi-core and many-core architectures offer massive parallelism with complex memory hierarchies, which can make runtime training possible but also form a barrier to efficient parallel SVM design. To address these challenges, we designed and implemented MIC-SVM, a highly efficient parallel SVM for x86-based multi-core and many-core architectures, such as Intel Ivy Bridge CPUs and the Intel Xeon Phi co-processor (MIC). We propose various novel analysis methods and optimization techniques to fully utilize the multilevel parallelism provided by these architectures, which also serve as general optimization methods for other machine learning tools. MIC-SVM achieves 4.4-84x and 18-47x speedups against the popular LIBSVM on MIC and Ivy Bridge CPUs, respectively, for several real-world data-mining datasets. Even compared with GPUSVM running on a top-of-the-line NVIDIA K20x GPU, the performance of our MIC-SVM is competitive. We also conduct a cross-platform performance comparison analysis, focusing on Ivy Bridge CPUs, MIC, and GPUs, and provide insights on how to select the most suitable advanced architectures for specific algorithms and input data patterns.
Citations: 41
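The kind of optimization such a parallel SVM targets is turning per-sample kernel evaluations into dense, SIMD-friendly array operations. The snippet below is not MIC-SVM; it is just an illustrative vectorized RBF kernel-row computation of the style that maps well onto wide vector units (AVX on Ivy Bridge, 512-bit vectors on Xeon Phi), with a made-up dataset and gamma value.

```python
import numpy as np

def rbf_kernel_row(X, x_i, gamma=0.5):
    # ||x_j - x_i||^2 for all training samples j at once, then one exp pass;
    # the whole row is computed as contiguous vector arithmetic.
    sq_dist = np.sum((X - x_i) ** 2, axis=1)
    return np.exp(-gamma * sq_dist)

X = np.random.rand(10_000, 64)     # hypothetical training set
row = rbf_kernel_row(X, X[0])
print(row[:5])
```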