2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing: Latest Papers

Scalable Huge Directories through OSD+ Devices
Ana Aviles-González, J. Piernas, Pilar González-Férez
{"title":"Scalable Huge Directories through OSD+ Devices","authors":"Ana Aviles-González, J. Piernas, Pilar González-Férez","doi":"10.1109/PDP.2013.11","DOIUrl":"https://doi.org/10.1109/PDP.2013.11","url":null,"abstract":"Management of directories with millions of files, accessed by thousands of clients at the same time, is a problem recently identified in HPC environments. This paper introduces an OSD+-based technique to deal with those directories. We use directory objects in OSD+ devices for dynamically distributing a huge directory among several servers. Directory objects work independently, achieving good performance and scalability. Experiments show that, by using just 8 OSD+s and Ext4, FPFS is able to create, stat and delete more than 70,000, 120,000 and 37,000 files per second, respectively. With ReiserFS, these numbers are 118,000, 97,000 and 67,000. Experiments, however, have produced unforeseen results too. While distribution is beneficial when a huge directory is accessed by many clients, it can also downgrade the performance when several huge directories are concurrently accessed by a few clients.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115748780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Fewest Common Hops (FCH): An Improved Peer Selection Approach for P2P Applications
H. Ijaz, S. Saleem, M. Welzl
{"title":"Fewest Common Hops (FCH): An Improved Peer Selection Approach for P2P Applications","authors":"H. Ijaz, S. Saleem, M. Welzl","doi":"10.1109/PDP.2013.73","DOIUrl":"https://doi.org/10.1109/PDP.2013.73","url":null,"abstract":"Underlay-unawareness in P2P systems can result in sub-optimal peer selection for overlay routing and hence poor performance. The majority of underlay aware proposals for peer selection focus on finding the shortest overlay routes by selecting the nearest peers according to proximity. However, in case of multiple and parallel downloads, if the underlay paths between a downloader and its selected nearest peers share a bottleneck, this can cause congestion, leading to performance deterioration instead of improvement. This effect was neglected in previous work because, in today's Internet, the bottleneck is usually not shared as it is the end user's access link. This is no longer the case in more modern scenarios, e.g. with FTTH or with upcoming in-network caching techniques such as DECADE. We propose an improved peer selection approach for P2P applications called Fewest Common Hops (FCH) that ensures proximity based node selection having maximum path disjointness. It is a client based, infrastructure independent heuristic to optimize download time for multiple and parallel downloads in P2P content distribution applications. Simulations show that, even when FCH is implemented in the simplest possible fashion (using only traceroute), it can significantly decrease the download time.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114908672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
ELMO: A User-Friendly API to Enable Local Memory in OpenCL Kernels
Jianbin Fang, A. Varbanescu, Jie Shen, H. Sips
{"title":"ELMO: A User-Friendly API to Enable Local Memory in OpenCL Kernels","authors":"Jianbin Fang, A. Varbanescu, Jie Shen, H. Sips","doi":"10.1109/pdp.2013.61","DOIUrl":"https://doi.org/10.1109/pdp.2013.61","url":null,"abstract":"Recent parallel architectures are equipped with local memory, which simplifies hardware design at the cost of increased program complexity due to explicit management. To simplify this extra-burden that programmers have, we introduce an easy-to-use API, ELMO, that improves productivity while preserving high performance of local memory operations. Specifically, ELMO is a generic API that covers different local memory use-cases. We also present prototype implementations for these APIs and perform multiple GPU-inspired optimizations to maximize their performance. Experimental results on the NVIDIA Quadro5000 GPU show that performance is significantly improved by using ELMO on native implementations: the achieved speedup ranges from 1.3x to 3.7x. Furthermore, using ELMO we still achieve performance comparable (if not better) with that of hand-tuned applications, while the code is shorter, clearer, and safer.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114968776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Optimization Techniques for Dimensionally Truncated Sparse Grids on Heterogeneous Systems
Andrei Deftu, A. Murarasu
{"title":"Optimization Techniques for Dimensionally Truncated Sparse Grids on Heterogeneous Systems","authors":"Andrei Deftu, A. Murarasu","doi":"10.1109/PDP.2013.57","DOIUrl":"https://doi.org/10.1109/PDP.2013.57","url":null,"abstract":"Given the existing heterogeneous processor landscape dominated by CPUs and GPUs, topics such as programming productivity and performance portability have become increasingly important. In this context, an important question refers to how can we develop optimization strategies that cover both CPUs and GPUs. We answer this for fastsg, a library that provides functionality for handling efficiently high-dimensional functions. As it can be employed for compressing and decompressing large-scale simulation data, it finds itself at the core of a computational steering application which serves us as test case. We describe our experience with implementing fastsg's time critical routines for Intel CPUs and Nvidia Fermi GPUs. We show the differences and especially the similarities between our optimization strategies for the two architectures. With regard to our test case for which achieving high speedups is a \"must\" for real-time visualization, we report a speedup of up to 6.2x times compared to the state-of-the-art implementation of the sparse grid technique for GPUs.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114170063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip
N. Teimouri, M. Modarressi, H. Sarbazi-Azad
{"title":"Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip","authors":"N. Teimouri, M. Modarressi, H. Sarbazi-Azad","doi":"10.1109/PDP.2013.82","DOIUrl":"https://doi.org/10.1109/PDP.2013.82","url":null,"abstract":"In this paper, we propose a hybrid packet-circuit switching for networks-on-chip to benefit from the advantages of both switching mechanisms. Integrating circuit and packet switching into a single NoC is achieved by partitioning the link bandwidth and router data-path and control-path elements into two parts and allocating each part to one of the switching methods. In this NoC, during injection in the source node, packets are initially forwarded on the packet-switched sub-network, but keep requesting a circuit towards the destination node. The circuit-switched part, at each cycle, collects the circuit construction requests, performs arbitration among the conflicting requests, and constructs circuits over the unallocated circuit-switched sub-network links. Unlike traditional circuit-switching, the circuit end point in this NoC is not necessarily the packet destination, rather the circuits can be terminated in any intermediate node between the packet source and destination nodes. At that node, the packet may either travel over another circuit (in case of successful circuit request) or continue its path over the packet-switched part. Therefore, packets may switch between the two sub-networks several times during their life-time in the network. Circuit construction is handled by a low-latency and low-cost setup network. To keep the complexity of the circuit construction low, the circuits are restricted to span within a neighborhood of d hops of the requesting node. The experimental results show considerable improvement in energy and latency over a traditional packet-switched NoC.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115255881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Towards a Graceful Degradable Multicore-System by Hierarchical Handling of Hard Errors
Sebastian Müller, Mario Schölzel, H. Vierhaus
{"title":"Towards a Graceful Degradable Multicore-System by Hierarchical Handling of Hard Errors","authors":"Sebastian Müller, Mario Schölzel, H. Vierhaus","doi":"10.1109/PDP.2013.51","DOIUrl":"https://doi.org/10.1109/PDP.2013.51","url":null,"abstract":"We present a novel concept for handling permanent faults in a statically scheduled heterogeneous multi-core system by means of a software-based self-reconfiguration. Hard faults are handled in a hierarchical cross layer manner, either locally by each core itself, or globally by reconfiguring the full system. Local reconfiguration of a defect core is based on the adaptation of the executed task to the current fault state of the core, such that defect components are never used. This adaptation is achieved by a rescheduling of the program code of the task. If this local reconfiguration fails, then the binding of tasks to cores is modified. Because heterogeneous cores are allowed, this may require a rescheduling of the tasks whose binding is changed. Estimations for the runtime of such a global reconfiguration are presented. Moreover, it is shown that systems that support the global reconfiguration achieve the same fault tolerance level as systems with local repair only, but with reduced hardware overhead.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133364146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Solving the Linearized Poisson-Boltzmann Equation on GPUs Using CUDA
José Colmenares, Jesús Ortiz, S. Decherchi, A. Fijany, W. Rocchia
{"title":"Solving the Linearized Poisson-Boltzmann Equation on GPUs Using CUDA","authors":"José Colmenares, Jesús Ortiz, S. Decherchi, A. Fijany, W. Rocchia","doi":"10.1109/PDP.2013.67","DOIUrl":"https://doi.org/10.1109/PDP.2013.67","url":null,"abstract":"In this work an implementation of a linearized Poisson-Boltzmann equation solver based on a Finite Differences scheme on the GPU architecture is presented. The algorithm exploits the checkerboard structure of the discretized Laplace operator and follows the footprints of a popular solver called DelPhi, which is widely used in the Computational Biology community. The algorithm has been implemented using CUDA. This implementation has then been integrated with the DelPhi solver and tested over a few representative cases of biological interest. Details of the implementation as well as performance test results are illustrated.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123622195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
SecMon: A Secure Introspection Framework for Hardware Virtualization
Xiaolong Wu, Yunwei Gao, Xinhui Tian, Ying Song, Bing Guo, Baiming Feng, Yuzhong Sun
{"title":"SecMon: A Secure Introspection Framework for Hardware Virtualization","authors":"Xiaolong Wu, Yunwei Gao, Xinhui Tian, Ying Song, Bing Guo, Baiming Feng, Yuzhong Sun","doi":"10.1109/PDP.2013.48","DOIUrl":"https://doi.org/10.1109/PDP.2013.48","url":null,"abstract":"With the fusion of cloud computing and virtualization technology, system security under virtualization becomes a key point in recent research. As a foundational technology to construct a secure system, virtual machine introspection receives more attention than ever. Almost all of the existing virtual machine monitors take the privileged virtual machine (Domain-0) as the monitoring machine, which ignore the threats brought by Domain-0 because of its huge code base of user-level tools. Besides, para-virtualized machines cannot provide the basic support for popular security applications of Windows operating system. This paper proposes a secure monitoring framework based on hardware virtualization. We use Windows operating system to build a monitoring virtual machine in hardware virtual machine domain, and set up monitoring mechanism in it. In addition, the security of the Windows monitoring machine itself is ensured all through its lifetime-bootstrap and runtime. The experiments show our secure monitoring system performs well in the secure monitoring process. The performance overhead it brings is considered to be acceptable.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"157 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121613499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Data Intensive Computing of X-Ray Computed Tomography Reconstruction at the LSDF
Xiaoli Yang, T. Jejkal, H. Pasic, R. Stotzka, A. Streit, J. V. Wezel, T. Rolo
{"title":"Data Intensive Computing of X-Ray Computed Tomography Reconstruction at the LSDF","authors":"Xiaoli Yang, T. Jejkal, H. Pasic, R. Stotzka, A. Streit, J. V. Wezel, T. Rolo","doi":"10.1109/PDP.2013.21","DOIUrl":"https://doi.org/10.1109/PDP.2013.21","url":null,"abstract":"In this paper, the method of data intensive computing is studied for large amounts of data in computed tomography (CT). An automatic workflow is built up to connect the tomography beamline of ANKA with the large scale data facility (LSDF), able to enhance the data storage and analysis efficiency. In this workflow, this paper focuses on the parallel computing of 3D computed tomography reconstruction. Different from the existing reconstruction system with filtered back-projection method, an algebraic reconstruction technique based on compressive sampling theory is presented to reconstruct the data from ultrafast computed tomography with fewer projections. Then the connected computing resources at the LSDF are used to implement the 3D CT reconstruction by distributing the whole job into multiple tasks executed in parallel. Promising reconstruction images and high computing performance are reported. For the 3D X-ray CT reconstruction, less than six minutes are actually required. LSDF is not only able to organize data efficiently, but also can provide reconstructed results to users in nearly instantaneous time. After integration into the workflow, this data intensive computing method will largely improve the data processing for ultrafast computed tomography at ANKA.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128684533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
A Distributed Eigensolver for Loosely Coupled Networks
H. Straková, W. Gansterer
{"title":"A Distributed Eigensolver for Loosely Coupled Networks","authors":"H. Straková, W. Gansterer","doi":"10.1109/PDP.2013.18","DOIUrl":"https://doi.org/10.1109/PDP.2013.18","url":null,"abstract":"We introduce a new distributed eigensolver (dOI) for square matrices based on orthogonal iteration. In contrast to standard parallel eigensolvers, our approach performs only nearest neighbor communication and provides much more flexibility with respect to the properties of the hardware infrastructure on which the computation is performed. This is achieved by utilizing distributed summation methods with randomized communication schedules which do not require global synchronization across the nodes. Our algorithm is particularly attractive for loosely coupled distributed networks with arbitrary network topologies and potentially unreliable components. Our distributed eigensolver dOI is based on a novel distributed matrix-matrix multiplication algorithm and on an extension of a distributed QR factorization algorithm proposed earlier. We illustrate the advantages of dOI in terms of higher flexibility with respect to the underlying network and lower communication cost compared to a related distributed eigensolver by Kempe and McSherry. Moreover, we experimentally illustrate how the overall communication cost of dOI is further reduced by adapting the accuracy of each distributed summation during the orthogonal iteration process.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125702530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16