{"title":"Disjoint-Paths and Fault-Tolerant Routing on Recursive Dual-Net","authors":"Yamin Li, S. Peng, Wanming Chu","doi":"10.1142/S0129054111008532","DOIUrl":"https://doi.org/10.1142/S0129054111008532","url":null,"abstract":"The recursive dual-net is a newly proposed interconnection network for massively parallel computers. The recursive dual-net is based on a recursive dual-construction of a base network. A $k$-level dual-construction for $k>0$ creates a network containing $(2n_0)^{2^k}/2$ nodes with node-degree $d_0+k$, where $n_0$ and $d_0$ are the number of nodes and the node-degree of the base network, respectively. The recursive dual-net is node and edge symmetric and can contain a huge number of nodes with small node-degree and short diameter. Disjoint-paths routing and fault-tolerant routing are fundamental and critical issues for the performance of an interconnection network. In this paper, we propose efficient algorithms for disjoint-paths and fault-tolerant routing on the recursive dual-net.","PeriodicalId":312929,"journal":{"name":"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115095697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Speculative Technique for Auto-Memoization Processor with Multithreading","authors":"Y. Kamiya, Tomoaki Tsumura, H. Matsuo, Y. Nakashima","doi":"10.1109/PDCAT.2009.67","DOIUrl":"https://doi.org/10.1109/PDCAT.2009.67","url":null,"abstract":"We have proposed an auto-memoization processor. This processor automatically and dynamically memoizes both functions and loop iterations, and skips their execution by reusing their results. On the other hand, multi/many-core processors have come into wide use, and the number of cores is expected to increase to a hundred or more. However, many programs do not have that much parallelism. Therefore, it becomes very important to consider how to utilize many cores effectively. This paper describes a speedup technique for the auto-memoization processor using speculative multithreading. Two speculative threads are forked at the reuse test. One assumes that the reuse test will succeed and speculatively executes the code following the reuse target block. The other assumes that the reuse test will fail and executes the reuse target block. These two threads conceal the overhead of the auto-memoization processor. Experimental results with the SPEC CPU95 benchmark suite show that the proposed method improves the maximum speedup from 13.9% to 36.0%.","PeriodicalId":312929,"journal":{"name":"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130737117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How to be a More Efficient Snoop: Refined Probe Complexity of Quorum Sets","authors":"Timo Warns, C. Storm, Oliver E. Theel","doi":"10.1109/PDCAT.2009.31","DOIUrl":"https://doi.org/10.1109/PDCAT.2009.31","url":null,"abstract":"Quorums are a flexible and well-studied means for implementing fault-tolerant distributed systems. The probe complexity gives the number of probes required to find a quorum of noncrashed processes or to reveal that no such quorum currently exists. In this paper, we refine the original notion of probe complexity by explicitly considering the underlying failure model. A refined probe complexity gives a tight bound on the number of required probes, which is lower than the original probe complexity for most failure models. Additionally, we present a universal probe strategy that is defined for all quorum sets and exhibits the refined probe complexity in the worst case. In contrast, previous probe strategies were limited to special quorum sets, namely to coteries, and met the original probe complexity only for special (i.e., nondominated) coteries.","PeriodicalId":312929,"journal":{"name":"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121725277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single Sensor Acoustic Feature Extraction for Embedded Realtime Vehicle Classification","authors":"Andreas Starzacher, B. Rinner","doi":"10.1109/PDCAT.2009.18","DOIUrl":"https://doi.org/10.1109/PDCAT.2009.18","url":null,"abstract":"Vehicle classification is an important task for various traffic monitoring applications. This paper investigates the capabilities of acoustic feature generation for vehicle classification. Six temporal and spectral features are extracted from the audio recordings. Six different classification algorithms are compared using the extracted features. We focus on a single sensor setting to keep the computational effort low and evaluate its classification accuracy and real-time performance. The experimental evaluation is performed on our embedded platform using recorded data of about 150 vehicles. The results are applied in our ongoing research on fusing video, laser and acoustic data for real-time traffic monitoring.","PeriodicalId":312929,"journal":{"name":"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122898276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supporting Partial Ordering with the Parallel Iterator","authors":"Nasser Giacaman, O. Sinnen","doi":"10.1109/PDCAT.2009.11","DOIUrl":"https://doi.org/10.1109/PDCAT.2009.11","url":null,"abstract":"With the advent of multi-core processors, desktop application developers must finally face parallel computing and its challenges. A large portion of the computational load in a program rests within iterative computations. In object-oriented languages these are commonly handled using iterators which are inadequate for parallel programming. Consequently, the powerful Parallel Iterator concept was developed. This paper presents various developments of the Parallel Iterator, such as parallel traversal of complex collections with partial ordering (such as a tree). Other features include reductions, parallel remove semantics and exception handling. Along with the ease of use, the results reveal great speedup in comparison to traditional Java parallelism approaches.","PeriodicalId":312929,"journal":{"name":"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124456416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Equi-Width Data Swapping for Private Data Publication","authors":"Yidong Li, Hong Shen","doi":"10.1109/PDCAT.2009.69","DOIUrl":"https://doi.org/10.1109/PDCAT.2009.69","url":null,"abstract":"Data swapping is a popular value-invariant data perturbation technique. The quality of a data swapping method is measured by how well it preserves data privacy and data utility. Because swapping data globally is computationally impractical, appropriate localization schemes are often applied in advance to guarantee its performance on these metrics. Equi-depth partitioning is preferred by most existing data perturbation techniques because it provides uniform privacy protection for each data tuple. However, this method performs ineffectively for two types of applications: one is to maintain statistics based on equi-width partitioning, such as the multivariate histogram with equal bin width, and the other is to preserve parametric statistics, such as covariance, in the context of sparse data with non-uniform distribution. As a natural solution for these applications, this paper explores the possibility of using data swapping with equi-width partitioning for private data publication, which has been little used in data perturbation due to the difficulty of preserving data privacy. With extensive theoretical analysis and experimental results, we show that Equi-Width Swapping (EWS) can achieve performance in privacy preservation similar to that of Equi-Depth Swapping (EDS) if the number of partitions is sufficiently large (e.g., ≥ sqrt(N), where N is the size of the dataset). Our experimental results on both synthetic and real-world data validate our theoretical analysis.","PeriodicalId":312929,"journal":{"name":"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133340380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Score Level Fusion in Multimodal Biometric Systems","authors":"S. Horng, Yuan-Hsin Chen, R. Run, Rong-Jian Chen, Jui-Lin Lai, Kevin Octavius Sentosa","doi":"10.1109/PDCAT.2009.82","DOIUrl":"https://doi.org/10.1109/PDCAT.2009.82","url":null,"abstract":"In a multimodal biometric system, an effective fusion method is necessary for combining information from various single-modality systems. In this paper, we examined the performance of sum-rule-based score level fusion and Support Vector Machine (SVM)-based score level fusion. Three biometric characteristics were considered in this study: fingerprint, face, and finger vein. We also proposed a new robust normalization scheme (Reduction of High-scores Effect normalization), which is derived from the min-max normalization scheme. Experiments on four different multimodal databases suggest that integrating the proposed scheme into sum-rule-based fusion and SVM-based fusion leads to consistently high accuracy.","PeriodicalId":312929,"journal":{"name":"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124513022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Generalized Multi-Organization Scheduling on Unrelated Parallel Machines","authors":"Fukuhito Ooshita, Tomoko Izumi, Taisuke Izumi","doi":"10.1109/PDCAT.2009.26","DOIUrl":"https://doi.org/10.1109/PDCAT.2009.26","url":null,"abstract":"We consider the parallel computing environment where m organizations provide machines and several jobs to be executed. While cooperation of organizations is required to minimize the global makespan, each organization primarily expects faster completion of its own jobs and thus is not necessarily cooperative. To handle such situations, we formulate the alpha-cooperative multi-organization scheduling problem (alpha-MOSP), where alpha >= 1 is a parameter representing the degree of cooperativeness. alpha-MOSP minimizes the makespan under the cooperation constraint that no organization allows the completion time of its own jobs to exceed alpha times that in the case where those jobs are executed by itself. In this paper, we aim to reveal the relation between the makespan and the degree of cooperativeness. First, we investigate the relation between alpha and the quality of the global makespan. For alpha = 1 (i.e., each organization never sacrifices its completion time), we show an instance where the cooperation constraint degrades the optimal makespan by a factor of $m$. In contrast, for alpha > 1, we can construct an algorithm transforming any unconstrained schedule into one satisfying the cooperation constraint. This algorithm bounds the degradation ratio by alpha / (alpha - 1), which implies that weak cooperation improves the makespan dramatically. Second, we study the complexity of alpha-MOSP. We show that it is strongly NP-hard and inapproximable within any factor less than max{(alpha + 1)/alpha, 3/2}. We also show the hardness of transformation: even if an optimal schedule under no cooperation constraint is given, no polynomial-time algorithm finds an optimal schedule for alpha-MOSP. This result witnesses the nonexistence of general polynomial-time transformation algorithms that preserve the approximation ratio.","PeriodicalId":312929,"journal":{"name":"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121362382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maotai 2.0: Data Race Prevention in View-Oriented Parallel Programming","authors":"K. Leung, Zhiyi Huang, Qihang Huang, P. Werstein","doi":"10.1109/PDCAT.2009.12","DOIUrl":"https://doi.org/10.1109/PDCAT.2009.12","url":null,"abstract":"This paper proposes a data race prevention scheme, which can prevent data races in the View-Oriented Parallel Programming (VOPP) model. VOPP is a novel shared-memory data-centric parallel programming model, which uses views to bundle mutual exclusion with data access. We have implemented the data race prevention scheme with a memory protection mechanism. Experimental results show that the extra overhead of memory protection is trivial in our applications. We also present a new VOPP implementation--Maotai 2.0, which has advanced features such as deadlock avoidance, producer/consumer view and system queues, in addition to the data race prevention scheme. The performance of Maotai 2.0 is evaluated and compared with modern programming models such as OpenMP and Cilk.","PeriodicalId":312929,"journal":{"name":"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114899782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cache Partitioning on Chip Multi-Processors for Balanced Parallel Scientific Applications","authors":"Guang Suo","doi":"10.1109/PDCAT.2009.48","DOIUrl":"https://doi.org/10.1109/PDCAT.2009.48","url":null,"abstract":"Nowadays, more and more supercomputers are built on multi-core processors with shared caches. However, conflicting accesses to the shared cache from different threads or processes become a performance bottleneck for parallel applications. Cache partitioning can be used to allocate cache resources exclusively to different processes according to their demands. Conflicting accesses are avoided by restricting cache accesses to distinct private parts of the shared cache. This paper studies the problem of shared cache partitioning for balanced MPI parallel applications on CMP architectures, presenting a performance-oriented cache partitioning framework that includes Spatial-Level Cache Partitioning (SLCP), Time-level Cache Partitioning (TLCP), and their evaluation. We evaluate SLCP and TLCP on a quad-core simulator. Experiments show that SLCP and TLCP outperform the traditional LRU cache replacement policy in IPC throughput and miss rate. Specifically, for large workloads, TLCP outperforms LRU by up to 20% and by 8.7% on average.","PeriodicalId":312929,"journal":{"name":"2009 International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130177269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}