{"title":"Optimization of Algorithm to Identification of Duplicate Tuples through Similarity Phonetic Based on Multithreading","authors":"Tiago Luis Andrade, Rogéria Cristiane Gratão de Souza, Maurizio Babini, C. R. Valêncio","doi":"10.1109/PDCAT.2011.58","DOIUrl":"https://doi.org/10.1109/PDCAT.2011.58","url":null,"abstract":"Aiming to ensure greater reliability and consistency of data stored in the database, the data cleaning stage is set early in the process of Knowledge Discovery in Databases (KDD) and is responsible for eliminating problems and adjusting the data for the later stages, especially for the stage of data mining. Such problems occur at the instance and schema levels, namely, missing values, null values, duplicate tuples, values outside the domain, among others. Several algorithms were developed to perform the cleaning step in databases; some of them were developed specifically to work with the phonetics of words, since a word can be written in different ways. Within this perspective, this work presents as its original contribution an optimization of an algorithm for the detection of duplicate tuples in databases through phonetic similarity based on multithreading, without the need for trained data, as well as a language-independent environment to support it.","PeriodicalId":137617,"journal":{"name":"2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126975079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Phase Map Generation for Phase Shift Moire Using CUDA","authors":"Sudhakar Sah, Y. Roh, KyoungSeop Chang, DaeHwa Jeong","doi":"10.1109/PDCAT.2011.68","DOIUrl":"https://doi.org/10.1109/PDCAT.2011.68","url":null,"abstract":"Phase shift moiré is one of the most popular and successful techniques for shape measurement of 3-D objects such as PCB (printed circuit board), TFT (thin film transistor), LCD (liquid crystal display), etc. Various implementations of phase shift moiré are available for improving accuracy and/or speed. Although these methods contribute a lot to reducing the computation with some compromise in accuracy, there is a lot of scope for improving the performance of these algorithms with increased accuracy, especially when specialized hardware such as a GPU is available. A GPU contains many cores or processing elements that can process the same work concurrently, resulting in a dramatic increase in performance. In this paper, we propose a parallel implementation of the phase shift moiré method on CUDA. A novel method called the image stacking method is proposed that can also be used for CUDA implementations of similar algorithms to improve performance. Using this technique, we are able to execute the application 180 times faster compared to the CPU implementation.","PeriodicalId":137617,"journal":{"name":"2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116745767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fast Incremental Spectral Clustering for Large Data Sets","authors":"Tengteng Kong, Ye Tian, Hong Shen","doi":"10.1109/PDCAT.2011.4","DOIUrl":"https://doi.org/10.1109/PDCAT.2011.4","url":null,"abstract":"Spectral clustering is an emerging research topic that has numerous applications, such as data dimension reduction and image segmentation. In spectral clustering, as new data points are added continuously, dynamic data sets are processed in an on-line way to avoid costly re-computation. In this paper, we propose a new representative measure to compress the original data sets and maintain a set of representative points by continuously updating the eigen-system with the incidence vector. According to these extracted points, we generate instant cluster labels as new data points arrive. Our method is effective and able to process large data sets due to its low time complexity. Experimental results over various real evolving data sets show that our method provides fast and relatively accurate results.","PeriodicalId":137617,"journal":{"name":"2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129412045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Communication-Aware Task Partition and Voltage Scaling for Energy Minimization on Heterogeneous Parallel Systems","authors":"Guibin Wang, Wei Song","doi":"10.1109/PDCAT.2011.28","DOIUrl":"https://doi.org/10.1109/PDCAT.2011.28","url":null,"abstract":"Heterogeneous parallel systems have become popular in general purpose computing and even high performance computing fields. There are many studies focused on harnessing heterogeneous parallel processing for better performance. However, the energy optimization for heterogeneous systems has not been well studied. Owing to the differences in performance and energy consumption, the energy optimization technique for heterogeneous systems is different from the existing methods designed for homogeneous systems. Besides the typical voltage scaling method, reasonable task partitioning is also an essential method for optimizing energy consumption on heterogeneous systems. By partitioning a data parallel task and mapping sub-tasks onto several processors, one could achieve better performance and reduced energy consumption. As the computation cost reduces with specific accelerators, the communication overhead becomes more prominent. Therefore, the task partition optimization should holistically consider the computation improvement and communication overhead to achieve higher energy efficiency. Typically, task partition and voltage scaling are not orthogonal and influence the effect of each other in the energy optimization problem. In order to harness both knobs efficiently, this paper proposes an integer linear programming (ILP) based energy-optimal solution designed for heterogeneous systems. We present a case study of optimizing the MGRID benchmark on a typical CPU-GPU heterogeneous system. The experimental results demonstrate that the proposed method could exploit the heterogeneity in different processors and achieve improved energy efficiency.","PeriodicalId":137617,"journal":{"name":"2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"2015 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126002347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anonymizing Hypergraphs with Community Preservation","authors":"Yidong Li, Hong Shen","doi":"10.1109/PDCAT.2011.21","DOIUrl":"https://doi.org/10.1109/PDCAT.2011.21","url":null,"abstract":"Data publishing based on hypergraphs is becoming increasingly popular due to its power in representing multi-relations among objects. However, security issues have been little studied on this subject, while most recent work only focuses on the protection of relational data or graphs. As a major privacy breach, identity disclosure reveals the identification of entities with certain background knowledge known by an adversary. In this paper, we first introduce a novel background knowledge attack model based on the property of hyperedge ranks, and formalize the rank-based hypergraph anonymization problem. We then propose a complete solution in a two-step framework, taking community preservation as the objective data utility. The algorithms run in near-quadratic time on hypergraph size, and protect data from rank attacks with almost the same utility preserved. The performance of the methods has also been validated by extensive experiments on real-world datasets.","PeriodicalId":137617,"journal":{"name":"2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130036518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Multidimensional Scaling and Barycentric Coordinates Based Distributed Localization in Wireless Sensor Networks","authors":"Cuiqin Hou, Yibin Hou, Zhangqin Huang, Huibing Zhang","doi":"10.1109/PDCAT.2011.79","DOIUrl":"https://doi.org/10.1109/PDCAT.2011.79","url":null,"abstract":"Position information is vital for wireless sensor networks in many applications. In this paper, based on the barycentric coordinate system, we incorporate a term that constrains sensors to retain the intrinsic structure revealed by range measurements between neighboring nodes into the STRESS function, which is the cost function optimized by the distributed weighted-multidimensional scaling algorithm (dwMDS). By minimizing the modified cost function, we derive a distributed localization algorithm called the Multidimensional Scaling and Barycentric Coordinates based Distributed Localization Algorithm (MDS_BC_DLA). Experimental results on four different types of WSNs show that MDS_BC_DLA outperforms dwMDS.","PeriodicalId":137617,"journal":{"name":"2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131813186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Empirical Study of the Beezone System","authors":"Shuyu Liu, Zhengbiao Guo, Zhitang Li, Hao Tu","doi":"10.1109/PDCAT.2011.17","DOIUrl":"https://doi.org/10.1109/PDCAT.2011.17","url":null,"abstract":"Beezone is a large-scale live video streaming system that adopts dual-protocol-stack technology and a gossip-like protocol to build an unstructured P2P application-layer overlay, which can be used in an IPv4/v6 hybrid network environment. We ran the system during the 2010 FIFA World Cup South Africa. By collecting logs from the running system, we study the workload, performance, and dynamics of the system. Based on these logs, we show that (1) the system can be used in a hybrid network environment, which makes the playback smooth and reduces latency at end users, (2) it can make good use of IPv6 bandwidth, and (3) the dynamics of the system are slow. Our results verify the performance of Beezone and indicate that the use of an IPv6 channel can improve the performance of the P2P video streaming system.","PeriodicalId":137617,"journal":{"name":"2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129050245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dacoop: Accelerating Data-Iterative Applications on Map/Reduce Cluster","authors":"Yi Liang, Guangrui Li, Lei Wang, Yanpeng Hu","doi":"10.1109/PDCAT.2011.32","DOIUrl":"https://doi.org/10.1109/PDCAT.2011.32","url":null,"abstract":"Map/reduce is a popular parallel processing framework for massive-scale data-intensive computing. A data-iterative application is composed of a series of map/reduce jobs and needs to repeatedly process some data files among these jobs. The existing implementation of the map/reduce framework focuses on performing data processing in a single pass with one map/reduce job and does not directly support data-iterative applications, particularly in terms of the explicit specification of the repeatedly processed data among jobs. In this paper, we propose an extended version of the Hadoop map/reduce framework called Dacoop. Dacoop extends the Map/Reduce programming interface to specify the repeatedly processed data, introduces a shared memory-based data cache mechanism to cache the data from its first access, and adopts caching-aware task scheduling so that the cached data can be shared among the map/reduce jobs of data-iterative applications. We evaluate Dacoop on two typical data-iterative applications, k-means clustering and domain rule reasoning in the semantic web, with real and synthetic datasets. Experimental results show that data-iterative applications gain better performance on Dacoop than on Hadoop. The turnaround time of a data-iterative application can be reduced by up to 15.1%.","PeriodicalId":137617,"journal":{"name":"2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116380701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Task Based Sensor-Centric Model for Overall Energy Consumption","authors":"N. Kamyabpour, D. Hoang","doi":"10.1109/PDCAT.2011.12","DOIUrl":"https://doi.org/10.1109/PDCAT.2011.12","url":null,"abstract":"Sensors have limited resources, so it is important to manage the resources efficiently to maximize their use. A sensor's battery is a crucial resource as it singly determines the lifetime of sensor network applications. Since these devices are useful only when they are able to communicate with the world, the radio transceiver of a sensor, as an I/O and costly unit, plays a key role in its lifetime. This resource often consumes a big portion of the sensor's energy as it must be active most of the time to announce the existence of the sensor in the network. As such, the radio component has to deal with its embedded sensor network, whose parameters and operations have significant effects on the sensor's lifetime. In existing energy models, hardware is considered, but the environment and the network's parameters have not received adequate attention. Energy consumption components of traditional network architecture are often considered individually and separately, and their influences on each other have not been considered in these approaches. In this paper we consider all possible tasks of a sensor in its embedded network and propose an energy management model. We categorize these tasks into five energy consuming constituents. The sensor's Energy Consumption (EC) is modeled on its energy consuming constituents and their input parameters and tasks. The sensor's EC can thus be reduced by managing and executing efficiently the tasks of its constituents. The proposed approach can be effective for power management, and it can also be used to guide the design of energy efficient wireless sensor networks through network parameterization and optimization.","PeriodicalId":137617,"journal":{"name":"2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121341426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"jMigBSP: Object Migration and Asynchronous One-Sided Communication for BSP Applications","authors":"Lucas Graebin, R. Righi","doi":"10.1109/PDCAT.2011.48","DOIUrl":"https://doi.org/10.1109/PDCAT.2011.48","url":null,"abstract":"This paper describes the rationale for developing jMigBSP - a Java programming library that offers object rescheduling. It was designed to work on grid computing environments and offers an interface that follows the BSP (Bulk Synchronous Parallel) style. jMigBSP's main contribution focuses on the rescheduling facility in two different ways: (i) by using migration directives on the application code directly and (ii) through automatic load balancing at middleware level. In particular, this second idea is feasible thanks to Java's inheritance feature, which transforms a simple jMigBSP application into a migratable one only by changing a single line of code. In addition, the presented library makes object interaction easier by providing one-sided message passing directives and hides network latency through asynchronous communications. Finally, a BSP-based FFT application was developed and its execution shows jMigBSP as a competitive library when comparing performance with a C-based library called BSPlib. Besides its user-friendly Java interface, the strengths of jMigBSP also include the migration tests, where it takes less time than BSPlib.","PeriodicalId":137617,"journal":{"name":"2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133524800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}