{"title":"Extracting Domain-Relevant Term Using Wikipedia Based on Random Walk Model","authors":"Wenjuan Wu, Tao Liu, H. Hu, Xiaoyong Du","doi":"10.1109/CHINAGRID.2012.20","DOIUrl":"https://doi.org/10.1109/CHINAGRID.2012.20","url":null,"abstract":"In this paper we present a new approach for the automatic identification of domain-relevant concepts and entities of a given domain using the category and page structures of the Wikipedia in a language independent way. By applying Markov random walk algorithm on the weighted Wikipedia link graph, our approach can identify large quantities of domain-relevant concepts and entities with very little human effort. Experimental results show that our method achieves high accuracy and acceptable efficiency in domain-relevant term extraction.","PeriodicalId":371382,"journal":{"name":"2012 Seventh ChinaGrid Annual Conference","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133906459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inverted Grid-Based kNN Query Processing with MapReduce","authors":"Changqing Ji, Tingting Dong, Yu Li, Yanming Shen, Keqiu Li, Wenming Qiu, W. Qu, M. Guo","doi":"10.1109/ChinaGrid.2012.19","DOIUrl":"https://doi.org/10.1109/ChinaGrid.2012.19","url":null,"abstract":"With the increasing availability of LBS (Location Based Services) and the mobile internet, the amount of spatial data keeps growing. This poses new requirements and challenges for cloud environments, such as how to accomplish efficient indexing and query processing on large-scale spatial data. A scalable, distributed spatial data index is a good choice for effective spatial data analysis and query processing. There are several approaches that implement distributed indices and query processing with MapReduce, such as R-tree and Voronoi-based indices. However, the R-tree is unsuitable for parallelization, and query processing on a Voronoi-based index needs extra computation for localization or local index reconstruction. The regularity of grid partitioning makes it much easier to scale and parallelize than the above two approaches. An inverted index uses a limited number of index entries to index an unlimited number of data points. In this paper, we propose a new distributed spatial data index, the Inverted Grid Index, which combines an inverted index with grid partitioning. Our index structure is simpler and better suited to large-scale parallel spatial query applications. We present MapReduce-based approaches that both construct the Inverted Grid Index and process kNN queries over large spatial data sets. Extensive experiments have been done to evaluate the scalability and performance of kNN query processing on our index structure. The results demonstrate the efficiency and scalability of our kNN query algorithm based on the Inverted Grid Index.","PeriodicalId":371382,"journal":{"name":"2012 Seventh ChinaGrid Annual Conference","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125934458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Heterogeneity-aware Data Distribution and Rebalance Method in Hadoop Cluster","authors":"Yuanquan Fan, Weiguo Wu, Haijun Cao, Huo Zhu, Xu Zhao, Wei Wei","doi":"10.1109/ChinaGrid.2012.22","DOIUrl":"https://doi.org/10.1109/ChinaGrid.2012.22","url":null,"abstract":"The current Hadoop implementation assumes that the computing nodes in a cluster are homogeneous. Because the input data are split into data blocks with a predefined block size, Hadoop suffers performance degradation during the Map phase in a heterogeneous cluster. To solve this problem, we propose a heterogeneity-aware data distribution and rebalance method for heterogeneous Hadoop clusters. The method consists of two parts: 1) performance-aware data distribution, and 2) dynamic data migration. The experimental results indicate that our method can improve Map performance in a heterogeneous cluster. Furthermore, the data locality of Map tasks is enhanced as well.","PeriodicalId":371382,"journal":{"name":"2012 Seventh ChinaGrid Annual Conference","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123099823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EMA: Turning Multiple Address Spaces Transparent to CUDA Programming","authors":"Kun Tang, Yulong Yu, Yuxin Wang, Yong Zhou, He Guo","doi":"10.1109/ChinaGrid.2012.23","DOIUrl":"https://doi.org/10.1109/ChinaGrid.2012.23","url":null,"abstract":"CUDA performs general purpose parallel computing using GPGPU, which has been applied to various computing fields. However, the multi-address-space architecture in CUDA makes memory management complicated. NVIDIA introduced UVA, Unified Virtual Addressing, into CUDA Toolkit 4.0 to address this issue. However, UVA has platform limitations and even performance loss under certain circumstances. We propose EMA, Encapsulated Multiple Addressing, which encapsulates data residing in multiple address spaces into a single data object. Combined with data manipulating encapsulation, EMA also turns multi-address-space architecture into single-address-space architecture. Compared with UVA, EMA has no platform limitations and the experimental results show that EMA avoids the potential performance loss with negligible overhead.","PeriodicalId":371382,"journal":{"name":"2012 Seventh ChinaGrid Annual Conference","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125896773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the Effective IO Throughput by Adaptive Read-Ahead Strategy for Private Cloud Storage Service","authors":"Qiuping Wang, Kang Chen, Yongwei Wu, Weimin Zheng","doi":"10.1109/ChinaGrid.2012.9","DOIUrl":"https://doi.org/10.1109/ChinaGrid.2012.9","url":null,"abstract":"We have employed the Linux on-demand read-ahead framework in our campus-wide storage service system called MeePo. An appropriate read-ahead mechanism can significantly increase IO throughput and improve user experience by hiding network latency, which is critical for real-time applications. Our strategy is based on the data access characteristics of the MeePo system. The read-ahead framework uses a strategy profile generated from an analysis of the access trace of a typical user in a storage community. Our test deployment environment involves more than 5000 registered users as well as 150+ communities. We observe that most files in our system have either sequential or interleaved access patterns. In such scenarios, client IO throughput can increase by 12% for sequential streams and by more than 180% for interleaved streams.","PeriodicalId":371382,"journal":{"name":"2012 Seventh ChinaGrid Annual Conference","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127580328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Security SLAs for IMS-based Cloud Services","authors":"Guo Zhien, Dai Yi-qi","doi":"10.1109/CHINAGRID.2012.14","DOIUrl":"https://doi.org/10.1109/CHINAGRID.2012.14","url":null,"abstract":"To address existing security problems in the practical use of cloud computing and the lack of existing solutions, we introduce the idea of the SLA (Service Level Agreement) into security capability negotiation, an approach we call sSLA (Security Service Level Agreement). The framework and data processes of sSLA were implemented on our IMS-based cloud systems, and the implementation showed that this method can effectively eliminate the security concerns of cloud service customers.","PeriodicalId":371382,"journal":{"name":"2012 Seventh ChinaGrid Annual Conference","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121580610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the System Capacity by Client Cooperation in Distributed File Service","authors":"Gang Huang, Kang Chen, Yongwei Wu, Weimin Zheng, Q. Yue","doi":"10.1109/ChinaGrid.2012.11","DOIUrl":"https://doi.org/10.1109/ChinaGrid.2012.11","url":null,"abstract":"We have investigated the workload of a campus-wide distributed file service. We observe that hot files are downloaded many times within a short period of time. To reduce the server overhead, we employed a client-side cooperation module in the system. The client-side cooperation module can get file data from other clients instead of downloading data directly from data servers. The applied mechanism can significantly reduce server load, which is critical to system scalability. Thus, this distributed file service can support more users with the same number of servers while keeping or improving user experience. We verified the proposed mechanism in the file service system with over 5000 registered users. Results show that, in a campus network environment, server load can be reduced by up to 30%. The chunk response latency can be reduced by 90%, which gives a significant boost to the client-side download speed.","PeriodicalId":371382,"journal":{"name":"2012 Seventh ChinaGrid Annual Conference","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129056358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Parameter Dynamic-Tuning Scheduling Algorithm Based on History in Heterogeneous Environments","authors":"Xu Zhao, Xiaoshe Dong, Haijun Cao, Yuanquan Fan, Huo Zhu","doi":"10.1109/ChinaGrid.2012.24","DOIUrl":"https://doi.org/10.1109/ChinaGrid.2012.24","url":null,"abstract":"In the MapReduce model, job execution time is prolonged by straggler tasks in heterogeneous environments. The LATE scheduler introduced the longest-remaining-time strategy, but it also has drawbacks such as inaccurate time estimates and wasted system resources. To solve these problems, we propose two main algorithms. The history-based parameter dynamic-tuning algorithm estimates the progress of a task accurately, since it dynamically tunes the weight of each phase of a map task and a reduce task according to the historical values of the weights. The evaluation-scheduling algorithm reduces the waste of system resources by evaluating the free slot before launching a straggler task on that node. The two algorithms are implemented in Hadoop 0.20.1. The experimental results meet our expectations and show a significant reduction in wasted system resources.","PeriodicalId":371382,"journal":{"name":"2012 Seventh ChinaGrid Annual Conference","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133961482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WaxElephant: A Realistic Hadoop Simulator for Parameters Tuning and Scalability Analysis","authors":"Zujie Ren, Zhijun Liu, Xianghua Xu, Jian Wan, Weisong Shi, Min Zhou","doi":"10.1109/ChinaGrid.2012.25","DOIUrl":"https://doi.org/10.1109/ChinaGrid.2012.25","url":null,"abstract":"MapReduce is becoming the state-of-the-art computation paradigm for processing large-scale datasets on a large cluster with tens or thousands of nodes. Hadoop, an open-source implementation of MapReduce framework, has gained much popularity due to its high scalability and performance. Two challenging issues for a large-scale Hadoop cluster are how to analyze the scalability and identify the optimal parameters configurations. To address these issues, we designed and implemented a Hadoop simulator called Wax Elephant, which provides the following capabilities: (1) loading real MapReduce workloads derived from the historical log of Hadoop clusters, and replaying the job execution history, (2) synthesizing workloads and executing them based on statistical characteristics of workloads, (3) identifying the optimal parameters configurations, and (4) analyzing the scalability of the cluster. Extensive experiments have been conducted to validate the accuracy of the Wax Elephant simulator.","PeriodicalId":371382,"journal":{"name":"2012 Seventh ChinaGrid Annual Conference","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126636742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Heuristic Algorithm for Scheduling on Grid Computing Environment","authors":"Jing Wang, Gongqing Wu, Bin Zhang, Xuegang Hu","doi":"10.1109/ChinaGrid.2012.13","DOIUrl":"https://doi.org/10.1109/ChinaGrid.2012.13","url":null,"abstract":"With the conglomeration of large-scale heterogeneous systems, the grid computing environment makes the whole network into a powerful and reliable resource available nearly everywhere. Resource scheduling is a fundamental issue in grid computing. For this NP-hard problem, we take into account the geographic distribution of resources and the requirements of the job entity in the scheduling algorithm. To do so, we first consider the parameters of the job entity and the resource entity. Then key characteristics such as release time, processing time and delivery time determine the scheduling rules. We present the HF (Harder First) strategy and the DF (Larger Distance First) strategy. Let the H value denote the sum of the release time, length and delivery time of a job. A job with a higher H value is considered harder and should be assigned to a faster resource according to the HF strategy. Secondly, when the number of jobs is larger than the number of resources, the DF strategy ensures that the job with a larger difference (distance) between the delivery time and the release time is processed first. Based on these strategies, we provide a heuristic algorithm, HFFP (Harder First Faster Prior), for resource scheduling in the grid computing environment. In the experiments, the number of jobs scales from 10k to 80k, while the number of resources ranges from 2 to 6. The algorithm's performance is demonstrated by simulation on the GridSim platform. Our experimental results show that HFFP can minimize the completion time of jobs, especially when the number of jobs is much larger than the number of resources. By comparing our algorithm with a classical scheduling algorithm, Min-min, we can see that our algorithm assigns jobs to resources reasonably in terms of makespan. To better compare the performance of our algorithm with Max-min, we make some modifications to the traditional Max-min algorithm and present Max-min-L (Max-min-Local). Max-min-L chooses the local maximum instead of the overall maximum and is suitable for jobs of similar length. Comparative experiments with Max-min-L and Min-min show that our algorithm still outperforms both in terms of makespan.","PeriodicalId":371382,"journal":{"name":"2012 Seventh ChinaGrid Annual Conference","volume":"285 1-2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123727858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}