{"title":"A probability based algorithm for influence maximization in social networks","authors":"Zhen Wang, Zhuzhong Qian, Sanglu Lu","doi":"10.1145/2532443.2532455","DOIUrl":"https://doi.org/10.1145/2532443.2532455","url":null,"abstract":"In a social network, information runs from word-of-mouth based on the relationship of the users. The influence maximization is to find a limited number of initial users (nodes) to spread the information, so that the maximum number of other users could accept the information, which is a useful technique for marketing, information monitoring and advertising in a social network. Diffusion model of social networks imitates the process of information spreading in social networks, and Independent Cascade (IC) Model and Linear Threshold (LT) Model, are well-known stochastic information influence models. In this paper, we extend the classical IC model according to the observation of users' behaviors in social networks and propose an effective influence maximization algorithm based on this extended IC model. This novel algorithm calculates the influence probability of each node in sub-graphs that other nodes can engendered to it iteratively. The simulation experiments on real social network datasets show that our algorithm is much faster than the greedy hill-climbing algorithm, while the results are very close to the greedy algorithm and out-perform the other heuristic algorithms.","PeriodicalId":362187,"journal":{"name":"Proceedings of the 5th Asia-Pacific Symposium on Internetware","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132439410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MR-runner: a modularized map-reduce job management tool","authors":"Xinsheng Yang, Wei Wang, Lijie Xu, Jie Liu, Jun Wei","doi":"10.1145/2532443.2532474","DOIUrl":"https://doi.org/10.1145/2532443.2532474","url":null,"abstract":"Map-Reduce is a powerful solution for processing and analyzing large-scale data. Just as Hadoop and Spark are able to deal with terabyte data and even more. Users only need to complete \"map\" and \"reduce\" function, the Map-Reduce framework can finish variety jobs. But many machine learning and data mining algorithms cannot leverage the Map-Reduce framework or it would take large efforts to modify the algorithm itself. This issue can be explained by the following ways: 1. Map-Reduce is a batch operation so that most of Map-Reduce frameworks do not built-in to support iteration. 2. Map-Reduce is absolutely parallel, each vertex cannot obtain all records, so none of them could get the global optimal model. In this paper, we proposed a job management tool to enable the Map-Reduce framework to support iteration, called \"de-parallel\". This makes the Map-Reduce framework like Hadoop so that Map-Reduce could run more algorithms and support more various tasks. In addition, our tool does not modify the Map-Reduce framework itself. In fact, MR-Runner interacts with Map-Reduce framework like a \"client\", therefore MR-Runner could be deployed in any single PC instead of Map-Reduce cluster. We also abstract the mainly interface related to Map-Reduce frameworks, this makes our tool portable to the representative Map-Reduce frameworks.","PeriodicalId":362187,"journal":{"name":"Proceedings of the 5th Asia-Pacific Symposium on Internetware","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126950185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating API-usage example for project developers","authors":"Zixiao Zhu, Yanzhen Zou, Yong Jin, Bing Xie","doi":"10.1145/2532443.2532470","DOIUrl":"https://doi.org/10.1145/2532443.2532470","url":null,"abstract":"Usage examples have been shown very helpful for API learning in software reuse. Nowadays, many approaches have been proposed to automatically extract usage examples from client code or web pages for API users. However, they overlooked the benefit of API developers in example publishing and few works paid attention to help API developers to generate usage examples automatically. In this paper, we proposed an approach to generate API-usage example based on test code before the project are released. It analyzed which parts in test code are important for indicating API-usage and summarized some test code patterns, then a heuristic slice algorithm are proposed to extract referential test code as API-usage example based on these patterns. In the experiments, we gave some case studies on the commons-lang3 open source software library. It proved that our approach can provide good assistance for developers in APIs usage example generation.","PeriodicalId":362187,"journal":{"name":"Proceedings of the 5th Asia-Pacific Symposium on Internetware","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115616867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"b-bit minwise hashing in practice","authors":"Ping Li, Anshumali Shrivastava, A. König","doi":"10.1145/2532443.2532446","DOIUrl":"https://doi.org/10.1145/2532443.2532446","url":null,"abstract":"Minwise hashing is a standard technique in the context of search for approximating set similarities. The recent work [26, 32] demonstrated a potential use of b-bit minwise hashing [23, 24] for efficient search and learning on massive, high-dimensional, binary data (which are typical for many applications in Web search and text mining). In this paper, we focus on a number of critical issues which must be addressed before one can apply b-bit minwise hashing to the volumes of data often used industrial applications. Minwise hashing requires an expensive preprocessing step that computes k (e.g., 500) minimal values after applying the corresponding permutations for each data vector. We developed a parallelization scheme using GPUs and observed that the preprocessing time can be reduced by a factor of 20 ~ 80 and becomes substantially smaller than the data loading time. Reducing the preprocessing time is highly beneficial in practice, e.g., for duplicate Web page detection (where minwise hashing is a major step in the crawling pipeline) or for increasing the testing speed of online classifiers. Another critical issue is that for very large data sets it becomes impossible to store a (fully) random permutation matrix, due to its space requirements. Our paper is the first study to demonstrate that b-bit minwise hashing implemented using simple hash functions, e.g., the 2-universal (2U) and 4-universal (4U) hash families, can produce very similar learning results as using fully random permutations. Experiments on datasets of up to 200GB are presented.","PeriodicalId":362187,"journal":{"name":"Proceedings of the 5th Asia-Pacific Symposium on Internetware","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126216887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerate MapReduce on GPUs with multi-level reduction","authors":"Ran Zheng, Kai Liu, Hai Jin, Qin Zhang, Xiaowen Feng","doi":"10.1145/2532443.2532447","DOIUrl":"https://doi.org/10.1145/2532443.2532447","url":null,"abstract":"With Graphics Processing Units (GPUs) becoming more and more popular in general purpose computing, more attentions have been paid on building a framework to provide convenient interfaces for GPU programming. MapReduce can greatly simplify the programming for data-parallel applications in cloud computing environment, and it is also naturally suitable for GPUs. However, there are some problems in recent reduction-based MapReduce implementation on GPUs. Its performance is dramatically degraded when handling massive distinct keys because the massive data cannot be stored in tiny shared memory entirely. A new MapReduce framework on GPUs, called Jupiter, is proposed with continuous reduction structure. Two improvements are supported in Jupiter, a multi-level reduction scheme tailored for GPU memory hierarchy and a frequency-based cache policy on key-value pairs in shared memory. Shared memories are utilized efficiently for various data-parallel applications whether involving little or abundant distinct keys. Experiments show that Jupiter can achieve up to 3x speedup over the original reduction-based GPU MapReduce framework on the applications with lots of distinct keys.","PeriodicalId":362187,"journal":{"name":"Proceedings of the 5th Asia-Pacific Symposium on Internetware","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133963478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining user daily behavior patterns from access logs of massive software and websites","authors":"Wei Zhao, Jie Liu, Dan Ye, Jun Wei","doi":"10.1145/2532443.2532462","DOIUrl":"https://doi.org/10.1145/2532443.2532462","url":null,"abstract":"Everyone has a characteristic pattern of daily activities. This study applies cluster analysis to identify a computer user's daily behavior patterns based on 1000 China users' 4-weeks software and web usage. Clustering models are built for 4 different behavior definition methods with different time period divisions and feature measurement selections. With these patterns, we build classification models to predict new users' daily behavior pattern with their half day activity logs. For example, if we know one user use computer for entertainment in the morning, we can predict his behavior in the afternoon and evening. The prediction model can be used to recommend suitable items to users according to their current behavior status. Our method can get 92.5% prediction correctness for the best.","PeriodicalId":362187,"journal":{"name":"Proceedings of the 5th Asia-Pacific Symposium on Internetware","volume":"2013 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128012193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"COCO","authors":"Wenjia Zhang, Wei Song, Xiaoxing Ma, Qiliang Yang, Xuewei Zhang","doi":"10.1145/2532443.2532471","DOIUrl":"https://doi.org/10.1145/2532443.2532471","url":null,"abstract":"Processes are an effective and efficient way to construct on-demand Internetware applications. It is often needed to evaluate whether two process-driven applications are consistent, or whether the implemented process conforms to the process specification. Most existing methods only return qualitative results (i.e., true or false), so slight inconsistencies may lead to a false result. To address this problem, based on activity constraints, we have presented a quantitative approach to process consistency analysis. In this paper, we focus on the implementation issues of our approach and introduce how to use our tool in practice.","PeriodicalId":362187,"journal":{"name":"Proceedings of the 5th Asia-Pacific Symposium on Internetware","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123919791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards an operating system for the campus","authors":"Pengfei Yuan, Yao Guo, Xiangqun Chen","doi":"10.1145/2532443.2532468","DOIUrl":"https://doi.org/10.1145/2532443.2532468","url":null,"abstract":"Almost every computing device runs an operating system, which is responsible for managing different resources on the device and providing higher-level programming abstractions. This paper proposes CampusOS, an operating system which is responsible for managing networked resources on university campuses, including data of students, teachers, courses, organizations, and even data generated from users' computing devices. CampusOS provides flexible support for campus application development with SDKs consisting of campus-related APIs. CampusOS features and SDK APIs can also be extended by developers easily. We discuss the design of CampusOS, as well as its challenges.","PeriodicalId":362187,"journal":{"name":"Proceedings of the 5th Asia-Pacific Symposium on Internetware","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125481093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A distributed rule execution mechanism based on MapReduce in sematic web reasoning","authors":"Haijiang Wu, Jie Liu, Dan Ye, Hua Zhong, Jun Wei","doi":"10.1145/2532443.2532457","DOIUrl":"https://doi.org/10.1145/2532443.2532457","url":null,"abstract":"Rule execution is the core step of rule-based semantic web reasoning. However, most existing approaches are centralized, which cannot scale out to reason big semantic web datasets. In this paper, we described a kind of semantic web rule execution mechanism using MapReduce programming model, which not only can handle RDFS and OWL ter Horst semantic rules, but also can be used in SWRL reasoning. Theoretical analysis is present on the scalability of this rule execution mechanism. Result shows that it can scale well as Mapreduce framework.","PeriodicalId":362187,"journal":{"name":"Proceedings of the 5th Asia-Pacific Symposium on Internetware","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115478514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A scalable crawler framework for FLOSS data","authors":"Lingxiao Zhang, Yanzhen Zou, Bing Xie","doi":"10.1145/2532443.2532454","DOIUrl":"https://doi.org/10.1145/2532443.2532454","url":null,"abstract":"Free / Libre / Open Source Software (FLOSS) data, such as bug reports, mailing lists and related webpages, contains valuable information for reusing open source software projects. Before conducting further experiment on FLOSS data, researchers often need to download these data into a local storage system. We refer to this pre-process as FLOSS data retrieval, which in many cases can be a challenging task. In this paper, we proposed a crawler framework to ease the process of FLOSS data retrieval. To cope with various types of FLOSS data scattered on the Internet, we designed the framework in a scalable manner where a crawler program can be easily plugged into the system to extend its functionality. Researchers can perform the retrieval process on datasets of various types and sources simply by adding new configurations to the system. We have implemented the framework and provided basic functions via web-based interfaces. We presented the usage of the system by a detailed case study where we retrieved various types of datasets related to Apache Lucene project using our framework.","PeriodicalId":362187,"journal":{"name":"Proceedings of the 5th Asia-Pacific Symposium on Internetware","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115832445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}