Taiming Wang, Yue Kou, Derong Shen, Heng Liu, Ge Yu
{"title":"SIER: An Efficient Entity Resolution Mechanism Combining SNM and Iteration","authors":"Taiming Wang, Yue Kou, Derong Shen, Heng Liu, Ge Yu","doi":"10.1109/WISA.2014.50","DOIUrl":"https://doi.org/10.1109/WISA.2014.50","url":null,"abstract":"With the rapid increase of data, entity resolution (ER) faces two challenges: high quality and high performance. Correspondingly, current work focuses on iteration-based entity resolution or sorted neighborhood (SNM) - based entity resolution. The former iteratively merges similar records to acquire higher precision and recall. The latter only compares the records within the same sliding window to maintain higher performance. However, they are at the cost of either sacrificing efficiency or result quality. In this paper, we present an entity resolution mechanism combining SNM and iteration (called SIER). Unlike traditional approaches, SIER can fully exploit the advantages of SNM and iteration. Also a two-stage entity matching algorithm is proposed. In the first stage, the records are initially matched based on sliding window. In the second stage, the matching result is rectified iteratively to improve the quality of the result. The experiments demonstrate the feasibility and effectiveness of our method.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123034822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guangqing Deng, L. Zhou, Zhang Tao, N. Kong, Shuo Shen
{"title":"Design and Implementation of Heterogeneous Internet of Things Identifier Recognition System","authors":"Guangqing Deng, L. Zhou, Zhang Tao, N. Kong, Shuo Shen","doi":"10.1109/WISA.2014.10","DOIUrl":"https://doi.org/10.1109/WISA.2014.10","url":null,"abstract":"In Internet of Things (IoT), both physical things (such as devices, gateways and commodities etc.) and virtual things (like multimedia content and application software) are assigned unique IoT identifiers (IoT ID), which are similar to the domain name used for locating online resources in Internet. However, until now the format of IoT ID is not globally unified due to all kinds of reasons and thus in the IoT world there are even thousands of types of IoT ID (such as EPC code in USA, uCode in Japan, CPC code in China etc.). Due to the heterogeneity of IoT ID, it becomes the key point of resolving an IoT ID to the information referring to it (just as resolving the domain name to IP address) to at first determine the standard that the given IoT ID belongs to. Towards this issue, an extraction mechanism of IoT ID characteristic is proposed where three kinds of IoT ID characteristics, namely length, value range and special algorithm, are captured from the IoT ID standards. Based on these IoT ID characteristics, a fast IoT ID recognition algorithm is proposed to highly reduce the times of characteristic matching. Then a prototype system is built based on the above algorithm and the experiment results demonstrate that the IoT ID recognition algorithm has a high recognition ratio and low computational complexity.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124739932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Graph-Based Bursty Topic Detection Approach in User-Generated Texts","authors":"Li Zhao, Yan Li, Xinran Liu, Hong Zhang","doi":"10.1109/WISA.2014.57","DOIUrl":"https://doi.org/10.1109/WISA.2014.57","url":null,"abstract":"The problem of hot bursty topic detection in user-generated texts deserves great attention with the proliferation of Internet technologies. However, traditional document clustering and probabilistic topic models that were developed for formal news articles are less effective for informal user-generated corpora. In this paper, we provide a graph-based perspective that well reflects the latent pattern of bursty topics in text stream and develop an effective solution of the bursty topic detection problem. We represent texts with topics using a directed and weighted graph, with the bursty words as vertices and Tversky index of bursty words being edges. Topic detection from the texts is then converted into dividing the constructed graph into separate subgraphs, each significant subgraph corresponding to a bursty topic. To accomplish this, we partition the bursty word graph into the graph's strongly connected components, based on the analysis that the important topical words within a graph are connected to each other with high weights and thus form strongly connected components. We demonstrate through experiments on two user-generated corpora collected from English web log and Chinese weibo (microblog) sites that the proposed approach can effectively detect the hot bursty topics and is more appropriate than other topic detection models such as the LDA topic model and the EGF approach in TDT project.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126822135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research of Workflow Access Control Strategy based on Trust","authors":"Rui Ma, Linying Xu, Pengxiang Gao","doi":"10.1109/WISA.2014.24","DOIUrl":"https://doi.org/10.1109/WISA.2014.24","url":null,"abstract":"The traditional workflow access control strategy has often been found to be inadequate to detect and restrain malicious behavior effectively. With the aim to solve this problem, this paper presents a new workflow access control model based on trust, and a new access control strategy with an authorization process. This strategy introduces user behavior evaluation, trust computation and role hierarchy into role access control strategy. Through the trust computation of user behavior, it can dynamically adjust user's role and permissions, realizing the dynamic authorization process. Theory analysis and simulation experiments show that this access control strategy is more sensitive in dynamic authorization, and it has fine-grained trust computation. Also this strategy can detect malicious behaviors in time, effectively restraining malicious behavior from harming the system and so enhancing the security of the system.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131563730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Security Framework for Cloud-Based Web Crawling System","authors":"Yan Li, Li Zhao, Xinran Liu, Peng Zhang","doi":"10.1109/WISA.2014.27","DOIUrl":"https://doi.org/10.1109/WISA.2014.27","url":null,"abstract":"In the face of large amounts of complicated web information, a professional and individualized web crawling system is required for users to acquire information effectively. In this context, a cloud-based web crawling system is proposed, which can improve the efficiency of application development and reduce its maintenance costs. However, it also poses security risks for application developers. To address the problem of security and privacy protection in a cloud based web crawling system, this paper proposes a framework based on the SSL connection and simHash to ensure the security of storage and transmission. Moreover, a case study demonstrates the effectiveness and efficiency of the security framework.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133123145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Song Shi, Yuzhong Chen, Mingyue Fang, Wanhua Li, Shining
{"title":"A Hierarchical Multi-label Propagation Algorithm for Overlapping Community Discovery in Social Networks","authors":"Song Shi, Yuzhong Chen, Mingyue Fang, Wanhua Li, Shining","doi":"10.1109/WISA.2014.29","DOIUrl":"https://doi.org/10.1109/WISA.2014.29","url":null,"abstract":"Multi-label propagation algorithms (MLPAs) have nearly linear time complexity, but the accuracy and stability still need to be improved when applied to overlapping community discovery. Inspired from the idea that boundary nodes are more probable to appear in the overlapping regions of different communities, a Hierarchical Multi-label Propagation Algorithm (HMPA) based on node hierarchy and label propagation gain for overlapping community discovery in social networks is proposed in this paper. HMPA consists of three stages. Firstly, HMPA utilizes LPAm to unfold initial non-overlapping communities. Secondly, a PageRank-like method is proposed to mark the hierarchy of each node according to the initial partition of the first stage. Finally, multi-label propagation algorithm considering label propagation gain between nodes, which is calculated based on node hierarchy, is introduced to refine overlapping region. Experimental results on both the synthetic and real world networks show that the proposed algorithm can effectively solve the problems of traditional multi-label propagation algorithms in terms of accuracy and stability.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"273 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123366515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application Performance Monitoring and Analyzing Based on Bayesian Network","authors":"Chao Wang, Lili Su, Xue Zhao, Y. Zhang","doi":"10.1109/WISA.2014.19","DOIUrl":"https://doi.org/10.1109/WISA.2014.19","url":null,"abstract":"Application performance monitoring and analyzing system has been widely used in online applications and services, to obtain high performance and reliability. However, existing performance monitoring systems only raise an alarm when certain parameter exceeds its threshold and do not provide further analysis. This paper presents an application performance monitoring and analyzing model based on Bayesian Network, in which we discover the implied causality between performance parameters and user experience. We also design a feedback correction algorithm that can improve the validity of our model. A series of experiments and tests have demonstrated that our model provides proper analysis. With these desired properties, the performance monitoring systems with our model and algorithms can provide high performance and reliability to users.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128706509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Terms Semantic Query Model Based on Wikipedia","authors":"Dexin Zhao, Pengjie Liu, Liangliang Qin, Yukun Li","doi":"10.1109/WISA.2014.54","DOIUrl":"https://doi.org/10.1109/WISA.2014.54","url":null,"abstract":"Search engines have become the main way for people to get expected information, most of them are based on keyword search. However, keyword search is based on computing the similarity of letters of the keywords, instead of semantic meaning, therefore the searching results often include irrelevant information to user intention. This paper aims to find a way on improving keyword search efficiency. Using Wikipedia, which is the largest online encyclopedia, this paper explores the relations of terms through computing the semantic relatedness between words, and presents an algorithm called WLA in the light of link structure and text message in Wikipedia. What is more, we design a terms query platform through which users will be able to get all the meanings about the concepts. By making a comparison with lexical database WordNet, it has demonstrated the feasibility on our methods.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114253247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Indicator of Impact Factor Based on Journals' Life Cycles","authors":"Shuofei Zhu, Yanhui Li","doi":"10.1109/WISA.2014.36","DOIUrl":"https://doi.org/10.1109/WISA.2014.36","url":null,"abstract":"In this paper we introduce a new modification of calculating the Impact Factor(IF) for journals based on the variation of papers' popularity in different journals and research fields over time, which we call \"life cycle\". The impact factor value for a journal calculated by the proposed method is called Improved Impact Factor (IIF). Papers in forty journals published from 1980 to 2014 are selected and examined, and the impact factor values from IF and IIF methods then compared. The position of the journals ranked by impact factors obtained from IF method were different from those from the IIF method. It is concluded that IIF is more suitable for journals with longer life cycle and journals with longer IIF.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129624237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Peng Xue, Tiezheng Nie, Derong Shen, Yue Kou, Wenjie Li
{"title":"A Reachability Query Approach with Path Interval Labeling","authors":"Peng Xue, Tiezheng Nie, Derong Shen, Yue Kou, Wenjie Li","doi":"10.1109/WISA.2014.39","DOIUrl":"https://doi.org/10.1109/WISA.2014.39","url":null,"abstract":"For a directed graph and two vertices, to check whether there is a path between them is so-called reachability query, how to establish efficient index to answer the reachability of two nodes has always been a research direction in the field of database. In this paper, we proposed a reachability query approach combining the concept of path dividing and interval labeling. We establish an index with path interval labeling, also two kinds of query strategies are presented. And experimental results on real data sets show that with a path interval labeling it has a better time efficiency of querying.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126590254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}