{"title":"User Demand Description and Optimization in Web Data Management","authors":"Hui Wang, Weijie Ou, Zhiyong Peng, W. Zhai, Ran Chen","doi":"10.1109/WISA.2011.25","DOIUrl":"https://doi.org/10.1109/WISA.2011.25","url":null,"abstract":"Built on an object deputy database, the newly proposed web data management system (WDMS) provides users with personal data spaces to flexibly manage their various web data. Because database capacity is limited, WDMS should gather the data that users need from the Web according to the user demand implied in their personal data spaces. However, user demand expressed as SQL statements in data spaces cannot be comprehended and executed by a meta-search engine. To address this problem, this paper proposes a user demand description method that formalizes the expression of user demand and bridges the gap between user demand and web data sources. Based on the user demand description, this paper also proposes a user demand set optimization method that eliminates subset and intersection relationships within the user demand set. The experimental results demonstrate that the user demand description and its optimization method can express complex user demand well and reduce redundant query cost. This work can be widely used in various kinds of personal Web data service systems.","PeriodicalId":242633,"journal":{"name":"2011 Eighth Web Information Systems and Applications Conference","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114332983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficiently Detecting Frequent Patterns in Biological Sequences","authors":"W. Liu, Ling Chen","doi":"10.1109/WISA.2011.27","DOIUrl":"https://doi.org/10.1109/WISA.2011.27","url":null,"abstract":"Most existing algorithms for mining frequent patterns can produce many projected databases and short candidate patterns, which increases the time and memory cost of mining. To overcome this shortcoming, we propose two fast and efficient algorithms, SBPM and MSPM, for mining frequent patterns in single and multiple biological sequences, respectively. We first present the concept of a primary pattern and then use a prefix tree to mine frequent primary patterns. A pattern growth approach is also presented to mine all the frequent patterns without producing a large number of irrelevant patterns. Our experimental results show that our algorithms not only improve performance but also achieve effective mining results.","PeriodicalId":242633,"journal":{"name":"2011 Eighth Web Information Systems and Applications Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129459054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Splog Filtering algorithm Based on Combinational Features","authors":"Yong-gong Ren, Ming-fei Yin, Jian Wang","doi":"10.1109/WISA.2011.23","DOIUrl":"https://doi.org/10.1109/WISA.2011.23","url":null,"abstract":"","PeriodicalId":242633,"journal":{"name":"2011 Eighth Web Information Systems and Applications Conference","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131973469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Image Spam Based on Cross Entropy","authors":"M. Wang, Weifeng Zhang, Yingzhou Zhang, XiaoHua Ji","doi":"10.1109/WISA.2011.11","DOIUrl":"https://doi.org/10.1109/WISA.2011.11","url":null,"abstract":"To detect image spam effectively, it is necessary to analyze the image content. We study the local invariant features of images and propose a novel method, near-duplicate image spam detection based on CE (cross entropy), in which SURF (Speeded-Up Robust Features) is used to extract the local invariant features of each image (spam and ham); then GMMs (Gaussian Mixture Models) are fitted to the local invariant features. Using CE as the distance measure between Gaussian distributions, we improve K-means to cluster the GMMs, since our dataset is very large. Experiments show that using CE as the distance measure is beneficial and that the proposed method achieves better performance than some existing methods, with precision of up to 96%.","PeriodicalId":242633,"journal":{"name":"2011 Eighth Web Information Systems and Applications Conference","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121241467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Entity Relation Extraction Model Based on Semantic Pattern Matching","authors":"Tiezheng Nie, Derong Shen, Yue Kou, Ge Yu, D. Yue","doi":"10.1109/WISA.2011.9","DOIUrl":"https://doi.org/10.1109/WISA.2011.9","url":null,"abstract":"This paper proposes a relation extraction model based on semantic pattern matching in the Web environment. It consists of frequent pattern extraction, density-based pattern clustering, and pattern matching based on semantic similarity. First, based on entities with known relations in a limited training set, we extract relation patterns containing these named entities from web pages. Then the relations between entities from web pages in specific areas can be extracted based on these relation patterns. Experiments show the effectiveness and self-adaptability of our method in extracting relations between entities from a dynamic web environment.","PeriodicalId":242633,"journal":{"name":"2011 Eighth Web Information Systems and Applications Conference","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134033123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Star Join for Column-oriented Data Store in the MapReduce Environment","authors":"Haitong Zhu, Minqi Zhou, Fan Xia, Aoying Zhou","doi":"10.1109/WISA.2011.10","DOIUrl":"https://doi.org/10.1109/WISA.2011.10","url":null,"abstract":"MapReduce is a parallel computing paradigm that has gained a lot of attention from both industry and academia in recent years. Unlike parallel DBMSs, MapReduce makes it easier for non-experts to develop scalable parallel programs for analytical applications over huge data sets across clusters of commodity machines. Because of its scan-oriented processing, MapReduce inevitably accesses many unnecessary data tuples, especially for table join operators, so its performance on relational operators can be improved dramatically. In this paper, we propose an efficient star join strategy called HdBmp join for column-oriented data stores, using a three-level content-aware index (i.e., HdBmp Index). Armed with this index, most of the unnecessary tuples in join processing can be filtered out, resulting in an immense reduction in both communication cost and execution time. Our extensive experimental studies confirm the efficiency, scalability and effectiveness of our newly proposed join method.","PeriodicalId":242633,"journal":{"name":"2011 Eighth Web Information Systems and Applications Conference","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134475666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation Method of Information Credibility Based on the Trust Features of Web Page","authors":"Jing Xu, Xiaoping Yang, Liang Wang","doi":"10.1109/WISA.2011.20","DOIUrl":"https://doi.org/10.1109/WISA.2011.20","url":null,"abstract":"Uncertainty about the trustworthiness of information has been a great obstacle for users to obtain and share true information on the Internet. Therefore, the evaluation of information credibility is worth further study. This paper introduces a method for evaluating information credibility based on the trust features of web pages. After the trust features are defined according to the attributes of a web page, the evaluation criteria for information credibility are established. Finally, a reference value of credibility is obtained through the calculation of reliability evaluation. Experiments show the method to be effective.","PeriodicalId":242633,"journal":{"name":"2011 Eighth Web Information Systems and Applications Conference","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124876282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Rule Description Model Based on Massive Data Processing","authors":"Guigang Zhang, Yong Zhang, Chunxiao Xing, P. Sheu","doi":"10.1109/WISA.2011.14","DOIUrl":"https://doi.org/10.1109/WISA.2011.14","url":null,"abstract":"Massive rule processing has attracted increasing attention in recent years. First, we propose a rule description language that can express all kinds of rules in structured natural language. We design a set of graphical symbols for rule nodes. We also propose a rule traffic flow model and a rule cost model. Through these models, it is easier to process massive numbers of rules and optimize them.","PeriodicalId":242633,"journal":{"name":"2011 Eighth Web Information Systems and Applications Conference","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133739243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Concept Hierarchy from Folksonomy","authors":"Shubin Cai, Heng Sun, Sishan Gu, Zhong Ming","doi":"10.1109/WISA.2011.16","DOIUrl":"https://doi.org/10.1109/WISA.2011.16","url":null,"abstract":"Users often use tags to annotate and categorize web content. A folksonomy is a system of classification derived from the practice and method of collaboratively creating and managing tags. The most significant feature of a folksonomy is that it directly reflects the vocabulary of users. This feature is very useful in tag-based content searching and user browsing. Based on a mutual-overlapping measurement of tags' instance sets, an ontology learning algorithm that constructs a concept hierarchy from a folksonomy is proposed. A case study on datasets from Taobao, a famous Chinese e-business website, is carried out. The precision, valid, recall and F-measure rates of the constructed concept hierarchy are 54%, 84%, 100% and 70% respectively. The experimental results on real-world datasets show that the proposed method is feasible.","PeriodicalId":242633,"journal":{"name":"2011 Eighth Web Information Systems and Applications Conference","volume":"23 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114025284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Metric for Test Case Prioritization","authors":"Xiaofang Zhang, Bo Qu","doi":"10.1109/WISA.2011.31","DOIUrl":"https://doi.org/10.1109/WISA.2011.31","url":null,"abstract":"Test case prioritization is an effective and practical technique of regression testing. To illustrate its effectiveness, many test metrics were proposed. In this paper, the physical meanings of these metrics were explained and their limitations were pointed out. Then, an improved metric and its extension for test case prioritization were proposed. The case study indicates that, compared with existing metrics, our new metric can provide much more precise illustration of the effectiveness of test case prioritization techniques.","PeriodicalId":242633,"journal":{"name":"2011 Eighth Web Information Systems and Applications Conference","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122388558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}