{"title":"Design and Implementation of Multi-Version Disk Backup Data Merging Algorithm","authors":"Guangjun Wu, Xiao-chun Yun, Shupeng Wang","doi":"10.1109/WAIM.2008.51","DOIUrl":"https://doi.org/10.1109/WAIM.2008.51","url":null,"abstract":"Multi-version data management in disk backup and recovery manages the temporal attribute of backed-up data. It supports retrieving timestamp (time slice) disk data according to different query types. Existing multi-version data management algorithms have two shortcomings. First, they are inefficient for the multi-time-point data queries and updates that data backup and recovery usually rely on. Second, they use centralized data indexes, which are not suitable for backup data management. To overcome these limitations, this paper proposes the Backup Data Merging (BDM) algorithm, which uses a distributed storage structure based on the disk data format. Using range operations, the BDM algorithm can generate timestamp (time slice) data indexes dynamically. Compared with traditional algorithms, the BDM algorithm achieves high performance in storage utilization and query efficiency.","PeriodicalId":217119,"journal":{"name":"2008 The Ninth International Conference on Web-Age Information Management","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121645709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Cost-Efficient Method for Continuous Top-k Processing over Data Stream","authors":"Li Zhang, Li Tian, Peng Zou, Aiping Li","doi":"10.1109/WAIM.2008.76","DOIUrl":"https://doi.org/10.1109/WAIM.2008.76","url":null,"abstract":"Continuous top-k queries over data streams are very important for many online applications, including network monitoring, communication, sensor networks, and stock market trading. In this paper, we propose an effective pruning technique that minimizes the number of tuples that need to be stored and manipulated. Based on it, we propose a cost-efficient method for continuous top-k processing over a single data stream that greatly reduces computational complexity and memory requirements. The data structure we use supports preference functions whether or not they are monotonic, and its running time is hardly affected by dimensionality. Theoretical analysis and experimental evidence show the efficiency of the proposed approaches in both storage reduction and performance improvement.","PeriodicalId":217119,"journal":{"name":"2008 The Ninth International Conference on Web-Age Information Management","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134234379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Strategy Selection Model for Network Survivability Based on Fuzzy Matrix Game","authors":"Guo-sheng Zhao, Huiqiang Wang, Jian Wang","doi":"10.1109/WAIM.2008.105","DOIUrl":"https://doi.org/10.1109/WAIM.2008.105","url":null,"abstract":"Survivability has emerged as a new phase in the development of network security techniques, and how to improve system survivability with effective strategies is an important problem. In this paper, by analyzing fuzzy matrix game (FMG) theory and network survivability mechanisms, we present, from a macroscopic view, a novel strategy selection model for network survivability based on FMG theory, together with its dynamic analysis method. The results of instance analysis and validation show that the proposed method can effectively provide guarantee conditions for network survivability.","PeriodicalId":217119,"journal":{"name":"2008 The Ninth International Conference on Web-Age Information Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128925228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding Correlated Item Pairs through Efficient Pruning with a Given Threshold","authors":"Bo Wang, Liang Su, Aiping Li, Peng Zou","doi":"10.1109/WAIM.2008.84","DOIUrl":"https://doi.org/10.1109/WAIM.2008.84","url":null,"abstract":"Given a minimum threshold over a massive market-basket data set, an item pair whose correlation is above the threshold is considered correlated. In this paper, we present SERIT (Searching-corrElated-pair Randomized algorithm for dIfferent Thresholds), a randomized algorithm that efficiently finds all correlated pairs using Pearson's correlation coefficient [11] as the measure. In their CIKM'06 paper [2], Zhang et al. address the same problem by exploiting the relationship between Pearson's coefficient and Jaccard distance; however, their method is inefficient when the threshold is small. Building on [2], we propose a new probability function for pruning uncorrelated item pairs that overcomes this shortcoming. Experimental results on synthetic and real data sets show that, for a given threshold, even a small one, the SERIT algorithm efficiently prunes unwanted item pairs and saves substantial computational resources.","PeriodicalId":217119,"journal":{"name":"2008 The Ninth International Conference on Web-Age Information Management","volume":"2001 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133145009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advanced Star Coordinates","authors":"Yang Sun, Jiuyang Tang, Daquan Tang, W. Xiao","doi":"10.1109/WAIM.2008.20","DOIUrl":"https://doi.org/10.1109/WAIM.2008.20","url":null,"abstract":"With the development of data collection technology, effective visualization tools are urgently needed to understand the abundant multidimensional and multivariate data and information in science, engineering, and commerce. Star Coordinates is a traditional multivariate data visualization technique, but it has some limitations. In this paper we propose Advanced Star Coordinates (ASC), which addresses these drawbacks. ASC uses the diameter instead of the radius as the dimension axis, projects multidimensional information objects into a low-dimensional visual space that is meaningful to users, and employs a dimension configuration strategy to optimize the order and angle of the dimension axes. Experimental results show that the dimension configuration strategy greatly reduces the users' operational burden and helps them explore the implicit characteristics of multidimensional information aggregations quickly and accurately. The visualization results are easy to understand and express dimension distribution information effectively.","PeriodicalId":217119,"journal":{"name":"2008 The Ninth International Conference on Web-Age Information Management","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122107024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on Communication-Efficient Method for Distributed Threshold Monitoring","authors":"Li Tian, Peng Zou, Feng Wu, Aiping Li","doi":"10.1109/WAIM.2008.55","DOIUrl":"https://doi.org/10.1109/WAIM.2008.55","url":null,"abstract":"This paper considers the problem of reducing communication in continuous threshold monitoring over distributed systems. We propose a Communication Efficient Method (CEM) that exploits the relationships among objects and processes them as a whole, and therefore achieves better performance than methods that handle each object separately. Specifically, the object with the largest value is chosen as the representative object, and adjustment factors are used to guarantee that the local value of the representative object is also the largest at each remote node. Therefore, only the representative object needs to be monitored continuously as long as all local constraints remain valid. When a local constraint is violated, the coordinator and the remote nodes must communicate to rebuild the constraint. The paper describes the algorithms and also provides their correctness proofs and extensions. Experimental evaluation on real data sets shows the efficiency of CEM in reducing communication for distributed threshold monitoring.","PeriodicalId":217119,"journal":{"name":"2008 The Ninth International Conference on Web-Age Information Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128647691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AMCAS: An Automatic Malicious Code Analysis System","authors":"Jia Zhang, Yuntao Guan, Xiaoxin Jiang, Haixin Duan, Jianping Wu","doi":"10.1109/WAIM.2008.44","DOIUrl":"https://doi.org/10.1109/WAIM.2008.44","url":null,"abstract":"With the development of malicious code technology, the amount of malicious code has continued to increase, so it is imperative to improve on the traditional manual analysis method with an automatic malicious code analysis system. This paper presents AMCAS, an automatic malicious code analysis system. It includes a static analyzer, a dynamic analyzer, and a network behavior analyzer for malicious code. Compared with some existing automatic analysis systems, this system integrates the advantages of static and dynamic analysis and adds network behavior analysis. The static analyzer obtains the unpacked binary code and CallGraph; the dynamic analyzer obtains the host behavior of the malicious code; and the network behavior analyzer obtains the malicious network behavior profile. Experiments show that this system can obtain malicious code information efficiently.","PeriodicalId":217119,"journal":{"name":"2008 The Ninth International Conference on Web-Age Information Management","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127139256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Ontology to Enhance Collaborative Recommendation Based on Community","authors":"Li Yu","doi":"10.1109/WAIM.2008.47","DOIUrl":"https://doi.org/10.1109/WAIM.2008.47","url":null,"abstract":"Collaborative filtering is an important personalized recommendation technique widely applied in E-commerce. Because of the 'general neighbourhood' problem analyzed in this paper, it is poorly suited to multi-interest or title recommendation. To address this, the paper presents community-based collaborative filtering recommendation by introducing the concept of a 'community neighbourhood'. Unfortunately, this leads to a more severe sparsity problem that heavily affects its performance. To overcome it, an ontological a-priori score is first used to infer user preferences and pre-fill null ratings. After pre-filling with the ontology method, community-based collaborative filtering is executed on a dense rating matrix. Experiments show that community-based collaborative filtering generally performs better than the traditional method when the data is not very sparse, and that the ontology method can truly enhance community-based collaborative filtering because it overcomes the sparsity problem.","PeriodicalId":217119,"journal":{"name":"2008 The Ninth International Conference on Web-Age Information Management","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129117461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Quasi Word-Based Compression Method of English Text Using Byte-Oriented Coding Scheme","authors":"Wei-ling Chang, Xiao-chun Yun, Binxing Fang, Shupeng Wang, Shuhao Li","doi":"10.1109/WAIM.2008.89","DOIUrl":"https://doi.org/10.1109/WAIM.2008.89","url":null,"abstract":"In this paper we present ERecode, a universal compression algorithm for English text. The proposed scheme highlights the importance of pre-processing English text, and uses one- or two-byte code values to recode the 511 most commonly used English words, symbol sequences, and ASCII codes based on their occurrence frequency. Used as a pre-processing tool for English text before popular compression utilities, ERecode improves their compression ratio by 0.89% to 19.65%. The proposed method is also applicable to text files in other languages.","PeriodicalId":217119,"journal":{"name":"2008 The Ninth International Conference on Web-Age Information Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132421474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Community Discovery of Large Networks","authors":"Zhemin Zhu, Chen Wang, Li Ma, Yue Pan, Zhiming Ding","doi":"10.1109/WAIM.2008.13","DOIUrl":"https://doi.org/10.1109/WAIM.2008.13","url":null,"abstract":"Over the past decade, community structure, a statistical property of networked systems such as social networks and the World Wide Web, has attracted considerable attention in the data mining field because it enables the description and prediction of complex networks. Many highly sensitive graph clustering algorithms have been developed to identify communities with dense internal connections and loose connections to the rest of the network. In this context, Newman and Girvan proposed the modularity Q score to quantify the strength of community structure and measure the fitness of a division, and the Q function has recently become an important standard. In this paper, combining the strengths of the Q score and the multilevel paradigm first developed for graph partitioning, we introduce a scalable algorithm, MOME (modularity-based multilevel graph clustering), to efficiently discover communities in a network. Experimental results indicate that, compared with the latest modularity-based method and its variants, MOME runs much faster and achieves a division with a slightly higher Q score, particularly on large-scale networks.","PeriodicalId":217119,"journal":{"name":"2008 The Ninth International Conference on Web-Age Information Management","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122242259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}