{"title":"ERGP: A Combined Entity Resolution Approach with Genetic Programming","authors":"Chenchen Sun, Derong Shen, Yue Kou, Tiezheng Nie, Ge Yu","doi":"10.1109/WISA.2014.46","DOIUrl":"https://doi.org/10.1109/WISA.2014.46","url":null,"abstract":"Entities often hold more than one representation with some expressive errors in different data sources in the real world. Different representations and a few possible expressive errors make entities identifying a crucial task in data integration and data cleaning, which is known as entity resolution. We propose a novel approach for entity resolution using genetic programming named Entity Resolution with Genetic Programming (ERGP). ERGP is able to learn to get an effective entity resolution classifier by combining several different properties' comparisons. The evaluation shows that ERGP outperforms the state-of-the-art entity resolution algorithms. Above all the ERGP approach is capable of setting the threshold for each single comparison of an attributes' pair, leaving no burden of setting thresholds to the user.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133438556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering Evolution of Complex Event Based on Correlations Between Events","authors":"Xia Li, Yongqing Zheng, Yongquan Dong","doi":"10.1109/WISA.2014.17","DOIUrl":"https://doi.org/10.1109/WISA.2014.17","url":null,"abstract":"There are large numbers of news articles on Web pages every day. Each article usually reports some aspects of a complex event, but they do not report the whole picture of the event. People are often interested in not only a single event but also the correlations between events and the evolution of the complex event, if they want to be aware of the whole picture of the complex event, they have to browse many Web pages. To solve this problem, we propose a method to find the correlations between events, and the evolution of the complex events. We use the signal words and the co-occurrence of the events in the news articles to discover the correlations between events, and construct an event correlation evolution graph. Experiments test and validate our method.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130578897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AB-Tree: A Write-Optimized Adaptive Index Structure on Solid State Disk","authors":"Zhiwen Jiang, Yongji Wu, Yong Zhang, C. Li, Chunxiao Xing","doi":"10.1109/WISA.2014.42","DOIUrl":"https://doi.org/10.1109/WISA.2014.42","url":null,"abstract":"Big Data boosts the development of data management and analysis in database systems but it also poses a challenge to traditional storages. Flash-based Solid State Disks (SSDs) are provided to deal with the new challenges brought by Big Data. However, SSD has the problem of read-write asymmetry due to the unique features of flash memory, which presents significant challenges in designing tree index for flash-based DBMS. In this paper, we designed the Adaptive Batched Tree (AB-Tree), a variant of the B-Tree to improve write performance. AB-Tree implements a bucket-based structure to perform bulk insertion. In addition, all the modifications to existing entries are performed in a lazy way so as to avoid small random writes. Besides, AB-Tree also has an adaptive bucket layout which can dynamically adapt to workload characteristics on-the-fly. Experimental results show that AB-Tree can achieve 6X to 133X gains over the state-of-art tree indexes across a range of workloads on SSDs.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117149554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Software Birthmark Based on System Call and Program Data Dependence","authors":"Kaige Liu, Tao Zheng, Linxi Wei","doi":"10.1109/WISA.2014.28","DOIUrl":"https://doi.org/10.1109/WISA.2014.28","url":null,"abstract":"With the rapid development of software technology and open source projects, software industry becomes more and more threatened by software piracy. As an excellent detection technique of software piracy, software birthmark, which can describe the unique characteristic of a program, has obtained more and more attention. In this paper, we propose a software birthmark called SCDG-DDGB (System Call Dependence Graph - C Data Dependence Graph Birthmark) which combines system call dependence with program data dependence. SCDG-DDGB keeps the advantages of system call based software birthmark and expands the scope of detection. What's more, SCDG-DDGB also can be used to detect algorithm plagiarism. We demonstrate the accuracy of SCDG-DDGB and evaluate the robustness with many powerful obfuscation techniques. The result shows that SCDG-DDGB is reliable and effective in detecting software piracy.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122701522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Lay-User Interface for Querying DBpedia","authors":"Wanli Song, Zhuoming Xu, Yan Tang, Lili Lin, Lixian Ni","doi":"10.1109/WISA.2014.11","DOIUrl":"https://doi.org/10.1109/WISA.2014.11","url":null,"abstract":"As an RDF representation of information extracted from Wikipedia, DBpedia has been serving as an interlinking hub for many other RDF data sets in the Linked Open Data cloud on the Web of Data. Therefore, accessing the DBpedia data set is a common task for various Semantic Web applications. Given the fact that DBpedia is currently accessed through public SPARQL endpoints, which requires the users to know and understand the vocabularies of the DBpedia ontology and to master the complicated syntax and semantics for the SPARQL language. This is a tough task for the querying users, especially for lay-users without sufficient knowledge of Semantic Web techniques. The goal of this paper is to provide the users, especially lay-users, with a friendly, easy-to-use interface for querying the DBpedia dataset. Such a user interface can create a visual class hierarchy of the DBpedia ontology for the users to browse and let the users specify query conditions by means of simple form-based operations. We design a group of algorithms for the lay-user interface, implement a prototype, LUQI-DBpedia, of the interface, and use LUQI-DBpedia to carry out query experiments. The implementation and experimental results indicate that the proposed approach is achievable using Java programming language and Apache Jena framework for Java.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132069222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HuaVideo: Towards a Secure, Scalable and Compatible HTML5 Video Providing System","authors":"Yongji Wu, Yong Zhang, Chaoshuo Wang, C. Li, Chunxiao Xing","doi":"10.1109/WISA.2014.23","DOIUrl":"https://doi.org/10.1109/WISA.2014.23","url":null,"abstract":"Video is able to convey huge amount of information and it is now one of the most popular media of content service on Internet. After HTML5 emerged, supporting playing video files by using the \"<video>\" tag becomes the new standard. Nowadays, most of the devices and browsers have already supports playing video online by HTML5 player. Unlike the traditional stream video service, the video tag in HTML5 requires a URL of file, which makes the video able to be downloaded. Video occupies high volume storage, which requires a video server has huge amount of storage, therefore applying scalable distributed storage is necessary. In order to protect the content and provide large storage to HTML5 videos, we designed a new architect of video server, called HuaVideo. It applies DDB(Distributed Database) to store videos, which provides scalable service. Besides, we make use of HTTP header to identify whether the request is valid. At the same time, we generate a unique URL to locate the video which can be used only once. Moreover, only partial data is available when user try to download the content. These mechanisms guarantee the contents of the server cannot be downloaded by normal means. The results of our test prove the effectiveness of our design.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128093306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ontology-Based Integration and Sharing of Big Data Educational Resources","authors":"Jing Xiong, Yuntong Liu, W. Liu","doi":"10.1109/WISA.2014.51","DOIUrl":"https://doi.org/10.1109/WISA.2014.51","url":null,"abstract":"In the era of big data, massive educational resources are stored on the internet and mobile networks. However, most of these resources are heterogeneous and decentralized, they have different format. The resources are designed for humans to read and not understandable to the machine. Their low level sharing and reuse make them difficult to acquire. How to access the resources the users need quickly and efficiently and take advantage of them is a serious problem. In order to solve the problem, an ontology-based integration educational resources framework and sharing strategies are proposed. Using the advantages of ontological semantics these educational resources can be annotated semantically, so that the computer can understand and deal with the marked information. The ontology-based integration and sharing strategies can improve the recall and precision of the educational resources retrieval.","PeriodicalId":366169,"journal":{"name":"2014 11th Web Information System and Application Conference","volume":"237 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122354890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}