{"title":"rsLDA: A Bayesian hierarchical model for relational learning","authors":"Claudio Taranto, Nicola Di Mauro, F. Esposito","doi":"10.1109/ICDKE.2011.6053932","DOIUrl":"https://doi.org/10.1109/ICDKE.2011.6053932","url":null,"abstract":"We introduce and evaluate a technique to tackle relational learning tasks combining a framework for mining relational queries with a hierarchical Bayesian model. We present the novel rsLDA algorithm that works as follows. It initially discovers a set of relevant features from the relational data useful to describe in a propositional way the examples. This corresponds to reformulate the problem from a relational representation space into an attribute-value form. Afterwards, given this new features space, a supervised version of the Latent Dirichlet Allocation model is applied in order to learn the probabilistic model. The performance of the proposed method when applied on two real-world datasets shows an improvement when compared to other methods.","PeriodicalId":377148,"journal":{"name":"2011 International Conference on Data and Knowledge Engineering (ICDKE)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114264346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SVM based approaches for classifying protein tertiary structures","authors":"G. Mirceva, D. Davcev","doi":"10.1109/ICDKE.2011.6053917","DOIUrl":"https://doi.org/10.1109/ICDKE.2011.6053917","url":null,"abstract":"The tertiary structure of a protein molecule is the main factor which can be used to determine its chemical properties as well as its function. The knowledge of the protein function is crucial in the development of new drugs, better crops and synthetic biochemicals. With the rapid development in technology, the number of determined protein structures increases every day, so retrieving structurally similar proteins using current algorithms takes too long. Therefore, improving the efficiency of the methods for protein structure retrieval and classification is an important research issue in bioinformatics community. In this paper, we present two SVM based protein classifiers. Our classifiers use the information about the conformation of protein structures in 3D space. Namely, our protein voxel and ray based protein descriptors are used for representing the protein structures. A part of the SCOP 1.73 database is used for evaluation of our classifiers. The results show that our approach achieves 98.7% classification accuracy by using the protein ray based descriptor, while it is much faster than other similar algorithms with comparable accuracy. We provide some experimental results.","PeriodicalId":377148,"journal":{"name":"2011 International Conference on Data and Knowledge Engineering (ICDKE)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132670293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multicriteria recommendation method for data with missing rating scores","authors":"A. Takasu","doi":"10.1109/ICDKE.2011.6053931","DOIUrl":"https://doi.org/10.1109/ICDKE.2011.6053931","url":null,"abstract":"This paper proposes a recommendation method for multi-criteria (MC) collaborative filtering, where users are required to give rating scores from multiple aspects to each item and systems utilize the rich information to improve the recommendation accuracy. One drawback of MC recommender systems is user's cost to give scores to items because it requires rating scores on MC for each item. To overcome this drawback, we aim at developing a MC recommender system that allows missing rating information. This paper proposes generative models for MC recommendation that are robust against missing scores. In these models we convert a list of rating scores on MC to a low dimensional feature space. Correlation among scores on MC is embedded in the feature space. So we can expect that a score list is mapped to a close point in the feature space even if some scores are missing. We conducted experiments to check the robustness of the proposed models by using Yahoo! movie data and experimentally show that they are less affected by missing information compared to Pearson correlation based collaborative filtering method.","PeriodicalId":377148,"journal":{"name":"2011 International Conference on Data and Knowledge Engineering (ICDKE)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127408456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building a distributed authenticating CDN","authors":"Sam Moffatt","doi":"10.1109/ICDKE.2011.6053930","DOIUrl":"https://doi.org/10.1109/ICDKE.2011.6053930","url":null,"abstract":"In recent times, much has been made of the security, or lack thereof, utilised within Facebook's content distribution network (CDN). Their CDN is noted to enable public access to any resource via a GET request presuming the user knows the URL for the resource. This means that not only can users directly access material that they would otherwise not have access to but it also means that material that has been considered “deleted” may still be accessible. noncdn is a content distribution network designed to provide light-weight authenticated access to content stored at edge nodes with easily replicated authentication access through time limited authentication tokens. noncdn provides “volumes” as a container for handling access control and authentication nodes for generation and validation of authentication tokens. As tokens identify individuals, accesses can be logged and tracked to provide extra auditing functionality.","PeriodicalId":377148,"journal":{"name":"2011 International Conference on Data and Knowledge Engineering (ICDKE)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117282305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-level continuous skyline queries (MCSQ)","authors":"Eman El-Dawy, Hoda M. O. Mokhtar, A. El-Bastawissy","doi":"10.1109/ICDKE.2011.6053927","DOIUrl":"https://doi.org/10.1109/ICDKE.2011.6053927","url":null,"abstract":"Most of the current work on skyline queries mainly dealt with querying static query points over static data sets. With the advances in wireless communication, mobile computing, and positioning technologies, it has become possible to obtain and manage (model, index, query, etc.) the trajectories of moving objects in real life, and consequently the need for continuous skyline query processing has become more and more pressing. In this paper, we address the problem of efficiently maintaining continuous skyline queries which contain both static and dynamic attributes. We present a Multi-level Continuous Skyline Query (MCSQ) algorithm, which basically creates a pre-computed skyline data set, facilitates skyline update, and enhances query running time and performance. Our algorithm in brief proceeds as follows: First, we distinguish the data points that are permanently in the skyline and use them to derive a search bound. Second, we establish a pre-computed data set for dynamic skyline that depends on the number of skyline levels (M) which is later used to update the first (initial) skyline points. Finally, every time the skyline needs to be updated we use the pre-computed data sets of skyline to update the previous skyline set and consequently updating first skyline. Finally, we present experimental results to demonstrate the performance and efficiency of our algorithm.","PeriodicalId":377148,"journal":{"name":"2011 International Conference on Data and Knowledge Engineering (ICDKE)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114352120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Meta-model based knowledge discovery","authors":"Dominic Girardi, J. Dirnberger, M. Giretzlehner","doi":"10.1109/ICDKE.2011.6053918","DOIUrl":"https://doi.org/10.1109/ICDKE.2011.6053918","url":null,"abstract":"Data acquisition and data mining are often seen as two independent processes in research. We introduce a meta-information based, highly generic data acquisition system which is able to store data of almost arbitrary structure. Based on the meta-information we plan to apply data mining algorithms for knowledge retrieval. Furthermore, the results from the data mining algorithms will be used to apply plausibility checks for the subsequent data acquisition, in order to maintain the quality of the collected data. So, the gap between data acquisition and data mining shall be decreased.","PeriodicalId":377148,"journal":{"name":"2011 International Conference on Data and Knowledge Engineering (ICDKE)","volume":"247 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121696778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A purpose based usage access control model for E-healthcare services","authors":"Lili Sun, Hua Wang","doi":"10.1109/ICDKE.2011.6053928","DOIUrl":"https://doi.org/10.1109/ICDKE.2011.6053928","url":null,"abstract":"Information privacy becomes a major concern for customers to provide their private data that can promote future business service, especially in E-healthcare services. E-healthcare is the use of web-based systems to share and deliver information across the Internet that is easy to disclose private data provided by customers. The private data has to be protected through proper authorization and access control models for e-Health systems in a large health organization. Usage access control is considered as the next generation access control model with distinguishing properties of decision continuity. It has been proven efficient to improve security administration with flexible authorization management. Usage control enables finer-grained control over usage of digital objects that offers a better access control to private information in E-healthcare systems. In this paper, we design a comprehensive usage access control approach with purpose extension to tackle such private data protection in E-healthcare services.","PeriodicalId":377148,"journal":{"name":"2011 International Conference on Data and Knowledge Engineering (ICDKE)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117058456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A generalization of blocking and windowing algorithms for duplicate detection","authors":"Uwe Draisbach, Felix Naumann","doi":"10.1109/ICDKE.2011.6053920","DOIUrl":"https://doi.org/10.1109/ICDKE.2011.6053920","url":null,"abstract":"Duplicate detection is the process of finding multiple records in a dataset that represent the same real-world entity. Due to the enormous costs of an exhaustive comparison, typical algorithms select only promising record pairs for comparison. Two competing approaches are blocking and windowing. Blocking methods partition records into disjoint subsets, while windowing methods, in particular the Sorted Neighborhood Method, slide a window over the sorted records and compare records only within the window. We present a new algorithm called Sorted Blocks in several variants, which generalizes both approaches. To evaluate Sorted Blocks, we have conducted extensive experiments with different datasets. These show that our new algorithm needs fewer comparisons to find the same number of duplicates.","PeriodicalId":377148,"journal":{"name":"2011 International Conference on Data and Knowledge Engineering (ICDKE)","volume":"175 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127242533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A database model for heterogeneous spatial collections: Definition and algebra","authors":"G. Psaila","doi":"10.1109/ICDKE.2011.6053926","DOIUrl":"https://doi.org/10.1109/ICDKE.2011.6053926","url":null,"abstract":"Spatial DBMSs usually extend the classical relational model with data types for georeferenced data, providing a suitable extension of SQL. Designing a database for heterogeneous technological infrastructures may be hard, and queries may be hard to write and slow to execute. We define a data model able to model, in a natural way, heterogeneous collections of spatial objects. The query algebra provides new operators able to naturally express complex queries on heterogeneous collections, by automatically deriving spatial descriptions from the composition relationships.","PeriodicalId":377148,"journal":{"name":"2011 International Conference on Data and Knowledge Engineering (ICDKE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127504286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dealing with domain knowledge in association rules mining — Several experiments","authors":"J. Rauch, M. Simunek","doi":"10.1109/ICDKE.2011.6053919","DOIUrl":"https://doi.org/10.1109/ICDKE.2011.6053919","url":null,"abstract":"Experiments concerning dealing with domain knowledge in association rules mining are presented. Formalized items of domain knowledge are used. Each such item is converted into a set of all association rules that can be considered as its consequences.","PeriodicalId":377148,"journal":{"name":"2011 International Conference on Data and Knowledge Engineering (ICDKE)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114752107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}