Trans. Data Priv. | Pub Date: 2022-05-13 | DOI: 10.48550/arXiv.2205.06506
Balázs Pejó, Mina Remeli, Adam Arany, M. Galtier, G. Ács
{"title":"Collaborative Drug Discovery: Inference-level Data Protection Perspective","authors":"Balázs Pejó, Mina Remeli, Adam Arany, M. Galtier, G. Ács","doi":"10.48550/arXiv.2205.06506","DOIUrl":"https://doi.org/10.48550/arXiv.2205.06506","url":null,"abstract":"The pharmaceutical industry can better leverage its data assets to virtualize drug discovery through a collaborative machine learning platform. On the other hand, there are non-negligible risks stemming from the unintended leakage of participants' training data; hence, it is essential for such a platform to be secure and privacy-preserving. This paper describes a privacy risk assessment for collaborative modeling in the preclinical phase of drug discovery to accelerate the selection of promising drug candidates. After a short taxonomy of state-of-the-art inference attacks, we adopt and customize several to the underlying scenario. Finally, we describe and experiment with a handful of relevant privacy protection techniques to mitigate such attacks.","PeriodicalId":374808,"journal":{"name":"Trans. Data Priv.","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125271961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foreword for the special issue of selected papers from the 7th EDBT/ICDT Workshop on Privacy and Anonymity in Information Society (PAIS 2014)","authors":"T. Truta, Li Xiong, F. Fotouhi","doi":"10.5555/2870564.2870565","DOIUrl":"https://doi.org/10.5555/2870564.2870565","url":null,"abstract":"The seventh Workshop on Privacy and Anonymity in Information Society (PAIS 2014) was held in conjunction with the International Conference on Extending Database Technology (EDBT) and International Conference on Database Theory (ICDT) in Athens, Greece. The PAIS 2014 workshop provided an open yet focused platform for researchers and practitioners from fields such as computer science, statistics, healthcare informatics, and law to discuss and present current research challenges and advances in data privacy and anonymity research. The present special issue contains three extended papers that have been selected as the best three papers presented at PAIS 2014 workshop.","PeriodicalId":374808,"journal":{"name":"Trans. Data Priv.","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125063100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Censors for Boolean Description Logic","authors":"T. Studer, Johannes Werner","doi":"10.7892/BORIS.61797","DOIUrl":"https://doi.org/10.7892/BORIS.61797","url":null,"abstract":"Protecting different kinds of information has become an important area of research. One aspect is to provide effective means to avoid that secrets can be deduced from the answers of legitimate queries. In the context of atomic propositional databases several methods have been developed to achieve this goal. However, in those databases it is not possible to formalize structural information. Also they are quite restrictive with respect to the specification of secrets. In this paper we extend those methods to match the much greater expressive power of Boolean description logics. In addition to the formal framework, we provide a discussion of various kinds of censors and establish different levels of security they can provide.","PeriodicalId":374808,"journal":{"name":"Trans. Data Priv.","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131313758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Korra Sathya Babu, N. Reddy, Nitesh Kumar, M. Elliot, S. K. Jena
{"title":"Achieving k-anonymity Using Improved Greedy Heuristics for Very Large Relational Databases","authors":"Korra Sathya Babu, N. Reddy, Nitesh Kumar, M. Elliot, S. K. Jena","doi":"10.5555/2612156.2612157","DOIUrl":"https://doi.org/10.5555/2612156.2612157","url":null,"abstract":"Advances in data storage, data collection and inference techniques have enabled the creation of huge databases of personal information. Dissemination of information from such databases, even if formally anonymised, creates a serious threat to individual privacy through statistical disclosure. One of the key methods developed to limit statistical disclosure risk is k-anonymity. Several methods have been proposed to enforce k-anonymity, notably Samarati's algorithm and Sweeney's Datafly, which both adhere to full domain generalisation. Such methods require a trade-off between computing time and information loss. This paper describes an improved greedy heuristic for enforcing k-anonymity with full domain generalisation. The improved greedy algorithm was compared with the original methods. Metrics like information loss, computing time and level of generalisation were deployed for comparison. Results show that the improved greedy algorithm maintains a better balance between computing time and information loss.","PeriodicalId":374808,"journal":{"name":"Trans. Data Priv.","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116227876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fang-Yu Rao, Jianneng Cao, Mehmet Kuzu, E. Bertino, Murat Kantarcioglu
{"title":"Efficient tree pattern queries on encrypted XML documents","authors":"Fang-Yu Rao, Jianneng Cao, Mehmet Kuzu, E. Bertino, Murat Kantarcioglu","doi":"10.1145/2457317.2457338","DOIUrl":"https://doi.org/10.1145/2457317.2457338","url":null,"abstract":"Outsourcing XML documents is a challenging task, because it encrypts the documents, while still requiring efficient query processing. Past approaches on this topic either leak structural information or fail to support searching that has constraints on XML node content. In addition, they adopt a filtering-and-refining framework, which requires the users to prune false positives from the query results. To address these problems, we present a solution for efficient evaluation of tree pattern queries (TPQs) on encrypted XML documents. We create a domain hierarchy, such that each XML document can be embedded in it. By assigning each node in the hierarchy a position, we create for each document a vector, which encodes both the structural and textual information about the document. Similarly, a vector is created also for a TPQ. Then, the matching between a TPQ and a document is reduced to calculating the distance between their vectors. For the sake of privacy, such vectors are encrypted before being outsourced. To improve the matching efficiency, we use a k-d tree to partition the vectors into non-overlapping subsets, such that non-matchable documents are pruned as early as possible. The extensive evaluation shows that our solution is efficient and scalable to large datasets.","PeriodicalId":374808,"journal":{"name":"Trans. Data Priv.","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121748301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vignesh Ganapathy, Dilys Thomas, T. Feder, H. Garcia-Molina, R. Motwani
{"title":"Distributing data for secure database services","authors":"Vignesh Ganapathy, Dilys Thomas, T. Feder, H. Garcia-Molina, R. Motwani","doi":"10.1145/1971690.1971698","DOIUrl":"https://doi.org/10.1145/1971690.1971698","url":null,"abstract":"The advent of database services has resulted in privacy concerns on the part of the client storing data with third party database service providers. Previous approaches to enabling such a service have been based on data encryption, causing a large overhead in query processing. A distributed architecture for secure database services is proposed as a solution to this problem where data is stored at multiple servers. The distributed architecture provides both privacy as well as fault tolerance to the client. In this paper we provide algorithms for (1) distributing data: our results include hardness of approximation results and hence a heuristic greedy algorithm for the distribution problem (2) partitioning the query at the client to queries for the servers is done by a bottom up state based algorithm. Finally the results at the servers are integrated to obtain the answer at the client. We provide an experimental validation and performance study of our algorithms.","PeriodicalId":374808,"journal":{"name":"Trans. Data Priv.","volume":"121 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128491963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Disclosure Control for Microdata Using the R-Package sdcMicro","authors":"M. Templ","doi":"10.18637/JSS.V067.I04","DOIUrl":"https://doi.org/10.18637/JSS.V067.I04","url":null,"abstract":"The demand for data from surveys, censuses or registers containing sensible information on people or enterprises has increased significantly over the last years. However, before data can be provided to the public or to researchers, confidentiality has to be respected for any data set possibly containing sensible information about individual units. Confidentiality can be achieved by applying statistical disclosure control (SDC) methods to the data in order to decrease the disclosure risk of data. The R package sdcMicro serves as an easy-to-handle, object-oriented S4 class implementation of SDC methods to evaluate and anonymize confidential micro-data sets. It includes all popular disclosure risk and perturbation methods. The package performs automated recalculation of frequency counts, individual and global risk measures, information loss and data utility statistics after each anonymization step. All methods are highly optimized in terms of computational costs to be able to work with large data sets. Reporting facilities that summarize the anonymization process can also be easily used by practitioners. We describe the package and demonstrate its functionality with a complex household survey test data set that has been distributed by the International Household Survey Network.","PeriodicalId":374808,"journal":{"name":"Trans. Data Priv.","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128281584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}