Title: Maximizing data utility while preserving privacy through database fragmentation
Author: Ali Amiri
Journal: Expert Systems with Applications, vol. 273, Article 126873 (impact factor 7.5; JCR Q1, Computer Science, Artificial Intelligence)
Published: 2025-02-15
DOI: 10.1016/j.eswa.2025.126873
URL: https://www.sciencedirect.com/science/article/pii/S0957417425004956
Citations: 0
Abstract
Managing databases so that data privacy and utility stay in balance is a critical challenge in today’s data-driven landscape. This study addresses the database fragmentation problem: dividing a database into smaller fragments, each containing a subset of attributes. The objective is to balance the confidentiality of sensitive attribute sets against the database’s utility. Sensitive attribute sets are combinations of attributes that could disclose private information or identify individuals, such as personal quasi-identifiers; their attributes must be separated into distinct fragments to reduce the risk of exposure. Utility attribute sets, conversely, consist of attributes that enhance data usability and query efficiency, and maximizing utility requires grouping the attributes of each utility set into as few fragments as possible. To solve this NP-hard problem, a column generation-based solution built on a set partitioning formulation is presented. Experimental evaluations on real and synthetic datasets validate the efficiency of the proposed approach and demonstrate its superiority over the state-of-the-art commercial solver CPLEX.
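The trade-off described in the abstract can be illustrated with a small sketch. It is not the paper's column generation method: it is a brute-force search over a toy instance. The attribute names, the sensitive and utility sets, and the cost measure (total number of fragments touched by each utility set, lower being better) are all assumptions made for illustration.

```python
from itertools import product


def violates_privacy(fragments, sensitive_sets):
    """Return True if some fragment contains every attribute of a sensitive set."""
    return any(s <= f for f in fragments for s in sensitive_sets)


def fragments_spanned(fragments, utility_set):
    """Count the fragments holding at least one attribute of utility_set."""
    return sum(1 for f in fragments if f & utility_set)


def best_fragmentation(attributes, sensitive_sets, utility_sets, max_fragments):
    """Brute force: try every assignment of attributes to fragment labels.

    Assumed objective (not taken from the paper): minimize the total number
    of fragments spanned by the utility sets, subject to no fragment
    containing a complete sensitive set. Exponential, so toy sizes only.
    """
    best, best_cost = None, None
    for labels in product(range(max_fragments), repeat=len(attributes)):
        groups = {}
        for attr, label in zip(attributes, labels):
            groups.setdefault(label, set()).add(attr)
        fragments = [frozenset(g) for g in groups.values()]
        if violates_privacy(fragments, sensitive_sets):
            continue  # privacy constraint broken, discard this candidate
        cost = sum(fragments_spanned(fragments, u) for u in utility_sets)
        if best_cost is None or cost < best_cost:
            best, best_cost = fragments, cost
    return best, best_cost


# Hypothetical toy instance.
attributes = ["name", "zip", "dob", "diagnosis"]
sensitive_sets = [{"name", "diagnosis"}, {"zip", "dob", "diagnosis"}]
utility_sets = [{"zip", "dob"}, {"dob", "diagnosis"}]

fragments, cost = best_fragmentation(attributes, sensitive_sets, utility_sets, 2)
print(fragments, cost)
```

In this toy instance both utility sets cannot each sit in a single fragment, because that would place zip, dob, and diagnosis together and expose the second sensitive set, so the best achievable cost is 3. Exhaustive enumeration grows exponentially with the number of attributes, which is consistent with the abstract's motivation for a decomposition approach such as column generation on realistic instances.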
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.