A Submodularity-based Clustering Algorithm for the Information Bottleneck and Privacy Funnel
Ni Ding, P. Sadeghi
2019 IEEE Information Theory Workshop (ITW), published 2019-08-01
DOI: https://doi.org/10.1109/ITW44776.2019.8989355
For the relevant data $S$ that nests in the observation $X$, the information bottleneck (IB) aims to encode $X$ into $\hat{X}$ so as to maximize the extracted useful information $I(S;\hat{X})$ at the minimum coding rate $I(X;\hat{X})$. For the dual privacy funnel (PF) problem, where $S$ denotes the sensitive/private data, the goal is to minimize the privacy leakage $I(S;\hat{X})$ while maintaining a certain level of utility $I(X;\hat{X})$. For both problems, we propose an efficient iterative agglomerative clustering algorithm based on the minimization of the difference of submodular functions (IAC-MDSF). It starts with the original alphabet $\hat{\mathcal{X}} := \mathcal{X}$ and iteratively merges the elements of the current alphabet $\hat{\mathcal{X}}$ whose merger optimizes the Lagrangian function $I(S;\hat{X}) - \lambda I(X;\hat{X})$. We prove that the best merge in each iteration of IAC-MDSF can be searched efficiently over all subsets of $\hat{\mathcal{X}}$ by the existing MDSF algorithms. By varying the value of the Lagrangian multiplier $\lambda$, we obtain experimental results on a heart disease data set in terms of the Pareto frontier: $I(S;\hat{X})$ vs. $-I(X;\hat{X})$. We show that our IAC-MDSF algorithm outperforms the existing iterative pairwise merge approaches for both PF and IB and is computationally much less complex.
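The iterative merging described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the MDSF solver is replaced by a brute-force search over all subsets of the current alphabet, which is exponential and only feasible for tiny alphabets. The function names (`mutual_information`, `lagrangian`, `agglomerate`), the stopping rule (stop when no merge improves the Lagrangian), the choice to maximize the Lagrangian for IB and minimize it for PF, and the toy joint distribution `p_sx` are all assumptions made for illustration. The sketch uses the fact that, for a deterministic clustering, $I(X;\hat{X}) = H(\hat{X})$.

```python
import itertools
import math


def mutual_information(joint):
    """Mutual information in bits of a joint pmf given as a list of rows."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    mi = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0:
                mi += p * math.log2(p / (row[i] * col[j]))
    return mi


def lagrangian(p_sx, clusters, lam):
    """I(S; X_hat) - lam * I(X; X_hat) for a partition of the X alphabet."""
    # Joint pmf of (S, X_hat): add up the columns of p_sx inside each cluster.
    p_sxhat = [[sum(row[x] for x in c) for c in clusters] for row in p_sx]
    # X_hat is a deterministic function of X, so I(X; X_hat) = H(X_hat).
    p_xhat = [sum(col) for col in zip(*p_sxhat)]
    h_xhat = -sum(p * math.log2(p) for p in p_xhat if p > 0)
    return mutual_information(p_sxhat) - lam * h_xhat


def agglomerate(p_sx, lam, maximize=True):
    """Greedy agglomerative clustering with brute-force subset search.

    Assumption: maximize=True plays the role of IB, maximize=False of PF.
    The exhaustive search stands in for the MDSF algorithms.
    """
    clusters = [(x,) for x in range(len(p_sx[0]))]
    sign = 1.0 if maximize else -1.0
    best = sign * lagrangian(p_sx, clusters, lam)
    while len(clusters) > 1:
        candidate = None
        # Try merging every subset (size >= 2) of the current alphabet.
        for k in range(2, len(clusters) + 1):
            for group in itertools.combinations(range(len(clusters)), k):
                merged = tuple(sorted(x for g in group for x in clusters[g]))
                trial = [c for i, c in enumerate(clusters) if i not in group]
                trial.append(merged)
                score = sign * lagrangian(p_sx, trial, lam)
                if score > best + 1e-12:
                    best, candidate = score, trial
        if candidate is None:  # no merge improves the Lagrangian
            break
        clusters = candidate
    return clusters


# Hypothetical toy joint pmf p(s, x): rows index S, columns index X.
p_sx = [[0.25, 0.25, 0.0, 0.0],
        [0.0, 0.0, 0.25, 0.25]]
ib_clusters = agglomerate(p_sx, lam=0.5, maximize=True)
pf_clusters = agglomerate(p_sx, lam=0.5, maximize=False)
print(sorted(ib_clusters), sorted(pf_clusters))
```

On this toy input, where $S$ is fully determined by $X$, the IB setting groups the symbols that share the same value of $S$, namely {0, 1} and {2, 3}, while the PF setting groups symbols across values of $S$, namely {0, 2} and {1, 3}, which makes $\hat{X}$ independent of $S$ while retaining one bit of utility.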