Enabling dynamic linkage of linguistic census data at Statistics Canada (extended abstract)

A. Casteigts, Marie-Hélène Chomienne, L. Bouchard, Guy-Vincent Jourdan
Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics, published 2011-07-10
DOI: 10.1109/ISI.2011.5984777 (https://doi.org/10.1109/ISI.2011.5984777)
Citations: 1

Abstract

Population health research studies the impact of various factors (determinants) on health, with the long-term objective of yielding better policies, programs, and services. Researchers studying Official Language Minority Communities (OLMCs) focus specifically on determinants related to speaking a minority language, such as English in Quebec or French in the rest of Canada. Investigations of this type require the ability to associate health data with linguistic information. Unfortunately, the largest health databases in Ontario, held at the Institute for Clinical Evaluative Sciences (ICES), do not contain usable linguistic variables to date. High-quality language variables do exist, however, at Statistics Canada (2006 Census), and we are interested in enabling their linkage to ICES health data in a dynamic way. The linkage we consider is intrinsically transient and aggregated: it consists in allowing ICES to learn interactively how many Francophones are present in a given sample of individuals (sum queries). We suggest two possible privacy-preserving mechanisms to enable dynamic sum queries: 1) constraining the dataflow itself; 2) adapting recent results ([1]) to characterize what leakage is at play in our scenario and which parameters affect the tradeoff between leakage and utility. We rely on these results to argue that a safe exposure of linguistic data could indeed be envisioned and, beyond that, that similar techniques could be used to enrich provincial health databases in general with a range of federal census data, making it possible to perform fine-grained community-based studies in Canada.
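To make the notion of a sum query concrete, here is a minimal Python sketch of the interaction described in the abstract: the health-data side submits a sample of identifiers and receives only the count of Francophones in that sample. This is an illustration under stated assumptions, not the authors' mechanism. The table `CENSUS_LANGUAGE`, the function `sum_query`, the identifiers, and the `MIN_SAMPLE_SIZE` threshold are all hypothetical.

```python
from typing import Dict, Iterable

# Hypothetical census-side table: person identifier -> 1 if Francophone, else 0.
# The identifiers and values are made up for illustration.
CENSUS_LANGUAGE: Dict[str, int] = {
    "p01": 1, "p02": 0, "p03": 1, "p04": 0, "p05": 0, "p06": 1,
}

# Assumed minimum sample size; this threshold is not taken from the paper.
MIN_SAMPLE_SIZE = 5


def sum_query(sample_ids: Iterable[str],
              table: Dict[str, int] = CENSUS_LANGUAGE,
              min_size: int = MIN_SAMPLE_SIZE) -> int:
    """Return how many individuals in the sample are flagged as Francophone.

    Only the aggregate count leaves this function; individual values do not.
    """
    ids = set(sample_ids)  # duplicates in the request are counted once
    if len(ids) < min_size:
        raise ValueError("sample too small to answer an aggregate query")
    unknown = ids - table.keys()
    if unknown:
        raise KeyError(f"identifiers not found: {sorted(unknown)}")
    return sum(table[i] for i in ids)


if __name__ == "__main__":
    print(sum_query(["p01", "p02", "p03", "p04", "p05"]))
```

A size threshold alone does not make such queries safe: two samples that differ by a single individual reveal that individual's value through the difference of the two counts. This is the kind of leakage that a mechanism for dynamic sum queries has to characterize and bound.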