Microsoft Purview: A System for Central Governance of Data

IF 2.6 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS
Shafi Ahmad, Dillidorai Arumugam, Srdan Bozovic, Elnata Degefa, Sailesh Duvvuri, Steven Gott, Nitish Gupta, Joachim Hammer, Nivedita Kaluskar, Raghav Kaushik, Rakesh Khanduja, Prasad Mujumdar, Gaurav Malhotra, Pankaj Naik, Nikolas Ogg, Krishna Kumar Parthasarthy, Raghu Ramakrishnan, Vlad Rodriguez, Rahul Sharma, Jakub Szymaszek, Andreas Wolter
Proceedings of the VLDB Endowment. Published 2023-08-01. DOI: 10.14778/3611540.3611552 (https://doi.org/10.14778/3611540.3611552)

Abstract

Modern data estates span on-premises systems, the edge, and one or more public clouds, and are spread across sources such as multiple relational databases, file and storage systems, and NoSQL systems, both operational and analytic; this phenomenon is referred to as data sprawl. Data administrators who wish to enforce compliance across the entire organization have to inventory their data, identify which parts of it are sensitive, and govern the sensitive data appropriately across the entirety of their sprawling data estate. Today, governance of data is completely siloed; each data subsystem has its own (and varied) governance features. Policies on sensitive data are applied piecemeal by iterating over all the data sources, in a custom language specific to each source. This makes data governance cumbersome, error-prone (because a given policy must be manually enforced across different subsystems, inconsistencies can easily arise), and expensive. This paper presents Microsoft Purview, a service for unified governance of the entire data estate of an organization from a single central pane of glass. The Purview service consists of three parts: (1) a Data Map, or metadata catalog, that is populated by automated scanning of data sources in the organization, (2) a system to store and manage sensitivity classification of data, and (3) a policy system that enables data security officers to author and implement policies that span the entire organization, e.g., a policy that says, "Non-full-time employees should be denied access to data classified as PII (Personally Identifiable Information)." Purview transforms data governance across a complex data estate by offering the ability to govern centrally and by automating data discovery, classification, and policy enforcement. While other commercial catalog systems also build a global catalog, Purview is unique in its support for policies. It is also distinguished by covering both structured and unstructured data, thanks to its deep integration with Office 365 and its governance framework; indeed, "Microsoft Purview" represents a new unified offering that combines Office 365 governance and what was formerly a service for governing structured data called "Azure Purview". By integrating with Office 365's Rights Management Service, Purview offers central governance over structured data stored in databases and stores, reports in systems such as Power BI, and document data stored in Office 365. The Purview vision is to make the metadata in the Data Map increasingly rich through further automation and curation support, and to use this 360-degree view of the data estate to support a wide range of governance policies, ranging from access control to lifecycle management (e.g., retention, deletion, restricting data movement). This paper covers the design and implementation challenges in building the Purview service for Attribute-Based Access Control (ABAC) policies, focusing specifically on a detailed description of its integration with Azure SQL Database. We illustrate the power of unifying Office 365 governance with structured data governance through Purview policies that enforce consistent access control even as data flows between Office 365 and structured data engines like Azure SQL Database. We also describe the results of our empirical evaluation of the performance overheads imposed by Purview.
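To make the ABAC idea in the abstract concrete, the following is a minimal sketch of how a classification-based deny policy might be evaluated. It is not Purview's API or policy language: `DATA_MAP`, `DenyPolicy`, `is_denied`, the asset names, and the `employment_type` attribute are all hypothetical names introduced for illustration, and the "any matching deny policy blocks access" rule is an assumption rather than something the abstract specifies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping, Set

# Hypothetical stand-in for the Data Map: asset name -> sensitivity labels
# that a scan is assumed to have attached to it.
DATA_MAP: Dict[str, Set[str]] = {
    "sales_db.customers.email": {"PII"},
    "sales_db.orders.total": set(),
}


@dataclass(frozen=True)
class DenyPolicy:
    """A deny rule keyed on a data classification and a principal predicate."""
    name: str
    classification: str
    applies_to: Callable[[Mapping[str, str]], bool]


def is_denied(
    principal: Mapping[str, str],
    asset: str,
    policies: List[DenyPolicy],
    data_map: Mapping[str, Set[str]] = DATA_MAP,
) -> bool:
    """Return True if any policy denies this principal access to the asset.

    Assumption: a policy matches when the asset carries the policy's
    classification AND the principal's attributes satisfy the predicate.
    """
    labels = data_map.get(asset, set())
    return any(
        p.classification in labels and p.applies_to(principal)
        for p in policies
    )


# The abstract's example: non-full-time employees are denied access to PII.
PII_POLICY = DenyPolicy(
    name="deny-pii-to-non-full-time",
    classification="PII",
    applies_to=lambda attrs: attrs.get("employment_type") != "full_time",
)

contractor = {"employment_type": "contractor"}
employee = {"employment_type": "full_time"}
```

Under these assumptions, `is_denied(contractor, "sales_db.customers.email", [PII_POLICY])` evaluates to `True`, while the same call for `employee`, or for the unclassified `sales_db.orders.total` column, evaluates to `False`. The point of the sketch is that the policy is written once against a classification, not against each data source.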
Source journal
Proceedings of the VLDB Endowment (Computer Science: General Computer Science)
CiteScore: 7.70
Self-citation rate: 0.00%
Articles published: 95
About the journal: The Proceedings of the VLDB (PVLDB) welcomes original research papers on a broad range of research topics related to all aspects of data management, where systems issues play a significant role, such as data management system technology and information management infrastructures, including their very large scale of experimentation, novel architectures, and demanding applications as well as their underpinning theory. The scope of a submission for PVLDB is also described by the subject areas given below. Moreover, the scope of PVLDB is restricted to scientific areas that are covered by the combined expertise on the submission’s topic of the journal’s editorial board. Finally, the submission’s contributions should build on work already published in data management outlets, e.g., PVLDB, VLDBJ, ACM SIGMOD, IEEE ICDE, EDBT, ACM TODS, IEEE TKDE, and go beyond a syntactic citation.