Automated text mining for requirements analysis of policy documents

2013 21st IEEE International Requirements Engineering Conference (RE). Pub Date: 2013-07-15. DOI: 10.1109/RE.2013.6636700

Aaron K. Massey, Jacob Eisenstein, A. Antón, Peter P. Swire

Pages: 4-13

Citations: 75

Abstract

Businesses and organizations in jurisdictions around the world are required by law to provide their customers and users with information about their business practices in the form of policy documents. Requirements engineers analyze these documents as sources of requirements, but this analysis is a time-consuming and mostly manual process. Moreover, policy documents contain legalese and present readability challenges to requirements engineers seeking to analyze them. In this paper, we perform a large-scale analysis of 2,061 policy documents, including policy documents from the Google Top 1000 most visited websites and the Fortune 500 companies, for three purposes: (1) to assess the readability of these policy documents for requirements engineers; (2) to determine if automated text mining can indicate whether a policy document contains requirements expressed as either privacy protections or vulnerabilities; and (3) to establish the generalizability of prior work in the identification of privacy protections and vulnerabilities from privacy policies to other policy documents. Our results suggest that this requirements analysis technique, developed on a small set of policy documents in two domains, may generalize to other domains.
