Can We Trust Privacy Policy: Privacy Policy Classification Using Machine Learning

Methus Narksenee, K. Sripanidkulchai
2019 2nd International Conference of Intelligent Robotic and Control Engineering (IRCE), August 2019
DOI: 10.1109/IRCE.2019.00034
Mobile applications frequently request private information from users, ostensibly to improve their online services and applications. The collected data, such as personally identifiable information, raises users' concerns because some applications actually have malicious intent to leak personal data. Privacy policies are an important resource. They are the only readily accessible source of information that users can consult, before downloading and using an application, to determine how it plans to collect and use their data. However, users tend to ignore or skim privacy policies because they are often written in complicated, hard-to-understand language. As a result, users often miss crucial privacy-related information even after reading these documents. In this paper, we experimentally determine how far an application's privacy policy can be trusted. To do this, we analyze the language of more than 9,000 privacy policies and compare it with what the applications actually do. We attempt to classify whether or not applications transmit privacy-related information using three machine learning classifiers: support vector machines (SVMs), k-nearest neighbors (KNN), and logistic regression (LR). The best results achieve an average recall of 0.81 and an average precision of 0.31. The high recall indicates that we correctly identify most of the applications that transmit personally identifiable information. However, the low precision indicates that we often over-identify applications as transmitting personally identifiable information when in reality they do not.
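The reported recall and precision can be made more concrete with a short sketch. The confusion counts below are hypothetical. They were chosen only so that the ratios reproduce the reported averages of 0.81 and 0.31, and they are not the paper's actual confusion matrix. The names `Confusion` and `false_positives_per_true_positive` are illustrative, not taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts for the positive class "transmits PII" (hypothetical values)."""
    tp: int  # apps flagged as transmitting PII that really do
    fp: int  # apps flagged as transmitting PII that do not
    fn: int  # apps that transmit PII but were not flagged

    def recall(self) -> float:
        # Share of PII-transmitting apps that the classifier catches.
        return self.tp / (self.tp + self.fn)

    def precision(self) -> float:
        # Share of flagged apps that actually transmit PII.
        return self.tp / (self.tp + self.fp)


def false_positives_per_true_positive(precision: float) -> float:
    """Number of wrongly flagged apps that accompany each correctly flagged one."""
    return (1 - precision) / precision


# Hypothetical counts: 100 apps truly transmit PII, 81 of them are caught,
# and 180 apps that do not transmit PII are also flagged.
c = Confusion(tp=81, fp=180, fn=19)
print(f"recall={c.recall():.2f} precision={c.precision():.2f}")  # recall=0.81 precision=0.31
print(f"{false_positives_per_true_positive(0.31):.2f} false alarms per correct flag")  # 2.23 ...
```

At a precision of 0.31, roughly two applications are wrongly flagged for every one that is correctly flagged. This is what "over-identify" means in practice. A recall of 0.81 still means that about one in five PII-transmitting applications goes undetected.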