A semantic approach to understanding GDPR fines: From text to compliance insights

IF 3.2 | Zone 3, Sociology | JCR Q1, LAW

Computer Law & Security Review | Pub Date: 2025-09-26 | DOI: 10.1016/j.clsr.2025.106187

Albina Orlando, Mario Santoro

Volume 59, Article 106187

Citations: 0

Abstract

This study introduces an explainable Artificial Intelligence (XAI) framework that couples legal-domain NLP with Structural Topic Modeling (STM) and WordNet semantic graphs to rigorously analyze over 1,900 GDPR enforcement decision summaries from a public dataset. Our methodology focuses on demonstrating the pipeline’s validity with respect to manual analyses by inspecting the results of four well-known research questions: (1) cross-country fine distribution disparities (automated metadata extraction); (2) the violation severity–fine amount relationship (keyness and semantic analysis); (3) structural text patterns (network analysis and STM); and (4) prevalent enforcement triggers (topic prevalence modeling). The pipeline’s validity is underscored by its ability to replicate key findings from previous manual analyses while enabling a more nuanced exploration of GDPR enforcement trends. Our results confirm significant disparities in enforcement across EU member states and reveal that monetary penalties do not consistently correlate with violation severity. Specifically, serious infringements, particularly those involving video surveillance, frequently result in low-value fines, especially when committed by individuals or smaller entities. This highlights that a substantial proportion of severe violations are attributed to smaller actors. Methodologically, the framework’s ability to quickly replicate such well-known patterns, alongside its transparency and reproducibility, establishes its potential as a scalable tool for transparent and explainable GDPR enforcement analytics.
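The abstract names keyness analysis for research question (2) but does not say which statistic the authors use. As a minimal sketch only, the block below assumes Dunning's log-likelihood (G2), a common keyness measure, and compares two tiny invented token lists standing in for low-fine and high-fine decision summaries. The function name `log_likelihood_keyness` and the toy tokens are hypothetical illustrations, not taken from the paper or its dataset.

```python
import math
from collections import Counter


def log_likelihood_keyness(target_tokens, reference_tokens):
    """Return {word: signed G2} for a target corpus versus a reference corpus.

    Positive scores mark words overrepresented in the target corpus,
    negative scores mark words overrepresented in the reference corpus.
    Assumes both token lists are non-empty.
    """
    t_counts = Counter(target_tokens)
    r_counts = Counter(reference_tokens)
    t_total = sum(t_counts.values())
    r_total = sum(r_counts.values())
    scores = {}
    for word in set(t_counts) | set(r_counts):
        a = t_counts[word]  # observed count in target
        b = r_counts[word]  # observed count in reference
        # Expected counts if the word had the same relative frequency in both.
        e1 = t_total * (a + b) / (t_total + r_total)
        e2 = r_total * (a + b) / (t_total + r_total)
        g2 = 0.0
        if a > 0:
            g2 += a * math.log(a / e1)
        if b > 0:
            g2 += b * math.log(b / e2)
        g2 *= 2
        sign = 1 if a / t_total >= b / r_total else -1
        scores[word] = sign * g2
    return scores


# Invented toy tokens (NOT from the GDPR dataset), for illustration only.
low_fine = ["video", "surveillance", "camera", "private",
            "individual", "camera", "gdpr"]
high_fine = ["data", "breach", "security", "measures",
             "company", "breach", "gdpr"]

scores = log_likelihood_keyness(low_fine, high_fine)

# List words from most typical of the low-fine texts to most typical
# of the high-fine texts.
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:12s} {score:+.2f}")
```

With real data, the two token lists would come from tokenized decision summaries split by fine amount. The split threshold and the preprocessing are further choices the abstract does not specify.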




Source journal: Computer Law & Security Review (LAW)

CiteScore: 5.60

Self-citation rate: 10.30%

Review time: 67 days

About the journal: CLSR publishes refereed academic and practitioner papers on topics such as Web 2.0, IT security, identity management, ID cards, RFID, interference with privacy, Internet law, telecoms regulation, online broadcasting, intellectual property, software law, e-commerce, outsourcing, data protection, EU policy, freedom of information, computer security and many other topics. In addition, it provides regular updates on European Union developments and national news from more than 20 jurisdictions in both Europe and the Pacific Rim. It is looking for papers within the subject area that display good quality legal analysis and new lines of legal thought or policy development that go beyond mere description of the subject area, however accurate that may be.