A semantic approach to understanding GDPR fines: From text to compliance insights
Authors: Albina Orlando, Mario Santoro
Journal: Computer Law & Security Review, volume 59, Article 106187 (impact factor 3.2; JCR Q1, Law)
Published: 2025-09-26
DOI: 10.1016/j.clsr.2025.106187
URL: https://www.sciencedirect.com/science/article/pii/S2212473X25000598
Citations: 0
Abstract
This study introduces an explainable Artificial Intelligence (XAI) framework that couples legal-domain NLP with Structural Topic Modeling (STM) and WordNet semantic graphs to rigorously analyze over 1,900 GDPR enforcement decision summaries from a public dataset. Our methodology focuses on demonstrating the pipeline’s validity with respect to manual analyses by inspecting the results for four well-known research questions: (1) cross-country fine distribution disparities (automated metadata extraction); (2) the violation severity–fine amount relationship (keyness and semantic analysis); (3) structural text patterns (network analysis and STM); and (4) prevalent enforcement triggers (topic prevalence modeling). The pipeline’s validity is underscored by its ability to replicate key findings from previous manual analyses while enabling a more nuanced exploration of GDPR enforcement trends. Our results confirm significant disparities in enforcement across EU member states and reveal that monetary penalties do not consistently correlate with violation severity. Specifically, serious infringements, particularly those involving video surveillance, frequently result in low-value fines, especially when committed by individuals or smaller entities. This highlights that a substantial proportion of severe violations are attributed to smaller actors. Methodologically, the framework’s ability to quickly replicate such well-known patterns, alongside its transparency and reproducibility, establishes its potential as a scalable tool for transparent and explainable GDPR enforcement analytics.
About the journal:
CLSR publishes refereed academic and practitioner papers on topics such as Web 2.0, IT security, identity management, ID cards, RFID, interference with privacy, Internet law, telecoms regulation, online broadcasting, intellectual property, software law, e-commerce, outsourcing, data protection, EU policy, freedom of information, computer security and many other topics. In addition, it provides regular updates on European Union developments and national news from more than 20 jurisdictions in both Europe and the Pacific Rim. It is looking for papers within the subject area that display good quality legal analysis and new lines of legal thought or policy development that go beyond mere description of the subject area, however accurate that may be.