{"title":"Connecting algorithmic fairness to quality dimensions in machine learning in official statistics and survey production","authors":"Patrick Oliver Schenk, Christoph Kern","doi":"10.1007/s11943-024-00344-2","DOIUrl":"10.1007/s11943-024-00344-2","url":null,"abstract":"<div><p>National Statistical Organizations (NSOs) increasingly draw on Machine Learning (ML) to improve the timeliness and cost-effectiveness of their products. When introducing ML solutions, NSOs must ensure that high standards with respect to robustness, reproducibility, and accuracy are upheld as codified, e.g., in the Quality Framework for Statistical Algorithms (QF4SA; Yung et al. 2022, <i>Statistical Journal of the IAOS</i>). At the same time, a growing body of research focuses on fairness as a pre-condition of a safe deployment of ML to prevent disparate social impacts in practice. However, fairness has not yet been explicitly discussed as a quality aspect in the context of the application of ML at NSOs. We employ the QF4SA quality framework and present a mapping of its quality dimensions to algorithmic fairness. We thereby extend the QF4SA framework in several ways: First, we investigate the interaction of fairness with each of these quality dimensions. Second, we argue for fairness as its own, additional quality dimension, beyond what is contained in the QF4SA so far. Third, we emphasize and explicitly address data, both on its own and its interaction with applied methodology. In parallel with empirical illustrations, we show how our mapping can contribute to methodology in the domains of official statistics, algorithmic fairness, and trustworthy machine learning.</p><p>Little to no prior knowledge of ML, fairness, and quality dimensions in official statistics is required as we provide introductions to these subjects. 
These introductions are also targeted to the discussion of quality dimensions and fairness.</p></div>","PeriodicalId":100134,"journal":{"name":"AStA Wirtschafts- und Sozialstatistisches Archiv","volume":"18 2","pages":"131 - 184"},"PeriodicalIF":0.0,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s11943-024-00344-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142451095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated Bayesian variable selection methods for binary regression models with missing covariate data","authors":"Michael Bergrab, Christian Aßmann","doi":"10.1007/s11943-024-00345-1","DOIUrl":"10.1007/s11943-024-00345-1","url":null,"abstract":"<div><p>Data collection and the availability of large data sets have increased over the last decades. In both statistical and machine learning frameworks, two methodological issues typically arise when performing regression analysis on large data sets. First, variable selection is crucial in regression modeling, as it helps to identify an appropriate model with respect to the considered set of conditioning variables. Second, especially in the context of survey data, handling missing values, which occur even with state-of-the-art data collection and processing methods, is important for estimation. Within this paper, we provide a Bayesian approach based on a spike-and-slab prior for the regression coefficients, which allows for simultaneous handling of variable selection and estimation in combination with handling of missing values in covariate data. The paper also discusses the implementation of the approach using Markov chain Monte Carlo techniques and provides results for simulated data sets and an empirical illustration based on data from the German National Educational Panel Study. The suggested Bayesian approach is compared to other statistical and machine learning frameworks such as Lasso, ridge regression, and Elastic net, and is shown to perform well in terms of estimation performance and variable selection accuracy. The simulation results demonstrate that ignoring missing values in data sets can lead to biased selection results. 
Overall, the proposed Bayesian method offers a holistic, flexible, and powerful framework for variable selection in the presence of missing covariate data.</p></div>","PeriodicalId":100134,"journal":{"name":"AStA Wirtschafts- und Sozialstatistisches Archiv","volume":"18 2","pages":"203 - 244"},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s11943-024-00345-1.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142451122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fairness als Qualitätskriterium im Maschinellen Lernen – Rekonstruktion des philosophischen Konzepts und Implikationen für die Nutzung außergesetzlicher Merkmale bei qualifizierten Mietspiegeln","authors":"Ludwig Bothmann, Kristina Peters","doi":"10.1007/s11943-024-00346-0","DOIUrl":"10.1007/s11943-024-00346-0","url":null,"abstract":"<p>With the increasing use of machine learning (ML) models within automated decision-making systems, the demands on the quality of ML models are growing. Pure predictive performance is no longer the sole quality criterion; in particular, there are increasing calls for fairness aspects to be taken into account. This paper pursues two goals. First, it summarizes the current fairness discussion in ML (fairML) and describes the most recent developments, in particular regarding the philosophical foundations of the concept of fairness within ML. Second, it addresses the question of to what extent so-called “extra-legal” features may be used in the construction of qualified rent indices (qualifizierte Mietspiegel). A recent proposal by Kauermann and Windmann (AStA Wirtschafts- und Sozialstatistisches Archiv, volume 17, 2023) for using extra-legal features in qualified rent indices involves a model-based imputation method, which we compare against the legal requirements. 
Finally, we show which alternatives from fairML could be used and set out the different philosophical assumptions underlying the various methods.</p>","PeriodicalId":100134,"journal":{"name":"AStA Wirtschafts- und Sozialstatistisches Archiv","volume":"18 2","pages":"185 - 201"},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s11943-024-00346-0.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142451137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interview mit Walter Krämer","authors":"Ulrich Rendtel","doi":"10.1007/s11943-024-00343-3","DOIUrl":"10.1007/s11943-024-00343-3","url":null,"abstract":"","PeriodicalId":100134,"journal":{"name":"AStA Wirtschafts- und Sozialstatistisches Archiv","volume":"18 2","pages":"289 - 295"},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141825356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"„Mister SOEP et al.“ – ein Nachruf auf Gert G. Wagner","authors":"C. Katharina Spieß","doi":"10.1007/s11943-024-00342-4","DOIUrl":"10.1007/s11943-024-00342-4","url":null,"abstract":"","PeriodicalId":100134,"journal":{"name":"AStA Wirtschafts- und Sozialstatistisches Archiv","volume":"18 2","pages":"297 - 300"},"PeriodicalIF":0.0,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s11943-024-00342-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141663187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Observer—a guide to data that can help to inform evidence-based policymaking","authors":"Joachim Wagner","doi":"10.1007/s11943-024-00341-5","DOIUrl":"10.1007/s11943-024-00341-5","url":null,"abstract":"<div><p>For many attempts to inform evidence-based policymaking (or policy-makers in general) researchers have to rely on already available (instead of newly collected) data. These data have to be reliable, accessible (at best, without high hurdles, and with low or no fees to be paid) and findable. One way that helps to find suitable data that are easily accessible (and hopefully reliable) is to look at the contributions published in the <i>Data Observer</i> series described in this paper.</p></div>","PeriodicalId":100134,"journal":{"name":"AStA Wirtschafts- und Sozialstatistisches Archiv","volume":"18 2","pages":"279 - 287"},"PeriodicalIF":0.0,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142451138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flat rent price prediction in Berlin with web scraping","authors":"Camilo Meyberg, Ulrich Rendtel, Holger Leerhoff","doi":"10.1007/s11943-024-00340-6","DOIUrl":"10.1007/s11943-024-00340-6","url":null,"abstract":"<div><p>Internet data pose a challenge to the traditional system of official statistics, which relies on more conventional sources such as surveys and registers, not readily adaptable to rapid changes. Expanding this system to include internet data is currently at an experimental stage, exploring these sources’ potentials and benefits. This paper describes a project conducted within the ESSnet <i>Trusted Smart Statistics – Web Intelligence Network</i> framework. It investigates the use of online apartment listings to analyze the rental market. We used web scraping to extract information from two online real estate portals for flats in the city of Berlin. Using this data, we developed a model to predict rental prices per square meter based on the accommodation’s features and location within the city. We detected offers which appear in both portals by means of statistical matching and removed duplicate offers. Missing values were treated by multiple imputation. The prediction model is a semi-parametric approach where the postal districts are used to describe the location effect. Comparisons with microcensus results and the local rent index reveal significant differences between the market of online flat offers and the stock of existing flat contracts. 
Interested readers will find the commented programming code in the internet supplement.</p></div>","PeriodicalId":100134,"journal":{"name":"AStA Wirtschafts- und Sozialstatistisches Archiv","volume":"18 2","pages":"245 - 278"},"PeriodicalIF":0.0,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s11943-024-00340-6.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142451139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vorwort der Herausgeber","authors":"Markus Zwick, Jan Pablo Burgard","doi":"10.1007/s11943-024-00339-z","DOIUrl":"10.1007/s11943-024-00339-z","url":null,"abstract":"","PeriodicalId":100134,"journal":{"name":"AStA Wirtschafts- und Sozialstatistisches Archiv","volume":"18 1","pages":"1 - 4"},"PeriodicalIF":0.0,"publicationDate":"2024-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s11943-024-00339-z.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142412142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}