Quality & Quantity | Pub Date: 2026-01-01 | Epub Date: 2025-07-08 | DOI: 10.1007/s11135-025-02261-0
Orfeas Menis-Mastromichalakis, George Filandrianos, Maria Symeonaki, Glykeria Stamatopoulou, Dimitris Parsanoglou, Giorgos Stamou
{"title":"Gender bias in machine learning: insights from official labour statistics and textual analysis.","authors":"Orfeas Menis-Mastromichalakis, George Filandrianos, Maria Symeonaki, Glykeria Stamatopoulou, Dimitris Parsanoglou, Giorgos Stamou","doi":"10.1007/s11135-025-02261-0","DOIUrl":"https://doi.org/10.1007/s11135-025-02261-0","url":null,"abstract":"<p><p>The interplay between technology and societal norms often reveals a troubling reality: machine learning systems not only reflect existing gender stereotypes but can also amplify and entrench them, making these biases harder to detect and address. This paper adopts an interdisciplinary approach, combining quantitative and qualitative methods with recent technological advancements, such as machine learning techniques for textual analysis and computational linguistics, to offer a new framework for understanding occupational gender bias in machine learning. The study is motivated by persistent gender inequalities in the labour market and rising concerns about gendered algorithmic bias, as outlined in the European Commission's Gender Equality Strategy 2020-2025. Focusing on language translation technologies, the research explores how machine learning may perpetuate or amplify gender stereotypes, aiming to foster more inclusive digital systems aligned with EU strategic goals. More specifically, it investigates occupational gender segregation and its manifestations in various forms of gender bias in machine learning across English, French, and Greek. The study introduces a classification of gender biases in machine learning, providing insights into professional areas needing intervention to address gender imbalances and identifying enduring stereotypical representations in textual data. 
To support this, statistical analysis is conducted to explore gender variations in occupations over the past thirteen years, using official data and international classifications such as the International Standard Classification of Occupations (ISCO-08). Moreover, gendered occupational distributions are extracted from 200,920 text instances in the three languages, revealing significant discrepancies between official labour statistics and the training data.</p>","PeriodicalId":49649,"journal":{"name":"Quality & Quantity","volume":"60 1","pages":"619-653"},"PeriodicalIF":0.0,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12920283/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147272537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quality & Quantity | Pub Date: 2026-01-01 | Epub Date: 2026-01-03 | DOI: 10.1007/s11135-025-02519-7
Patricia A Iglesias
{"title":"Unlocking insights: assessing the quality of conventional and image-based responses on books at home in an online mobile survey.","authors":"Patricia A Iglesias","doi":"10.1007/s11135-025-02519-7","DOIUrl":"https://doi.org/10.1007/s11135-025-02519-7","url":null,"abstract":"<p><p>Despite growing interest in collecting photos within online surveys, little is known about the quality of visual data and its comparison with data obtained through conventional requests. To address this gap, a self-administered online mobile survey targeting parents of children attending primary school in Spain was conducted through the Netquest opt-in panel in 2023. The survey gathered information about books in respondents' homes through photos and conventional questions. First, a review of previous research using conventional questions, photos, and other emerging data types was conducted to identify indicators suitable to evaluate the quality of the information about books at home collected through conventional and image-based formats. Second, most of these indicators to measure quality were estimated. Results reveal important measurement errors in conventional questions, while photos submitted by respondents are generally in line and can be classified. However, concrete information of interest about the books, such as the intended audience or languages, is often difficult to extract from photos. When comparing quality, conventional answers provide more information about the items asked than photos, but photos have the potential to provide additional insights, such as book titles. 
Overall, while collecting and analyzing photos sent through surveys presents challenges, their integration into surveys offers unique opportunities to enrich data collection methods.</p>","PeriodicalId":49649,"journal":{"name":"Quality & Quantity","volume":"60 2","pages":"6619-6643"},"PeriodicalIF":0.0,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13083421/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147724385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quality & Quantity | Pub Date: 2026-01-01 | Epub Date: 2025-07-26 | DOI: 10.1007/s11135-025-02266-9
Ettore Settanni, Jagjit Singh Srai
{"title":"It's a long way to the top (if you wanna biplot): a back-to-basics perspective on the implementation of principal component biplots in R.","authors":"Ettore Settanni, Jagjit Singh Srai","doi":"10.1007/s11135-025-02266-9","DOIUrl":"https://doi.org/10.1007/s11135-025-02266-9","url":null,"abstract":"<p><p>Principal Component Analysis and biplots are so well-established and readily implemented that it is just too tempting to take for granted their internal workings. In this note we compare how PCA and biplots are implemented in the R language for statistical computing, leveraging a software-agnostic understanding of computational building-blocks that both techniques have in common. We do so with a view to illustrating discrepancies that users might find elusive, as these arise from seemingly innocuous computational choices made under the hood. Wider implications are derived from a simplified case based on real-world clinical trial supply chains data. By getting back to basics, the proposed evaluation grid elevates aspects that are usually disregarded, including relationships that should hold if the computational rationale underpinning each technique is followed correctly. Strikingly, what is expected from these equivalences rarely follows without caveats from the output of specific implementations alone.</p>","PeriodicalId":49649,"journal":{"name":"Quality & Quantity","volume":"60 1","pages":"1173-1213"},"PeriodicalIF":0.0,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12920800/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147272649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
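The abstract above refers to relationships that should hold when the computational rationale behind PCA and biplots is followed correctly. The following is a minimal sketch of two such relationships, written in Python with NumPy rather than the R implementations the paper compares, and it is not the authors' evaluation grid. It assumes column-centred data, a covariance divisor of n - 1, and a biplot scaling exponent `alpha`; the function names and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def pca_two_ways(X):
    """Compute PCA variances via eigendecomposition and via SVD."""
    Xc = X - X.mean(axis=0)              # column-centre the data
    n = Xc.shape[0]
    # Route 1: eigenvalues of the sample covariance matrix (divisor n - 1),
    # reversed so that they are in descending order.
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
    # Route 2: singular values of the centred data matrix.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    svd_vars = s**2 / (n - 1)
    return Xc, eigvals, svd_vars, U, s, Vt

def biplot_markers(U, s, Vt, alpha=1.0):
    """Split the SVD into row markers G and column markers H.

    alpha is an assumed scaling exponent: G = U S^alpha, H = V S^(1 - alpha),
    so that G @ H.T reproduces the centred data when all components are kept.
    """
    G = U * s**alpha
    H = Vt.T * s**(1.0 - alpha)
    return G, H

# Synthetic data, used only to exercise the two checks.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))

Xc, eigvals, svd_vars, U, s, Vt = pca_two_ways(X)
G, H = biplot_markers(U, s, Vt, alpha=0.5)

variances_agree = np.allclose(eigvals, svd_vars)   # eigen route vs SVD route
reconstruction_ok = np.allclose(G @ H.T, Xc)       # biplot factorisation
```

Changing the covariance divisor from n - 1 to n, or skipping the centring step, breaks the first check. That is the kind of seemingly innocuous computational choice the abstract describes.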
Quality & Quantity | Pub Date: 2025-01-01 | Epub Date: 2024-10-08 | DOI: 10.1007/s11135-024-01983-x
Thijs C Carrière, Laura Boeschoten, Bella Struminskaya, Heleen L Janssen, Niek C de Schipper, Theo Araujo
{"title":"Best practices for studies using digital data donation.","authors":"Thijs C Carrière, Laura Boeschoten, Bella Struminskaya, Heleen L Janssen, Niek C de Schipper, Theo Araujo","doi":"10.1007/s11135-024-01983-x","DOIUrl":"10.1007/s11135-024-01983-x","url":null,"abstract":"<p><p>Digital trace data form a rich, growing source of data for social sciences and humanities. Data donation offers an innovative and ethical approach to collect these digital trace data. In data donation studies, participants request a copy of the digital trace data a data controller (e.g., large digital social media or video platforms) collected about them. The European Union's General Data Protection Regulation obliges platforms to provide such a copy. Next, the participant can choose to share (part of) this data copy with the researcher. This way, the researcher can obtain the digital trace data of interest with active consent of the participant. Setting up a data donation study involves several steps and considerations. If executed poorly, these steps might threaten a study's quality. In this paper, we introduce a workflow for setting up a robust data donation study. This workflow is based on error sources identified in the Total Error Framework for data donation by Boeschoten et al. (2022a) as well as on experiences in earlier data donation studies by the authors. The workflow is discussed in detail and linked to challenges and considerations for each step. 
We aim to provide a starting point with guidelines for researchers seeking to set up and conduct a data donation study.</p>","PeriodicalId":49649,"journal":{"name":"Quality & Quantity","volume":"59 Suppl 1","pages":"389-412"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11971172/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143796904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quality & Quantity | Pub Date: 2025-01-01 | Epub Date: 2025-01-29 | DOI: 10.1007/s11135-024-02034-1
Mykola Makhortykh, Ernesto de León, Clara Christner, Maryna Sydorova, Aleksandra Urman, Silke Adam, Michaela Maier, Teresa Gil-Lopez
{"title":"Is a single model enough? The systematic comparison of computational approaches for detecting populist radical right content.","authors":"Mykola Makhortykh, Ernesto de León, Clara Christner, Maryna Sydorova, Aleksandra Urman, Silke Adam, Michaela Maier, Teresa Gil-Lopez","doi":"10.1007/s11135-024-02034-1","DOIUrl":"10.1007/s11135-024-02034-1","url":null,"abstract":"<p><p>The rise of populist radical right (PRR) ideas stresses the importance of understanding how individuals engage with PRR content online. However, this task is complicated by the variety of channels through which such engagement can take place. In this article, we systematically compare computational approaches for detecting PRR content in textual data. Using 66 dictionary, classic supervised machine learning, and deep learning (DL) models, we compare how these distinct approaches perform on the PRR detection task for three Germanophone test datasets and how their performance is affected by different modes of text preprocessing. In addition to individual models, we examine the performance of 330 ensemble models combining the above-mentioned approaches for the dataset with a particularly high volume of noise. Our findings demonstrate that the DL models, in combination with more computationally intense forms of preprocessing, show the best performance among the individual models, but it remains suboptimal in the case of more noisy datasets. 
While the use of ensemble models shows some improvement for specific modes of preprocessing, overall, it mostly remains on par with individual DL models, thus stressing the challenging nature of computational detection of PRR content.</p>","PeriodicalId":49649,"journal":{"name":"Quality & Quantity","volume":"59 Suppl 2","pages":"1163-1207"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12055619/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144043246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quality & Quantity | Pub Date: 2025-01-01 | Epub Date: 2025-04-29 | DOI: 10.1007/s11135-025-02188-6
Jessica A R Logan, Allyson L Hayward, Lexi E Swanz, Ayse Busra Ceviren
{"title":"Education researchers' beliefs and barriers towards data sharing.","authors":"Jessica A R Logan, Allyson L Hayward, Lexi E Swanz, Ayse Busra Ceviren","doi":"10.1007/s11135-025-02188-6","DOIUrl":"10.1007/s11135-025-02188-6","url":null,"abstract":"<p><p>Data sharing is increasingly becoming a highly encouraged or required practice for any federally funded research projects. However, the uptake of these practices in education science has been minimal. Research suggests that many researchers believe data sharing should be practiced always or often, but also suggests that many researchers rarely practice data sharing. This disconnect indicates a general lack of understanding around data sharing and suggests there are salient barriers that prevent education researchers from engaging in the practice. This work examines (a) the prevalence of positive attitudes and perceived barriers to data sharing in a sample of education researchers, and (b) if there is a difference between the perceived barriers for researchers who have different levels of data sharing experience. Results suggest education researchers generally hold positive attitudes towards data sharing, with 70% of the sample agreeing that it benefits their career, increases citations, and is good for science. However, barriers such as concerns about IRB issues and the potential for misinterpretation of shared data were prevalent among respondents. 
Additionally, researchers with more experience sharing data were less likely to agree with these barriers compared to those with less or no sharing experience.</p><p><strong>Supplementary information: </strong>The online version contains supplementary material available at 10.1007/s11135-025-02188-6.</p>","PeriodicalId":49649,"journal":{"name":"Quality & Quantity","volume":"59 5","pages":"4061-4075"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12476402/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145193744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quality & Quantity | Pub Date: 2025-01-01 | Epub Date: 2024-12-26 | DOI: 10.1007/s11135-024-02028-z
Kevin Emery, Matthias Studer, André Berchtold
{"title":"Comparison of imputation methods for univariate categorical longitudinal data.","authors":"Kevin Emery, Matthias Studer, André Berchtold","doi":"10.1007/s11135-024-02028-z","DOIUrl":"10.1007/s11135-024-02028-z","url":null,"abstract":"<p><p>The life course paradigm emphasizes the need to study not only the situation at a given point in time, but also its evolution over the life course in the medium and long term. These trajectories are often represented by categorical data. This article aims to provide a comprehensive review of the multiple imputation methods proposed so far in the context of univariate categorical data and to assess their practical relevance through a simulation study based on real data. The primary goal is to provide clear methodological guidelines and improve the handling of missing data in life course research. In parallel, we develop the MICT-timing algorithm, which is an extension of the MICT algorithm. This innovative multiple imputation method improves the quality of imputation in trajectories subject to time-varying transition rates, a situation often encountered in life course data.</p><p><strong>Supplementary information: </strong>The online version contains supplementary material available at 10.1007/s11135-024-02028-z.</p>","PeriodicalId":49649,"journal":{"name":"Quality & Quantity","volume":"59 2","pages":"1767-1791"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12104099/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144163549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quality & Quantity | Pub Date: 2024-01-01 | Epub Date: 2024-05-07 | DOI: 10.1007/s11135-024-01881-2
Vytaras Brazauskas, Francesca Greselin, Ričardas Zitikis
{"title":"Measuring income inequality via percentile relativities.","authors":"Vytaras Brazauskas, Francesca Greselin, Ričardas Zitikis","doi":"10.1007/s11135-024-01881-2","DOIUrl":"10.1007/s11135-024-01881-2","url":null,"abstract":"<p><p>The adage \"the rich are getting richer\" refers to increasingly skewed and heavily-tailed income distributions. For such distributions, the mean is not the best measure of the center, but the classical indices of income inequality, including the celebrated Gini index, are mean based. In view of this, it has been proposed in the literature to incorporate the median into the definition of the Gini index. In the present paper we make a further step in this direction and, to acknowledge the possibility of differing viewpoints, investigate three median-based indices of inequality. These indices overcome past limitations, such as: (1) they do not rely on the mean as the center of, or a reference point for, income distributions, which are skewed, and are getting even more heavily skewed; (2) they are suitable for populations of any degree of tail heaviness, and income distributions are becoming increasingly such; and (3) they are unchanged by, and even discourage, transfers among the rich persons, but they encourage transfers from the rich to the poor, as well as among the poor to alleviate their hardship. We study these indices analytically and numerically using various income distribution models. 
Real-world applications are showcased using capital incomes from 2001 and 2018 surveys from fifteen European countries.</p>","PeriodicalId":49649,"journal":{"name":"Quality & Quantity","volume":"58 5","pages":"4859-4896"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11415483/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142299564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
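The abstract above contrasts the mean-based Gini index with variants that bring in the median. The following is a minimal sketch of that contrast in Python with NumPy. It computes the classical Gini index as the mean absolute difference divided by twice the mean, and then an illustrative variant that divides by twice the median instead. The median-scaled form is an assumption made here for illustration and should not be read as any of the three indices the paper investigates. The function names and the sample incomes are hypothetical.

```python
import numpy as np

def mean_abs_difference(x):
    """Mean of |x_i - x_j| over all ordered pairs (i, j), including i == j."""
    x = np.asarray(x, dtype=float)
    return np.abs(x[:, None] - x[None, :]).mean()

def gini_mean_based(x):
    """Classical Gini index: mean absolute difference over twice the mean."""
    return mean_abs_difference(x) / (2.0 * np.mean(x))

def gini_median_scaled(x):
    """Illustrative variant (an assumption): scale by the median, not the mean."""
    return mean_abs_difference(x) / (2.0 * np.median(x))

# Hypothetical right-skewed incomes: one large value pulls the mean upwards.
incomes = np.array([10.0, 20.0, 30.0, 40.0, 400.0])

g_mean = gini_mean_based(incomes)       # mean = 100, so 128 / 200 = 0.64
g_median = gini_median_scaled(incomes)  # median = 30, so 128 / 60
```

In this skewed sample the mean (100) sits well above the median (30), so the median-scaled value is larger than the classical one and exceeds 1. This shows that swapping the reference point changes the scale of the index as well as its interpretation.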