PLOS digital health | Pub Date: 2025-02-07 | eCollection Date: 2025-02-01 | DOI: 10.1371/journal.pdig.0000734
Thierry Jean, Rose Guay Hottin, Pierre Orban
Title: Forecasting mental states in schizophrenia using digital phenotyping data.
Abstract: Machine learning models that successfully exploit digital phenotyping data to forecast mental states in psychiatric populations could greatly improve clinical practice. Previous research focused on binary classification and continuous regression, disregarding the often ordinal nature of prediction targets derived from clinical rating scales. In addition, mental health ratings typically show substantial class imbalance or skewness that must be accounted for when evaluating predictive performance. Moreover, it remains unclear which machine learning algorithm is best suited for forecasting tasks; the eXtreme Gradient Boosting (XGBoost) and long short-term memory (LSTM) algorithms are 2 popular choices in digital phenotyping studies. The CrossCheck dataset includes 6,364 mental state surveys using 4-point ordinal rating scales and 23,551 days of smartphone sensor data contributed by patients with schizophrenia. We trained 120 machine learning models to forecast 10 mental states (e.g., Calm, Depressed, Seeing things) from passive sensor data on 2 predictive tasks (ordinal regression, binary classification) with 2 learning algorithms (XGBoost, LSTM) over 3 forecast horizons (same day, next day, next week). Most ordinal regression and binary classification models performed significantly above baseline, with macro-averaged mean absolute error values between 1.19 and 0.77 and balanced accuracy between 58% and 73%, which correspond to similar levels of performance when these metrics are scaled. Results also showed that metrics that do not account for imbalance (mean absolute error, accuracy) systematically overestimated performance, that XGBoost models performed on par with or better than LSTM models, and that performance decreased significantly, though only very slightly, as the forecast horizon expanded. In conclusion, when performance metrics that properly account for class imbalance are used, ordinal forecast models perform comparably to the prevalent binary classification approach without losing valuable clinical information from self-reports, thus providing richer and easier-to-interpret predictions.
Citation: PLOS digital health, 4(2), e0000734.
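The abstract's point that imbalance-blind metrics overestimate performance can be made concrete with a small sketch. The Python below is not the authors' code: it assumes the common definitions (macro-averaged MAE as the mean of per-class MAEs over the classes present in the ground truth; balanced accuracy as mean per-class recall), and the helper names and toy labels are hypothetical, chosen to mimic a skewed 4-point rating scale coded 0-3.

```python
# Illustrative sketch only; not the authors' code. Names and data are hypothetical.
from collections import defaultdict


def mae(y_true, y_pred):
    """Plain mean absolute error: every sample carries the same weight."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_mae(y_true, y_pred):
    """Macro-averaged MAE: MAE within each true class, averaged over classes,
    so rare ratings weigh as much as common ones."""
    errors = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        errors[t].append(abs(t - p))
    return sum(sum(e) / len(e) for e in errors.values()) / len(errors)


def accuracy(y_true, y_pred):
    """Fraction of exact matches; dominated by the majority class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    hits, counts = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        counts[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / counts[c] for c in counts) / len(counts)


if __name__ == "__main__":
    # Hypothetical, heavily skewed ratings on a 0-3 scale (100 samples).
    y_true = [0] * 90 + [1] * 5 + [2] * 3 + [3] * 2
    # A "model" that always predicts the majority rating.
    y_pred = [0] * len(y_true)
    print(f"MAE:               {mae(y_true, y_pred):.2f}")                # 0.17
    print(f"macro-MAE:         {macro_mae(y_true, y_pred):.2f}")          # 1.50
    print(f"accuracy:          {accuracy(y_true, y_pred):.2f}")           # 0.90
    print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.2f}")  # 0.25
```

On these toy labels the constant majority-class predictor looks strong by MAE and accuracy, while macro-averaged MAE and balanced accuracy (25%, chance level for four classes) show that it has learned nothing.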
PLOS digital health | Pub Date: 2025-02-05 | eCollection Date: 2025-02-01 | DOI: 10.1371/journal.pdig.0000524
Austen El-Osta, Mahmoud Al Ammouri, Shujhat Khan, Sami Altalib, Manisha Karki, Eva Riboli-Sasco, Azeem Majeed
Title: Community perspectives regarding brain-computer interfaces: A cross-sectional study of community-dwelling adults in the UK.
Background: Brain-computer interfaces (BCIs) represent a ground-breaking advancement in neuroscience, facilitating direct communication between the brain and external devices. This technology has the potential to significantly improve the lives of individuals with neurological disorders by providing innovative solutions for rehabilitation, communication and personal autonomy. However, despite rapid progress in BCI technology and social media discussions around Neuralink, public perceptions and ethical considerations concerning BCIs, particularly within community settings in the UK, have not been thoroughly investigated.
Objective: The primary aim of this study was to investigate public knowledge, attitudes and perceptions regarding BCIs, including ethical considerations. The study also explored whether demographic factors were related to beliefs about BCIs increasing inequalities, support for strict regulations, and perceptions of appropriate fields for BCI design, testing and utilization in healthcare.
Methods: This cross-sectional study was conducted between 1 December 2023 and 8 March 2024. The survey included 29 structured questions covering demographics, awareness of BCIs, ethical considerations and willingness to use BCIs for various applications. The survey was distributed via the Imperial College Qualtrics platform. Participants were recruited primarily through Prolific Academic's panel and personal networks. Data analysis involved summarizing responses using frequencies and percentages, with chi-squared tests to compare groups. All data were securely stored and pseudo-anonymized to ensure confidentiality.
Results: Of the 950 invited respondents, 846 participated and 806 completed the survey. The demographic profile was diverse, with the largest share of respondents aged 36-45 years (26%), a balanced gender split (52% female), and most identifying as White (86%). Most respondents (98%) had never used BCIs, and 65% were unaware of them prior to the survey. Preferences for BCI types varied by condition. Ethical concerns were prevalent, particularly regarding implantation risks (98%) and costs (92%). Significant associations were observed between demographic variables and perceptions of BCIs regarding inequalities, regulation and their application in healthcare.
Conclusion: Despite strong interest in BCIs, particularly for medical applications, ethical, safety and privacy concerns remain significant, highlighting the need for clear regulatory frameworks and ethical guidelines, as well as educational initiatives to improve public understanding and trust. Promoting public discourse and involving stakeholders, including potential users, ethicists and technologists, in the design process through co-design principles can help align technological development with public concerns whilst also helping developers proactively address ethical dilemmas.
Citation: PLOS digital health, 4(2), e0000524. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11798465/pdf/
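The Methods mention chi-squared tests for comparing groups. As a hedged illustration, not the authors' analysis, the sketch below computes Pearson's chi-squared statistic of independence for a contingency table of counts; the 2x2 table of age group by support for strict regulation is entirely hypothetical, as are the function names. For 1 degree of freedom the p-value follows from the complementary error function, because a chi-squared variable with 1 degree of freedom is the square of a standard normal; other degrees of freedom would need a chi-squared survival function such as `scipy.stats.chi2.sf`. The sketch omits Yates' continuity correction, which `scipy.stats.chi2_contingency` applies by default to 2x2 tables.

```python
# Illustrative sketch only; not the authors' analysis. Names and counts are hypothetical.
import math


def chi_squared_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c
    contingency table of observed counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_1_dof(stat):
    """Upper-tail p-value for a chi-squared statistic with 1 degree of freedom:
    P(Z**2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical counts: rows = age group (<45, >=45),
    # columns = supports strict BCI regulation (yes, no).
    table = [[30, 20],
             [20, 30]]
    stat, dof = chi_squared_independence(table)
    print(stat, dof)                      # 4.0 1
    print(round(p_value_1_dof(stat), 4))  # 0.0455
```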
PLOS digital health | Pub Date: 2025-02-05 | eCollection Date: 2025-02-01 | DOI: 10.1371/journal.pdig.0000736
Qingqing Chen, Andrew Crooks, Adam J Sullivan, Jennifer A Surtees, Laurene Tumiel-Berhalter
Title: From print to perspective: A mixed-method analysis of the convergence and divergence of COVID-19 topics in newspapers and interviews.
Abstract: In the face of the unprecedented COVID-19 pandemic, various government-led initiatives and individual actions (e.g., lockdowns, social distancing, and masking) have resulted in diverse pandemic experiences. This study aims to explore these varied experiences to inform more proactive responses to future public health crises. Employing a novel "big-thick" data approach, we analyzed and compared key pandemic-related topics disseminated to the public through newspapers with those collected from the public via interviews. Specifically, we used 82,533 U.S. newspaper articles from January 2020 to December 2021 and supplemented this "big" dataset with "thick" data from interviews and focus groups for topic modeling. Identified key topics were contextualized, compared and visualized at different scales to reveal areas of convergence and divergence. We found seven key topics in the "big" newspaper dataset, providing a macro-level view that covers public health, policies and economics. Conversely, three divergent topics were derived from the "thick" interview data, offering a micro-level view that focuses more on individuals' experiences, emotions and concerns. A notable finding is the public's concern about the reliability of news information, suggesting the need for further investigation of the role of mass media in shaping the public's perception and behavior. Overall, by exploring the convergence and divergence in identified topics, our study offers new insights into the complex impacts of the pandemic and enhances our understanding of key issues both disseminated to and resonating with the public, informing future health communication and policy-making.
Citation: PLOS digital health, 4(2), e0000736. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11798470/pdf/
PLOS digital health | Pub Date: 2025-02-05 | eCollection Date: 2025-02-01 | DOI: 10.1371/journal.pdig.0000723
Christine Mulligan, Grace Gillis, Lauren Remedios, Christopher Parsons, Laura Vergeer, Monique Potvin Kent
Title: Children's digital privacy on fast-food and dine-in restaurant mobile applications.
Abstract: Children are targeted by unhealthy food marketing on digital media, which influences their food preferences, intake and risk of non-communicable disease. Restaurant mobile applications are powerful platforms for collecting users' data and are popular among children. This study aimed to provide insight into the privacy policies of top dine-in and fast-food mobile apps in Canada and into the data they collect on child users. Privacy policies of the top 30 fast-food and dine-in restaurants in Canada were reviewed. A convenience sample of 11 English-speaking Canadian residents aged 9-12 years with fast-food apps on their mobile phones was recruited to use ≥1 fast-food restaurant mobile app(s). Children used the app(s) for 5-10 minutes and placed food orders. Parents then submitted a Data Access Request (DAR) to the food company on their child's behalf. Data collected through DARs were evaluated using descriptive analysis and a flexible deductive approach to content analysis. Overall, 26 privacy policies were analyzed. The intended age of app users was indicated by 12 (46%) food companies, 10 (39%) of which specified it as ≥13 years. No company had a compulsory age verification process. Twenty-four (92%) companies disclosed the data collected on app users: 23 (89%) did not distinguish between information pertaining to children and to adults, and 21 (81%) described a protocol for action if they inadvertently collected data on children. Twenty-four DARs were sent to companies; 11 (45.8%) were fulfilled, and 4 (16.7%) resulted in the receipt of children's data. All responding food companies were found to collect sociodemographic information on child participants (e.g., name, email). Some collected other information, such as order details and available promotional offers. This study demonstrates that current fast-food and dine-in restaurant privacy policies are insufficient and provides insight into the data collected on children via fast-food apps. Policies must be strengthened to ensure children's privacy and protection online.
Citation: PLOS digital health, 4(2), e0000723. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11798428/pdf/
PLOS digital health | Pub Date: 2025-02-05 | eCollection Date: 2025-02-01 | DOI: 10.1371/journal.pdig.0000543
Michael Owusu-Adjei, James Ben Hayfron-Acquah, Twum Frimpong, Abdul-Salaam Gaddafi
Title: An AI-based approach to predict delivery outcome based on measurable factors of pregnant mothers.
Abstract: Every expectant mother and her family hope for a safe delivery mode that preserves the lives of both mother and child with minimal or no complications before, during and after childbirth. However, the choice of a particular delivery mode is thought to be influenced by a number of factors. Factors identified so far include maternal birth history and the maternal and child health conditions prevailing before and during the onset of labor. Predictive modeling has been used extensively in related studies to identify important contributory factors influencing delivery choice. However, maternal history of spontaneous, threatened and inevitable abortion is missing from the many features used in these studies, and how its inclusion affects delivery outcome has not been extensively researched. This study therefore uses measurable maternal features, including real-time information from administered partographs, to predict delivery outcome. A feature selection technique is adopted to estimate each variable's relationship with the target variable. Three supervised learning techniques are used and evaluated for performance. The area under the curve (AUC) was 91% for the Gradient Boosting classifier, 93% for Logistic Regression and 91% for Random Forest. Balanced accuracy scores were 82.73% for Gradient Boosting, 84.62% for Logistic Regression and 83.02% for Random Forest. Correlation statistics testing independence among the input variables showed that delivery outcome type is associated with fetal gestational age and the progress of maternal cervical dilatation during labor.
Citation: PLOS digital health, 4(2), e0000543. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11798466/pdf/
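The abstract reports both area under the ROC curve and balanced accuracy, and the two measure different things. AUC is threshold-free: it equals the probability that a randomly chosen positive case is scored above a randomly chosen negative one, whereas accuracy depends on the decision threshold used to turn scores into labels. The sketch below is a hypothetical illustration, not the authors' pipeline: labels and scores are invented (1 could stand for one delivery outcome, 0 for the other), the function names are assumptions, and AUC is computed by direct pairwise comparison, which is O(n²) and meant only for clarity.

```python
# Illustrative sketch only; not the authors' pipeline. Names and data are hypothetical.


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count one half); equal to the Mann-Whitney U statistic
    divided by n_pos * n_neg."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_at(labels, scores, threshold=0.5):
    """Accuracy after turning scores into 0/1 predictions at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


if __name__ == "__main__":
    # Hypothetical outcomes and model scores.
    labels = [0, 0, 0, 0, 1, 1]
    scores = [0.10, 0.20, 0.30, 0.40, 0.45, 0.48]
    print(roc_auc(labels, scores))                # 1.0 (perfect ranking)
    print(round(accuracy_at(labels, scores), 3))  # 0.667 (threshold 0.5 misses both positives)
```

A model can therefore rank cases perfectly (AUC of 1.0) while still misclassifying every positive case at a poorly chosen threshold, which is why AUC values should not be read as accuracy.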
PLOS digital health | Pub Date: 2025-02-05 | eCollection Date: 2025-02-01 | DOI: 10.1371/journal.pdig.0000729
Ashley Amson, Mariangela Bagnato, Lauren Remedios, Meghan Pritchard, Soulene Sabir, Grace Gillis, Elise Pauzé, Christine White, Lana Vanderlee, David Hammond, Monique Potvin Kent
Title: Beyond the screen: Exploring the dynamics of social media influencers, digital food marketing, and gendered influences on adolescent diets.
Abstract: Adolescent obesity remains a public health concern, exacerbated by unhealthy food marketing, particularly on digital platforms. Social media influencers are increasingly used in digital marketing, yet their impact remains understudied. This research explores the frequency of posts containing food products/brands, the most promoted food categories, the healthfulness of featured products, and the types of marketing techniques used by social media influencers popular with male and female adolescents. By analyzing these factors from a gender perspective, the study aims to provide a deeper understanding of how social media influencer marketing might contribute to adolescents' dietary choices and health outcomes, an important yet underexplored aspect of food marketing. A content analysis was conducted on posts (N = 1373) made between June 1, 2021, and May 31, 2022, by the top three social media influencers popular with male and female adolescents (aged 13-17 years) on Instagram, TikTok, and YouTube. Descriptive statistics were used to calculate frequencies for posts containing food products/brands, promoted food categories, product healthfulness, and marketing techniques. Health Canada's Nutrient Profile Model was used to classify products as either healthy or less healthy based on their sugar, sodium, and saturated fat content. Influencers popular with males featured 1 food product/brand for every 2.5 posts, compared to 1 for every 6.1 posts for influencers popular with females. Water (27% of posts) was the primary food category for influencers popular with females, while restaurants (24% of posts) dominated for males. Influencers popular with males more commonly posted less healthy food products (89% vs 54%). Marketing techniques varied: influencers popular with females more commonly used songs or music (53% vs 26%), other influencers (26% vs 11%), appeals to fun or coolness (26% vs 13%), viral marketing (29% vs 19%), and appeals to beauty (11% vs 0%). Influencers popular with males more commonly used calls-to-action (27% vs 6%) and price promotions (8% vs 1%). Social media influencers play a role in shaping adolescents' dietary preferences and behaviors. Understanding gender-specific dynamics is essential for developing targeted interventions, policies, and educational initiatives aimed at promoting healthier food choices among adolescents. Policy efforts should focus on regulating unhealthy food marketing, addressing gender-specific targeting, and fostering a healthy social media environment for adolescents to support healthier dietary patterns.
Citation: PLOS digital health, 4(2), e0000729. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11798478/pdf/
PLOS digital health | Pub Date: 2025-02-04 | eCollection Date: 2025-02-01 | DOI: 10.1371/journal.pdig.0000722
Abdul-Fatawu Abdulai, Amanda Fuchsia Howard, Paul J Yong, Leanne M Currie
Title: Addressing technology-mediated stigma in sexual health-related digital platforms: Insights from design team members.
Abstract: Digital health technologies are increasingly used as complementary tools for accessing sexual health-related services. At the same time, there are concerns that some interface features and content of these technologies could inadvertently foment stigma among end users. In this study, we explored how design teams (i.e., those involved in creating digital health technologies) might address stigmatizing components when designing sexual health-related digital technologies. We interviewed 14 design team members (i.e., software engineers, user interface and user experience (UI/UX) designers, content creators, and project managers) who were involved in digital health design projects across two universities in western Canada. The interviews sought to understand their perspectives on how to create destigmatizing digital technologies and centered on the strategies they might adopt and the expertise or support they might need to address stigmatizing features or content on sexual health-related digital technologies. The findings revealed two overarching approaches to designing digital health technologies that prevent the unintended effects of stigma. These include functional design considerations (i.e., pop-up notifications, infographics, and video-based testimonials, and avoiding the use of cookies or other security-risk features) and non-functional design considerations (i.e., adopting an interprofessional and collaborative approach to design, educating software designers on domain knowledge about stigma, and ensuring consistent user testing of content). These findings reflect functional and non-functional design strategies as applied in software design; they are considered crucial in addressing stigma but are often not apparent to designers involved in digital health projects. This suggests the need for software engineers to understand and consider non-functional, emotional, and content-related design strategies that could address stigmatizing attributes of digital health platforms.
Citation: PLOS digital health, 4(2), e0000722. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11793748/pdf/
PLOS digital health | Pub Date: 2025-02-03 | eCollection Date: 2025-02-01 | DOI: 10.1371/journal.pdig.0000721
Matthieu Doutreligne, Tristan Struja, Judith Abecassis, Claire Morgand, Leo Anthony Celi, Gaël Varoquaux
Title: Step-by-step causal analysis of EHRs to ground decision-making.
Abstract: Causal inference enables machine learning methods to estimate the treatment effects of medical interventions from electronic health records (EHRs). The prevalence of such observational data and the difficulty of covering all population/treatment relationships with randomized controlled trials (RCTs) make these methods increasingly attractive for studying causal effects. However, researchers should be wary of many pitfalls. We propose a framework for causal inference and illustrate it by estimating the effect of albumin on mortality in sepsis using an intensive care database (MIMIC-IV), comparing various sensitivity analyses to results from RCTs as the gold standard. The first step is study design, using the target trial concept and the PICOT framework: Population (patients with sepsis), Intervention (combination of crystalloids and albumin for fluid resuscitation), Control (crystalloids only), Outcome (28-day mortality), Time (intervention start within 24h of admission). We show that allowing overly long treatment-initiation times induces immortal time bias. The second step is the selection of confounding variables based on expert knowledge. Progressively adding confounders makes it possible to recover the RCT results from observational data. As the third step, we assess the influence of multiple models with varying assumptions, showing that a doubly robust estimator (AIPW) with random forests was the most reliable estimator. Results show that these steps are all important for valid causal estimates. A valid causal model can then be used to individualize decision-making: subgroup analyses showed that the treatment efficacy of albumin was greater for patients older than 60 years, males, and patients with septic shock. Without causal thinking, machine learning is not enough for optimal clinical decisions at the individual patient level. Our step-by-step analytic framework helps avoid many pitfalls of applying machine learning to EHR data, yielding models that avoid shortcuts and extract the best evidence for decision-making.
Citation: PLOS digital health, 4(2), e0000721. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11790099/pdf/
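The AIPW estimator named in the abstract combines an outcome model with a propensity model and stays consistent if either one is correctly specified, hence "doubly robust". The sketch below is a minimal, hedged illustration of the standard AIPW formula for the average treatment effect, not the authors' implementation: it assumes the nuisance predictions (propensity scores e(X) and outcome predictions mu0(X), mu1(X), which the abstract reports fitting with random forests) have already been computed, and the function name and toy numbers are hypothetical.

```python
# Illustrative sketch only; not the authors' implementation. Names and data are hypothetical.


def aipw_ate(t, y, e, mu0, mu1):
    """Augmented inverse-probability-weighted (AIPW) estimate of the average
    treatment effect:

        mean_i[ mu1_i - mu0_i
                + t_i * (y_i - mu1_i) / e_i
                - (1 - t_i) * (y_i - mu0_i) / (1 - e_i) ]

    t: treatment indicators (0/1); y: observed outcomes;
    e: estimated propensity scores P(T=1 | X);
    mu0, mu1: outcome-model predictions E[Y | X, T=0] and E[Y | X, T=1].
    """
    total = 0.0
    for ti, yi, ei, m0, m1 in zip(t, y, e, mu0, mu1):
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        total += (m1 - m0
                  + ti * (yi - m1) / ei               # residual correction, treated units
                  - (1 - ti) * (yi - m0) / (1 - ei))  # residual correction, control units
    return total / len(y)


if __name__ == "__main__":
    # Hypothetical toy data for 4 patients; outcome 1 = died within 28 days.
    t = [1, 0, 1, 0]
    y = [0, 1, 0, 0]
    e = [0.6, 0.4, 0.5, 0.5]
    mu0 = [0.3, 0.6, 0.2, 0.1]
    mu1 = [0.2, 0.5, 0.1, 0.1]
    print(round(aipw_ate(t, y, e, mu0, mu1), 3))  # -0.325
```

With a binary outcome such as 28-day mortality, a negative estimate indicates lower mortality risk under treatment. In practice the nuisance models are usually fitted with cross-fitting to limit overfitting bias; whether and how the authors did so is not stated in the abstract.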
PLOS digital health | Pub Date: 2025-02-03 | eCollection Date: 2025-02-01 | DOI: 10.1371/journal.pdig.0000735
David Pau, Camille Bachot, Charles Monteil, Laetitia Vinet, Mathieu Boucher, Nadir Sella, Romain Jegou
Title: Comparison of anonymization techniques regarding statistical reproducibility.
Background: Anonymization opens up innovative ways of using secondary data without the requirements of the GDPR, because anonymized data no longer affects the privacy of data subjects. Anonymization requires data alteration, and this project aims to compare the ability of such privacy protection methods to maintain the reliability and utility of scientific data for secondary research purposes.
Methods: The French data protection authority (CNIL) defines anonymization as a processing activity that uses methods to make any identification of individuals, by any means, irreversibly impossible. To address the project's objective, a series of analyses was performed on a cohort and reproduced on four anonymized versions of the data for comparison. Four assessment levels were used to evaluate the impact of anonymization: level 1 referred to the replication of statistical outputs, level 2 to the accuracy of statistical results, level 3 assessed data alteration (using Hellinger distances) and level 4 assessed privacy risks (using WP29 criteria).
Results: Eighty-seven items were produced from the raw cohort data and then reproduced on each of the four anonymized datasets. The overall level 1 replication score ranged from 67% to 100% depending on the anonymization solution. The most difficult analyses to replicate were regression models (sub-score ranging from 78% to 100%) and survival analysis (sub-score ranging from 0% to 100%). The overall level 2 accuracy score ranged from 22% to 79% depending on the anonymization solution. For level 3, three methods produced some variables with completely different probability distributions (Hellinger distance = 1). For level 4, all methods reduced the privacy risk of singling out, with relative risk reductions ranging from 41% to 65%.
Conclusion: None of the anonymization methods reproduced all outputs and results. A trade-off must be found between context risk and the usefulness of the data for answering the research question.
Citation: PLOS digital health, 4(2), e0000735. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11790161/pdf/
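Level 3 of the assessment relies on Hellinger distances. For discrete distributions P and Q, H(P, Q) = sqrt(1 - Σ sqrt(p_i q_i)), which ranges from 0 (identical distributions) to 1 (no category in common), so a value of 1 means the anonymized variable shares no support with the original. The sketch below is an illustrative assumption, not the authors' code: it treats each variable as categorical and compares empirical frequencies; continuous variables would first need binning, and the abstract does not say how the study handled them. The function name and samples are hypothetical.

```python
# Illustrative sketch only; not the authors' code. Names and data are hypothetical.
import math
from collections import Counter


def hellinger(sample_a, sample_b):
    """Hellinger distance between the empirical distributions of two samples
    of a categorical variable: sqrt(1 - sum_c sqrt(p_c * q_c)).
    0 = identical distributions, 1 = no category in common."""
    pa, pb = Counter(sample_a), Counter(sample_b)
    na, nb = len(sample_a), len(sample_b)
    # Bhattacharyya coefficient; Counter returns 0 for absent categories.
    bc = sum(math.sqrt((pa[c] / na) * (pb[c] / nb)) for c in set(pa) | set(pb))
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


if __name__ == "__main__":
    # Hypothetical values of one categorical variable before and after anonymization.
    original = ["A", "A", "B", "C"]
    anonymized = ["A", "B", "B", "C"]
    print(round(hellinger(original, anonymized), 3))  # 0.207
    # Disjoint supports, e.g. every value recoded into a new category.
    print(hellinger(["A", "B"], ["X", "Y"]))  # 1.0
```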