Large language models for analyzing open text in global health surveys: why children are not accessing vaccine services in the Democratic Republic of the Congo.
Roy Burstein, Eric Mafuta, Joshua L Proctor. International Health, published 2025-03-07. DOI: 10.1093/inthealth/ihaf015
Abstract
Background: This study evaluates the use of large language models (LLMs) to analyze free-text responses from large-scale global health surveys, using data from the Enquête de Couverture Vaccinale (ECV) household coverage surveys from 2020, 2021, 2022 and 2023 as a case study.
Methods: We tested several approaches for analyzing responses about why caregivers did not vaccinate their children: zero-shot and few-shot prompting of LLMs, LLM fine-tuning, and a natural language processing approach based on semantic embeddings.
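The abstract does not reproduce the prompts used, so the following is only a minimal sketch of how zero-shot and few-shot prompts for this kind of task are commonly assembled. The category labels, example responses, and the `build_few_shot_prompt` helper are all hypothetical; the study's actual coding scheme and prompt wording may differ.

```python
# Hypothetical category labels; the study's real coding scheme is not given in the abstract.
CATEGORIES = [
    "vaccine unavailable",
    "distance to health facility",
    "caregiver unaware",
    "fear of side effects",
    "other",
]


def build_few_shot_prompt(examples, response, categories=CATEGORIES):
    """Assemble a classification prompt for one free-text survey response.

    examples: list of (free_text, label) pairs already coded by humans.
    Passing an empty list yields a zero-shot prompt.
    """
    lines = [
        "Classify the caregiver's reason for not vaccinating their child "
        "into exactly one of these categories:",
        ", ".join(categories),
        "",
    ]
    for text, label in examples:
        # Guard against labels outside the coding scheme.
        if label not in categories:
            raise ValueError(f"unknown label: {label}")
        lines.append(f"Response: {text}")
        lines.append(f"Category: {label}")
        lines.append("")
    # The response to classify goes last; the model is expected to complete the label.
    lines.append(f"Response: {response}")
    lines.append("Category:")
    return "\n".join(lines)
```

For example, `build_few_shot_prompt([("The clinic had no vaccines", "vaccine unavailable")], "We live too far away")` produces a one-shot prompt; the invented responses are illustrative only. Moving from zero-shot to few-shot prompting changes only the `examples` argument, which mirrors the comparison described in the abstract.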
Results: Accuracy ranged from 61.5% to 96% when tested against a curated benchmarking dataset drawn from the ECV surveys, and improved when the LLMs were fine-tuned or given examples for few-shot learning. We show that with as few as 20 to 100 labeled examples, LLMs can categorize free-text responses with high accuracy.
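Scoring any of these approaches comes down to comparing predicted categories with the human-coded benchmark labels. The sketch below illustrates that evaluation step together with a simple embedding-style classifier (nearest centroid by cosine similarity) built from a small set of labeled examples. It is a toy under stated assumptions, not the authors' pipeline: the `embed` function is a hypothetical bag-of-words stand-in for a real semantic embedding model, and all function names and labels are illustrative.

```python
import math
from collections import Counter, defaultdict


def embed(text):
    """Hypothetical stand-in for a semantic embedding: a bag-of-words count vector.
    A real pipeline would call an embedding model here instead."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def fit_centroids(labeled):
    """Average the vectors of the labeled examples within each category."""
    sums = defaultdict(Counter)
    counts = Counter()
    for text, label in labeled:
        sums[label].update(embed(text))
        counts[label] += 1
    return {
        label: {k: v / counts[label] for k, v in vec.items()}
        for label, vec in sums.items()
    }


def predict(centroids, text):
    """Assign the category whose centroid is most similar to the response."""
    vec = embed(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


def accuracy(centroids, benchmark):
    """Share of benchmark responses whose predicted category matches the human label."""
    correct = sum(predict(centroids, text) == label for text, label in benchmark)
    return correct / len(benchmark)
```

Re-running `fit_centroids` on 20, 50, or 100 labeled examples and calling `accuracy` on the same held-out benchmark each time gives a simple learning curve, which is one way to check a claim like the 20-to-100-example result on new data.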
Conclusions: This approach offers significant opportunities for reanalyzing existing datasets and designing surveys with more open-ended questions, providing a scalable, cost-effective solution for global health organizations. Despite challenges with closed-source models and computational costs, the study underscores LLMs' potential to enhance data analysis and inform global health policy.
About the journal:
International Health is an official journal of the Royal Society of Tropical Medicine and Hygiene. It publishes original, peer-reviewed articles and reviews on all aspects of global health including the social and economic aspects of communicable and non-communicable diseases, health systems research, policy and implementation, and the evaluation of disease control programmes and healthcare delivery solutions.
It aims to stimulate scientific and policy debate and provide a forum for analysis and opinion sharing for individuals and organisations engaged in all areas of global health.