Value in Health. Pub Date: 2025-08-21. DOI: 10.1016/j.jval.2025.08.007
Carlos Gallego-Moll, Lucía A Carrasco-Ribelles, Marc Casajuana, Laia Maynou, Pablo Arocena, Concepción Violán, Edurne Zabaleta-Del-Olmo
{"title":"Predicting Healthcare Utilization Outcomes With Artificial Intelligence: A Large Scoping Review.","authors":"Carlos Gallego-Moll, Lucía A Carrasco-Ribelles, Marc Casajuana, Laia Maynou, Pablo Arocena, Concepción Violán, Edurne Zabaleta-Del-Olmo","doi":"10.1016/j.jval.2025.08.007","DOIUrl":"10.1016/j.jval.2025.08.007","url":null,"abstract":"<p><strong>Objectives: </strong>To broadly map the research landscape to identify trends, gaps, and opportunities in data sets, methodologies, outcomes, and reporting standards for artificial intelligence (AI)-based healthcare utilization prediction.</p><p><strong>Methods: </strong>We conducted a scoping review following the Joanna Briggs Institute methodology. We searched 3 major international databases (from inception to January 2025) for studies applying AI in predictive healthcare utilization. Extracted data were categorized into data sets characteristics, AI methods and performance metrics, predicted outcomes, and adherence to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) + AI reporting guidelines.</p><p><strong>Results: </strong>Among 1116 records, 121 met inclusion criteria. Most were conducted in the United States (62%). No study incorporated all 6 relevant variable groups: demographic, socioeconomic, health status, perceived need, provider characteristics, and prior utilization. Only 7 studies included 5 of these groups. The main data sources were electronic health records (60%) and claims (28%). Ensemble models were the most frequently used (66.9%), whereas deep learning models were less common (16.5%). AI methods were primarily used to predict future events (90.1%), with hospitalizations (57.9%) and visits (33.1%) being the most predicted outcomes. 
Adherence to general reporting standards was moderate; however, compliance with AI-specific TRIPOD + AI items was limited.</p><p><strong>Conclusions: </strong>Future research should broaden predicted outcomes to include process- and logistics-oriented events, extend applications beyond prediction-such as cohort selection and matching-and explore underused AI methods, including distance-based algorithms and deep neural networks. Strengthening adherence to TRIPOD-AI reporting guidelines is also essential to enhance the reliability and impact of AI in healthcare planning and economic evaluation.</p>","PeriodicalId":23508,"journal":{"name":"Value in Health","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144970709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Value in Health. Pub Date: 2025-08-21. DOI: 10.1016/j.jval.2025.07.032
Aldéric M Fraslin, Anne Aupérin, Caroline Even, Jérôme Fayette, Esma Saada-Bouzid, Cédrik Lafond, Lionnel Geoffrois, Jean Bourhis, Joël Guigay, Julia Bonastre
{"title":"Cost-Utility Analysis Alongside the GORTEC 2014-01 TPExtreme Trial: TPEx Versus EXTREME as First-Line Treatment in Patients With Recurrent or Metastatic Head and Neck Squamous Cell Carcinoma.","authors":"Aldéric M Fraslin, Anne Aupérin, Caroline Even, Jérôme Fayette, Esma Saada-Bouzid, Cédrik Lafond, Lionnel Geoffrois, Jean Bourhis, Joël Guigay, Julia Bonastre","doi":"10.1016/j.jval.2025.07.032","DOIUrl":"10.1016/j.jval.2025.07.032","url":null,"abstract":"<p><strong>Objectives: </strong>The randomized GORTEC 2014-01 TPExtreme trial showed no significant improvement in overall survival with TPEx chemotherapy regimen (docetaxel-platinum-cetuximab) versus EXTREME regimen (platinum-fluorouracil-cetuximab) in first-line treatment of recurrent or metastatic head and neck squamous cell carcinoma. However, the TPEx regimen had a favorable safety profile and could provide an alternative to standard of care with the EXTREME regimen in this setting. Our aim was to assess the cost-utility of the TPEx strategy versus EXTREME strategy in the French setting.</p><p><strong>Methods: </strong>We used a decision-analytic semi-Markov model with 4 health states and 1-month cycles. Resource use was prospectively collected in the GORTEC 2014-01 TPExtreme trial (NCT02268695). Transition probabilities were assessed from patient-level data from the trial (n = 539). In the base-case analysis, direct medical costs from the French National Insurance Scheme and quality-adjusted life-years (QALYs) were computed in both arms over an 18-month time horizon to estimate the incremental net monetary benefit. Deterministic sensitivity analysis and probabilistic sensitivity analysis were conducted.</p><p><strong>Results: </strong>The TPEx regimen was associated with a gain in QALYs (+0.057) and a decrease in cost (-€4 485). In the base-case scenario, the TPEx strategy was dominant over the EXTREME strategy with a positive incremental net monetary benefit amounting to €7349. 
For a willingness to pay of €50 000 per QALY, the probability of TPEx regimen being cost-effective was 64% and varied between 58% and 67% in the scenario analyses.</p><p><strong>Conclusions: </strong>The TPEx regimen is likely to be cost-effective compared with EXTREME in the French setting.</p>","PeriodicalId":23508,"journal":{"name":"Value in Health","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144970699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
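The incremental net monetary benefit (INMB) reported in the abstract above can be approximated from its rounded point estimates. The sketch below is a back-of-envelope check, not the authors' semi-Markov model: the function name `incremental_nmb` is hypothetical, and it assumes the reported INMB was computed at the €50 000 per QALY threshold quoted in the results.

```python
def incremental_nmb(delta_qaly: float, delta_cost: float, wtp: float) -> float:
    """INMB = willingness to pay * incremental QALYs - incremental cost."""
    return wtp * delta_qaly - delta_cost


# Rounded point estimates from the abstract (TPEx minus EXTREME).
delta_qaly = 0.057      # QALY gain
delta_cost = -4485.0    # cost difference in euros (negative = saving)
wtp = 50_000.0          # assumed threshold, euros per QALY

inmb = incremental_nmb(delta_qaly, delta_cost, wtp)

# A strategy is "dominant" when it is both more effective and less costly.
is_dominant = delta_qaly > 0 and delta_cost < 0

print(f"INMB from rounded inputs: about EUR {inmb:.0f}")
print(f"Dominant: {is_dominant}")
```

With these rounded inputs the result is about €7335, close to the reported €7349. The small gap is plausibly due to rounding of the QALY gain in the abstract, although the abstract does not say so.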
Value in Health. Pub Date: 2025-08-21. DOI: 10.1016/j.jval.2025.08.006
Jason Shafrin, Jaehong Kim, Jacob Fajnor, Kyi-Sin Than, Elizabeth S Mearns, Stacey L Kowal, Thomas Majda, Jakub P Hlávka
{"title":"A Generalized Risk-Adjusted Cost-Effectiveness Economic Model for Measuring the Value of Interventions That Delay Mobility Impairment Across Neurological Conditions.","authors":"Jason Shafrin, Jaehong Kim, Jacob Fajnor, Kyi-Sin Than, Elizabeth S Mearns, Stacey L Kowal, Thomas Majda, Jakub P Hlávka","doi":"10.1016/j.jval.2025.08.006","DOIUrl":"10.1016/j.jval.2025.08.006","url":null,"abstract":"<p><strong>Objectives: </strong>To quantify how incorporating patient risk preferences and severity adjustments affect the value of a hypothetical treatment for mobility impairments caused by neurological conditions.</p><p><strong>Methods: </strong>A 5-state Markov model was developed to measure the health economic value of a hypothetical treatment delaying the progression of mobility impairments by 30.7% versus standard of care for patients who were 45-year-old, minimally impaired, and had received a diagnosis of a neurological condition. A generalized and risk-adjusted cost-effectiveness (GRACE) model was implemented using relative risk aversion estimates from a US general population survey. Treatment value was measured as risk-aversion and severity-adjusted net monetary benefit (NMB), defined as (1) risk-adjusted health gains (generalized risk-adjusted quality-adjusted life-years [GRA-QALYs]) monetized by (2) risk-aversion and severity-adjusted willingness to pay less (3) incremental costs. Risk-neutral results (traditional cost-effectiveness analysis [TCEA]) were compared.</p><p><strong>Results: </strong>Incorporating risk preferences and disease severity increased the value of health benefits. Incremental health gains from using the hypothetical treatment (vs standard of care) were valued more when accounting for risk preferences with GRACE (1.358 GRA-QALYs vs 1.199 QALY). Willingness to pay for these health gains was higher when computed under GRACE compared with TCEA ($109 656 per GRA-QALY vs $100 000 per QALY). 
Overall, NMB increased by 11.6% (risk-aversion and severity-adjusted NMB = $278 324 vs TCEA NMB = $249 311) using GRACE versus TCEA. Results were sensitive to risk-aversion estimates and the functional form of patient utility.</p><p><strong>Conclusions: </strong>In the first application of GRACE within neurology, GRACE increased the health economic value of a hypothetical neurology treatment, suggesting that TCEA may undervalue treatments for mobility-related neurological impairments.</p>","PeriodicalId":23508,"journal":{"name":"Value in Health","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144970892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
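The NMB definition given in the abstract above (health gains monetized at a willingness to pay, less incremental costs) can be checked against the reported figures. The sketch below is only an arithmetic consistency check under that definition, not the authors' GRACE model. The "implied incremental cost" it prints is an inference from the rounded numbers and is not reported in the abstract.

```python
# Figures reported in the abstract.
gra_qalys, wtp_grace, nmb_grace = 1.358, 109_656.0, 278_324.0   # GRACE
qalys, wtp_tcea, nmb_tcea = 1.199, 100_000.0, 249_311.0         # TCEA

# NMB = health gain * willingness to pay - incremental cost,
# so incremental cost = health gain * willingness to pay - NMB.
implied_cost_grace = gra_qalys * wtp_grace - nmb_grace
implied_cost_tcea = qalys * wtp_tcea - nmb_tcea

# Relative increase in NMB when moving from TCEA to GRACE.
pct_increase = 100.0 * (nmb_grace / nmb_tcea - 1.0)

print(f"Implied incremental cost (GRACE): {implied_cost_grace:,.0f}")
print(f"Implied incremental cost (TCEA):  {implied_cost_tcea:,.0f}")
print(f"NMB increase: {pct_increase:.1f}%")
```

Both framings imply nearly the same incremental cost, which is negative (a net saving). That is what one would expect if GRACE changes only how health gains are valued. The computed increase matches the reported 11.6%.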
Value in Health. Pub Date: 2025-08-20. DOI: 10.1016/j.jval.2025.08.005
Kailu Wang, Oliver Rivero-Arias, Annie Wai-Ling Cheung, Amy Yuen-Kwan Wong, Eng-Kiong Yeoh, Eliza Lai-Yi Wong
{"title":"Comparing Adolescent and Adult Preferences for EQ-5D-Y-3L Health States in Hong Kong.","authors":"Kailu Wang, Oliver Rivero-Arias, Annie Wai-Ling Cheung, Amy Yuen-Kwan Wong, Eng-Kiong Yeoh, Eliza Lai-Yi Wong","doi":"10.1016/j.jval.2025.08.005","DOIUrl":"10.1016/j.jval.2025.08.005","url":null,"abstract":"<p><strong>Objectives: </strong>The EuroQol EQ-5D-Y-3L valuation protocol suggests eliciting adult preferences from the perspective of a 10-year-old child. However, further research on whether it is feasible to elicit adolescent preferences for EQ-5D-Y-3L health states and how adolescent preferences compare with adult preferences is needed. This study aimed to compare preferences for EQ-5D-Y-3L health states and survey response behaviors between adolescents and adults in the general population of Hong Kong.</p><p><strong>Methods: </strong>Cross-sectional face-to-face surveys were conducted between December 2018 and July 2023 with adolescents and adults in Hong Kong. Discrete choice experiments (DCEs) were used to elicit adolescent preferences from their own perspective and adult preferences from a 10-year-old child's perspective for EQ-5D-Y-3L health states. Mixed logit models estimated the relative importance attribute levels for comparison between adolescents and adults using separate models for each group or a pooled model combining responses. Survey response behaviors were also analyzed by comparing the dominant task responses and feedback to DCE tasks between adolescents and adults.</p><p><strong>Results: </strong>DCE responses from 776 adolescents aged 12 to 17 years and 1001 adults were used in the analysis after exclusions. For both groups, the most important dimension was pain/discomfort, followed by worried/sad/unhappy, usual activities, mobility, and self-care. Adolescents placed greater importance on mobility and self-care, while valuing pain/discomfort and usual activities less. 
Significant differences in relative importance of levels across all dimensions between the 2 groups were observed.</p><p><strong>Conclusions: </strong>Adolescents showed different preference weightings compared with adults but reported greater challenges in completing the DCE tasks. These findings suggest that including adolescents in the valuation of EQ-5D-Y-3L health states is feasible; however, data provided by this group can be of lower quality than data provided by adults.</p>","PeriodicalId":23508,"journal":{"name":"Value in Health","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144970743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implications of Value Set Choice on EQ-5D-Y-3L Child and Proxy Health-Related Quality of Life Ratings: What to Do When a Country-Specific \"Y\" Value Set Is Unavailable?","authors":"Diana Khanna, Jyoti Khadka, Christine Mpundu-Kaambwa, Rachel Milte, Julie Ratcliffe","doi":"10.1016/j.jval.2025.08.004","DOIUrl":"10.1016/j.jval.2025.08.004","url":null,"abstract":"<p><strong>Objectives: </strong>There is limited guidance on whether to apply an available EQ-5D-Y-3L \"Y\" value set from another country or use a country-specific EQ-5D-3L \"adult\" value set when a country-specific \"Y\" value set is unavailable. This study aims to examine how the choice of value set (ie, \"Y\" or \"adult\") influences the interrater gap between child-self and proxy-reported health-related quality of life (HRQoL).</p><p><strong>Methods: </strong>An online sample of 845 dyads (children aged 6-10 years and parents) independently completed the self and proxy versions of the EQ-5D-Y-3L. Corresponding HRQoL values were derived using the \"Y\" and the \"adult\" value sets for 5 countries: Germany, Hungary, Japan, The Netherlands, and Spain. Analyses were stratified by age (6-7 vs 8-10-year-olds), gender (boys vs girls), and health condition (no vs yes). Group differences were identified using paired t tests. The percentage of directional consistency in child-proxy discrepancies across value sets was also examined as a secondary analysis.</p><p><strong>Results: </strong>Proxies significantly overestimated HRQoL values across most \"Y\" value sets (Hungary, Japan, and Spain). Significant discrepancies using the corresponding \"adult\" value sets were observed only for Germany. Additionally, significant interrater differences were observed for children without health conditions across all value sets. 
Proportional agreement in direction was marginally higher when using \"Y\" value sets, except for Germany.</p><p><strong>Conclusions: </strong>The choice of value set influences child-proxy HRQoL assessments. In the absence of a country-specific \"Y\" value set, using an alternative \"Y\" value set is preferable to relying solely on a country-specific \"adult\" value set.</p>","PeriodicalId":23508,"journal":{"name":"Value in Health","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144970678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Value in Health. Pub Date: 2025-08-18. DOI: 10.1016/j.jval.2025.07.030
Qi Wu, Catherine Arundel, Charlie Welch, Puvanendran Tharmanathan, Nick Johnson, Belen Corbacho, Joseph J Dias
{"title":"The Cost-Effectiveness of Collagenase Injection Versus Limited Fasciectomy for Moderate Dupuytren's Contracture: An Economic Evaluation of the Dupuytren's Interventions Surgery Versus Collagenase Trial and a Decision Analytical Model.","authors":"Qi Wu, Catherine Arundel, Charlie Welch, Puvanendran Tharmanathan, Nick Johnson, Belen Corbacho, Joseph J Dias","doi":"10.1016/j.jval.2025.07.030","DOIUrl":"10.1016/j.jval.2025.07.030","url":null,"abstract":"<p><strong>Objectives: </strong>To compare the cost-effectiveness of collagenase injection (collagenase) and limited fasciectomy (LF) surgery in treating moderate Dupuytren's contracture (DC) in the United Kingdom over different time horizons.</p><p><strong>Methods: </strong>An incremental cost-effectiveness analysis was conducted alongside a multicenter, pragmatic, parallel randomized controlled trial (Dupuytren's interventions surgery versus collagenase trial), to determine the short-term cost-effectiveness of collagenase compared with LF. A Markov decision analytic model was developed to assess long-term cost-effectiveness.</p><p><strong>Results: </strong>Collagenase was associated with significantly lower cost and insignificantly lower quality-adjusted life-year (QALY) gain compared with LF at 1 year. The probability of collagenase being cost-effective was more than 99% at willingness-to-pay thresholds of £20 000 to £30 000 per QALY. At 2 years, collagenase was both significantly less costly and less effective compared with LF, and LF became cost-effective above a threshold of £25 488. There was a high level of uncertainty surrounding the 2-year results. Over a lifetime horizon, collagenase generated a cost saving of £2968 per patient but was associated with a mean QALY loss of -0.484. 
The probability of collagenase being cost-effective dropped to 22% and 16% at £20 000 and £30 000 per QALY, respectively.</p><p><strong>Conclusions: </strong>Collagenase was less costly and less effective than LF in treating Dupuytren's contracture. The cost-effectiveness of collagenase compared with LF was time dependent. Collagenase was highly cost-effective 1 year after treatment; however, the probability of collagenase being cost-effective declined over time. The Markov model suggested that LF is more cost-effective over a lifetime horizon. These findings emphasize the importance of longer follow-up when comparing surgical and nonsurgical interventions to fully capture overall costs and benefits.</p>","PeriodicalId":23508,"journal":{"name":"Value in Health","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144970787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
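The lifetime result in the abstract above is a "less costly, less effective" case, where the decision turns on whether the saving per QALY forgone exceeds the willingness-to-pay threshold. The sketch below works this through with the two lifetime point estimates only. It is illustrative arithmetic, not the authors' Markov model, and the derived ratio is not a figure reported in the abstract.

```python
# Lifetime point estimates from the abstract (collagenase minus LF).
cost_saving = 2968.0   # pounds saved per patient with collagenase
qaly_loss = 0.484      # QALYs forgone per patient with collagenase

# Saving per QALY forgone; equivalently, the cost per QALY gained
# by choosing LF instead of collagenase.
saving_per_qaly_forgone = cost_saving / qaly_loss


def collagenase_inmb(wtp: float) -> float:
    """Net monetary benefit of collagenase vs LF at threshold `wtp`."""
    return cost_saving - wtp * qaly_loss


for wtp in (20_000.0, 30_000.0):
    print(f"WTP {wtp:>8,.0f}: INMB of collagenase = {collagenase_inmb(wtp):,.0f}")
print(f"Saving per QALY forgone: {saving_per_qaly_forgone:,.0f}")
```

On these point estimates the saving per QALY forgone is well below both thresholds, so the net monetary benefit of collagenase is negative at each. This agrees in direction with the abstract's conclusion that LF is more cost-effective over a lifetime horizon.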
Value in Health. Pub Date: 2025-08-16. DOI: 10.1016/j.jval.2025.08.003
Milena Izmirlieva
{"title":"Application of International Reference Pricing Rules to Forecast Pharmaceutical Launch Prices in 5 European Countries.","authors":"Milena Izmirlieva","doi":"10.1016/j.jval.2025.08.003","DOIUrl":"10.1016/j.jval.2025.08.003","url":null,"abstract":"<p><strong>Objectives: </strong>To assess whether applying official International Reference Pricing (IRP) rules allows accurate prediction of pharmaceutical launch prices in Austria, Bulgaria, Croatia, The Netherlands, and North Macedonia.</p><p><strong>Methods: </strong>Official pre-2019-reform IRP regulations were examined, with 1 Dutch rule further clarified via primary research. IRP rules were applied precisely to calculate the maximum price under IRP for all new chemical/molecular entities presentations first priced in each country in 2018 and customarily subject to IRP, using POLI pricing and reimbursement data for the referrer and the reference countries. Maximum allowed launch prices under IRP were calculated by using prices available for referencing 1 month before launch, the regulation-specified exchange rate and the definition of what constitutes an acceptable product for referencing. If an identical pack size/formulation was not available or multiple suitable products for referencing existed, the appropriate conversion rules were applied as stated in the IRP regulations. The actual first price was compared with the price calculated under IRP for each new chemical or molecular entity presentation, with several scenarios run to explain discrepancies.</p><p><strong>Results: </strong>The mean absolute percentage error (MAPE) between the actual first price and the forecasted/IRP-based price across the sample was lowest for Bulgaria (3.04%), followed by The Netherlands (4.53%), Austria (4.91%), Croatia (9.63%), and North Macedonia (22.09%). North Macedonia's high MAPE is because the country allows prices to exceed the IRP-based price by up to 20%. 
MAPE < 10% indicates outstanding model performance.</p><p><strong>Conclusions: </strong>Precise IRP rules application can accurately predict launch prices in countries where IRP is binding and is the main price-setting method.</p>","PeriodicalId":23508,"journal":{"name":"Value in Health","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2025-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144875442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
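The mean absolute percentage error (MAPE) used in the abstract above compares each actual first price with its IRP-based forecast. The sketch below shows the standard formula with made-up prices. The function name `mape` and every price in it are hypothetical, and the abstract does not spell out the exact variant the author used.

```python
def mape(actual: list[float], forecast: list[float]) -> float:
    """Mean absolute percentage error, in percent, relative to actual values."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical launch prices and IRP-based forecasts (same currency).
actual_prices = [100.0, 250.0, 80.0]
irp_forecasts = [95.0, 260.0, 80.0]
sample_mape = mape(actual_prices, irp_forecasts)

# Hypothetical product priced 20% above its IRP-based ceiling.
above_ceiling_mape = mape([120.0], [100.0])

print(f"Sample MAPE: {sample_mape:.2f}%")
print(f"Price 20% above IRP forecast: {above_ceiling_mape:.2f}%")
```

The second case illustrates the abstract's point about North Macedonia: when actual prices may exceed the IRP-based price by up to 20%, a forecast equal to the IRP price can be off by roughly 17% for a product priced at the cap.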
Value in Health. Pub Date: 2025-08-13. DOI: 10.1016/j.jval.2025.07.001
Muhammed Rashid, Cheng Su Yi, Thipsukhon Sathapanasiri, Sariya Udayachalerm, Kansak Boonpattharatthiti, Suppachai Insuk, Sajesh K Veettil, Nai Ming Lai, Nathorn Chaiyakunapruk, Teerapon Dhippayom
{"title":"Role of Generative Artificial Intelligence in Assisting Systematic Review Process in Health Research: A Systematic Review.","authors":"Muhammed Rashid, Cheng Su Yi, Thipsukhon Sathapanasiri, Sariya Udayachalerm, Kansak Boonpattharatthiti, Suppachai Insuk, Sajesh K Veettil, Nai Ming Lai, Nathorn Chaiyakunapruk, Teerapon Dhippayom","doi":"10.1016/j.jval.2025.07.001","DOIUrl":"https://doi.org/10.1016/j.jval.2025.07.001","url":null,"abstract":"<p><strong>Objectives: </strong>Artificial intelligence (AI) is widely used in healthcare for various purposes, with generative AI (GAI) increasingly being applied to systematic review (SR) processes. We aimed to summarize the evidence on the performance metrics of GAI in the SR process.</p><p><strong>Methods: </strong>PubMed, EMBASE, Scopus, and ProQuest Dissertations & Theses Global were searched from their inception up to March 2025. Only experimental studies that compared GAI with other GAIs or human reviewers at any stage of the SR were included. Modified Quality Assessment of Diagnostic Accuracy Studies version 2 was used to assess the quality of the studies that used GAI in the study selection process. We summarized the findings of the included studies using a narrative approach.</p><p><strong>Results: </strong>Out of 7418 records screened, 30 studies were included. These studies used GAI tools such as ChatGPT, Bard, and Microsoft Bing AI. GAI appears to be effective for participant, intervention, comparator, and outcome formulation and data extraction processes, including complex information. However, because of inconsistent reliability, GAI is not recommended for literature search and study selection as it may retrieve nonrelevant articles and yield inconsistent results. There was mixed evidence on whether GAI can be used for risk of bias assessment. 
Studies using GAI for study selection were generally of high quality based on the modified Quality Assessment of Diagnostic Accuracy Studies version 2.</p><p><strong>Conclusions: </strong>GAI shows promising support in participant, intervention, comparator, and outcome-based question formulation and data extraction. Although it holds potential to enhance the SR process in healthcare, further practical application and validated evidence are needed before it can be fully integrated into standard workflows.</p>","PeriodicalId":23508,"journal":{"name":"Value in Health","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144970796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Value in Health. Pub Date: 2025-08-13. DOI: 10.1016/j.jval.2025.08.002
Kyeryoung Lee, Hunki Paek, Nneka Ofoegbu, Steven Rube, Mitchell K Higashi, Dalia Dawoud, Hua Xu, Lizheng Shi, Xiaoyan Wang
{"title":"A4SLR: An Agentic AI-Assisted Systematic Literature Review Framework to Augment Evidence Synthesis for HEOR and HTA.","authors":"Kyeryoung Lee, Hunki Paek, Nneka Ofoegbu, Steven Rube, Mitchell K Higashi, Dalia Dawoud, Hua Xu, Lizheng Shi, Xiaoyan Wang","doi":"10.1016/j.jval.2025.08.002","DOIUrl":"https://doi.org/10.1016/j.jval.2025.08.002","url":null,"abstract":"<p><strong>Objectives: </strong>Systematic literature reviews (SLRs) are essential for synthesizing high-quality evidence in clinical research, health economics and outcome research (HEOR), and health technology assessments (HTAs). However, the growing volume of published data has made SLRs time-consuming, labor-intensive, and costly. To address these challenges, we introduce A4SLR, an Agentic Artificial intelligence (AI)-Assisted SLR framework, that provides a flexible, extensible methodology for automating the entire SLR process-from initial query formulation to evidence synthesis-across various study fields.</p><p><strong>Methods: </strong>A4SLR comprises eight modules integrated with specialized AI agents powered by large language models: Search, I/E criteria deployment, Abstract/full-text screening, Text/table pre-processing, Data extraction, Assessment, Risk of bias analysis, and Report. We implemented and validated this framework using two use cases, non-small cell lung cancer and perinatal mood and anxiety disorders. Performance of the assessment was evaluated quantitatively and qualitatively.</p><p><strong>Results: </strong>Our implementation demonstrated high accuracy in article screening (F1 scores:0.917-0.977), risk of bias assessment (Cohen's κ:0.8442-0.9064), and data extraction (F-scores:0.96-0.998), including patient characteristics, safety and efficacy-outcomes, economic model parameters, and cost-effectiveness data. 
Notably, the Text/table pre-processing agent yielded comprehensive coverage of data elements, particularly in the challenging tasks of accurately matching outcome values to their corresponding study arms.</p><p><strong>Conclusions: </strong>Our findings highlight the potential of the A4SLR framework to transform the evidence synthesis process by addressing the limitations of manual SLRs, thereby enhancing HEOR and HTAs. Designed as a scalable, user-centric, extensible approach, A4SLR provides a robust solution for generating comprehensive up-to-date evidence to support researchers and decision-makers across diverse clinical and therapeutic areas.</p>","PeriodicalId":23508,"journal":{"name":"Value in Health","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144859679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
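The abstract above reports F1 scores for screening and Cohen's κ for risk of bias assessment. The sketch below shows how these two metrics are conventionally computed, using made-up counts and labels. The function names and all data are hypothetical, and this is not the A4SLR evaluation code.

```python
from collections import Counter


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall = 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("inputs must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical screening result: 44 correct includes, 3 false includes, 5 misses.
screening_f1 = f1_score(tp=44, fp=3, fn=5)

# Hypothetical risk-of-bias ratings from a human reviewer and an AI agent.
human = ["low", "low", "high", "high", "low", "high", "low", "low", "high", "low"]
agent = ["low", "low", "low", "high", "low", "high", "low", "low", "high", "low"]
rob_kappa = cohens_kappa(human, agent)

print(f"F1: {screening_f1:.3f}")
print(f"Cohen's kappa: {rob_kappa:.3f}")
```

In this toy example the raters agree on 9 of 10 items, yet κ comes out near 0.78 rather than 0.9, because κ discounts the agreement expected by chance.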