Validating Loon Lens 1.0 for Autonomous Abstract Screening and Confidence-Guided Human-in-the-Loop Workflows in Systematic Reviews

IF 6.0 | Zone 2 (Medicine) | JCR Q1 (Economics)

Value in Health | Pub Date: 2025-09-23 | DOI: 10.1016/j.jval.2025.09.008

Ghayath Janoudi, Mara Uzun, Tim Disher, Mia Jurdana, Ena Fuzul, Josip Ivkovic, Brian Hutton


Citations: 0

Abstract


Validating Loon Lens 1.0 for Autonomous Abstract Screening and Confidence-Guided Human-in-the-Loop Workflows in Systematic Reviews.

Objectives: Title and Abstract (TiAb) screening is a labour-intensive step in systematic literature reviews (SLRs). We examine the performance of Loon Lens 1.0, an agentic AI platform for autonomous TiAb screening, and test whether its confidence scores can target minimal human oversight.

Methods: Eight SLRs by Canada's Drug Agency were re-screened by dual human reviewers with adjudication (3,796 citations; 287 includes, 7.6%) and, separately, by Loon Lens, using predefined eligibility criteria. Accuracy, sensitivity, precision, and specificity were measured and bootstrapped to generate 95% confidence intervals. Logistic regression models using (i) confidence alone and (ii) confidence plus the Include/Exclude decision were fitted to predict errors and to inform simulated human-in-the-loop (HITL) strategies.
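The four screening metrics and a bootstrap interval of the kind described in Methods can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the percentile bootstrap, the 2,000 resamples, and the toy data are all assumptions, since the abstract does not state which bootstrap variant or resample count was used.

```python
import random


def screening_metrics(truth, pred):
    """Accuracy, sensitivity, specificity and precision for boolean labels.

    truth: adjudicated human label per citation (True = include).
    pred:  screening decision per citation (True = Include).
    """
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


def bootstrap_ci(truth, pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (assumed variant): resample citations with replacement."""
    rng = random.Random(seed)
    n = len(truth)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = screening_metrics([truth[i] for i in idx],
                                  [pred[i] for i in idx])[metric]
        if value == value:  # drop NaN from degenerate resamples
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


# Hypothetical toy data: 20 true includes, 180 true excludes.
toy_truth = [True] * 20 + [False] * 180
toy_pred = [True] * 19 + [False] + [True] * 10 + [False] * 170

print(screening_metrics(toy_truth, toy_pred))
print(bootstrap_ci(toy_truth, toy_pred, "sensitivity"))
```

Resampling whole citations keeps each truth/decision pair together, so every metric is recomputed on a plausible alternative sample of the same size.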

Results: Loon Lens achieved 95.5% accuracy (95% CI 94.8-96.1), 98.9% sensitivity (97.6-100), 95.2% specificity (94.5-95.9) and 63.0% precision (58.4-67.3). Errors clustered in Low- and Medium-confidence Includes. The extended logistic regression model (confidence + decision; C-index 0.98) estimated a 75% error probability for Low-confidence Includes versus <0.1% for Very-High-confidence Excludes. Simulated HITL review of Low- and Medium-confidence Includes only (145 citations, 3.8%) lifted precision to 81.4% and overall accuracy to 98.2% while preserving sensitivity (99.0%). Adding High-confidence Includes (221 citations in total, 5.8%) pushed precision to 89.9% and accuracy to 99.0%.
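The confidence-guided HITL strategy in Results can be illustrated with a small simulation. This is a sketch under stated assumptions, not the study's procedure: the `Record` class, the function names, and the toy records are hypothetical, and the simulated reviewer is assumed to always return the adjudicated label, which the abstract does not confirm.

```python
from dataclasses import dataclass


@dataclass
class Record:
    truth: bool       # adjudicated human label (True = include)
    decision: bool    # AI screening decision (True = Include)
    confidence: str   # "Low", "Medium", "High" or "Very High"


def simulate_hitl(records, review_levels):
    """Route AI Includes at the given confidence levels to a human reviewer.

    Assumption: the reviewer always reproduces the adjudicated label.
    Returns the final decisions and the number of citations reviewed.
    """
    final, reviewed = [], 0
    for r in records:
        if r.decision and r.confidence in review_levels:
            final.append(r.truth)
            reviewed += 1
        else:
            final.append(r.decision)
    return final, reviewed


def summarize(records, final):
    """Precision, sensitivity and accuracy of the final decisions."""
    pairs = list(zip(records, final))
    tp = sum(1 for r, d in pairs if r.truth and d)
    fp = sum(1 for r, d in pairs if not r.truth and d)
    fn = sum(1 for r, d in pairs if r.truth and not d)
    correct = sum(1 for r, d in pairs if r.truth == d)
    return {
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "accuracy": correct / len(pairs),
    }


# Hypothetical toy set of 100 citations; false positives sit mostly in
# Low- and Medium-confidence Includes, mirroring the pattern described above.
toy = (
    [Record(True, True, "Very High")] * 10
    + [Record(False, True, "Low")] * 4
    + [Record(False, True, "Medium")] * 2
    + [Record(False, True, "High")] * 2
    + [Record(True, True, "Low")]
    + [Record(True, False, "High")]
    + [Record(False, False, "Very High")] * 80
)

for levels in (set(), {"Low", "Medium"}, {"Low", "Medium", "High"}):
    final, reviewed = simulate_hitl(toy, levels)
    print(sorted(levels), reviewed, summarize(toy, final))
```

Because only Include calls are routed for review, the strategy can remove false positives but cannot recover a missed include, which is why sensitivity stays flat while precision and accuracy rise.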

Conclusions: Across eight SLRs (3,796 citations), Loon Lens 1.0 reproduced adjudicated human screening with 98.9% sensitivity and 95.2% specificity. In simulation, restricting human-in-the-loop review to ≤5.8% of citations, by prioritising low- and medium-confidence Include calls, reduced false positives and increased precision to 89.9% while maintaining sensitivity and raising overall accuracy to 99.0%. These findings indicate that confidence-guided oversight can concentrate reviewer effort on a small subset of records.


Source journal

Value in Health (Medicine: Health Care)

CiteScore: 6.90
Self-citation rate: 6.70%
Articles published: 3064
Review time: 3-8 weeks

Journal description: Value in Health publishes original research articles in pharmacoeconomics, health economics, and outcomes research (clinical, economic, and patient-reported outcomes/preference-based research), as well as conceptual and health policy articles that provide valuable information for health care decision-makers and the research community. As the official journal of ISPOR, Value in Health provides a forum for researchers and health care decision-makers to translate outcomes research into health care decisions.