Designing GTP3 prompts to screen articles for systematic reviews of RCTs

IF 3.4 · CAS Tier 3 (Medicine) · JCR Q1 (Health Policy & Services)
James A Strachan
DOI: 10.1016/j.hlpt.2024.100943
Journal: Health Policy and Technology, Volume 14, Issue 1, Article 100943
Publication date: 2024-11-12 (Journal Article)
Publisher link: https://www.sciencedirect.com/science/article/pii/S2211883724001060
Citations: 0

Abstract


Introduction

Satisfactory sensitivity in screening articles for appropriate inclusion in systematic reviews has not yet been achieved using the group of GPT artificial intelligence (AI) systems. One issue in designing prompts for article screening is that while most of the prompt can be validated before use, i.e. on previously published systematic reviews, the part containing the inclusion criteria cannot. This study aimed to advance work in this area by trying to identify a prompt that is robust to variations in the precise wording of inclusion criteria. Prompts with this property should be able to achieve more consistent performance when applied to similar systematic reviews of health topics.

Methods

A prompt, into which alternative wordings (variants) of inclusion criteria could be inserted, was tested on a training dataset of articles identified during the re-run of electronic searches for a single published review. Modification and re-testing of the prompt was undertaken until satisfactory screening sensitivity across six different inclusion criteria variants was achieved. This prompt was then validated by assessing its performance on three “test” datasets, derived from re-run electronic searches from three different reviews.
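The abstract describes this workflow only in prose. As a minimal sketch of the iterative loop, the prompt wording, the criteria variants, and the `classify_stub` stand-in for a GPT-3 completion call are all hypothetical assumptions, not taken from the paper:

```python
# Sketch of the Methods workflow: insert each inclusion-criteria variant
# into a fixed prompt template, screen a labelled training set, and
# measure sensitivity per variant. A real implementation would replace
# classify_stub with a GPT-3 completion request.

PROMPT_TEMPLATE = (
    "Decide whether the abstract below should be included in a "
    "systematic review.\n"
    "Inclusion criteria: {criteria}\n"
    "Abstract: {abstract}\n"
    "Answer INCLUDE or EXCLUDE."
)

# Hypothetical rewordings (variants) of the same inclusion criterion.
CRITERIA_VARIANTS = [
    "randomised controlled trials of adults with condition X",
    "RCTs enrolling adult patients diagnosed with condition X",
]

def classify_stub(prompt: str) -> str:
    # Stand-in for the GPT-3 call: includes any abstract mentioning
    # randomisation. Purely illustrative.
    abstract_part = prompt.split("Abstract: ")[1]
    return "INCLUDE" if "randomised" in abstract_part.lower() else "EXCLUDE"

def sensitivity(dataset, criteria, classify=classify_stub):
    """Fraction of truly eligible abstracts the prompt flags for inclusion."""
    true_pos = false_neg = 0
    for abstract, eligible in dataset:
        decision = classify(
            PROMPT_TEMPLATE.format(criteria=criteria, abstract=abstract)
        )
        if eligible:
            if decision == "INCLUDE":
                true_pos += 1
            else:
                false_neg += 1
    return true_pos / (true_pos + false_neg)

# Toy labelled training set: (abstract text, truly eligible?).
training_set = [
    ("A randomised trial of drug A in adults", True),
    ("A cohort study of exposure B", False),
    ("Randomised placebo-controlled study of drug C", True),
]

for variant in CRITERIA_VARIANTS:
    print(variant, "->", sensitivity(training_set, variant))
```

In the paper, a loop of this shape was repeated, with the prompt modified between rounds, until sensitivity was satisfactory across all six criteria variants.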

Results

A prompt was successfully developed using the training dataset that achieved sensitivities of 95.8%, 100.0%, and 100.0%, respectively, in the three test datasets derived from the three different reviews.

Discussion

Iterative design and testing on inclusion criteria variants produced a prompt that consistently achieved satisfactory screening sensitivity. The classification process was fast and cheap, and had high specificity.
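Sensitivity and specificity are the standard screening metrics here. The counts in this sketch are hypothetical, chosen only to show how a figure like 95.8% arises (e.g. 23 of 24 truly eligible articles flagged):

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of truly eligible articles the prompt includes."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Proportion of ineligible articles the prompt correctly excludes."""
    return tn / (tn + fp)

# Hypothetical counts: 23 of 24 eligible articles flagged -> 95.8 %.
print(round(sensitivity(23, 1) * 100, 1))
```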

Public Interest Summary

Systematic reviews summarise all articles that have tried to answer a scientific question. They are usually the gold standard of evidence in medical science and widely inform healthcare policy. However, they are very expensive and time-consuming to write. The initial stage of writing a systematic review consists of reviewing potentially tens of thousands of scientific abstracts. This process may be automatable with artificial intelligence (AI), including GPT3, an AI system operated by OpenAI. Previous attempts to use closely related AI models have not worked, likely in part because GPT3's performance is strongly dependent on the exact instructions, or "prompts", given to it. This study investigated a new method of designing these prompts, which consistently achieved satisfactory screening performance when tested on articles collected for three previously published systematic reviews.
Source journal: Health Policy and Technology (Medicine / Health Policy)
CiteScore: 9.20
Self-citation rate: 3.30%
Articles per year: 78
Review time: 88 days
About the journal: Health Policy and Technology (HPT) is the official journal of the Fellowship of Postgraduate Medicine (FPM). It is a cross-disciplinary journal focusing on past, present and future health policy and on the role of technology in clinical and non-clinical, national and international health environments. HPT provides a further excellent way for the FPM to continue to make important national and international contributions to the development of policy and practice within medicine and related disciplines. The aim of HPT is to publish relevant, timely and accessible articles and commentaries to support policy-makers, health professionals, health technology providers, patient groups and academia interested in health policy and technology. Topics covered by HPT include:
- Health technology, including drug discovery, diagnostics, medicines, devices, therapeutic delivery and eHealth systems
- Cross-national comparisons on health policy using evidence-based approaches
- National studies on health policy to determine the outcomes of technology-driven initiatives
- Cross-border eHealth, including health tourism
- The digital divide in mobility, access and affordability of healthcare
- Health technology assessment (HTA) methods and tools for evaluating the effectiveness of clinical and non-clinical health technologies
- Health and eHealth indicators and benchmarks (measures/metrics) for understanding the adoption and diffusion of health technologies
- Health and eHealth models and frameworks to support policy-makers and other stakeholders in decision-making
- Stakeholder engagement with health technologies (clinical and patient/citizen buy-in)
- Regulation and health economics