{"title":"设计 GTP3 提示,筛选 RCT 系统性综述文章","authors":"James A Strachan","doi":"10.1016/j.hlpt.2024.100943","DOIUrl":null,"url":null,"abstract":"<div><h3>Introduction</h3><div>Satisfactory sensitivity in screening articles for appropriate inclusion in systematic reviews has not yet been achieved using the group of GPT artificial intelligence (AI) systems. One issue in designing prompts for article screening is that while most of the prompt can be validated before use, i.e. on previously published systematic reviews, the part containing the inclusion criteria cannot. This study aimed to advance work in this area by trying to identify a prompt that is robust to variations in the precise wording of inclusion criteria. Prompts with this property should be able to achieve more consistent performance when applied to similar systematic reviews of health topics.</div></div><div><h3>Methods</h3><div>A prompt, into which alternative wordings (variants) of inclusion criteria could be inserted, was tested on a training dataset of articles identified during the re-run of electronic searches for a single published review. Modification and re-testing of the prompt was undertaken until satisfactory screening sensitivity across six different inclusion criteria variants was achieved. This prompt was then validated by assessing its performance on three “test” datasets, derived from re-run electronic searches from three different reviews.</div></div><div><h3>Results</h3><div>A prompt was successfully developed using the training dataset that achieved sensitivities of 95.8 %, 100.0 % & 100.0 % respectively in the three test datasets derived from the three different reviews.</div></div><div><h3>Discussion</h3><div>Iterative design and testing on inclusion criteria variants produced a prompt that consistently achieved satisfactory screening sensitivity. The classification process was fast, cheap and had high specificity.</div></div><div><h3>Public Interest Summary</h3><div>Systematic reviews summarise all articles that have tried to answer scientific questions. They are usually the gold standard of evidence in medical science and widely inform healthcare policy. However, they are very expensive and time consuming to write. The initial stage of writing systematic reviews consists of reviewing potentially tens of thousands of scientific abstracts. This process may be able to be automated by artificial intelligence (AI) including GPT3 an AI system operated by OpenAI. Previous attempts to use closely related AI models have not worked likely in part because GPT3´s performance is strongly dependant on the exact instructions or “prompts” given to GPT3. 
This study investigated a new method of designing these prompts which consistently achieved satisfactory screening performance when tested on articles collected for three previously published systematic reviews.</div></div>","PeriodicalId":48672,"journal":{"name":"Health Policy and Technology","volume":"14 1","pages":"Article 100943"},"PeriodicalIF":3.4000,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Designing GTP3 prompts to screen articles for systematic reviews of RCTs\",\"authors\":\"James A Strachan\",\"doi\":\"10.1016/j.hlpt.2024.100943\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Introduction</h3><div>Satisfactory sensitivity in screening articles for appropriate inclusion in systematic reviews has not yet been achieved using the group of GPT artificial intelligence (AI) systems. One issue in designing prompts for article screening is that while most of the prompt can be validated before use, i.e. on previously published systematic reviews, the part containing the inclusion criteria cannot. This study aimed to advance work in this area by trying to identify a prompt that is robust to variations in the precise wording of inclusion criteria. Prompts with this property should be able to achieve more consistent performance when applied to similar systematic reviews of health topics.</div></div><div><h3>Methods</h3><div>A prompt, into which alternative wordings (variants) of inclusion criteria could be inserted, was tested on a training dataset of articles identified during the re-run of electronic searches for a single published review. Modification and re-testing of the prompt was undertaken until satisfactory screening sensitivity across six different inclusion criteria variants was achieved. This prompt was then validated by assessing its performance on three “test” datasets, derived from re-run electronic searches from three different reviews.</div></div><div><h3>Results</h3><div>A prompt was successfully developed using the training dataset that achieved sensitivities of 95.8 %, 100.0 % & 100.0 % respectively in the three test datasets derived from the three different reviews.</div></div><div><h3>Discussion</h3><div>Iterative design and testing on inclusion criteria variants produced a prompt that consistently achieved satisfactory screening sensitivity. The classification process was fast, cheap and had high specificity.</div></div><div><h3>Public Interest Summary</h3><div>Systematic reviews summarise all articles that have tried to answer scientific questions. They are usually the gold standard of evidence in medical science and widely inform healthcare policy. However, they are very expensive and time consuming to write. The initial stage of writing systematic reviews consists of reviewing potentially tens of thousands of scientific abstracts. This process may be able to be automated by artificial intelligence (AI) including GPT3 an AI system operated by OpenAI. Previous attempts to use closely related AI models have not worked likely in part because GPT3´s performance is strongly dependant on the exact instructions or “prompts” given to GPT3. 
This study investigated a new method of designing these prompts which consistently achieved satisfactory screening performance when tested on articles collected for three previously published systematic reviews.</div></div>\",\"PeriodicalId\":48672,\"journal\":{\"name\":\"Health Policy and Technology\",\"volume\":\"14 1\",\"pages\":\"Article 100943\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2024-11-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Health Policy and Technology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2211883724001060\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"HEALTH POLICY & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Health Policy and Technology","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2211883724001060","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"HEALTH POLICY & SERVICES","Score":null,"Total":0}
Designing GPT3 prompts to screen articles for systematic reviews of RCTs
Introduction
Satisfactory sensitivity in screening articles for inclusion in systematic reviews has not yet been achieved with the GPT family of artificial intelligence (AI) systems. One issue in designing prompts for article screening is that while most of the prompt can be validated before use, i.e. against previously published systematic reviews, the part containing the inclusion criteria cannot. This study aimed to advance work in this area by seeking a prompt that is robust to variations in the precise wording of the inclusion criteria. Prompts with this property should achieve more consistent performance when applied to similar systematic reviews of health topics.
Methods
A prompt, into which alternative wordings (variants) of the inclusion criteria could be inserted, was tested on a training dataset of articles identified during a re-run of the electronic searches for a single published review. The prompt was modified and re-tested until it achieved satisfactory screening sensitivity across six different inclusion-criteria variants. It was then validated by assessing its performance on three "test" datasets, derived from re-run electronic searches for three different reviews.
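The abstract does not reproduce the author's prompt or pipeline, but the workflow it describes (a fixed template with a slot for the inclusion-criteria wording, applied to each retrieved abstract) can be sketched as follows. This is a minimal illustration under stated assumptions: the template text, the criteria variants, the model name and the use of OpenAI's legacy Completion API are placeholders, not details taken from the study.

# Minimal sketch of the screening workflow described above; NOT the author's
# actual prompt. Assumes the legacy openai-python (<1.0) Completion interface
# and a GPT-3-era model; the library reads the OPENAI_API_KEY environment
# variable by default.
import openai

# Template with a slot for the inclusion-criteria wording (illustrative text).
PROMPT_TEMPLATE = """You are screening abstracts for a systematic review of RCTs.
Inclusion criteria: {criteria}

Title: {title}
Abstract: {abstract}

Answer INCLUDE if the article may meet the criteria, otherwise EXCLUDE."""

# Alternative wordings ("variants") of the same inclusion criteria; the study
# revised its template until sensitivity was satisfactory across six such
# variants. These two are hypothetical examples.
CRITERIA_VARIANTS = [
    "Randomised controlled trials of the intervention in adult patients.",
    "RCTs evaluating the intervention against any comparator in adults.",
]

def classify(title: str, abstract: str, criteria: str) -> bool:
    """Return True if the model flags the article for inclusion."""
    response = openai.Completion.create(
        model="text-davinci-003",  # a GPT-3-era model; an assumption here
        prompt=PROMPT_TEMPLATE.format(
            criteria=criteria, title=title, abstract=abstract
        ),
        max_tokens=5,
        temperature=0,  # deterministic output for reproducible screening
    )
    return "INCLUDE" in response.choices[0].text.upper()

In the study's terms, each candidate template would be scored against every variant on the training dataset and revised until sensitivity was satisfactory for all six; only the frozen prompt was then applied to the three test datasets.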
Results
Using the training dataset, a prompt was successfully developed that achieved sensitivities of 95.8 %, 100.0 % and 100.0 % respectively in the three test datasets derived from the three different reviews.
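Sensitivity here is the proportion of truly eligible articles that the screen flags for inclusion; specificity, mentioned in the Discussion, is the proportion of ineligible articles correctly excluded. A worked illustration with made-up counts (not data from the study):

def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of genuinely eligible articles the screen includes."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of ineligible articles the screen excludes."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts for one test dataset (not taken from the study):
# 23 of 24 eligible articles flagged gives 23/24 = 95.8 % sensitivity.
print(f"sensitivity = {sensitivity(23, 1):.1%}")    # -> 95.8%
print(f"specificity = {specificity(900, 50):.1%}")  # -> 94.7%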
Discussion
Iterative design and testing against inclusion-criteria variants produced a prompt that consistently achieved satisfactory screening sensitivity. The classification process was fast and cheap, and it had high specificity.
Public Interest Summary
Systematic reviews summarise all articles that have tried to answer a given scientific question. They are usually the gold standard of evidence in medical science and widely inform healthcare policy. However, they are very expensive and time consuming to write. The initial stage of writing a systematic review consists of reviewing potentially tens of thousands of scientific abstracts. This process might be automated by artificial intelligence (AI), including GPT3, an AI system operated by OpenAI. Previous attempts to use closely related AI models have not worked, likely in part because GPT3's performance is strongly dependent on the exact instructions or "prompts" it is given. This study investigated a new method of designing these prompts that consistently achieved satisfactory screening performance when tested on articles collected for three previously published systematic reviews.
Journal introduction:
Health Policy and Technology (HPT) is the official journal of the Fellowship of Postgraduate Medicine (FPM). It is a cross-disciplinary journal focusing on past, present and future health policy and the role of technology in clinical and non-clinical, national and international health environments.
HPT provides a further excellent way for the FPM to continue making important national and international contributions to the development of policy and practice within medicine and related disciplines. The aim of HPT is to publish relevant, timely and accessible articles and commentaries to support policy-makers, health professionals, health technology providers, patient groups and academia interested in health policy and technology.
Topics covered by HPT will include:
- Health technology, including drug discovery, diagnostics, medicines, devices, therapeutic delivery and eHealth systems
- Cross-national comparisons on health policy using evidence-based approaches
- National studies on health policy to determine the outcomes of technology-driven initiatives
- Cross-border eHealth including health tourism
- The digital divide in mobility, access and affordability of healthcare
- Health technology assessment (HTA) methods and tools for evaluating the effectiveness of clinical and non-clinical health technologies
- Health and eHealth indicators and benchmarks (measure/metrics) for understanding the adoption and diffusion of health technologies
- Health and eHealth models and frameworks to support policy-makers and other stakeholders in decision-making
- Stakeholder engagement with health technologies (clinical and patient/citizen buy-in)
- Regulation and health economics