Using artificial intelligence to semi-automate trustworthiness assessment of randomized controlled trials: a case study

Ling Shan Au, Lizhen Qu, Jeremy Nielsen, Zongyuan Ge, Lyle C. Gurrin, Ben W. Mol, Rui Wang

Journal of Clinical Epidemiology, Volume 180, Article 111672. Published January 17, 2025. DOI: 10.1016/j.jclinepi.2025.111672
Background and Objective
Randomized controlled trials (RCTs) are the cornerstone of evidence-based medicine. Unfortunately, not all RCTs are based on real data. This serious breach of research integrity compromises the reliability of systematic reviews and meta-analyses, leading to misinformed clinical guidelines and posing a risk to both individual and public health. While methods to detect problematic RCTs have been proposed, they are time-consuming and labor-intensive. Large language models (LLMs), a form of artificial intelligence, have the potential to accelerate the data collection needed to assess the trustworthiness of published RCTs.
Methods
We present a case study using ChatGPT, powered by OpenAI's GPT-4o, to assess an RCT paper. The case study focuses on applying the Trustworthiness in Randomised Controlled Trials (TRACT) checklist and on automating data-table extraction to accelerate statistical analysis targeting the trustworthiness of the data. We provide a detailed step-by-step outline of the process, along with considerations for potential improvements.
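As a companion to this outline, the sketch below shows how the checklist step could, in principle, be scripted rather than run interactively. It is a minimal illustration against the OpenAI API, assuming plain-text extraction of the trial PDF; the prompt wording, the function name, and the example checklist item are our own illustrative choices, not the prompts used in the study.

```python
# A minimal sketch of scripting the checklist step against the OpenAI API.
# The study used the ChatGPT interface interactively; the system prompt and
# example checklist item below are illustrative assumptions.
from openai import OpenAI
from pypdf import PdfReader

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def assess_tract_item(pdf_path: str, checklist_item: str) -> str:
    """Ask GPT-4o for a yes/no judgment on one TRACT checklist item."""
    # Concatenate the extracted text of every page of the trial report.
    paper_text = "\n".join(
        page.extract_text() or "" for page in PdfReader(pdf_path).pages
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You assess the trustworthiness of randomized controlled "
                    "trials. Answer the checklist item with 'yes' or 'no', "
                    "followed by a one-sentence justification citing the paper."
                ),
            },
            {
                "role": "user",
                "content": f"Checklist item: {checklist_item}\n\nPaper:\n{paper_text}",
            },
        ],
    )
    return response.choices[0].message.content


# Example call with a hypothetical item wording:
print(assess_tract_item("trial.pdf", "Was the trial prospectively registered?"))
```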
Results
ChatGPT completed all tasks by processing the PDF of the selected publication and responding to specific prompts. It addressed items in the TRACT checklist effectively, providing precise “yes” or “no” answers while quickly synthesizing information from both the paper and relevant online resources. A comparison of results generated by ChatGPT and a human assessor showed agreement on 16 of 19 TRACT items (84%). This substantially accelerated the qualitative assessment process. Additionally, ChatGPT efficiently extracted the data tables as Microsoft Excel worksheets and reorganized the data, with three of the four extracted tables achieving an accuracy score of 100%, facilitating subsequent analysis and data verification.
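For readers who wish to reproduce the export step programmatically, the following sketch assumes the model has already returned each table as CSV text (an assumption on our part; the study obtained worksheets through ChatGPT directly) and writes each table to its own Microsoft Excel worksheet for subsequent verification.

```python
# A minimal sketch of the table-export step. The CSV intermediate and the
# function name are our assumptions; the study exported worksheets through
# ChatGPT directly. Requires pandas and openpyxl for .xlsx output.
import io

import pandas as pd


def tables_to_excel(csv_tables: dict[str, str], out_path: str) -> None:
    """Write each CSV-formatted table to its own worksheet for verification."""
    with pd.ExcelWriter(out_path) as writer:
        for sheet_name, csv_text in csv_tables.items():
            df = pd.read_csv(io.StringIO(csv_text))
            df.to_excel(writer, sheet_name=sheet_name, index=False)


# Example with a hypothetical baseline-characteristics table:
tables_to_excel(
    {"Table 1": "Characteristic,Group A,Group B\nAge (years),31.2,30.8"},
    "extracted_tables.xlsx",
)
```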
Conclusion
ChatGPT demonstrates potential for semiautomating the trustworthiness assessment of RCTs, although in our experience this required repeated prompting from the user. Further testing and refinement will involve applying ChatGPT to collections of RCT papers to improve the accuracy of data capture and lessen the role of the user. The ultimate aim, a completely automated process for large volumes of papers, seems plausible given our initial experience.
About the Journal
The Journal of Clinical Epidemiology strives to enhance the quality of clinical and patient-oriented healthcare research by advancing and applying innovative methods in conducting, presenting, synthesizing, disseminating, and translating research results into optimal clinical practice. Special emphasis is placed on training new generations of scientists and clinical practice leaders.