Evaluating the utility of large language models in generating search strings for systematic reviews in anesthesiology: a comparative analysis of top-ranked journals.
{"title":"Evaluating the utility of large language models in generating search strings for systematic reviews in anesthesiology: a comparative analysis of top-ranked journals.","authors":"Alessandro De Cassai, Burhan Dost, Yunus Emre Karapinar, Müzeyyen Beldagli, Mirac Selcen Ozkal Yalin, Esra Turunc, Engin Ihsan Turan, Nicolò Sella","doi":"10.1136/rapm-2024-106231","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>This study evaluated the effectiveness of large language models (LLMs), specifically ChatGPT 4o and a custom-designed model, Meta-Analysis Librarian, in generating accurate search strings for systematic reviews (SRs) in the field of anesthesiology.</p><p><strong>Methods: </strong>We selected 85 SRs from the top 10 anesthesiology journals, according to Web of Science rankings, and extracted reference lists as benchmarks. Using study titles as input, we generated four search strings per SR: three with ChatGPT 4o using general prompts and one with the Meta-Analysis Librarian model, which follows a structured, Population, Intervention, Comparator, Outcome-based approach aligned with Cochrane Handbook standards. Each search string was used to query PubMed, and the retrieved results were compared with the PubMed retrieved studies from the original search string in each SR to assess retrieval accuracy. Statistical analysis compared the performance of each model.</p><p><strong>Results: </strong>Original search strings demonstrated superior performance with a 65% (IQR: 43%-81%) retrieval rate, which was statistically different from both LLM groups in PubMed retrieved studies (p=0.001). The Meta-Analysis Librarian achieved a superior median retrieval rate to ChatGPT 4o (median, (IQR); 24% (13%-38%) vs 6% (0%-14%), respectively).</p><p><strong>Conclusion: </strong>The findings of this study highlight the significant advantage of using original search strings over LLM-generated search strings in PubMed retrieval studies. 
The Meta-Analysis Librarian demonstrated notable superiority in retrieval performance compared with ChatGPT 4o. Further research is needed to assess the broader applicability of LLM-generated search strings, especially across multiple databases.</p>","PeriodicalId":54503,"journal":{"name":"Regional Anesthesia and Pain Medicine","volume":" ","pages":""},"PeriodicalIF":5.1000,"publicationDate":"2025-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Regional Anesthesia and Pain Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1136/rapm-2024-106231","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ANESTHESIOLOGY","Score":null,"Total":0}
Citations: 0
Abstract
Background: This study evaluated the effectiveness of large language models (LLMs), specifically ChatGPT 4o and a custom-designed model, Meta-Analysis Librarian, in generating accurate search strings for systematic reviews (SRs) in the field of anesthesiology.
Methods: We selected 85 SRs from the top 10 anesthesiology journals, according to Web of Science rankings, and extracted their reference lists as benchmarks. Using study titles as input, we generated four search strings per SR: three with ChatGPT 4o using general prompts and one with the Meta-Analysis Librarian model, which follows a structured Population, Intervention, Comparator, Outcome (PICO)-based approach aligned with Cochrane Handbook standards. Each search string was used to query PubMed, and the retrieved records were compared with the PubMed records retrieved by each SR's original search string to assess retrieval accuracy. Statistical analysis compared the performance of each model.
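The core of the evaluation above can be sketched in Python: run a candidate search string against PubMed and measure what fraction of an SR's benchmark references it retrieves. The E-utilities ESearch endpoint is PubMed's real public API, but the helper names and the example PMIDs below are illustrative assumptions, not the authors' actual pipeline.

```python
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_pmids(search_string: str, retmax: int = 10000) -> set[str]:
    """Run a search string against PubMed via NCBI E-utilities; return PMIDs."""
    params = urllib.parse.urlencode({
        "db": "pubmed",
        "term": search_string,
        "retmode": "json",
        "retmax": retmax,
    })
    with urllib.request.urlopen(f"{EUTILS}?{params}") as resp:
        data = json.load(resp)
    return set(data["esearchresult"]["idlist"])

def retrieval_rate(retrieved: set[str], benchmark: set[str]) -> float:
    """Fraction of the SR's benchmark reference PMIDs found by the search."""
    if not benchmark:
        raise ValueError("benchmark reference list is empty")
    return len(retrieved & benchmark) / len(benchmark)

# Illustrative PMIDs only (not from the study):
benchmark = {"11111111", "22222222", "33333333", "44444444"}
retrieved = {"11111111", "22222222", "99999999"}
print(retrieval_rate(retrieved, benchmark))  # 0.5
```

In a full pipeline, `retrieved` would come from `pubmed_pmids(...)` for each of the four generated search strings, and the per-SR rates would then be summarized across the 85 reviews.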
Results: Original search strings performed best, with a 65% (IQR: 43%-81%) retrieval rate, which was statistically different from both LLM groups for PubMed-retrieved studies (p=0.001). The Meta-Analysis Librarian achieved a higher median retrieval rate than ChatGPT 4o (24% (IQR: 13%-38%) vs 6% (IQR: 0%-14%)).
Conclusion: The findings of this study highlight the significant advantage of original search strings over LLM-generated search strings for retrieving PubMed-indexed studies. The Meta-Analysis Librarian demonstrated notable superiority in retrieval performance compared with ChatGPT 4o. Further research is needed to assess the broader applicability of LLM-generated search strings, especially across multiple databases.
About the journal:
Regional Anesthesia & Pain Medicine, the official publication of the American Society of Regional Anesthesia and Pain Medicine (ASRA), is a monthly journal that publishes peer-reviewed scientific and clinical studies to advance the understanding and clinical application of regional techniques for surgical anesthesia and postoperative analgesia. Coverage includes intraoperative regional techniques, perioperative pain, chronic pain, obstetric anesthesia, pediatric anesthesia, outcome studies, and complications.
Published for over thirty years, this respected journal also serves as the official publication of the European Society of Regional Anaesthesia and Pain Therapy (ESRA), the Asian and Oceanic Society of Regional Anesthesia (AOSRA), the Latin American Society of Regional Anesthesia (LASRA), the African Society for Regional Anesthesia (AFSRA), and the Academy of Regional Anaesthesia of India (AORA).