PP54 Machine Learning For Accelerating Screening In Literature Reviews

Mary Chappell, Mary Edwards, Deborah Watkins, Christopher Marshall, Lavinia Ferrante di Ruffano, Anita Fitzgerald, Sara Graziadio
{"title":"PP54 Machine Learning For Accelerating Screening In Literature Reviews","authors":"Mary Chappell, Mary Edwards, Deborah Watkins, Christopher Marshall, Lavinia Ferrante di Ruffano, Anita Fitzgerald, Sara Graziadio","doi":"10.1017/s0266462323001988","DOIUrl":null,"url":null,"abstract":"<span>Introduction</span><p>Systematic reviews are important for informing decision-making and primary research, but they can be time consuming and costly. With the advent of machine learning, there is an opportunity to accelerate the review process in study screening. We aimed to understand the literature to make decisions about the use of machine learning for screening in our review workflow.</p><span>Methods</span><p>A pragmatic literature review of PubMed to obtain studies evaluating the accuracy of publicly available machine learning screening tools. A single reviewer used ‘snowballing’ searches to identify studies reporting accuracy data and extracted the sensitivity (ability to correctly identify included studies for a review) and specificity, or workload saved (ability to correctly exclude irrelevant studies).</p><span>Results</span><p>Ten tools (AbstractR, ASReview Lab, Cochrane RCT classifier, Concept encoder, Dpedia, DistillerAI, Rayyan, Research Screener, Robot Analyst, SWIFT-active screener) were evaluated in a total of 16 studies. Fourteen studies were single arm where, although compared with a reference standard (predominantly single reviewer screening), there was no other comparator. Two studies were comparative, where tools were compared with other tools as well as a reference standard. All tools ranked records by probability of inclusion and either (i) applied a cut-point to exclude records or (ii) were used to rank and re-rank records during screening iterations, with screening continuing until most relevant records were obtained. The accuracy of tools varied widely between different studies and review projects. When used in method (ii), at 95 percent to 100 percent sensitivity, tools achieved workload savings of between 7 percent and 99 percent. It was unclear whether evaluations were conducted independent of tool developers.</p><span>Conclusions</span><p>Evaluations suggest the potential for tools to correctly classify studies in screening. However, conclusions are limited since (i) tool accuracy is generally not compared with dual reviewer screening and (ii) the literature lacks comparative studies and, because of between-study heterogeneity, it is not possible to robustly determine the accuracy of tools compared with each other. Independent evaluations are needed.</p>","PeriodicalId":2,"journal":{"name":"ACS Applied Bio Materials","volume":null,"pages":null},"PeriodicalIF":4.6000,"publicationDate":"2023-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Bio Materials","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1017/s0266462323001988","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATERIALS SCIENCE, BIOMATERIALS","Score":null,"Total":0}
引用次数: 0

Abstract

Introduction

Systematic reviews are important for informing decision-making and primary research, but they can be time-consuming and costly. With the advent of machine learning, there is an opportunity to accelerate the study screening stage of the review process. We aimed to review the literature to inform decisions about the use of machine learning for screening in our review workflow.

Methods

We conducted a pragmatic literature review of PubMed to identify studies evaluating the accuracy of publicly available machine learning screening tools. A single reviewer used 'snowballing' searches to identify studies reporting accuracy data and extracted sensitivity (the ability to correctly identify studies that should be included in a review) and specificity, or workload saved (the ability to correctly exclude irrelevant studies).
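To make these metrics concrete, the sketch below computes sensitivity, specificity, and workload saved from per-record screening outcomes. It is illustrative only: the abstract does not define its workload metric, so we assume the common "work saved over sampling" (WSS@R) formulation from the wider screening literature, and all variable names and the example numbers are hypothetical.

```python
# Minimal sketch (not from the paper) of the accuracy metrics extracted in
# the review. The WSS@R formulation is an assumption based on common usage.

def screening_metrics(truth, predicted):
    """truth/predicted: lists of booleans, True = record is relevant."""
    tp = sum(t and p for t, p in zip(truth, predicted))
    tn = sum((not t) and (not p) for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    fn = sum(t and (not p) for t, p in zip(truth, predicted))

    sensitivity = tp / (tp + fn)  # included studies correctly kept
    specificity = tn / (tn + fp)  # irrelevant studies correctly excluded
    # Work saved over sampling at recall R: the proportion of records the
    # reviewer no longer needs to read, minus the (1 - R) recall penalty.
    recall = sensitivity
    wss = (tn + fn) / len(truth) - (1 - recall)
    return sensitivity, specificity, wss

# Hypothetical example: 1,000 records, 50 truly relevant; the tool keeps
# 200 records for manual review and misses 2 relevant ones.
truth = [True] * 50 + [False] * 950
predicted = [True] * 48 + [False] * 2 + [True] * 152 + [False] * 798
sens, spec, wss = screening_metrics(truth, predicted)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} WSS={wss:.2f}")
# -> sensitivity=0.96 specificity=0.84 WSS=0.76
```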

Results

Ten tools (AbstractR, ASReview Lab, Cochrane RCT classifier, Concept encoder, Dpedia, DistillerAI, Rayyan, Research Screener, Robot Analyst, SWIFT-active screener) were evaluated in a total of 16 studies. Fourteen studies were single-arm: tools were compared with a reference standard (predominantly single-reviewer screening) but with no other comparator. Two studies were comparative, comparing tools both with other tools and with a reference standard. All tools ranked records by probability of inclusion and were used either (i) with a cut-point applied to exclude records or (ii) to rank and re-rank records during screening iterations, with screening continuing until most relevant records had been obtained. Tool accuracy varied widely across studies and review projects. When used in mode (ii) at 95 to 100 percent sensitivity, tools achieved workload savings of between 7 and 99 percent. It was unclear whether the evaluations were conducted independently of the tool developers.
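Mode (ii) corresponds to the general active-learning, prioritized-screening loop: a model is retrained on the reviewer's decisions so far and the remaining records are re-ranked before the next batch is screened. The sketch below shows that general loop only, not any named tool's implementation; the function and parameter names, the TF-IDF/logistic-regression model, the batch size, and the stopping rule are all illustrative assumptions.

```python
# Minimal sketch of prioritized screening with iterative re-ranking
# (usage mode (ii)). Assumes scikit-learn; everything here is illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def prioritized_screening(abstracts, label_fn, batch_size=10):
    """Rank records, have a reviewer label the top batch, retrain, repeat."""
    X = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
    labeled, unlabeled = {}, list(range(len(abstracts)))

    while unlabeled:
        trained = len(set(labeled.values())) >= 2
        if trained:
            clf = LogisticRegression(max_iter=1000)
            idx = list(labeled)
            clf.fit(X[idx], [labeled[i] for i in idx])
            # Re-rank remaining records by predicted probability of inclusion.
            scores = clf.predict_proba(X[unlabeled])[:, 1]
            ranked = sorted(zip(unlabeled, scores), key=lambda t: -t[1])
            batch = [i for i, _ in ranked[:batch_size]]
        else:
            # Not enough classes to train yet: screen in original order.
            batch = unlabeled[:batch_size]
        for i in batch:
            labeled[i] = label_fn(i)  # reviewer's include/exclude decision
        unlabeled = [i for i in unlabeled if i not in labeled]
        # Illustrative stopping rule: a full batch with no relevant records
        # suggests most relevant records have already been found.
        if trained and not any(labeled[i] for i in batch):
            break
    return labeled

# Hypothetical usage, with an oracle standing in for the human reviewer:
# decisions = prioritized_screening(abstracts, lambda i: i in relevant_ids)
```

The wide range of reported workload savings (7 to 99 percent) follows naturally from this design: how early the loop can stop depends on the prevalence of relevant records, the difficulty of the topic, and the stopping rule chosen, all of which differ between review projects.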

Conclusions

Evaluations suggest that these tools have the potential to correctly classify studies during screening. However, conclusions are limited because (i) tool accuracy is generally not compared with dual-reviewer screening and (ii) the literature lacks comparative studies, and between-study heterogeneity makes it impossible to robustly determine the accuracy of tools relative to one another. Independent evaluations are needed.
