The Literature Review Network: An Explainable Artificial Intelligence for Systematic Literature Reviews, Meta-analyses, and Method Development
Joshua Morriss, Tod Brindle, Jessica Bah Rösman, Daniel Reibsamen, Andreas Enz
arXiv - CS - Digital Libraries, 2024-08-05, https://doi.org/arxiv-2408.05239
Abstract
Systematic literature reviews provide the highest quality of evidence in research, yet the review process is hindered by significant resource and data constraints. The Literature Review Network (LRN) is a first-of-its-kind explainable AI platform adhering to PRISMA 2020 standards and designed to automate the entire literature review process. LRN was evaluated in the domain of surgical glove practices using three expert-developed search strings to query PubMed. A non-expert trained all LRN models, and performance was benchmarked against an expert manual review. Explainability and performance metrics assessed LRN's ability to replicate the experts' review; concordance was measured with the Jaccard index and confusion matrices. Researchers were blinded to each other's results until study completion, and overlapping studies were integrated into an LRN-generated systematic review. Without expert training, LRN models demonstrated superior classification accuracy, achieving 84.78% and 85.71%. The highest-performing model attained the best interrater reliability (κ = 0.4953) and explainability metrics, linking 'reduce', 'accident', and 'sharp' with 'double-gloving'. Another LRN model covered 91.51% of the relevant literature despite diverging from the non-expert's judgments (κ = 0.2174), surfacing the terms 'latex', 'double' (gloves), and 'indication'. LRN outperformed the manual review, which took 19,920 minutes over 11 months, by reducing the entire process to 288.6 minutes over 5 days, a roughly 69-fold speedup. This study demonstrates that explainable AI does not require expert training to conduct PRISMA-compliant systematic literature reviews at an expert level. LRN summarized the results of surgical glove studies and identified themes nearly identical to the clinical researchers' findings. Explainable AI can accurately expedite our understanding of clinical practices, potentially revolutionizing healthcare research.
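
To make the agreement metrics above concrete, here is a minimal Python sketch that computes a confusion matrix, accuracy, the Jaccard index, Cohen's kappa, and coverage (recall) from two sets of include/exclude screening decisions. The decision vectors and helper names (`confusion_counts`, `cohens_kappa`, and so on) are illustrative assumptions for exposition, not the LRN implementation or the study's data.

```python
# Illustrative agreement metrics for two screeners (e.g., expert vs. model).
# The decision vectors below are made-up examples, not study data.

def confusion_counts(expert, model):
    """Return (tp, fp, fn, tn), treating the expert labels as reference."""
    tp = sum(e and m for e, m in zip(expert, model))
    fp = sum((not e) and m for e, m in zip(expert, model))
    fn = sum(e and (not m) for e, m in zip(expert, model))
    tn = sum((not e) and (not m) for e, m in zip(expert, model))
    return tp, fp, fn, tn

def jaccard(tp, fp, fn):
    """Jaccard index over the two sets of included studies: |A∩B| / |A∪B|."""
    union = tp + fp + fn
    return tp / union if union else 1.0

def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa, (p_o - p_e) / (1 - p_e): chance-corrected agreement."""
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n                         # observed agreement
    p_yes = ((tp + fn) / n) * ((tp + fp) / n)   # both "include" by chance
    p_no = ((fp + tn) / n) * ((fn + tn) / n)    # both "exclude" by chance
    p_e = p_yes + p_no
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0

# Hypothetical include (True) / exclude (False) decisions on 10 abstracts.
expert = [True, True, False, True, False, False, True, False, True, False]
model  = [True, False, False, True, False, True, True, False, True, False]

tp, fp, fn, tn = confusion_counts(expert, model)
n = tp + fp + fn + tn
print(f"confusion matrix: tp={tp} fp={fp} fn={fn} tn={tn}")
print(f"accuracy: {(tp + tn) / n:.4f}")
print(f"jaccard:  {jaccard(tp, fp, fn):.4f}")
print(f"kappa:    {cohens_kappa(tp, fp, fn, tn):.4f}")
print(f"coverage (recall): {tp / (tp + fn):.4f}")  # expert-included studies found
```

In this toy run the two screeners agree on 8 of 10 abstracts, giving κ = 0.60 and a Jaccard index of 0.667; the κ values reported in the abstract (0.4953 and 0.2174) would be produced by the same computations applied to the actual screening decisions.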