Developing an Analytical Pipeline to Classify Patient Safety Event Reports Using Optimized Predictive Algorithms.

Impact factor 1.8; Zone 4, Medicine; JCR Q3 (Computer Science, Information Systems)

Methods of Information in Medicine Pub Date : 2021-12-01 Epub Date: 2021-10-31 DOI:10.1055/s-0041-1735620

Asa Adadey, Robert Giannini, Lorraine B Possanza

Methods of Information in Medicine 60 (5-06): 147-161. Journal Article.

Citations: 0

Abstract

Background: Patient safety event reports provide valuable insight into systemic safety issues, but deriving insights from these reports requires computational tools that can efficiently parse large volumes of qualitative data. Natural language processing (NLP) combined with predictive learning provides an automated approach to evaluating these data and supporting the work of patient safety analysts.

Objectives: The objective of this study was to use NLP and machine learning techniques to develop a generalizable, scalable, and reliable approach to classifying event reports for the purpose of driving improvements in the safety and quality of patient care.

Methods: Datasets for 14 different labels (themes) were vectorized using a bag-of-words, tf-idf, or document embeddings approach and then applied to a series of classification algorithms via a hyperparameter grid search to derive an optimized model. Reports were also analyzed for terms strongly associated with each theme using an adjusted F-score calculation.
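The vectorize-then-search workflow described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses only the Python standard library, a whitespace tokenizer, a hand-written multinomial naive Bayes classifier, and a grid over a single smoothing hyperparameter. All function names, the toy reports, and the candidate `alphas` are assumptions made for the example; a real pipeline would likely rely on an established machine learning library and a much larger grid covering several algorithms.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def train_nb(docs, labels, alpha):
    """Fit a multinomial naive Bayes model on bag-of-words counts."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    return {
        "classes": classes,
        "word_counts": word_counts,
        "totals": {c: sum(word_counts[c].values()) for c in classes},
        "priors": {c: labels.count(c) / len(labels) for c in classes},
        "vocab": set().union(*word_counts.values()),
        "alpha": alpha,  # additive smoothing; must be > 0
    }


def predict_nb(model, doc):
    """Return the class with the highest log posterior for one document."""
    vocab_size = len(model["vocab"])
    best_class, best_score = None, -math.inf
    for c in model["classes"]:
        score = math.log(model["priors"][c])
        denom = model["totals"][c] + model["alpha"] * vocab_size
        for word in tokenize(doc):
            if word in model["vocab"]:  # ignore words never seen in training
                score += math.log((model["word_counts"][c][word] + model["alpha"]) / denom)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: 2TP / (2TP + FP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def grid_search(train, valid, alphas):
    """Try each smoothing value and keep the one with the best validation F1."""
    train_docs, train_labels = zip(*train)
    valid_docs, valid_labels = zip(*valid)
    best_alpha, best_f1 = None, -1.0
    for alpha in alphas:
        model = train_nb(train_docs, train_labels, alpha)
        preds = [predict_nb(model, d) for d in valid_docs]
        score = f1_score(valid_labels, preds)
        if score > best_f1:
            best_alpha, best_f1 = alpha, score
    return best_alpha, best_f1


# Hypothetical toy reports: label 1 = "Fall" theme, 0 = any other theme.
TRAIN = [
    ("patient slipped and fell near the bed", 1),
    ("patient found on floor after fall", 1),
    ("fell while walking to bathroom", 1),
    ("wrong medication dose administered", 0),
    ("medication delayed by pharmacy", 0),
    ("lab specimen mislabeled", 0),
]
VALID = [
    ("patient fell from bed", 1),
    ("medication dose was wrong", 0),
    ("found on floor", 1),
    ("specimen delayed", 0),
]

best_alpha, best_f1 = grid_search(TRAIN, VALID, alphas=[0.1, 1.0, 10.0])
print(f"best alpha={best_alpha}, validation F1={best_f1:.3f}")
```

Treating each theme as its own binary problem, as in this sketch, mirrors the abstract's description of separate datasets for each of the 14 labels; whether the authors framed it exactly this way is not stated in the abstract.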

Results: F₁ score for each optimized model ranged from 0.951 ("Fall") to 0.544 ("Environment"). The bag-of-words approach proved optimal for 12 of 14 labels, and the naïve Bayes algorithm performed best for nine labels. Linear support vector machine was demonstrated as optimal for three labels and XGBoost for four of the 14 labels. Labels with more distinctly associated terms performed better than less distinct themes, as shown by a Pearson's correlation coefficient of 0.634.
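To make the two statistics in the Results concrete, the sketch below computes an F1 score from confusion counts and a Pearson correlation coefficient between two per-label series. The numbers in `distinctness` and `f1_scores` are hypothetical placeholders, not values from the study, and the "distinctness" measure itself is an assumption standing in for the adjusted F-score term analysis, whose exact formula the abstract does not give.

```python
import math


def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical per-label values, for illustration only.
distinctness = [0.9, 0.7, 0.5, 0.3]   # how strongly terms are tied to a label
f1_scores = [0.92, 0.80, 0.75, 0.60]  # optimized-model F1 for the same labels

r = pearson(distinctness, f1_scores)
print(f"Pearson r = {r:.3f}")
```

A positive coefficient, such as the 0.634 reported above, indicates that labels with more distinctive vocabulary tended to be classified more accurately; it does not by itself establish that distinctiveness causes the higher scores.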

Conclusions: We demonstrated an analytical pipeline that broadly applies NLP and predictive modeling to categorize patient safety reports from multiple facilities. This pipeline allows analysts to identify and structure the information contained in patient safety data more rapidly, which can enhance the evaluation and use of this information over time.


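Source journal

Methods of Information in Medicine (Medicine; Computer Science: Information Systems)

CiteScore: 3.70

Self-citation rate: 11.80%

Articles published: (not listed)

Review time: 6-12 weeks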

About the journal: Good medicine and good healthcare demand good information. Since the journal's founding in 1962, Methods of Information in Medicine has stressed the methodology and scientific fundamentals of organizing, representing, and analyzing data, information, and knowledge in biomedicine and health care. Covering publications in the fields of biomedical and health informatics, medical biometry, and epidemiology, the journal publishes original papers, reviews, reports, opinion papers, editorials, and letters to the editor. From time to time, the journal publishes articles on particular focus themes as part of an issue.