Development of a Natural Language Processing Model for deriving breast cancer quality indicators: A cross-sectional, multicenter study

Impact factor 2.1 · Tier 4 (Medicine) · JCR Q4, Public, Environmental & Occupational Health

Revue d'Épidémiologie et de Santé Publique. Pub date: 2023-11-15. DOI: 10.1016/j.respe.2023.102189

Etienne Guével, Sonia Priou, Rémi Flicoteaux, Guillaume Lamé, Romain Bey, Xavier Tannier, Ariel Cohen, Gilles Chatellier, Christel Daniel, Christophe Tournigand, Emmanuelle Kempf, On behalf of the AP-HP Cancer Group, a CRAB

Volume 71, Issue 6, Article 102189

Citations: 0

Abstract

Objectives

Medico-administrative data show promise for automating the calculation of Healthcare Quality and Safety Indicators (HQSIs). Nevertheless, not all relevant indicators can be calculated from these data alone. The objectives of this feasibility study were to analyze 1) the availability of data sources, 2) the availability of each indicator's elementary variables, and 3) the use of natural language processing (NLP) to retrieve this information automatically.

Method

We performed a multicenter, cross-sectional, observational feasibility study using the clinical data warehouse of Assistance Publique – Hôpitaux de Paris (AP-HP). We studied the management of breast cancer patients treated at AP-HP between January 2019 and June 2021, and the quality indicators published by the European Society of Breast Cancer Specialists. Our data sources were claims data from the Programme de Médicalisation du Système d'Information (PMSI) and pathology reports. For each indicator, we calculated two numbers (%): the patients for whom all necessary data sources were available, and the patients for whom all elementary variables were available in those sources, so that the related HQSI could be computed. To extract relevant data from free-text reports, we developed and validated dedicated rule-based algorithms and assessed their performance using recall, precision, and F1-score.

Results

Of 5,785 female patients diagnosed with breast cancer (median age 60.9 years, IQR [50.0–71.9]), 5,147 (89.0%) had procedures related to breast cancer recorded in the PMSI, and 3,732 of these (72.5%) had at least one surgery. Of the 34 key indicators, 9 could be calculated from the PMSI alone, and 6 more became computable when data from pathology reports were added. Ten elementary variables were needed to calculate these 6 indicators combining the PMSI and pathology reports. Depending on the indicator, the necessary sources were available for 58.8% to 94.6% of patients.
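
The cohort percentages use different denominators. The PMSI figure is relative to all 5,785 patients. The surgery figure is relative to the 5,147 patients with breast cancer procedures in the PMSI, as this arithmetic check shows:

```latex
\[
\frac{5147}{5785} \approx 89.0\%, \qquad
\frac{3732}{5147} \approx 72.5\%, \qquad
\frac{3732}{5785} \approx 64.5\%
\]
```

Relative to the full cohort, the 3,732 operated patients would represent about 64.5%.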

The extraction algorithms achieved a mean accuracy of 76.5% (min–max [32.7%–93.3%]), a mean precision of 77.7% [10.0%–97.4%], and a mean sensitivity (recall) of 71.6% [2.8%–100.0%]. Once these algorithms were applied, the variables needed to calculate the indicators were extracted for 2% to 88% of patients, depending on the indicator.

Discussion

Three factors limit the population for which the indicators can be calculated:

- the availability of medical reports in the electronic health records;
- the availability of the elementary variables within those reports;
- the performance of the extraction algorithms.

Conclusions

The automated calculation of quality indicators from electronic health records is a prospect that faces many practical obstacles.

Source journal

Revue d'Épidémiologie et de Santé Publique (Medicine: Public, Environmental & Occupational Health)

CiteScore

1.70

Self-citation rate

0.00%

Articles published

672

Review time

78 days

Journal description: The Journal of Epidemiology and Public Health maintains and deepens its own work through the diversity of methodologies and disciplines covered in each issue. The journal also offers pedagogical articles for teachers and students. Articles can be submitted in French or English. Readers will find:
- research articles and reviews;
- all disciplines: epidemiology, health economics;
- various subjects: cancer, nutrition, aging;
- a wide geographical scope.