Development of a Natural Language Processing Model for deriving breast cancer quality indicators: A cross-sectional, multicenter study
Etienne Guével, Sonia Priou, Rémi Flicoteaux, Guillaume Lamé, Romain Bey, Xavier Tannier, Ariel Cohen, Gilles Chatellier, Christel Daniel, Christophe Tournigand, Emmanuelle Kempf, on behalf of the AP-HP Cancer Group, a CRAB
Revue d'Épidémiologie et de Santé Publique, volume 71, issue 6, Article 102189. Published 2023-11-15. DOI: 10.1016/j.respe.2023.102189
https://www.sciencedirect.com/science/article/pii/S0398762023007927
Abstract
Objectives
Medico-administrative data are a promising basis for automating the calculation of Healthcare Quality and Safety Indicators. Nevertheless, not all relevant indicators can be calculated from these data alone. The objectives of this feasibility study were to analyze 1) the availability of data sources and 2) the availability of the elementary variables needed for each indicator, and 3) to apply natural language processing to retrieve this information automatically.
Method
We performed a multicenter cross-sectional observational feasibility study on the clinical data warehouse of Assistance Publique – Hôpitaux de Paris (AP-HP). We studied the management of breast cancer patients treated at AP-HP between January 2019 and June 2021, and the quality indicators published by the European Society of Breast Cancer Specialists, using claims data from the Programme de Médicalisation du Système d'Information (PMSI) and pathology reports. For each indicator, we calculated the number (%) of patients for whom all necessary data sources were available, and the number (%) of patients for whom all elementary variables were available in those sources and for whom the related Healthcare Quality and Safety Indicator (HQSI) was computable. To extract useful data from the free-text reports, we developed and validated dedicated rule-based algorithms, whose performance was assessed with recall, precision, and F1-score.
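As an illustration of what a rule-based extraction step of this kind can look like, the sketch below applies a single regular-expression rule to toy report sentences. The variable chosen (estrogen-receptor status), the pattern, the function name `extract_er_status`, and the English example sentences are all assumptions made for illustration; they are not the rules developed in the study.

```python
import re
from typing import Optional

# Hypothetical rule: find an estrogen-receptor mention followed, within the
# same sentence and at most 40 characters later, by "positive" or "negative".
ER_PATTERN = re.compile(
    r"\b(?:estrogen receptors?|ER)\b[^.\n]{0,40}?\b(positive|negative)\b",
    re.IGNORECASE,
)


def extract_er_status(report: str) -> Optional[str]:
    """Return 'positive' or 'negative' if the rule matches, else None."""
    match = ER_PATTERN.search(report)
    return match.group(1).lower() if match else None


# Toy free-text snippets (invented, not taken from real pathology reports).
reports = [
    "Invasive ductal carcinoma. Estrogen receptors: positive (90%).",
    "ER status negative. HER2 not amplified.",
    "Margins clear. No receptor testing reported.",
]
statuses = [extract_er_status(r) for r in reports]
print(statuses)
```

A report for which the rule returns `None` leaves the elementary variable missing, so any indicator that depends on it cannot be computed for that patient.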
Results
Out of 5,785 female patients diagnosed with breast cancer (60.9 years, IQR [50.0–71.9]), 5,147 (89.0%) had procedures related to breast cancer recorded in the PMSI, of whom 3,732 (72.5%) had at least one surgery. Out of the 34 key indicators, 9 could be calculated with the PMSI alone, and 6 more became calculable when data from pathology reports were added. Ten elementary variables were needed to calculate the 6 indicators combining the PMSI and pathology reports. The necessary sources were available for 58.8% to 94.6% of patients, depending on the indicator.
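The reported proportions can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the paragraph above; the helper `pct` and the variable names are illustrative, and the choice of denominator for the surgery percentage is an assumption inferred from the numbers, not something the abstract states explicitly.

```python
# Counts reported in the Results paragraph.
n_patients = 5785
n_with_pmsi_procedures = 5147
n_with_surgery = 3732


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Share of all patients with breast cancer procedures recorded in the PMSI.
pmsi_share = pct(n_with_pmsi_procedures, n_patients)

# Assumption: the surgery percentage is relative to the PMSI subgroup.
surgery_share = pct(n_with_surgery, n_with_pmsi_procedures)

# For comparison: the same count relative to the whole cohort.
surgery_share_all = pct(n_with_surgery, n_patients)

print(pmsi_share, surgery_share, surgery_share_all)
```

Under that assumption the computed values match the reported 89.0% and 72.5%; relative to the whole cohort, the surgery share would be lower.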
The extraction algorithms had an average accuracy of 76.5% (min–max [32.7%–93.3%]), an average precision of 77.7% [10.0%–97.4%], and an average sensitivity of 71.6% [2.8%–100.0%]. Once these algorithms were applied, the variables needed to calculate the indicators were extracted for 2% to 88% of patients, depending on the indicator.
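For readers less familiar with these metrics, the sketch below shows the standard definitions of precision, recall (another name for sensitivity), and F1-score computed from confusion counts. The counts passed in the example are hypothetical and are not taken from the study; the function name is likewise illustrative.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from true/false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical validation counts for one extracted variable.
precision, recall, f1 = precision_recall_f1(tp=30, fp=10, fn=20)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

A rule that rarely fires but is usually right when it does will show high precision and low recall, which is one way the wide per-variable ranges reported above can arise.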
Discussion
Three factors limit the population for which the indicators can be calculated: the availability of medical reports in the electronic health records, the availability of the elementary variables within those reports, and the performance of the extraction algorithms.
Conclusions
The automated calculation of quality indicators from electronic health records is a prospect that still faces many practical obstacles.