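Multimodal identification of a rare head and neck cancer patient cohort in the clinical data warehouse of Greater Paris Teaching Hospital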

A. La Rosa, M. Verdoux, P. Riebler, I. Lolli, C. Daniel, X. Tannier, S. Atallah, B. Baujat, E. Kempf
ESMO Real World Data and Digital Oncology, volume 8, Article 100151. Published 2025-05-29.
DOI: 10.1016/j.esmorw.2025.100151
https://www.sciencedirect.com/science/article/pii/S2949820125000402

Multimodal identification of a rare head and neck cancer patient cohort in the clinical data warehouse of Greater Paris Teaching Hospital

Background

Ten percent of head and neck cancers (HNCs) differ from the common upper aerodigestive tract squamous-cell carcinoma. These HNCs can be rare because of their histology or their anatomical location. The federation of clinical data warehouses (CDWs) holds potential for advancing our understanding of these pathologies. This study aimed to develop a multimodal algorithm to identify rare HNC patients in a CDW.

Materials and methods

We carried out a cross-sectional study on the CDW of a conglomerate of 38 university hospitals. We developed a multimodal classification algorithm to identify rare HNC patients by integrating International Classification of Diseases, 10th revision (ICD-10) codes, Association for the Development of Computer Science in Cytology and Pathological Anatomy (ADICAP) codes and free-text data from pathology reports using natural language processing (NLP). Algorithm performance was evaluated by an HNC medical expert using a validation set of 100 manually annotated cases.
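
The combination of the three data sources can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe the code lists, the NLP model, or the data schema, so the class `PatientEvidence`, its field names, and the helper functions below are hypothetical. The sketch only shows the two decision rules the abstract mentions, namely flagging by any source and flagging by at least two sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientEvidence:
    """Per-patient flags, one per data source (hypothetical names)."""
    icd10_rare: bool   # a rare-HNC ICD-10 code was found
    adicap_rare: bool  # a rare-HNC ADICAP code was found
    nlp_rare: bool     # NLP flagged the pathology report free text


def n_sources(evidence: PatientEvidence) -> int:
    """Count how many of the three sources flag the patient as rare HNC."""
    return sum((evidence.icd10_rare, evidence.adicap_rare, evidence.nlp_rare))


def is_rare_any(evidence: PatientEvidence) -> bool:
    """Inclusive rule: ICD-10 or ADICAP or NLP flags the patient."""
    return n_sources(evidence) >= 1


def is_rare_confirmed(evidence: PatientEvidence, min_sources: int = 2) -> bool:
    """Stricter rule: at least `min_sources` sources agree."""
    return n_sources(evidence) >= min_sources
```

The inclusive rule favors recall, while the agreement rule trades some recall for fewer false positives, which is the usual motivation for requiring concordant sources.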

Results

Of 333 852 cancer patients, 9141 were identified as HNC patients based on ICD-10 and ADICAP codes. The multimodal algorithm using ICD-10 or ADICAP codes or NLP-processed free text classified 4515 patients as rare HNC patients, with 2168 identified by a minimum of two data sources. It showed a 91% sensitivity and a 95% specificity when relying on multiple data sources, with a 76% positive predictive value observed for rare histology identification compared with 43% for rare topography.
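
For readers less familiar with these metrics, the sketch below gives their standard definitions and derives the cohort proportions implied by the reported counts. The function and constant names are illustrative. The abstract does not report the confusion-matrix counts of the 100-case validation set, so none are assumed here, and the second proportion assumes the 4515 rare HNC patients are a subset of the 9141 HNC patients, which the abstract implies but does not state outright.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positives among all truly rare cases."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negatives among all truly non-rare cases."""
    return tn / (tn + fp)


def positive_predictive_value(tp: int, fp: int) -> float:
    """True positives among all cases the algorithm flagged."""
    return tp / (tp + fp)


def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Counts reported in the abstract.
CANCER_PATIENTS = 333_852
HNC_PATIENTS = 9_141
RARE_ANY_SOURCE = 4_515   # flagged by ICD-10 or ADICAP or NLP
RARE_TWO_SOURCES = 2_168  # flagged by at least two sources

print(f"HNC among cancer patients: {share(HNC_PATIENTS, CANCER_PATIENTS):.1f}%")    # 2.7%
# Assumes the rare cohort is drawn from the 9141 HNC patients.
print(f"Rare HNC among HNC patients: {share(RARE_ANY_SOURCE, HNC_PATIENTS):.1f}%")  # 49.4%
print(f"Confirmed by >= 2 sources: {share(RARE_TWO_SOURCES, RARE_ANY_SOURCE):.1f}%")  # 48.0%
```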

Conclusions

This study demonstrates the feasibility and utility of a multimodal electronic health record-based approach to identify rare HNC patients in a CDW. Incorporating free-text and structured data improves the reliability of such cohort identification.