A. La Rosa, M. Verdoux, P. Riebler, I. Lolli, C. Daniel, X. Tannier, S. Atallah, B. Baujat, E. Kempf
ESMO Real World Data and Digital Oncology, volume 8, Article 100151. Published 2025-05-29. DOI: 10.1016/j.esmorw.2025.100151. https://www.sciencedirect.com/science/article/pii/S2949820125000402
Multimodal identification of a rare head and neck cancer patient cohort in the clinical data warehouse of Greater Paris Teaching Hospital
Background
Ten percent of head and neck cancers (HNCs) differ from the common upper aerodigestive tract squamous-cell carcinoma. These HNCs are considered rare because of either their histology or their anatomical location. The federation of clinical data warehouses (CDWs) holds potential for advancing our understanding of these pathologies. This study aimed to develop a multimodal algorithm to identify rare HNC patients in a CDW.
Materials and methods
We carried out a cross-sectional study on the CDW of a group of 38 university hospitals. We developed a multimodal classification algorithm to identify rare HNC patients by integrating three data sources: International Classification of Diseases, 10th revision (ICD-10) codes, Association for the Development of Computer Science in Cytology and Pathological Anatomy (ADICAP) codes, and free-text data extracted from pathology reports with natural language processing (NLP). Algorithm performance was evaluated by an HNC medical expert using a validation set of 100 manually annotated cases.
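The way the three data sources are combined can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the `PatientSignals` record, its field names, and the `min_sources` parameter are hypothetical, and the per-source flags are assumed to have been computed upstream (for example by code lookups and an NLP pipeline).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientSignals:
    """Hypothetical per-patient flags, one per data source."""
    icd10_rare: bool   # rare HNC suggested by ICD-10 codes
    adicap_rare: bool  # rare HNC suggested by ADICAP codes
    nlp_rare: bool     # rare HNC suggested by NLP on pathology reports


def count_sources(signals: PatientSignals) -> int:
    """Number of data sources that flag the patient as a rare HNC case."""
    return sum((signals.icd10_rare, signals.adicap_rare, signals.nlp_rare))


def is_rare_hnc(signals: PatientSignals, min_sources: int = 1) -> bool:
    """Classify as rare HNC when at least `min_sources` sources agree."""
    if not 1 <= min_sources <= 3:
        raise ValueError("min_sources must be between 1 and 3")
    return count_sources(signals) >= min_sources


# Toy records for illustration only; these are not study data.
cohort = {
    "p1": PatientSignals(icd10_rare=True, adicap_rare=False, nlp_rare=False),
    "p2": PatientSignals(icd10_rare=True, adicap_rare=True, nlp_rare=False),
    "p3": PatientSignals(icd10_rare=False, adicap_rare=False, nlp_rare=False),
}

# "Any source" corresponds to the OR rule; "two sources" is the stricter rule.
any_source = [pid for pid, s in cohort.items() if is_rare_hnc(s, min_sources=1)]
two_sources = [pid for pid, s in cohort.items() if is_rare_hnc(s, min_sources=2)]
print(any_source, two_sources)
```

Setting `min_sources=1` mirrors an "ICD-10 or ADICAP or NLP" rule, while `min_sources=2` requires agreement between at least two sources, trading some coverage for more reliable identification.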
Results
Of 333 852 cancer patients, 9141 were identified as HNC patients based on ICD-10 and ADICAP codes. The multimodal algorithm using ICD-10 or ADICAP codes or NLP-processed free text classified 4515 patients as rare HNC patients, with 2168 identified by a minimum of two data sources. The algorithm showed 91% sensitivity and 95% specificity when relying on multiple data sources; the positive predictive value was 76% for rare histology identification compared with 43% for rare topography.
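For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and positive predictive value are derived from algorithm predictions compared against expert annotations. The function names and the toy labels are hypothetical and chosen only for illustration; they do not reproduce the study's validation set or its results.

```python
from typing import Dict, Optional, Sequence


def safe_ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def diagnostic_metrics(
    predicted: Sequence[bool], reference: Sequence[bool]
) -> Dict[str, Optional[float]]:
    """Compare algorithm output with expert annotation, case by case."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)          # flagged, truly rare
    fp = sum(p and not r for p, r in pairs)      # flagged, not rare
    fn = sum(not p and r for p, r in pairs)      # missed rare case
    tn = sum(not p and not r for p, r in pairs)  # correctly not flagged
    return {
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
        "ppv": safe_ratio(tp, tp + fp),
    }


# Toy labels for illustration only; these are not study data.
predicted = [True, True, False, False, True]
reference = [True, False, False, False, True]
metrics = diagnostic_metrics(predicted, reference)
print(metrics)
```

Positive predictive value depends on how common the target condition is in the evaluated set, which is one reason it can differ between subgroups even when the same algorithm is applied.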
Conclusions
This study demonstrates the feasibility and utility of a multimodal electronic health record-based approach to identify rare HNC patients in a CDW. Incorporating free-text and structured data improves the reliability of such cohort identification.