T. Young, H. Drake, V. Butterworth, W. Wulaningsih, B. Dann, A. Giemza, E. Ivy, D. Adjogatse, K. Sambasivan, I. Petkar, M. Reis Ferreira, A. Kong, M. Lei, L. Collins, A. King, D. Vilic, T.G. Urbano
Clinical Oncology, Volume 47, Article 103935. Published 2025-09-11. DOI: 10.1016/j.clon.2025.103935
https://www.sciencedirect.com/science/article/pii/S0936655525001906
RT-HaND_C: A Multi-Source, Validated Real-world Head and Neck Cancer Dataset for Research
Aims
Real-world data (RWD) are a valuable resource for head and neck cancer (HNC) research, offering insights into outcomes among diverse, comorbid patients who are often underrepresented in clinical trials. However, RWD pose challenges, including data quality, and require rigorous evaluation before being used to generate real-world evidence. We aimed to develop a large HNC oncology dataset containing comprehensive clinical data.
Methods
We developed RT-HaND_C, a multi-source clinical dataset integrating structured Electronic Health Record (EHR) data, unstructured EHR data extracted using a previously validated AI-driven Natural Language Processing tool, and manually curated datasets. RT-HaND_C incorporates extensive demographic, disease, laboratory, treatment, outcome (disease and toxicity) and radiotherapy dosimetry data for all HNC oncology patients seen at our centre (2010–2023). The dataset underwent rigorous evaluation for accuracy, completeness and consistency. We evaluated usability by addressing the unanswered question of long-term weight trends post-radical HNC radiotherapy.
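The abstract does not describe how accuracy and completeness were computed, so the following is only a minimal sketch of one common way to define them: completeness as the fraction of records with a value present, and accuracy as agreement with a manually verified audit subsample. The field names, records and reference values are all hypothetical and are not taken from RT-HaND_C.

```python
# Hypothetical records; field names and values are illustrative only.
records = [
    {"patient_id": "P1", "t_stage": "T2", "baseline_weight_kg": 82.0},
    {"patient_id": "P2", "t_stage": None, "baseline_weight_kg": 67.5},
    {"patient_id": "P3", "t_stage": "T4", "baseline_weight_kg": None},
    {"patient_id": "P4", "t_stage": "T1", "baseline_weight_kg": 74.1},
]

# Hypothetical manually verified values for an audit subsample.
reference = {"P1": "T2", "P3": "T3", "P4": "T1"}


def completeness(rows, field):
    """Fraction of rows in which `field` is present (not None)."""
    if not rows:
        return 0.0
    return sum(r.get(field) is not None for r in rows) / len(rows)


def accuracy(rows, field, truth):
    """Fraction of audited rows whose `field` matches the reference.

    Returns None when no row appears in the audit subsample.
    """
    audited = [r for r in rows if r["patient_id"] in truth]
    if not audited:
        return None
    matches = sum(r.get(field) == truth[r["patient_id"]] for r in audited)
    return matches / len(audited)


t_stage_completeness = completeness(records, "t_stage")
t_stage_accuracy = accuracy(records, "t_stage", reference)
print(f"completeness={t_stage_completeness:.2f}, accuracy={t_stage_accuracy:.2f}")
```

A consistency check could follow the same pattern, for example by flagging records whose values conflict between two sources; the rules would depend on the variables involved.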
Results
The retrospective cohort comprises 2,895 HNC patients with over 1.9 million data points across more than 2,000 data categories. Accuracy assessments exceeded 98% for most variables. High data completeness and consistency were observed for all key data categories. Usability testing showed that the data could be rapidly extracted and analysed. The analysis demonstrated that HNC patients experienced statistically significant weight loss that persisted at 5 years after radical radiotherapy (even when accounting for disease recurrence), with peak weight loss observed at 6 months post-radiotherapy.
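As an illustration of the kind of weight-trend analysis described above, the sketch below computes each patient's percentage weight change from a pre-treatment baseline and averages it per follow-up time point. The measurements are synthetic and invented for this example, the tuple layout is an assumption, and no significance test is included because the abstract does not state which statistical method was used.

```python
from collections import defaultdict
from statistics import mean

# Synthetic measurements: (patient_id, months after radiotherapy, weight in kg).
# Month 0 is treated as the pre-treatment baseline. Values are invented.
weights = [
    ("P1", 0, 80.0), ("P1", 6, 72.0), ("P1", 60, 76.0),
    ("P2", 0, 60.0), ("P2", 6, 57.0), ("P2", 60, 58.5),
]


def percent_change_from_baseline(measurements):
    """Return {months: [percent change per patient]} relative to month-0 weight."""
    baseline = {pid: kg for pid, month, kg in measurements if month == 0}
    changes = defaultdict(list)
    for pid, month, kg in measurements:
        if month == 0 or pid not in baseline:
            continue  # skip the baseline itself and patients without one
        changes[month].append(100.0 * (kg - baseline[pid]) / baseline[pid])
    return dict(changes)


changes = percent_change_from_baseline(weights)
# Mean percentage change at each follow-up time point, in chronological order.
mean_change = {month: mean(vals) for month, vals in sorted(changes.items())}
# Time point with the most negative mean change, i.e. the largest average loss.
peak_loss_month = min(mean_change, key=mean_change.get)
print(mean_change, peak_loss_month)
```

Accounting for disease recurrence, as the study reports doing, would require an additional per-patient recurrence flag and a stratified or adjusted analysis, which this sketch omits.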
Conclusions
RT-HaND_C represents a novel, high-quality RWD resource and evaluation framework. RT-HaND_C is virtually linked to corresponding diagnostic and radiotherapy imaging data to facilitate multi-modal research. The dataset is available for research and collaboration, with ongoing work focused on enhancing completeness and incorporating prospective updates.
About the journal:
Clinical Oncology is an international cancer journal covering all aspects of the clinical management of cancer patients, reflecting a multidisciplinary approach to therapy. Papers, editorials and reviews are published on all types of malignant disease, embracing pathology, diagnosis and treatment, including radiotherapy, chemotherapy, surgery, combined modality treatment and palliative care. Research and review papers covering epidemiology, radiobiology, radiation physics, tumour biology, and immunology are also published, together with letters to the editor, case reports and book reviews.