RT-HaND_C: A Multi-Source, Validated Real-world Head and Neck Cancer Dataset for Research

Impact factor: 3.0; Zone 3 (Medicine); JCR Q2 (Oncology)
T. Young, H. Drake, V. Butterworth, W. Wulaningsih, B. Dann, A. Giemza, E. Ivy, D. Adjogatse, K. Sambasivan, I. Petkar, M. Reis Ferreira, A. Kong, M. Lei, L. Collins, A. King, D. Vilic, T.G. Urbano
Journal: Clinical Oncology, volume 47, Article 103935
Published: 2025-09-11
DOI: 10.1016/j.clon.2025.103935
URL: https://www.sciencedirect.com/science/article/pii/S0936655525001906

Abstract

Aims

Real-world data (RWD) are a valuable resource for head and neck cancer (HNC) research, offering insights into outcomes among diverse, comorbid patients who are often underrepresented in clinical trials. However, RWD pose challenges, including data quality, and require rigorous evaluation before being used to generate real-world evidence. We aimed to develop a large HNC oncology dataset containing comprehensive clinical data.

Methods

We developed RT-HaND_C, a multi-source clinical dataset integrating structured Electronic Health Record (EHR) data, unstructured EHR data extracted using a previously validated AI-driven Natural Language Processing tool, and manually curated datasets. RT-HaND_C incorporates extensive demographic, disease, laboratory, treatment, outcome (disease and toxicity) and radiotherapy dosimetry data for all HNC oncology patients seen at our centre (2010–2023). The dataset underwent rigorous evaluation for accuracy, completeness and consistency. We evaluated usability by addressing the unanswered question of long-term weight trends post-radical HNC radiotherapy.

Results

The retrospective cohort comprises 2,895 HNC patients with over 1.9 million data points across over 2,000 data categories. Accuracy assessments exceeded 98% for most variables. High data completeness and consistency were observed for all key data categories. Dataset usability testing showed that the data were rapidly extractable and analysable. The analysis showed that HNC patients experienced statistically significant weight loss persisting at 5 years after radical radiotherapy (even when accounting for disease recurrence), with peak weight loss observed at 6 months post-radiotherapy.

Conclusions

RT-HaND_C represents a novel, high-quality RWD resource and evaluation framework. RT-HaND_C is virtually linked to corresponding diagnostic and radiotherapy imaging data to facilitate multi-modal research. The dataset is available for research and collaboration, with ongoing work focused on enhancing completeness and incorporating prospective updates.
Source journal: Clinical Oncology (Medicine, Oncology)
CiteScore: 5.20
Self-citation rate: 8.80%
Articles published: 332
Review time: 40 days
Journal description: Clinical Oncology is an international cancer journal covering all aspects of the clinical management of cancer patients, reflecting a multidisciplinary approach to therapy. Papers, editorials and reviews are published on all types of malignant disease, embracing pathology, diagnosis and treatment, including radiotherapy, chemotherapy, surgery, combined modality treatment and palliative care. Research and review papers covering epidemiology, radiobiology, radiation physics, tumour biology, and immunology are also published, together with letters to the editor, case reports and book reviews.