Can Synthetic Data Allow for Smaller Sample Sizes in Chronic Urticaria Research?

IF 4.0; Zone 2 (Medicine); JCR Q2 (Allergy)
Annika Gutsche, Pascale Salameh, Samad S Jahandideh, Mehran Roodsaz, Serkan Kutan, Ali Salehzadeh-Yazdi, Emek Kocatürk, Stamatios Gregoriou, Simon F Thomsen, Kanokvalai Kulthanan, Papapit Tuchinda, Joachim Dissemond, Alicja Kasperska-Zajac, Magdalena Zajac, Mateusz Zamłyński, Martijn van Doorn, Claudio A S Parisi, Jonny G Peter, Cascia Day, Cathryn McDougall, Michael Makris, Daria Fomina, Elena Kovalkova, Nikolai Streliaev, Gerelma Andrenova, Marina Lebedkina, Maryam Khoskhkui, Mehraneh M Aliabadi, Andrea Bauer, Lea Kiefer, Melba Muñoz, Karsten Weller, Pavel Kolkhir, Martin Metz
Clinical and Translational Allergy, vol. 15, no. 8, e70087. Published 2025-08-01. Journal Article.
DOI: 10.1002/clt2.70087 (https://doi.org/10.1002/clt2.70087)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12329239/pdf/
Citations: 0

Abstract

Can Synthetic Data Allow for Smaller Sample Sizes in Chronic Urticaria Research?

Background: Robust data are essential for clinical and epidemiological research, yet in chronic spontaneous urticaria (CSU), certain patient groups, such as the elderly or comorbid patients, are often underrepresented. In clinical trials, strict inclusion and exclusion criteria frequently limit recruitment, making it difficult to achieve sufficient statistical power. Similarly, real-world observational studies may lack sufficient sample sizes for robust analysis. To address these limitations, we generated synthetic patient data that reflect these groups' clinical characteristics and variability. This approach enables more comprehensive analyses, facilitates hypothesis testing in otherwise inaccessible populations, and supports the generation of evidence where traditional data sources are insufficient.

Methods: A tree-based decision model was applied to generate synthetic data based on an existing set of real-world data (RWD) from the Chronic Urticaria Registry (CURE). Descriptive characteristics and association strength between relevant RWD variables and their synthetic counterparts were analyzed as indicators of replication accuracy, providing insight into how closely the synthetic data aligns with the RWD. Finally, we determined the minimum sample size required to generate high-quality synthetic data.

Results: The algorithm produced extensive synthetic data records, closely mirroring patient demographics and clinical disease characteristics. Smaller subgroups of the data were replicated equally well and followed the same distribution as the RWD. Known associations and correlations between disease-specific factors (disease control) and risk factors (age) yielded similar results, with no significant difference (p > 0.05). The smallest share of the original RWD from which synthetic data could be generated while maintaining high accuracy was 25%, enabling a fourfold increase in the synthetic population.

Conclusion: Synthetic data could replicate RWD with reasonable accuracy for patients with CSU down to 25% of the original population size. This method has the potential to extend small patient subgroups in clinical and epidemiological research.
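
The abstract does not specify the tree-based algorithm, so the following is only a minimal sketch of the general idea, assuming a CART-style sequential synthesis: the first variable is resampled from the observed values, and the second is drawn from the observed values in the matching leaf of a small regression tree. The toy "registry" (age and a 0 to 16 disease control score), the function names, and all parameters are hypothetical and carry no clinical meaning. The sketch subsamples 25% of the toy data and synthesizes a population four times that size.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, depth=0, max_depth=3, min_leaf=10):
    """Fit a small regression tree on (x, y) rows; leaves keep observed y values."""
    best = None
    if depth < max_depth and len(rows) >= 2 * min_leaf:
        for threshold in sorted({x for x, _ in rows}):
            left = [r for r in rows if r[0] <= threshold]
            right = [r for r in rows if r[0] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y for _, y in left]) + sse([y for _, y in right])
            if best is None or cost < best[0]:
                best = (cost, threshold, left, right)
    if best is None:
        return {"leaf": [y for _, y in rows]}
    _, threshold, left, right = best
    return {
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, min_leaf),
        "right": build_tree(right, depth + 1, max_depth, min_leaf),
    }


def leaf_values(tree, x):
    """Walk down the tree and return the observed y values in x's leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["leaf"]


def synthesize(rows, n, rng):
    """Generate n synthetic (x, y) rows from the observed rows."""
    tree = build_tree(rows)
    xs = [x for x, _ in rows]
    out = []
    for _ in range(n):
        x = rng.choice(xs)                    # resample the first variable
        y = rng.choice(leaf_values(tree, x))  # draw y from the matching leaf
        out.append((x, y))
    return out


# Hypothetical toy data: (age, disease control score), NOT real registry data.
rng = random.Random(0)
real = []
for _ in range(400):
    age = rng.randint(18, 85)
    score = min(16, max(0, round(4 + 0.1 * age + rng.gauss(0, 2))))
    real.append((age, score))

subsample = rng.sample(real, len(real) // 4)              # 25% of the records
synthetic = synthesize(subsample, n=len(real), rng=rng)   # fourfold expansion

# Crude fidelity check: gap between synthetic and full-data mean scores.
mean_gap = abs(mean(s for _, s in synthetic) - mean(s for _, s in real))
print(len(subsample), len(synthetic), round(mean_gap, 2))
```

Because every synthetic value is drawn from an observed one, this kind of sampler cannot produce values outside the range seen in the subsample. A real evaluation would compare full distributions and association strengths, as the Methods describe, rather than a single mean.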

Source journal
Clinical and Translational Allergy (Immunology and Microbiology: Immunology)
CiteScore: 7.50
Self-citation rate: 4.50%
Articles published: 117
Review time: 12 weeks
Journal description: Clinical and Translational Allergy, one of several journals in the portfolio of the European Academy of Allergy and Clinical Immunology, provides a platform for the dissemination of allergy research and reviews, as well as EAACI position papers, task force reports and guidelines, amongst an international scientific audience. Clinical and Translational Allergy accepts clinical and translational research in the following areas and other related topics: asthma, rhinitis, rhinosinusitis, drug hypersensitivity, allergic conjunctivitis, allergic skin diseases, atopic eczema, urticaria, angioedema, venom hypersensitivity, anaphylaxis, food allergy, immunotherapy, immune modulators and biologics, animal models of allergic disease, immune mechanisms, or any other topic related to allergic disease.