Annika Gutsche, Pascale Salameh, Samad S Jahandideh, Mehran Roodsaz, Serkan Kutan, Ali Salehzadeh-Yazdi, Emek Kocatürk, Stamatios Gregoriou, Simon F Thomsen, Kanokvalai Kulthanan, Papapit Tuchinda, Joachim Dissemond, Alicja Kasperska-Zajac, Magdalena Zajac, Mateusz Zamłyński, Martijn van Doorn, Claudio A S Parisi, Jonny G Peter, Cascia Day, Cathryn McDougall, Michael Makris, Daria Fomina, Elena Kovalkova, Nikolai Streliaev, Gerelma Andrenova, Marina Lebedkina, Maryam Khoskhkui, Mehraneh M Aliabadi, Andrea Bauer, Lea Kiefer, Melba Muñoz, Karsten Weller, Pavel Kolkhir, Martin Metz
Clinical and Translational Allergy, vol. 15, no. 8, e70087. Published 2025-08-01. DOI: 10.1002/clt2.70087 (https://doi.org/10.1002/clt2.70087)
Can Synthetic Data Allow for Smaller Sample Sizes in Chronic Urticaria Research?
Background: Robust data are essential for clinical and epidemiological research, yet in chronic spontaneous urticaria (CSU), certain patient groups, such as the elderly or comorbid patients, are often underrepresented. In clinical trials, strict inclusion and exclusion criteria frequently limit recruitment, making it difficult to achieve sufficient statistical power. Similarly, real-world observational studies may lack sufficient sample sizes for robust analysis. To address these limitations, we generated synthetic patient data that reflect these groups' clinical characteristics and variability. This approach enables more comprehensive analyses, facilitates hypothesis testing in otherwise inaccessible populations, and supports the generation of evidence where traditional data sources are insufficient.
Methods: A tree-based decision model was applied to generate synthetic data based on an existing set of real-world data (RWD) from the Chronic Urticaria Registry (CURE). Descriptive characteristics and the strength of associations between relevant RWD variables and their synthetic counterparts were analyzed as indicators of replication accuracy, providing insight into how closely the synthetic data align with the RWD. Finally, we determined the minimum sample size required to generate high-quality synthetic data.
Results: The algorithm produced extensive synthetic data records that closely mirrored patient demographics and clinical disease characteristics. Smaller subgroups of the data were replicated equally well and followed the same distribution as in the RWD. Known associations and correlations between disease-specific factors (disease control) and risk factors (age) yielded similar results in the synthetic data and the RWD, with no significant difference (p > 0.05). Synthetic data retained high accuracy relative to the RWD when generated from as little as 25% of the original sample, enabling a fourfold increase in the synthetic population.
Conclusion: Synthetic data could replicate RWD with reasonable accuracy for patients with CSU down to 25% of the original population size. This method has the potential to extend small patient subgroups in clinical and epidemiological research.
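The abstract does not specify the tree-based model, so the following is only a minimal sketch of one common approach, assumed here for illustration: sequential synthesis, in which the first variable is resampled from observed values and the second is drawn from the leaf of a small regression tree. The data are simulated toy values, not CURE records, and the variable names (`age`, a 0 to 16 "control score"), the tree settings, and the helper functions are all hypothetical. The sketch trains on a 25% subsample and generates four times as many synthetic records, mirroring the fourfold expansion described above.

```python
import random
import statistics


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = statistics.fmean(values)
    return sum((v - mean) ** 2 for v in values)


def grow_tree(rows, min_leaf=10, max_depth=3, depth=0):
    """Fit a small regression tree on (age, score) rows.

    Leaves store the observed scores ("donors") that fall into them.
    """
    rows = sorted(rows)
    best = None  # (cost, split index)
    if depth < max_depth:
        for i in range(min_leaf, len(rows) - min_leaf + 1):
            if rows[i - 1][0] == rows[i][0]:
                continue  # cannot split between identical ages
            cost = sse([s for _, s in rows[:i]]) + sse([s for _, s in rows[i:]])
            if best is None or cost < best[0]:
                best = (cost, i)
    if best is None:
        return {"donors": [s for _, s in rows]}
    i = best[1]
    return {
        "threshold": (rows[i - 1][0] + rows[i][0]) / 2,
        "left": grow_tree(rows[:i], min_leaf, max_depth, depth + 1),
        "right": grow_tree(rows[i:], min_leaf, max_depth, depth + 1),
    }


def sample_score(tree, age, rng):
    """Walk down to the leaf for this age and draw one donor score."""
    while "donors" not in tree:
        tree = tree["left"] if age <= tree["threshold"] else tree["right"]
    return rng.choice(tree["donors"])


def synthesize(train_rows, n_synthetic, rng):
    """Generate synthetic (age, score) records from the training rows."""
    tree = grow_tree(train_rows)
    ages = [a for a, _ in train_rows]
    records = []
    for _ in range(n_synthetic):
        age = rng.choice(ages)  # first variable: resample observed values
        records.append((age, sample_score(tree, age, rng)))
    return records


# Simulated stand-in for real-world data (NOT registry data).
rng = random.Random(0)
real = []
for _ in range(400):
    age = rng.randint(18, 85)
    score = round(8 + 0.05 * (age - 50) + rng.gauss(0, 3))
    real.append((age, min(16, max(0, score))))

# Train on 25% of the records, then generate 1 / 0.25 = 4 times as many.
subsample = rng.sample(real, k=len(real) // 4)
synthetic = synthesize(subsample, n_synthetic=len(real), rng=rng)

for name, rows in (("real", real), ("subsample", subsample), ("synthetic", synthetic)):
    print(name, len(rows),
          "mean age:", round(statistics.fmean(a for a, _ in rows), 1),
          "mean score:", round(statistics.fmean(s for _, s in rows), 1))
```

Because each leaf draws from observed donor values, every synthetic value stays within the range seen in the training subsample. Judging replication accuracy would then involve comparing descriptive statistics and associations between the synthetic records and the full real dataset, as the Methods describe.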
About the journal:
Clinical and Translational Allergy, one of several journals in the portfolio of the European Academy of Allergy and Clinical Immunology, provides a platform for the dissemination of allergy research and reviews, as well as EAACI position papers, task force reports and guidelines, amongst an international scientific audience.
Clinical and Translational Allergy accepts clinical and translational research in the following areas and other related topics: asthma, rhinitis, rhinosinusitis, drug hypersensitivity, allergic conjunctivitis, allergic skin diseases, atopic eczema, urticaria, angioedema, venom hypersensitivity, anaphylaxis, food allergy, immunotherapy, immune modulators and biologics, animal models of allergic disease, immune mechanisms, or any other topic related to allergic disease.