{"title":"An extension of the Spiegelhalter-Knill-Jones method for continuous covariates in clinical decision making.","authors":"Bart K M Jacobs, Tafadzwa Maseko, Lutgarde Lynen, Aquiles Rodrigo Henriquez-Trujillo, Jozefien Buyze","doi":"10.1186/s12874-025-02591-5","DOIUrl":"https://doi.org/10.1186/s12874-025-02591-5","url":null,"abstract":"<p><strong>Background: </strong>There is still demand for algorithms that can be used at the point of care, especially when dealing with events that do not present with a single obvious clinical indicator. The Spiegelhalter-Knill-Jones (SKJ) method is an approach for the development of a clinical score that focuses on the effect size of predictors, which is more relevant in settings where events may be rare or data are scarce. However, it does require predictors to be binary or dichotomised.</p><p><strong>Methods: </strong>We developed an extension of the Spiegelhalter-Knill-Jones method that can include continuous variables and added additional features that make it more useful in a variety of settings. We illustrated our method on two historical datasets dealing with viral failure in HIV patients in Cambodia. We used area under the curve (AUC) and risk classification improvement (RCI) as metrics to evaluate the performance of the resulting prediction scores and risk classifications.</p><p><strong>Results: </strong>All new features worked as intended. 
Scoring systems developed with the new method outperformed an earlier application of a classic version of the SKJ method on the training dataset, while no significant difference was found on any of the performance measures in the test dataset.</p><p><strong>Conclusions: </strong>This extension provides a useful tool for clinical decision-making that is much more flexible than the original version of SKJ, and can be applied in a variety of settings.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"152"},"PeriodicalIF":3.9,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144214909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
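The classic SKJ scoring approach this record refers to combines "weights of evidence" (log-likelihood ratios of dichotomous findings) additively on the log-odds scale. The following is a minimal illustrative Python sketch of that dichotomous core only; the 2×2 counts are hypothetical, and the paper's continuous-covariate extension and any shrinkage adjustments are not reproduced here.

```python
from math import log, exp

def weight_of_evidence(tp, fn, fp, tn):
    """ln likelihood ratio for a positive binary finding:
    ln( P(finding | event) / P(finding | no event) )."""
    sens = tp / (tp + fn)   # P(finding present | event)
    fpr = fp / (fp + tn)    # P(finding present | no event)
    return log(sens / fpr)

def posterior_prob(prior_prob, weights):
    # Naive-Bayes-style update: add the weights of evidence for the
    # observed findings to the prior log-odds, then invert the logit.
    log_odds = log(prior_prob / (1 - prior_prob)) + sum(weights)
    return 1 / (1 + exp(-log_odds))

# Hypothetical counts (event vs no event) for two dichotomous findings.
w1 = weight_of_evidence(tp=30, fn=20, fp=10, tn=90)   # ln(0.6/0.1) ~ 1.79
w2 = weight_of_evidence(tp=25, fn=25, fp=20, tn=80)   # ln(0.5/0.2) ~ 0.92
print(posterior_prob(0.2, [w1, w2]))
```

A score table is then just these weights rounded (often multiplied by a constant) so a clinician can add them at the point of care.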
{"title":"Behavior of test specificity under an imperfect gold standard: findings from a simulation study and analysis of real-world oncology data.","authors":"Mark S Walker, Lukas Slipski, Yanina Natanzon","doi":"10.1186/s12874-025-02603-4","DOIUrl":"10.1186/s12874-025-02603-4","url":null,"abstract":"<p><strong>Background: </strong>Gold standards used in validation of new tests may be imperfect, with sensitivity or specificity less than 100%. The impact of imperfection in a gold standard on measured test attributes has been demonstrated formally, but its relevance in real-world oncology research may not be well understood.</p><p><strong>Methods: </strong>This simulation study examined the impact of imperfect gold standard sensitivity on measured test specificity at different levels of condition prevalence for a hypothetical real-world measure of death. The study also evaluated real-world oncology datasets with a linked National Death Index (NDI) dataset, to examine the measured specificity of a death indicator at levels of death prevalence that matched the simulation. The simulation and real-world data analysis both examined measured specificity of the death indicator at death prevalence ranging from 50 to 98%. To isolate the effects of death prevalence and imperfect gold standard sensitivity, the simulation assumed a test with perfect sensitivity and specificity, and with perfect gold standard specificity. However, gold standard sensitivity was modeled at values from 90 to 99%.</p><p><strong>Results: </strong>Results of the simulation showed that decreasing gold standard sensitivity was associated with increasing underestimation of test specificity, and that the extent of underestimation increased with higher death prevalence. Analysis of the real-world data yielded findings that closely matched the simulation pattern. 
At 98% death prevalence, near-perfect gold standard sensitivity (99%) still resulted in suppression of specificity from the true value of 100% to the measured value of < 67%.</p><p><strong>Conclusions: </strong>New validation research, and review of existing validation studies, should consider the prevalence of the conditions assessed by a measure, and the possible impact on sensitivity and specificity of an imperfect gold standard.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"151"},"PeriodicalIF":3.9,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12125893/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144186552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
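The suppression effect described in this abstract follows from simple algebra, which can be checked in a few lines of Python. This is an illustrative sketch of the stated setup (perfect test, perfect gold-standard specificity), not the authors' simulation code.

```python
def measured_specificity(prevalence: float, gs_sensitivity: float) -> float:
    """Measured specificity of a truly perfect test against an imperfect
    gold standard with perfect specificity but sensitivity < 1.

    Gold-standard negatives = true negatives (1 - prevalence) plus
    gold-standard false negatives (prevalence * (1 - gs_sensitivity)).
    The perfect test correctly calls the latter positive, so they count
    as apparent false positives and depress measured specificity.
    """
    true_negatives = 1.0 - prevalence
    gs_false_negatives = prevalence * (1.0 - gs_sensitivity)
    return true_negatives / (true_negatives + gs_false_negatives)

# At 98% death prevalence and 99% gold-standard sensitivity, measured
# specificity falls below 67% even though the true value is 100%.
print(round(measured_specificity(0.98, 0.99), 3))  # -> 0.671
```

This reproduces the abstract's headline figure: 0.02 / (0.02 + 0.98 × 0.01) ≈ 0.671.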
{"title":"Evaluating the performance of artificial intelligence in summarizing pre-coded text to support evidence synthesis: a comparison between chatbots and humans.","authors":"Kim Nordmann, Stefanie Sauter, Mirjam Stein, Johanna Aigner, Marie-Christin Redlich, Michael Schaller, Florian Fischer","doi":"10.1186/s12874-025-02532-2","DOIUrl":"10.1186/s12874-025-02532-2","url":null,"abstract":"<p><strong>Background: </strong>With the rise of large language models, the application of artificial intelligence in research is expanding, possibly accelerating specific stages of the research processes. This study aims to compare the accuracy, completeness and relevance of chatbot-generated responses against human responses in evidence synthesis as part of a scoping review.</p><p><strong>Methods: </strong>We employed a structured survey-based research methodology to analyse and compare responses between two human researchers and four chatbots (ZenoChat, ChatGPT 3.5, ChatGPT 4.0, and ChatFlash) to questions based on a pre-coded sample of 407 articles. These questions were part of an evidence synthesis of a scoping review dealing with digitally supported interaction between healthcare workers.</p><p><strong>Results: </strong>The analysis revealed no significant differences in judgments of correctness between answers by chatbots and those given by humans. However, chatbots' answers were found to recognise the context of the original text better, and they provided more complete, albeit longer, responses. Human responses were less likely to add new content to the original text or include interpretation. Amongst the chatbots, ZenoChat provided the best-rated answers, followed by ChatFlash, with ChatGPT 3.5 and ChatGPT 4.0 tying for third. 
Correct contextualisation of the answer was positively correlated with completeness and correctness of the answer.</p><p><strong>Conclusions: </strong>Chatbots powered by large language models may be a useful tool to accelerate qualitative evidence synthesis. Given the current speed of chatbot development and fine-tuning, the successful applications of chatbots to facilitate research will very likely continue to expand over the coming years.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"150"},"PeriodicalIF":3.9,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12123790/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144186553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Managing withdrawals and exclusions of study participants in COVID-19-research by NUKLEUS.","authors":"Heike Valentin, Henriette Rau, Lizon Fiedler-Lacombe, Arne Blumentritt, Ekaterina Heim, Alexander Rudolph, Katrin Leyh, Monika Kraus, Bettina Lorenz-Depiereux, Irina Chaplinskaya, Christian Schäfer, Jens Schaller, Joerg Janne Vehreschild, Melanie Stecher, Margarete Scherer, Martin Witzenrath, Beate Balzuweit, Stefan Schreiber, Thomas Bahmer, Wolfgang Lieb, Steffen Cordes, Wolfgang Hoffmann, Sabine Hanß, Dana Stahl","doi":"10.1186/s12874-025-02526-0","DOIUrl":"10.1186/s12874-025-02526-0","url":null,"abstract":"<p><strong>Background: </strong>This article describes how withdrawals and exclusions of study participants can be managed in COVID-19-cohort studies by NUKLEUS (German: NUM Klinische Epidemiologie- und Studienplattform), using NAPKON (German: Nationales Pandemie Kohorten Netz). The aim of this manuscript was to describe how partial withdrawals can be performed so that most of the data and bio-samples can be kept for research purposes.</p><p><strong>Methods: </strong>The study took all signed informed consents (ICs) of study participants into account in order to develop and implement a method for partial withdrawals. The informed consents, which comprise mandatory and optional modules, were investigated to find out which optional modules can be withdrawn from without withdrawing consent from the whole study.</p><p><strong>Results: </strong>Withdrawals refer to signed ICs including mandatory and optional modules. Withdrawals can be submitted verbally or in writing, and can concern the IC as a whole or only in part. Consequently, implemented withdrawals for NAPKON cohorts comprise partial withdrawals with partial or no data deletion, or complete withdrawals with data deletion. Thus, data remain available for research purposes that would have been lost without the possibility of partial withdrawals. 
In NAPKON, a total of 3.97% of the participants submitted a withdrawal or were excluded from the study because the inclusion criteria were no longer met.</p><p><strong>Conclusions: </strong>This manuscript is, to the authors' knowledge, one of the first articles on withdrawals within COVID-19 studies (NAPKON). The processes serve as 'best practice' examples for planning and establishing withdrawal processes in medical research.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"148"},"PeriodicalIF":3.9,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12121181/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144179910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparison of frequentist and Bayesian approaches to the Personalised Randomised Controlled Trial (PRACTical)-design and analysis considerations.","authors":"Holly Jackson, Yiyun Shou, Nur Amira Binte Mohamed Azad, Jing Wen Chua, Rebecca Lynn Perez, Xinru Wang, Marlieke E A de Kraker, Yin Mo","doi":"10.1186/s12874-025-02537-x","DOIUrl":"10.1186/s12874-025-02537-x","url":null,"abstract":"<p><strong>Background: </strong>Multiple treatment options frequently exist for a single medical condition with no single standard of care (SoC), rendering a classic randomised trial comparing a specific treatment to a control treatment infeasible. A novel design, the personalised randomised controlled trial (PRACTical), allows individualised randomisation lists and borrows information across patient subpopulations to rank treatments against each other without comparison to a SoC. We compared standard frequentist analysis with Bayesian analyses, and developed a novel performance measure for treatment ranking, utilising the precision of the treatment coefficient estimates.</p><p><strong>Methods: </strong>We simulated trial data to compare four targeted antibiotic treatments for multidrug resistant bloodstream infections as an example. Four patient subgroups were simulated based on different combinations of patient and bacteria characteristics, which required four different randomisation lists with some overlapping treatments. The primary outcome was binary, using 60-day mortality. Treatment effects were derived using frequentist and Bayesian analytical approaches, with logistic multivariable regression. The performance measures were: probability of predicting the true best treatment, and novel proxy variables for power (probability of interval separation) and type I error (probability of incorrect interval separation). 
Several scenarios with varying treatment effects and sample sizes were compared.</p><p><strong>Results: </strong>The frequentist model and the Bayesian model using a strongly informative prior were both likely to predict the true best treatment ( <math> <mrow><msub><mi>P</mi> <mrow><mi>best</mi></mrow> </msub> <mo>≥</mo> <mn>80</mn> <mo>%</mo></mrow> </math> ) and gave a large probability of interval separation (reaching a maximum of <math> <mrow><msub><mi>P</mi> <mrow><mi>IS</mi></mrow> </msub> <mo>=</mo> <mn>96</mn> <mo>%</mo></mrow> </math> ) at a given sample size. Both methods had a low probability of incorrect interval separation ( <math> <mrow><msub><mi>P</mi> <mrow><mi>IIS</mi></mrow> </msub> <mo><</mo> <mn>0.05</mn></mrow> </math> ) for all sample sizes ( <math><mrow><mi>N</mi> <mo>=</mo> <mn>500</mn> <mo>-</mo> <mn>5000</mn></mrow> </math> ) in the null scenarios considered. The sample size required for the probability of interval separation to reach 80% ( <math><mrow><mi>N</mi> <mo>=</mo> <mn>1500</mn> <mo>-</mo> <mn>3000</mn></mrow> </math> ) was larger than the sample size required for the probability of predicting the true best treatment to reach 80% ( <math><mrow><mi>N</mi> <mo>≤</mo> <mn>500</mn></mrow> </math> ).</p><p><strong>Conclusions: </strong>Utilising uncertainty intervals on the treatment coefficient estimates is highly conservative, limiting applicability to large pragmatic trials. 
Bayesian analysis performed similarly to the frequentist approach in terms of predicting the true best treatment.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"149"},"PeriodicalIF":3.9,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12123875/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144180925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of methods to handle missing values in a continuous index test in a diagnostic accuracy study - a simulation study.","authors":"Katharina Stahlmann, Bastiaan Kellerhuis, Johannes B Reitsma, Nandini Dendukuri, Antonia Zapf","doi":"10.1186/s12874-025-02594-2","DOIUrl":"10.1186/s12874-025-02594-2","url":null,"abstract":"<p><strong>Background: </strong>Most diagnostic accuracy studies have applied a complete case analysis (CCA) or single imputation approach to address missing values in the index test, which may lead to biased results. Therefore, this simulation study aims to compare the performance of different methods in estimating the AUC of a continuous index test with missing values in a single-test diagnostic accuracy study.</p><p><strong>Methods: </strong>We simulated data for a reference standard, continuous index test, and three covariates using different sample sizes, prevalences of the target condition, correlations between index test and covariates, and true AUCs. Subsequently, missing values were induced for the continuous index test, assuming varying proportions of missing values and missingness mechanisms. Seven methods (multiple imputation (MI), empirical likelihood, and inverse probability weighting approaches) were compared to a CCA in terms of their performance to estimate the AUC given missing values in the index test.</p><p><strong>Results: </strong>Under missing completely at random (MCAR) and many missing values, CCA gives good results for a small sample size and all methods perform well for a large sample size. If missing values are missing at random (MAR), all methods are severely biased if the sample size and prevalence are small. An augmented inverse probability weighting method and standard MI methods perform well with higher prevalence and larger sample size, respectively. Most methods give biased results if missing values are missing not at random (MNAR) and the correlation or the sample size and prevalence are low. 
Methods using the covariates improve with increasing correlation.</p><p><strong>Conclusions: </strong>Most methods perform well if the proportion of missing values is small. Given a higher proportion of missing values and MCAR, we recommend a CCA for a small sample size and standard MI methods for a large sample size. In the absence of better alternatives, we recommend conducting a CCA and discussing its limitations if the sample size is small and missing values are M(N)AR. Standard MI methods and the augmented inverse probability approach may be good alternatives if the sample size and/or correlation increases. All methods are biased under MNAR with a low correlation.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"147"},"PeriodicalIF":3.9,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12107930/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144156855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trials evaluating drug discontinuation: a scoping review sub-analysis focusing on outcomes and research questions.","authors":"Nele Kornder, Norbert Donner-Banzhoff, Ina Staudt, Nina Grede, Annette Becker, Annika Viniol","doi":"10.1186/s12874-025-02597-z","DOIUrl":"10.1186/s12874-025-02597-z","url":null,"abstract":"<p><strong>Background: </strong>The widespread use of long-term pharmacological treatments for chronic conditions has led to polypharmacy, raising concerns about adverse effects and interactions. Deprescribing, the discontinuation of drugs with unfavorable benefit-risk ratios, is gaining attention. Studies evaluating the discontinuation of drugs have a broad methodological spectrum. The selection of outcomes poses a particular challenge. This scoping review addresses the methodological challenges of outcome selection in RCTs investigating drug discontinuation.</p><p><strong>Methods: </strong>The scoping review includes RCTs that investigated the discontinuation of drugs whose efficacy and/or safety was in doubt. Data on study characteristics, the motivation for evaluating drug discontinuation, the number and type of primary endpoints, and the stated hypotheses were extracted and analyzed.</p><p><strong>Results: </strong>We included 103 RCTs. Most studies were from Europe and the USA and mainly investigated antipsychotics/antidepressants, immunosuppressants, steroids and antiepileptics. The discontinuation studies were often conducted due to side effects of the treatment and doubts about the benefits of the drug. The primary endpoints reflected either the course of the disease (\"justification of treatment\") or the disadvantages of the drug (\"justification of withdrawal\"). Non-inferiority hypotheses were generally prevalent in justification of treatment studies, while superiority hypotheses were more commonly used in justification of withdrawal studies. 
However, due to methodological and practical challenges, this was not always the case.</p><p><strong>Conclusion: </strong>We present a framework to choose outcomes and specify hypotheses for discontinuation studies. In doing so, both key challenges (justification of treatment and justification of withdrawal) must be addressed.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"146"},"PeriodicalIF":3.9,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12108048/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144156856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new method for dealing with collider bias in the PWP model for recurrent events in randomized controlled trials.","authors":"Chen Shi, Jia-Wei Wei, Zi-Shu Zhan, Xiao-Han Xu, Ze-Lin Yan, Chun-Quan Ou","doi":"10.1186/s12874-025-02596-0","DOIUrl":"10.1186/s12874-025-02596-0","url":null,"abstract":"<p><strong>Background: </strong>Evaluating recurrent events within a time-to-event analysis framework effectively utilizes all relevant information to address the clinical question of interest fully and has certain advantages in randomized controlled trials (RCTs). However, the Prentice, Williams, and Peterson (PWP) model disrupts the randomness of the risk set for subsequent recurrent events other than the first and consequently introduces bias in estimating effects. This study aimed to propose a weighted PWP model, evaluate its statistical performance, and assess the potential consequences of using common practices when each recurrence has different baseline hazard functions.</p><p><strong>Methods: </strong>We proposed adjusting the estimate of treatment effect through a weighting strategy that constructed a virtual population balanced between groups in each risk set. A simulation study was carried out. A key characteristic of the simulated data was that the baseline hazard changed with the number of events. The proposed weighted PWP model was compared with current methods, including the Cox model for time-to-first-event, Poisson, negative binomial (NB), Andersen-Gill (AG), Lin-Wei-Yang-Ying (LWYY), and PWP models. Model performance was evaluated by bias, type I error rates, and statistical power. All models were applied to a real case from a randomized trial of Chemoprophylaxis treatment for Recurrent Stage I Bladder Tumors.</p><p><strong>Results: </strong>The results showed that the proposed weighted PWP model performed best, with the lowest bias and highest statistical power. 
However, other models, including the Cox for time-to-first-event, Poisson, NB, AG, LWYY, and PWP models, all showed different degrees of bias and inflated type I error rates or low statistical power when the baseline hazard changed with the number of events. Covariate adjustment via outcome regression can lead to inflated type I error rates. When the number of recurrent events was restricted, all weighting strategies yielded stable and nearly consistent results.</p><p><strong>Conclusions: </strong>Recurrent event data should be analyzed with caution. The proposed methods may be generalized to model recurrent events. Our findings serve as an important clarification of how to deal with collider bias in the PWP model in RCTs.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"142"},"PeriodicalIF":3.9,"publicationDate":"2025-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12105184/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144149214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Practical guide to calculate sample size for chi-square test in biomedical research.","authors":"Hanif Abdul Rahman, Amirul Ariffin Noraidi, Amal Nadhirah Hj Khalid, Alanna Zawani Mohamad-Adam, Nurrabiatul Haziqah Zahari, Nurezzah Ezzaty Tuming","doi":"10.1186/s12874-025-02584-4","DOIUrl":"10.1186/s12874-025-02584-4","url":null,"abstract":"<p><p>In biomedical research, the calculation of sample size is a critical component of study design. Adequate sample size ensures the reliability of statistical tests, including the chi-square test. This manuscript outlines the use of an online sample size calculator for chi-square tests. The paper includes detailed explanations of the formulas used in the calculations and highlights the importance of power analysis in planning research studies. This tool is designed to assist and guide researchers in determining the optimal sample size for detecting statistically significant differences in categorical data. We describe the theory behind the chi-square test, the statistical principles involved in sample size calculation, and the specific methodology for using the sample size calculator. The calculator is freely available to use at https://hanif-shiny.shinyapps.io/chi-sq/ .</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"144"},"PeriodicalIF":3.9,"publicationDate":"2025-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12107878/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144149263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
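The power analysis behind a calculator like the one this record describes can be sketched with only the Python standard library for the 1-degree-of-freedom case (e.g., a 2×2 table), where the noncentral chi-square with noncentrality λ = n·w² has a closed normal form. This is an illustrative sketch, not the code behind the linked Shiny app, and general df would require the full noncentral chi-square distribution (e.g., `scipy.stats.ncx2`).

```python
from math import erf, sqrt

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def chisq_power_df1(n, w, z_crit=1.959964):
    # For df = 1 at alpha = 0.05, the chi-square statistic is
    # (Z + sqrt(lam))^2 with noncentrality lam = n * w^2, so
    # power = P(|Z + sqrt(lam)| > z_crit) in closed form.
    lam = n * w * w
    return norm_cdf(sqrt(lam) - z_crit) + norm_cdf(-sqrt(lam) - z_crit)

def sample_size_df1(w, power=0.80, n_max=1_000_000):
    # Smallest n whose power reaches the target, for Cohen's effect size w.
    for n in range(2, n_max):
        if chisq_power_df1(n, w) >= power:
            return n
    raise ValueError("n_max too small")

# Cohen's benchmarks for w: small 0.1, medium 0.3, large 0.5.
print(sample_size_df1(0.3))  # about 88 for a "medium" effect at 80% power
```

The same search over n works for any df once `chisq_power_df1` is replaced by the noncentral chi-square survival function at the chi-square critical value.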
{"title":"Development of time to event prediction models using federated learning.","authors":"Rasmus Rask Kragh Jørgensen, Jonas Faartoft Jensen, Tarec El-Galaly, Martin Bøgsted, Rasmus Froberg Brøndum, Mikkel Runason Simonsen, Lasse Hjort Jakobsen","doi":"10.1186/s12874-025-02598-y","DOIUrl":"10.1186/s12874-025-02598-y","url":null,"abstract":"<p><strong>Background: </strong>In a wide range of diseases, it is necessary to utilize multiple data sources to obtain enough data for model training. However, performing centralized pooling of multiple data sources, while protecting each patient's sensitive data, can require a cumbersome process involving many institutional bodies. Alternatively, federated learning (FL) can be utilized to train models based on data located at multiple sites.</p><p><strong>Method: </strong>We propose two methods, relying on FL algorithms, for training time-to-event prediction models based on distributed data. Both approaches incorporate steps to allow prediction of individual-level survival curves, without exposing individual-level event times. For Cox proportional hazards models, the latter is accomplished by using a kernel smoother for the baseline hazard function. The other proposed methodology is based on general parametric likelihood theory for right-censored data. We compared these two methods in four simulation experiments and with one real-world dataset, predicting the survival probability in patients with Hodgkin lymphoma (HL).</p><p><strong>Results: </strong>The simulations demonstrated that the FL models performed similarly to the non-distributed case in all four experiments, with only slight deviations in predicted survival probabilities compared to the true model. 
Our findings were similar in the real-world advanced-stage HL example where the FL models were compared to their non-distributed versions, revealing only small deviations in performance.</p><p><strong>Conclusion: </strong>The proposed procedures enable training of time-to-event models using data distributed across sites, without direct sharing of individual-level data and event times, while retaining a predictive performance on par with undistributed approaches.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"143"},"PeriodicalIF":3.9,"publicationDate":"2025-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12105200/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144149218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}