{"title":"A decomposition of Fisher's information to inform sample size for developing or updating fair and precise clinical prediction models for individual risk-part 1: binary outcomes.","authors":"Richard D Riley, Gary S Collins, Rebecca Whittle, Lucinda Archer, Kym I E Snell, Paula Dhiman, Laura Kirton, Amardeep Legha, Xiaoxuan Liu, Alastair K Denniston, Frank E Harrell, Laure Wynants, Glen P Martin, Joie Ensor","doi":"10.1186/s41512-025-00193-9","DOIUrl":"10.1186/s41512-025-00193-9","url":null,"abstract":"<p><strong>Background: </strong>When using a dataset to develop or update a clinical prediction model, small sample sizes increase concerns of overfitting, instability, poor predictive performance and a lack of fairness. For models estimating the risk of a binary outcome, previous research has outlined sample size calculations that target low overfitting and a precise overall risk estimate. However, more guidance is needed for targeting precise and fair individual-level risk estimates.</p><p><strong>Methods: </strong>We propose a decomposition of Fisher's information matrix to help examine sample sizes required for developing or updating a model, aiming for precise and fair individual-level risk estimates. We outline a five-step process for use before data collection or when an existing dataset or pilot study is available. It requires researchers to specify the overall risk in the target population, the (anticipated) distribution of key predictors in the model and an assumed 'core model' either specified directly (i.e. a logistic regression equation is provided) or based on a specified C-statistic and relative effects of (standardised) predictors.</p><p><strong>Results: </strong>We produce closed-form solutions that decompose the variance of an individual's risk estimate into the Fisher's unit information matrix, predictor values and the total sample size. This allows researchers to quickly calculate and examine the anticipated precision of individual-level predictions and classifications for specified sample sizes. The information can be presented to key stakeholders (e.g. health professionals, patients, grant funders) to inform target sample sizes for prospective data collection or whether an existing dataset is sufficient. Our proposal is implemented in our new software module pmstabilityss. We provide two real examples and emphasise the importance of clinical context, including any risk thresholds for decision making and fairness checks.</p><p><strong>Conclusions: </strong>Our approach helps researchers examine potential sample sizes required to target precise and fair individual-level predictions when developing or updating prediction models for binary outcomes.</p>","PeriodicalId":72800,"journal":{"name":"Diagnostic and prognostic research","volume":"9 1","pages":"14"},"PeriodicalIF":0.0,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12235806/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144585768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
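The decomposition described in this abstract rests on a standard large-sample result for logistic regression: the variance of an individual's linear predictor is approximately x'I_unit^{-1}x / n, where I_unit is the unit Fisher information averaged over the predictor distribution, and the delta method maps this to the risk scale. A minimal numerical sketch of that idea follows; this is not the authors' pmstabilityss implementation, and the core-model coefficients and predictor distribution are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed 'core model': logistic regression with one standardised predictor.
# Coefficients are illustrative only, not taken from the paper.
beta = np.array([-2.0, 0.8])  # intercept, predictor effect

# Monte Carlo estimate of the unit Fisher information
# I_unit = E[p(1-p) x x^T] over the assumed predictor distribution.
X = np.column_stack([np.ones(100_000), rng.standard_normal(100_000)])
p = 1.0 / (1.0 + np.exp(-X @ beta))
I_unit = (X * (p * (1 - p))[:, None]).T @ X / len(X)

def risk_se(x_row, n):
    """Approximate SE of an individual's risk estimate at sample size n."""
    var_lp = x_row @ np.linalg.solve(I_unit, x_row) / n  # var of linear predictor
    p_i = 1.0 / (1.0 + np.exp(-x_row @ beta))
    return p_i * (1 - p_i) * np.sqrt(var_lp)             # delta method

x = np.array([1.0, 1.5])  # an individual with standardised predictor value 1.5
se_500, se_2000 = risk_se(x, 500), risk_se(x, 2000)
```

Because the variance scales as 1/n, quadrupling the sample size halves the individual-level standard error, which is the kind of trade-off the five-step process lets researchers present to stakeholders.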
{"title":"Development and temporal evaluation of sex-specific models to predict 4-year atherosclerotic cardiovascular disease risk based on age and neighbourhood characteristics in South Limburg, the Netherlands.","authors":"Anke Bruninx, Lianne Ippel, Rob Willems, Andre Dekker, Iñigo Bermejo","doi":"10.1186/s41512-025-00198-4","DOIUrl":"10.1186/s41512-025-00198-4","url":null,"abstract":"<p><strong>Background: </strong>To improve screening for atherosclerotic cardiovascular disease (ASCVD), we aimed to develop and temporally evaluate sex-specific models to predict 4-year ASCVD risk in South Limburg based on age and neighbourhood characteristics concerning home address.</p><p><strong>Methods: </strong>We included 40- to 70-year-olds living in South Limburg on 1 January 2015 for model development, and 40- to 70-year-olds living in South Limburg on 1 January 2016 for model evaluation. We randomly sampled people selected in 1 year and in both years to create development and evaluation data sets. Follow-up of ASCVD and competing events (overall mortality excluding ASCVD) lasted until 31 December 2020. Candidate predictors were the individual's age, the neighbourhood's socio-economic status, and the neighbourhood's particulate matter concentration. Using the evaluation data sets, we compared two model types, subdistribution and cause-specific hazard models, and eight model structures. Discrimination was assessed by the area under the receiver operating characteristic curve (AUROC). Calibration was assessed by calculating overall expected-observed ratios (E/O). For the final models, calibration plots were made additionally.</p><p><strong>Results: </strong>The development data sets consisted of 67,549 males (4-year cumulative ASCVD incidence: 3.08%) and 67,947 females (4-year cumulative ASCVD incidence: 1.50%). The evaluation data sets consisted of 66,068 males (4-year cumulative ASCVD incidence: 3.22%) and 66,231 females (4-year cumulative ASCVD incidence: 1.49%). For males, the AUROC of the final model equalled 0.6548. The E/O equalled 0.9466. For females, the AUROC equalled 0.6744. The E/O equalled 0.9838.</p><p><strong>Conclusions: </strong>The resulting model shows promise for further research. These models may be used for ASCVD screening in the future.</p>","PeriodicalId":72800,"journal":{"name":"Diagnostic and prognostic research","volume":"9 1","pages":"15"},"PeriodicalIF":0.0,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12220320/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144556066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ischemic modified albumin and thiol levels in Coronavirus disease 19: a systematic review and meta-analysis.","authors":"Asma Mousavi, Shayan Shojaei, Peyvand Parhizkar, Razman Arabzadeh Bahri, Sanam Alilou, Hanieh Radkhah","doi":"10.1186/s41512-025-00196-6","DOIUrl":"10.1186/s41512-025-00196-6","url":null,"abstract":"<p><strong>Background: </strong>The COVID-19 pandemic has imposed a significant global health burden. Identifying prognostic markers for COVID-19 and its severity could contribute to improved patient outcomes by reducing morbidity and mortality. This systematic review and meta-analysis aimed to evaluate the relationship between ischemic-modified albumin (IMA) and thiol levels, both indicators of oxidative stress, in patients diagnosed with COVID-19.</p><p><strong>Method: </strong>We conducted a comprehensive search across PubMed, Scopus, Embase, and Web of Science for eligible original studies. The study assessed IMA and thiol levels in COVID-19 patients, examining their association with both disease severity and mortality. A random effect analysis was conducted to estimate the standardized mean difference (SMD) and confidence intervals (CI).</p><p><strong>Results: </strong>Sixteen studies comprising 2010 COVID-19 patients and 982 controls were included. A diagnosis of COVID-19 was associated with significantly elevated IMA levels (Hedges's g = 1.02, 95% CI: 0.45 to 1.60) and reduced total thiol levels (Hedges's g = -1.08, 95% CI: -2.10 to -0.07). However, native thiol levels did not reveal a significant difference between infected patients and healthy participants. Subgroup analysis showed significantly lower total thiol levels in patients with critical and severe COVID-19, as well as lower native thiol levels specifically in critical COVID-19 patients. IMA levels were significantly higher across the critical, severe, and moderate COVID-19 groups.</p><p><strong>Conclusion: </strong>Elevated IMA and reduced thiol levels may serve as novel markers for predicting COVID-19 severity and prognosis. Further research is needed to explore therapeutic interventions that target oxidative imbalance in COVID-19 patients.</p>","PeriodicalId":72800,"journal":{"name":"Diagnostic and prognostic research","volume":"9 1","pages":"13"},"PeriodicalIF":0.0,"publicationDate":"2025-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12183906/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144478058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
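The random-effects pooling of standardized mean differences described in this meta-analysis can be sketched with the classical DerSimonian-Laird estimator. The effect sizes and variances below are hypothetical toy numbers, not data from the review.

```python
import numpy as np

def dersimonian_laird(g, v):
    """Pool effect sizes g with within-study variances v (random-effects)."""
    g, v = np.asarray(g, float), np.asarray(v, float)
    w = 1 / v                                  # fixed-effect (inverse-variance) weights
    g_fe = np.sum(w * g) / np.sum(w)           # fixed-effect pooled estimate
    q = np.sum(w * (g - g_fe) ** 2)            # Cochran's Q heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(g) - 1)) / c)    # between-study variance (truncated at 0)
    w_re = 1 / (v + tau2)                      # random-effects weights
    pooled = np.sum(w_re * g) / np.sum(w_re)
    se = np.sqrt(1 / np.sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Toy Hedges's g values and variances from three invented studies.
pooled, ci = dersimonian_laird([0.9, 1.3, 0.7], [0.05, 0.08, 0.06])
```

When between-study heterogeneity is present (tau2 > 0), the random-effects interval is wider than the fixed-effect one, which is why pooled CIs like those reported above (e.g. 0.45 to 1.60) can be broad even with sixteen studies.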
{"title":"Expert panel as reference standard procedure in diagnostic accuracy studies: a systematic scoping review and methodological guidance.","authors":"Bas E Kellerhuis, Kevin Jenniskens, Mike P T Kusters, Ewoud Schuit, Lotty Hooft, Karel G M Moons, Johannes B Reitsma","doi":"10.1186/s41512-025-00195-7","DOIUrl":"10.1186/s41512-025-00195-7","url":null,"abstract":"<p><strong>Background: </strong>In diagnostic accuracy studies, when no reference standard test is available, a group of experts, combined in an expert panel, is often used to assess the presence of the target condition using multiple relevant pieces of patient information. Based on the expert panel's judgment, the accuracy of a test or model can be determined. Methodological choices in design and analysis of the expert panel procedure have been shown to vary considerably between studies as well as the quality of reporting. This review maps the current landscape of expert panels used as reference standard in diagnostic accuracy or model studies.</p><p><strong>Methods: </strong>PubMed was systematically searched for eligible studies published between June 1, 2012, and October 1, 2022. Data extraction was performed by one author and, in cases of doubt, checked by another author. Study characteristics, expert panel characteristics, and expert panel methodology were extracted. Articles were included if the diagnostic accuracy of an index test or diagnostic model was assessed using an expert panel as reference standard and the study was reported in English, Dutch, or German.</p><p><strong>Results: </strong>After initial identification of 4078 studies, 318 were included for data extraction. Expert panels were used across numerous medical domains, of which oncology was the most common (20%). The number of experts judging the presence of the target condition in each patient was 2 or fewer in 29%, 3 or 4 in 55%, and 5 or more in 16% of the 318 studies. Expert panel types used were an independent panel (i.e., each expert returns a judgement without conferring with other experts in the panel) in 33% of studies, a panel using a consensus method (i.e., each case was discussed by the expert panel) in 27%, a staged (i.e., each expert independently returns a judgement and discordant cases were discussed in a consensus meeting) target condition assessment approach in 11%, and a tiebreaker (i.e., each expert independently returns a judgement and discordant cases were assessed by another expert) in 8%. The exact expert panel decision approach was unclear or not reported in 21% of studies. In 5% of studies, information about remaining uncertainty in experts about the target condition presence or absence was collected for each participant.</p><p><strong>Conclusions: </strong>There is large heterogeneity in the composition of expert panels and the way that expert panels are used as reference standard in diagnostic research. Key methodological characteristics of expert panels are frequently not reported, making it difficult to replicate or reproduce results, and potentially masking biasing factors. There is a clear need for more guidance on how to perform an expert panel procedure and specific extensions of the STARD and TRIPOD reporting guidelines when using an expert panel.</p>","PeriodicalId":72800,"journal":{"name":"Diagnostic and prognostic research","volume":"9 1","pages":"12"},"PeriodicalIF":0.0,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12070646/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144054445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A scoping review of machine learning models to predict risk of falls in elders, without using sensor data.","authors":"Angelo Capodici, Claudio Fanconi, Catherine Curtin, Alessandro Shapiro, Francesca Noci, Alberto Giannoni, Tina Hernandez-Boussard","doi":"10.1186/s41512-025-00190-y","DOIUrl":"https://doi.org/10.1186/s41512-025-00190-y","url":null,"abstract":"<p><strong>Objectives: </strong>This scoping review assesses machine learning (ML) tools that predicted falls, relying on information in health records without using any sensor data. The aim was to assess the available evidence on innovative techniques to improve fall prevention management.</p><p><strong>Methods: </strong>Studies were included if they focused on predicting fall risk with machine learning in elderly populations and were written in English. There were 13 different extracted variables, including population characteristics (community dwelling, inpatients, age range, main pathology, ethnicity/race). Furthermore, the number of variables used in the final models, as well as their type, was extracted.</p><p><strong>Results: </strong>A total of 6331 studies were retrieved, and 19 articles met criteria for data extraction. Metric performances reported by authors were commonly high in terms of accuracy (e.g., greater than 0.70). The most represented features included cardiovascular status and mobility assessments. Common gaps identified included a lack of transparent reporting and insufficient fairness assessments.</p><p><strong>Conclusions: </strong>This review provides evidence that falls can be predicted using ML without using sensors if the amount of data and its quality is adequate. However, further studies are needed to validate these models in diverse groups and populations.</p>","PeriodicalId":72800,"journal":{"name":"Diagnostic and prognostic research","volume":"9 1","pages":"11"},"PeriodicalIF":0.0,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12054167/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144013018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Can we develop real-world prognostic models using observational healthcare data? Large-scale experiment to investigate model sensitivity to database and phenotypes.","authors":"Jenna M Reps, Peter R Rijnbeek, Patrick B Ryan","doi":"10.1186/s41512-025-00191-x","DOIUrl":"https://doi.org/10.1186/s41512-025-00191-x","url":null,"abstract":"<p><strong>Background: </strong>Large observational healthcare databases are frequently used to develop models to be implemented in real-world clinical practice populations. For example, these databases were used to develop COVID severity models that guided interventions such as who to prioritize vaccinating during the pandemic. However, the clinical setting and observational databases often differ in the types of patients (case mix), and it is a nontrivial process to identify patients with medical conditions (phenotyping) in these databases. In this study, we investigate how sensitive a model's performance is to the choice of development database, population, and outcome phenotype.</p><p><strong>Methods: </strong>We developed > 450 different logistic regression models for nine prediction tasks across seven databases with a range of suitable population and outcome phenotypes. Performance stability within tasks was calculated by applying each model to data created by permuting the database, population, or outcome phenotype. We investigate performance (AUROC, scaled Brier, and calibration-in-the-large) stability and individual risk estimate stability.</p><p><strong>Results: </strong>In general, changing the outcome definitions or population phenotype made little impact on the model validation discrimination. However, validation discrimination was unstable when the database changed. Calibration and Brier performance were unstable when the population, outcome definition, or database changed. This may be problematic if a model developed using observational data is implemented in a real-world setting.</p><p><strong>Conclusions: </strong>These results highlight the importance of validating a model developed using observational data in the clinical setting prior to using it for decision-making. Calibration and Brier score should be evaluated to prevent miscalibrated risk estimates being used to aid clinical decisions.</p>","PeriodicalId":72800,"journal":{"name":"Diagnostic and prognostic research","volume":"9 1","pages":"10"},"PeriodicalIF":0.0,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12004590/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144054684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
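The three performance measures tracked in this experiment (AUROC, scaled Brier score, and calibration-in-the-large) can be sketched as below. This is a generic illustration, not the authors' analysis pipeline; in particular, calibration-in-the-large is approximated here as the difference between observed and mean-predicted log-odds rather than the intercept of a logistic recalibration model with offset, and the toy outcomes and predictions are invented.

```python
import numpy as np

def validation_metrics(y, p):
    """AUROC, scaled Brier score, and approximate calibration-in-the-large
    for binary outcomes y and predicted risks p."""
    y, p = np.asarray(y, float), np.asarray(p, float)
    pos, neg = p[y == 1], p[y == 0]
    # AUROC as the probability that a random case outranks a random non-case
    # (ties counted as one half).
    diff = pos[:, None] - neg[None, :]
    auroc = (diff > 0).mean() + 0.5 * (diff == 0).mean()
    brier = np.mean((y - p) ** 2)
    brier_null = np.mean((y - y.mean()) ** 2)        # Brier of the mean-risk model
    scaled_brier = 1 - brier / brier_null            # 1 = perfect, 0 = uninformative
    # Approximate calibration-in-the-large on the log-odds scale.
    citl = np.log(y.mean() / (1 - y.mean())) - np.log(p.mean() / (1 - p.mean()))
    return auroc, scaled_brier, citl

# Toy example: four patients with well-ordered, well-calibrated predictions.
auroc, scaled_brier, citl = validation_metrics([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```

A scaled Brier score near 1 with a calibration-in-the-large near 0 indicates both sharp and well-calibrated risks; the paper's point is that the discrimination component can stay stable while the calibration components drift when the database changes.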
{"title":"Clinical prognostic models for sarcomas: a systematic review and critical appraisal of development and validation studies.","authors":"Philip Heesen, Sebastian M Christ, Olga Ciobanu-Caraus, Abdullah Kahraman, Georg Schelling, Gabriela Studer, Beata Bode-Lesniewska, Bruno Fuchs","doi":"10.1186/s41512-025-00186-8","DOIUrl":"10.1186/s41512-025-00186-8","url":null,"abstract":"<p><strong>Background: </strong>Current clinical guidelines recommend the use of clinical prognostic models (CPMs) for therapeutic decision-making in sarcoma patients. However, the number and quality of developed and externally validated CPMs is unknown. Therefore, we aimed to describe and critically assess CPMs for sarcomas.</p><p><strong>Methods: </strong>We performed a systematic review including all studies describing the development and/or external validation of a CPM for sarcomas. We searched the databases MEDLINE, EMBASE, Cochrane Central, and Scopus from inception until June 7th, 2022. The risk of bias was assessed using the prediction model risk of bias assessment tool (PROBAST).</p><p><strong>Results: </strong>Seven thousand six hundred fifty-six records were screened, of which 145 studies were eventually included, developing 182 and externally validating 59 CPMs. The most frequently modeled type of sarcoma was osteosarcoma (43/182; 23.6%), and the most frequently predicted outcome was overall survival (81/182; 44.5%). The most used predictors were the patient's age (133/182; 73.1%) and tumor size (116/182; 63.7%). Univariable screening was used in 137 (75.3%) CPMs, and only 7 (3.9%) CPMs were developed using pre-specified predictors based on clinical knowledge or literature. The median c-statistic on the development dataset was 0.74 (interquartile range [IQR] 0.71, 0.78). Calibration was reported for 142 CPMs (142/182; 78.0%). The median c-statistic of external validations was 0.72 (IQR 0.68-0.75). Calibration was reported for 46 out of 59 (78.0%) externally validated CPMs. We found 169 out of 241 (70.1%) CPMs to be at high risk of bias, mostly due to the high risk of bias in the analysis domain.</p><p><strong>Discussion: </strong>While various CPMs for sarcomas have been developed, the clinical utility of most of them is hindered by a high risk of bias and limited external validation. Future research should prioritise validating and updating existing well-developed CPMs over developing new ones to ensure reliable prognostic tools.</p><p><strong>Trial registration: </strong>PROSPERO CRD42022335222.</p>","PeriodicalId":72800,"journal":{"name":"Diagnostic and prognostic research","volume":"9 1","pages":"7"},"PeriodicalIF":0.0,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11974052/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143796882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction: Understanding overfitting in random forest for probability estimation: a visualization and simulation study.","authors":"Lasai Barreñada, Paula Dhiman, Dirk Timmerman, Anne-Laure Boulesteix, Ben Van Calster","doi":"10.1186/s41512-025-00189-5","DOIUrl":"10.1186/s41512-025-00189-5","url":null,"abstract":"","PeriodicalId":72800,"journal":{"name":"Diagnostic and prognostic research","volume":"9 1","pages":"9"},"PeriodicalIF":0.0,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11967119/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143774953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction: Decision curve analysis: confidence intervals and hypothesis testing for net benefit.","authors":"Andrew J Vickers, Ben Van Calster, Laure Wynants, Ewout W Steyerberg","doi":"10.1186/s41512-025-00188-6","DOIUrl":"10.1186/s41512-025-00188-6","url":null,"abstract":"","PeriodicalId":72800,"journal":{"name":"Diagnostic and prognostic research","volume":"9 1","pages":"8"},"PeriodicalIF":0.0,"publicationDate":"2025-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11956174/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143756273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Guide to evaluating performance of prediction models for recurrent clinical events.","authors":"Laura J Bonnett, Thomas Spain, Alexandra Hunt, Jane L Hutton, Victoria Watson, Anthony G Marson, John Blakey","doi":"10.1186/s41512-025-00187-7","DOIUrl":"10.1186/s41512-025-00187-7","url":null,"abstract":"<p><strong>Background: </strong>Many chronic conditions, such as epilepsy and asthma, are typified by recurrent events-repeated acute deterioration events of a similar type. Statistical models for these conditions often focus on evaluating the time to the first event. They therefore do not make use of data available on all events. Statistical models for recurrent events exist, but it is not clear how best to evaluate their performance. We compare the relative performance of statistical models for analysing recurrent events for epilepsy and asthma.</p><p><strong>Methods: </strong>We studied two clinical exemplars of common and infrequent events: asthma exacerbations using the Optimum Patient Clinical Research Database, and epileptic seizures using data from the Standard versus New Antiepileptic Drug Study. In both cases, count-based models (negative binomial and zero-inflated negative binomial) and variants on the Cox model (Andersen-Gill and Prentice, Williams and Peterson) were used to assess the risk of recurrence (of exacerbations or seizures respectively). Performance of models was evaluated via numerical (root mean square prediction error, mean absolute prediction error, and prediction bias) and graphical (calibration plots and Bland-Altman plots) approaches.</p><p><strong>Results: </strong>The performance of the prediction models for asthma and epilepsy recurrent events could be evaluated via the selected numerical and graphical measures. For both the asthma and epilepsy exemplars, the Prentice, Williams and Peterson model showed the closest agreement between predicted and observed outcomes.</p><p><strong>Conclusion: </strong>Inappropriate models can lead to incorrect conclusions which disadvantage patients. Therefore, prediction models for outcomes associated with chronic conditions should include all repeated events. Such models can be evaluated via the promoted numerical and graphical approaches alongside modified calibration measures.</p>","PeriodicalId":72800,"journal":{"name":"Diagnostic and prognostic research","volume":"9 1","pages":"6"},"PeriodicalIF":0.0,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11912649/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143652326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
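The numerical evaluation measures named in this guide's Methods (root mean square prediction error, mean absolute prediction error, and prediction bias) are straightforward to compute once a model yields a predicted event count per patient. A minimal sketch follows; the observed and predicted counts are invented toy numbers, not data from either exemplar.

```python
import numpy as np

def count_prediction_errors(observed, predicted):
    """Numerical performance measures for predicted recurrent-event counts."""
    o, p = np.asarray(observed, float), np.asarray(predicted, float)
    rmspe = np.sqrt(np.mean((o - p) ** 2))  # root mean square prediction error
    mape = np.mean(np.abs(o - p))           # mean absolute prediction error
    bias = np.mean(p - o)                   # prediction bias (negative = underprediction)
    return rmspe, mape, bias

# Toy example: observed event counts for four patients vs. model predictions.
rmspe, mape, bias = count_prediction_errors([2, 0, 3, 1], [1.5, 0.5, 2.5, 1.0])
```

The graphical checks the paper pairs with these (calibration and Bland-Altman plots) would then plot predicted against observed counts, and their differences against their means, respectively.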