Manuel Hecht, Anette Blümle, Harald Binder, Martin Schumacher, Nadine Binder
{"title":"Investigator-initiated versus industry-sponsored trials - visibility and relevance of randomized controlled trials in clinical practice guidelines (IMPACT).","authors":"Manuel Hecht, Anette Blümle, Harald Binder, Martin Schumacher, Nadine Binder","doi":"10.1186/s12874-025-02535-z","DOIUrl":"10.1186/s12874-025-02535-z","url":null,"abstract":"<p><strong>Background: </strong>The goal of evidence-based medicine is to make clinical decisions based on the best available, relevant evidence. For this to be possible, studies such as randomized controlled trials (RCTs), which are widely considered to provide the best evidence of all forms of primary research, must be visible and have an impact on clinical practice guidelines. We further investigated the impact of publicly and commercially sponsored RCTs on clinical practice guidelines by measuring direct and indirect impactful citations and the time to guideline impact.</p><p><strong>Methods: </strong>We considered the sample from the IMPACT study, where a total of 691 RCTs (120 German investigator-initiated trials (IITs), 200 international IITs, 171 German industry-sponsored trials (ISTs) and 200 international ISTs) was sampled from registries (DFG-/BMBF-Websites, the German Clinical Trials Register, and from ClinicalTrials.gov) and followed prospectively. First, all eligible IITs were sampled. Then, ISTs were randomly selected while ensuring balance across certain trial characteristics. Next, the corresponding publications in the form of original research articles were identified. A search was then conducted for (1) systematic reviews (SRs) citing these articles and (2) clinical practice guidelines (CPGs) that cited either the original articles or the SRs. The methods and results of this effort were already published. In this investigation we aimed to better characterize the impact of RCTs in CPGs. 
Therefore, we identified all citations of the original articles and SRs in the citing CPGs and classified them into impactful and non-impactful. This allowed us to calculate an estimate for the guideline impact of a trial. In addition, we estimated the time-to-guideline-impact, defined as the time to a direct and indirect impactful citation in a CPG. Direct means that the publication of a trial was cited in the main text of a CPG. Indirect means that the publication was cited and included in the findings of a SR and the SR was cited in the main text of a CPG. We also investigated to what extent pre-defined study characteristics influenced the guideline impact using multivariable negative binomial regression as well as the time-to-guideline impact using multivariable Cox proportional hazards regression.</p><p><strong>Results: </strong>Overall, 22% of RCTs impacted a CPG. For international ISTs, only 15% of trials had an impact in CPGs. Overall, of the 405 associated guidelines, 331 were impacted. Larger trials were associated with more impactful main text citations in CPGs and earlier time-to-guideline impact, while international industry-sponsored trials were associated with smaller impact on CPGs and longer time-to-guideline impact. 
IITs funded by governmental bodies in Germany reached an impact on CPGs that is on par with German ISTs or international IITs and ISTs.</p><p><strong>Conclusion: </strong>This study demonstrated that a considerable n","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"80"},"PeriodicalIF":3.9,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11948659/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143717993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Felicia R Carey, Elaine Y Hu, Nicole Stamas, Amber Seelig, Lynne Liu, Aaron Schneiderman, William Culpepper, Rudolph P Rull, Edward J Boyko
{"title":"Comparison of health measures between survey self-reports and electronic health records among Millennium Cohort Study participants receiving Veterans Health Administration care.","authors":"Felicia R Carey, Elaine Y Hu, Nicole Stamas, Amber Seelig, Lynne Liu, Aaron Schneiderman, William Culpepper, Rudolph P Rull, Edward J Boyko","doi":"10.1186/s12874-025-02529-x","DOIUrl":"10.1186/s12874-025-02529-x","url":null,"abstract":"<p><strong>Background: </strong>Surveys are a useful tool for eliciting self-reported health information, but the accuracy of such information may vary. We examined the agreement between self-reported health information and medical record data among 116,288 military service members and veterans enrolled in a longitudinal cohort.</p><p><strong>Methods: </strong>Millennium Cohort Study participants who separated from service and registered for health care in the Veterans Health Administration (VHA) by September 18, 2020, were eligible for inclusion. Baseline and follow-up survey responses (2001-2016) about 39 medical conditions, health behaviors, height, and weight were compared with analogous information from VHA and military medical records. Medical record diagnoses were classified as one qualifying ICD code in any diagnostic position between October 1, 1999, and September 18, 2020. Additional analyses were restricted to medical record diagnoses occurring before survey self-report and using specific diagnostic criteria (two outpatient or one inpatient ICD code). Positive, negative, and overall (Youden's J) agreement was calculated for categorical outcomes; Bland-Altman plots were examined for continuous measures.</p><p><strong>Results: </strong>Among 116,288 participants, 71.8% self-reported a diagnosed medical condition. Negative agreement between self-reported and VHA medical record diagnoses was > 90% for most (80%) conditions, but positive agreement was lower (6.4% to 56.3%). 
Mental health conditions were more frequently recorded in medical records, while acute conditions (e.g., bladder infections) were self-reported at a higher frequency. Positive agreement was lower when analyses were restricted to medical record diagnoses occurring prior to survey self-report. Specific diagnostic criteria resulted in higher overall agreement.</p><p><strong>Conclusions: </strong>While negative agreement between self-reported and medical record diagnoses was high in this population, positive and overall agreement were not strong and varied considerably by health condition. Though the limitations of survey-reported health conditions should be considered, using multiple data sources to examine health outcomes in this population may have utility for research, clinical planning, or public health interventions.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"81"},"PeriodicalIF":3.9,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11948930/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143728527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lydia Kakampakou, Jonathan Stokes, Andreas Hoehn, Marc de Kamps, Wiktoria Lawniczak, Kellyn F Arnold, Elizabeth M A Hensor, Alison J Heppenstall, Mark S Gilthorpe
{"title":"Simulating hierarchical data to assess the utility of ecological versus multilevel analyses in obtaining individual-level causal effects.","authors":"Lydia Kakampakou, Jonathan Stokes, Andreas Hoehn, Marc de Kamps, Wiktoria Lawniczak, Kellyn F Arnold, Elizabeth M A Hensor, Alison J Heppenstall, Mark S Gilthorpe","doi":"10.1186/s12874-025-02504-6","DOIUrl":"10.1186/s12874-025-02504-6","url":null,"abstract":"<p><p>Understanding causality, over mere association, is vital for researchers wishing to inform policy and decision making - for example, when seeking to improve population health outcomes. Yet, contemporary causal inference methods have not fully tackled the complexity of data hierarchies, such as the clustering of people within households, neighbourhoods, cities, or regions. However, complex data hierarchies are the rule rather than the exception. Gaining an understanding of these hierarchies is important for complex population outcomes, such as non-communicable disease, which is impacted by various social determinants at different levels of the data hierarchy. The alternative of analysing aggregated data could introduce well-known biases, such as the ecological fallacy or the modifiable areal unit problem. We devise a hierarchical causal diagram that encodes the multilevel data generating mechanism anticipated when evaluating non-communicable diseases in a population. The causal diagram informs data simulation. We also provide a flexible tool to generate synthetic population data that captures all multilevel causal structures, including a cross-level effect due to cluster size. For the very first time, we can then quantify the ecological fallacy within a formal causal framework to show that individual-level data are essential to assess causal relationships that affect the individual. This study also illustrates the importance of causally structured synthetic data for use with other methods, such as Agent Based Modelling or Microsimulation Modelling. 
Many methodological challenges remain for robust causal evaluation of multilevel data, but this study provides a foundation to investigate these.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"79"},"PeriodicalIF":3.9,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11929225/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143691002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maria Vittoria Chiaruttini, Giulia Lorenzoni, Dario Gregori
{"title":"Bayesian dynamic borrowing in group-sequential design for medical device studies.","authors":"Maria Vittoria Chiaruttini, Giulia Lorenzoni, Dario Gregori","doi":"10.1186/s12874-025-02520-6","DOIUrl":"10.1186/s12874-025-02520-6","url":null,"abstract":"<p><strong>Background: </strong>The integration of historical data into ongoing clinical trials through Bayesian Dynamic Borrowing offers significant advantages, including reduced sample size, trial duration, and associated costs. However, challenges such as ensuring exchangeability between historical and current data and mitigating Type I error inflation remain critical. This study proposes a Bayesian group-sequential design incorporating a Self-Adaptive Mixture (SAM) prior framework to address these challenges in medical device trials.</p><p><strong>Methods: </strong>The SAM prior combines informative priors derived from historical data with weakly informative priors, dynamically adjusting the weight of historical information based on congruence with current trial data. The design includes interim analyses, with Bayesian decision rules leveraging futility and efficacy boundaries derived using the frequentist spending functions. Effective Sample Size calculations informed adjustments to sample size and allocation ratios between experimental and control arms at each interim. The methodology was evaluated using a motivating example from a cardiovascular device trial with a noninferiority hypothesis.</p><p><strong>Results: </strong>Four historical studies with substantial heterogeneity were incorporated. The SAM prior showed improved adaptation to prior-data conflicts compared to static methods, maintaining Type I error and Power at their nominal levels. In the motivating trial, the MAP prior was approximated as a mixture of beta distributions, facilitating congruence testing and posterior inference. 
Simulation studies confirmed the proposed design's efficiency under both congruent and incongruent scenarios.</p><p><strong>Conclusions: </strong>The proposed Bayesian Group-Sequential Design with SAM prior offers a robust, adaptive framework for medical device trials, balancing statistical rigor with clinical interpretability. This approach enhances decision-making and supports timely, cost-effective evaluations, particularly in dynamic contexts like medical device development.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"78"},"PeriodicalIF":3.9,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11924708/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143668968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Florence J Breslin, Erin L Ratliff, Zsofia P Cohen, Julie M Croff, Kara L Kerr
{"title":"Measuring adversity in the ABCD® Study: systematic review and recommendations for best practices.","authors":"Florence J Breslin, Erin L Ratliff, Zsofia P Cohen, Julie M Croff, Kara L Kerr","doi":"10.1186/s12874-025-02521-5","DOIUrl":"10.1186/s12874-025-02521-5","url":null,"abstract":"<p><strong>Background: </strong>Early life adversity (ELA) has substantial, lifelong impacts on mental and physical health and development. Data from the ABCD® Study will provide essential insights into these effects. Because the study lacks a unified adversity assessment, our objective was to use a critical, human-driven approach to identify variables that fit ELA domains measured in this study.</p><p><strong>Methods: </strong>We clarify best practices in measurement of adversity in the ABCD Study through the creation of adversity scores based on the well-established Adverse Childhood Experiences (ACEs) questionnaire and another inclusive of broader ELA. Variables previously used to measure adversity in the ABCD dataset were determined via literature review. We assessed each variable to identify its utility in measuring domains of adversity at baseline and follow-up time points and by the individual completing the assessment (i.e., youth or caregiver). Variables were selected that align with decades of ELA measurement and can therefore be used by research teams as measures of ELA.</p><p><strong>Results: </strong>The literature review and critical analysis of items led to the development of three measures of ELA: an ACEs-proxy score, a youth-reported ACEs-proxy score, and a broader ELA score (ELA<sup>+</sup>). We provide code using R to calculate these scores and their constituent domains for use in future ABCD adversity-related research.</p><p><strong>Conclusions: </strong>The ABCD Study is one of the largest longitudinal studies of youth development, with data available for secondary analysis. 
Our review of existing measures and development of a coding schema will allow examination of ELA using this dataset, informing our understanding of risk, resilience, and prevention.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"77"},"PeriodicalIF":3.9,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11921744/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143656037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nathan Bernard, Yoshimasa Sagawa, Nathalie Bier, Thomas Lihoreau, Lionel Pazart, Thomas Tannou
{"title":"Using artificial intelligence for systematic review: the example of elicit.","authors":"Nathan Bernard, Yoshimasa Sagawa, Nathalie Bier, Thomas Lihoreau, Lionel Pazart, Thomas Tannou","doi":"10.1186/s12874-025-02528-y","DOIUrl":"10.1186/s12874-025-02528-y","url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence (AI) tools are increasingly being used to assist researchers with various research tasks, particularly in the systematic review process. Elicit is one such tool that can generate a summary of the question asked, setting it apart from other AI tools. The aim of this study is to determine whether AI-assisted research using Elicit adds value to the systematic review process compared to traditional screening methods.</p><p><strong>Methods: </strong>We compare the results from an umbrella review conducted independently of AI with the results of the AI-based searching using the same criteria. Elicit's contribution was assessed based on three criteria: repeatability, reliability and accuracy. For repeatability, the search process was repeated three times on Elicit (trial 1, trial 2, trial 3). For accuracy, articles obtained with Elicit were reviewed using the same inclusion criteria as the umbrella review. Reliability was assessed by comparing the number of publications found with Elicit with the number found without AI-based searches.</p><p><strong>Results: </strong>The repeatability test found 246, 169, and 172 results for trials 1, 2, and 3, respectively. Concerning accuracy, 6 articles were included at the conclusion of the selection process. Regarding reliability, the comparison revealed 3 common articles, 3 exclusively identified by Elicit, and 17 exclusively identified by the AI-independent umbrella review search.</p><p><strong>Conclusion: </strong>Our findings suggest that AI research assistants, like Elicit, can serve as valuable complementary tools for researchers when designing or writing systematic reviews. 
However, AI tools have several limitations and should be used with caution. When using AI tools, certain principles must be followed to maintain methodological rigour and integrity. Improving the performance of AI tools such as Elicit and contributing to the development of guidelines for their use during the systematic review process will enhance their effectiveness.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"75"},"PeriodicalIF":3.9,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11921719/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143656044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Danielle K Nagy, Lauren C Bresee, Dean T Eurich, Scot H Simpson
{"title":"Evaluating methods to define place of residence in Canadian administrative data and the impact on observed associations with all-cause mortality in type 2 diabetes.","authors":"Danielle K Nagy, Lauren C Bresee, Dean T Eurich, Scot H Simpson","doi":"10.1186/s12874-025-02531-3","DOIUrl":"10.1186/s12874-025-02531-3","url":null,"abstract":"<p><strong>Purpose: </strong>An individual's location of residence may impact health; however, health services and outcomes research generally uses a single point in time to define where an individual resides. While this estimate of residence becomes inaccurate when the study subject moves, the impact on observed associations is not known. This study quantifies the impact of different methods to define residence (rural, urban, metropolitan) on the association with all-cause mortality.</p><p><strong>Methods: </strong>A diabetes cohort of new metformin users was identified from administrative data in Alberta, Canada between 2008 and 2019. An individual's residence (rural/urban/metropolitan) was defined from postal codes using 4 different methods: residence defined at 1 year before first metformin (this served as the reference model), comparison 1 - stable residence for 3 years before first metformin, comparison 2 - residence as time-varying (during the outcome observation window), and comparison 3 - nested case control (residence closest to the index date after identifying cases and controls). Multivariable Cox proportional hazards and logistic regression models were constructed to examine the association between residence definitions and all-cause mortality.</p><p><strong>Results: </strong>We identified 157,146 new metformin users (mean age of 55 years and 57% male) and 8,444 (5%) deaths occurred during the mean follow up of 4.7 (SD 2.3) years. 
There were few instances of moving after first metformin; 2.6% of individuals moved to a smaller centre (metropolitan to urban or rural, or urban to rural) and 3.1% moved to a larger centre (rural to urban or metropolitan, or urban to metropolitan). The association between rural residence and all-cause mortality was consistent (aHR:1.18; 95%CI:1.12-1.24), regardless of the method used to define residence.</p><p><strong>Conclusions: </strong>The method used to define residence in a population of adults newly treated with metformin for type 2 diabetes has minimal impact on measures of all-cause mortality, possibly due to infrequent migration. The observed association between residence and mortality is compelling but requires further investigation and more robust analysis.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"76"},"PeriodicalIF":3.9,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11921607/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143656133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sample size recalculation based on the overall success rate in a randomized test-treatment trial with restricting randomization to discordant pairs.","authors":"Caroline Elzner, Amra Pepić, Oke Gerke, Antonia Zapf","doi":"10.1186/s12874-024-02410-3","DOIUrl":"10.1186/s12874-024-02410-3","url":null,"abstract":"<p><strong>Background: </strong>Randomized test-treatment studies are performed to evaluate the clinical effectiveness of diagnostic tests by assessing patient-relevant outcomes. The assumptions for a sample size calculation for such studies are often uncertain.</p><p><strong>Methods: </strong>An adaptive design with a blinded sample size recalculation based on the overall success rate in a randomized test-treatment trial with restricting randomization to discordant pairs is proposed and evaluated by a simulation study. The results of the adaptive design are compared to those of the fixed design.</p><p><strong>Results: </strong>The empirical type I error rate is sufficiently controlled in the adaptive design as well as in the fixed design and the estimates are unbiased. The adaptive design achieves the desired theoretical power, whereas the fixed design tends to be over- or under-powered.</p><p><strong>Conclusions: </strong>It may be advisable to consider blinded recalculation of sample size in a randomized test-treatment study with restriction of randomization to discordant pairs in order to improve the conduct of the study. 
However, there are a number of study-related limitations that affect the implementation of the method which need to be considered.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"74"},"PeriodicalIF":3.9,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11921670/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143656038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Md Sakhawat Hossain, Ravi Goyal, Natasha K Martin, Victor DeGruttola, Mohammad Mihrab Chowdhury, Christopher McMahan, Lior Rennert
{"title":"A flexible framework for local-level estimation of the effective reproductive number in geographic regions with sparse data.","authors":"Md Sakhawat Hossain, Ravi Goyal, Natasha K Martin, Victor DeGruttola, Mohammad Mihrab Chowdhury, Christopher McMahan, Lior Rennert","doi":"10.1186/s12874-025-02525-1","DOIUrl":"10.1186/s12874-025-02525-1","url":null,"abstract":"<p><strong>Background: </strong>Our research focuses on local-level estimation of the effective reproductive number, which describes the transmissibility of an infectious disease and represents the average number of individuals one infectious person infects at a given time. The ability to accurately estimate the infectious disease reproductive number in geographically granular regions is critical for disaster planning and resource allocation. However, not all regions have sufficient infectious disease outcome data; this lack of data presents a significant challenge for accurate estimation.</p><p><strong>Methods: </strong>To overcome this challenge, we propose a two-step approach that incorporates existing effective reproductive number estimation procedures (EpiEstim, EpiFilter, EpiNow2) using data from geographic regions with sufficient data (step 1), into a covariate-adjusted Bayesian Integrated Nested Laplace Approximation (INLA) spatial model to predict the effective reproductive number in regions with sparse or missing data (step 2). Our flexible framework effectively allows us to implement any existing estimation procedure for the effective reproductive number in regions with coarse or entirely missing data. We perform external validation and a simulation study to evaluate the proposed method and assess its predictive performance.</p><p><strong>Results: </strong>We applied our method to estimate the effective reproductive number using data from South Carolina (SC) counties and ZIP codes during the first COVID-19 wave ('Wave 1', June 16, 2020 - August 31, 2020) and the second wave ('Wave 2', December 16, 2020 - March 02, 2021). 
Among the three methods used in the first step, EpiNow2 yielded the highest accuracy of effective reproductive number prediction in the regions with entirely missing data. Median county-level percentage agreement (PA) was 90.9% (Interquartile Range, IQR: 89.9-92.0%) and 92.5% (IQR: 91.6-93.4%) for Waves 1 and 2, respectively. Median ZIP code-level PA was 95.2% (IQR: 94.4-95.7%) and 96.5% (IQR: 95.8-97.1%) for Waves 1 and 2, respectively. Using EpiEstim, EpiFilter, and an ensemble-based approach yielded median PA ranging from 81.9 to 90.0%, 87.2-92.1%, and 88.4-90.9%, respectively, across both waves and geographic granularities.</p><p><strong>Conclusion: </strong>These findings demonstrate that the proposed methodology is a useful tool for small-area estimation of the effective reproductive number, as our flexible framework yields high prediction accuracy for regions with coarse or missing data.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"73"},"PeriodicalIF":3.9,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11917005/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143656132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving patient clustering by incorporating structured variable label relationships in similarity measures.","authors":"Judith Lambert, Anne-Louise Leutenegger, Anaïs Baudot, Anne-Sophie Jannot","doi":"10.1186/s12874-025-02459-8","DOIUrl":"10.1186/s12874-025-02459-8","url":null,"abstract":"<p><strong>Background: </strong>Patient stratification is the cornerstone of numerous health investigations, serving to enhance the estimation of treatment efficacy and facilitating patient matching. To stratify patients, similarity measures between patients can be computed from clinical variables contained in medical health records. These variables have both values and labels structured in ontologies or other classification systems. The relevance of considering variable label relationships in the computation of patient similarity measures has been poorly studied.</p><p><strong>Objective: </strong>We adapt and evaluate several weighted versions of the Cosine similarity in order to consider structured label relationships to compute patient similarities from a medico-administrative database.</p><p><strong>Materials and methods: </strong>As a use case, we clustered patients aged 60 years from their annual medicine reimbursements contained in the Échantillon Généraliste des Bénéficiaires, a random sample of a French medico-administrative database. We used four patient similarity measures: the standard Cosine similarity, a weighted Cosine similarity measure that includes variable frequencies and two weighted Cosine similarity measures that consider variable label relationships. We construct patient networks from each similarity measure and identify clusters of patients using the Markov Cluster algorithm. 
We evaluate the performance of the different similarity measures with enrichment tests based on patient diagnoses.</p><p><strong>Results: </strong>The weighted similarity measures that include structured variable label relationships perform better at identifying similar patients. Indeed, using these weighted measures, we identify more clusters associated with different diagnosis enrichments. Importantly, the enrichment tests provide clinically interpretable insights into these patient clusters.</p><p><strong>Conclusion: </strong>Considering label relationships when computing patient similarities improves stratification of patients regarding their health status.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"25 1","pages":"72"},"PeriodicalIF":3.9,"publicationDate":"2025-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11910865/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143633548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}