Amani Al Tawil, Sean McGrath, Robin Ristl, Ulrich Mansmann
{"title":"Addressing treatment switching in the ALTA-1L trial with g-methods: exploring the impact of model specification.","authors":"Amani Al Tawil, Sean McGrath, Robin Ristl, Ulrich Mansmann","doi":"10.1186/s12874-024-02437-6","DOIUrl":"10.1186/s12874-024-02437-6","url":null,"abstract":"<p><strong>Background: </strong>Treatment switching in randomized clinical trials introduces challenges in performing causal inference. Intention To Treat (ITT) analyses often fail to fully capture the causal effect of treatment in the presence of treatment switching. Consequently, decision makers may instead be interested in causal effects of hypothetical treatment strategies that do not allow for treatment switching. For example, the phase 3 ALTA-1L trial showed that brigatinib may have improved Overall Survival (OS) compared to crizotinib if treatment switching had not occurred. The trial's sensitivity analysis using Inverse Probability of Censoring Weights (IPCW) reported a Hazard Ratio (HR) of 0.50 (95% CI, 0.28-0.87), while its initial ITT analysis estimated an HR of 0.81 (0.53-1.22).</p><p><strong>Methods: </strong>We used a directed acyclic graph to depict the clinical setting of the ALTA-1L trial in the presence of treatment switching, illustrating the concept of treatment-confounder feedback and highlighting the need for g-methods. In a re-analysis of the ALTA-1L trial data, we used IPCW and the parametric g-formula to adjust for baseline and time-varying covariates to estimate the effect of two hypothetical treatment strategies on OS: \"always treat with brigatinib\" versus \"always treat with crizotinib\". We conducted various sensitivity analyses using different model specifications and weight truncation approaches.</p><p><strong>Results: </strong>Applying the IPCW approach in a series of sensitivity analyses yielded Cumulative HRs (cHRs) ranging between 0.38 (0.12, 0.98) and 0.73 (0.45, 1.22) and Risk Ratios (RRs) ranging between 0.52 (0.32, 0.98) and 0.79 (0.54, 1.17). 
Applying the parametric g-formula resulted in cHRs ranging between 0.61 (0.38, 0.91) and 0.72 (0.43, 1.07) and RRs ranging between 0.71 (0.48, 0.94) and 0.79 (0.54, 1.05).</p><p><strong>Conclusion: </strong>Our results consistently indicated that our ITT effect estimate (cHR: 0.82 (0.51, 1.22)) may have underestimated brigatinib's benefit by around 10-45 percentage points (using IPCW) and 10-20 percentage points (using the parametric g-formula) across a wide range of model choices. Our analyses underscore the importance of performing sensitivity analyses, as the result from a single analysis could potentially stand as an outlier in a whole range of sensitivity analyses.</p><p><strong>Trial registration: </strong>ClinicalTrials.gov Identifier: NCT02737501, registered on April 14, 2016.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"314"},"PeriodicalIF":3.9,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11660711/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142871316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alice J Sitch, Jacqueline Dinnes, Jenny Hewison, Walter Gregory, Julie Parkes, Jonathan J Deeks
{"title":"Optimising research investment by simulating and evaluating monitoring strategies to inform a trial: a simulation of liver fibrosis monitoring.","authors":"Alice J Sitch, Jacqueline Dinnes, Jenny Hewison, Walter Gregory, Julie Parkes, Jonathan J Deeks","doi":"10.1186/s12874-024-02425-w","DOIUrl":"10.1186/s12874-024-02425-w","url":null,"abstract":"<p><strong>Background: </strong>The aim of the study was to investigate the development of evidence-based monitoring strategies in a population with progressive or recurrent disease. A simulation study of monitoring strategies using a new biomarker (ELF) for the detection of liver cirrhosis in people with known liver fibrosis was undertaken alongside a randomised controlled trial (ELUCIDATE).</p><p><strong>Methods: </strong>Existing data and expert opinion were used to estimate the progression of disease and the performance of repeat testing with ELF. Knowledge of the true disease status in addition to the observed test results for a cohort of simulated patients allowed various monitoring strategies to be implemented, evaluated and validated against trial data.</p><p><strong>Results: </strong>Several monitoring strategies ranging in complexity were successfully modelled and compared regarding the timing of detection of disease, the duration of monitoring, and the predictive value of a positive test result. The results of sensitivity analysis showed the importance of accurate data to inform the simulation. Results of the simulation were similar to those from the trial.</p><p><strong>Conclusion: </strong>Monitoring data can be simulated and strategies compared given adequate knowledge of disease progression and test performance. Such exercises should be carried out to ensure optimal strategies are evaluated in trials thus reducing research waste. Monitoring data can be generated and monitoring strategies can be assessed if data is available on the monitoring test performance and the test variability. 
This work highlights the data necessary and the general method for evaluating the performance of monitoring strategies, allowing appropriate strategies to be selected for evaluation. Modelling work should be conducted prior to full scale investigation of monitoring strategies, allowing optimal monitoring strategies to be assessed.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"315"},"PeriodicalIF":3.9,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11660973/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142871270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Myeonggyun Lee, Andrea B Troxel, Sophia Kwon, George Crowley, Theresa Schwartz, Rachel Zeig-Owens, David J Prezant, Anna Nolan, Mengling Liu
{"title":"Partial-linear single-index Cox regression models with multiple time-dependent covariates.","authors":"Myeonggyun Lee, Andrea B Troxel, Sophia Kwon, George Crowley, Theresa Schwartz, Rachel Zeig-Owens, David J Prezant, Anna Nolan, Mengling Liu","doi":"10.1186/s12874-024-02434-9","DOIUrl":"10.1186/s12874-024-02434-9","url":null,"abstract":"<p><strong>Background: </strong>In cohort studies with time-to-event outcomes, covariates of interest often have values that change over time. The classical Cox regression model can handle time-dependent covariates but assumes linear effects on the log hazard function, which can be limiting in practice. Furthermore, when multiple correlated covariates are studied, it is of great interest to model their joint effects by allowing a flexible functional form and to delineate their relative contributions to survival risk.</p><p><strong>Methods: </strong>Motivated by the World Trade Center (WTC)-exposed Fire Department of New York cohort study, we proposed a partial-linear single-index Cox (PLSI-Cox) model to investigate the effects of repeatedly measured metabolic syndrome indicators on the risk of developing WTC lung injury associated with particulate matter exposure. The PLSI-Cox model reduces the dimensionality of covariates while providing interpretable estimates of their effects. The model's flexible link function accommodates nonlinear effects on the log hazard function. We developed an iterative estimation algorithm using spline techniques to model the nonparametric single-index component for potential nonlinear effects, followed by maximum partial likelihood estimation of the parameters.</p><p><strong>Results: </strong>Extensive simulations showed that the proposed PLSI-Cox model outperformed the classical time-dependent Cox regression model when the true relationship was nonlinear. When the relationship was linear, both the PLSI-Cox model and classical time-dependent Cox regression model performed similarly. 
In the data application, we found a possible nonlinear joint effect of metabolic syndrome indicators on survival risk. Among the different indicators, BMI had the largest positive effect on the risk of developing lung injury, followed by triglycerides.</p><p><strong>Conclusion: </strong>The PLSI-Cox models allow for the evaluation of nonlinear effects of covariates and offer insights into their relative importance and direction. These methods provide a powerful set of tools for analyzing data with multiple time-dependent covariates and survival outcomes, potentially offering valuable insights for both current and future studies.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"311"},"PeriodicalIF":3.9,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11661057/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142871273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qiong Zou, Borui Chen, Yang Zhang, Xi Wu, Yi Wan, Changsheng Chen
{"title":"Mixed-effects neural network modelling to predict longitudinal trends in fasting plasma glucose.","authors":"Qiong Zou, Borui Chen, Yang Zhang, Xi Wu, Yi Wan, Changsheng Chen","doi":"10.1186/s12874-024-02442-9","DOIUrl":"10.1186/s12874-024-02442-9","url":null,"abstract":"<p><strong>Background: </strong>Accurate fasting plasma glucose (FPG) trend prediction is important for management and treatment of patients with type 2 diabetes mellitus (T2DM), a globally prevalent chronic disease. (Generalised) linear mixed-effects (LME) models and machine learning (ML) are commonly used to analyse longitudinal data; however, the former is insufficient for dealing with complex, nonlinear data, whereas with the latter, random effects are ignored. The aim of this study was to develop LME, back propagation neural network (BPNN), and mixed-effects NN models that combine the 2 to predict FPG levels.</p><p><strong>Methods: </strong>Monitoring data from 779 patients with T2DM from a multicentre, prospective study from the shared platform Figshare repository were divided 80/20 into training/test sets. The first 10 important features were modelled via random forest (RF) screening. First, an LME model was built to model interindividual differences, analyse the factors affecting FPG levels, compare the AIC and BIC values to screen the optimal model, and predict FPG levels. Second, multiple BPNN models were constructed via different variable sets to screen the optimal BPNN. Finally, an LME/BPNN combined model, named LMENN, was constructed via stacking integration. A 10-fold cross-validation cycle was performed using the training set to build the model and evaluate its performance, and then the final model was evaluated on the test set.</p><p><strong>Results: </strong>The top 10 variables screened by RF were HOMA-β, HbA1c, HOMA-IR, urinary sugar, insulin, BMI, waist circumference, weight, age, and group. 
The best-fitting random-intercept mixed-effects (lm22) model showed that each patient's baseline glucose levels influenced subsequent glucose measurements, but the trend over time was consistent. The LMENN model combines the strengths of LME and BPNN and accounts for random effects. The RMSE ranges of the LMENN model were 0.447-0.471 (training set), 0.525-0.552 (validation set), and 0.511-0.565 (test set). The LMENN model improves on the prediction performance of the single LME and BPNN models and shows some advantages in predicting FPG levels.</p><p><strong>Conclusions: </strong>The LMENN model built by integrating LME and BPNN has several potential applications in analysing longitudinal FPG monitoring data. This study provides new ideas and methods for further research in the field of blood glucose prediction.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"313"},"PeriodicalIF":3.9,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11660730/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142871267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junlong Ma, Heng Chen, Ji Sun, Juanjuan Huang, Gefei He, Guoping Yang
{"title":"Efficient analysis of drug interactions in liver injury: a retrospective study leveraging natural language processing and machine learning.","authors":"Junlong Ma, Heng Chen, Ji Sun, Juanjuan Huang, Gefei He, Guoping Yang","doi":"10.1186/s12874-024-02443-8","DOIUrl":"10.1186/s12874-024-02443-8","url":null,"abstract":"<p><strong>Background: </strong>Liver injury from drug-drug interactions (DDIs), notably with anti-tuberculosis drugs such as isoniazid, poses a significant safety concern. Electronic medical records contain comprehensive clinical information and have gained increasing attention as a potential resource for DDI detection. However, a substantial portion of adverse drug reaction (ADR) information is hidden in unstructured narrative text, which has yet to be efficiently harnessed, thereby introducing bias into the research. There is a significant need for an efficient framework for DDI assessment.</p><p><strong>Methods: </strong>Using a Chinese natural language processing (NLP) model, we extracted 25,130 ADR records, dividing them into sets for training an automated normalization model. The trained models, in conjunction with liver function laboratory tests, were used to thoroughly and efficiently identify liver injury cases. Ultimately, we applied a case-control study design to detect DDI signals increasing isoniazid's liver injury risk.</p><p><strong>Results: </strong>The logistic regression model demonstrated stable and superior performance in the classification task. Based on laboratory criteria and NLP, we identified 128 liver injury cases among a cohort of 3,209 patients treated with isoniazid. Preliminary screening of 113 drug combinations with isoniazid highlighted 20 potential signal drugs, with antibacterials constituting 25%. 
Sensitivity analysis confirmed the robustness of signal drugs, especially in cardiac therapy and antibacterials.</p><p><strong>Conclusion: </strong>Our NLP and machine learning approach effectively identifies isoniazid-related DDIs that increase the risk of liver injury, identifying 20 signal drugs, mainly antibacterials. Further research is required to validate these DDI signals.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"312"},"PeriodicalIF":3.9,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11660714/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142871322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sophie Vanbelle, Christina Hernandez Engelhart, Ellen Blix
{"title":"A comprehensive guide to study the agreement and reliability of multi-observer ordinal data.","authors":"Sophie Vanbelle, Christina Hernandez Engelhart, Ellen Blix","doi":"10.1186/s12874-024-02431-y","DOIUrl":"10.1186/s12874-024-02431-y","url":null,"abstract":"<p><strong>Background: </strong>A recent systematic review revealed issues in regard to performing and reporting agreement and reliability studies for ordinal scales, especially in the presence of more than two observers. This paper therefore aims to provide all necessary information in regard to the choice among the most meaningful and most used measures and the planning of agreement and reliability studies for ordinal outcomes.</p><p><strong>Methods: </strong>This paper considers the generalisation of the proportion of (dis)agreement, the mean absolute deviation, the mean squared deviation and weighted kappa coefficients to more than two observers in the presence of an ordinal outcome.</p><p><strong>Results: </strong>After highlighting the difference between the concepts of agreement and reliability, a clear and simple interpretation of the agreement and reliability coefficients is provided. The large sample variance of the various coefficients with the delta method is presented or derived if not available in the literature to construct Wald confidence intervals. Finally, a procedure to determine the minimum number of raters and patients needed to limit the uncertainty associated with the sampling process is provided. All the methods are available in an R package and a Shiny application to circumvent the limitations of current software.</p><p><strong>Conclusions: </strong>The present paper completes existing guidelines, such as the Guidelines for Reporting Reliability and Agreement Studies (GRRAS), to improve the quality of reliability and agreement studies of clinical tests. 
Furthermore, we provide open source software to researchers with minimum programming skills.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"310"},"PeriodicalIF":3.9,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11660713/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142871329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mikkel Schou Andersen, Mikkel Seremet Kofoed, Asger Sand Paludan-Müller, Christian Bonde Pedersen, Tiit Mathiesen, Christian Mawrin, Birgitte Brinkmann Olsen, Bo Halle, Frantz Rom Poulsen
{"title":"CRIME-Q-a unifying tool for critical appraisal of methodological (technical) quality, quality of reporting and risk of bias in animal research.","authors":"Mikkel Schou Andersen, Mikkel Seremet Kofoed, Asger Sand Paludan-Müller, Christian Bonde Pedersen, Tiit Mathiesen, Christian Mawrin, Birgitte Brinkmann Olsen, Bo Halle, Frantz Rom Poulsen","doi":"10.1186/s12874-024-02413-0","DOIUrl":"10.1186/s12874-024-02413-0","url":null,"abstract":"<p><strong>Background: </strong>Systematic reviews within the field of animal research are becoming more common. However, in animal translational research, issues related to methodological quality and quality of reporting continue to arise, potentially leading to underestimation or overestimation of the effects of interventions or preventing studies from being replicated. The various tools and checklists available to ensure good-quality studies and proper reporting contain both unique and overlapping items, lack necessary elements, or are too specific to certain conditions or diseases. Currently, no available tool covers all aspects of animal models, from bench-top activities to animal facilities; hence, a new tool is needed. This tool should be able to assess all kinds of animal studies: old or new, low or high quality, interventional or noninterventional. It should do this on multiple levels through items on quality of reporting, methodological (technical) quality, and risk of bias, for use in assessing the overall quality of studies involving animal research.</p><p><strong>Methods: </strong>During a systematic review of meningioma models in animals, we developed a novel unifying tool that can assess all types of animal studies from multiple perspectives. 
The tool was inspired by the Collaborative Approach to Meta Analysis and Review of Animal Data from Experimental Studies (CAMARADES) checklist, the ARRIVE 2.0 guidelines, and SYRCLE's risk of bias tool, while also incorporating unique items. We used the interrater agreement percentage and Cohen's kappa index to test the interrater agreement between two independent reviewers for the items in the tool.</p><p><strong>Results: </strong>There was high interrater agreement across all items (92.9%, 95% CI 91.0-94.8). Cohen's kappa index showed quality of reporting had the best mean index of 0.86 (95%-CI 0.78-0.94), methodological quality had a mean index of 0.83 (95%-CI 0.78-0.94) and finally the items from SYRCLE's risk of bias had a mean kappa index of 0.68 (95%-CI 0.57-0.79).</p><p><strong>Conclusions: </strong>The Critical Appraisal of Methodological (technical) Quality, Quality of Reporting and Risk of Bias in Animal Research (CRIME-Q) tool unifies a broad spectrum of information (both unique items and items inspired by other methods) about the quality of reporting and methodological (technical) quality, and contains items from SYRCLE's risk of bias. The tool is intended for use in assessing overall study quality across multiple domains and items and is not, unlike other tools, restricted to any particular model or study design (whether interventional or noninterventional). 
It is also easy to apply when designing and conducting animal experiments to ensure proper reporting and design in terms of replicability, transparency, and validity.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"306"},"PeriodicalIF":3.9,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11656974/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142852496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aoife Whiston, K M Kidwell, S O'Reilly, C Walsh, J C Walsh, L Glynn, K Robinson, S Hayes
{"title":"The use of sequential multiple assignment randomized trials (SMARTs) in physical activity interventions: a systematic review.","authors":"Aoife Whiston, K M Kidwell, S O'Reilly, C Walsh, J C Walsh, L Glynn, K Robinson, S Hayes","doi":"10.1186/s12874-024-02439-4","DOIUrl":"10.1186/s12874-024-02439-4","url":null,"abstract":"<p><strong>Background: </strong>Physical activity (PA) is often the cornerstone in risk-reduction interventions for the prevention and treatment of many chronic health conditions. PA interventions are inherently multi-dimensional and complex in nature. Thus, study designs used in the evaluation of PA interventions must be adaptive to intervention components and individual capacities. A Sequential Multiple Assignment Randomised Trial (SMART) is a factorial design in a sequential setting used to build effective adaptive interventions. SMARTs represent a relatively new design for PA intervention research. This systematic review aims to examine the state of the art of SMARTs used to develop PA interventions, with a focus on study characteristics, design, and analyses.</p><p><strong>Methods: </strong>PubMed, Embase, PsycINFO, CENTRAL, and CINAHL were systematically searched through May 2023 for studies wherein PA SMARTs were conducted. Methodological quality was assessed using the Cochrane Risk of Bias 2 Tool.</p><p><strong>Results: </strong>Twenty studies across a variety of populations (e.g., obesity, chronic pain, and cardiovascular conditions) were included. All PA SMARTs involved two decision stages, with the majority including two initial treatment options. PA interventions most commonly consisted of individual aerobic exercise, with strategies such as goal setting, wearable technology, and motivational interviewing also used to promote PA. Variation was observed across tailoring variables and timing of tailoring variables. 
Non-response strategies primarily involved augmenting and switching treatment options, whereas responders continued with their initial treatment options. For analyses, most sample size estimations and outcome analyses accounted for the SMART aims specified. Techniques such as linear mixed models, weighted regressions, and Q-learning regression were frequently used. Risk of bias was high across the majority of included studies.</p><p><strong>Conclusions: </strong>Individual-based aerobic exercise interventions supported by behaviour change techniques and wearable sensing technology may play a key role in the future development of SMARTs addressing PA intervention development. A clearer rationale for the selection of tailoring variables, timing of tailoring variables, and included measures is essential to advance PA SMART designs. Collaborative efforts from researchers, clinicians, and patients are needed in order to bridge the gap between adaptive research designs and personalised treatment pathways observed in clinical practice.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"308"},"PeriodicalIF":3.9,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11658464/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142863446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multivariate filter methods for feature selection with the γ-metric.","authors":"Nicolas Ngo, Pierre Michel, Roch Giorgi","doi":"10.1186/s12874-024-02426-9","DOIUrl":"10.1186/s12874-024-02426-9","url":null,"abstract":"<p><strong>Background: </strong>The γ-metric value is generally used as the importance score of a feature (or a set of features) in a classification context. This study aimed to go further by creating a new methodology for multivariate feature selection for classification, whereby the γ-metric is associated with a specific search direction (and therefore a specific stopping criterion). As three search directions are used, we effectively created three distinct methods.</p><p><strong>Methods: </strong>We assessed the performance of our new methodology through a simulation study, comparing the three resulting methods against more conventional methods. Classification performance indicators, number of selected features, stability and execution time were used to evaluate the performance of the methods. We also evaluated how well the proposed methodology selected relevant features for the detection of atrial fibrillation (AF), a cardiac arrhythmia.</p><p><strong>Results: </strong>We found that in both the simulation study and the AF detection task, our methods were able to select informative features and maintain a good level of predictive performance; however, with strong correlation and large datasets, the γ-metric-based methods were less efficient at excluding non-informative features.</p><p><strong>Conclusions: </strong>Results highlighted that the forward search direction combines well with the γ-metric as an evaluation function. 
However, with the backward search direction, the feature selection algorithm could fall into a local optimum, which leaves room for improvement.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"307"},"PeriodicalIF":3.9,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11657396/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142863427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ying Hu, Hai Yan, Ming Liu, Jing Gao, Lianhong Xie, Chunyu Zhang, Lili Wei, Yinging Ding, Hong Jiang
{"title":"Detecting cardiovascular diseases using unsupervised machine learning clustering based on electronic medical records.","authors":"Ying Hu, Hai Yan, Ming Liu, Jing Gao, Lianhong Xie, Chunyu Zhang, Lili Wei, Yinging Ding, Hong Jiang","doi":"10.1186/s12874-024-02422-z","DOIUrl":"10.1186/s12874-024-02422-z","url":null,"abstract":"<p><strong>Background: </strong>Machine learning (ML) models trained on electronic medical records (EMR) have potential in cardiovascular disease (CVD) risk prediction: by integrating a range of medical data from patients, they can facilitate timely diagnosis and classification of CVDs. We tested the hypothesis that an unsupervised ML approach utilizing EMR could be used to develop a new model for detecting prevalent CVD in clinical settings.</p><p><strong>Methods: </strong>We included 155,894 patients (aged ≥ 18 years) discharged between January 2014 and July 2022, from Xuhui Hospital, Shanghai, China, including 64,916 CVD cases and 90,979 non-CVD cases. K-means clustering was used to generate the clustering models with predetermined numbers of clusters k = 2, 4, and 8. Bayes' theorem was used to estimate the models' predictive accuracy.</p><p><strong>Results: </strong>The overall predictive accuracy of the 2-, 4-, and 8-classification clustering models in the training set was 0.856, 0.8634, and 0.8506, respectively. Similarly, the predictive accuracy of the 2-, 4-, and 8-classification clustering models in the testing set was 0.8598, 0.8659, and 0.8525, respectively. After reducing from 19 dimensions to 2 dimensions by principal component analysis, significant separation was observed for CVD cases and non-CVD cases in both training and testing sets.</p><p><strong>Conclusion: </strong>Our findings indicate that the utilization of EMR data can support the development of a robust model for CVD detection through an unsupervised ML approach. 
Further investigation using longitudinal design is needed to refine the model for its applications in clinical settings.</p>","PeriodicalId":9114,"journal":{"name":"BMC Medical Research Methodology","volume":"24 1","pages":"309"},"PeriodicalIF":3.9,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11658374/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142863432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}