Editorial for the special collection “Towards neutral comparison studies in methodological research”

Anne-Laure Boulesteix, Mark Baillie, Dominic Edelmann, Leonhard Held, Tim P. Morris, Willi Sauerbrei
Biomedical researchers are frequently faced with an array of methods they might use for the analysis and/or design of studies. It can be difficult to understand the absolute and relative merits of candidate methods beyond one's own particular interests and expertise. Choosing a method can be difficult even in simple settings, and the growth in the volume of data collected, in computational power, and in the number of methods proposed in the literature makes the choice all the more difficult. In this context, it is crucial to provide researchers with evidence-supported guidance derived from appropriately designed studies that compare statistical methods in a neutral way, in particular through well-designed simulation studies.
While neutral comparison studies are an essential cornerstone toward improving this situation, a number of challenges remain with regard to their methodology and acceptance. Numerous difficulties arise when designing, conducting, and reporting neutral comparison studies. Practical experience is still scarce, and literature on these issues is almost nonexistent. Furthermore, authors of neutral comparison studies are often faced with incomprehension from a large part of the scientific community, which is more interested in the development of “new” approaches and evaluates the importance of research primarily based on the novelty of the presented methods. Consequently, meaningful comparisons of competing approaches (especially reproducible studies with publicly available code and data) are rarely available, and evidence-supported, state-of-the-art guidance is largely missing, often resulting in the use of suboptimal methods in practice.
The final special collection includes 11 contributions of the first type (neutral comparison studies) and 12 of the second (papers addressing the methodology of such comparisons), covering a wide range of methods and issues. Our expectations were fully met and even exceeded! We thank the authors for these outstanding contributions and the many reviewers for their very helpful comments.
The papers from the first category explore a wide range of highly relevant biostatistical methods. They also illustrate various concepts of neutrality and various ways of increasing reliability and transparency, for example, through the use of study protocols.
The topics include methodology for analyzing data from randomized trials, such as the use of baseline covariates to analyze small cluster-randomized trials with a rare binary outcome (Zhu et al.) and the characterization of treatment effect heterogeneity (Sun et al.). The special collection also presents comparison studies that explore a variety of modeling approaches in other contexts. These include the analysis of survival data with nonproportional hazards using propensity score–weighted methods (Handorf et al.), the impact of the matching algorithm on the treatment effect estimate in causal analyses based on the propensity score (Heinz et al.), statistical methods for analyzing longitudinally measured ordinal outcomes in rare diseases (Geroldinger et al.), and in vitro dose–response estimation under extreme observations (Fang and Zhou).
Three papers address variable selection and penalization in the context of regression models, each with a different focus. While Frommlet investigates model selection based on the minimization of L0-penalized criteria in a high-dimensional context, Hanke et al. compare various model selection strategies to the best subset approach, and Luijken et al. compare full model specification and backward elimination when estimating causal effects on binary outcomes. Finally, the collection also includes papers addressing prediction modeling: Lohmann et al. compare the prediction performance of various model selection methods in the context of logistic regression, while Graf et al. compare linear discriminant analysis to several machine learning algorithms.
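To give a concrete impression of the kind of experiment such comparisons rest on, the sketch below contrasts a full logistic model with backward elimination in a small Monte Carlo study. It is a minimal illustration only: the data-generating model, sample size, selection threshold, and performance measures are our own assumptions and do not reproduce any of the cited papers.

```python
# Minimal sketch of a simulation comparing a full logistic model with
# backward elimination; all settings are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2024)
n, n_rep, p_drop = 500, 200, 0.157                 # sample size, repetitions, drop threshold
beta = np.array([-0.5, 0.7, 0.4, 0.0, 0.0])        # intercept, exposure of interest, one true predictor, two noise covariates

def simulate_once():
    X = rng.standard_normal((n, 4))
    lin = beta[0] + X @ beta[1:]
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin)))
    return X, y

def exposure_coef_full(X, y):
    # fit the full model and return the coefficient of the exposure (column 0)
    return sm.Logit(y, sm.add_constant(X)).fit(disp=0).params[1]

def exposure_coef_backward(X, y):
    # backward elimination on p-values; the exposure (column 0) is forced to stay in
    keep = list(range(X.shape[1]))
    while True:
        res = sm.Logit(y, sm.add_constant(X[:, keep])).fit(disp=0)
        pvals = res.pvalues[1:]                    # skip the intercept
        candidates = [(p, i) for i, p in enumerate(pvals) if keep[i] != 0]
        if not candidates:
            return res.params[1]
        worst_p, worst_i = max(candidates)
        if worst_p <= p_drop:
            return res.params[1]
        keep.pop(worst_i)                          # drop the least significant covariate and refit

est_full, est_bw = [], []
for _ in range(n_rep):
    X, y = simulate_once()
    est_full.append(exposure_coef_full(X, y))
    est_bw.append(exposure_coef_backward(X, y))

for name, est in [("full model", est_full), ("backward elimination", est_bw)]:
    est = np.asarray(est)
    print(f"{name:22s} bias = {est.mean() - beta[1]:+.3f}   empirical SE = {est.std(ddof=1):.3f}")
```

The printed bias and empirical standard error are, of course, specific to this toy setting; a genuine neutral comparison study would vary the data-generating mechanism, sample size, and performance measures systematically.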
Four papers from the special collection address the challenge of simulating complex data and conducting large simulation studies toward the meaningful and efficient evaluation of statistical methods. Ruberg et al. present an extensive platform for evaluating subgroup identification methodologies, including the implementation of appropriate data generating models. Wahab et al. propose a dedicated simulator for the evaluation of methods that aim at providing pertinent causal inference in the presence of intercurrent events in clinical trials. Kelter outlines a comprehensive framework for Bayesian simulation studies including a structured skeleton for the planning, coding, conduct, analysis, and reporting of Bayesian simulation studies. The open science framework developed by Kodalci and Thas, which focuses on two-sample tests, allows the comparison of new methods to all previously submitted methods using all previously submitted simulation designs.
In contrast, Huang and Trinquart consider how to compare the performance of hypothesis tests whose type I error rates differ, a situation that complicates the interpretation of power. They propose a new approach that draws an analogy to diagnostic accuracy comparisons, based on relative positive and negative likelihood ratios.
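One way to write the analogy down explicitly (our notation; the paper's exact definitions may differ) is to treat rejection of the null hypothesis as a "positive" result, so that power plays the role of sensitivity and the type I error rate that of one minus specificity. For two tests A and B with type I error rates \(\alpha_A, \alpha_B\) and powers \(1-\beta_A, 1-\beta_B\) at a fixed alternative, the positive and negative likelihood ratios and their relative versions are then

\[
\mathrm{LR}^{+}_{k} = \frac{1-\beta_k}{\alpha_k},
\qquad
\mathrm{LR}^{-}_{k} = \frac{\beta_k}{1-\alpha_k},
\qquad k \in \{A, B\},
\]
\[
\mathrm{rLR}^{+} = \frac{\mathrm{LR}^{+}_{A}}{\mathrm{LR}^{+}_{B}},
\qquad
\mathrm{rLR}^{-} = \frac{\mathrm{LR}^{-}_{A}}{\mathrm{LR}^{-}_{B}},
\]

so that two tests can be ranked jointly on how well they discriminate between null and alternative even when their raw type I error rates are not directly comparable.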
The special collection also includes various thought-provoking perspective articles discussing fundamental aspects of benchmarking methodology. Friedrich and Friede discuss the complementary roles of simulation-based and real data–based benchmarking. Heinze et al. propose a phases framework for methodological research, which considers how to make methods fit for use. Strobl and Leisch stress the need to give up the notion that one method can be broadly the “best” in comparison studies. Other articles address special aspects of the design of comparison studies. Pawel et al. discuss and demonstrate the impact of so-called “questionable research practices” in the context of simulation studies, while Nießl et al. explain, through a cross-design validation experiment, why performance evaluations of newly proposed methods tend to be optimistic. Oberman and Vink focus on aspects to consider in the design of simulation experiments that evaluate imputation methodology. In a letter to the editor related to this article, Morris et al. note some issues with fixing a single complete data set rather than repeatedly sampling the data in such simulations.
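The practical difference between the two designs is easy to demonstrate. The toy sketch below, which uses a deliberately simple data-generating model and single mean imputation purely for illustration (none of it taken from the articles above), contrasts re-imposing missingness on one fixed complete data set with drawing a fresh sample in every repetition.

```python
# Minimal sketch contrasting two simulation designs for evaluating an
# imputation procedure; the data-generating model and the (deliberately
# naive) mean imputation are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(1)
n, n_rep, true_mean = 200, 1000, 1.0

def mean_after_imputation(x):
    # impose 30% missingness completely at random, then apply single mean imputation
    x = x.copy()
    x[rng.random(x.size) < 0.3] = np.nan
    x[np.isnan(x)] = np.nanmean(x)
    return x.mean()

# design (a): one complete data set is fixed; only the missingness is re-drawn
fixed_sample = rng.normal(true_mean, 1.0, n)
est_fixed = [mean_after_imputation(fixed_sample) for _ in range(n_rep)]

# design (b): a fresh complete data set is drawn in every repetition
est_fresh = [mean_after_imputation(rng.normal(true_mean, 1.0, n)) for _ in range(n_rep)]

for name, est in [("fixed complete data", est_fixed), ("fresh sample each rep", est_fresh)]:
    est = np.asarray(est)
    print(f"{name:22s} mean estimate = {est.mean():.3f}   empirical SE = {est.std(ddof=1):.3f}")
```

With the fixed data set, the spread of the estimates reflects only the missingness and imputation steps, and any apparent bias is measured relative to that particular sample; only the fresh-sample design also captures sampling variability.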
Editing this Special Collection was extremely rewarding for us. Quite aside from the high quality of the submissions, we were heartened to see the biometrical community's interest in improving the quality of research comparing methods; it was of course a concern that we might receive no submissions! It is our hope that this Special Collection represents the start rather than the end of a conversation, and that readers find the articles as thought-provoking and practically useful as we have.