Editorial for the special collection “Towards neutral comparison studies in methodological research”

Anne-Laure Boulesteix, Mark Baillie, Dominic Edelmann, Leonhard Held, Tim P. Morris, Willi Sauerbrei
Biomedical researchers are frequently faced with an array of methods they might use for the analysis and/or design of studies. It can be difficult to understand the absolute and relative merits of candidate methods beyond one's own particular interests and expertise. Choosing a method can be difficult even in simple settings, and the growth in the volume of data collected, in computational power, and in the number of methods proposed in the literature makes the choice all the more difficult. In this context, it is crucial to provide researchers with evidence-supported guidance derived from appropriately designed studies that compare statistical methods in a neutral way, in particular through well-designed simulation studies.
While neutral comparison studies are an essential cornerstone toward improving this situation, a number of challenges remain with regard to their methodology and acceptance. Numerous difficulties arise when designing, conducting, and reporting neutral comparison studies. Practical experience is still scarce, and literature on these issues is almost nonexistent. Furthermore, authors of neutral comparison studies are often faced with incomprehension from a large part of the scientific community, which is more interested in the development of “new” approaches and evaluates the importance of research primarily based on the novelty of the presented methods. Consequently, meaningful comparisons of competing approaches (especially reproducible studies with publicly available code and data) are rarely available, and evidence-supported, state-of-the-art guidance is largely missing, often resulting in the use of suboptimal methods in practice.
The final special collection includes 11 contributions of the first type (neutral comparison studies) and 12 of the second (papers addressing the methodology of such comparisons), covering a wide range of methods and issues. Our expectations were fully met and even exceeded! We thank the authors for these outstanding contributions and the many reviewers for their very helpful comments.
The papers from the first category explore a wide range of highly relevant biostatistical methods. They also illustrate various concepts of neutrality and various ways of increasing reliability and transparency, for example, through the use of study protocols.
The topics include methodology for analyzing data from randomized trials, such as the use of baseline covariates to analyze small cluster-randomized trials with a rare binary outcome (Zhu et al.) and the characterization of treatment effect heterogeneity (Sun et al.). The special collection also presents comparison studies that explore a variety of modeling approaches in other contexts. These include the analysis of survival data with nonproportional hazards using propensity score–weighted methods (Handorf et al.), the impact of the matching algorithm on the treatment effect estimate in causal analyses based on the propensity score (Heinz et al.), statistical methods for analyzing longitudinally measured ordinal outcomes in rare diseases (Geroldinger et al.), and in vitro dose–response estimation under extreme observations (Fang and Zhou).
Three papers address variable selection and penalization in the context of regression models, each with a different focus. While Frommlet investigates model selection based on the minimization of L0-penalized criteria in a high-dimensional context, Hanke et al. compare various model selection strategies to the best subset approach, and Luijken et al. compare full model specification and backward elimination when estimating causal effects on binary outcomes. Finally, the collection also includes papers addressing prediction modeling: Lohmann et al. compare the prediction performance of various model selection methods in the context of logistic regression, while Graf et al. compare linear discriminant analysis to several machine learning algorithms.
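To give a concrete impression of the kind of experiment such comparisons rest on, the sketch below contrasts a full logistic model with backward elimination in a small Monte Carlo study. It is a minimal illustration only: the data-generating model, sample size, selection threshold, and performance measures are our own assumptions and do not reproduce any of the cited papers.

```python
# Minimal sketch of a simulation comparing a full logistic model with
# backward elimination; all settings are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2024)
n, n_rep, p_drop = 500, 200, 0.157                 # sample size, repetitions, drop threshold
beta = np.array([-0.5, 0.7, 0.4, 0.0, 0.0])        # intercept, exposure of interest, one true predictor, two noise covariates

def simulate_once():
    X = rng.standard_normal((n, 4))
    lin = beta[0] + X @ beta[1:]
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin)))
    return X, y

def exposure_coef_full(X, y):
    # fit the full model and return the coefficient of the exposure (column 0)
    return sm.Logit(y, sm.add_constant(X)).fit(disp=0).params[1]

def exposure_coef_backward(X, y):
    # backward elimination on p-values; the exposure (column 0) is forced to stay in
    keep = list(range(X.shape[1]))
    while True:
        res = sm.Logit(y, sm.add_constant(X[:, keep])).fit(disp=0)
        pvals = res.pvalues[1:]                    # skip the intercept
        candidates = [(p, i) for i, p in enumerate(pvals) if keep[i] != 0]
        if not candidates:
            return res.params[1]
        worst_p, worst_i = max(candidates)
        if worst_p <= p_drop:
            return res.params[1]
        keep.pop(worst_i)                          # drop the least significant covariate and refit

est_full, est_bw = [], []
for _ in range(n_rep):
    X, y = simulate_once()
    est_full.append(exposure_coef_full(X, y))
    est_bw.append(exposure_coef_backward(X, y))

for name, est in [("full model", est_full), ("backward elimination", est_bw)]:
    est = np.asarray(est)
    print(f"{name:22s} bias = {est.mean() - beta[1]:+.3f}   empirical SE = {est.std(ddof=1):.3f}")
```

The printed bias and empirical standard error are, of course, specific to this toy setting; a genuine neutral comparison study would vary the data-generating mechanism, sample size, and performance measures systematically.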
Four papers from the special collection address the challenge of simulating complex data and conducting large simulation studies toward the meaningful and efficient evaluation of statistical methods. Ruberg et al. present an extensive platform for evaluating subgroup identification methodologies, including the implementation of appropriate data generating models. Wahab et al. propose a dedicated simulator for the evaluation of methods that aim at providing pertinent causal inference in the presence of intercurrent events in clinical trials. Kelter outlines a comprehensive framework for Bayesian simulation studies including a structured skeleton for the planning, coding, conduct, analysis, and reporting of Bayesian simulation studies. The open science framework developed by Kodalci and Thas, which focuses on two-sample tests, allows the comparison of new methods to all previously submitted methods using all previously submitted simulation designs.
In contrast, Huang and Trinquart consider how to compare the performance of hypothesis tests whose type I error rates differ, a situation that complicates the interpretation of power. They propose a new approach that draws an analogy to diagnostic accuracy comparisons, based on relative positive and negative likelihood ratios.
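One way to write the analogy down explicitly (our notation; the paper's exact definitions may differ) is to treat rejection of the null hypothesis as a "positive" result, so that power plays the role of sensitivity and the type I error rate that of one minus specificity. For two tests A and B with type I error rates \(\alpha_A, \alpha_B\) and powers \(1-\beta_A, 1-\beta_B\) at a fixed alternative, the positive and negative likelihood ratios and their relative versions are then

\[
\mathrm{LR}^{+}_{k} = \frac{1-\beta_k}{\alpha_k},
\qquad
\mathrm{LR}^{-}_{k} = \frac{\beta_k}{1-\alpha_k},
\qquad k \in \{A, B\},
\]
\[
\mathrm{rLR}^{+} = \frac{\mathrm{LR}^{+}_{A}}{\mathrm{LR}^{+}_{B}},
\qquad
\mathrm{rLR}^{-} = \frac{\mathrm{LR}^{-}_{A}}{\mathrm{LR}^{-}_{B}},
\]

so that two tests can be ranked jointly on how well they discriminate between null and alternative even when their raw type I error rates are not directly comparable.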
The special collection also includes various thought-provoking perspective articles discussing fundamental aspects of benchmarking methodology. Friedrich and Friede discuss the complementary roles of simulation-based and real data–based benchmarking. Heinze et al. propose a phases framework for methodological research, which considers how to make methods fit for use. Strobl and Leisch stress the need to give up the notion that one method can be broadly the “best” in comparison studies. Other articles address special aspects of the design of comparison studies. Pawel et al. discuss and demonstrate the impact of so-called “questionable research practices” in the context of simulation studies, while Nießl et al. explain, through a cross-design validation experiment, why performance evaluations of newly proposed methods tend to be optimistic. Oberman and Vink focus on aspects to consider in the design of simulation experiments that evaluate imputation methodology. In a letter to the editor related to this article, Morris et al. note some issues with fixing a single complete data set rather than repeatedly sampling the data in such simulations.
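The practical difference between the two designs is easy to demonstrate. The toy sketch below, which uses a deliberately simple data-generating model and single mean imputation purely for illustration (none of it taken from the articles above), contrasts re-imposing missingness on one fixed complete data set with drawing a fresh sample in every repetition.

```python
# Minimal sketch contrasting two simulation designs for evaluating an
# imputation procedure; the data-generating model and the (deliberately
# naive) mean imputation are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(1)
n, n_rep, true_mean = 200, 1000, 1.0

def mean_after_imputation(x):
    # impose 30% missingness completely at random, then apply single mean imputation
    x = x.copy()
    x[rng.random(x.size) < 0.3] = np.nan
    x[np.isnan(x)] = np.nanmean(x)
    return x.mean()

# design (a): one complete data set is fixed; only the missingness is re-drawn
fixed_sample = rng.normal(true_mean, 1.0, n)
est_fixed = [mean_after_imputation(fixed_sample) for _ in range(n_rep)]

# design (b): a fresh complete data set is drawn in every repetition
est_fresh = [mean_after_imputation(rng.normal(true_mean, 1.0, n)) for _ in range(n_rep)]

for name, est in [("fixed complete data", est_fixed), ("fresh sample each rep", est_fresh)]:
    est = np.asarray(est)
    print(f"{name:22s} mean estimate = {est.mean():.3f}   empirical SE = {est.std(ddof=1):.3f}")
```

With the fixed data set, the spread of the estimates reflects only the missingness and imputation steps, and any apparent bias is measured relative to that particular sample; only the fresh-sample design also captures sampling variability.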
Editing this Special Collection was extremely rewarding for us. Quite aside from the high quality of the submissions, we were heartened to see the biometrical community's interest in improving the quality of research comparing methods; it was of course a concern that we might receive no submissions! It is our hope that this Special Collection represents the start rather than the end of a conversation, and that readers find the articles as thought-provoking and practically useful as we have.