{"title":"A general, flexible, and harmonious framework to construct interpretable functions in regression analysis.","authors":"Tianyu Zhan, Jian Kang","doi":"10.1093/biomtc/ujaf014","DOIUrl":null,"url":null,"abstract":"<p><p>An interpretable model or method has several appealing features, such as reliability to adversarial examples, transparency of decision-making, and communication facilitator. However, interpretability is a subjective concept, and even its definition can be diverse. The same model may be deemed as interpretable by a study team, but regarded as a black-box algorithm by another squad. Simplicity, accuracy and generalizability are some additional important aspects of evaluating interpretability. In this work, we present a general, flexible and harmonious framework to construct interpretable functions in regression analysis with a focus on continuous outcomes. We formulate a functional skeleton in light of users' expectations of interpretability. A new measure based on Mallows's $C_p$-statistic is proposed for model selection to balance approximation, generalizability, and interpretability. We apply this approach to derive a sample size formula in adaptive clinical trial designs to demonstrate the general workflow, and to explain operating characteristics in a Bayesian Go/No-Go paradigm to show the potential advantages of using meaningful intermediate variables. Generalization to categorical outcomes is illustrated in an example of hypothesis testing based on Fisher's exact test. A real data analysis of NHANES (National Health and Nutrition Examination Survey) is conducted to investigate relationships between some important laboratory measurements. We also discuss some extensions of this method.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 1","pages":""},"PeriodicalIF":1.4000,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biometrics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/biomtc/ujaf014","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
An interpretable model or method has several appealing features, such as reliability to adversarial examples, transparency of decision-making, and communication facilitator. However, interpretability is a subjective concept, and even its definition can be diverse. The same model may be deemed as interpretable by a study team, but regarded as a black-box algorithm by another squad. Simplicity, accuracy and generalizability are some additional important aspects of evaluating interpretability. In this work, we present a general, flexible and harmonious framework to construct interpretable functions in regression analysis with a focus on continuous outcomes. We formulate a functional skeleton in light of users' expectations of interpretability. A new measure based on Mallows's $C_p$-statistic is proposed for model selection to balance approximation, generalizability, and interpretability. We apply this approach to derive a sample size formula in adaptive clinical trial designs to demonstrate the general workflow, and to explain operating characteristics in a Bayesian Go/No-Go paradigm to show the potential advantages of using meaningful intermediate variables. Generalization to categorical outcomes is illustrated in an example of hypothesis testing based on Fisher's exact test. A real data analysis of NHANES (National Health and Nutrition Examination Survey) is conducted to investigate relationships between some important laboratory measurements. We also discuss some extensions of this method.
一个可解释的模型或方法有几个吸引人的特征,比如对抗性例子的可靠性,决策的透明性,以及沟通促进者。然而,可解释性是一个主观的概念,甚至它的定义也可以是多种多样的。同一个模型可能被一个研究小组认为是可解释的,但被另一个小组认为是黑盒算法。简洁性、准确性和概括性是评估可解释性的另外一些重要方面。在这项工作中,我们提出了一个通用的、灵活的、和谐的框架来构建回归分析中的可解释函数,重点是连续的结果。我们根据用户对可解释性的期望制定了一个功能框架。提出了一种基于Mallows的$C_p$-统计量的模型选择度量,以平衡近似性、概括性和可解释性。我们运用这种方法推导出适应性临床试验设计中的样本量公式,以展示一般工作流程,并解释贝叶斯Go/No-Go范式的操作特征,以展示使用有意义的中间变量的潜在优势。一个基于费雪精确检验的假设检验的例子说明了对分类结果的泛化。通过对NHANES (National Health and Nutrition Examination Survey)的实际数据分析,探讨了一些重要的实验室测量之间的关系。我们还讨论了该方法的一些扩展。
期刊介绍:
The International Biometric Society is an international society promoting the development and application of statistical and mathematical theory and methods in the biosciences, including agriculture, biomedical science and public health, ecology, environmental sciences, forestry, and allied disciplines. The Society welcomes as members statisticians, mathematicians, biological scientists, and others devoted to interdisciplinary efforts in advancing the collection and interpretation of information in the biosciences. The Society sponsors the biennial International Biometric Conference, held in sites throughout the world; through its National Groups and Regions, it also Society sponsors regional and local meetings.