Do not group quantitative variables
Bendix Carstensen. Epidemiology with R, 2020. DOI: 10.1093/OSO/9780198841326.003.0010
This chapter explores the problems caused by categorizing quantitative variables (here termed continuous variables). Optimal decisions are made by applying a utility function to a predicted value. At the decision point, one can solve for the personalized cutpoint on predicted risk that optimizes the decision. Dichotomizing independent variables (predictors) is completely at odds with making optimal decisions: for a decision to be optimal, the cutpoint for any one predictor would necessarily be a function of the continuous values of all the other predictors. Moreover, categorization assumes that the relationship between the predictor and the response is flat within intervals; in most cases this assumption is far less reasonable than an assumption of linearity. Categorizing continuous variables at percentiles is particularly hazardous, because the cutpoints are then estimated from the sample at hand, carry sampling error, and differ from study to study. When categorization is used anyway, many intervals are required before a continuous predictor is modelled with reasonable accuracy.