{"title":"Survival Analysis and the EM Algorithm","authors":"B. Efron, T. Hastie","doi":"10.1017/CBO9781316576533.010","DOIUrl":"https://doi.org/10.1017/CBO9781316576533.010","url":null,"abstract":"Survival analysis had its roots in governmental and actuarial statistics, spanning centuries of use in assessing life expectencies, insurance rates, and annuities. In the 20 years between 1955 and 1975, survival analysis was adapted by statisticians for application to biomedical studies. Three of the most popular post-war statistical methodologies emerged during this period: the Kaplan–Meier estimates, the log-rank test,1 and Cox’s proportional hazards model, the succession showing increased computational demands along with increasingly sophisticated inferential justification. A connection with one of Fisher’s ideas on maximum likelihood estimation leads in the last section of this chapter to another statistical method “gone platinum”, the EM algorithm.","PeriodicalId":430973,"journal":{"name":"Computer Age Statistical Inference, Student Edition","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125937009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Random Forests and Boosting","authors":"B. Efron, T. Hastie","doi":"10.1017/CBO9781316576533.018","DOIUrl":"https://doi.org/10.1017/CBO9781316576533.018","url":null,"abstract":"","PeriodicalId":430973,"journal":{"name":"Computer Age Statistical Inference, Student Edition","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132960056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-Scale Hypothesis Testing and FDRs","authors":"B. Efron, T. Hastie","doi":"10.1017/CBO9781316576533.016","DOIUrl":"https://doi.org/10.1017/CBO9781316576533.016","url":null,"abstract":"","PeriodicalId":430973,"journal":{"name":"Computer Age Statistical Inference, Student Edition","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133524403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Jackknife and the Bootstrap","authors":"F. inJ, Xitao Fan, Lin Wang","doi":"10.1017/9781108914062.014","DOIUrl":"https://doi.org/10.1017/9781108914062.014","url":null,"abstract":"The jackknife and bootstrap methods are becoming more popular in research. Although the two approaches have similar goals and employ similar strategies, information is lacking with regard to the comparability of their results. This study systematically investigate,1 the issue for a canonical correlation analysis, using data from four random samples from the National Education Longitudinal Study of 1988. Some conspicuous discrepancies are observed mainly under small sample size conditions, and this raises some concern when researchers need to choose between the two for their small samples. Due to the lack of theoretical sampling distributions in canonical analysis, it is unclear which method had superior performance. It is suggested that Monte Carlo simulation is needed for this kind of comparison. It is also suggested that caution is warranted in generalizing the results to other statistical techniques, since the validity of such generalizations is uncertain. (Contains 6 tables and 18 references.) (Author/SLD)*********************************************************************** Reproductions supplied by EDRS are the best that can be made * from the original document. *********************************************************************** * U S DEPAKTMENT OF EDUCATION Ofl)ce of Educid)onsf ROSCIrCh and improvernni EDUCATIONAL RE SOURCES INFORMATION CENTER (ERIC) TP.s docurnem! Pis beer) reproduced as ,ece.ved from Ihe person or ordsrutstoon or.grnat.op .1 o Manor changes hve been made to approve reproduct)on dowdy Po)nts of rrew opostons slated .ri pus docu. map? do not neCessanly rety.senl ottrcrli OEM 1)03.1.01, or pohcy Jackknife and Bootstrap 1 PERMISSION TO REPRODUCE THIS MA TERIAL HAS BEEN GRANTED BY X 1_1/96 F inJ TO THE EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC) HOW COMPARABLE ARE THE JACKKNIFE AND BOOTSTRAP RESULTS: AN INVESTIGATION FOR A CASE OF CANONICAL CORRELATION ANALYSIS Xitao Fan Utah State University","PeriodicalId":430973,"journal":{"name":"Computer Age Statistical Inference, Student Edition","volume":"05 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127191417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse Modeling and the Lasso","authors":"B. Efron, T. Hastie","doi":"10.1017/CBO9781316576533.017","DOIUrl":"https://doi.org/10.1017/CBO9781316576533.017","url":null,"abstract":"The amount of data we are faced with keeps growing. From around the late 1990s we started to see wide data sets, where the number of variables far exceeds the number of observations. This was largely due to our increasing ability to measure a large amount of information automatically. In genomics, for example, we can use a high-throughput experiment to automatically measure the expression of tens of thousands of genes in a sample in a short amount of time. Similarly, sequencing equipment allows us to genotype millions of SNPs (single-nucleotide polymorphisms) cheaply and quickly. In document retrieval and modeling, we represent a document by the presence or count of each word in the dictionary. This easily leads to a feature vector with 20,000 components, one for each distinct vocabulary word, although most would be zero for a small document. If we move to bi-grams or higher, the feature space gets really large. In even more modest situations, we can be faced with hundreds of variables. If these variables are to be predictors in a regression or logistic regression model, we probably do not want to use them all. It is likely that a subset will do the job well, and including all the redundant variables will degrade our fit. Hence we are often interested in identifying a good subset of variables. Note also that in these wide-data situations, even linear models are over-parametrized, so some form of reduction or regularization is essential. In this chapter we will discuss some of the popular methods for model selection, starting with the time-tested and worthy forward-stepwise approach. We then look at the lasso, a popular modern method that does selection and shrinkage via convex optimization. The LARs algorithm ties these two approaches together, and leads to methods that can deliver paths of solutions. Finally, we discuss some connections with other modern big-and widedata approaches, and mention some extensions. Forward Stepwise Regression Stepwise procedures have been around for a very long time. They were originally devised in times when data sets were quite modest in size, in particular in terms of the number of variables. Originally thought of as the poor cousins of “best-subset” selection, they had the advantage of being much cheaper to compute (and in fact possible to compute for large p).We will review best-subset regression first.","PeriodicalId":430973,"journal":{"name":"Computer Age Statistical Inference, Student Edition","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117246099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parametric Models and Exponential Families","authors":"B. Efron, T. Hastie","doi":"10.1017/CBO9781316576533.006","DOIUrl":"https://doi.org/10.1017/CBO9781316576533.006","url":null,"abstract":"","PeriodicalId":430973,"journal":{"name":"Computer Age Statistical Inference, Student Edition","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122693756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}