{"title":"Parametric estimation and inference","authors":"M. Edge","doi":"10.1093/oso/9780198827627.003.0011","DOIUrl":"https://doi.org/10.1093/oso/9780198827627.003.0011","url":null,"abstract":"If it is reasonable to assume that the data are generated by a fully parametric model, then maximum-likelihood approaches to estimation and inference have many appealing properties. Maximum-likelihood estimators are obtained by identifying parameters that maximize the likelihood function, which can be done using calculus or using numerical approaches. Such estimators are consistent, and if the costs of errors in estimation are described by a squared-error loss function, then they are also efficient compared with their consistent competitors. The sampling variance of a maximum-likelihood estimate can be estimated in various ways. As always, one possibility is the bootstrap. In many models, the variance of the maximum-likelihood estimator can be derived directly once its form is known. A third approach is to rely on general properties of maximum-likelihood estimators and use the Fisher information. Similarly, there are many ways to test hypotheses about parameters estimated by maximum likelihood. This chapter discusses the Wald test, which relies on the fact that the sampling distribution of maximum-likelihood estimators is normal in large samples, and the likelihood-ratio test, which is a general approach for testing hypotheses relating nested pairs of models.","PeriodicalId":192186,"journal":{"name":"Statistical Thinking from Scratch","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126641844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semiparametric estimation and inference","authors":"M. Edge","doi":"10.1093/oso/9780198827627.003.0010","DOIUrl":"https://doi.org/10.1093/oso/9780198827627.003.0010","url":null,"abstract":"Nonparametric and semiparametric statistical methods assume models whose properties cannot be described by a finite number of parameters. For example, a linear regression model that assumes that the disturbances are independent draws from an unknown distribution is semiparametric—it includes the intercept and slope as regression parameters but has a nonparametric part, the unknown distribution of the disturbances. Nonparametric and semiparametric methods focus on the empirical distribution function, which, assuming that the data are really independent observations from the same distribution, is a consistent estimator of the true cumulative distribution function. In this chapter, with plug-in estimation and the method of moments, functionals or parameters are estimated by treating the empirical distribution function as if it were the true cumulative distribution function. Such estimators are consistent. To understand the variation of point estimates, bootstrapping is used to resample from the empirical distribution function. For hypothesis testing, one can either use a bootstrap-based confidence interval or conduct a permutation test, which can be designed to test null hypotheses of independence or exchangeability. Resampling methods—including bootstrapping and permutation testing—are flexible and easy to implement with a little programming expertise.","PeriodicalId":192186,"journal":{"name":"Statistical Thinking from Scratch","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131414637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Encountering data","authors":"M. Edge","doi":"10.1093/oso/9780198827627.003.0002","DOIUrl":"https://doi.org/10.1093/oso/9780198827627.003.0002","url":null,"abstract":"Statistics is concerned with using data to learn about the world. In this book, concepts for reasoning from data are developed using a combination of math and simulation. Using a running example, we will consider probability theory, statistical estimation, and statistical inference. Estimation and inference will be considered from three different perspectives.","PeriodicalId":192186,"journal":{"name":"Statistical Thinking from Scratch","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115111230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"R and exploratory data analysis","authors":"M. Edge","doi":"10.1093/oso/9780198827627.003.0003","DOIUrl":"https://doi.org/10.1093/oso/9780198827627.003.0003","url":null,"abstract":"R is a powerful, free software package for performing statistical tasks. It will be used to simulate data, analyze data, and make data displays. More details about R are given in Appendix B.","PeriodicalId":192186,"journal":{"name":"Statistical Thinking from Scratch","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116562190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Postlude: models and data","authors":"M. Edge","doi":"10.1093/oso/9780198827627.003.0013","DOIUrl":"https://doi.org/10.1093/oso/9780198827627.003.0013","url":null,"abstract":"Becoming a well-rounded data analyst requires more than the skills covered in this book. This postlude sketches some ways in which the types of thinking covered here can be extended to real problems in data analysis. Different ways of evaluating the assumptions of linear regression are considered, including plotting, hypothesis tests, and out-of-sample prediction. If the assumptions are not met, simple linear regression can be extended in various ways, including multiple regression, generalized linear models, and mixed models (among many other possibilities). This postlude concludes with a short discussion of the themes of the book: probabilistic models, methodological pluralism, and the value of elementary statistical thinking.","PeriodicalId":192186,"journal":{"name":"Statistical Thinking from Scratch","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115409945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prelude","authors":"M. Edge","doi":"10.1093/oso/9780198827627.003.0001","DOIUrl":"https://doi.org/10.1093/oso/9780198827627.003.0001","url":null,"abstract":"There are two traditional ways to learn statistics. One way is to pass over the mathematical underpinnings and focus on developing relatively shallow knowledge about a wide variety of statistical procedures. Another is to spend years learning the mathematics necessary for traditional mathematical approaches to statistics. For many people who need to analyze data, neither of these paths is sufficient. The shallow-but-wide approach fails to provide students with the foundation that allows for confidence and creativity in analyzing modern datasets, and many researchers—though possibly motivated to learn math—do not have the background to start immediately on a traditional mathematical approach. This book exists to help researchers jump between tracks, providing motivated students whose knowledge of mathematics may be incomplete or rusty with a serious introduction to statistics that allows further study from more mathematical sources. This is done by focusing on a single statistical technique that is fundamental to statistical practice—simple linear regression—and supplementing the exposition with ample simulations conducted in the statistical programming language R. The first half of the book focuses on preliminaries, including the use of R and probability theory, whereas the second half covers statistical estimation and inference from semiparametric, parametric, and Bayesian perspectives.","PeriodicalId":192186,"journal":{"name":"Statistical Thinking from Scratch","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129904577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Properties of random variables","authors":"M. Edge","doi":"10.1093/oso/9780198827627.003.0006","DOIUrl":"https://doi.org/10.1093/oso/9780198827627.003.0006","url":null,"abstract":"In this chapter, the behavior of random variables is summarized using the concepts of expectation, variance, and covariance. The expectation is a measurement of the location of a random variable’s distribution. The variance and its square root, the standard deviation, are measurements of the spread of a random variable’s distribution. Covariance and correlation are measurements of the extent of linear relationship between two random variables. The chapter also describes two important theorems that describe the distribution of means of samples from a distribution. As the sample size becomes larger, the distribution of the sample mean becomes bunched more tightly around the expectation—this is the law of large numbers—and the distribution of the sample mean approaches the shape of a normal distribution—this is the central limit theorem. Finally, a model describing a linear relationship between two random variables is considered, and the properties of those two random variables are analyzed.","PeriodicalId":192186,"journal":{"name":"Statistical Thinking from Scratch","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115874137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Properties of point estimators","authors":"M. Edge","doi":"10.1093/oso/9780198827627.003.0008","DOIUrl":"https://doi.org/10.1093/oso/9780198827627.003.0008","url":null,"abstract":"Point estimation is the attempt to identify a value associated with some underlying process or population using data. The unknown number that is the target of estimation is called an estimand. An estimator is a function that takes in data and produces an estimate. In this chapter, estimators are evaluated according to a number of criteria. An unbiased estimator is one whose expected value is equal to the estimand—in lay terms, it is accurate. Low-variance estimators, which are precise, are also evaluated. Consistent estimators converge to the estimand as the number of data collected approaches infinity. Mean squared error is the expected squared difference between the estimator and the estimand. Efficient estimators are those that converge to the estimand relatively quickly—i.e., fewer data are necessary to get close to the right answer. An optional section discusses statistical decision theory, which is a general framework for evaluating estimators. Finally, some ideas of robustness are discussed. A robust estimator is one that can still provide useful information even if the model is not quite right or the data are contaminated.","PeriodicalId":192186,"journal":{"name":"Statistical Thinking from Scratch","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121397401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interval estimation and inference","authors":"M. Edge","doi":"10.1093/oso/9780198827627.003.0009","DOIUrl":"https://doi.org/10.1093/oso/9780198827627.003.0009","url":null,"abstract":"Interval estimation is the attempt to define intervals that quantify the degree of uncertainty in an estimate. The standard deviation of an estimate is called a standard error. Confidence intervals are designed to cover the true value of an estimand with a specified probability. Hypothesis testing is the attempt to assess the degree of evidence for or against a specific hypothesis. One tool for frequentist hypothesis testing is the p value, or the probability that if the null hypothesis is in fact true, the data would depart as extremely or more extremely from expectations under the null hypothesis than they were observed to do. In Neyman–Pearson hypothesis testing, the null hypothesis is rejected if p is less than a pre-specified value, often chosen to be 0.05. A test’s power function gives the probability that the null hypothesis is rejected given the significance level γ, a sample size n, and a specified alternative hypothesis. This chapter discusses some limitations of hypothesis testing as commonly practiced in the research literature.","PeriodicalId":192186,"journal":{"name":"Statistical Thinking from Scratch","volume":"147 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124716415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The line of best fit","authors":"M. Edge","doi":"10.1093/oso/9780198827627.003.0004","DOIUrl":"https://doi.org/10.1093/oso/9780198827627.003.0004","url":null,"abstract":"One way to visualize a set of data on two variables is to plot them on a pair of axes. A line that “best fits” the data can then be drawn as a summary. This chapter considers how to define a line of “best” fit—there is no single best choice. The most commonly chosen line to summarize the data is the “least-squares” line—the line that minimizes the sum of the squared vertical distances between the points and the line. One reason for the least-squares line’s popularity is convenience, but, as will be seen later, it is also related to some key ideas in statistical estimation. The derivations of expressions for the intercept and slope of the least-squares line are discussed.","PeriodicalId":192186,"journal":{"name":"Statistical Thinking from Scratch","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128127419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}