Sampling

P. E. Pfeifer
DOI: https://doi.org/10.1201/b16842-6
Journal: EduRN: Other Social Sciences Education (Topic)

Abstract

This note explains the basics of sampling. It defines and discusses the concepts of random sampling, the law of averages, and the central limit theorem. It covers the sampling of both continuous uncertain quantities (where the sample is summarized by the sample average and sample standard deviation) and categorical variables (where the sample is summarized by the sample proportion). The note carefully explains that the results of random sampling from an infinite population are equivalent to repeated and independent outcomes of an underlying probability distribution.

Excerpt

UVA-QA-0513
Rev. Mar. 14, 2017

SAMPLING

The word sampling probably brings to mind a large collection of items from which a small number of items will be selected and measured. We inspect units from yesterday's production and grade their quality. We poll potential voters in an upcoming election and find out how they plan to vote. We capture fish from a lake and measure their length. We study a subset of companies in an industry and summarize their financial performance. We survey customers from our universe of customers and monitor their satisfaction.

In the language of sampling, the large collection of items is called the population and the smaller number of items actually selected and measured is called the sample. Because the number of items in the population can be very large, and the costs of sampling nontrivial, a complete sampling of the population (a census) is usually not economical. The challenges become how to select a useful sample and how to interpret and use the information contained in the sample, recognizing that it provides an imperfect picture of the population.

This note explains how samples behave so that we can accurately interpret the results of a sample. Our interpretation of a sample begins with an understanding of the method used to collect the sample. For the sample to reflect the population from which it was drawn, the sample must be chosen in a certain way.

The most common method for collecting a sample that will accurately reflect the population is called random sampling. A random sample is one in which each item in the population has an equal chance of being included in the sample. For yesterday's production, randomness requires that we take our sample at randomly chosen times throughout the day. For the fish in the lake example, it will be very difficult to collect a random sample unless every size of fish is equally likely to be caught (a highly unlikely assumption). If the sampling is not done randomly, it is difficult if not impossible to interpret the sample results. If large fish are wiser and less likely to be caught, the fish we catch will not be a random sample of the population of fish. The lengths of the fish in our sample will thus be biased: The average length of the fish in the sample will tend to understate the average length of the fish in the lake.

In addition to samples that were collected randomly, this note will consider samples collected from very large or infinite populations. As long as the size of the sample is small relative to the size of the population, the size of the population is irrelevant to interpreting the sample results. Only if the sampling is accomplished without replacement and the sample size accounts for a notable portion of the population will the size of the population affect the interpretation of the sample results. . . .
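
The fish example can be illustrated with a small simulation. The sketch below is not taken from the note. It assumes a hypothetical lake of 10,000 fish whose lengths are roughly normal (mean 30 cm, standard deviation 8 cm), and it assumes, purely for illustration, that a fish's chance of being caught is proportional to 1/length, so that larger fish are caught less often. The function names `make_lake`, `simple_random_sample`, and `biased_catch` are invented for this example.

```python
import random
import statistics


def make_lake(n_fish=10_000, seed=0):
    """Build a hypothetical population of fish lengths in cm (assumed values)."""
    rng = random.Random(seed)
    # Clip at 5 cm so that every length is positive.
    return [max(5.0, rng.gauss(30.0, 8.0)) for _ in range(n_fish)]


def simple_random_sample(population, n, rng):
    """Every fish has the same chance of being included (without replacement)."""
    return rng.sample(population, n)


def biased_catch(population, n, rng):
    """Assumed catch model: the chance of being caught is proportional to 1/length."""
    weights = [1.0 / length for length in population]
    # random.choices draws with replacement, which matters little here
    # because the sample is small relative to the population.
    return rng.choices(population, weights=weights, k=n)


lake = make_lake()
rng = random.Random(1)
true_mean = statistics.mean(lake)

# Repeat each sampling scheme 200 times with samples of 50 fish.
srs_means = [
    statistics.mean(simple_random_sample(lake, 50, rng)) for _ in range(200)
]
biased_means = [
    statistics.mean(biased_catch(lake, 50, rng)) for _ in range(200)
]

print(f"population mean length:       {true_mean:.2f} cm")
print(f"average of random-sample means: {statistics.mean(srs_means):.2f} cm")
print(f"average of biased-catch means:  {statistics.mean(biased_means):.2f} cm")
```

Under these assumptions, the averages from the simple random samples cluster around the population mean, while the averages from the biased catch sit below it, which is the understatement the excerpt describes. The size of the gap depends entirely on the assumed catch model.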