Sampling error is the difference between a sample statistic(the mean, variance, or standard deviation of the sample) and its corresponding population parameter(the true mean, variance, or standard deviation of the population).app
For example, the sampling error for the mean is as follows:less
sampling errors of the mean = sample mean - population mean = x-μ
Simple random sampling is a method of selecting a sample in such a way that each item or person in the population being studied has the same likelihood of being included in the sample.dom
Stratified random sampling uses a classification system to separate the population into smaller groups based on one or more distinguishing characteristics. From each subgroup, or stratum, a random sample is taken and the results are pooled. The size of the samples from each stratum is based on the size of the stratum relative to the population.ide
It is important to recognize that the sample statistic itself is a random variable and, therefore, has a probability distribution. The sampling distribution of the sample statistic is a probability distribution of all possible sample statistics computed from a set of equal-size samples that were randomly drawn from the same population.ui
The central limit theorem states that for simple random samples of size n from a population with a mean μ and a finite variance σ^2, the sampling distribution of the sample mean x approaches a normal probability distribution with mean μ and a variance equal to (σ^2/n) as the sample size becomes large.this
The central limit theorem is extremely useful because the normal distribution is relatively easy to apply to hypothesis testing and to the construction of confidence intervals. Specific inferences about the population mean can be made from the sample mean, regardless of the population's distribution, as long as the sample size if "sufficiently large" which usually means n>=30.code
Important properties of the central limit theorem include the following:orm
The standard error of the sample mean is the standard deviation of the distribution of the sample means.blog
When the standard deviation of the population, σ, is known,, the standard error of the sample mean is calculated as σ/(n^0.5)ci
Practically speaking, the population's standard deviation is almost never known. Instead, the standard error of the sample mean must be estimated by dividing the standard deviation of the sample mean by n^0.5 - s/(n^0.5)
Point estimates are single (sample) values used to estimate population parameters. The formula used to compute the point estimate is called the estimator. For example, the sample mean, x, is an estimator of the population mean μ and is computed as (∑X)/n
The value generated with this calculation for a given sample is called the point estimate of the mean.
A confidence interval is a range of values in which the population parameter is expected to lie.
Student's t-distribution, or simply the t-distribution, is a bell-shaped probability distribution that is symmetrical about its mean. It is the appropriate distribution to use when constructing confidence intervals based on small samples (n<30) from populations with unknown variance and a normal, or approximately normal, distribution.
It may also be appropriate to use the t-distribution when the population variance is unknown and the sample size is large enough that the central limit theorem assure that the sampling distribution is approximately normal.
t-distribution has the following properties:
The degree of freedom for tests based on sample means are n-1 because, given the mean, only n-1 observations can be unique.
As the number of degrees of freedom increases without bound, the t-distribution converges to the standard normal distribution (z-distribution). The thickness of the tails relative to those of the z-distribution is important in hypothesis testing because thicker tails mean more observations away from the center of the distribution (more outliers). Hence, hypothesis testing using the t-distribution makes it more difficult to reject null relative to hypothesis testing using the z-distribution.
Confidence interval estimates result in a range of values within which the actual value of a parameter will lie, given the probability of 1-α. Here, alpha, α, is called the level of significance for the confidence interval, and the probability 1-α is referred to as the degree of confidence.
Confidence intervals are usually constructed by adding or subtracting an appropriate value from the point estimate. In general, confidence intervals take on the following form:
point estimate +/- (reliability factor * standard error)
Unlike the standard normal distribution, the reliability factors for the t-distribution depend on the sample size, so we cannot rely on a commonly used set of reliability factors. Instead, reliability factors for the t-distribution have to be looked up in a table of Student's t-distribution.
Owing to the relatively fatter tails of the t-distribution, confidence intervals constructed using t-reliability factors will be more conservative(wider) than those constructed using z-reliability factors.
We can do this because the central limit theorem assures us that the distribution of the sample mean is approximately normal when the sample is large.
It is also acceptable to use the z-statistic, although use of the t-statistic is more conservative.
So, all else equal, make sure you have a sample of at least 30, and the larger, the better.