Hypothesis testing is the statistical assessment of a statement or idea regarding a population.
A hypothesis is a statement about the value of a population parameter, developed for the purpose of testing a theory or belief. Hypotheses are stated in terms of the population parameters to be tested, like the population mean, μ.
Hypothesis testing procedures, based on sample statistics and probability theory, are used to determine whether a hypothesis is a reasonable statement and should not be rejected, or an unreasonable statement and should be rejected.
The null hypothesis (H0) is the hypothesis that the researcher wants to reject. It is the hypothesis that is actually tested and is the basis for the selection of the test statistic.
The null is generally stated as a simple statement about a population parameter.
The null hypothesis always includes the "equal to" condition.
The alternative hypothesis (Ha) is what is concluded if there is sufficient evidence to reject the null hypothesis. It is usually the alternative hypothesis that you are really trying to assess. Why? Since you can never really prove anything with statistics, when the null hypothesis is discredited, the implication is that the alternative hypothesis is valid.
Null hypothesis: a model of the system based on the assumption that the observed effect is due to chance, that is, that it is not statistically significant.
A two-tailed test for the population mean may be structured as:
    H0: μ = μ0 versus Ha: μ ≠ μ0
Since the alternative hypothesis allows for values above and below the hypothesized parameter, a two-tailed test uses two critical values (or rejection points).
The general decision rule for a two-tailed test is:
    Reject H0 if: test statistic > upper critical value, or test statistic < lower critical value
For a one-tailed test of the population mean, the null and alternative hypotheses are either:
    Upper tail: H0: μ <= μ0 versus Ha: μ > μ0, or
    Lower tail: H0: μ >= μ0 versus Ha: μ < μ0
The appropriate set of hypotheses depends on whether we believe the population mean, μ, to be greater than (upper tail) or less than (lower tail) the hypothesized value, μ0.
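The decision rules for two-tailed and one-tailed tests can be sketched in Python. This is a minimal illustration that assumes the test statistic follows the standard normal distribution; the function name reject_null and its parameters are hypothetical and do not come from the original notes.

```python
from statistics import NormalDist


def reject_null(test_stat, alpha=0.05, tail="two"):
    """Return True if H0 is rejected, assuming a z-distributed test statistic."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "two":
        # Split alpha across both tails: two critical values, -crit and +crit.
        crit = z.inv_cdf(1 - alpha / 2)
        return test_stat > crit or test_stat < -crit
    if tail == "upper":
        # Ha: mu > mu0, so the whole rejection region is in the right tail.
        return test_stat > z.inv_cdf(1 - alpha)
    if tail == "lower":
        # Ha: mu < mu0, so the whole rejection region is in the left tail.
        return test_stat < z.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


print(reject_null(2.10, tail="two"))    # True: 2.10 exceeds about 1.96
print(reject_null(1.80, tail="two"))    # False: 1.80 is inside +/- 1.96
print(reject_null(1.80, tail="upper"))  # True: 1.80 exceeds about 1.645
```

The same test statistic (1.80) is rejected in the upper-tail test but not in the two-tailed test, because the one-tailed test places all of α in a single tail.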
The most common null hypothesis will be an "equal to" hypothesis. Combined with a "not equal to" alternative, this will require a two-tailed test. The alternative is often the hoped-for hypothesis. When the null is that a coefficient is equal to zero, we hope to reject it and show the significance of the relationship.
When the null is less than or equal to, the (mutually exclusive) alternative is framed as greater than, and a one-tailed test is appropriate. If we are trying to demonstrate that a return is greater than the risk-free rate, this would be the correct formulation. We will have set up the null and alternative hypotheses so that rejection of the null will lead to acceptance of the alternative, our goal in performing the test.
Hypothesis testing involves two statistics: the test statistic calculated from the sample data and the critical value of the test statistic. Comparing the computed test statistic with the critical value is a key step in assessing the validity of a hypothesis.
A test statistic is calculated by comparing the point estimate of the population parameter with the hypothesized value of the parameter (i.e., the value specified in the null hypothesis).
The test statistic is the difference between the sample statistic and the hypothesized value, scaled by the standard error of the sample statistic.
The standard error of the sample statistic is the adjusted standard deviation of the sample. When the sample statistic is the sample mean, x̄, the standard error of the sample statistic for sample size n is calculated as:
    σ_x̄ = σ / √n
when the population standard deviation, σ, is known, or
    s_x̄ = s / √n
when the population standard deviation, σ, is not known. In this case, it is estimated using the standard deviation of the sample, s.
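As a rough sketch of the calculation described above, the following Python computes the standard error of the sample mean and the resulting test statistic. The function names and the sample data are hypothetical, chosen only for illustration.

```python
import math
from statistics import mean, stdev


def standard_error(std_dev, n):
    """Standard error of the sample mean: standard deviation scaled by sqrt(n)."""
    return std_dev / math.sqrt(n)


def compute_test_statistic(sample, mu0, sigma=None):
    """(sample mean - hypothesized mean) / standard error.

    If sigma (the population standard deviation) is not supplied,
    it is estimated with the sample standard deviation s.
    """
    n = len(sample)
    sd = sigma if sigma is not None else stdev(sample)
    return (mean(sample) - mu0) / standard_error(sd, n)


# Hypothetical sample; H0: mu = 2.0
sample = [2.1, 1.9, 2.4, 2.2, 1.8, 2.3, 2.0, 2.5, 2.2]
print(compute_test_statistic(sample, mu0=2.0))             # sigma unknown, uses s
print(compute_test_statistic(sample, mu0=2.0, sigma=0.3))  # sigma assumed known
```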
A test statistic is a random variable that may follow one of several distributions, depending on the characteristics of the sample and the population.
The significance level is the probability of making a Type I error (rejecting the null when it is true) and is designated by the Greek letter alpha (α). For instance, a significance level of 5% (α = 0.05) means there is a 5% chance of rejecting a true null hypothesis. When conducting a hypothesis test, a significance level must be specified in order to identify the critical values needed to evaluate the test statistic.
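Type I error: also called a false positive. We accept a hypothesis (here, the hypothesis that the effect is real) that is actually false. That is, we consider an effect statistically significant when it was actually produced by chance.
Type II error: also called a false negative. We reject a hypothesis (again, that the effect is real) that is actually true. That is, we attribute an effect to chance when it actually exists.
The most common approach to hypothesis testing is to choose a threshold (α) for the p-value; whenever the p-value is below this threshold, we reject the null hypothesis. A common choice for the threshold (α) is 5%.
For this kind of hypothesis test, we can compute the probability of a false positive exactly: it is simply α.
To see why, recall the definitions of a false positive and of the p-value: a false positive means accepting a hypothesis that does not hold, and the p-value is the probability of seeing the measured effect when that hypothesis does not hold (that is, when the null hypothesis is true).
Putting the two together, the question is: if we choose α as the significance threshold, what is the probability of seeing an effect that passes the threshold when the hypothesis does not hold? The answer is α.
We can control false positives by lowering the threshold. For example, if we set the threshold to 1%, the probability of a false positive is 1%.
But reducing false positives has a cost. Lowering the threshold raises the standard of evidence for concluding that an effect is real, so we become more likely to reject a valid hypothesis; that is, we are more likely to fail to reject the null hypothesis.
In general, there is a tradeoff between Type I and Type II errors. The only way to reduce both at the same time is to increase the sample size (or, in some cases, to reduce measurement error).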
While the significance level of a test is the probability of rejecting the null hypothesis when it is true, the power of a test is the probability of correctly rejecting the null hypothesis when it is false. The power of a test is one minus the probability of making a Type II error (1 - P(Type II error)). In other words, the probability of rejecting the null when it is false (power of the test) equals one minus the probability of not rejecting the null when it is false (Type II error). When more than one test statistic may be used, the power of the test for the competing test statistics may be useful in deciding which test statistic to use. Ordinarily, we wish to use the test statistic that provides the most powerful test among all possible tests.
Sample size and the choice of significance level (Type I error probability) will together determine the probability of a Type II error. The relation is not simple, however, and calculating the probability of a Type II error in practice is quite difficult. Decreasing the significance level (probability of a Type I error) from 5% to 1%, for example, will increase the probability of failing to reject a false null (Type II error) and therefore reduce the power of the test. Conversely, for a given sample size, we can increase the power of a test only at the cost of increasing the probability of rejecting a true null (Type I error). For a given significance level, we can decrease the probability of a Type II error and increase the power of a test only by increasing the sample size.
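The tradeoff between Type I error, Type II error, and sample size can be illustrated with a small simulation. This is only a sketch under stated assumptions: normally distributed data with known σ = 1, a two-tailed z-test of H0: μ = 0, and a hypothetical true mean of 0.5 when the null is false. The function rejection_rate and all parameter values are made up for illustration.

```python
import math
import random
from statistics import NormalDist


def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=30, alpha=0.05,
                   trials=2000, seed=0):
    """Fraction of simulated samples in which a two-tailed z-test rejects H0: mu = mu0."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / se
        if abs(z) > crit:
            rejections += 1
    return rejections / trials


# H0 is true: the rejection rate estimates the Type I error rate (close to alpha).
type_i = rejection_rate(true_mu=0.0)
# H0 is false: the rejection rate estimates the power of the test.
power = rejection_rate(true_mu=0.5)
# Lowering alpha from 5% to 1% lowers the power for the same sample size.
power_strict = rejection_rate(true_mu=0.5, alpha=0.01)
# Increasing the sample size raises the power at the stricter alpha.
power_big_n = rejection_rate(true_mu=0.5, alpha=0.01, n=100)

print(type_i, power, power_strict, power_big_n)
```

The estimated probability of a Type II error in each scenario is one minus the corresponding rejection rate.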
For two-tailed tests, the p-value is the probability that lies above the positive value of the computed test statistic plus the probability that lies below the negative value of the computed test statistic.
p-value: the probability of the apparent effect under the null hypothesis.
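A minimal sketch of the p-value calculation follows, assuming the test statistic follows the standard normal distribution. The function name p_value and its tail argument are hypothetical.

```python
from statistics import NormalDist


def p_value(z_stat, tail="two"):
    """p-value of a z-distributed test statistic."""
    z = NormalDist()
    if tail == "two":
        # Area above +|z| plus area below -|z|; the two areas are equal by symmetry.
        return 2 * (1 - z.cdf(abs(z_stat)))
    if tail == "upper":
        return 1 - z.cdf(z_stat)
    if tail == "lower":
        return z.cdf(z_stat)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


print(round(p_value(1.96), 3))            # about 0.05
print(round(p_value(1.96, "upper"), 3))   # about 0.025
```

With a threshold of α = 0.05, the null is rejected whenever the returned p-value is below 0.05.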
The decision for a hypothesis test is to either reject the null hypothesis or fail to reject the null hypothesis. Note that it is statistically incorrect to say "accept" the null hypothesis; it can only be supported or rejected.
The decision rule for rejecting or failing to reject the null hypothesis is based on the distribution of the test statistic. For example, if the test statistic follows a normal distribution, the decision rule is based on critical values determined from the standard normal distribution (z-distribution). Regardless of the appropriate distribution, it must be determined whether a one-tailed or two-tailed hypothesis test is appropriate before a decision rule (rejection rule) can be determined.
A decision rule is specific and quantitative. Once we have determined whether a one- or two-tailed test is appropriate, the significance level we require, and the distribution of the test statistic, we can calculate the exact critical value for the test statistic. Then we have a decision rule of the following form: if the test statistic is (greater than, less than) the value X, reject the null.
A confidence interval is a range of values within which the researcher believes the true population parameter may lie.
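A confidence interval is determined as:
    sample statistic - (critical value)(standard error) <= population parameter <= sample statistic + (critical value)(standard error)
The interpretation of a confidence interval is that, for a level of confidence of 95%, for example, intervals constructed this way from repeated samples will contain the true population parameter 95% of the time.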
From the previous expression, we see that a confidence interval and a hypothesis test are linked by the critical value. For example, a 95% confidence interval uses a critical value associated with a given distribution at the 5% level of significance. Similarly, a hypothesis test would compare a test statistic to a critical value at the 5% level of significance. To see this relationship more clearly, the expression for the confidence interval can be manipulated and restated as:
-critical value <= test statistic <= +critical value
This is the range within which we fail to reject the null for a two-tailed hypothesis test at a given level of significance.
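The link between a confidence interval and a two-tailed test can be checked numerically. The sketch below assumes a z-based interval for the mean with known σ; the function names and the numbers (sample mean 10.4, σ = 2.0, n = 64) are hypothetical.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Sample mean plus or minus (critical value) x (standard error)."""
    crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = crit * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


def fails_to_reject(sample_mean, mu0, sigma, n, alpha=0.05):
    """True when -critical value <= test statistic <= +critical value."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return -crit <= z <= crit


low, high = z_confidence_interval(10.4, sigma=2.0, n=64)
print(round(low, 2), round(high, 2))

# A hypothesized mean inside the 95% interval is not rejected at the 5% level,
# and one outside the interval is rejected.
for mu0 in (10.0, 10.8, 11.0):
    print(mu0, low <= mu0 <= high, fails_to_reject(10.4, mu0, 2.0, 64))
```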
See "Degree of Freedom in Statistics"
In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.
The number of independent ways in which a dynamic system can move without violating any constraint imposed on it is called the degrees of freedom. In other words, the degrees of freedom can be defined as the minimum number of independent coordinates that can specify the position of the system completely.
Estimates of statistical parameters can be based upon different amounts of information or data. The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom. In general, the degrees of freedom of an estimate of a parameter are equal to the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself (e.g., the sample variance has N-1 degrees of freedom, since it is computed from N random scores minus the one parameter estimated as an intermediate step, the sample mean).
See here for another example.
Consider, for example, the statistic S^2.
To calculate the S^2 of a random sample, we must first calculate the mean of that sample and then compute the sum of the several squared deviations from that mean. While there will be n such squared deviations, only (n-1) of them are, in fact, free to assume any value whatsoever. This is because the final squared deviation from the mean must include the one value of X such that the sum of all the Xs divided by n will equal the obtained mean of the sample. All of the other (n-1) squared deviations from the mean can, theoretically, have any values whatsoever. For these reasons, the statistic S^2 is said to have only (n-1) degrees of freedom.
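In plain terms: once the sample mean is fixed, only N-1 of the N values can take arbitrary values, and the remaining value is determined by the other N-1 (because the sample mean must come out as given). So the degrees of freedom can only be N-1.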
From Baidu Zhidao
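Degrees of freedom (df): in mathematics, the number of variables that can take values freely. For example, with three variables x, y, and z subject to x+y+z=18, the degrees of freedom equal 2. In statistics, the degrees of freedom are the number of values that are unrestricted when a statistic is computed. Usually df = n-k, where n is the sample size and k is the number of constraints, or the number of other independent statistics used in computing the statistic. Degrees of freedom are typically used with sampling distributions.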
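The idea that only n-1 deviations are free can be checked with a short Python sketch. The four sample values are hypothetical; the point is that the deviations from the sample mean always sum to zero, so the last one is fixed by the others.

```python
from statistics import mean, variance

values = [4.0, 7.0, 1.0, 8.0]            # hypothetical sample, n = 4
m = mean(values)                          # sample mean
deviations = [x - m for x in values]

# The deviations sum to zero, so the last deviation is determined
# by the first n-1 of them: it is not free to vary.
last_from_others = -sum(deviations[:-1])
print(last_from_others, deviations[-1])

# The sample variance therefore divides by n-1, not n.
s2 = sum(d * d for d in deviations) / (len(values) - 1)
print(s2, variance(values))               # statistics.variance also uses n-1
```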
When hypothesis testing, the choice between using a critical value based on the t-distribution or the z-distribution depends on sample size, distribution of the population, and whether or not the variance of the population is known.
The t-test is a widely used hypothesis test that employs a test statistic that is distributed according to a t-distribution.
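Use the t-test if the population variance is unknown and either of the following conditions exists: the sample is large, or the sample is small but the distribution of the population is normal or approximately normal.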
If the sample is small and the distribution is non-normal, we have no reliable statistical test.
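The computed value for the test statistic based on the t-distribution is referred to as the t-statistic. For hypothesis tests of a population mean, a t-statistic with n-1 degrees of freedom is computed as:
    t(n-1) = (x̄ - μ0) / (s / √n)
where x̄ is the sample mean, μ0 is the hypothesized population mean (i.e., the null), s is the standard deviation of the sample, and n is the sample size.
To conduct a t-test, the t-statistic is compared to a critical t-value at the desired level of significance with the appropriate degrees of freedom.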
In the real world, the underlying variance of the population is rarely known, so the t-test enjoys widespread application.
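A t-test of a population mean can be sketched as follows. The sample is hypothetical, and because the Python standard library has no t-distribution, the critical value is assumed to be read from a t-table (2.262 is the usual two-tailed 5% value for 9 degrees of freedom).

```python
import math
from statistics import mean, stdev


def t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for H0: population mean = mu0."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)  # s / sqrt(n)
    return (mean(sample) - mu0) / standard_error, n - 1


# Hypothetical sample of 10 observations; H0: mu = 1.0
sample = [1.2, 0.8, 1.5, 0.9, 1.1, 1.4, 0.7, 1.3, 1.0, 1.6]
t, df = t_statistic(sample, mu0=1.0)

critical_t = 2.262  # assumed: two-tailed, 5% significance, df = 9, from a t-table
reject = abs(t) > critical_t
print(t, df, reject)
```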
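The z-test is the appropriate hypothesis test of the population mean when the population is normally distributed with known variance. The computed test statistic used with the z-test is referred to as the z-statistic. The z-statistic for a hypothesis test for a population mean is computed as follows:
    z-statistic = (x̄ - μ0) / (σ / √n)
where x̄ is the sample mean, μ0 is the hypothesized population mean, σ is the standard deviation of the population, and n is the sample size.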
To test a hypothesis, the z-statistic is compared to the critical z-value corresponding to the significance of the test.
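Critical z-values for the most common levels of significance are:
    Level of significance    Two-tailed test    One-tailed test
    0.10 = 10%               ±1.65              +1.28 or -1.28
    0.05 = 5%                ±1.96              +1.65 or -1.65
    0.01 = 1%                ±2.58              +2.33 or -2.33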
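When the sample size is large and the population variance is unknown, the z-statistic is:
    z-statistic = (x̄ - μ0) / (s / √n)
where x̄ is the sample mean, μ0 is the hypothesized population mean, s is the standard deviation of the sample, and n is the sample size.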
This is acceptable if the sample size is large, although the t-statistic is the more conservative measure when the population variance is unknown.
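The z-statistic and the critical z-values can be computed with the standard library, as in this sketch. The function names and the example numbers (sample mean 5.3, hypothesized mean 5.0, standard deviation 1.2, n = 144) are hypothetical.

```python
import math
from statistics import NormalDist


def z_statistic(sample_mean, mu0, std_dev, n):
    """std_dev is sigma if known, or the sample s when n is large."""
    return (sample_mean - mu0) / (std_dev / math.sqrt(n))


def critical_z(alpha, two_tailed=True):
    """Positive critical z-value for the given significance level."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


# Critical values for common significance levels (two-tailed, one-tailed).
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(critical_z(alpha), 3),
          round(critical_z(alpha, two_tailed=False), 3))

z = z_statistic(sample_mean=5.3, mu0=5.0, std_dev=1.2, n=144)
print(z, abs(z) > critical_z(0.05))  # compare with the two-tailed 5% critical value
```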
Both the t-test and the z-test are used to test the population mean.
The chi-square test is used for hypothesis tests concerning the variance of a normally distributed population. Letting σ^2 represent the true population variance, and σ0^2 represent the hypothesized variance, the hypotheses for a two-tailed test of a single population variance are structured as:
    H0: σ^2 = σ0^2 versus Ha: σ^2 ≠ σ0^2
The hypotheses for a one-tailed test are structured as:
    H0: σ^2 <= σ0^2 versus Ha: σ^2 > σ0^2, or
    H0: σ^2 >= σ0^2 versus Ha: σ^2 < σ0^2
Hypothesis testing of the population variance requires the use of a chi-square distributed test statistic, denoted as χ^2. The chi-square distribution is asymmetrical and approaches the normal distribution in shape as the degrees of freedom increase.
The chi-square test statistic, χ^2, with n-1 degrees of freedom, is computed as:
    χ^2(n-1) = (n-1) s^2 / σ0^2
where n is the sample size, s^2 is the sample variance, and σ0^2 is the hypothesized value for the population variance.
Similar to other hypothesis tests, the chi-square test compares the test statistic to a critical chi-square value at a given level of significance and n-1 degrees of freedom. Note that since the chi-square distribution is bounded below by zero, chi-square values cannot be negative.
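A chi-square test of a single variance can be sketched as follows, assuming the usual statistic (n-1) s^2 / σ0^2. The scenario (24 observations, sample standard deviation 3.8%, hypothesized standard deviation 4%) is hypothetical, and the critical values are assumed to be read from a chi-square table for 23 degrees of freedom at the 5% level, two-tailed.

```python
def chi_square_statistic(sample_variance, hypothesized_variance, n):
    """Return (chi-square statistic, degrees of freedom)."""
    return (n - 1) * sample_variance / hypothesized_variance, n - 1


# Hypothetical: n = 24, sample std dev 3.8%, H0: sigma^2 = (4%)^2
chi2, df = chi_square_statistic(0.038 ** 2, 0.04 ** 2, n=24)

# Assumed table values for df = 23: 2.5% in each tail.
lower_crit, upper_crit = 11.689, 38.076
reject = chi2 < lower_crit or chi2 > upper_crit
print(round(chi2, 4), df, reject)
```

Because the chi-square distribution is not symmetric, the two-tailed test needs two different critical values rather than a single value with a plus or minus sign.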
Testing the equality of the variances of two normally distributed populations, based on two independent random samples.
The hypotheses concerned with the equality of the variances of two populations are tested with an F-distributed test statistic. Hypothesis testing using a test statistic that follows an F-distribution is referred to as the F-test. The F-test is used under the assumption that the populations from which samples are drawn are normally distributed and that the samples are independent.
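If we let σ1^2 and σ2^2 represent the variances of normal Population 1 and Population 2, respectively, the hypotheses for the two-tailed F-test of differences in the variances can be structured as:
    H0: σ1^2 = σ2^2 versus Ha: σ1^2 ≠ σ2^2
and the one-sided test structures can be specified as:
    H0: σ1^2 <= σ2^2 versus Ha: σ1^2 > σ2^2, or
    H0: σ1^2 >= σ2^2 versus Ha: σ1^2 < σ2^2
The test statistic for the F-test is the ratio of the sample variances. The F-statistic is computed as:
    F = s1^2 / s2^2
where s1^2 is the variance of the sample of n1 observations drawn from Population 1, and s2^2 is the variance of the sample of n2 observations drawn from Population 2.
Note that n1-1 and n2-1 are the degrees of freedom used to identify the appropriate critical value from the F-table.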
Always put the larger variance in the numerator. Following this convention means we only have to consider the critical value for the right-hand tail.
The F-distribution is right-skewed and is truncated at zero on the left-hand side. The shape of the F-distribution is determined by two separate degrees of freedom, the numerator degrees of freedom, df1, and the denominator degrees of freedom, df2. The rejection region is in the right-side tail of the distribution. This will always be the case as long as the F-statistic is computed with the larger sample variance in the numerator.
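The convention of placing the larger sample variance in the numerator can be sketched as follows. The function name and the two samples are hypothetical, and both sample variances are assumed to be positive; the critical value would still come from an F-table using the returned degrees of freedom.

```python
from statistics import variance


def f_statistic(sample_a, sample_b):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    # Swap so that F >= 1 and only the right-hand tail needs checking.
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1


# Hypothetical independent samples from two populations
sample_1 = [1.0, 2.0, 3.0]
sample_2 = [2.0, 4.0, 6.0, 8.0]
f, df1, df2 = f_statistic(sample_1, sample_2)
print(f, df1, df2)
```

Swapping the samples leaves the result unchanged, which is the point of the convention: the degrees of freedom follow whichever sample ends up in the numerator.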