That the normal distribution is one of the pivotal statistical distributions is hardly open to debate. Many statistical models and tests depend on the normality assumption to be valid. One reason the normal distribution has gained such importance is the central limit theorem, which states that, under certain conditions, the arithmetic mean of a sufficiently large number of independent random variables, each with a well-defined expected value and a well-defined (finite) variance, will be approximately normally distributed, regardless of the underlying distribution. In other words, the sampling distribution of the sample mean will be normal or nearly normal if the sample size is large enough. Opinions vary on how large ‘large enough’ is; a common rule of thumb is a sample size greater than 30, although strongly skewed distributions may need more.
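The central limit theorem can be checked informally by simulation. The sketch below is only an illustration: the choice of an Exponential(1) population, the sample size of 30, the 5000 replicates and the helper name `sample_means` are all assumptions made for the example, not part of the discussion above.

```python
import random
import statistics


def sample_means(n, replicates, seed=0):
    """Return `replicates` sample means, each from n draws of Exponential(1)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(replicates)
    ]


means = sample_means(n=30, replicates=5000)
centre = statistics.fmean(means)
spread = statistics.stdev(means)

# Theory: the means centre on 1 with standard deviation 1 / sqrt(30).
# A normal distribution puts roughly 95% of its mass within two
# standard deviations of its centre.
within_two_sd = sum(abs(m - centre) <= 2 * spread for m in means) / len(means)

print(f"centre = {centre:.3f}, spread = {spread:.3f}")
print(f"share within two standard deviations = {within_two_sd:.3f}")
```

Although the exponential distribution is strongly skewed, the means of samples of 30 should already behave much like a normal variable in this experiment.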
When the sample size is small (not large enough as defined above) or when the variance, one of the two parameters of the normal distribution (the other being the mean), is unknown, as is almost always the case in practice, the t-distribution is used for hypothesis testing; hence the popular Student’s t-test.
Student’s t-test is a method of testing hypotheses about the mean of a small sample drawn from a normally distributed population when the population standard deviation is unknown. The t-distribution is a family of curves in which the number of degrees of freedom – the number of independent observations in the sample minus one – specifies a particular curve. Every curve in the family is symmetric and bell-shaped, with heavier tails than the standard normal distribution; as the sample size increases, the t-distribution approaches the standard normal distribution.
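How quickly the t-distribution approaches the standard normal can be seen by comparing the two density curves for increasing degrees of freedom. The following is a minimal sketch using the standard closed-form densities; the function names `t_pdf`, `normal_pdf` and `max_gap`, the grid from -5 to 5 and the particular degrees of freedom are choices made for this illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # Work on the log scale so that large df does not overflow the gamma function.
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def max_gap(df):
    """Largest absolute difference between the two densities on a grid over [-5, 5]."""
    grid = [i / 10 for i in range(-50, 51)]
    return max(abs(t_pdf(x, df) - normal_pdf(x)) for x in grid)


for df in (1, 5, 30, 1000):
    print(f"df = {df:4d}: largest density gap = {max_gap(df):.5f}")
```

The printed gap should shrink as the degrees of freedom grow, which is the convergence described above.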
The first kind of null hypothesis to which the t-test can be applied states that there is no real difference between the observed sample mean and the hypothesized or stated population mean; that is, any measured difference is due only to chance. A t-test may be either two-sided (two-tailed), where the alternative hypothesis is that the means are not equal, or one-sided, where the alternative is that the population mean is larger (or smaller) than the hypothesized mean. The test statistic is calculated as the difference between the sample mean and the hypothesized mean divided by the standard error of the mean, which is the sample standard deviation divided by the square root of the sample size.
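As a worked illustration, the one-sample statistic can be computed with the Python standard library alone. The usual form divides the difference in means by the standard error, s / sqrt(n), where s is the sample standard deviation. The function name `one_sample_t` and the measurements in `weights` are made up for this sketch.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)         # sample standard deviation (n - 1 denominator)
    standard_error = s / math.sqrt(n)    # standard error of the mean
    return (mean - mu0) / standard_error, n - 1


# Hypothetical measurements and hypothesized mean, for illustration only.
weights = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.4]
t_stat, df = one_sample_t(weights, mu0=5.0)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

A positive value means the sample mean lies above the hypothesized mean, a negative value that it lies below.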
The second kind of null hypothesis to which the t-test is applied posits that two independent random samples come from populations with the same mean, or equivalently that the difference between the two means is zero. In this scenario the test statistic is calculated as the difference between the two estimated means divided by the standard error of that difference. The latter is calculated as the square root of the sum of the two sample variances, each divided by the sample size of its group.
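The two-sample calculation described above corresponds to the unequal-variance (Welch) form of the t-test. The sketch below follows that description. The degrees of freedom it returns use the Welch-Satterthwaite approximation, which is an addition not covered in the text, and the function name `welch_t` and both data sets are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for H0: equal means."""
    na, nb = len(a), len(b)
    # Each sample variance divided by its own group size.
    va = statistics.variance(a) / na
    vb = statistics.variance(b) / nb
    standard_error = math.sqrt(va + vb)
    t = (statistics.fmean(a) - statistics.fmean(b)) / standard_error
    # Welch-Satterthwaite approximation (an assumption beyond the text above).
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical measurements for two independent groups.
group_a = [12.1, 11.8, 12.6, 12.9, 11.5, 12.3]
group_b = [13.0, 12.7, 13.4, 12.9, 13.8, 13.1, 12.5]
t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, approximate df = {df:.1f}")
```

The two groups do not need to have the same size, since each variance is scaled by its own sample size.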
The third use is in testing hypotheses about regression coefficients. The null hypothesis in this case is that the regression coefficient is equal to zero. In other words, the interest is in testing whether the effect (quantified by the regression coefficient) of the explanatory variable on the dependent variable could just as well be zero, meaning no influence. The t-statistic is calculated by dividing the estimated coefficient by its standard error. The resulting ratio tells us how many standard-error units the coefficient is away from zero.
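For a regression with a single explanatory variable, the coefficient, its standard error and the resulting t-statistic can be computed by hand. This is a minimal sketch assuming ordinary least squares with an intercept, where the residual variance uses n - 2 degrees of freedom; the function name `slope_t_statistic` and the data are invented for the example.

```python
import math
import statistics


def slope_t_statistic(x, y):
    """Return (slope, standard error, t statistic) for simple linear regression."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = sse / (n - 2)   # two parameters were estimated
    standard_error = math.sqrt(residual_variance / sxx)
    return slope, standard_error, slope / standard_error


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, se, t_stat = slope_t_statistic(x, y)
print(f"slope = {slope:.3f}, standard error = {se:.4f}, t = {t_stat:.2f}")
```

Under these assumptions the statistic would be referred to a t-distribution with n - 2 degrees of freedom.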
Once the test statistic has been calculated in any of the scenarios presented above, it is compared to a critical value determined by the t-distribution with the appropriate degrees of freedom. If the observed t-statistic is more extreme than the critical value (for a two-sided test, if its absolute value exceeds the critical value), the null hypothesis is rejected. The critical value depends on the significance level of the test – the probability of erroneously rejecting the null hypothesis when it is in fact true. Alternatively, a p-value can be calculated from the test statistic and the appropriate t-distribution and then used as the basis for the decision: the null hypothesis is rejected when the p-value is smaller than the significance level.
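The decision step can be sketched without any statistics library by integrating the t density numerically. In practice a library routine would normally be used; the Simpson's-rule integration, the bisection search and all function names below are assumptions made for this illustration. Accuracy is adequate for moderate values of t but degrades for very extreme ones.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def central_area(t, df, steps=2000):
    """P(-|t| < T < |t|), using Simpson's rule on [0, |t|] and symmetry."""
    upper = abs(t)
    if upper == 0:
        return 0.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def two_sided_p_value(t, df):
    """Probability of a statistic at least as extreme as t in either tail."""
    return max(0.0, 1.0 - central_area(t, df))


def two_sided_critical_value(alpha, df):
    """Value c such that P(|T| > c) equals alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while two_sided_p_value(hi, df) > alpha:  # widen until the root is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided_p_value(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical test statistic, degrees of freedom and significance level.
t_stat, df, alpha = 2.5, 10, 0.05
critical = two_sided_critical_value(alpha, df)
p_value = two_sided_p_value(t_stat, df)
print(f"critical value = {critical:.3f}, reject: {abs(t_stat) > critical}")
print(f"p-value = {p_value:.4f}, reject: {p_value < alpha}")
```

The two rules always agree: the absolute statistic exceeds the critical value exactly when the p-value falls below the significance level.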