Analysis of Variance (ANOVA) is a statistical method used to test for differences among two or more means. It is an omnibus test: it assesses whether any differences exist among the means, rather than which specific means differ. The null hypothesis is that all the population group means are equal; the alternative is that at least one of the population means differs from the others.

Depending on the problem at hand, there are several classes of ANOVA. If a single variable or factor defines the groups, e.g. age categorised into levels such as <1, 1 to <5 and >=5 years, and the mean of a particular quantity is to be compared across the three levels, the method is called one-way ANOVA. If two factors group the observations, the means in those groups are compared by two-way ANOVA. Other variants of ANOVA include repeated measures ANOVA, nested designs and Latin square designs.

This article describes the steps in performing one-way ANOVA. A few terms need to be defined before describing how to calculate the test statistic and test the null hypothesis:

Degrees of freedom (DF): This is the number of values in the final calculation of a statistic (e.g. a mean) that are free to vary. Imagine a set of four numbers whose mean is 10. Many such sets exist, and in any of them you can freely pick the first three numbers. The fourth is then out of your hands: it must be whatever value makes the mean of the four equal to 10. The set is therefore said to have 3 degrees of freedom, the number of elements in the set that are allowed to vary freely.
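
The four-number example can be sketched in a few lines of Python. The three freely chosen values below are arbitrary and purely illustrative; any other three numbers would work equally well.

```python
# Illustration of degrees of freedom: four numbers constrained to a mean of 10.
target_mean = 10
n = 4

# The first three values can be chosen freely (arbitrary, hypothetical picks).
free_values = [7, 12, 15]

# The fourth value is forced by the constraint: sum of all values = n * mean.
last_value = n * target_mean - sum(free_values)

values = free_values + [last_value]
degrees_of_freedom = n - 1  # only three of the four values were free to vary

print("values:", values)
print("mean:", sum(values) / n)
print("degrees of freedom:", degrees_of_freedom)
```

Changing any entry of free_values changes last_value, but the mean stays at 10.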

The DF for the variable (e.g. age group) in ANOVA is the number of group levels, k, minus 1 (i.e. k – 1). The DF for Error is the total sample size, N, minus k (i.e. N – k), because we lose one degree of freedom for each of the k group means we estimate. The DF for Total is N – 1; we lose one degree of freedom when we estimate the grand mean, the mean of all observations regardless of the group they are in.
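
As a minimal sketch, the three DF values can be computed directly from grouped data. The measurements and group labels below are hypothetical and serve only to make the arithmetic concrete.

```python
# Hypothetical measurements for three age groups (illustrative values only).
groups = {
    "<1 year": [4.0, 5.0, 6.0],
    "1 to <5 years": [6.0, 7.0, 8.0, 7.0],
    ">=5 years": [9.0, 10.0, 11.0],
}

k = len(groups)                                      # number of group levels
N = sum(len(values) for values in groups.values())   # total sample size

df_between = k - 1   # DF for the variable (factor)
df_error = N - k     # DF for Error: one DF lost per estimated group mean
df_total = N - 1     # DF for Total: one DF lost for the grand mean

print(df_between, df_error, df_total)
```

The DF for the variable and the DF for Error always add up to the DF for Total, which gives a quick consistency check.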

Sum of Squares (SS): The between-group SS, or SSB, is a measure of the variation in the data between the groups. It is the sum of squared deviations of each of the k group means from the overall mean, with each squared deviation weighted by the number of observations in that group. The error SS (SSE), also called the within-group sum of squares (SSW), is the sum of squared deviations of each observation from its group mean. The total SS, or SST, is the sum of squared deviations of each observation from the grand (overall) mean. These values are additive, such that SST = SSB + SSW.
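
The three sums of squares can be sketched as follows. The data are hypothetical, and the between-group term is weighted by group size, which matters when the groups have unequal numbers of observations.

```python
# Hypothetical measurements for three age groups (illustrative values only).
groups = {
    "<1 year": [4.0, 5.0, 6.0],
    "1 to <5 years": [6.0, 7.0, 8.0, 7.0],
    ">=5 years": [9.0, 10.0, 11.0],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)

# Between-group SS: squared deviation of each group mean from the grand
# mean, weighted by the number of observations in that group.
ssb = sum(
    len(values) * (sum(values) / len(values) - grand_mean) ** 2
    for values in groups.values()
)

# Within-group SS: squared deviation of each observation from its group mean.
ssw = sum(
    (x - sum(values) / len(values)) ** 2
    for values in groups.values()
    for x in values
)

# Total SS: squared deviation of each observation from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)

print(f"SSB={ssb:.2f}  SSW={ssw:.2f}  SST={sst:.2f}")
```

Up to floating-point rounding, the printed SST equals SSB plus SSW.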

F-statistic: This is the test statistic used for ANOVA. It is calculated as the Mean Square (MS) for the factor/variable (MSR = SSB/DF for the variable) divided by the MS of the error (MSE = SSW/DF for error). The F-statistic is thus the ratio of the variability between groups to the variability within groups. If the null hypothesis is true, this ratio is expected to be close to 1; a ratio well above 1 indicates more variability between groups than within groups.
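
A minimal sketch of the F-statistic calculation is shown below. The function name one_way_f and the data are hypothetical, introduced here only for illustration.

```python
def one_way_f(groups):
    """Return (F, DF for the variable, DF for error) for one-way ANOVA.

    `groups` maps a group label to a list of numeric observations.
    """
    all_values = [x for values in groups.values() for x in values]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    ssb = 0.0  # between-group sum of squares
    ssw = 0.0  # within-group (error) sum of squares
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ssb += len(values) * (group_mean - grand_mean) ** 2
        ssw += sum((x - group_mean) ** 2 for x in values)

    df_between = k - 1
    df_error = n_total - k
    msr = ssb / df_between  # mean square for the factor/variable
    mse = ssw / df_error    # mean square of the error
    return msr / mse, df_between, df_error


# Hypothetical measurements for three age groups (illustrative values only).
example = {
    "<1 year": [4.0, 5.0, 6.0],
    "1 to <5 years": [6.0, 7.0, 8.0, 7.0],
    ">=5 years": [9.0, 10.0, 11.0],
}
f_stat, df_between, df_error = one_way_f(example)
print(f"F = {f_stat:.3f} on {df_between} and {df_error} DF")
```

The sketch assumes at least two groups, more observations than groups, and some variation within groups; otherwise a division by zero occurs.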

p-value: The meaning of the p-value has been discussed elsewhere. In the context of ANOVA, it is the probability of obtaining an F-statistic at least as large as the one observed if the null hypothesis were true, that is, if all the population group means were equal. This probability is obtained by comparing the calculated F-statistic with the theoretical F-distribution with k – 1 and N – k degrees of freedom. If the p-value is less than the conventional significance level of 0.05, there is sufficient evidence to reject the null hypothesis. When the null hypothesis is rejected, further pairwise tests, called post-hoc tests, are conducted to determine which particular group means differ significantly.
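
For the special case of exactly three groups (2 degrees of freedom for the variable), the upper tail of the F-distribution has a simple closed form, so the p-value can be sketched without a statistics library. This shortcut does not hold for other numbers of groups, where statistical software is normally used instead. The function name and the input values below are hypothetical.

```python
def p_value_three_groups(f_stat, df_error):
    """Upper-tail probability P(F >= f_stat) for an F-distribution with
    2 numerator DF and `df_error` denominator DF.

    Closed form valid ONLY when the numerator DF is 2 (i.e. k = 3 groups):
        P(F >= f) = (1 + 2 * f / df_error) ** (-df_error / 2)
    """
    return (1.0 + 2.0 * f_stat / df_error) ** (-df_error / 2.0)


# Hypothetical inputs: an observed F of 22.225 with 2 and 7 DF.
f_observed = 22.225
df_error = 7

p_value = p_value_three_groups(f_observed, df_error)
alpha = 0.05  # conventional significance level
reject_null = p_value < alpha

print(f"p-value = {p_value:.5f}; reject null hypothesis: {reject_null}")
```

For any number of groups, a library routine such as scipy.stats.f.sf(f_stat, k - 1, N - k) gives the same upper-tail probability, assuming SciPy is available.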

For the results to be valid, the following assumptions should be checked before the ANOVA technique is applied:

  • Each level of the factor is applied to a sample.
  • The populations from which the samples were obtained must be normally distributed.
  • The samples must be independent.
  • The variances of the populations must be equal.
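
As a first, informal look at the equal-variance assumption, the sample variances of the groups can be compared side by side. This is only a descriptive sketch on hypothetical data: it is not a formal test, and it says nothing about normality or independence.

```python
from statistics import variance

# Hypothetical measurements for three age groups (illustrative values only).
groups = {
    "<1 year": [4.0, 5.0, 6.0],
    "1 to <5 years": [6.0, 7.0, 8.0, 7.0],
    ">=5 years": [9.0, 10.0, 11.0],
}

# Sample variance (n - 1 denominator) within each group.
sample_variances = {name: variance(values) for name, values in groups.items()}

# Ratio of the largest to the smallest sample variance; a value near 1
# means the groups have similar spread.
variance_ratio = max(sample_variances.values()) / min(sample_variances.values())

for name, var in sample_variances.items():
    print(f"{name}: variance = {var:.3f}")
print(f"largest/smallest variance ratio = {variance_ratio:.2f}")
```

A ratio far from 1 suggests the equal-variance assumption deserves a closer look before the ANOVA results are relied on.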

There are alternative methods, or modifications of the basic ANOVA, that apply when some of these assumptions are violated, but they are not discussed in this article.