How to: Statistics – ANOVA

One thing that regularly stumps scientists is the handling of data. We seem to be very good at generating obscene amounts of it, but representing it meaningfully can be a little off-putting if you don’t happen to be a bioinformatician. In previous tutorials we looked at hypothesis testing using variations of the t-Test, and we continue the series by comparing more than two sample sets with ANOVA.

Of course, the calculations can be done on a simple calculator, but the task is much easier with spreadsheet software (Excel) or with more specialized tools available at most high schools and universities (Minitab, Prism, SigmaPlot, etc.). Each has its own tutorials on carrying out these tests, so this article is not heavily technical and instead focuses on the correct application of statistical testing.

Test, when to use it, and an example:

  • 1-sample t-test. When to use: tests whether the mean of a single population equals a hypothesized value. Example: a lecturer claims that the mean time taken to complete a quiz is 1 hour. From a sample data set, can we reject this claim?
  • 2-sample t-test. When to use: tests whether the difference between the means of two independent populations equals a hypothesized value. Example: does the mean quiz score of female students differ significantly from the mean quiz score of male students?
  • Paired t-test. When to use: tests whether the mean difference between dependent (paired) observations equals a hypothesized value. Example: the mean response time of adults before and after they have consumed alcohol. Is the difference large enough to conclude that alcohol affects response time?
  • ANOVA. When to use: tests for a statistically significant difference among the means of three or more populations. Example: studying the effectiveness of three types of pain reliever: aspirin vs. Tylenol vs. ibuprofen.

Analysis Of Variance

[Image: biologist and statistician Ronald Fisher, the ANOVA dude (Flickr)]

The different forms of t-Tests are powerful tools to determine statistical significance, as we have discussed in previous tutorials, but a little itty bitty problem quickly arises as we dive into hypothesis testing: What if we have more than 2 sample groups!?

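For example, if 5 independent populations are involved, being restricted to t-Tests means that 10 separate tests would have to be performed, comparing each mean with every other. Not only would this take forever, it also inflates the overall risk of a Type I error (incorrectly rejecting a true null hypothesis). Each test carries its own α chance of a false positive, so the more tests we run, the more likely it is that at least one of them is a false alarm: with 10 tests at α = 0.05, that chance is roughly 1 − 0.95^10 ≈ 40% if the tests are treated as independent.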
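To put rough numbers on the problem, here is a minimal Python sketch that counts the pairwise comparisons among k groups (k choose 2) and estimates the familywise error rate, 1 − (1 − α)^m, for m tests. The function names are hypothetical, and the formula assumes the tests are independent, which pairwise t-Tests sharing the same groups are not, so treat the result as an approximation.

```python
from math import comb


def pairwise_tests(k: int) -> int:
    """Number of distinct pairs among k groups: k choose 2."""
    return comb(k, 2)


def familywise_error(alpha: float, m: int) -> float:
    """Approximate chance of at least one false positive across m tests,
    each run at significance level alpha, when every null hypothesis is true.
    Assumes the tests are independent (an approximation for pairwise t-Tests)."""
    return 1 - (1 - alpha) ** m


m = pairwise_tests(5)
print(m)                                    # 10
print(round(familywise_error(0.05, m), 3))  # 0.401
```

The single ANOVA test keeps the false-positive risk at α for the one question "are all the means equal?".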

ANOVA solves this with a single test of whether the means of several groups are all equal, which makes it a kind of supercharged t-Test!


ANOVA is a single test to determine the significance of the difference between the means of three or more groups.

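ANOVA is about looking at the ‘signal’ relative to the ‘noise’. We want to see whether the variance between the group means (signal) is large compared with the variance within the groups (noise). The ratio of the two is the F statistic: the larger F is, the stronger the evidence that the group means really differ.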
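The signal-to-noise idea maps directly onto the F statistic of a one-way ANOVA. Here is a minimal Python sketch, assuming each group is given as a list of numbers; `f_statistic` is a hypothetical helper, not a library function. It divides the between-group mean square (signal) by the within-group mean square (noise).

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F ratio: between-group mean square / within-group mean square."""
    k = len(groups)                            # number of groups
    n = sum(len(g) for g in groups)            # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Signal: how far each group mean sits from the grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Noise: how far each observation sits from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)          # between-group degrees of freedom: k - 1
    ms_within = ss_within / (n - k)            # within-group degrees of freedom: n - k
    return ms_between / ms_within


print(f_statistic([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 27.0
print(f_statistic([[1, 5, 9], [2, 5, 8], [3, 5, 7]]))  # 0.0
```

In the first call the groups barely overlap, so the signal dominates and F is large. In the second, all three group means are 5, so there is no signal at all and F is 0, however spread out the data are.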

Example

A scientist wants to compare the effectiveness of three types of pain reliever (aspirin vs. Tylenol vs. ibuprofen) and, for three groups of patients, records each patient’s rating of their change in pain level before and after receiving treatment.

  • The null hypothesis (H0): There is no difference between the means (μ1 = μ2 = μ3)
  • The alternative hypothesis (H1): At least one of the means differs from the others

In this case there are three independent sample sets, with the type of pain reliever as the independent variable and ‘effectiveness’ as the dependent variable. In a typical application of ANOVA, the null hypothesis is that all groups are simply random samples from the same population. For example, when studying the effect of different treatments on similar samples of patients, the null hypothesis would be that all treatments have the same effect (perhaps none). Rejecting the null hypothesis would imply that at least one treatment has a different effect from the others.

As with the t-Tests, a significance level (α) has to be chosen before running the test; it sets the confidence level of the test (1 − α). A level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference. The test then produces a p-value from the F statistic, which is compared with α:

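  • P-value ≤ α: The differences between some of the means are statistically significant (the null hypothesis is rejected). ANOVA alone does not say which means differ; a follow-up post-hoc test such as Tukey’s HSD is needed for that.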
  • P-value > α: The differences between the means are not statistically significant (there is insufficient evidence to reject the null hypothesis).
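Putting the pieces together for the pain-reliever example, here is a minimal Python sketch under some loudly stated assumptions: the scores are made-up numbers, not real data; the helper names are hypothetical; and the p-value uses a closed form for the upper tail of the F distribution, (1 + 2F/df_within)^(−df_within/2), which holds only when there are exactly three groups (2 between-group degrees of freedom). For any other number of groups, use a statistics package instead, for example SciPy’s `scipy.stats.f_oneway`, which returns both F and the p-value.

```python
from statistics import mean

ALPHA = 0.05  # significance level, chosen before looking at the data

# HYPOTHETICAL pain-relief scores (drop in pain on a 0-10 scale), made up for illustration
groups = {
    "aspirin":   [3, 4, 2, 5, 4, 3],
    "tylenol":   [2, 3, 2, 4, 3, 2],
    "ibuprofen": [5, 6, 4, 6, 5, 5],
}


def one_way_anova(samples):
    """Return (F, df_between, df_within) for a list of groups."""
    k = len(samples)
    n = sum(len(g) for g in samples)
    grand_mean = mean(x for g in samples for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in samples)
    ss_within = sum((x - mean(g)) ** 2 for g in samples for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def p_value_three_groups(f, df_within):
    """P(F > f) for an F distribution with 2 and df_within degrees of freedom.
    Valid ONLY for exactly three groups (df_between == 2)."""
    return (1 + 2 * f / df_within) ** (-df_within / 2)


f, df_b, df_w = one_way_anova(list(groups.values()))
p = p_value_three_groups(f, df_w)
print(f"F({df_b}, {df_w}) = {f:.2f}, p = {p:.4f}")
if p <= ALPHA:
    print("Reject H0: at least one mean differs")
else:
    print("Fail to reject H0: insufficient evidence of a difference")
```

With these made-up numbers F(2, 15) = 12.50 and p is about 0.0006, well below α = 0.05, so the null hypothesis would be rejected.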

And that concludes our quick analysis and application guide to ANOVA! Join us next time for more statistics fun 🙂 In the meantime, wouldn’t it be a great idea to check out one of our other tutorials?