One thing that regularly stumps scientists is the handling of data. We seem to be very good at generating obscene amounts of it, but representing it meaningfully can be a little off-putting if you don’t happen to be a bioinformatician. Let’s dip our toes in with a simple one-sample t-test to see how we can easily incorporate statistical analysis into our work.
Of course, the calculations involved can be done on a simple calculator, but your task will be made much easier with the use of spreadsheet software (Excel) or more specialized tools which are available at most high schools and universities (Minitab, Prism, SigmaPlot, etc.). Each has its own tutorials on carrying out these tests, so this article will not be heavily technical but will instead focus on the correct application of statistical testing.
Main Points to Remember
The t-test is a form of hypothesis testing: it uses a set of sample data to test a hypothesis about the entire population. It is used when the population standard deviation (σ) is unknown and the sample size is small (n<30). In real-world samples we don't usually have a basis for knowing σ.
Since the t-distribution becomes practically equivalent to the normal distribution (bell curve) when the sample size is large, the usual rule of thumb is:
- If σ is known, use the normal distribution.
- If σ is unknown:
  - If n≥30, use the normal distribution.
  - If n<30, use the t-distribution.
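To see why the rule above switches to the normal distribution for larger samples, the sketch below compares the density of the t-distribution with the standard normal density at a single point in the tail. It uses only the Python standard library; the function name `t_pdf`, the evaluation point 2.5 and the chosen degrees of freedom (n − 1) are assumptions made purely for illustration.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids overflow of the gamma function for large df.
    log_coeff = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_coeff - (df + 1) / 2 * math.log1p(x * x / df))


x = 2.5  # arbitrary point in the tail, chosen for illustration
normal_density = NormalDist().pdf(x)

# Gap between the t density and the standard normal density at x.
gaps = {df: abs(t_pdf(x, df) - normal_density) for df in (5, 30, 1000)}

for df, gap in gaps.items():
    print(f"df={df:>4}: |t - normal| = {gap:.5f}")
```

The gap shrinks as the degrees of freedom grow, which is the reason the two distributions are treated as interchangeable for large samples.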
|Test||When To Use||An Example|
|1 sample t-test||Tests if the mean of a single population is equal to a hypothesized value||A lecturer claims that the mean time taken to complete a quiz is 1 hour. From a sample data set, can we reject this claim?|
|2 sample t-test||Tests if the difference between means of two independent populations is equal to a hypothesized value||Does the mean quiz score of female students differ significantly from the mean quiz score of male students?|
|paired t-test||Tests if the difference between means of dependent or paired observations is equal to a hypothesized value||The mean response time of adults before and after they have consumed alcohol. Is the difference significant enough to conclude that alcohol affects response time?|
A one-sample t-test can determine whether μ (mu, the population mean) is equal to a hypothesized mean. The test uses s (sample standard deviation) to estimate σ (sigma, the population standard deviation). If the difference between x̅ (x bar, sample mean) and the hypothesized mean is large relative to the standard error of the mean (s/√n), then μ is unlikely to be equal to the hypothesized mean.
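The comparison described above is usually summarized as a t-statistic: the difference between the sample mean and the hypothesized mean, divided by s/√n. The following is a minimal sketch using only the Python standard library. The data values are hypothetical (they are not this article's data set), and the function name `one_sample_t` is an assumption for illustration.

```python
import math
from statistics import mean, stdev

# Hypothetical completion times in hours (not the article's data set).
times = [1.0, 1.3, 0.8, 1.2, 1.1, 1.4, 0.9]
hypothesized_mean = 1.0  # the value being tested


def one_sample_t(sample, mu0):
    """Return (t-statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    standard_error = s / math.sqrt(n)
    return (x_bar - mu0) / standard_error, n - 1


t_stat, df = one_sample_t(times, hypothesized_mean)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

A t-statistic close to zero means the sample mean sits near the hypothesized mean; the further it is from zero, the less plausible the hypothesized mean becomes.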
For example, you want to determine whether the mean time for completing an online quiz is statistically different from the lecturer’s claim of 1 hour. μ in this case represents the mean time taken by the entire group of students to finish the quiz.
Now μ is either equal to 1 hour or it is not. Therefore the possibilities can be encompassed within two hypotheses:
- The null hypothesis (H0): μ is equal to 1 hour.
- The alternative hypothesis (H1): μ is not equal to 1 hour.
Running this test in software will yield several key parameters such as the sample mean, sample standard deviation, a confidence interval, t-statistic and p-value. The output for a sample data set (n=7) has been generated below:
Test of mu = 1 vs not = 1 for N=7
|Variable||n||Mean||St Dev||95% CI||T-statistic||p-value|
The key parameter here is the p-value (probability value), which answers the question ‘If the null hypothesis were true, what is the probability of obtaining a sample mean at least as far from the hypothesized mean as the one observed, taking into account sample size and standard deviation?’
The confidence level is usually chosen before hypothesis testing. With a 95% confidence interval for μ, you can be 95% confident that the returned range of values contains μ. Generally a 95% confidence level is used unless otherwise stated. Sometimes α (alpha, the significance level) is used to describe this, where α is 1 minus the confidence level (for a 95% CI, α = 0.05).
If the p-value is larger than α, then the null hypothesis cannot be rejected. Here the p-value is 0.089, which is larger than 0.05, so there is not enough evidence to suggest that the lecturer’s claim of the online quiz taking 1 hour is false.
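For readers curious where a p-value comes from, here is a rough sketch that derives a two-sided p-value from a t-statistic by numerically integrating the t-distribution's density, then compares it with α. It relies only on the Python standard library. The helper names (`t_pdf`, `two_sided_p_value`, `decide`) and the example t-statistic of 2.0 are hypothetical, and statistical packages use more refined numerical methods than the Simpson's rule shown here.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coeff = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_coeff - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_stat: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value via Simpson's rule on the t density over [0, |t|]."""
    upper = abs(t_stat)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and +|t|
    return 1.0 - central_area


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 only when p does not exceed alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# Hypothetical t-statistic for a sample of n = 7 (df = n - 1 = 6).
p = two_sided_p_value(2.0, df=6)
print(f"p = {p:.3f}: {decide(p)}")

# The p-value quoted in the article, checked against alpha = 0.05.
print(decide(0.089))
```

Note that "fail to reject H0" is not the same as proving the null hypothesis; it only means the sample does not provide enough evidence against it at the chosen α.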
It is important to note that using the t-test for hypothesis testing requires certain assumptions about the data being analyzed. If these assumptions are not met, the conclusions drawn from the test may not be valid. The assumptions for a one-sample t-test are:
- The sample must be random.
- Sample data must be continuous.
- Sample data should be normally distributed (although this assumption is less critical when the sample size is 30 or more).
I hope that this tutorial has been useful. Happy t-testing!