8 the problem with the t-test

Let’s go back to the example of the independent samples t-test.

We sampled 10 boys and 14 girls, age 12, and asked:

Are 12-year-old girls significantly taller than 12-year-old boys?

We then answered this question by comparing the means of the two samples and asking whether the difference between them was large enough to be considered significant.
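To make "large enough" concrete: the t statistic divides the difference between the two sample means by an estimate of that difference's standard error. Below is a minimal sketch of the classic pooled-variance (Student's) version in plain Python. The function name `pooled_t_statistic` is hypothetical, the heights are simulated for illustration only (not real data), and the t-test used earlier may have been a different variant (for example Welch's, which does not pool the two variances).

```python
import math
import random
import statistics


def pooled_t_statistic(a, b):
    """Student's t statistic for two independent samples, assuming equal variances."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # statistics.variance uses the n - 1 denominator (sample variance)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    # Pool the two variances, weighting each by its degrees of freedom
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    # Standard error of the difference between the two sample means
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se


# Hypothetical heights in cm, simulated for illustration only (not real data)
random.seed(0)
girls = [random.gauss(151, 7) for _ in range(14)]
boys = [random.gauss(149, 7) for _ in range(10)]

t = pooled_t_statistic(girls, boys)
df = len(girls) + len(boys) - 2  # degrees of freedom for the pooled test
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The statistic is then compared against a t distribution with `df` degrees of freedom; that comparison is only justified under the normality assumption discussed next.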

The whole machinery behind the t-test is based on the normality assumption.

8.1 the normality assumption

There are two ways to interpret this assumption.

  1. The assumption is about the populations: the heights of 12-year-old boys and of 12-year-old girls are each normally distributed, and we draw our samples from these idealized populations.
  2. The assumption is about the sample means: the t-test compares the difference between the means of the two samples with the variability within each sample. Because of the Central Limit Theorem, the sampling distribution of the mean approaches a normal distribution as the sample size increases. In this interpretation, the normality assumption concerns the distribution of the sample means, not the distribution of the population.

In the context of the t-test, this is a distinction without a difference: even if the population is not normally distributed, the sample means will be approximately normally distributed as long as the sample size is large enough. So we use the t-test and go on with our lives.
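The Central Limit Theorem at work can be sketched with a small simulation. This assumes a right-skewed exponential population (an arbitrary choice, not anything from the example above) and uses skewness as a rough measure of non-normality: a normal distribution has skewness 0. The helper names `skewness` and `sample_means` are hypothetical.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness: mean cubed deviation divided by the cubed population standard deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean([(x - m) ** 3 for x in xs]) / s ** 3


def sample_means(n, reps, rng):
    """Means of `reps` independent samples of size n from an exponential population (rate 1)."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]


rng = random.Random(42)
for n in (1, 5, 30, 100):
    means = sample_means(n, 4000, rng)
    print(f"n = {n:>3}: skewness of the sample means = {skewness(means):.2f}")
```

For this particular population the skewness of the mean of n draws is exactly 2/√n, so the printed values should shrink roughly along 2, 0.89, 0.37, 0.2 (with some simulation noise): the population stays skewed, but the distribution of its sample means becomes more and more symmetric.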

8.2 other statistical tests

The Central Limit Theorem guarantees that sample means will be approximately normally distributed, but it says nothing about other statistics, such as:

  • the median
  • the variance
  • the skewness
  • the maximum
  • the Interquartile Range (IQR)
  • etc.

If our question is about one of these statistics instead of the mean, the t-test can’t be relied upon, and we need another solution.
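A minimal sketch of why this matters, using the maximum from the list above. It assumes the same kind of skewed exponential population as before (an illustrative choice) and compares how the skewness of the sampling distribution behaves for the mean and for the maximum as the sample size grows. The helper names are hypothetical.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness: mean cubed deviation divided by the cubed population standard deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean([(x - m) ** 3 for x in xs]) / s ** 3


def sampling_distribution(stat, n, reps, rng):
    """Values of `stat` over `reps` samples of size n from an exponential population (rate 1)."""
    return [stat([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]


rng = random.Random(7)
for n in (10, 100, 500):
    means = sampling_distribution(statistics.fmean, n, 2000, rng)
    maxima = sampling_distribution(max, n, 2000, rng)
    print(f"n = {n:>3}: skewness of means = {skewness(means):.2f}, "
          f"skewness of maxima = {skewness(maxima):.2f}")
```

The skewness of the means drifts towards 0 as n grows, while the skewness of the maxima stays above 1 (it approaches about 1.14, the skewness of the Gumbel distribution), so collecting more data does not make the maximum look normal.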