Episode #8 of the course Introduction to statistics by Polina Durneva
Yesterday, we talked about degrees of freedom, which describe how many observations in a sample are free to vary (e.g., if we have 15 observations, the degrees of freedom are 15 – 1 = 14). Degrees of freedom matter in many statistical tests, including the t-test. Let’s discuss what the t-test is and why it is important.
Applications of T-Test
The t-test is used to evaluate the mean value of one group or to compare the mean values of two groups. There are three main applications of the t-test. First, you can use the t-test to compare the average values of two distinct groups. For instance, you might want to know whether men and women differ in the average age at which they complete their bachelor’s degree. To find out, you can calculate the average age at completion for women and for men and run the t-test to check whether the difference between the two averages is statistically significant.
Second, you can use the t-test to check whether the average value for a group has changed over time. For instance, you can conduct the t-test to find out whether the average age of children entering the first grade in 2018 is statistically different from the average age of children entering the first grade in 2008.
Third, you can use the t-test to compare the average value of your sample group to a hypothesized average. For example, you think that the average waiting time in a bank is more than ten minutes. You will calculate the t-value based on data in your sample and evaluate the strength of your claim using this test.
Calculation of T-Value
To conduct the t-test, you need to be able to calculate the t-value. Two main components are required for this calculation. First, you need to find the difference between your sample mean and your hypothesized mean (or another group’s mean). Let’s assume that you want to check if the waiting time in bank X is more than ten minutes on average. You take a random sample of waiting times in bank X over some period and find that the average waiting time is 20 minutes. The difference between your sample mean and your hypothesized mean is 20 – 10 = 10 minutes.
Second, you need to evaluate your sample’s variation. To do so, you need to find the standard deviation in your sample and divide it by the square root of your sample size. The result is known as the standard error of the mean. Let’s assume that the standard deviation in your sample is 2 and the size of your sample is 36. The variation within your sample will be: 2 / √36 = 2 / 6 = 1 / 3.
Finally, you need to divide the difference between your sample mean and your hypothesized mean by the variation in your sample: t-value = 10 / (1 / 3) = 30.
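The three steps above can be sketched in a few lines of Python. This is only a minimal illustration that works from the summary numbers in the bank example; the function name and its parameters are made up for this sketch and do not come from any statistics library.

```python
import math


def one_sample_t_value(sample_mean, hypothesized_mean, sample_sd, sample_size):
    """Return the t-value for a one-sample t-test from summary statistics.

    Hypothetical helper written for this lesson, not a library function.
    """
    # Step 1: difference between the sample mean and the hypothesized mean.
    difference = sample_mean - hypothesized_mean
    # Step 2: variation in the sample (standard deviation / sqrt(sample size)).
    standard_error = sample_sd / math.sqrt(sample_size)
    # Step 3: divide the difference by the variation.
    return difference / standard_error


# Bank X example: mean 20 minutes, hypothesized mean 10 minutes,
# standard deviation 2, sample size 36.
t_value = one_sample_t_value(20, 10, 2, 36)
print(t_value)  # approximately 30, matching the hand calculation
```

With real data, you would first compute the sample mean and standard deviation from the individual waiting times and then pass them to a function like this one.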
To reject the null hypothesis (here, that the average waiting time is not more than ten minutes), your t-value should be relatively large. In our case, the t-value is 30, meaning that the difference between our sample mean and our hypothesized mean is 30 times the variation within our sample. This is definitely a statistically significant result, and therefore, we can state that the average waiting time in bank X is more than ten minutes.
More formally, to decide whether your t-value is big enough to reject the null hypothesis, you need to calculate the p-value, which is the probability of getting a result at least as extreme as yours if the null hypothesis were true. This calculation uses the degrees of freedom (in our example, 36 – 1 = 35). Nowadays, we have powerful statistical software (Stata, R, Minitab, and others) that can be used to calculate the t-values and corresponding p-values.
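To give a feel for what such software does, here is a rough Python sketch that approximates a one-sided p-value from a t-value and the degrees of freedom. It integrates the t-distribution's density numerically with Simpson's rule, which is a simplification chosen for this illustration and not necessarily the method any particular package uses. The function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coeff = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_coeff - (df + 1) / 2 * math.log1p(x * x / df))


def one_sided_p_value(t_value, df, steps=10_000):
    """Approximate P(T >= t_value) for a non-negative t_value.

    Hypothetical helper: uses Simpson's rule on [0, t_value], so `steps`
    must be even. Results smaller than roughly 1e-12 cannot be resolved
    this way and are reported as (about) zero.
    """
    if t_value < 0:
        raise ValueError("this sketch only handles t_value >= 0")
    h = t_value / steps
    total = t_pdf(0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_from_0_to_t = total * h / 3
    # Half of the distribution lies above 0, so the upper tail is
    # 0.5 minus the area between 0 and t_value.
    return max(0.0, 0.5 - area_from_0_to_t)


# Bank X example: t-value of 30 with 36 - 1 = 35 degrees of freedom.
p_value = one_sided_p_value(30, 35)
print(p_value)  # far below any usual significance level such as 0.05
```

A one-sided p-value fits the bank example because the claim is that the waiting time is more than ten minutes. For a claim that the mean is simply different from ten minutes, the two-sided p-value would be twice this amount.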
That’s it for today! Tomorrow, we will discuss the Chi-Square test.
See you soon,
How to Lie with Statistics by Darrell Huff