Episode #9 of the course Introduction to statistics by Polina Durneva
In the previous lesson, we talked about the t-test, which can be used to analyze the mean value of one group or to compare the means of two groups. Today, we will discuss another important statistical test, called the chi-square test.
In statistics, there are two common types of chi-square tests. The first, called the goodness of fit test, determines how well the distribution observed in your sample matches the distribution expected for the population. The second type, called the test for independence, evaluates whether there is a relationship between two categorical variables. In this lesson, we will talk about the second type of chi-square test.
Conducting a Chi-Square Test
To calculate the chi-square statistic, denoted as χ², we need to know the observed frequency (O) and the expected frequency (E) of every category, such that χ² = ∑((O – E)² / E), where the sum runs over all categories.
To better understand how the chi-square test is conducted, let’s proceed to an example. Let’s say that you have a random sample and want to find out if there is any relationship between gender (male and female) and eye color (green, blue, and brown). Our dataset consists of 200 observations. The table below summarizes our observations (or observed counts, O):
To find the expected counts (E), we ask how many observations would fall into each category if gender and eye color were unrelated. Let’s calculate the expected frequency for males with blue eyes. The probability that a person in our sample is male can be found by dividing the number of males by the total number of people, or 90 / 200 = 9 / 20. If gender and eye color were independent, the same share of the blue-eyed people would be male. So we multiply the probability of being male by the total count of people with blue eyes: (9 / 20) * 75 = 33.75. This is the expected frequency for males with blue eyes. In general, the expected count for a category is (row total * column total) / overall total. The table below shows the expected frequency for all categories (the calculations are performed in a similar manner):
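As a rough sketch, the same calculation can be carried out for every category at once in Python. The observed counts are the ones that appear in the chi-square calculation in this lesson. The first column is blue eyes; which of the other two columns is brown and which is green is an assumption made here for illustration, and the function name expected_counts is our own.

```python
# Observed counts taken from the chi-square calculation in this lesson.
# Rows: male, female. Columns: blue, then the two remaining eye colors
# (ASSUMPTION: the order of the brown and green columns is a guess).
observed = [
    [25, 35, 30],  # male   (row total 90)
    [50, 40, 20],  # female (row total 110)
]

def expected_counts(table):
    """Expected count per cell if rows and columns were independent:
    row total * column total / overall total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for label, row in zip(["male", "female"], expected):
    print(label, row)
```

The first value in the male row, 33.75, matches the figure worked out by hand above.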
Calculation of Chi-Square Test
Let’s use the values from the tables above to calculate the chi-square value: χ² = (25 – 33.75)² / 33.75 + (50 – 41.25)² / 41.25 + (35 – 33.75)² / 33.75 + (40 – 41.25)² / 41.25 + (30 – 22.5)² / 22.5 + (20 – 27.5)² / 27.5 = 8.754209.
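If you would like to check this arithmetic yourself, here is a minimal Python sketch of the same sum. The two lists hold the observed and expected counts in the order they appear in the formula above; the function name chi_square is our own choice.

```python
# Observed and expected counts, in the order used in the formula above.
observed = [25, 50, 35, 40, 30, 20]
expected = [33.75, 41.25, 33.75, 41.25, 22.5, 27.5]

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

statistic = chi_square(observed, expected)
print(round(statistic, 6))  # 8.754209
```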
To interpret this chi-square value, we need to consider the degrees of freedom and the significance level. For a test of independence, the degrees of freedom equal (number of rows – 1) × (number of columns – 1). In our case, we have two genders and three eye colors (blue, brown, and green), and therefore, the degrees of freedom are (2 – 1) × (3 – 1) = 2. To decide whether the result is significant, we use a chi-square table, whose values are organized by degrees of freedom and probability level. This table was composed by statisticians to indicate the minimum chi-square value required, for a given number of degrees of freedom, to reject the null hypothesis (here, that the two variables are independent). The chi-square table can be found in any statistics book and in various online sources.
For our case, when df = 2 and the significance level is 0.05 (a conventional choice that we use for the sake of simplicity), the chi-square value should be at least 5.991. Since our calculated chi-square value of 8.754209 exceeds 5.991, we reject the null hypothesis and can state that there is evidence of a relationship between gender and eye color in our sample. (Please note that all the values in our sample are made up and do not carry any real-world meaning, as they were simply used to demonstrate the concept.)
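For a table of counts with several rows and columns, the degrees of freedom for a test of independence are usually computed as (rows – 1) × (columns – 1). A tiny sketch in Python, with a function name of our own choosing:

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a chi-square test of independence."""
    return (n_rows - 1) * (n_cols - 1)

# Two genders (rows) and three eye colors (columns).
df = degrees_of_freedom(2, 3)
print(df)  # 2
```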
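Instead of looking the value up in a table, we can sketch the decision in Python. This shortcut relies on a special property of df = 2: the probability of a chi-square value larger than x is exactly exp(–x / 2), so the cutoff for a given significance level is –2 × ln(level). The function names below are our own, and the formulas do not apply to other degrees of freedom.

```python
import math

def p_value_df2(statistic):
    """Upper-tail probability of the chi-square distribution, df = 2 only."""
    return math.exp(-statistic / 2)

def critical_value_df2(alpha):
    """Smallest chi-square value that is significant at level alpha, df = 2 only."""
    return -2 * math.log(alpha)

statistic = 8.754209  # chi-square value calculated in this lesson
alpha = 0.05          # significance level

critical = critical_value_df2(alpha)
p_value = p_value_df2(statistic)
reject_null = statistic > critical

print(round(critical, 3))  # 5.991
print(round(p_value, 4))
print(reject_null)         # True
```

Both routes lead to the same decision: the calculated value is above the cutoff, and the p-value is below 0.05.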
That’s it for today. Tomorrow, we will discuss another statistical concept called linear regression and finish our course.
See you soon,
Statistics by David Freedman, Robert Pisani, Roger Purves