Degrees of Freedom
Episode #7 of the course Introduction to statistics by Polina Durneva
Today, we will discuss degrees of freedom and their importance in various statistical tests.
Informal Explanation of Degrees of Freedom
Consider the following example. You have five countries that you would like to visit in the future: Italy, Australia, Japan, Brazil, and France. After carefully reviewing your budget, you estimate that you can afford to visit one new country per year, which means you will visit all five countries in five years.
In the first year, you decide to go to Japan, which leaves four options for the remaining four years. In the second year, you go to Australia, leaving three options for the next three years. In the third year, you go to Italy, leaving two options for the last two years. Once you choose a country for the fourth year, you no longer have any choice: if you go to Brazil in the fourth year, you can only go to France in the fifth year, and if you go to France in the fourth year, you can only go to Brazil in the fifth. Thus, you are free to choose your destination only during the first four years, or 5 – 1 = 4 years. This is the main idea behind degrees of freedom.
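The short Python sketch below restates the travel example. It assumes destinations are picked at random (in the example they are chosen deliberately, which does not change the count), records how many options remain each year, and counts a year as a free choice only when more than one option is left. The function name `plan_trips` is hypothetical.

```python
import random

# The five countries from the example above.
countries = ["Italy", "Australia", "Japan", "Brazil", "France"]

def plan_trips(options, seed=None):
    """Hypothetical helper: visit every country once, one per year, and
    record how many options were available when each choice was made."""
    rng = random.Random(seed)
    remaining = list(options)
    plan = []
    for year in range(1, len(options) + 1):
        n_options = len(remaining)
        choice = rng.choice(remaining)  # when only one option is left, the choice is forced
        remaining.remove(choice)
        plan.append((year, choice, n_options))
    return plan

plan = plan_trips(countries, seed=1)
for year, country, n_options in plan:
    print(f"Year {year}: {country} (chosen from {n_options} option(s))")

# A year counts as a free choice only if more than one option was left.
free_choices = sum(1 for _, _, n in plan if n > 1)
print("Free choices:", free_choices)  # 5 - 1 = 4, whatever the seed
```

Whatever order the countries come out in, the number of available options always runs 5, 4, 3, 2, 1, so only the first four years involve a real choice.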
In statistical analysis, the degrees of freedom are the number of values in a calculation that are free to vary. In the simplest case (a single sample whose mean is fixed), they are calculated using the formula df = n – 1, where n is the number of observations.
Statistical Explanation of Degrees of Freedom
For a single sample, you calculate the degrees of freedom by subtracting one from the total number of observations in your dataset. But can we use a statistical example to better understand what the degrees of freedom mean?
Let’s assume you have a sample dataset of 15 values. You do not know what these values are, but you do know that their average is 45. The sum of all values equals the number of observations times their average, or 15 * 45 = 675. What are these 15 values? We can choose the first 14 values freely, but once they are chosen, the 15th value is fixed. For instance, the first 14 values can be 2, 7, 9, 14, 15, 23, 27, 31, 39, 42, 50, 50, 56, and 80. Their sum is 2 + 7 + 9 + 14 + 15 + 23 + 27 + 31 + 39 + 42 + 50 + 50 + 56 + 80 = 445, so the last value in our sample must be 675 – 445 = 230. If we chose a different set of 14 values, the 15th value would be determined in the same way. Thus, for our 15 values, or observations, we have 15 – 1 = 14 degrees of freedom.
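The arithmetic of this example can be checked with a few lines of Python. This is a minimal sketch; the helper name `forced_last_value` is hypothetical and simply computes the one value that keeps the mean at 45 once the other 14 values are chosen.

```python
def forced_last_value(free_values, mean):
    """Hypothetical helper: given n - 1 freely chosen values and a fixed
    mean, return the only last value that keeps the mean unchanged."""
    n = len(free_values) + 1        # total number of observations
    return n * mean - sum(free_values)

# The 14 freely chosen values from the example.
free_values = [2, 7, 9, 14, 15, 23, 27, 31, 39, 42, 50, 50, 56, 80]

last_value = forced_last_value(free_values, 45)
sample = free_values + [last_value]

print(sum(free_values), last_value)   # 445 230
print(sum(sample) / len(sample))      # 45.0
print("df =", len(sample) - 1)        # df = 14

# Any other 14 values work too; the 15th is again forced.
print(forced_last_value([45] * 14, 45))  # 45
```

Whichever 14 values we pick, only the last one is determined by the fixed mean, which is why 14 of the 15 observations are free to vary.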
Usage of Degrees of Freedom
We can use degrees of freedom for statistical inference in t-tests, Chi-Square tests, and regressions. Statisticians use t-tests to test whether the mean of a population differs from a certain value. Chi-Square tests are most commonly used to test whether two qualitative (categorical) variables are related. Finally, regressions are used to estimate relationships between a response variable and one or more quantitative or qualitative variables, and to test whether those relationships are significant.
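As a rough preview, the sketch below lists the usual textbook degrees-of-freedom formulas for the most common form of each method: a one-sample t-test, a Chi-Square test of independence on a contingency table, and the residual degrees of freedom of a linear regression with an intercept. Other variants (for example, Welch's two-sample t-test or a Chi-Square goodness-of-fit test) use different formulas, and the function names are hypothetical.

```python
def df_one_sample_t(n):
    """One-sample t-test: n observations, one estimated mean."""
    return n - 1

def df_chi_square_independence(rows, cols):
    """Chi-Square test of independence on a rows x cols table:
    the row and column totals constrain the cell counts."""
    return (rows - 1) * (cols - 1)

def df_regression_residual(n, p):
    """Linear regression with an intercept and p predictors:
    n observations minus the p + 1 estimated coefficients."""
    return n - p - 1

print(df_one_sample_t(15))               # 14
print(df_chi_square_independence(2, 3))  # 2
print(df_regression_residual(15, 2))     # 12
```

In each case the idea is the same as in the examples above: start from the number of values and subtract one for every quantity that is estimated from, or fixed by, the data.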
We will discuss each of these statistical methods in the following lessons.