Battle of the Inference: Inference or Prediction?

06.02.2017

Episode #7 of the course An introduction to data science by Roger Peng


In this lesson, you’ll learn to distinguish between inferential and prediction questions. Knowing which of the two you are answering matters, because the type of question can greatly influence the modeling strategy you pursue. If you do not clearly understand which type of question you are asking, you may end up using the wrong modeling approach and ultimately draw the wrong conclusions from your data. The purpose of this lesson is to show you what can happen when you confuse one question for another. The key things to remember are:

  1. For inferential questions, the goal is typically to estimate an association between a predictor of interest and the outcome. There are usually only a handful of predictors of interest (or even just one); however, there are typically many potential confounding variables to consider. The key goal of modeling is to estimate an association while making sure you appropriately adjust for any potential confounders. Often, sensitivity analyses are conducted to see if associations of interest are robust to different sets of confounders.

  2. For prediction questions, the goal is to identify a model that best predicts the outcome. Typically we do not place any a priori importance on the predictors, so long as they are good at predicting the outcome. There is no notion of “confounder” or “predictors of interest” because all predictors are potentially useful for predicting the outcome. Also, we often do not care about “how the model works” or telling a detailed story about the predictors. The key goal is to develop a model with good prediction skill and estimate a reasonable error rate from the data.
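To make the contrast concrete, here is a minimal sketch in Python on simulated data. Everything in it is an assumption made for illustration, not part of the lesson: the variable names (`x` as the predictor of interest, `z` as a confounder), the simulated coefficients, and the small `ols` helper. The first half treats the problem as an inferential question and compares the estimated association for `x` with and without adjusting for `z`. The second half treats the same data as a prediction question and only asks which model has the lower error on held-out data.

```python
import math
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def rmse(X, y, beta):
    """Root mean squared prediction error of coefficients beta on (X, y)."""
    errs = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
    return math.sqrt(sum(e * e for e in errs) / len(errs))


# Simulated data (hypothetical): z confounds the x -> y association.
# By construction, the association of x with y given z is 1.0.
rng = random.Random(0)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

X_small = [[1.0, xi] for xi in x]                   # intercept + x
X_full = [[1.0, xi, zi] for xi, zi in zip(x, z)]    # intercept + x + z

# Inferential question: what is the association between x and y?
# The coefficient on x is the quantity of interest; z is adjusted for.
unadjusted = ols(X_small, y)[1]
adjusted = ols(X_full, y)[1]
print("x coefficient, unadjusted:", round(unadjusted, 2))
print("x coefficient, adjusted for z:", round(adjusted, 2))

# Prediction question: which model predicts y best on unseen data?
# Coefficients are not interpreted; only the held-out error matters.
split = 1500
rmse_small = rmse(X_small[split:], y[split:], ols(X_small[:split], y[:split]))
rmse_full = rmse(X_full[split:], y[split:], ols(X_full[:split], y[:split]))
print("held-out RMSE, x only:", round(rmse_small, 2))
print("held-out RMSE, x and z:", round(rmse_full, 2))
```

In the inferential half, leaving out `z` changes the answer itself, because the unadjusted coefficient absorbs part of the effect of the confounder. In the prediction half, `z` is just another predictor that is kept because it lowers the held-out error.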

In any data analysis, you want to ask yourself, “Am I asking an inferential question or a prediction question?” This should be cleared up before any data are analyzed, as the answer to the question can guide the entire modeling strategy. Framing the question right and applying the appropriate modeling strategy can play a large role in the kinds of conclusions you draw from the data.


Tomorrow, you’ll learn about the principles of interpreting your results.


Recommended book

“Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking” by Foster Provost, Tom Fawcett

