How to Clean Your Survey Data

26.07.2018

Episode #7 of the course How to conduct a market research survey by Nick Freiling


After your survey is fielded (which we covered yesterday), it’s time to clean your survey data. Here’s how it’s done!

Cleaning data is the process of making sure you keep only those responses you can count on being true reflections of your target market’s feelings. This means discarding responses that trip any of the data quality flags described below.

To clean your data, download your survey dataset into Excel (a basic familiarity with Excel formulas is all you need to look for these flags). Once you’ve identified the responses you want to discard, note their unique IDs (usually assigned by the survey platform) and filter them out of the reports your survey platform generates.
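If you’d rather script this step than do it in Excel, here’s a minimal Python sketch. It assumes your platform exports a CSV with one row per respondent and a unique ID column; the column name ResponseID and the sample rows are hypothetical, so substitute whatever your export actually uses.

```python
import csv
import io


def load_responses(csv_text):
    """Parse a CSV export into a list of dicts, one per respondent."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def drop_discarded(rows, discard_ids, id_column="ResponseID"):
    """Keep only rows whose unique ID is not on the discard list.

    id_column is a hypothetical name; use your platform's ID column.
    """
    discard = set(discard_ids)
    return [row for row in rows if row[id_column] not in discard]


# Hypothetical export with three respondents
export = """ResponseID,Q1
R1,Yes
R2,No
R3,Yes
"""
rows = load_responses(export)
kept = drop_discarded(rows, ["R2"])
print([r["ResponseID"] for r in kept])  # ['R1', 'R3']
```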


Speeders

These are respondents who took your survey too quickly. You identify them by comparing each respondent’s completion time with the median completion time for your survey.

A rule of thumb is to discard responses from anyone who completed your survey in less than half the median time. There are exceptions, such as when your survey includes a logic branch that sends certain respondents through just a few questions. But in general, anyone finishing more than twice as fast as the median respondent probably sped through the survey without giving the questions much thought.

You can identify speeders by downloading your survey data into Excel, then subtracting the “time started” from the “time completed” for each respondent. Most survey platforms I’ve used record both timestamps. If yours doesn’t, you may have to skip this flag, but be sure to check for the following two.

In general, no more than 10% of your survey sample should be discarded for speeding.
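Here’s the same calculation as a minimal Python sketch (in Excel, it’s a duration column plus a MEDIAN formula). It assumes each response comes with a start and a completion timestamp in ISO 8601 format; the IDs and times below are made up for illustration.

```python
from datetime import datetime
from statistics import median


def find_speeders(responses, fraction=0.5):
    """Return IDs of respondents who finished in less than fraction * median time.

    responses: list of (response_id, time_started, time_completed) tuples,
    with timestamps as ISO 8601 strings (an assumption about the export format).
    """
    durations = {}
    for rid, started, completed in responses:
        # Duration = time completed minus time started, in seconds
        delta = datetime.fromisoformat(completed) - datetime.fromisoformat(started)
        durations[rid] = delta.total_seconds()
    cutoff = fraction * median(durations.values())
    return [rid for rid, secs in durations.items() if secs < cutoff]


responses = [
    ("R1", "2018-07-26T10:00:00", "2018-07-26T10:10:00"),  # 600 s
    ("R2", "2018-07-26T10:00:00", "2018-07-26T10:12:00"),  # 720 s
    ("R3", "2018-07-26T10:00:00", "2018-07-26T10:04:00"),  # 240 s
    ("R4", "2018-07-26T10:00:00", "2018-07-26T10:11:00"),  # 660 s
    ("R5", "2018-07-26T10:00:00", "2018-07-26T10:09:00"),  # 540 s
]
speeders = find_speeders(responses)
print(speeders)  # ['R3']  (median is 600 s, so the cutoff is 300 s)
share = len(speeders) / len(responses)
print(f"{share:.0%} flagged as speeders")  # 20% flagged as speeders
```

With a real sample, a share well above 10% is a hint to double-check for logic branches that legitimately shorten the survey for some respondents.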


Flatliners

These are people who picked the same answer to every (or nearly every) multiple-choice question in your survey, or typed the same answer into a series of open-ended questions. For example, say you asked the four open-ended price questions of the Van Westendorp question set we discussed in a previous lesson. A flatliner gave the same answer to all four (say, $10).
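A minimal sketch of a flatliner check, assuming each respondent’s answers sit in a dict keyed by column name. The Van Westendorp column names below are hypothetical, and the min_share parameter is my own addition: 1.0 flags only respondents who gave the identical answer to every question in the set, while a lower value such as 0.75 also catches those who gave it to most of them.

```python
from collections import Counter


def find_flatliners(rows, question_columns, min_share=1.0, id_column="ResponseID"):
    """Return IDs of respondents whose most common answer covers at least
    min_share of the listed questions (1.0 = identical answer to all of them)."""
    flagged = []
    for row in rows:
        answers = [row[col] for col in question_columns]
        # Number of questions that received the respondent's most frequent answer
        top_count = Counter(answers).most_common(1)[0][1]
        if top_count / len(answers) >= min_share:
            flagged.append(row[id_column])
    return flagged


# Hypothetical column names for the four Van Westendorp price questions
PRICE_QUESTIONS = ["too_cheap", "bargain", "expensive", "too_expensive"]
rows = [
    {"ResponseID": "R1", "too_cheap": "5", "bargain": "8", "expensive": "15", "too_expensive": "20"},
    {"ResponseID": "R2", "too_cheap": "10", "bargain": "10", "expensive": "10", "too_expensive": "10"},
    {"ResponseID": "R3", "too_cheap": "10", "bargain": "10", "expensive": "10", "too_expensive": "25"},
]
print(find_flatliners(rows, PRICE_QUESTIONS))                  # ['R2']
print(find_flatliners(rows, PRICE_QUESTIONS, min_share=0.75))  # ['R2', 'R3']
```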

If you notice that more than 10% of your survey respondents are flagged as flatliners, take a closer look at the questions you’re including in your scan for flatliners. Some respondents may only appear to be flatlining when they are, in fact, giving honest answers. This can come down to the way you’ve asked your questions: for example, if student, 18-22 years, unmarried, and no kids are all the first answer options to your questions about employment, age, marital status, and children, a young, single student will honestly pick the first option every time. You can only determine this by looking at respondents’ individual answers, but don’t bother with this if less than 10% of your survey sample is flatlining.
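One way to see which questions are driving the flags is to compute the flatlining rate separately for each set of questions you scan. This sketch assumes answers are coded by option position (1 = first option listed); the question sets and the four respondents are hypothetical.

```python
DEMOGRAPHICS = ["employment", "age", "marital_status", "children"]
PRICING = ["too_cheap", "bargain", "expensive", "too_expensive"]


def is_flatlined(row, columns):
    """True if the respondent gave one identical answer across all columns."""
    return len({row[col] for col in columns}) == 1


def flatline_rates(rows, question_sets):
    """Share of respondents flatlining on each named set of questions."""
    return {
        name: sum(is_flatlined(row, cols) for row in rows) / len(rows)
        for name, cols in question_sets.items()
    }


def make_row(demo, price):
    """Build one respondent's answers (demographics coded by option position)."""
    return {**dict(zip(DEMOGRAPHICS, demo)), **dict(zip(PRICING, price))}


rows = [
    make_row((1, 1, 1, 1), (5, 8, 15, 20)),  # young single student: honest first options
    make_row((1, 1, 1, 1), (5, 8, 12, 20)),
    make_row((2, 3, 2, 2), (4, 7, 12, 18)),
    make_row((3, 4, 2, 1), (6, 9, 14, 22)),
]
rates = flatline_rates(rows, {"demographics": DEMOGRAPHICS, "pricing": PRICING})
over = [name for name, rate in rates.items() if rate > 0.10]
print(rates)  # {'demographics': 0.5, 'pricing': 0.0}
print(over)   # ['demographics']
```

Here only the demographic set crosses 10%, which points to answer-option ordering rather than careless respondents, so those rows deserve the individual look described above rather than automatic deletion.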


Gibberish and Contradictory Answers

These types of responses can be harder to spot. They require you to look, line by line, at answers to open-ended questions in order to identify ones that 1) are gibberish (e.g., dk3i8sw) and/or 2) don’t correspond to other answers in that row. For example, if someone says they are single at the beginning of the survey, then mentions their “wife” or “husband” in a later open-ended question, delete that respondent. They are not being honest, and you want data that you can stand on.
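Automated checks can’t replace reading the answers, but a couple of crude heuristics can shortlist rows for that line-by-line review. The rules below (letters and digits jumbled together, long words with no vowels, a single respondent mentioning a spouse) are my own assumptions for illustration, and they will produce false positives: an answer like “mp3” or “4k” trips the letter-digit rule. The field names are hypothetical.

```python
import re

# Hypothetical heuristics: they only shortlist rows for manual review
LETTER_DIGIT_MIX = re.compile(r"[a-z]\d|\d[a-z]", re.IGNORECASE)
SPOUSE_WORDS = re.compile(r"\b(wife|husband)\b", re.IGNORECASE)
VOWEL = re.compile(r"[aeiouy]", re.IGNORECASE)


def looks_like_gibberish(text):
    """Flag text mixing letters and digits, or containing a 5+ letter word with no vowels."""
    if LETTER_DIGIT_MIX.search(text):
        return True
    words = re.findall(r"[A-Za-z]+", text)
    return any(len(word) >= 5 and not VOWEL.search(word) for word in words)


def contradicts_marital_status(marital_status, open_answer):
    """Flag respondents who said they are single but mention a spouse."""
    return marital_status.lower() == "single" and bool(SPOUSE_WORDS.search(open_answer))


rows = [
    {"ResponseID": "R1", "marital_status": "Single", "open_answer": "I'd use it on my commute"},
    {"ResponseID": "R2", "marital_status": "Married", "open_answer": "dk3i8sw"},
    {"ResponseID": "R3", "marital_status": "Single", "open_answer": "My wife would love this"},
]
to_review = [
    r["ResponseID"]
    for r in rows
    if looks_like_gibberish(r["open_answer"])
    or contradicts_marital_status(r["marital_status"], r["open_answer"])
]
print(to_review)  # ['R2', 'R3']
```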

Typically, your data cleaning shouldn’t result in discarding more than 15% of your responses. If you’re worried you’re throwing out too many, take a closer look at the ones you’re throwing out. Consider keeping a few that show other signs of being good, honest answers, or those that were flagged by only one of the three criteria above.
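If you’d like to make the 15% check mechanical, here’s a sketch of one possible policy, assuming you’ve already collected the IDs flagged by each of the three criteria: discard everyone who was flagged, but if that would exceed 15% of the sample, keep the respondents flagged by only one criterion. The function name and this exact policy are hypothetical; the guidance above leaves the final call to your judgment.

```python
from collections import Counter


def plan_discards(all_ids, flag_lists, max_share=0.15):
    """Decide which respondents to discard from per-criterion flag lists.

    flag_lists maps a criterion name to the IDs it flagged. If discarding every
    flagged respondent would exceed max_share of the sample, keep respondents
    flagged by only one criterion (one possible policy, not the only one).
    """
    # Number of distinct criteria that flagged each respondent
    counts = Counter(rid for ids in flag_lists.values() for rid in set(ids))
    discard = set(counts)
    if len(discard) / len(all_ids) > max_share:
        discard = {rid for rid, n in counts.items() if n >= 2}
    return sorted(discard)


all_ids = [f"R{i:02d}" for i in range(1, 21)]  # 20 hypothetical respondents
flags = {
    "speeders": ["R03", "R07"],
    "flatliners": ["R07", "R11"],
    "gibberish_or_contradiction": ["R15"],
}
# 4 of 20 flagged = 20% > 15%, so only the respondent flagged twice is discarded
print(plan_discards(all_ids, flags))  # ['R07']
```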

Next, we’ll explore how to analyze your data to extract true insights—ones that will empower you to find investors and launch a product that consumers love.



