How Important is Online Research Data Quality, Anyway?

As part of the TrueSample Product Strategy team, I’m constantly looking ahead, trying to conceptualize what the next generation of data quality technology and solutions will look like. I’m thinking about which changes within our industry (like mobile surveys or social media sampling) will create new data quality concerns that our TrueSample product roadmap will need to address creatively. And I’m interacting with end clients who are committed to data quality and have already made it a critical, integral component of their research operations.

But occasionally I am reminded that not everyone is ready to move forward on the data quality issue. In my mind, we’ve solved the biggest burning issues with data quality: we’ve found a way to decrease the risk of making misinformed business decisions by removing fake, duplicate and unengaged panelists from research. And we’ve even invented ways to improve survey design so that it doesn’t drive well-intentioned respondents to bad behavior when responding to surveys. But I know that not everyone agrees we’re ready to move into a new phase of research quality, and it is with them in mind that I write this blog post.

Here are three phrases I occasionally hear from those folks; let’s call them the “non-believers”:

1. “Quality isn’t really an issue with my research”
2. “Removing fake, duplicate or unengaged respondents doesn’t necessarily improve data quality anyway”
3. “Paying more for a data quality solution doesn’t make sense for me”

Let me address each one of these in turn:

Objection #1: “Quality isn’t really an issue with my research.” This is the phrase that shocks me the most. Data quality became a concern for many big research buyers way, way back in 2006. With statistics from The ARF, comScore and Peanut Labs clearly indicating that the universe of online survey takers is not nearly as big as we once thought, it became clear that none of us were surveying the people we thought we were. In addition, vocal buyers started citing research projects that went terribly awry, leading to millions of dollars lost to bad business decisions. And if that isn’t enough to convince folks that quality may be a concern with online surveys, there are the cold, hard facts:

a. Typically, 18-21% of the panelists in a given panel provide names and addresses that cannot be found in the real world*
b. On average, 3-5% of the panelists on any one panel are duplicates, meaning they joined the same panel multiple times using different identity information*
c. If you are surveying respondents from more than one panel, approximately 23% of the panelists will overlap*
d. Approximately 1-3% of people in a given survey speed or straightline through the survey* (a toy version of this last check is sketched below)

Still think that your research is immune to fake, duplicate or unengaged respondents?
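
To make that last statistic concrete, here’s a minimal sketch of what a speeder/straightliner check might look like. This is purely illustrative, not how TrueSample actually works; the pandas layout, the column names, and the 60-second cutoff are all assumptions invented for the example.

```python
# Illustrative sketch only -- not TrueSample's actual method.
# Assumes a pandas DataFrame with one row per respondent, a
# 'duration_seconds' column, and grid questions named q1..q5.
import pandas as pd

def flag_unengaged(df: pd.DataFrame,
                   grid_cols: list,
                   min_seconds: float = 60.0) -> pd.DataFrame:
    """Flag likely speeders and straightliners (hypothetical thresholds)."""
    out = df.copy()
    # Speeders: finished implausibly fast relative to a fixed cutoff.
    out["is_speeder"] = out["duration_seconds"] < min_seconds
    # Straightliners: gave the identical answer to every grid item.
    out["is_straightliner"] = out[grid_cols].nunique(axis=1) == 1
    out["is_unengaged"] = out["is_speeder"] | out["is_straightliner"]
    return out

# Example usage with made-up data:
survey = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "duration_seconds": [45, 310, 280],
    "q1": [5, 4, 3], "q2": [5, 2, 3], "q3": [5, 5, 3],
    "q4": [5, 1, 3], "q5": [5, 3, 3],
})
flagged = flag_unengaged(survey, ["q1", "q2", "q3", "q4", "q5"])
print(flagged[["respondent_id", "is_unengaged"]])
```

Real systems use far more signals than a single time cutoff, but even this crude version catches the two respondents above who answered every grid item identically.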

Objection #2: “Removing fake, duplicate or unengaged respondents doesn’t necessarily improve data quality anyway.” Some “non-believers” question whether fake, duplicate and unengaged respondents actually provide unreliable data. They might say that these respondents’ answers are just as useful and legitimate as the responses from those whom TrueSample considers “valid.”

In reply, I offer two arguments. First, there’s the common-sense argument: if respondents are lying about their identities, they are probably lying in response to survey questions as well. And if respondents are speeding through surveys, they are not reading the questions thoroughly or providing thoughtful responses.

Then there’s the more scientific argument, based on evidence. MarketTools has conducted multiple studies showing that the responses from fake, duplicate and unengaged respondents differ from the responses of other respondents. The studies showed that respondents who were invalidated by TrueSample consistently gave higher ratings than the mean. They also showed that if 30% of the respondents taking your survey are invalid, you run twice the risk of making the wrong decision from the data you collect compared with removing those fake, duplicate and unengaged respondents first. [See the white paper: What Impact do Bad Respondents have on Business Results?]
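
The white paper has the actual figures; to illustrate the mechanism rather than reproduce those findings, here’s a toy simulation in which all the distributions and sample sizes are invented for the sketch. Valid respondents slightly prefer concept A, invalid respondents rate everything high and essentially at random, and we count how often a study ends up crowning the wrong concept.

```python
# Back-of-the-envelope simulation -- every distribution below is
# made up for illustration; the real figures come from the
# MarketTools studies cited above.
import random

def observed_winner(n: int, invalid_share: float) -> str:
    """Pick the concept with the higher observed total rating."""
    a_scores, b_scores = [], []
    for _ in range(n):
        if random.random() < invalid_share:
            # Invalid respondents: inflated, meaningless ratings.
            a_scores.append(random.uniform(4, 5))
            b_scores.append(random.uniform(4, 5))
        else:
            # Valid respondents: genuinely prefer concept A on average.
            a_scores.append(random.gauss(3.6, 0.8))
            b_scores.append(random.gauss(3.4, 0.8))
    return "A" if sum(a_scores) > sum(b_scores) else "B"

def wrong_decision_rate(invalid_share: float, trials: int = 2000) -> float:
    """Fraction of simulated studies that pick the wrong concept (B)."""
    wrong = sum(observed_winner(200, invalid_share) == "B"
                for _ in range(trials))
    return wrong / trials

random.seed(42)
for share in (0.0, 0.30):
    print(f"{share:.0%} invalid -> wrong decision "
          f"{wrong_decision_rate(share):.1%} of the time")
```

Even in this crude setup, diluting the valid signal with noise multiplies the chance of picking the wrong concept, which is exactly the dynamic the studies measured.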

That’s enough proof for me.

Objection #3: “Paying more for a data quality solution doesn’t make sense for me.” I suppose this statement makes sense if it comes from someone whose research does not inform large business decisions or carry significant financial impact. I mean, if you’re just surveying your co-workers to find out whether they like your new “2011 cats in hats calendar,” then maybe you’re OK with unreliable data. But for anyone whose reputation, budget or job depends on the quality and reliability of his or her research results, this is a no-brainer. You get what you pay for. And if you’re not willing to assume twice the risk of making a wrong decision, then you definitely need a data quality solution in place.

Maybe it sounds too simple, but I firmly believe that if we use data quality technology to remove fake, duplicate and unengaged respondents from our research, then we have three fewer things to worry about. We shouldn’t accept anything less than “clean” research, and once we achieve that, we can move on to focus on bigger and better issues.
