As marketers, we talk a lot about testing and experimentation.
When it comes to testing, we all like to think we’re doing it scientifically, but are we? Or are we just kidding ourselves with mere pseudo-science to bring a feeling of validity to what we’re doing?
In most cases, sadly, I believe it to be the latter.
Let me take you back.
About 5 years ago, I was writing a piece for a very well-respected conversion rate optimisation blog talking about some tests that we’d run in Google Ads.
We’d made every effort to test as scientifically as possible. We had changed a single variable. We tested to statistical significance, setting a defined sample size at which we would conclude our test.
We’d even created a script to help us split the traffic more evenly.
So far, so good. However, once the testing boffins read my post, it’s safe to say they had a few choice things to say about my supposed “scientific” testing approach.
Their critique largely focused on the sample becoming polluted and not controlling all of the variables.
More on that in a moment.
It was pretty humbling, but it really got me thinking not just about how we set up tests, but the context in which the tests are run.
If we think back to science lessons at school, whenever you’d run an experiment, you had a control and a test variable.
The number one rule with any test was that you changed just one thing; that way, you could be sure that any change in the outcome was caused by the variable you altered.
The problem with ad testing is that the context in which the ad is shown changes, and there is very little you can do to control that.
If you’re showing ads in Google’s search results, every time an auction happens, different results are returned.
If you’re showing your ads across the country, there will be different competitors in different regions, so the ads that you’re appearing against will change depending on the location of the searcher.
The offers that your competitors are running will affect your performance. If a competitor comes in with a much better offer or discount, your CTR will drop significantly, which would have nothing to do with the change you’ve made to your ad copy.
As different competitors bid differently throughout the day, your ad will appear in different positions, affecting your CTR and the traffic you receive.
These external factors result in a huge amount of sample pollution, rendering your scientific testing approach risky at best and, at worst, useless.
Marketers apply rigorous statistical methods like significance testing and power analysis to polluted samples, giving them a fantastically accurate determination of a completely inaccurate data set.
Something you’d hardly call scientific.
What is the solution if you want to keep on testing and reduce the variables as much as possible?
The best solution comes from the world of conversion rate optimisation. To reduce the impact of external pollution, such as a competitor changing an offer, you should keep the testing period as short as possible.
You still want to generate a lot of data to conclude the test with some certainty. However, in reality, this restricts the number of people who can run ad tests to those with high-volume accounts with a lot of click and conversion data.
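To make that volume requirement concrete, here's a rough sketch of the standard two-proportion sample-size calculation for a CTR test. This isn't from the original piece, and the 95% confidence and 80% power thresholds are assumed defaults, but it illustrates why low-volume accounts struggle:

```python
from math import ceil

def sample_size_per_ad(p_control, p_variant, z_alpha=1.96, z_beta=0.8416):
    """Approximate impressions needed per ad to detect a CTR change,
    using a two-proportion z-test (defaults: 95% confidence, 80% power)."""
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = (p_variant - p_control) ** 2
    return ceil((z_alpha + z_beta) ** 2 * variance / effect)

# Detecting a lift from 5% to 6% CTR needs roughly 8,000+ impressions
# per ad before the test can be concluded with confidence.
needed = sample_size_per_ad(0.05, 0.06)
```

Note that halving the detectable effect roughly quadruples the required sample, which is why only high-volume accounts can conclude tests quickly enough to outrun sample pollution.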
To help you keep your testing period as short as possible, here are some things I've picked up that you can use:
Use consolidated ad structures: structures that drive a lot of volume through a smaller number of ads, allowing you to gather data faster and test with more certainty.
Run your tests in the highest-volume areas of your account. For example, if you run local campaigns, choose your highest-volume locations.
When it comes to being scientific in your marketing testing, controlling the context in which the ads are shown and the variables that they create is vitally important.
Like conversion rate optimisation, scientific ad testing is really reserved for advertisers with large volumes of data. They can run tests over very short periods, gathering enough data to draw a confident conclusion while limiting the amount of pollution in their sample set.
In reality, for those with a small amount of click volume (if testing CTR) or conversion volume (if testing towards revenue or CVR), testing in a truly scientific manner is impractical because the sample becomes too polluted by external factors.
So is scientific testing actually scientific?
Coming back to the question, then: in most cases, claiming that marketing testing is scientific isn't strictly true, at least not by the hard sciences' standards of experimentation.
But maybe it can hold up against the standards of the "soft sciences".
Psychology is a science, but it’s often called a “soft science”. One of the reasons is the methodology.
Compare it to, say, physics or chemistry, a “hard science”. If we conduct an experiment to determine how iron rusts, we know that each piece of iron will be the same. Thus, the experiment can be rigorous and can also be replicated.
Within psychological experiments, much like in marketing, it is impossible to control all the variables, because every individual is a variable in their own right. We all have our own unique qualities, which makes replicating experiments far harder to achieve.
Therefore, we can run tests that would hold up to soft-science standards, and by following the recommendations above and keeping our samples as unpolluted as possible, there is still plenty of success to be had.