The biases quietly shaping your CRO program
Apr 28, 2026 · 3 min · Written by Lori Cantwell

We ran a test where we added a “quantity left” indicator to a specific set of product detail pages, on items where the scarcity was real and the stock needed to move. Fourteen-day run. Two primary metrics defined up front (add-to-cart and click-through to checkout). Three guardrails. A decision threshold agreed with the client before the test went live.
The results came in at a 7% lift on add-to-cart (90% probability to beat control) and 5% on click-through (92%). Guardrails held clean. Revenue across the tested products up 45%. The client called it. The rollout happened.
That test is useful here not because it won, but because of how little room it left for bias to enter. That’s the thing about experimentation bias. Many of the decisions that matter most aren’t made when the data comes in. They’re made weeks earlier, in how the hypothesis is written and what gets agreed before anyone sees a number. By the time you’re reading results, the game is mostly over.
The biases worth watching in a CRO program aren’t always the ones in textbooks. They can be the small, systematic tilts that shape what a program learns over time, things like what gets recorded, what gets rolled out, what gets quietly forgotten about.
Here are the two that I would argue matter most, and where they tend to hide.
Confirmation bias is the one that creeps in most often, and no one is immune. Sometimes it shows up in the obvious way: a hypothesis written so it can only confirm what someone already believes. But the sneakier version happens after the data comes in. A primary metric lands flat, a secondary nudges up, and the secondary gets promoted to the finding without ever having to clear the bar the primary was held to.
Here’s the distinction that, in my opinion, matters more. Learning from a secondary metric is good. Shipping on one based on that alone isn’t. If the secondary is pointing at something more interesting, that’s genuinely useful, but the new belief has to earn its own evidence. It becomes the hypothesis for the next test, with its own primary metric and threshold. It doesn’t get to inherit the current test’s momentum just because something moved.
The quantity-left test above is the clean version of how to keep this out. Everything the result would be judged against (metrics, guardrails, duration, decision threshold) was named before the test went live. There was nowhere for a retrofitted narrative to sneak in.
A quick note on thresholds: 90% probability to beat control is lower than the 95% default many CRO programs use, and it’s worth being direct about why that was the right call in that context rather than pretending the default is a law. Thresholds are a risk decision, not a universal standard. A time-boxed inventory play with aligned guardrails and a client who understood the trade-off is a different risk profile from a permanent sitewide change. The threshold matched the decision being made.
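For readers who want the mechanics behind a number like "90% probability to beat control": in a Bayesian setup it can be estimated by sampling from the posterior of each arm's conversion rate and counting how often the variant comes out ahead. A minimal sketch, assuming Beta(1, 1) priors and illustrative counts (not the actual data from this test):

```python
import numpy as np

rng = np.random.default_rng(42)

def prob_to_beat_control(conv_c, n_c, conv_t, n_t, samples=200_000):
    """Monte Carlo estimate of P(variant rate > control rate),
    using Beta(1, 1) priors on each arm's conversion rate."""
    post_c = rng.beta(1 + conv_c, 1 + n_c - conv_c, samples)
    post_t = rng.beta(1 + conv_t, 1 + n_t - conv_t, samples)
    return (post_t > post_c).mean()

# Illustrative counts: 5.0% control vs 5.4% variant on 10k visitors each.
p = prob_to_beat_control(conv_c=500, n_c=10_000, conv_t=540, n_t=10_000)
print(f"P(beat control) = {p:.2f}")
```

The point of the sketch is the one made above: whatever this function returns, the threshold it gets compared against is a risk decision you make before launch, not a property of the math.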
And that’s really what defeats confirmation bias: discipline about what gets agreed before the test starts. The secondary can still teach you something. It just shouldn’t pretend to be the thing it wasn’t set up to measure.
One of the most underrated and least talked-about biases, in my opinion, is skipping segmentation and letting the overall average stand in for the result. It shapes more decisions than people realize.
Here’s how it plays out. A test reads as neutral overall, so it gets filed as flat and the team moves on. But no one segmented the results. Mobile was up 14% and desktop down 4%, two real signals that change both the result and the business decision. And now no one knows, because the average is what hides them.
Segmenting by device, new versus returning, and acquisition source at minimum turns “the test was neutral” into something actionable roughly a third of the time, in my experience. The overall result is rarely the most interesting thing in the data.
This was a simple example, but segmentation can get quite deep. Overall averages are a starting point, not a finding. To really understand the numbers, you need to segment when you’re doing your analysis.
A lot of conversations about bias in CRO tend to focus on how to read results more carefully. That’s the wrong half of the problem. By the time you’re reading results, almost every decision that bias could influence has already been made.
The programs with the lowest bias exposure tend to share a small set of habits. Hypotheses written before data is reviewed. Primary metrics and thresholds named before the test goes live. Neutral results documented with the same weight as wins. Segments checked by default, not on request.
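One way to make those habits hard to skip is to write the pre-registered plan down as a structure that can’t be quietly edited after launch. A minimal sketch; the field names and example values are illustrative, not a standard:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)  # frozen: the plan can't be mutated after it's written
class TestPlan:
    hypothesis: str
    primary_metrics: list
    guardrails: list
    duration_days: int
    decision_threshold: float  # required P(beat control) on the primaries
    # Segments checked by default, not on request.
    default_segments: list = field(
        default_factory=lambda: ["device", "new_vs_returning", "acquisition_source"]
    )

# Hypothetical plan in the spirit of the quantity-left test described above.
plan = TestPlan(
    hypothesis="A quantity-left indicator on low-stock PDPs lifts add-to-cart",
    primary_metrics=["add_to_cart", "click_through_to_checkout"],
    guardrails=["bounce_rate", "refund_rate", "support_contacts"],
    duration_days=14,
    decision_threshold=0.90,
)
```

Trying to change `plan.decision_threshold` after the fact raises an error, which is the whole point: the bar the results get judged against was set before anyone saw a number.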
None of it is exciting, and some of it will fall through the cracks under time pressure. Experience helps: it’s how we run programs, and after enough of them you start to see these patterns further out and plan around them.
But the real point is that the quality of a CRO program is decided before any test runs, not after. Bias isn’t something you catch in the data. It’s something you design out of the setup.
Building a bias-free CRO program takes more than just discipline. It takes a roadmap. Let’s look under the hood of your current strategy together. Complete our growth assessment to uncover new opportunities for optimization and start making data-driven decisions with total confidence.