AI · Product · June 2026 · 8 min read

Build the clone before you build it

Why every product team should stand up a full AI simulation of a feature first — synthetic users, in-team A/B tests, self-evals — and arrive at the real build already knowing the answers.

Pier Stein

Product · Growth · Investment Products · AI

I keep meeting teams that argue about a feature for three weeks, ship it, and then discover in production all the things they could have known on day two. The artefact they argue over is a document. Nobody can touch it, nobody can use it, so everybody projects their own version of it and calls the disagreement strategy. I have stopped working this way. Before I commit real engineering to anything now, I build a parallel clone of it in simulation, with AI, and I learn everything I can there first. By the time the real build starts, most of the hard questions are already answered.

The clone is the new spec

A specification is a promise about a thing that does not exist yet. A clone is the thing, just disposable. AI has collapsed the cost of standing one up from a quarter's worth of engineering time to an afternoon. I can wire a working facsimile of a feature, with plausible data and real interaction, faster than the team can schedule the meeting to debate it. That changes what a product decision even is. You are no longer reasoning about a description of behaviour; you are reasoning against behaviour you can poke. The quality of the conversation goes up immediately, because everyone is finally arguing about the same object.

This is not a prototype in the old sense, a clickable shell that fakes the depth. The clone runs the actual logic well enough to surprise you. That is the whole point. You want it to do something you did not predict, because that is the edge case you would otherwise have found in production, at the worst possible moment, in front of real users.

You can test extraordinarily well inside it

Once the clone exists, the testing you can do is genuinely better than what most teams manage post-launch. I run A/B tests inside my own team against two versions of the clone and watch which one people actually reach for. I exercise real product judgement against a live thing instead of litigating a hypothetical. Disagreements that would have taken a fortnight of opinion get settled in an hour of use, because the artefact answers back.
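A minimal sketch of how an in-team test like this could be tallied, assuming each teammate's first choice between two clone variants is logged. The `picks` data and the `sign_test_p_value` helper are hypothetical, and the exact two-sided sign test is just one reasonable choice for a sample this small.

```python
from collections import Counter
from math import comb


def sign_test_p_value(wins: int, trials: int) -> float:
    """Exact two-sided sign test against a 50/50 null (no real preference)."""
    k = max(wins, trials - wins)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


# Hypothetical log: the clone variant each teammate reached for first.
picks = ["B", "B", "A", "B", "B", "B", "B", "B"]

counts = Counter(picks)
p_value = sign_test_p_value(counts["B"], len(picks))
print(counts["A"], counts["B"], round(p_value, 3))  # prints: 1 7 0.07
```

With eight people, even a 7 to 1 split is a strong hint rather than proof, which is a good reason to treat the in-team result as a way to settle the argument and to leave the final verdict to real usage.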

The part that still feels slightly unfair is synthetic personas. I build simulated users that represent my real segments (the anxious first-time investor, the lapsed user, the elderly caller who does not trust the phone), run them through the clone, and collect their reactions. On top of that I run self-reflective evals, where the system critiques its own behaviour and flags where it was confusing, slow, or wrong. You get a feedback loop that runs overnight and surfaces a hundred small failures before a single real person has seen the thing.
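The shape of that loop can be sketched in a few lines. Everything here is an assumption made for illustration: the `Persona` and `Step` types, the three-step `clone_flow`, and the rule-based checks, which stand in for what would in practice be a model role-playing each persona and critiquing the flow.

```python
from dataclasses import dataclass


@dataclass
class Persona:
    name: str
    patience_steps: int  # number of steps tolerated before giving up
    knows_jargon: bool   # whether financial jargon is understood


@dataclass
class Step:
    label: str
    uses_jargon: bool


def clone_flow() -> list[Step]:
    """Hypothetical stand-in for the clone: one onboarding flow."""
    return [
        Step("Create account", False),
        Step("Choose an ETF and set a limit order", True),
        Step("Confirm deposit", False),
    ]


def run_persona(persona: Persona, flow: list[Step]) -> list[str]:
    """Walk one persona through the flow and collect the failures it hits."""
    failures = []
    for index, step in enumerate(flow, start=1):
        if index > persona.patience_steps:
            failures.append(f"{persona.name}: abandoned before '{step.label}'")
            break
        if step.uses_jargon and not persona.knows_jargon:
            failures.append(f"{persona.name}: confused by '{step.label}'")
    return failures


def self_eval(flow: list[Step]) -> list[str]:
    """Placeholder self-critique: flag steps that break a simple rubric."""
    return [f"self-eval: '{s.label}' uses jargon" for s in flow if s.uses_jargon]


personas = [
    Persona("anxious first-time investor", patience_steps=3, knows_jargon=False),
    Persona("lapsed user", patience_steps=2, knows_jargon=True),
    Persona("elderly caller", patience_steps=1, knows_jargon=False),
]

flow = clone_flow()
report = [failure for p in personas for failure in run_persona(p, flow)]
report += self_eval(flow)
for line in report:
    print(line)
```

The point of the structure is that personas and rubric live in data, so the same run can be repeated overnight against every new version of the clone.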

The obvious objection, answered honestly

A simulation is not reality. Synthetic users are not real users: they cannot be surprised the way a real human is, and they will happily agree with a bad idea if you build them carelessly. This is true and I will not wave it away. But it misreads what the clone is for. The clone is not there to replace the real launch. It is there to compress learning and de-risk the build. You still ship for real, you still measure real behaviour, you still let the market be the final judge. The difference is that you arrive at the real build already knowing most of the answers, instead of discovering them at full cost with real users as your test subjects.

Put plainly, the synthetic phase is for the failures you can predict if you only bother to look. The real launch is for the failures you genuinely cannot. Spending your launch budget rediscovering the predictable ones is the actual waste.

Where the value really lands: epics and acceptance criteria

Here is the part engineers feel most. When I have lived inside a clone for a few days, I can write an epic that is not a wish but a map. Every edge case I tripped over is a criterion. Every moment the synthetic personas got confused is an acceptance test. Every self-eval failure is a definition of done. The hand-off stops being negotiate-as-you-go and becomes "here is exactly what good looks like, and here is how we will know we got there." That is the difference between a feature that is fine and one that is genuinely amazing, and almost all of it is discovered in simulation rather than in the bug tracker.
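As a rough illustration of that mapping, the sketch below turns simulation findings into Given/When/Then acceptance criteria. The `Finding` type, the `to_criterion` helper, and the three example findings are all invented for this sketch, and the Given/When/Then phrasing is an assumed convention, not one the method depends on.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    source: str    # "edge case", "persona" or "self-eval"
    context: str   # the situation the clone was in
    trigger: str   # what the user or system did
    expected: str  # what good behaviour looks like


def to_criterion(finding: Finding) -> str:
    """Render one simulation finding as a Given/When/Then acceptance criterion."""
    return (
        f"Given {finding.context}, when {finding.trigger}, "
        f"then {finding.expected}. (from {finding.source})"
    )


# Hypothetical findings, one per source named above.
findings = [
    Finding("edge case", "an account with a zero balance",
            "the user places an order", "the order is rejected with a clear reason"),
    Finding("persona", "a first-time investor on the order screen",
            "the term 'limit order' appears", "a plain-language explanation is one tap away"),
    Finding("self-eval", "a portfolio with many holdings",
            "the summary view loads", "the first screen renders without a visible delay"),
]

epic = [to_criterion(f) for f in findings]
for number, line in enumerate(epic, start=1):
    print(f"AC{number}: {line}")
```

Keeping the source on each criterion means anyone reading the epic can trace a requirement back to the moment in the clone that produced it.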

The economics nobody priced in

The reflex objection is that this is extra work up front. It is, and it still makes you faster, because rework is the real tax on software. Building the wrong thing well, then rebuilding it, is the single most expensive thing a team does, and it almost never appears as a line item. When you build to a high standard on the first real try, because you already know what good looks like, you stop paying that tax. You ship better and you ship sooner, which sounds like a contradiction until you have done it, and then it is just obviously how it should always have worked.

This is not theory for me. It is how I build Invest, a Revolut-grade investing simulator, and Call My Agent, a phone AI for the elderly, both solo. A team of one cannot afford rework, cannot afford a fortnight of opinion, cannot afford to learn in production. So I learn in the clone, feed it back, and build the real thing once, properly. The method is not a luxury for big teams with slack to spare. It is the reason one person can ship production-grade AI products at all, and I think within a couple of years it stops being my edge and starts being the baseline everyone is held to.