The Last Mile From Analysis (Answers) to a Data Mart (Assets)

Jun 21, 2026

An analyst’s notebook is not a pipeline — and the gap is wider than it looks

Most useful analysis dies in a notebook. Someone writes a query that answers a real question, the answer ships in a deck, and the query is never seen again. When the same question comes back a month later, someone rewrites it — slightly differently, with a slightly different number.

The fix is well understood: turn that query into a durable asset in a data mart, scheduled, validated, and reusable. The problem is that the distance between “a query that returns the right answer” and “a pipeline asset an organization can depend on” is much larger than it appears, and the people best placed to close it — the analysts who understand the business logic — are usually the least incentivized to do so.

I spent a stretch recently converting a piece of validated analysis into a proper mart table, with an AI coding assistant as a pair. It changed my view of where the real barriers are, what AI actually moves, and — importantly — what it doesn’t.

The technical barrier: a wall made of small cliffs

A working analytical query and a production asset are different kinds of object. Crossing between them means absorbing a stack of concerns that have nothing to do with the question you set out to answer:

One-shot becomes idempotent. A SELECT you run once becomes a write that must produce the same result every time it runs, overwrite cleanly on re-run, and target exactly one partition per run.
Implicit schema becomes an explicit contract. Column names, types, and order stop being incidental and become a contract that has to stay in sync across the SQL, a schema spec, and a DDL definition — with automated checks that fail the build if they drift.
“Run it now” becomes scheduling. You inherit rolling date macros, dev/prod parametrization, and a surprising number of ways to get the date arithmetic subtly wrong.
Standalone becomes dependent. The asset has upstreams, and the platform wants you to declare them — readiness signals, ordering, what waits on what.
“Trust me” becomes CI (Continuous Integration). Naming rules, schema-match checks, dependency checks, catalog generation — a gauntlet your change must pass before anyone will look at it.
One run becomes a backfill. Now you must reason about history: how far back, in what order, and whether re-running is safe.
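The first two cliffs and the last one fit together. A run that owns exactly one partition, checks its output against a declared schema, and overwrites rather than appends is what makes a backfill safe to repeat. The sketch below is a minimal Python illustration with stated assumptions: an in-memory dict stands in for the warehouse table, and EXPECTED_SCHEMA, write_partition, backfill, and compute_rows are hypothetical names. On a real warehouse the overwrite would more likely be a partition-scoped statement (for example Hive or Spark SQL's INSERT OVERWRITE ... PARTITION), and the schema check would run in CI.

```python
from datetime import date, timedelta

# Hypothetical schema contract: (column name, Python type), in declared order.
EXPECTED_SCHEMA = [("entity_id", str), ("ds", date), ("amount", float)]


def check_schema(rows):
    """Fail the run if any row drifts from the declared contract."""
    expected_cols = [name for name, _ in EXPECTED_SCHEMA]
    for row in rows:
        if list(row) != expected_cols:
            raise ValueError(f"column drift: {list(row)} != {expected_cols}")
        for name, typ in EXPECTED_SCHEMA:
            if not isinstance(row[name], typ):
                raise TypeError(f"{name} should be {typ.__name__}")


def write_partition(table, ds, compute_rows):
    """Recompute one partition and replace it wholesale, so re-runs are safe."""
    rows = compute_rows(ds)
    check_schema(rows)
    if any(row["ds"] != ds for row in rows):
        raise ValueError("a run may only write its own partition")
    table[ds] = rows  # overwrite, never append


def backfill(table, start, end, compute_rows):
    """Replay the same write for each day in [start, end], oldest first."""
    day = start
    while day <= end:
        write_partition(table, day, compute_rows)
        day += timedelta(days=1)


def compute_rows(ds):
    # Stand-in for the analyst's query, parameterized by run date.
    return [{"entity_id": "a", "ds": ds, "amount": 10.0}]


table = {}
backfill(table, date(2026, 6, 1), date(2026, 6, 3), compute_rows)
backfill(table, date(2026, 6, 1), date(2026, 6, 3), compute_rows)  # re-running yields the same table
print(len(table), sum(len(rows) for rows in table.values()))  # 3 3
```

Because write_partition replaces the whole partition, running the same backfill twice leaves three partitions of one row each rather than six rows.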

None of these is hard in isolation. Each is a small cliff. Stacked together, they’re a wall — and every one is an opportunity to ship something subtly wrong.
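The date arithmetic from the scheduling cliff is a good example of "subtly wrong." A trailing N-day window that ends on the run date should start N - 1 days earlier, not N; subtracting N quietly yields N + 1 days. The function below is a hypothetical helper, not any platform's date macro, and it assumes an inclusive window. Using real date arithmetic instead of string manipulation also handles month boundaries.

```python
from datetime import date, timedelta


def trailing_window(run_date, days):
    """Return (start, end) for `days` calendar days ending on run_date, both inclusive."""
    if days < 1:
        raise ValueError("days must be >= 1")
    # Subtract days - 1, not days: the run date itself is the first day of the window.
    start = run_date - timedelta(days=days - 1)
    return start, run_date


start, end = trailing_window(date(2026, 3, 1), 7)
print(start, end)  # 2026-02-23 2026-03-01
```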

The incentive barrier: the one nobody puts in the diagram

The technical barrier is the one people talk about. The incentive barrier is the one that actually keeps the asset from getting built.

Analysts are measured on answers and on speed, not on maintainable assets. Productionizing a query is slow, its payoff is deferred, and the credit largely accrues to whoever uses the asset later. The platform itself has a learning curve, and that cost is paid by the individual for a benefit that is mostly collective. The “correct” alternative, handing it to a data engineering team, has its own friction: a queue, a spec, a handoff that loses some of the business context that made the analysis correct, and an analyst demoted to a ticket-filer chasing their own request.

So the rational analyst, most of the time, just keeps the query in the notebook. The asset never gets built. The knowledge stays ephemeral. This is not a failure of diligence; it’s a predictable response to the incentives. The barrier was never only “can they?” — it’s also “is it worth it to them?” And often, honestly, it isn’t.

Where AI moves the line

This is the part that surprised me. The assistant was most valuable not as a code generator but as a translator and navigator — it collapsed the part of the wall that is pure tax.

Boilerplate and compliance. Scaffolding the asset to the repo’s conventions, keeping the schema/DDL/spec in lockstep, generating the dependency wiring, and grinding the change through the CI gates until they were green. The “translation tax” from analysis idiom to engineering idiom dropped sharply.
Platform navigation. The tribal knowledge — how deploys work, which date-macro forms silently misbehave, what a readiness marker actually does — is normally extracted from an engineer’s calendar. Here it was a conversation.
Tight validation loops. Replicate the trusted number, diff, hypothesize the cause of a mismatch, fix, re-run. Fast.
Operational scaffolding. When a backfill ran, the assistant watched the outputs land and flagged — quickly — that something was off.
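The validation loop is mostly bookkeeping, which is why it speeds up so much. Here is a minimal sketch of the "diff" step, assuming both the trusted notebook result and the new mart table have already been reduced to one metric per segment (the segment names and values are made up). Comparing per segment means a mismatch points at where to look, instead of only saying that the totals differ.

```python
import math


def diff_by_segment(trusted, candidate, rel_tol=1e-9):
    """Return {segment: (trusted, candidate)} for segments that disagree or are missing on one side."""
    mismatches = {}
    for seg in sorted(set(trusted) | set(candidate)):
        t, c = trusted.get(seg), candidate.get(seg)
        if t is None or c is None or not math.isclose(t, c, rel_tol=rel_tol):
            mismatches[seg] = (t, c)
    return mismatches


# Illustrative numbers only.
trusted = {"enterprise": 1200.0, "smb": 340.0, "self_serve": 95.0}
candidate = {"enterprise": 1200.0, "smb": 310.0}
print(diff_by_segment(trusted, candidate))
# {'self_serve': (95.0, None), 'smb': (340.0, 310.0)}
```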

The effect on the incentive math is the real story. When the fixed cost of productionizing drops, more analyses clear the bar where benefit exceeds cost. Work that wasn’t worth a handoff becomes worth doing yourself. For localized needs, whether individual or small-team, the analyst can own more of the last mile instead of throwing the work over the wall.

Where AI did not — and could not — carry the weight

Here’s the part I want to be honest about, because the temptation is to stop at the previous section.

Every consequential moment in the project was a matter of judgment and skepticism, not typing — and those were mine to supply.

A backfill “succeeded” and wrote a tidy set of partitions. They were all empty. Understanding why required knowing that the table was anchored on a single point in a synchronized billing cycle, so most calendar dates legitimately have no rows — the empties were correct, not a bug. The assistant helped me run that down quickly, but only after I distrusted a green checkmark and asked the question.
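The lesson generalizes: an empty partition is only a bug relative to an expectation, so it helps to write the expectation down. A minimal sketch, assuming the billing cycle closes on a single hypothetical day of the month (ANCHOR_DAY = 15) and that the backfill reports a row count per partition: empties on non-anchor dates pass, while an empty anchor date or rows on a non-anchor date get flagged.

```python
from datetime import date, timedelta

# Hypothetical: the synchronized billing cycle closes on this day of the month.
ANCHOR_DAY = 15


def audit_partitions(row_counts):
    """Return partitions whose row count contradicts the anchoring assumption."""
    suspicious = []
    for ds, n in sorted(row_counts.items()):
        is_anchor = ds.day == ANCHOR_DAY
        if is_anchor and n == 0:
            suspicious.append((ds, "anchor date with no rows"))
        elif not is_anchor and n > 0:
            suspicious.append((ds, "rows on a non-anchor date"))
    return suspicious


# One month of backfilled partitions; counts are illustrative.
counts = {date(2026, 5, 1) + timedelta(days=i): 0 for i in range(31)}
counts[date(2026, 5, 15)] = 1200
print(audit_partitions(counts))  # [] : 30 empty partitions, all expected

counts[date(2026, 5, 15)] = 0
print(audit_partitions(counts))  # [(datetime.date(2026, 5, 15), 'anchor date with no rows')]
```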

Two tables looked joinable on their partition date. They aren’t — the same entity is anchored on different events in each, so the only correct join key is the entity id, and joining on the date would have silently produced almost nothing. That’s a contract-level fact about the data model, not something to infer from syntax.
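The trap is easy to reproduce with toy data. In the sketch below (all rows hypothetical), one table anchors each account on its signup date and the other on a shared invoice date. Adding the partition date to the join key silently drops every account whose two anchors differ; joining on the entity id alone keeps them all.

```python
# Hypothetical rows: both tables describe the same accounts, anchored on different events.
signups = [
    {"entity_id": "a1", "ds": "2026-05-02"},
    {"entity_id": "a2", "ds": "2026-05-09"},
    {"entity_id": "a3", "ds": "2026-05-15"},
]
invoices = [
    {"entity_id": "a1", "ds": "2026-05-15", "amount": 40.0},
    {"entity_id": "a2", "ds": "2026-05-15", "amount": 25.0},
    {"entity_id": "a3", "ds": "2026-05-15", "amount": 60.0},
]


def inner_join(left, right, key):
    """Hash join: key(row) extracts the join key from a row on either side."""
    index = {}
    for right_row in right:
        index.setdefault(key(right_row), []).append(right_row)
    return [(left_row, match) for left_row in left for match in index.get(key(left_row), [])]


on_entity_and_date = inner_join(signups, invoices, lambda r: (r["entity_id"], r["ds"]))
on_entity = inner_join(signups, invoices, lambda r: r["entity_id"])
print(len(on_entity_and_date), len(on_entity))  # 1 3
```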

An upstream turned out to be a view with no completion signal, which meant the “obvious” way to wire a dependency would have made the job wait forever; the right response depended on understanding the difference between scheduled runs and backfills.

And when I floated a redesign that would have made backfilling trivially easy, the right call was to not do it, because it would have traded away a clean one-row-per-entity contract and invited double-counting. That’s an architectural value judgment.
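The contract that redesign would have traded away can itself be enforced as a check. A minimal sketch, assuming the mart's grain is one row per entity_id (a hypothetical column name): report any key that appears more than once before anything downstream sums over it.

```python
from collections import Counter


def duplicate_keys(rows, key="entity_id"):
    """Return keys that appear more than once, i.e. violations of a one-row-per-entity grain."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


# Illustrative rows: the same entity landed twice.
rows = [
    {"entity_id": "a1", "amount": 40.0},
    {"entity_id": "a2", "amount": 25.0},
    {"entity_id": "a1", "amount": 40.0},
]
print(duplicate_keys(rows))  # ['a1']
print(sum(r["amount"] for r in rows))  # 105.0
```

The second print shows why the grain matters: a1's amount is counted twice in any total built on top of the table.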

The pattern is consistent: AI accelerates generation and investigation; the human supplies the right questions, the skepticism, and the trade-off decisions about contracts, semantics, and scale. The moments that mattered were someone saying “this number looks wrong” or “but these segments don’t behave the same” — and then the tooling helped chase it down.

And note where the guardrails came from. The schema checks, the dependency rules, the review, the conventions the assistant dutifully satisfied — engineers built those. The AI operated within that frame; it did not invent it. Take the frame away and the same assistant will happily generate a swamp of plausible, subtly wrong, unmaintained assets.

Not replacement — redistribution

I don’t think any of this means an organization needs less data engineering. It means the boundary of what an analyst can responsibly own has moved.

The roughly 80% that is boilerplate and platform navigation becomes self-serve. The 20% that is architecture, shared contracts, reliability, and scale stays with engineers, and with the analyst’s own judgment, now better informed because the tooling made it cheap to investigate. The healthy version of this is fewer trivial tickets in the engineering queue, more analyses becoming durable assets, and engineers freed for the genuinely hard platform work that makes analyst self-serve safe in the first place. The better the guardrails, the more an analyst can be trusted to do alone, so investing in the platform matters more, not less.

The failure version is just as easy to reach: hand people a code generator without the guardrails or the skepticism, and you get a lake of confident, broken pipelines that nobody owns. The win is conditional.

So the honest summary is small and, I think, durable. The barrier to turning analysis into engineering was never only technical — it was also about incentives, and AI lowers both. An analyst can now cross more of that last mile on their own. But the bridge still rests on engineering foundations they didn’t build, and on judgment no model supplied for them.

ZhiZhi Gewu
