Data Engineering in the Age of Agentic AI

The landscape of enterprise data is shifting fast — AI agents are rewriting the rules of pipeline engineering, governance, and trust. We publish what we learn building at the frontier so your team can move faster and decide with confidence.

The Foundation Behind Reliable AI Agent Analytics

Haydn Strauss · 4 min read · Analytics · Published June 16, 2026

Anthropic recently published how it runs self-service analytics on Claude. One result caught my eye: context + skills took its analytics agent from 21% accuracy to consistently above 95%.

The write-up makes clear that generating SQL is the easy part. The hard part is everything underneath it: canonical datasets, a semantic layer, lineage, maintained skills, and provenance on every answer.

That jump came from the foundation, not a bigger model. With the context right, the agent on top matters much less.

Why Agents Alone Fail

In addition to cost, three context problems keep coming up.

  • Entity ambiguity. "Active users" or "revenue" has several definitions in the warehouse. The agent picks one and writes correct SQL against the wrong data.

  • Staleness. The definition was right when written. Then the pipeline changed and the skill was never updated.

  • Retrieval failure. The right definition exists somewhere, but the agent can't find it, or grabs the wrong version.

Two of these, staleness and retrieval, can't easily be fixed by prompting alone. They need the context to be a versioned, owned asset wired to the pipeline it describes.
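The post doesn't show how Belvedere stores this context. As a minimal sketch, assuming a simple in-memory registry (the `MetricRegistry` and `MetricDefinition` names, their fields, and the version strings are all hypothetical), here is one way "versioned, owned, and wired to the pipeline" can look: one canonical definition per metric name, an owner to route fixes to, and a check that refuses to serve a definition written against an older pipeline version.

```python
from dataclasses import dataclass


# Hypothetical schema for an owned, versioned metric definition.
@dataclass(frozen=True)
class MetricDefinition:
    name: str
    sql: str
    owner: str
    pipeline_version: str  # pipeline version the SQL was written against


class StaleDefinitionError(Exception):
    """Raised when a definition no longer matches the pipeline it describes."""


class MetricRegistry:
    """Hypothetical registry holding exactly one canonical definition per name."""

    def __init__(self) -> None:
        self._defs: dict[str, MetricDefinition] = {}

    def register(self, definition: MetricDefinition) -> None:
        # Entity ambiguity: refuse a second, competing definition for a name.
        if definition.name in self._defs:
            raise ValueError(f"{definition.name!r} already has a canonical definition")
        self._defs[definition.name] = definition

    def resolve(self, name: str, current_pipeline_version: str) -> MetricDefinition:
        # Retrieval: exact lookup by canonical name, no fuzzy "closest match".
        if name not in self._defs:
            raise KeyError(f"no canonical definition for {name!r}")
        definition = self._defs[name]
        # Staleness: the definition must match the pipeline's current version.
        if definition.pipeline_version != current_pipeline_version:
            raise StaleDefinitionError(
                f"{name!r} targets pipeline {definition.pipeline_version}, "
                f"but the pipeline is at {current_pipeline_version}; "
                f"owner {definition.owner} must update it"
            )
        return definition


if __name__ == "__main__":
    registry = MetricRegistry()
    registry.register(MetricDefinition(
        name="active_users",
        sql="SELECT COUNT(DISTINCT user_id) FROM events",
        owner="growth-analytics",
        pipeline_version="v42",
    ))
    print(registry.resolve("active_users", "v42").owner)
    try:
        registry.resolve("active_users", "v43")
    except StaleDefinitionError as err:
        print(err)
```

The design choice worth noting is failing loudly: a missing or stale definition raises an error instead of letting the agent fall back to a plausible guess, which is the failure mode all three problems above share.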

Anthropic tried the shortcut of handing the agent the raw query corpus, and accuracy barely moved. As they put it: "The information was there, the agent saw it, and it still didn't use it."

From Belvedere Pipeline to Flue Agents: A Skeptical Pick of the 2026 World Cup Winner

Haydn Strauss · 9 min read · Analysis · Published June 9, 2026

We build AI systems for a living. In production today, that means LangGraph wired into Belvedere: governed pipelines, human approvals, audit trails, provenance, etc.

But for this project, I wanted to kick the tires on something new: Flue, the agent framework from the Astro team. I've long been a fan of Astro for web development, so when the team released Flue, I was eager to try it.

Initially I went down the path of adding Flue to background tasks (bug ticket sync, feedback triage, opportunity discovery, etc.), but then decided to build something a little more fun: an 'agentic analyst org' powered by Flue.

It's completely free to use: head to https://www.belvederelabs.ai/ and try it out with your own data.

Flue Analyst Org | Belvedere Labs: drop a CSV and get an analyst's answer in a couple of minutes.

The data: 3,759 international matches assembled by Belvedere

We did not hand-roll the dataset. We built it in Belvedere, the same governed pipeline system we use in production.

The pipeline ("World Cup International Match Dataset") is an eight-node, seven-edge graph on the canvas, assembled by connecting our source API and giving it the following prompt:

"Pull senior men's international football fixtures and per-match statistics from the API-Football REST API, then computes leak-free pre-match features with all available statistics"
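The post doesn't show the feature nodes themselves. As a minimal, stdlib-only sketch of what "leak-free pre-match features" usually means, assuming a simplified `Match` record and a rolling average of goals scored over each team's last five games (both hypothetical choices, not the Belvedere pipeline's actual schema or features), every feature for a fixture is computed only from results dated strictly before it.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from itertools import groupby


# Hypothetical, simplified match record (not the API-Football schema).
@dataclass(frozen=True)
class Match:
    played_on: date
    home: str
    away: str
    home_goals: int
    away_goals: int


def pre_match_features(matches: list[Match], window: int = 5) -> list[dict]:
    """Rolling average goals scored per team, using only matches played on
    dates strictly before each fixture (no same-day or future results)."""
    history: dict[str, list[int]] = defaultdict(list)  # team -> goals per past match
    rows = []

    def avg(team: str) -> float | None:
        recent = history[team][-window:]
        return sum(recent) / len(recent) if recent else None

    ordered = sorted(matches, key=lambda m: m.played_on)
    for _, group in groupby(ordered, key=lambda m: m.played_on):
        same_day = list(group)
        # 1) Build features for every match on this date from past data only.
        for m in same_day:
            rows.append({
                "played_on": m.played_on,
                "home": m.home,
                "away": m.away,
                "home_avg_goals": avg(m.home),
                "away_avg_goals": avg(m.away),
            })
        # 2) Only then fold this date's results into the history.
        for m in same_day:
            history[m.home].append(m.home_goals)
            history[m.away].append(m.away_goals)
    return rows
```

Grouping by date is the subtle part: if results were folded into the history one match at a time, a team that appears twice on the same date could pick up a result that was not yet known at kickoff, which is exactly the kind of leakage the prompt asks the pipeline to avoid.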

OpenTofu vs. Terraform: Choosing an IaC Control Plane for Belvedere

Zech Crannigan · 5 min read · Product · Published June 2, 2026

OpenTofu and Terraform solve nearly the same day-to-day problem. For Belvedere's current stack (AWS, EKS, Kubernetes, Helm, and Git-based CI/CD), the normal workflow is effectively equivalent: write HCL, use providers and modules, plan changes, and apply them through controlled automation.

The meaningful differences are licensing, governance, managed-service alignment, and how each tool handles state and plan artifacts.

For Belvedere's greenfield infrastructure work, OpenTofu fit the constraints we cared about most: open-source licensing, tool-layer state and plan encryption, active community governance, and portability across commercial, regulated, and customer-controlled environments.

Terraform remains a mature tool with the larger ecosystem, deeper HCP Terraform integration, and Terraform Stacks for teams centered on HashiCorp's managed control plane. Our decision was narrower: Belvedere was starting fresh, and OpenTofu gave us the Terraform-style workflow without taking on Terraform's current licensing and vendor-alignment tradeoffs.

Why This Matters

Infrastructure-as-code is not just deployment scripting. It becomes the control plane for cloud accounts, Kubernetes clusters, network boundaries, IAM, secrets-adjacent configuration, and recovery.

How Political and Organizational Friction Corrupts the Data Mesh

Keith Schumacher · 7 min read · Best Practices · Published May 20, 2026

The Data Mesh promised a revolution: domain teams owning their own analytical data as high-quality, self-serve products. No more central bottlenecks. No more months-long waits for the right dataset. Just autonomous domains delivering trustworthy data that consumers could actually use.

Yet in practice, many organizations watch their mesh fracture—not because the technology fails, but because the surrounding organizational structure and politics quietly erode its utility. Processes emerge that look like “governance” on paper but function as guardrails limiting what data consumers can discover, interpret, or act upon. The result is a decentralized architecture that, in reality, recentralizes control in subtler ways.

This isn’t accidental. It’s the predictable outcome of misaligned incentives and communication structures. In this post we’ll examine why Data Mesh was created, how organizational theory explains its common fractures, the specific mechanisms that limit consumer power, and—most importantly—how agentic data engineering can realign roles so the mesh finally delivers on its promise.

Why Data Mesh Was Introduced in the First Place

By the late 2010s, enterprises were drowning in data lakes and warehouses. Central analytics teams had become bottlenecks: every new consumer request required pipeline changes, schema approvals, and months of engineering time. Data quality suffered. Delivery slowed. Domains closest to the source data had the deepest knowledge but no ownership or incentive to maintain analytical products.

Introducing Belvedere: A Control Plane for Reliable Data in the Age of AI

Brian Frutchey · 5 min read · Data Engineering · Published March 31, 2026

Everyone understands the power of data and its never-ending growth. Yet the amount of time we have to get the data we need stays the same. The "attention economy" is the result… and the most easily available data biases our decisions. The harder it is to gather and interpret data ourselves, the more we allow third parties to control what we see. It is no secret that this leaves us open to manipulation. Consumers need to be put back in charge of their own destiny! But general data processing interfaces haven't changed much in decades (all hail SQL), and hoping a question-answering AI won't hallucinate or be biased itself is asking for trouble, even if you can afford the tokens to process all the relevant data. So how can we reduce the months of work historically required to generate a new, reliable data product to minutes?

Clear Fracture was founded to remove the barriers between users and the right data. We believe AI agents are a huge part of the answer, allowing attention to be scaled through compute. Not enough hours in a day to research deeply? Task an agent to do that work on your behalf. Can't extract a needed insight fast enough from a mountain of data? Direct a swarm of agents to chew through the data at cloud scale to deliver the insight in moments. Worried your opinion or decision is biased? Have agents argue amongst themselves until they reach consensus from all the perspectives represented in available data. When you are in charge of your own AI army, it becomes your armor against manipulation and short-sightedness.