From the Clear Fracture Team

Data Engineering in the Age of Agentic AI

The landscape of enterprise data is shifting fast — AI agents are rewriting the rules of pipeline engineering, governance, and trust. We publish what we learn building at the frontier so your team can move faster and decide with confidence.


Clear Fracture’s Belvedere™ Assessed “Awardable” for Department of War work in the CDAO’s Tradewinds Solutions Marketplace

2 min read | Press Release | Published July 27, 2026

FOR IMMEDIATE RELEASE

Vienna, VA — July 27, 2026 — Clear Fracture LLC, developer of Belvedere™, the Agentic Data Manager, today announced that it has achieved “Awardable” status through the Chief Digital and Artificial Intelligence Office’s (CDAO) Tradewinds Solutions Marketplace.

The Tradewinds Solutions Marketplace is the premier offering of Tradewinds, the Department of War’s (DoW’s) suite of tools and services designed to accelerate the procurement and adoption of Artificial Intelligence (AI)/Machine Learning (ML), data, and analytics capabilities.

Belvedere puts AI agents to work as data engineers. Analysts and mission owners describe what they need in plain language; Belvedere’s agents discover the source data, design the transformations, and compile them into governed, production-ready pipelines that run on the organization’s existing infrastructure. The agents build the pipeline; they are not the pipeline. Every pipeline they produce is transparent, auditable, and repeatable, and it runs as ordinary code, keeping operations cost-efficient at mission scale.

“Mission teams lose too much time wiring data together by hand, and the systems that result are hard to trust and hard to maintain,” said Brian Frutchey, Chief Technology Officer of Clear Fracture. “Belvedere’s agents do that engineering work in the open. Every pipeline they build can be inspected, audited, and run again tomorrow. Awardable status through Tradewinds gives DoW customers a direct path to put that capability on contract.”

Read whole article

A Write-Audit-Publish (WAP) Skill for Agentic Data Pipelines

Haydn Strauss | 4 min read | Data Engineering | Published July 14, 2026

AI agents are great at building data pipelines that look like they work until you dig into the results.

Write-audit-publish (WAP) helps fix that. Stage the data, audit it against a declared contract, and only publish once every clause passes. Netflix popularized this pattern in 2017.

A pipeline that finishes successfully is not the same as one whose output is correct.
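
The stage, audit, publish sequence can be sketched in a few lines of Python. This is a minimal illustration of the pattern, not the released skill: the `Clause` shape, the `write_audit_publish` function, and the sample rows are all hypothetical, and a real publish step would swap or promote a table rather than extend a list.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass(frozen=True)
class Clause:
    """One named, checkable statement in the data contract (hypothetical shape)."""
    name: str
    check: Callable[[list[Row]], bool]


def write_audit_publish(
    staged: list[Row],
    contract: list[Clause],
    published: list[Row],
) -> list[str]:
    """Publish the staged rows only if every contract clause passes.

    Returns the names of the failed clauses; an empty list means the
    staged rows were published.
    """
    failures = [clause.name for clause in contract if not clause.check(staged)]
    if not failures:
        # Stand-in for the publish step (a table swap in a real system).
        published.extend(staged)
    return failures


# Illustrative contract and rows, invented for this sketch.
contract = [
    Clause("non_empty", lambda rows: len(rows) > 0),
    Clause("hours_non_negative", lambda rows: all(r["hours"] >= 0 for r in rows)),
    Clause("title_present", lambda rows: all(r["title"] for r in rows)),
]

good = [{"title": "A", "hours": 10}, {"title": "B", "hours": 5}]
bad = [{"title": "", "hours": 10}, {"title": "C", "hours": -1}]

table: list[Row] = []
print(write_audit_publish(bad, contract, table))   # ['hours_non_negative', 'title_present']
print(write_audit_publish(good, contract, table))  # []
print(len(table))                                  # 2
```

The point of returning the failed clause names is that a blocked run tells you which statement in the contract to inspect, whether the fault turns out to be in the data or in the contract itself.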

We’ve built a number of internal skills to make our own data pipelines safer, and this one felt useful enough to release as a free WAP skill for coding agents.

The first test was on Netflix’s Top 10 dataset. The initial run stopped at the gate. Our contract said every film should have “N/A” as the season title, but the agent found nine rows that didn’t match. The contract was wrong, not the data. We fixed it, started a fresh run, and the second attempt published cleanly, with the total reconciling to exactly 185,656,120,000 hours viewed.

We ran it again on an NFL play-by-play pipeline (converting play description strings into structured stat tables). It caught a parser bug that left 1,723 completed passes without matching receptions, exactly the kind of thing a "successful" run hides.
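
A reconciliation clause of this kind can be sketched as an anti-join between two output tables. The column name `play_id` and the sample rows are assumptions made for illustration; the actual pipeline's schema is not shown here.

```python
from typing import Any

Row = dict[str, Any]


def unmatched_completions(completions: list[Row], receptions: list[Row]) -> list[Any]:
    """Return the play IDs of completed passes that have no reception row."""
    reception_ids = {r["play_id"] for r in receptions}
    return [c["play_id"] for c in completions if c["play_id"] not in reception_ids]


# Hypothetical rows: play 3 was parsed as a completion but produced no reception.
completions = [{"play_id": 1}, {"play_id": 2}, {"play_id": 3}]
receptions = [{"play_id": 1}, {"play_id": 2}]

orphans = unmatched_completions(completions, receptions)
print(orphans)  # [3]

# As an audit clause, the check passes only when nothing is left unmatched.
clause_passes = len(orphans) == 0
print(clause_passes)  # False
```

Both tables load without error in this example, which is why a run that only checks for completion would report success.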

Below, we dig a bit more into how the skill works. Give it a read, or point your coding agent at this URL and try it yourself.

Read whole article

The Foundation Behind Reliable AI Agent Analytics

Haydn Strauss | 4 min read | Analytics | Published June 16, 2026

Anthropic recently published how it runs self-service analytics on Claude. One result caught my eye: context + skills took its analytics agent from 21% accuracy to consistently above 95%.

It highlights that generating SQL is the easy part. The hard part is everything underneath it: canonical datasets, a semantic layer, lineage, maintained skills, and provenance on every answer.

That jump came from the foundation, not a bigger model. With the context right, the agent on top matters much less.

Why Agents Alone Fail

In addition to cost, three context problems keep coming up.

Entity ambiguity. "Active users" or "revenue" has several definitions in the warehouse. The agent picks one and writes correct SQL against the wrong data.
Staleness. The definition was right when written. Then the pipeline changed and the skill was never updated.
Retrieval failure. The right definition exists somewhere, but the agent can't find it, or grabs the wrong version.

Two of these, staleness and retrieval, can't be fixed easily by prompting alone. They need the context to be a versioned, owned asset wired to the pipeline it describes.
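
One way to picture "a versioned, owned asset wired to the pipeline it describes" is a small registry that stores each definition alongside a fingerprint of the pipeline it was written against. Everything below is a hypothetical sketch (the class names, fields, and hashing choice are assumptions, not Anthropic's or Belvedere's design): lookups by canonical name address ambiguity, an explicit error replaces a silent retrieval miss, and a fingerprint mismatch flags staleness.

```python
import hashlib
from dataclasses import dataclass


class UnknownMetricError(KeyError):
    """Raised when no definition is registered under the requested name."""


class StaleDefinitionError(RuntimeError):
    """Raised when the pipeline has changed since the definition was written."""


def fingerprint(pipeline_source: str) -> str:
    """Hash the pipeline source so a definition can be tied to one exact version."""
    return hashlib.sha256(pipeline_source.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class MetricDefinition:
    name: str            # canonical name, e.g. "active_users"
    owner: str           # team accountable for keeping it current
    version: int
    sql: str
    pipeline_hash: str   # fingerprint of the pipeline this was written against


class ContextRegistry:
    def __init__(self) -> None:
        self._defs: dict[str, list[MetricDefinition]] = {}

    def register(self, definition: MetricDefinition) -> None:
        self._defs.setdefault(definition.name, []).append(definition)

    def resolve(self, name: str, current_pipeline_hash: str) -> MetricDefinition:
        """Return the latest definition, or fail loudly if missing or stale."""
        versions = self._defs.get(name)
        if not versions:
            raise UnknownMetricError(name)
        latest = max(versions, key=lambda d: d.version)
        if latest.pipeline_hash != current_pipeline_hash:
            raise StaleDefinitionError(
                f"{name} v{latest.version} (owner: {latest.owner}) predates the current pipeline"
            )
        return latest


# Hypothetical usage.
pipeline_v1 = "SELECT user_id FROM events WHERE ts >= CURRENT_DATE - 30"
registry = ContextRegistry()
registry.register(
    MetricDefinition("active_users", "growth-team", 1,
                     "SELECT COUNT(DISTINCT user_id) FROM active_30d",
                     fingerprint(pipeline_v1))
)
print(registry.resolve("active_users", fingerprint(pipeline_v1)).version)  # 1
```

If the pipeline source changes and nobody updates the definition, `resolve` raises `StaleDefinitionError` instead of handing the agent an outdated answer.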

Anthropic tried the shortcut of handing the agent the raw query corpus, and accuracy barely moved. As they put it: "The information was there, the agent saw it, and it still didn't use it."

Read whole article

From Belvedere Pipeline to Flue Agents: A Skeptical Pick of the 2026 World Cup Winner

Haydn Strauss | 9 min read | Analysis | Published June 9, 2026

We build AI systems for a living. In production today, that means LangGraph wired into Belvedere: governed pipelines, human approvals, audit trails, provenance, etc.

But for this project, I wanted to kick the tires on something new: Flue, the agent framework from the Astro team. I’ve long been a fan of Astro for web development, so when they released Flue, I wanted to take it for a spin.

Initially I went down the path of adding Flue to background tasks (bug ticket sync, feedback triage, opportunity discovery, etc.), but then decided to build something a little more fun: an 'agentic analyst org' powered by Flue.

It's completely free to use if you want to head to https://www.belvederelabs.ai/ and try it out with your data.

Flue Analyst Org | Belvedere Labs: drop a CSV and get an analyst's answer in a couple of minutes.

The data: 3,759 international matches assembled by Belvedere

We did not hand-roll the dataset. We built it in Belvedere, the same governed pipeline system we use in production.

The pipeline ("World Cup International Match Dataset") is eight nodes and seven edges on the canvas. It was assembled by connecting our source API and using the following prompt:

"Pull senior men's international football fixtures and per-match statistics from the API-Football REST API, then computes leak-free pre-match features with all available statistics"

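"Leak-free" here means each match's features are computed only from matches played before it. The pipeline's actual features are not shown in this excerpt, so the following is a minimal sketch under assumed field names (`date`, `home`, `away`, `home_goals`, `away_goals`) with a single invented feature, average goals scored before kickoff.

```python
from collections import defaultdict
from typing import Any, Optional

Match = dict[str, Any]


def pre_match_features(matches: list[Match]) -> list[Match]:
    """Compute each team's average goals scored in strictly earlier matches."""
    goals_for: dict[str, int] = defaultdict(int)
    played: dict[str, int] = defaultdict(int)

    def avg(team: str) -> Optional[float]:
        # None marks a team with no prior matches, rather than a misleading 0.0.
        return goals_for[team] / played[team] if played[team] else None

    features = []
    # ISO-format date strings sort chronologically.
    for m in sorted(matches, key=lambda m: m["date"]):
        # Emit features first, using only what was known before this match.
        features.append({
            "date": m["date"],
            "home": m["home"],
            "away": m["away"],
            "home_avg_goals_before": avg(m["home"]),
            "away_avg_goals_before": avg(m["away"]),
        })
        # Only then fold this match's result into the running state.
        goals_for[m["home"]] += m["home_goals"]
        goals_for[m["away"]] += m["away_goals"]
        played[m["home"]] += 1
        played[m["away"]] += 1
    return features


# Hypothetical fixtures, deliberately out of order.
matches = [
    {"date": "2026-01-09", "home": "B", "away": "A", "home_goals": 3, "away_goals": 2},
    {"date": "2026-01-01", "home": "A", "away": "B", "home_goals": 2, "away_goals": 0},
    {"date": "2026-01-05", "home": "A", "away": "C", "home_goals": 1, "away_goals": 1},
]

for row in pre_match_features(matches):
    print(row)
```

The ordering inside the loop is the whole technique: if the state update ran before the features were emitted, each row would include its own result and the model would be trained on the outcome it is meant to predict.
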
Read whole article

OpenTofu vs. Terraform: Choosing an IaC Control Plane for Belvedere

Zech Crannigan | 5 min read | Product | Published June 2, 2026

OpenTofu and Terraform solve nearly the same day-to-day problem. For Belvedere's current stack (AWS, EKS, Kubernetes, Helm, and Git-based CI/CD), the normal workflow is effectively equivalent: write HCL, use providers and modules, plan changes, and apply through controlled automation.

The meaningful differences are licensing, governance, managed-service alignment, and how each tool handles state and plan artifacts.

For Belvedere's greenfield infrastructure work, OpenTofu fit the constraints we cared about most: open-source licensing, tool-layer state and plan encryption, active community governance, and portability across commercial, regulated, and customer-controlled environments.

Terraform remains a mature tool with the larger ecosystem, deeper HCP Terraform integration, and Terraform Stacks for teams centered on HashiCorp's managed control plane. Our decision was narrower: Belvedere was starting fresh, and OpenTofu gave us the Terraform-style workflow without taking on Terraform's current licensing and vendor-alignment tradeoffs.

Why This Matters

Infrastructure-as-code is not just deployment scripting. It becomes the control plane for cloud accounts, Kubernetes clusters, network boundaries, IAM, secrets-adjacent configuration, and recovery.

Read whole article

How Political and Organizational Friction Corrupts the Data Mesh

Keith Schumacher | 7 min read | Best Practices | Published May 20, 2026

The Data Mesh promised a revolution: domain teams owning their own analytical data as high-quality, self-serve products. No more central bottlenecks. No more months-long waits for the right dataset. Just autonomous domains delivering trustworthy data that consumers could actually use.

Yet in practice, many organizations watch their mesh fracture—not because the technology fails, but because the surrounding organizational structure and politics quietly erode its utility. Processes emerge that look like “governance” on paper but function as guardrails limiting what data consumers can discover, interpret, or act upon. The result is a decentralized architecture that, in reality, recentralizes control in subtler ways.

This isn’t accidental. It’s the predictable outcome of misaligned incentives and communication structures. In this post we’ll examine why Data Mesh was created, how organizational theory explains its common fractures, the specific mechanisms that limit consumer power, and—most importantly—how agentic data engineering can realign roles so the mesh finally delivers on its promise.

Why Data Mesh Was Introduced in the First Place

By the late 2010s, enterprises were drowning in data lakes and warehouses. Central analytics teams had become bottlenecks: every new consumer request required pipeline changes, schema approvals, and months of engineering time. Data quality suffered. Delivery slowed. Domains closest to the source data had the deepest knowledge but no ownership or incentive to maintain analytical products.

Read whole article

Introducing Belvedere: A Control Plane for Reliable Data in the Age of AI

Brian Frutchey | 5 min read | Data Engineering | Published March 31, 2026

Everyone understands the power of data and its never-ending growth. Yet the amount of time we have to get the data we need stays the same. The "attention economy" is the result, and the most easily available data biases our decisions. The harder it is to gather and interpret data ourselves, the more we allow third parties to control what we see. It is no secret that this leaves us open to manipulation. Consumers need to be put back in charge of their own destiny! But general data processing interfaces haven't changed much in decades (all hail SQL), and hoping a question-answering AI won’t hallucinate or be biased itself is asking for trouble, even if you can afford the tokens to process all the relevant data. So how can we reduce the months of work historically required to generate a new, reliable data product to minutes?

Clear Fracture was founded to remove the barriers between users and the right data. We believe AI agents are a huge part of the answer, allowing attention to be scaled through compute. Not enough hours in a day to research deeply? Task an agent to do that work on your behalf. Can't extract a needed insight fast enough from a mountain of data? Direct a swarm of agents to chew through the data at cloud scale to deliver the insight in moments. Worried your opinion or decision is biased? Have agents argue amongst themselves until they reach consensus from all the perspectives represented in available data. When you are in charge of your own AI army, it becomes your armor against manipulation and short-sightedness.

Read whole article