Alex Merced's Data, Dev and AI Blog

Fabric Agentic Analytics and Lakehouse Schema Design

By Alex Merced | Mon Jun 22 2026 09:00:00 GMT+0000 (Coordinated Universal Time) | 15 min read | fabric, lakehouse-schema, agentic-analytics

Cover Image

Fabric Agentic Analytics and Lakehouse Schema Design

Microsoft Fabric agentic analytics is a reminder that schemas, semantic models, and governed lakehouse design now shape AI behavior. For Microsoft Fabric users and lakehouse architects, the useful question is what changes in production and what simply sounds current.

agent-ready schema design architecture map

Why the Better the Schema Contract, the Less an Agent Has to Guess

The better the schema contract, the less an analytics agent has to guess.

I look at this through a simple production lens: what becomes clearer, safer, faster, or easier to operate once the pattern is real? That matters because agentic analytics changes the failure mode. A human analyst may notice when a table feels stale, a metric looks odd, or a query scans too much data. An agent can move through those steps quickly and confidently unless the platform gives it clear boundaries.

The contracts that already mattered now matter more. Storage layout, catalog behavior, semantic definitions, identity, quality, and cost controls have to be explicit enough for both people and software to rely on them.

What the Announcements Support

The strongest public sources for this topic are the Microsoft Build 2026 blog and the Microsoft Fabric agentic analytics update. I use those to ground the architecture and avoid treating every June 2026 announcement as a shipped production capability.

The evidence supports a few grounded points:

  • Fabric announcements reinforce the need for semantic catalogs and agent-ready analytical models.
  • Schema quality affects whether agents can choose the right tables and relationships.
  • The lesson applies beyond any one vendor stack.

That is enough to write a useful technical article. It is not enough to make sweeping claims. I separate released behavior, design direction, and market language because readers can work with careful distinctions. They cannot operate slogans.

The Architecture Pattern

An agentic analytics stack needs clean schemas, documented relationships, governed semantic objects, and execution policies. The model may translate intent, but the platform supplies the meaning. Fabric puts that conversation in front of Microsoft data teams, but the design principle is broader: agent behavior improves when schemas and semantics are explicit.

For this architecture, the five layers worth keeping distinct are storage, catalog, execution, semantics, and agent interface. Storage holds durable files and snapshots. The catalog resolves identity and access. The execution layer plans and runs work. The semantic layer gives business meaning. The agent interface controls which questions and actions are allowed.

The point is not to collapse those layers into one box. The point is to make the handoffs clear. If a user or agent asks for an answer, the platform should be able to explain which table, which snapshot, which definition, which identity, and which policy shaped the result.

Operating workflow for agent-ready schema design

A Production-Shaped Example

A team can test readiness with one domain, such as sales operations. Give the agent certified tables, relationship rules, metric definitions, and forbidden columns. Then ask period comparisons, territory questions, and exception cases. The failure modes will usually reveal missing schema descriptions or ambiguous business definitions.

The test should include a happy path and at least three failure paths. One failure should involve access. One should involve stale or incompatible data. One should involve an operational problem: retry behavior, latency, file layout, or unsupported engine behavior. Those failures are where the platform either earns trust or loses it.

The test should also produce evidence. Save query IDs, snapshot IDs when relevant, policy decisions, validation output, and rollback proof. If the team cannot find that evidence after the pilot, the rollout is not ready for a wider audience.

What To Measure

Start with four specific measurements:

  • Join path accuracy under agent queries: Track whether agent queries use the intended relationships or invent joins based on column name similarity. Bad join paths produce plausible-looking but wrong results. They are the most common silent failure in agent-driven analytics.
  • Metric certification coverage before agent exposure: Measure what percentage of the business metrics an agent can access are formally certified with an owner, a definition, an exclusion list, and a freshness expectation. Uncertified metrics in the agent surface are a data quality risk.
  • Forbidden-column access attempts: Count how many times an agent query attempts to access columns that should not be exposed, such as raw identifiers, intermediate calculation fields, or sensitive attributes. High counts signal that the semantic layer is not governing the access surface.
  • Relationship clarity tested across domain boundaries: Specifically test whether agent queries that cross domain boundaries, such as sales joined to finance or product joined to support, produce correct results. Cross-domain joins are where schema ambiguity shows up most clearly.

Measurements should connect to decisions. A metric that nobody uses is telemetry decoration.

Common Failure Modes

The first mistake is treating schema design as a feature checklist. A checklist can confirm that a capability exists, but it rarely explains how the capability behaves under pressure. The better move is to write the operating contract. Name the owner, supported versions, allowed operations, denied paths, and recovery steps before widening usage.

The second mistake is ignoring boundaries. Most lakehouse incidents appear where systems meet: catalog to engine, engine to storage, semantic model to query, or agent tool to policy. Boundary testing should be part of the rollout. If the happy path is the only path tested, the team has not tested production.

The third mistake is hiding tradeoffs behind category language. Words like agent-ready and governed can be useful, but they are not a substitute for mechanics. Translate every category phrase into behavior. What changes in access, latency, correctness, cost, or auditability? If the answer is vague, the design is still vague.

The most expensive failures are usually quiet. The query works, but it joins on the wrong key. The metric is defined, but the exclusion rule was not applied. The agent returns a number, but it used a raw column that an analyst would have known to avoid. Those failures do not show up as red alerts. They show up as eroded trust.

Guardrails for Agentic Use

Agents should start with certified views, narrow tools, and explicit tool descriptions. They should not receive broad table discovery just because a human analyst has it. A tool should say what it can answer, what it cannot answer, what freshness it requires, and whether it can write or only read.

I would also require a visible refusal path. If data is stale, the agent should say so. If the question crosses a policy boundary, the agent should stop. If the engine cannot perform the operation safely, the workflow should fail with a useful message. A graceful refusal is not a bad user experience. It is a trust feature.

Operational Checklist

Area Question to answer Why it matters
Ownership Who owns the table, catalog path, semantic definition, and agent tool? Incidents need named owners, not shared confusion.
Scope Which operations are allowed for this rollout? A narrow launch is easier to test and govern.
Identity Which human, service, or agent identities are mapped to access? Machine-speed access needs machine-enforced policy.
Evidence Which logs, query IDs, snapshots, or validation outputs are saved? Trust needs a trail.
Rollback What happens when the workflow is wrong or unsafe? Recovery is part of the design, not an afterthought.

This checklist is intentionally plain. If it feels too basic, that is usually a good sign. The hard part is not inventing a complex control framework. The hard part is answering the simple questions before users and agents depend on the system.

Open lakehouse operating model for agent-ready schema design

What Engineers Should Verify

Engineers should verify exact versions, APIs, identities, allowed operations, failure behavior, and audit records. Claiming "agent-ready schema support" is not enough. The team needs to know which schema patterns were tested, which join paths were validated, and which columns are explicitly excluded from the agent surface.

The practical test is simple: can another engineer reproduce the setup, trigger a bad join or a missing-definition failure, understand the logs, and fix the schema without asking the original author for help? If not, the schema documentation is not ready for broader adoption.

The setup should also be tested under realistic query diversity. Many schema designs behave well when queries follow a predictable pattern. They behave differently when agents combine filters, periods, territories, and exceptions in unexpected sequences. Production readiness is about the second case.

What Data Owners Should Verify

Data owners should verify meaning. The data owner should confirm the approved datasets, definitions, exclusions, freshness expectations, and sensitivity rules. Technical correctness is not enough if the business definition is ambiguous.

This review should happen before agent exposure. Once an agent can answer a question, people will treat the answer as part of the operating rhythm. That makes weak definitions more dangerous. If a revenue metric excludes credits, delayed invoices, or test accounts, that rule should be part of the semantic contract.

Data owners should also decide when a warning is required. Some stale data can still be useful with a visible caveat. Some stale data should block the answer. That is a business decision, not only a platform decision.

What Executives Should Hear

The summary should be plain: agentic analytics improves when schemas are explicit and semantic models are well-governed. The Fabric announcement validates a direction that teams across every platform should be moving toward regardless of which vendor stack they use.

The value is not novelty. The value is reducing the distance between governed business definitions and reliable agent answers. That shows up in fewer wrong joins, clearer ownership of business logic, better reuse of certified metrics, and safer agent behavior when questions get complex.

Executives should also hear the limit. No semantic model removes the need for data ownership, quality checks, access control, and cost management. Agentic systems raise the bar for those disciplines because they can ask more questions and take more steps in less time.

A Good First Rollout

Start small. Pick one domain, one trusted dataset, one semantic path, one or two identities, and one measurable workflow. Run it manually first. Then run it through the intended platform path. Then run it through the agent or automation layer.

Define success with three outcomes. The first is correctness: the answer or operation matches the expected business meaning. The second is safety: denied paths fail clearly. The third is operability: logs and evidence are good enough for another team member to understand what happened.

After that, expand by adding another domain, another agent pattern, another tool, or another operation. Expanding one dimension at a time is slower than a dramatic launch, but it creates fewer mysteries.

Deeper Design Notes

The design work should begin with the nouns in the workflow. Name the tables, snapshots, metrics, identities, tools, owners, and failure states. When those nouns are vague, every later discussion becomes vague too. A team can spend weeks debating architecture diagrams while still failing to say which dataset is authoritative or which identity is allowed to perform the operation. Start with the concrete objects and build outward.

The next step is to define the dimensions that actually change behavior: relationships, certified metrics, table descriptions, semantic models. These are not decorative details. They determine whether the system is predictable under real use. If one of those dimensions is unknown, write it down as an open risk instead of letting it hide inside a polished diagram.

Review Questions Worth Asking

The first question is simple: what must be true before this workflow is allowed to run unattended? That question usually exposes missing assumptions about schema documentation, relationship rules, and forbidden columns. A manual workflow can rely on an analyst knowing which columns to avoid. An automated workflow needs those rules enforced in the schema.

The second question: what should the system refuse to do? Refusal paths are a sign of maturity. They show that the schema design understands its own limits. A tool that always returns something is not necessarily helpful. Sometimes the best answer is that the join path is ambiguous, the metric is not certified for this use case, or the column is not approved for agent access.

The third question: who gets paged or notified when the workflow fails? If the answer is unclear, the rollout is not operationally ready. Agentic systems do not remove ownership. They make ownership more important because they can encounter edge cases faster than a human-only workflow.

A Realistic Pilot Shape

A realistic pilot should look like a sales operations domain exposed to an analytics agent through governed schema objects. That scenario is narrow enough to test and broad enough to reveal platform boundaries. It should include one normal path, one stale or invalid data path, one denied access path, and one bad join path. The point is not to make the pilot impressive. The point is to make it diagnostic.

The pilot should run in stages. First, run the workflow manually and capture the expected answer or state transition. Second, run it through the engine, catalog, or semantic path that production will use. Third, expose only the narrow tool or automation layer needed for the agentic version. If the results differ, stop and understand why before adding scope.

Include a repetition test. Run the same workflow repeatedly across a few days or a few schema changes. Many systems work once. Fewer systems remain correct after metric definition updates, relationship changes, or a new domain added to the agent surface.

Metrics That Should Drive the Next Decision

The first metrics to watch are join path accuracy, metric certification coverage, forbidden-column access attempts, and relationship clarity across domain boundaries. Those measurements should lead to a decision about whether to expand scope, fix schema gaps, hold, or roll back. If the team cannot name the decision connected to a metric, the metric is probably not useful yet.

A second group of metrics should track trust: how often did the workflow return warnings? How often did it refuse correctly? How often did users accept the answer without asking a human to re-check it? Those signals matter because trust is not created by a successful query alone.

A third group should track cost and maintenance. Faster answers are good, but not if they create schema drift, certification gaps, noisy alerts, or hidden data quality work. The goal is sustainable speed with clear meaning and policy.

What I Would Cut From a Weak Rollout

Cut broad table discovery first. Agents should not explore raw data estates as if every table is equally safe and equally meaningful. Start with certified views or narrow tools, then expand when the team has evidence.

Cut write access until read paths, audit records, and refusal behavior are boring. Writes add concurrency, rollback, and ownership questions. Some workflows need them, but they should not be the default just because the agent can call a tool.

Cut unsupported cross-domain queries from the first phase. Single-domain questions are easier to validate. Add cross-domain queries when the base case is already clean and well-tested.

The Practical Standard

The practical standard is not perfection. It is explainability under pressure. When a user challenges an answer, an operator should be able to explain which data was used, which relationships were joined, which definition shaped the result, and which evidence was saved. When a workflow fails, the failure should point to the layer that needs attention.

That standard is demanding, but it is realistic. It lets teams adopt agentic analytics without pretending that models make schema design disappear. The model is one participant. The schema contract is what makes the result usable.

My Recommendation

Take Fabric agentic analytics seriously as a signal, but do not limit the lesson to one vendor. The useful bar is simple: can the team explain the contract, test the behavior, enforce the policy, measure the result, and recover from failure?

If the answer is yes, the pattern deserves a production pilot. If the answer is no, the next step is not a bigger announcement. The next step is a narrower contract and a better test.

The portability point matters here. If agentic analytics depends on clean schemas and governed semantics, teams benefit from applying those principles over open data rather than trapping the lesson inside one workspace or proprietary format. Open lakehouse platforms like Dremio extend that lesson: the schema discipline that makes Fabric agents reliable should also make any open-data agent reliable.

For a deeper foundation on lakehouse and AI architecture, explore Alex Merced's books on data lakehouses and AI at books.alexmerced.com. To try a governed, open, agent-ready lakehouse in practice, start a free trial of Dremio's Agentic Lakehouse at dremio.com/get-started.

Check out these related posts:

Composable Semantic Layers for Analytical Agents

Mon Jun 22 2026 09:00:00 GMT+0000 (Coordinated Universal Time)

📬 Join the Mailing List

Get updates directly to your inbox.

Subscribe Now

Menu

Search