Stark Informatics

Data Agents

Fabric's generally available conversational analytics layer. Ask questions in plain English, and the agent grounds its answers in your Lakehouses, Warehouses, semantic models, KQL databases, and ontologies, running read-only and respecting Purview policies.

GA · Workload: Data Science / Fabric IQ · 10 min read

What it is

A Fabric Data Agent is a configurable, governed Q&A surface powered by Azure OpenAI Assistants. A user types a question in natural language; the agent picks the right data source, generates the appropriate query (SQL, DAX, or KQL), executes it, and returns a grounded answer.

Critically: the agent runs read-only, under the user's own identity, and respects Microsoft Purview data-loss-prevention, sensitivity labels, and tenant policies.

How it works

Three layers:

  1. Question parsing. The LLM understands the question and consults the configured instructions and example queries.
  2. Source selection & tool invocation. The agent decides which configured source can answer (semantic model? KQL DB? ontology?) and generates the right query.
  3. Execution & grounding. The query runs as the user. The answer is rendered in plain language with the data shown.
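The three layers can be pictured as a small pipeline. This is a minimal Python sketch under stated assumptions, not Fabric's implementation: the `Source` class, the keyword-based routing, and the `fake_llm` callback are hypothetical stand-ins for what the LLM actually does, and the query strings (including the `[Total Revenue]` measure and the `DeviceEvents` table) are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Source:
    """A configured data source (hypothetical shape, for illustration only)."""
    name: str
    language: str                      # "SQL", "DAX", or "KQL"
    keywords: tuple[str, ...]          # crude stand-in for the LLM's source choice
    run: Callable[[str], list[dict]]   # stands in for read-only execution as the user


def answer(question: str, sources: list[Source],
           generate_query: Callable[[str, Source], str]) -> str:
    # 1. Question parsing: just a lowercase copy here; the real agent uses the
    #    LLM plus the configured instructions and example queries.
    q = question.lower()
    # 2. Source selection: first source whose keywords appear in the question.
    source = next((s for s in sources if any(k in q for k in s.keywords)), None)
    if source is None:
        return "No configured source can answer this question."
    query = generate_query(question, source)
    # 3. Execution & grounding: run the query and return it with the rows.
    rows = source.run(query)
    return f"[{source.name} / {source.language}] {query}\n{rows}"


# Hypothetical stub sources and a fake "LLM" that returns canned queries.
sales = Source("SalesModel", "DAX", ("revenue", "sales"),
               run=lambda query: [{"Revenue": 1200}])
telemetry = Source("DeviceEvents", "KQL", ("error", "latency"),
                   run=lambda query: [{"Count": 7}])


def fake_llm(question: str, source: Source) -> str:
    if source.language == "DAX":
        return 'EVALUATE ROW("Revenue", [Total Revenue])'
    return "DeviceEvents | where Level == 'Error' | count"


print(answer("What was revenue last quarter?", [sales, telemetry], fake_llm))
```

Keyword matching is the weakest part of this sketch. In the real agent the LLM makes the source choice, guided by your instructions, which is why routing guidance belongs there.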

Data sources

A single Data Agent can connect to up to five data sources, in any mix of the following types:

  • Lakehouses — for raw and curated table access
  • Warehouses — for T-SQL-shaped data
  • Power BI semantic models — best for already-modeled metrics
  • KQL databases (including Eventhouse-backed) — for time-series and high-cardinality data
  • Ontologies (Fabric IQ) — for semantically rich, agentic reasoning
  • Microsoft Graph — for people, calendar, and Teams context
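If you track agent definitions in your own tooling, the five-source cap is easy to check before anything reaches Fabric. The Python sketch below is a hypothetical config model, not the Fabric API or SDK: `AgentConfig`, `DataSource`, and `SourceKind` are invented names, and only the five-source limit and the list of source types come from the text above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum

MAX_SOURCES = 5  # per-agent cap stated above


class SourceKind(Enum):
    LAKEHOUSE = "lakehouse"
    WAREHOUSE = "warehouse"
    SEMANTIC_MODEL = "semantic_model"
    KQL_DATABASE = "kql_database"
    ONTOLOGY = "ontology"
    MICROSOFT_GRAPH = "microsoft_graph"


@dataclass
class DataSource:
    name: str
    kind: SourceKind


@dataclass
class AgentConfig:
    name: str
    sources: list[DataSource] = field(default_factory=list)

    def add_source(self, source: DataSource) -> None:
        # Reject a sixth source here, before the definition is deployed.
        if len(self.sources) >= MAX_SOURCES:
            raise ValueError(
                f"{self.name}: at most {MAX_SOURCES} data sources per agent")
        self.sources.append(source)

    def kinds(self) -> set[SourceKind]:
        # Any mix of kinds is allowed; this just reports what is configured.
        return {s.kind for s in self.sources}


finance = AgentConfig("Finance")
finance.add_source(DataSource("FinanceModel", SourceKind.SEMANTIC_MODEL))
finance.add_source(DataSource("FinanceDW", SourceKind.WAREHOUSE))
print(finance.kinds())
```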
Choose semantic models for trustworthy metrics. Asking a Data Agent to compute revenue from a raw Lakehouse table risks subtle errors. Point it at the curated semantic model where revenue is defined once.

Configuration that matters

  • Instructions (up to 15,000 chars). Tell the agent which source to use for which question type, define organizational terminology, and constrain off-topic responses.
  • Example queries (few-shot pairs). The single biggest accuracy lever. Provide 5–20 question/SQL or question/KQL pairs per source.
  • Table selection per source. Don't expose every table — pick the small set that answers most questions, in the right shape.
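To see why example queries are such a strong lever, here is a toy illustration of few-shot prompting in Python. It is an assumption-laden sketch, not how Fabric assembles its prompts internally: `Example`, `build_prompt`, the prompt layout, and the `dbo.FactSales` / `dbo.DimDate` tables are hypothetical. Only the 15,000-character instruction limit, the 5–20 example guideline, and the idea of exposing a small curated table set come from the text above.

```python
from __future__ import annotations

import warnings
from dataclasses import dataclass

MAX_INSTRUCTION_CHARS = 15_000       # hard limit stated above
MIN_EXAMPLES, MAX_EXAMPLES = 5, 20   # suggested range per source


@dataclass(frozen=True)
class Example:
    question: str
    query: str  # SQL or KQL your team has already validated


def build_prompt(instructions: str, source: str, tables: list[str],
                 examples: list[Example], question: str) -> str:
    if len(instructions) > MAX_INSTRUCTION_CHARS:
        raise ValueError(f"instructions are {len(instructions)} chars; "
                         f"limit is {MAX_INSTRUCTION_CHARS}")
    if not MIN_EXAMPLES <= len(examples) <= MAX_EXAMPLES:
        # A guideline, not a hard limit, so warn rather than fail.
        warnings.warn(f"{source}: {len(examples)} examples; "
                      f"aim for {MIN_EXAMPLES}-{MAX_EXAMPLES}")
    # Only the curated tables are listed, and each example pairs a question
    # with a known-good query the model can imitate.
    shots = "\n\n".join(f"Q: {e.question}\nA: {e.query}" for e in examples)
    return (f"{instructions}\n\nSource: {source}\n"
            f"Tables: {', '.join(tables)}\n\n{shots}\n\nQ: {question}\nA:")


examples = [
    Example(f"Revenue for {year}?",
            "SELECT SUM(f.Amount) FROM dbo.FactSales AS f "
            "JOIN dbo.DimDate AS d ON f.DateKey = d.DateKey "
            f"WHERE d.CalendarYear = {year}")
    for year in range(2020, 2025)
]
prompt = build_prompt("Use FinanceDW for revenue questions.", "FinanceDW",
                      ["dbo.FactSales", "dbo.DimDate"], examples,
                      "Revenue for 2025?")
print(prompt)
```

The five generated examples differ only by year to keep the sketch short; a real example set should cover the distinct question shapes your users actually ask.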

Deployment surfaces

  • Inside Fabric — chat surface in the workspace for power users.
  • Microsoft 365 Copilot — surfaces in Outlook, Teams, and Excel.
  • Copilot Studio — embed in custom apps as a custom skill.
  • Foundry agents — orchestrate with other Azure AI Foundry agents.

Best practices

  • Curate before you connect. Build the semantic model and ontology first; an agent on raw data fails the trust test.
  • Evaluate continuously. Maintain a fixed Q&A test set; measure the pass rate after every change to the instructions.
  • Limit scope per agent. Two focused agents, one for Finance and one for Operations, typically outperform a single omnibus agent.
  • Log everything. Conversation logs become training data for your next instruction revision.
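A fixed test set and a pass-rate check can be as simple as the Python sketch below. It assumes you have some way to send a question to the agent and get the answer text back; here `canned.__getitem__` is a hypothetical stand-in wired to made-up answers, and "pass" is a crude substring match on an expected value. Stricter checks (comparing returned numbers, or the generated query) are a reasonable next step.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    question: str
    expected: str  # value a correct answer must contain


def pass_rate(cases: list[EvalCase],
              ask: Callable[[str], str]) -> tuple[float, list[str]]:
    """Run every case through `ask`; return (pass rate, failed questions)."""
    if not cases:
        raise ValueError("test set is empty")
    failures = [c.question for c in cases if c.expected not in ask(c.question)]
    return 1 - len(failures) / len(cases), failures


# Made-up canned answers standing in for the agent.
canned = {
    "Total revenue in 2024?": "Revenue in 2024 was 4.2M.",
    "Open invoices over 90 days?": "There are 12 open invoices over 90 days.",
    "Top customer by margin?": "I could not determine that.",
}
cases = [
    EvalCase("Total revenue in 2024?", "4.2M"),
    EvalCase("Open invoices over 90 days?", "12"),
    EvalCase("Top customer by margin?", "Contoso"),
]
rate, failed = pass_rate(cases, canned.__getitem__)
print(f"pass rate {rate:.0%}; failed: {failed}")
# -> pass rate 67%; failed: ['Top customer by margin?']
```

Rerun the same cases after each instruction change and keep the failed questions next to the conversation logs; together they point at the instruction or example query to revise next.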

Common pitfalls

Treating the Data Agent as a chatbot. It is a Q&A engine over your data. Out-of-scope requests for "advice" are not its job and invite hallucinated answers.
Skipping example queries. Without few-shot examples, the agent falls back on the LLM's general priors. With them, it generates the SQL your team has already written.

Build your first production Data Agent

Our Finance Data Agent accelerator gets you from "this is cool" to "this answered a real question for the CFO" in two weeks.
