Stark Informatics

Data Science

The Fabric workload for ML and applied AI: notebooks, MLflow-tracked experiments, a model registry, AutoML, and the runtime for governed Data Agents. The path from prototype to production model.

GA · Workload · Data Science · 9 min read

What it is

Data Science in Fabric layers ML capabilities on top of the Spark and Lakehouse foundation. It provides notebooks for development, MLflow for experiment tracking, a model registry for promotion, and AutoML for fast baselines. Models trained here are first-class items: they can be invoked from notebooks, pipelines, T-SQL (via PREDICT), and KQL (via inline ML), or surfaced through Data Agents.

Experiments

An Experiment item groups MLflow runs. Each run tracks parameters, metrics, artifacts, and the model itself. Side-by-side comparison and metric charts come built in. Use one experiment per modeling task (e.g., "Customer Churn") and one run per training pass.

ML Models & registry

A Model item holds a registered model and its versions. Versions can be promoted (e.g., "Staging" → "Production") with role-based approval. Models are stored in OneLake as MLflow artifacts, which keeps them portable across Fabric, Azure ML, and external runtimes.

AutoML

Fabric's AutoML covers tabular forecasting, classification, and regression. It runs multiple algorithms with sensible defaults and surfaces the best model. It is excellent for baselines but rarely the production model: treat the AutoML result as a benchmark, then build the production model and require it to beat that floor.
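One way to make "the AutoML result is the floor" concrete is a gate that a candidate model must clear before it is considered for promotion. The sketch below is hypothetical: the metric names, their directions, the helper `clears_automl_floor`, and the `min_gain` margin are all assumptions, and in practice both metric dictionaries would come from the logged metrics of the AutoML run and the candidate run.

```python
from typing import Mapping

# Hypothetical metric set: True means higher is better, False means lower is better.
METRIC_DIRECTIONS = {"val_auc": True, "val_log_loss": False}


def clears_automl_floor(
    candidate: Mapping[str, float],
    automl_baseline: Mapping[str, float],
    min_gain: float = 0.0,
) -> bool:
    """Return True only if the candidate improves on the AutoML baseline by at
    least `min_gain` on every metric in METRIC_DIRECTIONS."""
    for name, higher_is_better in METRIC_DIRECTIONS.items():
        gain = candidate[name] - automl_baseline[name]
        if not higher_is_better:
            gain = -gain  # for loss-style metrics, a drop counts as a gain
        if gain < min_gain:
            return False
    return True


# Illustrative numbers only.
automl = {"val_auc": 0.82, "val_log_loss": 0.41}
print(clears_automl_floor({"val_auc": 0.85, "val_log_loss": 0.38}, automl))  # True
print(clears_automl_floor({"val_auc": 0.85, "val_log_loss": 0.45}, automl))  # False
```

Requiring every metric to clear the floor is a deliberately strict choice; a team might instead gate on a single primary metric.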

Serving in production

  • Batch scoring via a scheduled Spark notebook: the most common pattern.
  • T-SQL PREDICT against a registered ONNX model, for in-warehouse scoring.
  • KQL inline ML for time-series scoring inside KQL queries.
  • Real-time scoring via an Azure ML endpoint, called from a Fabric pipeline or Activator.
  • Data Agents, which can call models as tools.

Best practices

  • Train on Gold-tier data, not raw data, so models inherit the same governance as analytics.
  • Track every run. Untracked experiments are unreproducible experiments.
  • Promote with intent. "Production" should require explicit approval.
  • Monitor drift. Compare scored distributions to training distributions on a schedule.
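The page does not prescribe a drift statistic; one common choice is the population stability index (PSI), which bins the training distribution and compares the share of scored values in each bin: PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ). The sketch below is a stdlib-only Python illustration, assuming equal-width bins over the training range and a small `eps` floor to avoid taking the log of zero. The often-quoted thresholds (below 0.1 stable, above 0.25 significant shift) are an industry rule of thumb, not Fabric guidance.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population stability index of `actual` (scored data) vs `expected` (training data)."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("training values are constant; PSI is undefined")
    width = (hi - lo) / bins
    # Interior cut points; values outside the training range land in the edge bins.
    edges = [lo + width * i for i in range(1, bins)]

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        n = len(values)
        # Floor empty bins at eps so the log term stays finite.
        return [max(c / n, eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training_scores = [i / 100 for i in range(100)]
drifted_scores = [0.5 + i / 200 for i in range(100)]  # mass shifted to the upper half
print(psi(training_scores, training_scores))  # 0.0
print(psi(training_scores, drifted_scores) > 0.25)  # True
```

In a scheduled notebook, `expected` could be a sample of a training feature or score and `actual` the latest batch; computing PSI per feature helps show where a shift comes from.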

From notebook to production model

Our ML Fastlane accelerator covers forecasting, classification, and embedding patterns with MLflow tracking and a feature-store layout in OneLake.
