Microsoft Fabric Notebooks — Spark, T-SQL, Python

What it is

A Notebook is a Fabric item that holds code, markdown, and visualizations. The same notebook can mix Spark and T-SQL cells. Notebooks run against Fabric Spark pools or against the Warehouse / Lakehouse SQL engine. They're the primary engineering surface inside Fabric — where transformations are written, ML models are trained, and ad-hoc analysis happens.

Languages & runtimes

PySpark (Python) — the default
Spark SQL — for set-based transformations
Scala and R — for Spark workloads in those languages
T-SQL — when querying a Warehouse or Lakehouse SQL endpoint
KQL — when targeting a KQL database

Environments

Environments package Python/R libraries, Spark configuration, and resource requirements. Because one environment can be attached to many notebooks, it is the right place to standardize package versions.
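
As a rough illustration of what standardized package versions can mean in practice, a notebook could check at startup that the attached environment provides the versions it expects. This is only a sketch: the `find_version_mismatches` helper and the pinned versions shown are hypothetical, and the check relies on Python's standard `importlib.metadata` rather than on any Fabric-specific API.

```python
from importlib import metadata
from typing import Callable


def find_version_mismatches(
    pins: dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> dict[str, str]:
    """Return {package: problem} for every pin the current session does not satisfy."""
    problems: dict[str, str] = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            problems[package] = "not installed"
            continue
        if installed != expected:
            problems[package] = f"expected {expected}, found {installed}"
    return problems


# Hypothetical pins; in practice these would mirror the environment definition.
EXPECTED_VERSIONS = {"pandas": "2.1.4"}

# Typical use at the top of a notebook: stop early if the environment drifted.
# problems = find_version_mismatches(EXPECTED_VERSIONS)
# if problems:
#     raise RuntimeError(f"environment drift detected: {problems}")
```

The lookup function is injectable so the check can be exercised without installing anything; by default it reads the versions of the packages installed in the running session.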

Pipeline orchestration

Production notebooks are called from Data Factory pipelines with parameters such as load_date, tenant_id, and environment. The pipeline handles scheduling, dependencies, and retries; the notebook does the work. Keep that separation clean.
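
A minimal sketch of the notebook side of that contract, assuming the pipeline passes the three parameters named above as strings. The default values, the allowed environment names, and the `RunParams` and `parse_run_params` helpers are hypothetical and chosen for illustration only. In a Fabric notebook the three assignments at the top would normally sit in the cell designated as the parameters cell, so that values supplied by the pipeline override them.

```python
from dataclasses import dataclass
from datetime import date

# Defaults for interactive runs; a pipeline run is expected to override them.
# (Placeholder values, not taken from any real workspace.)
load_date = "2024-01-31"
tenant_id = "tenant-001"
environment = "dev"

ALLOWED_ENVIRONMENTS = {"dev", "test", "prod"}  # assumed naming scheme


@dataclass(frozen=True)
class RunParams:
    load_date: date
    tenant_id: str
    environment: str


def parse_run_params(load_date: str, tenant_id: str, environment: str) -> RunParams:
    """Validate the raw string parameters and fail fast on bad input."""
    parsed_date = date.fromisoformat(load_date)  # raises ValueError if malformed
    if not tenant_id.strip():
        raise ValueError("tenant_id must not be empty")
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    return RunParams(parsed_date, tenant_id.strip(), environment)


params = parse_run_params(load_date, tenant_id, environment)
```

Validating at the top of the notebook means a malformed parameter fails the run immediately, and the pipeline's own retry and alerting logic can handle it, instead of the error surfacing partway through a write.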

Best practices

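Source-control everything. Treat Git integration as mandatory, not optional.
Parameterize. Pass run inputs through a parameters cell and pipeline parameters, and use %%configure for session settings; never hard-code paths.
Idempotency. A re-run should never duplicate rows. Use Delta MERGE on a stable key instead of blind appends.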
Small notebooks. One notebook per logical task. Long, sprawling notebooks become unmaintainable.
Use environments. Don't %pip install inside notebooks for production runs.
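
The idempotency item can be sketched as a small helper that builds a Delta MERGE statement, so that re-running the same load updates existing rows instead of appending duplicates. The helper `build_merge_sql` and the table, view, and column names are hypothetical. The sketch only produces the SQL text; in a notebook it would be executed with the session's `spark.sql(...)`, which is assumed to be available there.

```python
def build_merge_sql(target: str, source: str, keys: list[str], columns: list[str]) -> str:
    """Build a Delta MERGE statement that upserts `source` rows into `target`.

    Identifiers are interpolated as-is, so they must come from trusted code,
    never from user input.
    """
    if not keys:
        raise ValueError("at least one key column is required")
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    non_keys = [c for c in columns if c not in keys]
    lines = [
        f"MERGE INTO {target} AS t",
        f"USING {source} AS s",
        f"ON {on_clause}",
    ]
    if non_keys:  # only emit an UPDATE clause when there is something to update
        set_clause = ", ".join(f"t.{c} = s.{c}" for c in non_keys)
        lines.append(f"WHEN MATCHED THEN UPDATE SET {set_clause}")
    insert_cols = ", ".join(columns)
    insert_vals = ", ".join(f"s.{c}" for c in columns)
    lines.append(f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})")
    return "\n".join(lines)


# Hypothetical names: a staged temp view merged into a curated table.
merge_sql = build_merge_sql(
    target="silver.orders",
    source="staged_orders",
    keys=["order_id"],
    columns=["order_id", "status", "amount"],
)
# In a notebook: spark.sql(merge_sql)
```

Because rows are matched on the key columns, running the statement twice with the same staged data leaves the target unchanged after the first run, provided the staged data holds at most one row per key.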

Common pitfalls

Treating notebooks as scripts. The interactive feel is misleading. Once a notebook runs in a pipeline, it's production code. Apply the same standards as for any other production code: version control, review, and testing.

Related items

Notebook patterns that scale