What it is
A Notebook is a Fabric item that holds code, markdown, and visualizations. The same notebook can mix Spark and T-SQL cells. Notebooks run against Fabric Spark pools or against the Warehouse / Lakehouse SQL engine. They're the primary engineering surface inside Fabric — where transformations are written, ML models are trained, and ad-hoc analysis happens.
Languages & runtimes
- PySpark (Python) — the default
- Spark SQL — for set-based transformations
- Scala and R — for Spark workloads in those languages
- T-SQL — when querying a Warehouse or Lakehouse SQL endpoint
- KQL — when targeting a KQL database
Environments
Environments package Python/R libraries, Spark configuration, and resource requirements. One environment can be reused by many notebooks, which makes it the right place to standardize package versions.
Pipeline orchestration
Production notebooks are called from Data Factory pipelines with parameters such as load_date, tenant_id, and environment. The pipeline handles scheduling, dependencies, and retries; the notebook does the work. Keep that separation clean.
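As a rough illustration of that split, here is a minimal sketch of the notebook side in plain Python. It assumes the first cell is marked as the notebook's parameter cell so that a pipeline run can override the defaults. The default values, the RunParams class, the helper names, and the Files/raw/... folder layout are all hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Parameter cell: hypothetical defaults used for interactive runs.
# A pipeline run is assumed to override these three values.
load_date = "2024-01-31"
tenant_id = "contoso"
environment = "dev"


@dataclass(frozen=True)
class RunParams:
    load_date: date
    tenant_id: str
    environment: str


def parse_params(load_date: str, tenant_id: str, environment: str) -> RunParams:
    """Validate the raw string parameters before any data is touched."""
    allowed = {"dev", "test", "prod"}  # assumed environment names
    if environment not in allowed:
        raise ValueError(f"unknown environment: {environment!r}")
    if not tenant_id:
        raise ValueError("tenant_id must not be empty")
    # date.fromisoformat raises ValueError on a malformed date string.
    return RunParams(date.fromisoformat(load_date), tenant_id, environment)


def source_path(p: RunParams) -> str:
    """Derive the input folder from parameters instead of hard-coding it."""
    # Illustrative layout only, not a Fabric convention.
    return f"Files/raw/{p.environment}/{p.tenant_id}/{p.load_date:%Y/%m/%d}"


params = parse_params(load_date, tenant_id, environment)
print(source_path(params))
```

Validating the parameters up front means a bad value fails the run immediately, so the pipeline's retry and alerting logic sees the error before any rows are written.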
Best practices
- Source-control everything. Git integration is mandatory, not optional.
- Parameterize. Use %%configure and pipeline parameters; never hard-code paths.
- Idempotency. A re-run should never duplicate rows. Use Delta MERGE.
- Small notebooks. One notebook per logical task. Long, sprawling notebooks become unmaintainable.
- Use environments. Don't %pip install inside notebooks for production runs.
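To make the idempotency point concrete, the sketch below builds a Delta MERGE statement keyed on a business key, so that re-running the same load updates existing rows rather than appending duplicates. The build_merge_sql helper and the table and column names are hypothetical. Only the string construction runs here; in a notebook the result would be passed to the Spark session, as noted in the final comment.

```python
from typing import Sequence


def build_merge_sql(
    target: str, source: str, keys: Sequence[str], columns: Sequence[str]
) -> str:
    """Return a MERGE statement that upserts `source` into `target` on `keys`."""
    if not keys:
        raise ValueError("at least one key column is required")
    on = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    non_keys = [c for c in columns if c not in keys]
    parts = [f"MERGE INTO {target} AS t USING {source} AS s ON {on}"]
    if non_keys:
        # Matched rows are updated in place, so a re-run does not duplicate them.
        updates = ", ".join(f"t.{c} = s.{c}" for c in non_keys)
        parts.append(f"WHEN MATCHED THEN UPDATE SET {updates}")
    cols = ", ".join(columns)
    vals = ", ".join(f"s.{c}" for c in columns)
    # Only rows whose key is not yet in the target are inserted.
    parts.append(f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})")
    return " ".join(parts)


# Hypothetical table and column names.
stmt = build_merge_sql(
    target="silver.orders",
    source="staged_orders",
    keys=["order_id"],
    columns=["order_id", "status", "amount"],
)
print(stmt)
# In a notebook with an active Spark session: spark.sql(stmt)
```

The statement is only idempotent if the key columns uniquely identify a row in the source; deduplicate the staged data on those keys before merging.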