Microsoft Fabric Data Engineering — Spark, Lakehouse, Spark Job Definitions

What it is

Data Engineering is the Fabric workload that gives you Apache Spark on OneLake: managed, autoscaled, and billed in Capacity Units (CUs). It groups Lakehouses, notebooks, Spark Job Definitions, and environments under one workload. If you've used Synapse Spark or Databricks, this is the equivalent surface, without the cluster management.

Spark pools & autoscale

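Fabric provides starter pools and custom pools. Starter pools are pre-warmed, so sessions typically start in under 10 seconds; custom pools let you pick the node size and the autoscale range. Autoscale handles bursty workloads by adding and removing nodes between the minimum and maximum you configure. Set the maximum node count so that the pool's peak vCore demand stays within what your capacity's CUs can supply, and Spark scales within that range.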
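To make the link between a capacity's CUs and an autoscale ceiling concrete, here is a minimal sketch of the arithmetic. The conversion figures (2 Spark vCores per CU, a 3x burst factor, and the vCores per node size) are assumptions taken from commonly cited Fabric guidance; verify them against the current documentation for your SKU. The function name is hypothetical.

```python
# Assumed figures; confirm against current Fabric documentation before relying on them.
SPARK_VCORES_PER_CU = 2   # assumption: 1 CU = 2 Spark vCores
BURST_FACTOR = 3          # assumption: Spark jobs may burst up to 3x base vCores
NODE_VCORES = {           # assumption: vCores per node size
    "small": 4,
    "medium": 8,
    "large": 16,
    "xlarge": 32,
    "xxlarge": 64,
}


def max_autoscale_nodes(capacity_units: int, node_size: str, use_burst: bool = True) -> int:
    """Return the largest node count whose total vCores fit within the capacity."""
    vcores = capacity_units * SPARK_VCORES_PER_CU
    if use_burst:
        vcores *= BURST_FACTOR
    return vcores // NODE_VCORES[node_size]


# Example: a 64-CU capacity with medium nodes.
print(max_autoscale_nodes(64, "medium"))                   # with burst
print(max_autoscale_nodes(64, "medium", use_burst=False))  # base vCores only
```

A ceiling at or below the non-burst figure leaves headroom for other workloads sharing the same capacity; the burst figure is the upper bound, not a target.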

Spark Job Definitions

A Spark Job Definition (SJD) is the production execution unit: a packaged JAR (Scala or Java), a PySpark file, or an R script that runs on a schedule or from a pipeline. Use SJDs when:

The logic is stable and rarely changes
You need stronger packaging discipline than notebooks provide
Multiple workloads invoke the same code

Use notebooks for interactive development and orchestrated transformations; promote stable, library-heavy code to SJDs.
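As an illustration of the packaging discipline an SJD encourages, the following is a minimal sketch of a PySpark main file. The argument names, the deduplication logic, and the application name are all hypothetical; the point is the shape: explicit arguments, a pure `run` function that takes a session, and a thin entry point.

```python
import argparse


def parse_args(argv):
    """Parse the command-line arguments configured on the job (names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Hypothetical SJD entry point")
    parser.add_argument("--source-path", required=True, help="Delta table path to read")
    parser.add_argument("--target-path", required=True, help="Delta table path to write")
    parser.add_argument("--key-column", default="id", help="Column used to deduplicate")
    return parser.parse_args(argv)


def run(spark, args):
    """Read a Delta table, drop duplicate keys, and overwrite the target table."""
    df = spark.read.format("delta").load(args.source_path)
    deduped = df.dropDuplicates([args.key_column])
    deduped.write.format("delta").mode("overwrite").save(args.target_path)


def main(argv):
    # Imported here so argument parsing can be unit-tested without Spark installed.
    from pyspark.sql import SparkSession

    args = parse_args(argv)
    spark = SparkSession.builder.appName("dedupe-job").getOrCreate()
    run(spark, args)


# In the job's main definition file, finish with the usual guard:
#   if __name__ == "__main__":
#       import sys
#       main(sys.argv[1:])
```

Keeping `run` separate from session creation means the same function can be called from a notebook during development and from the SJD in production.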

Environments

Environments package Python, R, and Spark library versions together with Spark configuration. They are versioned and reusable across notebooks and SJDs. Treat them as production infrastructure: keep one environment per project and promote it through dev, test, and prod.
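One way to catch drift between stages is a small guard at the top of a notebook or SJD that compares the attached environment's installed packages against the versions the code expects. This is a sketch using only the Python standard library; the function name and the pinned packages are hypothetical, and it is not a Fabric API.

```python
from importlib import metadata


def find_pin_mismatches(pins):
    """Return {package: (expected, installed)} for every pin the runtime does not satisfy.

    `installed` is None when the package is missing entirely.
    """
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


# Hypothetical pins; replace with the versions published in your environment.
EXPECTED_PINS = {"example-internal-lib": "1.4.2"}

problems = find_pin_mismatches(EXPECTED_PINS)
if problems:
    print(f"Environment does not match expected pins: {problems}")
```

In a production job you would likely raise an exception instead of printing, so a mispromoted environment fails fast rather than producing subtly different results.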

Best practices

Use starter pools for development. Sub-10-second session starts beat anything with cold provisioning.
Tune autoscale ceilings. The most common cause of unexpected CU spikes is a notebook running on a pool whose autoscale maximum was never capped.
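Optimize Delta writes. Enable optimized writes and auto compaction (for example, through the Delta table properties delta.autoOptimize.optimizeWrite and delta.autoOptimize.autoCompact; session-level setting names vary by runtime, so check the documentation for yours), run OPTIMIZE on a schedule, and run VACUUM according to your retention policy.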
Right-size your data. If your transformations run in seconds on pandas, you don't need Spark; use a single-machine Python notebook.
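The scheduled OPTIMIZE and VACUUM step can be sketched as a small helper that builds the Delta SQL statements and runs them through a Spark session. The function names and the table name are hypothetical. The 168-hour floor mirrors Delta Lake's default 7-day retention check and is an assumption about your policy, not a requirement.

```python
MIN_RETAIN_HOURS = 168  # assumption: Delta's default 7-day retention threshold


def maintenance_statements(table, retain_hours=168, zorder_by=None):
    """Build the OPTIMIZE and VACUUM statements for one Delta table."""
    if retain_hours < MIN_RETAIN_HOURS:
        raise ValueError(
            f"retain_hours={retain_hours} is below the {MIN_RETAIN_HOURS}-hour floor"
        )
    optimize = f"OPTIMIZE {table}"
    if zorder_by:
        optimize += f" ZORDER BY ({', '.join(zorder_by)})"
    vacuum = f"VACUUM {table} RETAIN {retain_hours} HOURS"
    return [optimize, vacuum]


def run_maintenance(spark, table, **options):
    """Execute the maintenance statements in order using the given Spark session."""
    for statement in maintenance_statements(table, **options):
        spark.sql(statement)
```

In a Fabric notebook the `spark` session is already defined, so a scheduled run could call `run_maintenance(spark, "orders", zorder_by=["customer_id"])` for each table that needs compaction.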
