Getting started with data science in Microsoft Fabric

Introduction

Microsoft Fabric is reshaping the data landscape by offering a unified platform where data engineering, data science and business intelligence converge. Because Fabric is delivered as software as a service (SaaS), getting started takes only a few clicks to provision the resources you need.

This blog gives an outline of the tools available to data scientists and analysts in Fabric to help you hit the ground running.

Tools of the trade: What Fabric offers data scientists

Fabric brings together several components under one roof:

  • Lakehouse: Combines the flexibility of data lakes with the structure of data warehouses. Ideal for storing raw and curated data for Machine Learning (ML) workflows.
  • Notebooks: Built-in Jupyter-style notebooks support Python, Spark and SQL. You can write, test, and visualise your code directly in Fabric.
  • Dataflows Gen2: A low-code way to ingest, transform and prepare data. Perfect for building repeatable ETL (Extract, Transform, Load) pipelines.
  • ML Integration: Connectivity with Azure Machine Learning for model training, deployment and monitoring.

In essence, these tools make Fabric a one-stop shop for data science and analytics.
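To give a flavour of the notebook experience, the sketch below explores a small customer table with pandas. Inside Fabric you would typically load a Lakehouse table instead (for example with Spark’s `spark.read.table(...)`); the inline DataFrame here is just a local stand-in so the snippet runs anywhere.

```python
import pandas as pd

# Local stand-in for a Lakehouse table; in a Fabric notebook you would
# typically load one with e.g. spark.read.table("my_lakehouse.customers").
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "tenure_months": [3, 24, 12, 1],
    "monthly_spend": [29.99, 79.99, 49.99, 9.99],
})

# Quick exploration, exactly as you would in a Fabric notebook cell.
print(customers.describe())
print("Average monthly spend:", customers["monthly_spend"].mean())
```

The same cell could switch to Spark or SQL where the data outgrows pandas – that flexibility is the point of the notebook experience.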

Workspaces: Your collaborative playground

Workspaces in Fabric are more than just folders – they’re collaborative environments where teams can organise and manage all their artefacts:

  • Lakehouses, notebooks, dataflows and reports live side-by-side
  • Role-based access control ensures secure collaboration
  • Versioning and Git integration help maintain reproducibility and track changes

The workspace is where data scientists, engineers and analysts can work together without stepping on each other’s toes.

Pipelines: Automating the data science lifecycle

Fabric supports end-to-end pipelines that streamline your workflow:

  1. Data ingestion: Use Dataflows Gen2 to pull in data from SQL, APIs or cloud storage.
  2. Data preparation: Clean and transform data using notebooks or Spark jobs.
  3. Model training: Train models directly in notebooks or via Azure ML integration.
  4. Deployment & monitoring: Push models to endpoints and monitor performance.
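The first two stages can be sketched as plain Python functions. In Fabric each stage would normally be a Dataflow, notebook activity or ML job wired together in a pipeline; the function names and inline CSV here are illustrative stand-ins so the flow is easy to follow.

```python
import io
import pandas as pd

def ingest() -> pd.DataFrame:
    """Stand-in for Dataflows Gen2 ingestion (here: an inline CSV source)."""
    raw = io.StringIO(
        "customer_id,tenure_months,monthly_spend\n"
        "1,3,29.99\n"
        "2,,79.99\n"
        "3,12,49.99\n"
    )
    return pd.read_csv(raw)

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Clean and transform: fill gaps, derive a feature."""
    df = df.copy()
    df["tenure_months"] = df["tenure_months"].fillna(0)
    df["annual_spend"] = df["monthly_spend"] * 12
    return df

# Chaining the stages mirrors a pipeline's ingestion -> preparation flow;
# model training and deployment would follow as later activities.
prepared = prepare(ingest())
print(prepared)
```

In a real pipeline the same shape holds: each activity consumes the previous activity’s output, and the orchestrator handles scheduling and retries.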

Pipelines can be scheduled, triggered by events or run manually, giving you full control over automation.

A quick example: Predicting customer churn

Let’s say you’re building a churn prediction model. At a high level, the steps are:

  1. Ingest customer data using Dataflows Gen2.
  2. Store data in a Lakehouse and explore it with Notebooks.
  3. Train a classification model using scikit-learn or PySpark MLlib within a Notebook.
  4. Deploy the model via Azure ML.
  5. Visualise predictions in Power BI.
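Step 3 might look something like the scikit-learn sketch below inside a notebook. The churn data is synthetic and the simple tenure-based churn rule is made up purely for illustration; in practice the features would come from your Lakehouse table.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for churn features pulled from a Lakehouse table.
rng = np.random.default_rng(42)
n = 500
tenure = rng.integers(1, 60, size=n)
monthly_spend = rng.uniform(10, 100, size=n)
# Toy labelling rule: short-tenure customers churn.
churn = (tenure < 12).astype(int)

X = np.column_stack([tenure, monthly_spend])
X_train, X_test, y_train, y_test = train_test_split(X, churn, random_state=0)

# Train a simple classifier and evaluate on the hold-out set.
model = LogisticRegression().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Hold-out accuracy: {accuracy:.2f}")
```

From here the fitted model could be registered and deployed via the Azure ML integration (step 4), and its predictions surfaced in Power BI (step 5).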

All of this happens inside Fabric – no context switching, no data silos. Fabric also includes quite a few built-in Artificial Intelligence (AI) functions.

Built-in AI functions

Fabric has several prebuilt generative AI tools that can be applied directly to your data from within a Notebook. Simple configurations can be performed through the ‘AI tools’ menu; where code is needed, it is often a single line. The functions centre on text tasks such as translating from one language to another, sentiment analysis, classification and generating a response to an input.

These functions are powered by GPT-4o-mini and are optimised for use with both Spark and Pandas DataFrames.
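As an illustration of the pattern, the sketch below shows the kind of DataFrame these functions operate on. The AI-function call itself is shown only as a comment, because it works solely inside a Fabric runtime with the feature enabled, and the exact method name may vary by preview version – treat it as an assumption, not a reference; the rest of the snippet runs anywhere.

```python
import pandas as pd

reviews = pd.DataFrame({
    "review": [
        "Great service, very responsive.",
        "Support never replied to me.",
    ]
})

# Inside a Fabric notebook, applying a built-in AI function is typically a
# one-liner along these lines (hypothetical call shape, Fabric-only):
# reviews["sentiment"] = reviews["review"].ai.analyze_sentiment()

print(reviews)
```

The key point is the ergonomics: the model call is applied column-wise to a Spark or pandas DataFrame, so no prompt engineering or endpoint management is needed for common text tasks.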

Final thoughts

Microsoft Fabric is more than just a new tool – it takes away the need to manually integrate different components, because all those components are already integrated.

By unifying the ecosystem, it empowers data scientists to focus on insights rather than infrastructure.

Breadcrumb Digital’s certified Fabric Analytics engineers, data engineers and data scientists can help you design, implement and get the most out of Microsoft cloud-based computing, including data science with Fabric.