Insights from building a Synapse Analytics Data WarehouseData
Data warehouses have evolved a lot from the early days of simply being a large, central data store that enables historic reporting. Today, data warehouses play the crucial role of being enablers for artificial intelligence and machine learning.
Synapse Analytics is the primary Azure service for delivering a data warehouse on the Azure cloud. (Note that Synapse Analytics was previously called Azure Data Warehouse.) Synapse Analytics comes with two flavours of distributed SQL Pools: Serverless SQL Pool, an auto-scaling pool optimized for ad-hoc analysis of data including files in a lake, and Dedicated SQL Pool, a provisioned pool with relational tables that can enable enterprise data warehouse services.
While there are a lot of benefits to using Synapse Analytics Dedicated SQL Pools to implement a data warehouse, there are some key differences to be mindful of, when compared to a traditional SQL Server environment:
- Non-sequential identity columns: Unlike traditional SQL Server databases, the identity columns in Synapse Analytics are not sequential. This is due to the distributed architecture, with up to 60 nodes, and the associated Massively Parallel Processing (MPP) engine. The identity columns are still unique, however.
- Lack of system variable support: There are some key SQL system variables which are not currently supported in Synapse Analytics, e.g. @@rowcount
- MERGE statement not fully supported: Currently, the SQL MERGE statement support is in preview mode in Synapse Analytics and Microsoft recommends not to use the MERGE statement in production environments. If you are building a data warehouse schema, you will therefore have to replicate the MERGE logic using a series of INSERT and UPDATE statements.
- Synapse pipelines limitations: Synapse Analytics comes with a built-in data integration and orchestration feature called Synapse Pipelines, which is very similar to Azure Data Factory. But Synapse Pipelines have some limitations when compared to Azure Data Factory Pipelines, e.g. Synapse Pipelines do not support SSIS integration runtimes and Power Query Activities.
While Synapse Analytics is the go-to service for designing and implementing an enterprise data warehouse on the Azure cloud, the above differences should be kept in mind when designing a data warehouse solution based on Synapse Analytics.