Modern data integration with Azure Data Factory

More data, more data integration

As the volume and variety of data being captured by organisations continue to increase, so does the demand for data integration work.

Central data teams working on enterprise-scale data integration are concerned with breaking down silos and centralising data in trusted data assets such as data lakes and data warehouses.

Business teams often have more discrete data integration challenges such as interfacing with public datasets and preparing data for input into analytical models. Modern data integration also increasingly requires obtaining value from semi-structured and unstructured data.

It’s not surprising, therefore, that the tools needed to overcome modern data integration challenges are evolving at an ever-increasing rate. Microsoft’s latest release of Azure Data Factory is positioned to address the variety of modern data integration challenges.

Azure Data Factory

Azure Data Factory is a Microsoft cloud-based service for modern data integration. More specifically, Data Factory orchestrates the movement and processing of data.

Let’s break this sentence down. Movement, in this context, encompasses both moving and copying data. As an example, data is typically copied when it is ingested from an on-premises operational system to a cloud-hosted data store.

I use the term processing to encompass transformations, such as cleansing or aggregation, as well as more advanced analytics, such as machine learning.

Finally, orchestration refers to the fact that Data Factory does not perform all the data processing ‘work’ itself. It can perform some transformations, but typically its role is to manage the handing off of work to other data services for processing.
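To make the idea of orchestration concrete, here is a minimal sketch in Python. It is a toy illustration only, not Data Factory code: the activity names and the three handler functions are hypothetical stand-ins for external services, and the only library used is Python's standard graphlib module (Python 3.9 or later). The orchestrator does none of the processing itself; it simply decides the order and hands each step to whichever service owns it.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-ins for external services. None of these are Data Factory APIs;
# each one just records the work it would have been asked to do.
def copy_to_lake(log):
    log.append("copy: on-premises database -> data lake")

def cleanse_on_spark(log):
    log.append("transform: cleansing delegated to a Spark cluster")

def score_with_model(log):
    log.append("analytics: scoring delegated to a machine learning service")

# Each activity maps to (handler to delegate to, names of activities it depends on).
ACTIVITIES = {
    "CopySales": (copy_to_lake, []),
    "CleanseSales": (cleanse_on_spark, ["CopySales"]),
    "ScoreSales": (score_with_model, ["CleanseSales"]),
}

def run_pipeline(activities):
    """Dispatch activities in dependency order and return a log of what was handed off."""
    graph = {name: set(deps) for name, (_, deps) in activities.items()}
    log = []
    for name in TopologicalSorter(graph).static_order():
        handler, _ = activities[name]
        handler(log)  # the orchestrator delegates; it does not do the work itself
    return log

if __name__ == "__main__":
    for line in run_pipeline(ACTIVITIES):
        print(line)
```

The three steps mirror the terms above: one movement activity (the copy), followed by two processing activities (a transformation and an analytics step), each of which only starts once the step it depends on has finished.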

The following Microsoft graphic highlights the types of services Data Factory can interact with and the types of data activities it can perform.

An overview of the operations Data Factory can perform within a typical information flow

Enterprise-grade integration

As you would expect from a Microsoft Azure service, Data Factory is enterprise-grade, meaning it is robust, secure and scalable.

Data Factory integrates with over 100 data services. Its presence in a variety of Microsoft reference architecture diagrams highlights the key role it plays in many data solutions, from modern data warehousing to advanced analytics on big data.

Data Factory also integrates with non-Microsoft services including Amazon Web Services (S3, Redshift, MWS), Google Cloud services (BigQuery), SAP and Salesforce.

One big improvement in Data Factory v2 (released in 2018) is that it enables data integration workflows (known as pipelines) to be built using a visual development environment rather than relying upon coding.

Data Factory also supports automation, both for provisioning the service itself and for provisioning the pipelines that run on it. These features can significantly reduce the effort required to build and maintain modern data integration solutions.
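As a rough illustration of what automating pipeline provisioning can look like, the sketch below generates pipeline definitions from a list of tables rather than building each one by hand. The dictionary layout is loosely modelled on the JSON format Data Factory uses to describe a pipeline with a Copy activity; the function name, table names, dataset names and the specific source and sink types are my assumptions for the example and should be checked against the Data Factory documentation before use. Deploying the generated definitions is deliberately left out.

```python
import json

def build_copy_pipeline(name, source_dataset, sink_dataset):
    """Return a pipeline definition containing a single Copy activity, as a plain dict.

    The structure is an approximation of a Data Factory pipeline document;
    the source and sink types below are assumed for illustration.
    """
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": f"Copy_{source_dataset}",
                    "type": "Copy",
                    "inputs": [
                        {"referenceName": source_dataset, "type": "DatasetReference"}
                    ],
                    "outputs": [
                        {"referenceName": sink_dataset, "type": "DatasetReference"}
                    ],
                    "typeProperties": {
                        "source": {"type": "SqlServerSource"},  # assumed source type
                        "sink": {"type": "ParquetSink"},  # assumed sink type
                    },
                }
            ]
        },
    }

# Hypothetical table names: one pipeline definition is generated per table.
TABLES = ["Customers", "Orders"]
pipelines = [
    build_copy_pipeline(f"Ingest{table}", f"OnPrem{table}", f"Lake{table}")
    for table in TABLES
]

if __name__ == "__main__":
    print(json.dumps(pipelines[0], indent=2))
```

Because the definitions are just data, adding a new table to the list produces a new pipeline definition with no extra manual work, which is where much of the saving in build and maintenance effort comes from.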

Summary

Azure Data Factory is an enterprise-grade data integration service that enables teams to overcome modern data integration challenges. It integrates with a staggering variety of data services, and its automation and code-free features significantly reduce the costs associated with building and maintaining data solutions.

If you’d like to know more about how Data Factory or other Microsoft services can help solve your data integration challenges then we’d love to chat!