Building a data lake and data warehouse on Azure Synapse Analytics to enhance ATSICHS’s analytical capabilities.
ATSICHS sought to provision a data lake and modern data warehouse platform to remove the challenges they faced with integrating and reporting on data dispersed across many disconnected platforms.
Aboriginal and Torres Strait Islander Community Health Service (ATSICHS) Brisbane has been the leading not-for-profit, community-owned health and human services provider to the Aboriginal and Torres Strait Islander Community in Queensland for over four decades.
From humble beginnings in the early 1970s, ATSICHS has grown to provide a diverse range of health, social and educational services throughout the Greater Brisbane and Logan areas.
ATSICHS’s IT strategic roadmap identified that its reporting capabilities were significantly limited by the heterogeneous nature of its technical platforms and reporting solutions. Breadcrumb Digital was engaged to deliver a homogeneous analytics solution to assist ATSICHS in achieving its strategic priorities.
- integrating data from a broad range of cloud-hosted vendor systems through a variety of REST APIs and other interface formats
- logically organising data in the lake according to classification standards and access requirements
- implementing fine-grained access controls in the data warehouse to enable restricting access by table, column or row
- consolidating data from various health data sources and presenting a conformed view of deidentified patient data across service areas
- ensuring the solution is scalable, intuitive to configure and leverages automation wherever possible.
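As an illustration of the deidentified, conformed patient view listed above, the following is a minimal Python sketch, not the method used on the project. It assumes each source record carries a shared `patient_id`, and that a secret key would be held somewhere like Key Vault; the field names, the `DIRECT_IDENTIFIERS` set and both function names are hypothetical. Keyed hashing gives a stable pseudonymous key so records from different service areas can still be joined, although on its own it may not be enough to meet a formal deidentification standard.

```python
import hashlib
import hmac

# Hypothetical field names; the real source schemas are not described here.
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "address", "medicare_number"}


def pseudonymise(patient_id: str, secret_key: bytes) -> str:
    """Return a stable keyed hash (HMAC-SHA256) of a patient identifier."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and replace patient_id with a pseudonymous key."""
    cleaned = {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS and field != "patient_id"
    }
    cleaned["patient_key"] = pseudonymise(record["patient_id"], secret_key)
    return cleaned


if __name__ == "__main__":
    key = b"example-only-secret"  # assumption: a real key would come from a secret store
    clinic = deidentify({"patient_id": "P001", "name": "Jane", "service_area": "clinic"}, key)
    dental = deidentify({"patient_id": "P001", "address": "1 Example St", "service_area": "dental"}, key)
    # The same patient maps to the same key in both service areas.
    print(clinic["patient_key"] == dental["patient_key"])
```

Because the key is derived with a secret, the same patient can be matched across sources without the warehouse ever storing the original identifier.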
The project required the following skills and competencies:
- business analysis skills to work with stakeholders to clarify the required capabilities of the platform and articulate these to the technical team
- experience applying best practices with the Microsoft Azure data services including Storage, Data Factory, Synapse Analytics, and Key Vault
- technical analysis skills to interpret a wide array of data interface formats and design appropriate data load pipelines
- experience designing complex security models in Microsoft Azure
- data architecture knowledge and experience building large-scale data assets
- data warehouse dimensional modelling experience to enable intuitive views of the data built across multiple sources
- experience provisioning data platforms incrementally using an agile framework and automating deployments with DevOps.
Workshops were used in the initial discovery phase to validate the analytical capabilities required of the platform. This phase also prioritised the data sources to be ingested into the lake and the subject areas to build out in the data warehouse.
An architecture phase was used to form agreement on the high-level design for the platform, including the roles played by each service. Early connectivity tests against the high-priority data sources were also carried out during this phase.
A four-person team was assembled for the build phase, with the early sprints focusing on provisioning and securing the platform. The data lake and data warehouse were developed in parallel through the build phase. Data sources were incrementally ingested into the lake on a sprint-by-sprint basis. Similarly, the data warehouse was built out incrementally, one subject area at a time.
Azure DevOps was used to manage source control and, where possible, to automate deployments from development to production.
A Microsoft Azure data platform was implemented incorporating:
- an Azure Data Lake Storage (ADLS) Gen2 solution, organised into logical zones, hosting around fifteen different data sources in formats such as Parquet and CSV
- a centralised data warehouse provisioned on Azure Synapse Analytics, with change tracking and table/column/row-level security
- two sets of data processing pipelines built and orchestrated using Azure Data Factory – the first set ingests and transforms data from sources to the lake while the second set loads data from the lake to the warehouse
- varying schedules set up to ingest and load data on a regular basis with appropriate monitoring to ensure pipelines are executed as expected.
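To make the idea of logical zones concrete, here is a small Python sketch of one possible folder convention for the lake. The zone names, classification names, path layout and the `lake_path` helper are all assumptions for illustration; the case study does not state the convention that was adopted.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical zone and classification names, not taken from the project.
ZONES = ("raw", "curated")
CLASSIFICATIONS = ("general", "sensitive")


def lake_path(zone: str, classification: str, source: str, dataset: str,
              load_date: date, ext: str = "parquet") -> str:
    """Build a lake file path as zone/classification/source/dataset/load_date=.../file."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    if classification not in CLASSIFICATIONS:
        raise ValueError(f"unknown classification: {classification}")
    return str(PurePosixPath(
        zone,
        classification,
        source,
        dataset,
        f"load_date={load_date.isoformat()}",
        f"{dataset}.{ext}",
    ))


if __name__ == "__main__":
    print(lake_path("raw", "sensitive", "clinical_system", "appointments", date(2024, 1, 31)))
    # prints raw/sensitive/clinical_system/appointments/load_date=2024-01-31/appointments.parquet
```

Placing the classification near the top of the path is one way to let access be granted at folder level; other orderings would suit different access requirements.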
Ultimately the provisioned data platform has addressed ATSICHS’s key challenges by providing a centralised data repository and enabling a variety of analytical capabilities.
More specifically, the data lake has already delivered value to a variety of projects by acting as a simple, standardised data source for downstream consumers. ATSICHS is currently building out the capabilities of its data team, which has started to extend the lake by applying the pipeline design patterns to new data sources.
Furthermore, ATSICHS will be able to draw previously unforeseen insights from their health/patient data in the warehouse, and visualise it in tools such as Power BI.
The data platform offers a foundational set of analytics capabilities. It has already proven to be extensible, and it can scale to support the needs of the organisation over the years to come.