NAPCo

NAPCo sought to centralise its herd performance and carbon reporting data on a cloud-based platform, streamlining data preparation and enhancing its analytical reporting capabilities.

The client

The North Australian Pastoral Company (NAPCo) is a leading producer in Australia’s cattle industry, managing more than 200,000 head of cattle across six million hectares in Queensland and the Northern Territory.

With one of the world’s premier cattle breeding programs, NAPCo is recognised for its commitment to quality, innovation, and the long-term sustainability of Australia’s beef industry.

Background

NAPCo had commissioned Breadcrumb to develop a data strategy that established a clear and comprehensive plan for the future of data and reporting at the organisation. This strategy identified opportunities to improve data quality, reduce manual data preparation time, and accelerate report development.

Following the data strategy, Breadcrumb Digital was engaged to deliver a centralised data platform, with the goals of resolving key data challenges and establishing a single source of truth for the organisation’s data. The delivery schedule for the data platform had to align with business timelines for enabling reporting on two priority subject areas: herd performance and carbon reporting.

The skills

The project required the following skills and competencies:

  • experience managing large, complex projects, specifically using agile methodologies to deliver value incrementally
  • business and data analysis skills to work with stakeholders to capture reporting requirements and articulate these to the technical team
  • data architecture knowledge and experience building large-scale data assets
  • experience provisioning and configuring Microsoft data platforms
  • data integration skills to interpret a wide array of data interface formats and design appropriate data load pipelines in Synapse Analytics and supporting tools
  • experience diagnosing data quality issues and implementing corresponding data cleansing strategies
  • data modelling experience in SQL Server to integrate datasets from multiple sources.

The team
Mark Habgood

Walter Leong

Patrick Dadey

Tony Ketteringham

Michael Cowls

The approach

An agile framework was used for the delivery, with roles and responsibilities agreed for five Breadcrumb resources and two primary NAPCo staff members. Early sprints focused on discovery, architecture, and provisioning the platform.

Workshops in the initial discovery phase gathered information on the data sources to be ingested into the data platform. The required reporting subject areas were then mapped against the list of data sources, enabling a prioritised data ingestion ‘run sheet’ to be established.

Architecture artefacts were developed to form agreement on the platform’s high-level design, including the roles played by each Microsoft Azure service. Once the design was agreed, NAPCo provisioned and secured the platform with support from Breadcrumb.

An API assessment phase conducted early connectivity tests against high-priority data sources. This phase also identified and prioritised data sources that could have a long lead time for gathering integration detail, such as those managed by third-party vendors.

As each data source became accessible, a high-level data quality assessment was conducted to identify data-related risks. Where data quality issues were present, data remediation strategies were developed and executed on a case-by-case basis.
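The high-level assessments described above can be sketched as a simple per-column completeness check. This is an illustrative example only, not NAPCo's actual tooling; the field names, sample rows, and the 20% flagging threshold are assumptions.

```python
# A minimal sketch of a high-level data quality check run as each source
# becomes accessible; field names and the threshold are illustrative.
def profile_columns(rows: list[dict], null_threshold: float = 0.2) -> dict:
    """Report the share of missing values per column and flag columns
    whose completeness may pose a reporting risk."""
    columns = {key for row in rows for key in row}
    report = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        null_pct = sum(v is None for v in values) / len(values)
        report[col] = {"null_pct": null_pct, "flagged": null_pct > null_threshold}
    return report

# Hypothetical herd-records extract with gaps in both columns
rows = [
    {"animal_id": "A1", "weight_kg": 412.0},
    {"animal_id": "A2", "weight_kg": None},
    {"animal_id": None, "weight_kg": 398.5},
    {"animal_id": "A4", "weight_kg": 405.0},
]
report = profile_columns(rows)
```

A report like this gives an early, low-cost signal of which sources warrant a remediation strategy before any warehouse modelling begins.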

The data lake and data warehouse were developed in parallel through the build phase. Data sources were incrementally ingested into the lake on a sprint-by-sprint basis. Similarly, the data warehouse was built out incrementally one subject area at a time, focusing on data that provided the highest reporting value.

Following delivery of the solution, Breadcrumb conducted training sessions to ensure a smooth transition to go live and enable NAPCo technical staff to be responsible for primary support of the solution.

Key challenges

The primary challenge was the source data, both in terms of obtaining the data through interfacing with 10-15 discrete sources, and the quality of the data itself. Some of the interfaces were not designed for easy data extraction or were not compliant with modern standards.

Overcoming this challenge first required assessing each interface: analysing existing documentation, testing connectivity with Postman and, in some cases, collaborating with NAPCo and third-party vendors to agree on a secure ingestion approach. From a technical perspective, some challenges could be met by altering the interface itself; others were solved by constructing data pipelines that could handle variations and anomalies.
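One common kind of variation a tolerant pipeline has to absorb is inconsistent date formatting across source files. The sketch below is illustrative only; the specific formats and field are assumptions, not taken from NAPCo's sources.

```python
from datetime import datetime

# Illustrative: source files may use inconsistent date formats, so the
# ingestion step normalises them rather than rejecting whole loads.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d-%b-%y"]

def parse_event_date(raw: str):
    """Return an ISO date string, or None if no known format matches,
    so unparseable rows can be quarantined instead of failing the load."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning `None` for unrecognised values, rather than raising, lets a pipeline route bad rows to a quarantine zone while the rest of the file loads normally.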

Data quality issues were addressed by selecting the most appropriate technology for the remediation work at hand. This kept resolution cost-effective, and resulted in a combination of Python scripts, Azure pipelines, and SQL procedures being used. Historical data was run through cleansing code to align values and data types, and data was transformed to align across systems and standards, ultimately enabling the solution to present a consistent view across all the available data.
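A cleansing pass of this kind typically aligns categorical values and coerces data types in one place. The sketch below is a hypothetical example of the pattern; the breed codes, field names, and mappings are invented for illustration and are not NAPCo's.

```python
# Illustrative cleansing pass: aligns categorical values that differ
# across source systems and coerces inconsistently typed fields.
# All codes and field names here are assumptions, not NAPCo's data.
BREED_ALIASES = {"SH": "Shorthorn", "SHORTHORN": "Shorthorn", "COMP": "Composite"}

def cleanse_record(rec: dict) -> dict:
    out = dict(rec)
    # Align categorical values to a single canonical form
    breed = str(rec.get("breed", "")).strip().upper()
    out["breed"] = BREED_ALIASES.get(breed, breed.title() or None)
    # Align data types: some extracts store weights as strings
    try:
        out["weight_kg"] = float(rec["weight_kg"])
    except (KeyError, TypeError, ValueError):
        out["weight_kg"] = None
    return out
```

Centralising this logic means every downstream report sees one canonical value per concept, which is what makes a consistent cross-system view possible.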

The solution

A Microsoft Azure data platform was established consisting of the following components:

  • an Azure Data Lake Storage (ADLS) Gen2 account, organised into logical zones, hosting around fifteen different data sources in formats such as Parquet and CSV
  • a data warehouse provisioned on SQL Server, with a relational data model designed to enable herd performance and carbon reporting
  • a suite of data interface tools and data ingestion pipelines built on Azure Synapse Analytics and Python
  • SQL procedures to extract data from the lake and then transform and load it to the SQL data model, based on agreed business logic.

The outcome

Delivery of the central data platform enhanced NAPCo’s analytical capabilities. The organisation’s key data sets now flow continuously into the platform, and a range of reporting options is available – from raw versions of data sets to intuitive data models optimised for specific subject areas.

Data on the platform is also cleansed and standardised to simplify downstream reporting. These improvements significantly reduce the time taken for NAPCo staff to develop reports and enable employees to focus on identifying business insights rather than data wrangling.

The business was also able to use the new data platform to meet its priority reporting needs in herd performance and carbon reporting.
