An introduction to Azure Machine Learning Studio
This week I’m excited to be attending a three-day ‘Data and AI Airlift’ Microsoft event in Sydney that centres around how customers look at their Data + AI stack, what challenges they face, and how Azure’s services can cater to those challenges.
In line with this event I wanted to do a post that focuses on one particular service I’ve been working with in this space, Azure Machine Learning Studio.
First, let’s get our bearings. Azure Machine Learning Studio is, somewhat confusingly, distinct from the Azure Machine Learning service, as explained below:
- Azure Machine Learning Studio, released in 2014, is a collaborative browser-based environment for advanced analytics. Its simple interface and rich gallery of preconfigured analytical solutions allow users to rapidly build, test and operationalise Machine Learning (ML) experiments.
- Azure Machine Learning service, released in December 2018, contains more advanced capabilities such as automated machine learning, integrated DevOps functionality and support for open-source frameworks like TensorFlow. I’ll be discussing this service in more detail in a separate post.
Machine learning and the dark arts
While the flask icon of Azure Machine Learning Studio conjures images of Professor Snape’s potion classes, the tool goes some way towards demystifying the subject area through an incredibly intuitive drag-and-drop development environment.
It’s here that users are able to string together a series of ‘modules’ – pre-built blocks of code that each perform an ML, statistical or data integration task – to form an end-to-end ML workflow.
Users with a background in data integration will find the user interface very intuitive and the workflow concept similar to data pipelines in other tools.
Azure Machine Learning Studio supports ingestion from a wide range of data sources, from Azure Blob storage to on-premises SQL Server databases and even manual data entry.
The tool offers standard pre-processing functions such as splitting and filtering as well as support for embedding SQL and R scripts for more custom needs.
An example workflow for a direct marketing classification problem
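To make the splitting and filtering steps mentioned above a little more concrete, here is a minimal Python sketch of what those two pre-processing operations do conceptually. It is not how Studio implements its modules; the function names `split_rows` and `filter_rows` and the sample customer records are hypothetical and exist only for illustration.

```python
import random


def split_rows(rows, fraction=0.7, seed=42):
    """Randomly split rows in two; the first part holds roughly `fraction` of them."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def filter_rows(rows, predicate):
    """Keep only the rows for which the predicate returns True."""
    return [row for row in rows if predicate(row)]


# Hypothetical direct-marketing records (made up for this sketch).
customers = [{"age": 25 + i, "responded": i % 3 == 0} for i in range(10)]

# Filter first, then split what remains into training and test sets.
over_30 = filter_rows(customers, lambda row: row["age"] > 30)
train, test = split_rows(over_30, fraction=0.75)

print(len(over_30), len(train), len(test))
```

Fixing the random seed is a deliberate choice here: it keeps the split identical between runs, which makes experiments easier to compare.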
The tool supports an extensive set of in-built algorithms covering the standard ML problem areas of anomaly detection, classification, clustering and regression. Each algorithm comes with its own set of customisable parameters and the tool also has hyperparameter tuning for optimising parameter selection.
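For readers new to hyperparameter tuning, the idea can be sketched in a few lines of plain Python: try every combination of candidate parameter values, score each one, and keep the best. The toy threshold classifier, the parameter grid and the validation data below are all assumptions made up for this illustration, and Studio's own tuning may well use a different search strategy.

```python
from itertools import product


def predict(x, threshold, invert):
    """Toy classifier: label is True when x reaches the threshold (optionally inverted)."""
    label = x >= threshold
    return (not label) if invert else label


def accuracy(data, threshold, invert):
    """Fraction of (x, label) pairs the toy classifier gets right."""
    hits = sum(predict(x, threshold, invert) == y for x, y in data)
    return hits / len(data)


def grid_search(data, grid):
    """Score every parameter combination and return the best one with its score."""
    best_params, best_score = None, -1.0
    for threshold, invert in product(grid["threshold"], grid["invert"]):
        score = accuracy(data, threshold, invert)
        if score > best_score:
            best_params = {"threshold": threshold, "invert": invert}
            best_score = score
    return best_params, best_score


# Hypothetical validation data: (feature value, true label).
validation = [(1, False), (2, False), (3, False), (5, True), (6, True), (7, True)]
grid = {"threshold": [2, 4, 6, 8], "invert": [False, True]}

best_params, best_score = grid_search(validation, grid)
print(best_params, best_score)
```

In a real experiment the scoring would be done on data held out from training, so that the chosen parameters are not simply the ones that memorise the training set best.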
The tool simplifies the experimentation process by categorising modules according to the applicable step within the ML process – training, scoring, evaluation, etc. I found the following one-page diagram great as it encapsulates the high-level capabilities offered by the tool:
Overview of Azure Machine Learning Studio capabilities
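The training, scoring and evaluation steps can also be sketched outside the tool. The example below is a minimal illustration in plain Python using a simple nearest-centroid model on one feature; the model choice, the function names and the data are assumptions for this sketch and are not taken from Studio itself.

```python
from statistics import mean


def train(rows):
    """Training: learn the mean feature value for each label."""
    by_label = {}
    for x, label in rows:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def score(model, xs):
    """Scoring: predict the label whose learned mean is closest to each value."""
    return [min(model, key=lambda label: abs(x - model[label])) for x in xs]


def evaluate(predicted, actual):
    """Evaluation: fraction of predictions that match the known labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labelled data: (feature value, label).
training_rows = [(1.0, "no"), (2.0, "no"), (8.0, "yes"), (9.0, "yes")]
test_rows = [(1.5, "no"), (7.0, "yes"), (4.0, "yes")]

model = train(training_rows)
predictions = score(model, [x for x, _ in test_rows])
test_accuracy = evaluate(predictions, [label for _, label in test_rows])

print(model, predictions, test_accuracy)
```

Keeping the three steps as separate functions mirrors the way the tool separates them: the model is fitted once, then applied to unseen rows, and only then compared against the known answers.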
Once a workflow has been built, it can be run, and the development environment displays live progress as the modules in the workflow are executed in sequence.
To assist with debugging, users can view the data at any step in the workflow through a data visualisation window that also includes various statistical measures.
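As a rough idea of the kind of statistical measures such a view might show for a single column, here is a small sketch using only Python's standard library. The exact measures Studio displays are not listed in this post, so the set chosen below, the `summarise` helper and the sample `ages` column are all assumptions.

```python
from statistics import mean, median, stdev


def summarise(values):
    """Summary statistics for one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "unique": len(set(present)),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "median": median(present),
        "stdev": stdev(present),  # sample standard deviation; needs 2+ values
    }


# Hypothetical column of customer ages with one missing entry.
ages = [34, 45, None, 29, 45, 52]
summary = summarise(ages)

for name, value in summary.items():
    print(f"{name}: {value}")
```

Checking the missing and unique counts at each step is a quick way to spot where a workflow has dropped or duplicated rows.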
Azure Machine Learning Studio also has support for operationalising an ML workflow, which is often a particular business challenge for analytical solutions. Once again, this is a topic for a separate post.
Microsoft also offer an impressive set of online support resources for the tool including:
- A gallery of pre-built Azure ML Studio solutions that are surprisingly interesting real-world scenarios – I’ve worked through a number of these and the supporting documentation holds your hand through the entire end-to-end process.
- An interactive guide that steps you through the process of identifying appropriate algorithms for a given scenario.
- A one-page ‘cheat sheet’ that articulates the application of different algorithms to various analytical challenges.
You do not require a Microsoft Azure subscription for the free pricing tier, which makes it incredibly easy to get a workspace environment set up where you can run small experiments.
As it is browser-based, no local installation is required. The free tier supports up to 10 GB of storage space, although execution is limited to a single node. You will need a basic understanding of the theory behind ML, including knowledge of the process (model selection, training, scoring, evaluation) and an understanding of the algorithms that can be applied to solve different analytical problems.
The low barriers to entry, the usability of the tool and the extensive set of support resources make Azure Machine Learning Studio a soft landing for organisations that have analytical problems and are looking for an environment where they can begin experimenting.