Est. reading time: 5 minutes
Author: Mia Hatton
Azure Batch is a Microsoft Azure product. Read more about Microsoft Azure here.
Definition of Azure Batch
Azure Batch is a robust service that provides parallel batch processing to execute intensive workloads of varying size. It creates a pool of compute nodes (virtual machines) to tackle heavy loads. This makes batch processing a streamlined and viable way to complete data-intensive workloads at any scale.
Does your organisation need Azure Batch?
Azure Batch is a service that manages the workload of applications. A workload is the work assigned to an application over a given time period. Sometimes an application's workload is greater than it can handle in that time period, for example when it needs to process vast volumes of data. This can lead to slow processing times, crashes, and, if you use a cloud-based server, expensive server costs. Azure Batch is designed to take the workload that is greater than the capability of your application and divide it between a number of nodes - virtual machines (VMs) - that can each run your application and perform different parts of the workload in parallel. It is like asking a group of people to each make a car component and then bringing the components together, instead of asking one person to build a car on their own. Each node performs a task, which is a subset of the overall workload. Azure Batch can create the nodes required to complete the workload, assign tasks to them, schedule the tasks, retrieve the data the tasks need from your storage solution and pass it to the nodes, and scale the number of nodes to suit your budget and timescale.
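The car-building analogy can be sketched locally in Python. This is not the Azure Batch API. It splits a list of work items into one chunk per task and uses threads as stand-ins for separate VMs, which Azure Batch would provision and schedule for you. The function names (`split_workload`, `process_chunk`, `run_in_parallel`) are hypothetical, and `process_chunk` simply squares numbers in place of your real application.

```python
from concurrent.futures import ThreadPoolExecutor


def split_workload(items, num_tasks):
    """Divide a list of work items into num_tasks contiguous, near-equal chunks."""
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    size, remainder = divmod(len(items), num_tasks)
    chunks, start = [], 0
    for i in range(num_tasks):
        # The first `remainder` chunks take one extra item so nothing is left over.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Hypothetical stand-in for your application: squares each number."""
    return [x * x for x in chunk]


def run_in_parallel(items, num_nodes):
    """Run one task per simulated node, then reassemble results in the original order."""
    chunks = split_workload(items, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as executor:
        # map() yields results in the same order the chunks were submitted.
        results = list(executor.map(process_chunk, chunks))
    return [value for chunk_result in results for value in chunk_result]


if __name__ == "__main__":
    print(split_workload(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(run_in_parallel(list(range(10)), 3))
```

The threads here only illustrate the split, schedule, and gather pattern. In Azure Batch, each chunk would become a task running on its own node, with input read from and output written to storage.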
You may need Azure Batch if:
- You offer a Software-as-a-Service application that needs to handle large volumes of data
- You have experienced crashes and slow processing times due to the volume of data you need to process
- You are developing (or want to develop) an IoT device that will generate data
- You need to test your application for peak load or usage before deployment
- You have a high-compute application that performs tasks such as media rendering, image analysis or complex simulations
Microsoft lists the following example use cases for Azure Batch:
- Financial risk modeling using Monte Carlo simulations
- VFX and 3D image rendering
- Image analysis and processing
- Media transcoding
- Genetic sequence analysis
- Optical character recognition (OCR)
- Data ingestion, processing, and ETL operations
- Software test execution
- Finite element analysis
- Fluid dynamics
- Multi-node AI training
- Large-scale rendering workloads
Benefits of Azure Batch include:
- It can automate scaling of your computing power to suit your needs, from tens of VMs to thousands.
- You only pay for the computing power you actually use, and you can control the scale to suit your deadline and budget.
- It can deliver compute power on demand, rather than on a schedule.
- It can manage high-volume repetitive tasks with ease.
- It allows you to expand your data processing capability to handle large volumes of data.
- It allows you to scale your application without investing in additional infrastructure of your own.
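The trade-off between deadline and budget comes down to simple arithmetic. The sketch below rests on illustrative assumptions: every task takes the same time and each node runs one task at a time, so a job runs in "waves". The helper names and the `max_nodes` budget cap are hypothetical and are not part of Azure Batch. In practice, Batch can also resize a pool automatically using autoscale formulas, without a fixed up-front calculation.

```python
import math


def nodes_for_deadline(num_tasks, seconds_per_task, deadline_seconds, max_nodes):
    """Smallest node count that finishes all tasks by the deadline, capped at max_nodes.

    Assumes equal-length tasks and one task per node at a time (hypothetical model).
    """
    waves_allowed = deadline_seconds // seconds_per_task
    if waves_allowed < 1:
        raise ValueError("deadline is shorter than a single task")
    needed = math.ceil(num_tasks / waves_allowed)
    # The budget cap wins if it is lower; the deadline may then be missed.
    return min(needed, max_nodes)


def estimated_duration(num_tasks, seconds_per_task, nodes):
    """Wall-clock time when tasks run in waves of up to `nodes` tasks."""
    return math.ceil(num_tasks / nodes) * seconds_per_task


if __name__ == "__main__":
    # 1,000 one-minute tasks that must finish within an hour.
    print(nodes_for_deadline(1000, 60, 3600, max_nodes=100))  # 17
    print(estimated_duration(1000, 60, 17))  # 3540
    # With the budget capped at 10 nodes, the job takes longer.
    print(estimated_duration(1000, 60, 10))  # 6000
```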
Prerequisites and Integrations
To get started with Azure Batch, you need a Microsoft Azure account. Read more about Microsoft Azure here. You also need to create a Batch account, which your application authenticates against when it creates and runs tasks. You can create a Batch account here.
Most Batch solutions use Azure Storage for storing resource files and output files. You can associate a storage account with your Batch account either when you create the Batch account or later. You can create an Azure Storage account here.
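Together, the two accounts give your application the endpoints it connects to. The sketch below only assembles and validates those settings and does not call Azure. It assumes a Batch endpoint of the form `https://<account>.<region>.batch.azure.com`, a Blob endpoint of the form `https://<account>.blob.core.windows.net`, and account names of 3-24 lowercase letters and digits. The environment variable names are hypothetical. Prefer the account URL shown in the Azure portal, and keep account keys out of source code.

```python
import os
import re
from dataclasses import dataclass

# Assumed naming rule for both Batch and Storage accounts: 3-24 lowercase letters/digits.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


@dataclass(frozen=True)
class BatchSettings:
    batch_account: str
    region: str
    storage_account: str

    def __post_init__(self):
        for name in (self.batch_account, self.storage_account):
            if not _ACCOUNT_NAME.fullmatch(name):
                raise ValueError(f"invalid account name: {name!r}")

    @property
    def batch_url(self):
        # Batch account endpoint, e.g. https://mybatch.westeurope.batch.azure.com
        return f"https://{self.batch_account}.{self.region}.batch.azure.com"

    @property
    def blob_url(self):
        # Blob endpoint of the linked storage account for resource and output files.
        return f"https://{self.storage_account}.blob.core.windows.net"


def settings_from_env(env=os.environ):
    """Read the hypothetical environment variables this sketch expects."""
    return BatchSettings(
        batch_account=env["BATCH_ACCOUNT_NAME"],
        region=env["BATCH_ACCOUNT_REGION"],
        storage_account=env["STORAGE_ACCOUNT_NAME"],
    )
```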
One of the benefits of Azure Batch is that you can choose the operating system and developer tools that you use to run workloads. Windows nodes can run Windows code, including Microsoft .NET applications, and for Linux nodes there is a choice of distributions, including CentOS, Ubuntu, and SUSE Linux Enterprise Server. When creating a pool of nodes, you can also configure it to run tasks in Docker containers.
Nodes can run any executable or script that is supported by the operating system environment of the node. On Windows, these include *.exe, *.cmd, and *.bat files and PowerShell scripts. On Linux, they include binaries, shell scripts, and Python scripts.
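Azure Batch does not run a task's command line under a shell. Microsoft's documentation therefore suggests invoking one explicitly when you need shell features, for example `cmd /c` on Windows or `/bin/sh -c` on Linux. The hypothetical helper below picks a wrapper from the file extension. It assumes `python3` and `bash` are installed on the Linux image and that running PowerShell with `-ExecutionPolicy Bypass` is acceptable in your environment.

```python
import shlex
from pathlib import PurePath


def task_command_line(script, node_os):
    """Wrap `script` so that it runs under the node's shell (hypothetical helper).

    Because the task command line is not run under a shell, features such as
    environment variables, pipes, and redirection need an explicit shell.
    """
    suffix = PurePath(script).suffix.lower()
    if node_os == "windows":
        if suffix == ".ps1":
            return f'powershell -ExecutionPolicy Bypass -File "{script}"'
        if suffix in {".exe", ".cmd", ".bat"}:
            return f'cmd /c "{script}"'
        raise ValueError(f"unsupported Windows file type: {script}")
    if node_os == "linux":
        if suffix == ".py":
            inner = f"python3 {shlex.quote(script)}"  # assumes python3 is on the image
        elif suffix == ".sh":
            inner = f"bash {shlex.quote(script)}"
        else:
            inner = shlex.quote(script)  # treat anything else as a binary
        # Quote the whole inner command so bash receives it as one -c argument.
        return f"/bin/bash -c {shlex.quote(inner)}"
    raise ValueError(f"unknown node OS: {node_os}")
```

For example, `task_command_line("process.py", "linux")` returns `/bin/bash -c 'python3 process.py'`.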
Once Batch is up and running, you can monitor your applications via the Azure portal, the Batch Explorer tool, or the command line.
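Whichever tool you use, a common monitoring pattern is to poll a job's tasks until they have all finished. Batch reports each task as `active`, `preparing`, `running`, or `completed`. In the sketch below, `get_states` is a hypothetical callable that you would back with an SDK or command-line call. The `completed` state only means a task has finished, so you should still check each task's exit code for failures.

```python
import time
from collections import Counter

# Task states reported by the Batch service.
TASK_STATES = ("active", "preparing", "running", "completed")


def summarize(states):
    """Count tasks per state, including states with zero tasks."""
    counts = Counter(states)
    return {state: counts.get(state, 0) for state in TASK_STATES}


def wait_for_completion(get_states, timeout_s, poll_interval_s=5.0, sleep=time.sleep):
    """Poll until every task is completed; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout_s
    while True:
        states = list(get_states())
        print(summarize(states))  # simple progress report
        if all(state == "completed" for state in states):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval_s)
```

The `sleep` parameter is injectable, so the loop can be exercised without real delays.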
Security and Compliance
Azure Batch is built on the Microsoft Azure security infrastructure and benefits from the security measures of the wider Azure platform. You can read more about Microsoft Azure security here.
All compute nodes in Azure Batch have configurable firewall settings. Each pool of nodes you create operates in isolation from other pools, so data is not processed or transported unnecessarily.
Microsoft Azure carries an impressive list of compliance certifications which you can view here.
Pricing
Azure Batch is a pay-as-you-go service, so you only pay for what you use, and there are no up-front set-up fees. Read more about Microsoft Azure and its pricing structure here.
There’s no charge for Batch itself, only for the underlying compute and other resources consumed to run your batch jobs, including any applicable software licence costs. Compute resources are billed per second, and you can choose the compute power and storage of your nodes to suit your budget. You can also choose between dedicated and low-priority virtual machines to further manage the per-second cost of computing. Low-priority VMs are cheaper but can be pre-empted. Reserved pricing is also available for Azure Batch.
You can read more about Azure Batch pricing here.
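Per-second billing means cost scales with the node-seconds you actually consume. The sketch below uses hypothetical hourly rates; real prices vary by VM size, region, and operating system, and software licence costs are not included. For comparison only, it also shows what the same run would cost if it were billed in whole hours.

```python
import math

SECONDS_PER_HOUR = 3600


def pool_cost(dedicated_nodes, low_priority_nodes, seconds, dedicated_rate, low_priority_rate):
    """Per-second cost of a pool whose nodes all run for `seconds`.

    Rates are hypothetical prices per VM per hour, prorated to the second.
    """
    hourly = dedicated_nodes * dedicated_rate + low_priority_nodes * low_priority_rate
    return hourly * seconds / SECONDS_PER_HOUR


def cost_if_rounded_to_hours(dedicated_nodes, low_priority_nodes, seconds, dedicated_rate, low_priority_rate):
    """For comparison only: the same pool billed in whole hours, rounded up."""
    hours = math.ceil(seconds / SECONDS_PER_HOUR)
    return pool_cost(dedicated_nodes, low_priority_nodes, hours * SECONDS_PER_HOUR,
                     dedicated_rate, low_priority_rate)


if __name__ == "__main__":
    # 10 dedicated VMs at a hypothetical 0.36 per hour, running for 65 minutes.
    print(round(pool_cost(10, 0, 3900, 0.36, 0.08), 2))  # 3.9
    print(round(cost_if_rounded_to_hours(10, 0, 3900, 0.36, 0.08), 2))  # 7.2
    # The same run on low-priority VMs at a hypothetical 0.08 per hour.
    print(round(pool_cost(0, 10, 3900, 0.36, 0.08), 2))  # 0.87
```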
Alternatives to Azure Batch
If you already use Azure Storage and compute services, Azure Batch is an obvious choice for job scheduling and parallel process management. However, if your storage is elsewhere or you are not familiar with Azure services, you may find that an alternative solution suits your organisation better. Examples of alternative job scheduling and compute management services are: