May 2, 2025
As described in our official documentation, the classic communication mode for Batch nodes will be retired on 31 March 2026. We recommend using simplified communication mode instead when creating a Batch pool.
However, after changing a Batch pool's communication mode from classic to simplified and updating the network security group rules as the documentation describes, some users find that their nodes are still stuck in the unusable state.
A likely cause of this issue is an incorrect networking setting on the Batch account.
This blog explains why the Batch account networking setting can leave nodes that use simplified communication mode stuck in the unusable state, and how to configure the setting correctly for different user scenarios.
Cause:
As described in the documentation, the difference between classic and simplified communication mode is which side initiates the communication:
- Classic: the Batch service initiates communication with the compute nodes.
- Simplified: the compute nodes initiate communication with the Batch service.
The purpose of this communication is simple: the Batch service needs to receive traffic from the Batch nodes to know whether each node is healthy and which state it is in.
What differs is where the traffic is initiated. If the Batch service initiates it, as in classic communication mode, it counts as outgoing traffic of your Batch account. If the Batch nodes initiate it, as in simplified communication mode, it counts as incoming traffic of your Batch account.
The networking setting of the Batch account checks only incoming traffic, not outgoing traffic. Hence, if the networking setting completely disables public network access, nodes in classic communication mode can still communicate with the Batch service, but nodes in simplified communication mode cannot. The Batch service then marks those nodes as unusable.
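The reasoning above can be condensed into a small Python sketch. This is only an illustrative model, not a Batch API: the names `Mode` and `node_can_report_to_batch` are hypothetical, and the model deliberately ignores network security group rules and any other requirement besides the Batch account networking setting.

```python
from enum import Enum


class Mode(Enum):
    """Communication mode of a Batch pool (hypothetical helper type)."""
    CLASSIC = "classic"
    SIMPLIFIED = "simplified"


def node_can_report_to_batch(mode: Mode, incoming_traffic_allowed: bool) -> bool:
    """Return True if the node's status traffic reaches the Batch service.

    incoming_traffic_allowed: whether the Batch account networking setting
    accepts traffic coming from this node (assumed to be known by the caller).
    """
    if mode is Mode.CLASSIC:
        # The Batch service initiates the connection, so this is outgoing
        # traffic of the Batch account and the networking setting is not checked.
        return True
    # The node initiates the connection, so this is incoming traffic of the
    # Batch account and must be allowed by the networking setting.
    return incoming_traffic_allowed


# Public network access completely disabled, so incoming traffic is rejected.
for mode in Mode:
    state = "reachable" if node_can_report_to_batch(mode, False) else "unusable"
    print(f"{mode.value}: {state}")
```

In this model the same networking setting yields a working classic pool and an unusable simplified pool, which is the symptom described above.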
Solution:
The key point of the solution is to make sure that traffic from nodes in simplified communication mode is allowed by the Batch account networking setting.
Here is the diagram for different user scenarios:
*1: The resource group in which the public IP address is created depends on the pool allocation mode of the Batch account. In Batch Service mode, the public IP address is created in the same resource group as the Virtual Network resource. In User Subscription mode, it is created in a resource group named AzureBatch-{GUID}-C.
*2: This scenario is as described in the documentation.
TIPS: In some scenarios, a Batch pool can use more than one public IP address. For example, a pool with a Virtual Network and its own public IP address requires one additional public IP address as a buffer, and a pool with a Virtual Network and more than 100 nodes also uses multiple public IP addresses. In those scenarios, remember to add all of the public IP addresses to the allow list.
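As a quick way to verify the tip above, the following Python sketch reports which pool public IP addresses are not covered by an allow list. It is a minimal local check that uses only the standard `ipaddress` module. The function name `missing_from_allow_list` and the sample addresses (taken from the documentation-only ranges 203.0.113.0/24 and 198.51.100.0/24) are assumptions for illustration; collecting the real addresses and allow-list entries from your subscription is left to the reader.

```python
import ipaddress


def missing_from_allow_list(pool_public_ips, allowed_ranges):
    """Return the pool public IPs that no allow-list entry covers.

    pool_public_ips: iterable of IP address strings used by the Batch pool.
    allowed_ranges: iterable of single IPs or CIDR ranges from the allow list.
    """
    # A bare address such as "203.0.113.10" is parsed as a single-host network.
    networks = [ipaddress.ip_network(r, strict=False) for r in allowed_ranges]
    missing = []
    for ip in pool_public_ips:
        address = ipaddress.ip_address(ip)
        if not any(address in network for network in networks):
            missing.append(ip)
    return missing


# Hypothetical example: the pool uses three public IPs, including the buffer IP.
pool_ips = ["203.0.113.10", "203.0.113.11", "198.51.100.7"]
allow_list = ["203.0.113.10", "203.0.113.11/32"]
print(missing_from_allow_list(pool_ips, allow_list))
```

Any address the function returns still needs to be added to the Batch account allow list before nodes sending traffic from it can reach the Batch service.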