Keeping virtual machines (VMs) available is crucial for productivity and user satisfaction. Managing them proactively lets you address problems before end users are affected, so they have the resources they need to stay productive.
By combining Azure Event Grid with an Azure Function App, we can automatically resize VMs when a specific SKU or availability zone runs out of capacity. This can reduce, or possibly eliminate, the need for users to report VMs that fail to start because of a zone or SKU capacity issue, which minimizes downtime.
Below is how you can use Event Grid and a Function App to monitor for ZonalAllocationFailed or insufficient capacity errors and trigger automatic resizing and starting of an Azure Virtual Desktop session host VM.
First, you will need to create a Function App and a Function.
Create a Function and select the Azure Event Grid trigger.
Specify a Function Name. I try to keep the Function Name and Event Grid Subscription Name the same so it’s easier to follow for everyone on your team.
After the Function is created, copy and paste the script below into the Function. Check for formatting issues and adjust the tags, SKUs, and -WhatIf switches as needed. The script won't make any changes until the -WhatIf switches are removed.
param(
    [Parameter(Mandatory=$true)]
    [object]$eventGridEvent,
    [Parameter(Mandatory=$true)]
    [object]$TriggerMetadata
)
# Import necessary modules
Import-Module Az.Accounts -ErrorAction Stop
Import-Module Az.Compute -ErrorAction Stop
Import-Module Az.Resources -ErrorAction Stop
# Log the incoming event
Write-Host "Event received:"
$eventGridEvent | ConvertTo-Json -Depth 3 | Write-Host
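# Extract the resource group name (fifth segment from the end of the VM resource ID)
function Get-ResourceGroupName {
    param (
        [string]$resourceId
    )
    return ($resourceId -split "/")[-5]
}
# Extract the VM name (last segment of the VM resource ID)
function Get-VMName {
    param (
        [string]$resourceId
    )
    return ($resourceId -split "/")[-1]
}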
function Handle-VMStartFailure {
    param (
        [string]$resourceId
    )
    Write-Host ("Handling VM start failure for resource: " + $resourceId)
    # Retrieve existing tags
    $resource = Get-AzResource -ResourceId $resourceId -ErrorAction Stop
    $existingTags = $resource.Tags
    if (-not $existingTags) {
        $existingTags = @{}
    }
    Write-Host ("Existing tags: " + ($existingTags | ConvertTo-Json))
    # Tag the VM so that a later script can change the VM size back to standard after capacity issues.
    $existingTags['VMSizeAllocationFailure'] = 'True'
    # Update tags
    Try {
        Write-Host ("Updating tags for resource: " + $resourceId)
        Update-AzTag -ResourceId $resourceId -Tag $existingTags -Operation Merge -WhatIf -ErrorAction Stop
        Write-Host ("WhatIf: Successfully updated tags for resource: " + $resourceId)
    }
    Catch {
        $ErrorMessage = $_.Exception.Message
        Write-Host ('Error assigning failure tag: ' + $ErrorMessage)
    }
    $resourceGroupName = Get-ResourceGroupName -resourceId $resourceId
    $vmName = Get-VMName -resourceId $resourceId
    $vm = Get-AzVM -ResourceGroupName $resourceGroupName -Name $vmName -ErrorAction Stop
    $currentSku = $vm.HardwareProfile.VmSize
    # Choose the fallback SKUs based on the current VM size
    $skuList = @()
    if ($currentSku -eq "Standard_D16ads_v5" -or $currentSku -eq "Standard_D16as_v4") {
        $skuList = @("Standard_D16ds_v5")
    } else {
        $skuList = @("Standard_E4s_v5", "Standard_D4s_v5", "Standard_D4s_v4")
        # Add more SKUs here if needed, for example:
        # "Standard_E4s_v5", "Standard_D4s_v5", "Standard_E4s_v4", "Standard_D4s_v4", "Standard_D4ads_v5"
    }
    Write-Host ("Current SKU: " + $currentSku)
    Write-Host ("Attempting to change SKU to one of: " + ($skuList -join ", "))
    foreach ($sku in $skuList) {
        # Change VM size and start the VM; stop at the first SKU that works
        Try {
            Write-Host ("Deallocating VM: " + $vm.Name)
            Stop-AzVM -ResourceGroupName $resourceGroupName -Name $vmName -Force -WhatIf -ErrorAction Stop
            Write-Host ("WhatIf: Deallocated VM: " + $vm.Name)
            Write-Host ("Changing VM size to: " + $sku)
            $vm.HardwareProfile.VmSize = $sku
            Update-AzVM -ResourceGroupName $resourceGroupName -VM $vm -WhatIf -ErrorAction Stop
            Write-Host ("WhatIf: Changed VM size to: " + $sku)
            Write-Host ("Starting VM: " + $vm.Name)
            Start-AzVM -ResourceGroupName $resourceGroupName -Name $vm.Name -WhatIf -ErrorAction Stop
            Write-Host ("WhatIf: Successfully started VM with SKU: " + $sku)
            break
        }
        Catch {
            $ErrorMessage = $_.Exception.Message
            Write-Host ('Error changing VM size and starting the VM with SKU ' + $sku + ': ' + $ErrorMessage)
            if ($sku -eq $skuList[-1]) {
                Write-Host ('Failed to start VM with all SKUs attempted.')
            }
        }
    }
}
# Check if VM start action failed
if ($eventGridEvent.data.status -eq "Failed" -and $eventGridEvent.data.authorization.action -eq "Microsoft.Compute/virtualMachines/start/action") {
    $resourceId = $eventGridEvent.data.resourceUri
    Write-Host ("Event indicates a VM start failure for resource: " + $resourceId)
    # Check if properties exist
    if ($null -ne $eventGridEvent.data.properties) {
        # Check if statusMessage exists
        if ($null -ne $eventGridEvent.data.properties.statusMessage) {
            # Parse the status message
            $statusMessage = $eventGridEvent.data.properties.statusMessage | ConvertFrom-Json
            # Check for specific failure reasons
            $errorDetails = $statusMessage.error.details
            $allocationFailed = $false
            foreach ($detail in $errorDetails) {
                if ($detail.code -eq "ZonalAllocationFailed" -or $detail.message -like "*insufficient capacity*") {
                    $allocationFailed = $true
                    break
                }
            }
            if ($allocationFailed) {
                Write-Host ("Failure due to ZonalAllocationFailed or insufficient capacity.")
                Handle-VMStartFailure -resourceId $resourceId
            } else {
                Write-Host ("Failure due to reasons other than ZonalAllocationFailed or insufficient capacity.")
            }
        } else {
            Write-Host ("statusMessage is null or not present in the event data.")
        }
    } else {
        Write-Host ("properties field is null or not present in the event data.")
        # Handle VM start failure without relying on properties field
        Handle-VMStartFailure -resourceId $resourceId
    }
} else {
    Write-Host ("Event does not match VM start failure criteria.")
}
Click Save when finished.
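Removing every -WhatIf by hand is easy to get wrong. One option, offered here as a sketch rather than as part of the original script, is to splat a single hashtable onto each state-changing call so that one variable controls the whole run. The $DryRun and $commonArgs names are hypothetical, and New-Item stands in for the Az cmdlets so the snippet can run anywhere.

```powershell
# Hypothetical switch: set to $false once you are ready to make real changes.
$DryRun = $true

# Splatted onto every state-changing call; WhatIf follows $DryRun.
$commonArgs = @{
    WhatIf      = $DryRun
    ErrorAction = 'Stop'
}

# New-Item is only a stand-in for Stop-AzVM, Update-AzVM, and Start-AzVM.
$demoName = "whatif-demo-" + [guid]::NewGuid() + ".txt"
$demoPath = Join-Path ([System.IO.Path]::GetTempPath()) $demoName
New-Item -Path $demoPath -ItemType File @commonArgs | Out-Null

# While $DryRun is $true, nothing is created on disk.
$created = Test-Path -Path $demoPath
Write-Host ("File created: " + $created)
```

In the script, the same idea would look like Stop-AzVM -ResourceGroupName $resourceGroupName -Name $vmName -Force @commonArgs, with the explicit -WhatIf and -ErrorAction Stop removed from that line.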
Create Event Grid Event Subscription
Event Subscriptions listen for events emitted by the topic resource and send them to the endpoint resource.
Go to your Subscription and click on Events.
Click on Event Subscription.
Specify the Name.
Specify the Resource and System Topic Name. For more information on system topics, see "System topics in Azure Event Grid" on Microsoft Learn.
For the event types, choose Resource Action Failure only.
Click the Endpoint Type drop-down and select Azure Function.
Select the Resource Group, Function App and Function.
On the Filters tab, add an advanced filter with Key: data.authorization.action
Operator: String contains
Value: Microsoft.Compute/virtualMachines/start/action
Click Create when finished.
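The portal steps above can also be scripted. The following is a minimal sketch that builds an Azure CLI command from PowerShell. Every ID and name in it (subscription, resource group, Function App, Function, and event subscription name) is a placeholder, and it assumes the Azure CLI is installed and signed in. Unlike the portal steps, it does not name a system topic. The invocation is left commented out so you can review the arguments first.

```powershell
# Placeholder values: replace with your own.
$subscriptionId = '00000000-0000-0000-0000-000000000000'
$resourceGroup  = 'rg-functions-demo'
$functionApp    = 'func-vm-resize-demo'
$functionName   = 'VMStartFailureHandler'

# Resource ID of the Function that receives the events.
$endpoint = "/subscriptions/$subscriptionId/resourceGroups/$resourceGroup" +
            "/providers/Microsoft.Web/sites/$functionApp/functions/$functionName"

# Same settings as the portal: Resource Action Failure events only,
# filtered to the VM start action, delivered to the Function.
$azArgs = @(
    'eventgrid', 'event-subscription', 'create',
    '--name', $functionName,
    '--source-resource-id', "/subscriptions/$subscriptionId",
    '--endpoint-type', 'azurefunction',
    '--endpoint', $endpoint,
    '--included-event-types', 'Microsoft.Resources.ResourceActionFailure',
    '--advanced-filter', 'data.authorization.action', 'StringContains',
    'Microsoft.Compute/virtualMachines/start/action'
)

# Review the command, then uncomment the last line to run it.
Write-Host ("az " + ($azArgs -join ' '))
# az @azArgs
```

Reusing the Function name as the event subscription name follows the naming tip given earlier.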
If zone or capacity issues do occur, you can open your Function and switch the log view to Filesystem logs to see the logs and output.
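You don't have to wait for a real capacity failure to check the detection logic. The sketch below runs the same checks locally against a hand-built event. The payload is a hypothetical sample shaped after the fields the script reads, not a captured Azure event, and Test-AllocationFailure is an illustrative helper name.

```powershell
# Illustrative helper: returns $true when the event's statusMessage reports
# ZonalAllocationFailed or an insufficient-capacity message.
function Test-AllocationFailure {
    param (
        [object]$EventGridEvent
    )
    $raw = $EventGridEvent.data.properties.statusMessage
    if ([string]::IsNullOrEmpty($raw)) {
        return $false
    }
    $statusMessage = $raw | ConvertFrom-Json
    foreach ($detail in $statusMessage.error.details) {
        if ($detail.code -eq "ZonalAllocationFailed" -or $detail.message -like "*insufficient capacity*") {
            return $true
        }
    }
    return $false
}

# Hypothetical sample event; real events carry more fields.
$sampleEvent = [pscustomobject]@{
    data = [pscustomobject]@{
        status        = "Failed"
        resourceUri   = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-avd-demo/providers/Microsoft.Compute/virtualMachines/avd-host-0"
        authorization = [pscustomobject]@{ action = "Microsoft.Compute/virtualMachines/start/action" }
        properties    = [pscustomobject]@{
            statusMessage = '{"error":{"code":"ResourceOperationFailure","details":[{"code":"ZonalAllocationFailed","message":"Allocation failed."}]}}'
        }
    }
}

$isAllocationFailure = Test-AllocationFailure -EventGridEvent $sampleEvent
$resourceGroupName   = ($sampleEvent.data.resourceUri -split "/")[-5]
$vmName              = ($sampleEvent.data.resourceUri -split "/")[-1]

Write-Host ("Allocation failure: " + $isAllocationFailure)
Write-Host ("Resource group: " + $resourceGroupName + ", VM: " + $vmName)
```

Note one difference from the script: the script still calls the handler when the properties field is missing, whereas this helper simply returns $false in that case.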