At Nerdio, we see thousands of Azure environments running virtual desktops provisioned using our technology. This gives us a high-level view of the relative cost breakdown of an average virtual desktop Azure environment. The typical breakdown we see is this:
- Compute (VMs): 68%
- Storage (managed disks): 22%
- Backup: 6%
- Bandwidth, VPN, IPs and other: 4%
This isn’t very surprising since most virtual desktop environments are CPU-heavy, so compute being the largest component makes a lot of sense. Still, it is nice to be able to confirm this assumption quantitatively.
How to Optimize Azure Costs
Now that we know that compute and storage spend account for 90% of total Azure costs when deploying virtual desktops, let’s focus on strategies to reduce the cost of these components.
One simple and obvious strategy to reduce cost is to use Azure Reserved Instances for the virtual machines running the desktop workloads. This is a good first step, but it doesn’t go far enough in making Azure as efficient as possible.
Three-year compute reservations save roughly 50%-60% of compute costs, and one-year reservations save about 30%. Once a reservation is purchased, it makes no sense to deallocate or destroy desktop VMs, since the capacity is already paid for, so the resources keep running 24/7. However, a typical work week for a virtual desktop user is about 50 hours, which is roughly 30% of the 168 hours in a week. It follows that even 3-year reservations don’t save as much as a finely tuned autoscaling system that deallocates, or better yet destroys, VMs when they are no longer in use and re-creates them when they are needed again.
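To make that comparison concrete, here is a minimal Python sketch of the arithmetic. It assumes pay-as-you-go cost is strictly proportional to running hours and uses the approximate discount figures quoted above; the function names are ours and purely illustrative.

```python
HOURS_PER_WEEK = 24 * 7  # 168 hours


def utilization(active_hours_per_week: float) -> float:
    """Fraction of the week a desktop VM actually needs to be running."""
    return active_hours_per_week / HOURS_PER_WEEK


def relative_cost_reserved(discount: float) -> float:
    """Cost of a VM reserved and running 24/7, relative to 24/7 pay-as-you-go."""
    return 1.0 - discount


def relative_cost_autoscaled(active_hours_per_week: float) -> float:
    """Cost of a pay-as-you-go VM that exists only while in use (assumed
    proportional to running hours), relative to 24/7 pay-as-you-go."""
    return utilization(active_hours_per_week)


if __name__ == "__main__":
    print(f"Utilization at 50 h/week: {utilization(50):.1%}")
    for label, discount in [("1-year RI (~30% off)", 0.30),
                            ("3-year RI (~50% off)", 0.50),
                            ("3-year RI (~60% off)", 0.60)]:
        print(f"{label}: {relative_cost_reserved(discount):.0%} of PAYG 24/7")
    print(f"Autoscaled PAYG: {relative_cost_autoscaled(50):.0%} of PAYG 24/7")
```

Even at the top of the 3-year discount range, the reserved VM costs about 40% of the 24/7 pay-as-you-go price, while a VM that runs only 50 hours a week costs about 30%.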
Furthermore, if we consider the costs of managed disks as part of this equation, then reservations become even less compelling since the OS disks associated with all of the reserved VMs must continue to exist and be paid for. As a matter of fact, any autoscaling mechanism that deallocates VMs and leaves the OS disk intact is not optimized. Even if the OS disk has no unique user data, it is still causing the storage meter to run.
Given the above, it becomes apparent that a mechanism is needed that can optimize compute and storage Azure costs by creating desktop VMs on demand and destroying them when they are no longer needed. Such an auto-scale mechanism can provide significant savings above and beyond reserved instances and beyond simply powering VMs on and off.
Let’s take a look at a numeric example.
Sample Use Case
- 1000 virtual desktop users
- All users on desktops during peak demand (50 hours per week)
- 100 (10%) users on desktops 24/7
- E8sv3 VMs used as part of a WVD host pool with about 3 users per core (25 users on each 8-core VM)
- Total E8sv3 VMs needed at peak demand: 40
- OS disk: P10 (128GB Premium SSD)
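The host counts used in the scenarios below follow directly from these assumptions. A short Python sketch of the sizing arithmetic (the helper name is hypothetical):

```python
def hosts_needed(users: int, users_per_host: int) -> int:
    """Smallest number of hosts that can seat the given number of users."""
    return -(-users // users_per_host)  # integer ceiling division


USERS_PER_HOST = 25

peak_hosts = hosts_needed(1000, USERS_PER_HOST)      # all users at peak demand
always_on_hosts = hosts_needed(100, USERS_PER_HOST)  # the 10% of users on 24/7
burst_hosts = peak_hosts - always_on_hosts           # needed only ~50 h/week

print(peak_hosts, always_on_hosts, burst_hosts)
```

This gives 40 hosts at peak, of which 4 must run around the clock and 36 are needed only during working hours.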
Scenario 1: Pay-as-you-go with no optimization
- 40 x E8sv3 = $17,040/month
- 40 x P10 OS disks = $760/month
- Total = $17,800/month (or $18/user)
Scenario 2: 1-year Reserved Instances without autoscaling
- 40 x E8sv3 (1yr RI) = $10,960/month
- 40 x P10 OS disk = $760/month
- Total = $11,720/month (or $12/user)
Scenario 3: 3-year Reserved Instances without autoscaling
- 40 x E8sv3 (3yr RI) = $7,000/month
- 40 x P10 OS disks = $760/month
- Total = $7,760/month (or $8/user)
Scenario 4: Pay-as-you-go with power on/off auto-scaling
- 4 x E8sv3 (always on) = $1,704/month
- 36 x E8sv3 (50 hours per week) = $4,600/month
- 40 x P10 OS disks = $760/month
- Total = $7,065/month (or $7/user)
Scenario 5: Optimal configuration (3yr RI for always-on capacity, event-based autoscaling, ephemeral OS disks)
- 4 x E8sv3 (3yr RI, always on) = $700/month
- 36 x E8sv3 (event scaled for 50 hours/week peak demand) = $3,834/month
- 40 x ephemeral SSD OS disk = $0
- Total = $4,534/month (or $4.50/user)
In the five scenarios above, the cost decreases from a high of $18/user with a simple PAYG model to $4.50/user with event-based autoscaling and ephemeral OS disks. That’s a savings of about 75%!
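For readers who want to check the math, the following Python sketch re-adds the line items exactly as listed in the five scenarios and derives the per-user cost and the overall savings. The monthly dollar figures are copied from the scenarios; nothing here queries live Azure pricing. Note that the Scenario 4 line items as printed sum to $7,064, within a dollar of the stated total, presumably due to rounding.

```python
USERS = 1000

# Monthly line items (USD) as listed in each scenario: compute first, disks last.
scenarios = {
    "1: PAYG, no optimization": [17_040, 760],
    "2: 1-year RI, no autoscaling": [10_960, 760],
    "3: 3-year RI, no autoscaling": [7_000, 760],
    "4: PAYG, power on/off autoscaling": [1_704, 4_600, 760],
    "5: 3-year RI + event autoscaling + ephemeral disks": [700, 3_834, 0],
}


def monthly_total(line_items: list) -> int:
    """Sum of a scenario's monthly line items."""
    return sum(line_items)


def savings(baseline: float, optimized: float) -> float:
    """Fractional saving of `optimized` relative to `baseline`."""
    return 1 - optimized / baseline


totals = {name: monthly_total(items) for name, items in scenarios.items()}

for name, total in totals.items():
    print(f"Scenario {name}: ${total:,}/month, ${total / USERS:.2f}/user")

baseline = totals["1: PAYG, no optimization"]
best = totals["5: 3-year RI + event autoscaling + ephemeral disks"]
print(f"Savings, scenario 5 vs. scenario 1: {savings(baseline, best):.1%}")
```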
The question is: “how can such an efficient auto-scaling system be implemented?”
Nerdio for Azure has enhanced the native Windows Virtual Desktop (WVD) functionality to enable event-based autoscaling by leveraging Azure VM Scale Sets. Nerdio also increases desktop VM performance and reduces storage costs by supporting the use of ephemeral OS disks. Here is how it works.
WVD Session Host Pool Creation
WVD session host pools are natively integrated with Azure VM Scale Sets. Each WVD host pool starts out with a generalized Windows 10 multi-session image and a scale set based on this image. Ephemeral OS disks can be optionally enabled for all VM instances of this scale set. During creation, the host pool base image can be cloned from an existing session host pool or a golden image VM.
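As an illustration of what enabling ephemeral OS disks means at the Azure level, the sketch below builds the `osDisk` portion of a scale set's storage profile as a Python dictionary. The property names follow the Azure Resource Manager schema for scale sets as we understand it, so check them against the current Azure documentation. The helper function is hypothetical and is not a description of how Nerdio provisions scale sets internally.

```python
import json


def os_disk_profile(ephemeral: bool) -> dict:
    """Build the osDisk section of a scale set's storageProfile.

    Ephemeral: the OS disk lives on the VM's local storage, so no managed
    disk is billed and the disk is lost when the instance is deallocated.
    Otherwise: a Premium SSD managed disk that persists and is billed.
    """
    if ephemeral:
        return {
            "createOption": "FromImage",
            "caching": "ReadOnly",
            "diffDiskSettings": {"option": "Local"},
        }
    return {
        "createOption": "FromImage",
        "caching": "ReadWrite",
        "managedDisk": {"storageAccountType": "Premium_LRS"},
    }


print(json.dumps({"storageProfile": {"osDisk": os_disk_profile(True)}}, indent=2))
```

Because an ephemeral disk does not survive deallocation, it pairs naturally with an autoscaler that destroys and re-creates hosts rather than one that only powers them off.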
Once created, the Azure VM Scale Set integrated WVD Session Host Pools can be configured for autoscaling. There are 4 configuration steps.
- Set scale set boundaries – These conditions are proactively maintained. If the scale set falls out of compliance, Nerdio automatically brings it back into compliance.
  - Minimum number of active hosts. This is the minimum number of WVD session hosts (i.e., Azure VMSS instances) that are accepting new user connections.
  - Maximum number of active hosts. This is the maximum number of powered-on instances that the scale set can have. The system will scale out up to this maximum, but won’t exceed it.
  - Number of standby hosts. These are VM Scale Set instances that have been previously created but are shut down and deallocated. They are ready to power up on demand. Standby hosts speed up scale-out operations during peak demand, since it takes less than 5 minutes to power on a standby host and about 15 minutes to build a brand new one from scratch.
- Set the scaling logic – This logic defines when the session host pool should scale out (add hosts) and scale in (destroy hosts).
  - Scale out. Build a new host when the average CPU across all session hosts in this pool exceeds a threshold for a set amount of time. After the new host is built and becomes active, the system again evaluates whether the CPU threshold is exceeded and, if so, builds another host, and so on until CPU utilization falls below the defined threshold.
  - Scale in. Destroy an existing host when the average CPU across all session hosts in this pool falls below a threshold for a set amount of time. The system finds the host with no active user sessions, or the lowest number of active users, sends a warning message to those users (defined below), waits for a preset number of minutes, and then destroys the host.
  - Business hours end time. The scale-in process can only start at or after the predefined end of business hours.
- Pre-stage hosts – Helps avoid boot storms during busy log-in times in the morning.
  - Work days. Select the typical work days.
  - Start of work hours. The time by which session host capacity should be ready to go.
  - Number of hosts to have ready. Specifies how many hosts are to be ready by the start of business hours.
- Messaging – Send a warning message to any users still logged into the host selected for removal and wait for a predefined number of minutes.
  - Gives users some time to save work and log off before host removal proceeds.
  - The text that will pop up on users’ screens before the host is placed in drain mode (not accepting new connections). Once the message is displayed, the system will wait for the defined number of minutes and proceed with host removal without further warning.
The end result of this configuration is an automatically scaling WVD environment where hosts are added before business hours start, capacity grows with user demand during business hours, and hosts are removed gradually after business hours end. The minimum active hosts should be placed on Azure compute reservations, while the remaining, dynamic hosts should be on pay-as-you-go consumption.
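The scaling rules described above can be sketched in a few lines of Python. This is an illustrative model only, not Nerdio's implementation: the class, function names, and threshold values are hypothetical. It assumes the CPU figure passed in has already been averaged over the configured evaluation window, and it does not handle business hours that cross midnight.

```python
from dataclasses import dataclass
from datetime import time
from typing import Mapping


@dataclass(frozen=True)
class AutoscaleConfig:
    min_active_hosts: int      # lower scale set boundary
    max_active_hosts: int      # upper scale set boundary
    scale_out_cpu: float       # average CPU % above which a host is added
    scale_in_cpu: float        # average CPU % below which a host is removed
    business_hours_end: time   # scale in is allowed only at or after this time


def decide(cfg: AutoscaleConfig, avg_cpu: float,
           active_hosts: int, now: time) -> str:
    """Return 'scale_out', 'scale_in' or 'hold' for one evaluation cycle."""
    # Boundaries come first: bring the pool back into compliance.
    if active_hosts < cfg.min_active_hosts:
        return "scale_out"
    if active_hosts > cfg.max_active_hosts:
        return "scale_in"
    # Scale out on sustained high CPU, never beyond the maximum.
    if avg_cpu > cfg.scale_out_cpu and active_hosts < cfg.max_active_hosts:
        return "scale_out"
    # Scale in on sustained low CPU, only after hours and above the minimum.
    if (avg_cpu < cfg.scale_in_cpu
            and active_hosts > cfg.min_active_hosts
            and now >= cfg.business_hours_end):
        return "scale_in"
    return "hold"


def pick_host_to_remove(sessions_by_host: Mapping[str, int]) -> str:
    """Choose the host with the fewest active user sessions."""
    return min(sessions_by_host, key=lambda host: sessions_by_host[host])


# Example values only; real thresholds depend on the workload.
cfg = AutoscaleConfig(min_active_hosts=4, max_active_hosts=40,
                      scale_out_cpu=70.0, scale_in_cpu=20.0,
                      business_hours_end=time(18, 0))

print(decide(cfg, avg_cpu=85.0, active_hosts=10, now=time(10, 0)))
print(decide(cfg, avg_cpu=10.0, active_hosts=10, now=time(19, 0)))
print(pick_host_to_remove({"host-a": 3, "host-b": 0, "host-c": 5}))
```

With the sample use case, the minimum of 4 hosts corresponds to the always-on capacity that belongs on reservations, and the maximum of 40 to peak demand.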
Nerdio’s integration of WVD Session Host Pools with Azure VM Scale Sets produces the best possible savings and the closest match of compute capacity to user demand, resulting in an excellent end-user experience at the best possible price. The use of ephemeral OS disks further reduces the total cost and improves VM performance through local SSD storage.
Try Nerdio for yourself by scheduling a free trial, no strings attached.