
NERDIO GUIDE

Using automation and auto-scaling to manage AVD cost and performance

Amol Dalvi | July 28, 2025

Introduction

AVD auto-scaling is a critical practice for managing your Azure Virtual Desktop environment. It dynamically adjusts the number of active session host VMs to precisely match user demand in real time. By using automation to power on resources during work hours and power them off when idle, you can significantly reduce your Azure cloud costs. 

This ensures you only pay for what you use, while guaranteeing your users have the resources they need for a fast, responsive desktop experience, free of performance bottlenecks and start-up delays.

What is AVD auto-scaling and why is it a critical practice for cost and performance management?

Before diving into specific best practices, it's important to understand what AVD auto-scaling is and the core problem it solves. This practice involves using automated rules and schedules to dynamically increase or decrease the number of session host virtual machines (VMs) in your AVD host pools.

The reason this is so critical comes down to the fundamental nature of the public cloud and the dual goals of every IT department:

  • Controlling Cloud Costs: In a pay-as-you-go model like Microsoft Azure, you are billed for compute resources for every minute they are running. An AVD host that is powered on but not being used represents significant wasted spending. Auto-scaling directly addresses this by ensuring VMs are automatically deallocated during nights, weekends, holidays, and other idle periods, directly translating to a lower monthly Azure bill.
  • Ensuring a Great User Experience: The alternative to auto-scaling is manual provisioning, which forces a difficult choice. You either overprovision your environment by leaving enough hosts running 24/7 to meet peak demand (which is very expensive) or you underprovision to save money, leading to slow logins and poor in-session performance for your users. Auto-scaling solves this by providing "just-in-time" resources, ensuring performance is always aligned with real-time user demand.

Ultimately, implementing a robust auto-scaling strategy is not just a minor optimization—it's a foundational component for running an AVD environment that is both financially efficient and highly performant for your users.

What are the core principles of an effective AVD auto-scaling strategy?

An effective auto-scaling strategy is built on fundamental principles that balance cost control with a seamless user experience. Understanding these core concepts allows you to build a framework that is both efficient and reliable.

How do you align resource availability with user work patterns?

The most fundamental best practice is to ensure your AVD resources are only running when they are actually needed. This requires aligning your VM schedules with your organization's real-world work patterns.

  • Map to Business Hours: Your baseline schedule should ensure session hosts are active during primary work hours and powered off during nights and weekends.
  • Account for Global Workforces: For organizations spread across different regions, you need to apply different schedules based on each user group's local time zone.
  • Pre-Warm Capacity: Power on a percentage of your hosts 15-30 minutes before users start their day. This "pre-warming" ensures that the first users to log in have an instantly available desktop, preventing initial login delays.
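As a rough illustration of the scheduling logic above, the sketch below (a hypothetical helper, not Nerdio's implementation) decides whether a pool's hosts should be powered on at a given local time, including a pre-warm window before business hours:

```python
from datetime import datetime, time, timedelta

# Hypothetical sketch: decide whether session hosts should be powered on,
# given business hours in the pool's local time zone and a pre-warm window.
WORK_START = time(8, 0)            # 8:00 AM local
WORK_END = time(18, 0)             # 6:00 PM local
PRE_WARM = timedelta(minutes=30)   # power on 30 minutes before work starts

def hosts_should_be_on(now: datetime) -> bool:
    """True during business hours (Mon-Fri), including the pre-warm window."""
    if now.weekday() >= 5:  # Saturday/Sunday
        return False
    start = datetime.combine(now.date(), WORK_START) - PRE_WARM
    end = datetime.combine(now.date(), WORK_END)
    return start <= now < end

# 7:45 AM on a Tuesday falls inside the 30-minute pre-warm window.
print(hosts_should_be_on(datetime(2025, 7, 29, 7, 45)))  # True
```

For a global workforce, you would evaluate this check per host pool using each group's own time zone and hours, rather than one global clock.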

Implementing granular, time-zone-aware schedules manually can be complex. Management platforms like Nerdio Manager for Enterprise allow you to create and apply these precise schedules through a simple interface, removing the need for custom scripting.

How do you balance cost savings with user experience?

Auto-scaling is a constant balancing act between two competing goals: minimizing cost and maximizing performance.

  • Cost Optimization: Every minute a VM is deallocated, you save money on Azure compute costs. An aggressive scaling strategy powers off VMs the instant they become idle.
  • User Experience: If a user logs in and no hosts are available, they face a long delay while a VM starts up. If active hosts are overloaded, in-session performance degrades for everyone.

An optimal strategy avoids extremes. It maintains a small buffer of available resources to absorb unexpected logins but aggressively scales in during periods of inactivity.

Why is proactive scaling superior to purely reactive scaling?

You can scale your environment based on different types of triggers, and the most effective strategies use a hybrid approach.

  • Reactive Scaling: This method responds to real-time performance metrics, provisioning resources on demand as workload requirements change. For example, if average CPU usage across your hosts exceeds 80% for 10 minutes, a new host is powered on. This is useful for handling unexpected demand spikes, but it can be slow to respond because a newly started VM still needs several minutes to boot and accept sessions.
  • Proactive (Schedule-Based) Scaling: This method powers hosts on and off at set times based on known work patterns. It is highly reliable and cost-effective for predictable workloads.

A best-practice strategy combines both. You use proactive schedules to build your baseline capacity for the day and then overlay reactive scaling to handle unexpected surges in demand. For example, platforms like Nerdio Manager for Enterprise use a hybrid engine that proactively prepares capacity for the workday and then reactively adds or removes hosts based on real-time session usage, giving you the best of both worlds.
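The hybrid decision described above can be sketched in a few lines of Python. This is a simplified model under assumed thresholds (80% scale-out, 40% scale-in), not a real scaling engine:

```python
# Hypothetical sketch of a hybrid scaling decision: a proactive schedule sets
# the baseline host count, and a reactive CPU trigger overlays extra capacity.
def target_host_count(scheduled_baseline: int, current_hosts: int,
                      avg_cpu_percent: float, max_hosts: int) -> int:
    """Start from the scheduled baseline, then adjust reactively on CPU load."""
    target = max(scheduled_baseline, current_hosts)
    if avg_cpu_percent > 80:     # sustained high load: add a host
        target = min(target + 1, max_hosts)
    elif avg_cpu_percent < 40:   # light load: allow scale-in toward baseline
        target = max(target - 1, scheduled_baseline)
    return target

# Baseline of 4 hosts, CPU spiking to 85%: reactively scale out to 5.
print(target_host_count(scheduled_baseline=4, current_hosts=4,
                        avg_cpu_percent=85.0, max_hosts=10))  # 5
```

Note the two guardrails: the scheduled baseline acts as a floor (reactive scale-in never drops below it) and the pool size acts as a ceiling.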

What are the different auto-scaling methodologies for AVD?

Once you understand the principles, you must choose the right technical methods for scaling your hosts. These methodologies determine how your environment expands and contracts to handle user sessions.

What is the difference between breadth-first and depth-first scaling?

This refers to how new user sessions are distributed across the active hosts in your host pool.

  • Depth-First Load Balancing: This method directs all new user sessions to a single host until it reaches a session limit you define, and only then starts sending users to the next available host. This is the most cost-effective approach because it concentrates sessions on as few hosts as possible, allowing idle hosts to be scaled in (powered off) as quickly as possible.
  • Breadth-First Load Balancing: This method distributes new user sessions evenly across all available hosts in the pool. For example, with two active hosts, the first user goes to Host 1, the second to Host 2, the third to Host 1, and so on. This prioritizes performance by ensuring no single host becomes overloaded.
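The contrast between the two methods can be shown with a small sketch (hypothetical helpers; real session brokering is done by the AVD service, not code like this):

```python
# Hypothetical sketch contrasting the two load-balancing methods.
# `hosts` maps host name -> current session count; `limit` is the per-host cap.
def pick_host_depth_first(hosts: dict, limit: int):
    """Fill the busiest host that still has room, so other hosts stay idle."""
    candidates = [h for h, n in hosts.items() if n < limit]
    return max(candidates, key=lambda h: hosts[h]) if candidates else None

def pick_host_breadth_first(hosts: dict, limit: int):
    """Send the session to the least-loaded host, spreading load evenly."""
    candidates = [h for h, n in hosts.items() if n < limit]
    return min(candidates, key=lambda h: hosts[h]) if candidates else None

sessions = {"host-1": 5, "host-2": 2}
print(pick_host_depth_first(sessions, limit=8))    # host-1 (fill it first)
print(pick_host_breadth_first(sessions, limit=8))  # host-2 (least loaded)
```

The same pool state produces opposite choices: depth-first keeps host-2 emptier (and closer to shutdown), while breadth-first keeps both hosts lightly loaded.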

There is no single "best" method; it depends on your priority. AVD's native capabilities default to breadth-first. A best practice is to have the flexibility to choose, which management platforms like Nerdio Manager provide through a simple configuration setting, allowing you to switch between depth-first for cost savings or breadth-first for performance-sensitive user groups.

What is the role of scaling in and scaling out?

This is the most common form of scaling, also known as horizontal scaling.

  • Scale-Out: Automatically adding more VM instances to the host pool as user demand increases.
  • Scale-In: Automatically removing (and deallocating) VM instances from the host pool as user demand decreases.

This is the primary mechanism you will use to match capacity to user load throughout the day.

How does scaling up and scaling down contribute to optimization?

This is a more advanced technique known as vertical scaling, where you change the size of the VMs themselves.

  • Scale-Up: Automatically changing the VM instance size to a larger SKU (e.g., from 2 vCPU/8 GB RAM to 4 vCPU/16 GB RAM) to handle more intensive workloads.
  • Scale-Down: Changing the VM instance to a smaller, less expensive SKU during periods of low demand.

While not as common as horizontal scaling, this can be a powerful tool for environments with highly variable performance requirements. For example, a development VDI environment could use larger VMs during the day for compiling code and automatically scale down to smaller, cheaper VMs overnight. Manually scripting this is complex, but the functionality can be automated within a platform like Nerdio Manager.
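The development-VDI example above amounts to picking a SKU by time of day. A minimal sketch, assuming two example SKUs (the smaller night SKU is our illustration, not from the article):

```python
from datetime import time

# Hypothetical sketch of time-based vertical scaling: choose a VM size (SKU)
# for a dev pool based on the hour. Applying the resize would be done by your
# automation tooling; this only models the decision.
DAY_SKU = "Standard_D8as_v5"    # 8 vCPU / 32 GiB for daytime builds
NIGHT_SKU = "Standard_D2as_v5"  # smaller, cheaper SKU overnight (assumed)

def sku_for(now: time) -> str:
    return DAY_SKU if time(7, 0) <= now < time(19, 0) else NIGHT_SKU

print(sku_for(time(10, 0)))  # Standard_D8as_v5
print(sku_for(time(23, 0)))  # Standard_D2as_v5
```

Keep in mind that resizing a VM requires a deallocate/restart cycle, which is why vertical scaling pairs naturally with the same off-hours windows used for horizontal scaling.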

Optimize and save

See how you can optimize processes, improve security, increase reliability, and save up to 70% on Microsoft Azure costs.

What are advanced best practices for AVD optimization?

Beyond basic scaling, advanced automation techniques can unlock further cost savings and improve operational security and efficiency.

How can you optimize storage costs for AVD?

For session hosts that use Premium SSDs for high performance during the user session, you are still paying for that expensive storage even when the VM is deallocated.

An advanced best practice is to automate disk-type switching. When the auto-scaler deallocates a VM after hours, it can also trigger a process to change its OS disk from a Premium SSD to a low-cost Standard HDD. Before the VM is powered back on in the morning, the process is reversed. This has no impact on user-perceived performance but can significantly reduce your monthly storage costs. This functionality is complex to script but is an integrated, one-click feature in optimization platforms like Nerdio Manager for Enterprise.
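Conceptually, the disk-type switch is a simple state transition tied to the VM's power state. A minimal sketch (the dict stands in for real VM state; SKU strings follow Azure's managed-disk naming, e.g. Premium_LRS and Standard_LRS):

```python
# Hypothetical sketch of disk-type switching: when a host is deallocated,
# downgrade its OS-disk SKU; restore the premium SKU before power-on.
def transition(vm: dict, power_on: bool) -> dict:
    """Return the VM's new power state with the matching OS-disk SKU."""
    sku = "Premium_LRS" if power_on else "Standard_LRS"
    return {**vm, "running": power_on, "os_disk_sku": sku}

vm = {"name": "avd-host-3", "running": True, "os_disk_sku": "Premium_LRS"}
vm = transition(vm, power_on=False)  # deallocated after hours: cheap storage
print(vm["os_disk_sku"])             # Standard_LRS
```

The key invariant is ordering: the downgrade happens only after deallocation, and the upgrade completes before the morning power-on, so users never run on the slower disk.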

How does automated image management improve performance and security?

The performance and security of your AVD environment depend on the health of your golden image. A best practice is to regularly update your image with the latest Windows updates, application patches, and security configurations.

Automating this process ensures consistency and reduces manual labor. A complete automation workflow includes:

  • Powering on the golden image VM on a schedule (e.g., monthly on Patch Tuesday).
  • Applying updates via Windows Update or an enterprise patch management tool.
  • Running scripts to install or update applications.
  • Shutting down and sealing the image.
  • Automatically deploying the new image to your host pools in a rolling fashion.

Platforms like Nerdio integrate image management directly into the AVD lifecycle, allowing you to schedule these updates and deployments automatically.


What should you monitor to ensure your auto-scaling strategy is effective?

You cannot optimize what you cannot measure. A final best practice is to continuously monitor key metrics to validate and refine your auto-scaling policies.

  • Cost & Savings: Track your actual AVD compute costs and, most importantly, the estimated savings generated by your auto-scaling actions.
  • User Experience Metrics: Monitor average user login times and session responsiveness (e.g., input delay) to ensure your cost-saving measures are not negatively impacting performance.
  • Utilization: Keep an eye on CPU, memory, and session count trends to identify if your scaling triggers are set correctly.
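The cost-and-savings comparison boils down to simple arithmetic: actual billed VM-hours versus an always-on baseline. A sketch with assumed example figures (host count, hours, and hourly rate are illustrative):

```python
# Hypothetical sketch of a "billed vs. potential cost" comparison: estimate
# savings by comparing actual VM-hours against a 24/7 always-on baseline.
HOURS_PER_MONTH = 730  # Azure's standard monthly hour count

def autoscale_savings(host_count: int, avg_hours_on_per_host: float,
                      hourly_rate: float) -> dict:
    potential = host_count * HOURS_PER_MONTH * hourly_rate      # always-on cost
    billed = host_count * avg_hours_on_per_host * hourly_rate   # actual cost
    return {"billed": round(billed, 2),
            "potential": round(potential, 2),
            "savings_percent": round(100 * (1 - billed / potential), 1)}

# e.g., 10 hosts running ~12 h on weekdays (~260 h/month) at $0.40/hour
print(autoscale_savings(10, 260, 0.40))
```

With these example numbers, business-hours-only operation saves roughly 64% versus running the same hosts around the clock, which is the kind of headline figure a billed-vs-potential dashboard surfaces.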

This data is the foundation of effective cloud cost management: continuously monitoring, analyzing, and optimizing your spend against the value it delivers. While Azure provides many of these metrics, a dedicated management platform often provides a consolidated view. For example, Nerdio Manager includes built-in dashboards that translate your auto-scale activity directly into a "Billed vs. Potential Cost" report, giving you a clear, tangible view of the ROI of your optimization efforts.


How do you implement a best-practice auto-scaling solution for AVD?

Implementation requires careful configuration to ensure the automation is both effective and non-disruptive to your users. A unified management platform can simplify these steps significantly.

The flowchart below provides a visual map of the end-to-end logic behind a best-practice auto-scaling implementation. It shows how an advanced engine combines schedules, responds to real-time user demand, and handles idle resources gracefully. The following subsections will explore the key components of this process in greater detail, including configuring standby hosts and managing user sessions during scale-in operations.

How do you configure pre-staged and standby hosts?

To avoid making users wait for a VM to start, a best practice is to always maintain a buffer of available, running hosts during business hours.

  • Define a Minimum Active Host count for your schedule. For example, set a policy that at least 10% of your hosts, or a minimum of 2 hosts, are always on and ready to accept logins between 8 AM and 6 PM.
  • This standby capacity provides a buffer to absorb the initial wave of morning logins and ensures a seamless user experience.
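The example policy above (at least 10% of hosts, never fewer than 2) is a one-line calculation. A minimal sketch:

```python
import math

# Hypothetical sketch of a "minimum active hosts" rule: keep at least 10% of
# the pool, and never fewer than 2 hosts, powered on during business hours.
def min_active_hosts(pool_size: int, percent: float = 0.10, floor: int = 2) -> int:
    return max(math.ceil(pool_size * percent), floor)

print(min_active_hosts(30))  # 3  (10% of 30 hosts)
print(min_active_hosts(10))  # 2  (the floor of 2 applies)
```

Rounding up with `ceil` matters for small pools: 10% of 11 hosts should still reserve 2 whole hosts, not 1.1.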

While this can be scripted, it's easier to manage in a platform like Nerdio Manager, where setting the "Minimum Active Hosts" is a simple input field within the auto-scale configuration.

How should you manage user sessions during scale-in operations?

The cardinal rule of scaling in is to never forcibly log off a user with an active session. A graceful shutdown process is a critical best practice.

  1. Set Host to Drain Mode: When a host is identified for scale-in, the system should first put it into "drain mode." This prevents any new user sessions from being directed to that host.
  2. Wait for Logoffs: The system then waits for the remaining active or disconnected users on that host to log off naturally.
  3. Deallocate the VM: Only after the last user has logged off is the command sent to shut down and deallocate the VM to stop incurring costs.
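The three-step sequence above can be sketched as a small state transition (the dict stands in for real AVD host state; a real engine would poll this repeatedly):

```python
# Hypothetical sketch of the graceful scale-in sequence: drain, wait for
# log-offs, then deallocate. Called on each evaluation cycle for the host.
def graceful_scale_in(host: dict) -> dict:
    host["allow_new_sessions"] = False       # 1. drain mode: no new sessions
    if host["active_sessions"] == 0:         # 2. wait until all users log off
        host["power_state"] = "deallocated"  # 3. only then stop incurring cost
    return host

host = {"name": "avd-host-7", "allow_new_sessions": True,
        "active_sessions": 0, "power_state": "running"}
print(graceful_scale_in(host)["power_state"])  # deallocated
```

The crucial property is that step 3 is conditional: a host with even one remaining session stays powered on (but drained) until that session ends.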

This graceful process is essential for user trust and data integrity. Nerdio's auto-scale engine has this logic built-in, automatically managing drain mode and handling disconnected sessions according to administrator-defined policies to ensure shutdowns are never disruptive.

How can you automate the management of different host pools?

Different user groups have different needs, and applying a distinct policy to each is a key part of allocating compute resources efficiently across the organization. Your finance department may work a standard 9-to-5 schedule, while your IT support team may need 24/7 availability.

A best practice is to create separate auto-scaling policies for each distinct user group or host pool. Manually, this can be done by using Azure tags to identify which VMs belong to which group and adding complex logic to your scripts. A more efficient approach is to use a management platform that allows you to create named policies (e.g., "Finance Dept Schedule," "Power User Performance") and apply them to different host pools through a graphical user interface.

This table compares two distinct auto-scaling profiles to illustrate how you can tailor policies for different user groups, balancing cost for standard users against performance for power users.

| Setting / Parameter | Finance Host Pool (Cost-Optimized Profile) | Developer Host Pool (Performance-Optimized Profile) |
| --- | --- | --- |
| Active Hours Schedule | 7:00 AM - 7:00 PM, Monday - Friday | 8:00 AM - 10:00 PM, Monday - Friday; 10:00 AM - 4:00 PM, Saturday |
| Load Balancing Method | Depth-First: fills one host completely before using the next, maximizing the number of idle hosts that can be shut down. | Breadth-First: distributes users across all active hosts so no single user experiences performance lag. |
| Minimum Active Hosts | 1 host: a minimal buffer to handle initial logins while keeping baseline costs as low as possible. | 3 hosts: a larger buffer to guarantee immediate resource availability for performance-sensitive tasks. |
| VM Size (SKU) | Standard_D4as_v5 (4 vCPU, 16 GiB RAM) | Standard_D8as_v5 (8 vCPU, 32 GiB RAM) |
| Disconnected Session Timeout | Log off after 60 minutes of inactivity to free up resources quickly. | Log off after 240 minutes of inactivity to support long-running processes and builds. |
| Storage Auto-Scaling | Enabled: automatically switches OS disks from Premium SSD to Standard HDD when VMs are off to reduce storage costs. | Disabled: keeps Premium SSDs active at all times to ensure the fastest VM start times when needed. |
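In code, named per-pool policies like those in the table above are just structured configuration. A minimal sketch (field names and policy keys are our own, chosen for illustration):

```python
from dataclasses import dataclass

# Hypothetical sketch of named per-pool scaling policies, mirroring the
# comparison table: one cost-optimized profile, one performance-optimized.
@dataclass
class ScalePolicy:
    name: str
    load_balancing: str        # "depth-first" or "breadth-first"
    min_active_hosts: int
    vm_sku: str
    disconnect_timeout_min: int

POLICIES = {
    "finance": ScalePolicy("Finance Dept Schedule", "depth-first", 1,
                           "Standard_D4as_v5", 60),
    "developers": ScalePolicy("Power User Performance", "breadth-first", 3,
                              "Standard_D8as_v5", 240),
}

print(POLICIES["developers"].vm_sku)  # Standard_D8as_v5
```

Keeping policies as named, reusable objects (rather than per-VM settings) is what makes it practical to apply them to whole host pools and audit the differences at a glance.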


About the author

Amol Dalvi

VP, Product

Software product executive and Head of Product at Nerdio, with 15+ years leading engineering teams and 9+ years growing a successful software startup to 20+ employees. A 3x startup founder and angel investor, with deep expertise in Microsoft full stack development, cloud, and SaaS. Patent holder, Certified Scrum Master, and agile product leader.
