How to Configure Azure Web Apps Autoscaling using Terraform

One of the reasons people pick Azure App Service in the first place is the scalability when things get busy. That part is true, but only if someone has actually configured it to do so. Out of the box, an App Service Plan will sit on whatever instance count you gave it and happily watch your application fall over at 100% CPU.
Autoscale rules are where that gets fixed. In this post, we will walk through configuring autoscaling for Azure Web Apps using Terraform, covering the App Service Plan, the autoscale rules themselves, and the small gotchas that tend to catch people out the first time.
What You Need
All you need to follow this post is:
Terraform installed
Azure CLI installed
VS Code (or your IDE of choice)
An Azure Subscription to deploy to
One thing to point out is that autoscale is not available on the Basic, Free, or Shared tiers (only on Standard, Premium, or Isolated). If you try to attach an autoscale setting to one of those, Azure will reject it, and Terraform will surface the error during apply. We will be using an S1 plan in this walkthrough, which is the cheapest tier that supports it.
Terraform Configuration
First, create a directory for this project and open it within your chosen IDE. Then, create a file and call it main.tf. This is where we will configure everything we need.
Let's start with the provider and the resource group:
provider "azurerm" {
features {}
}
resource "azurerm_resource_group" "webapp_rg" {
name = "webapp-autoscale-rg"
location = "West Europe"
}
Then we need an App Service Plan and a Web App sitting within the resource group.
resource "azurerm_service_plan" "webapp_plan" {
name = "asp-webapp-weu"
resource_group_name = azurerm_resource_group.webapp_rg.name
location = azurerm_resource_group.webapp_rg.location
os_type = "Linux"
sku_name = "S1"
}
resource "azurerm_linux_web_app" "webapp" {
name = "app-jamescookdev-weu"
resource_group_name = azurerm_resource_group.webapp_rg.name
location = azurerm_service_plan.webapp_plan.location
service_plan_id = azurerm_service_plan.webapp_plan.id
site_config {}
}
Nothing exciting here. A Linux S1 plan and a Web App bound to it. In a real deployment you would have app settings, a runtime stack, deployment slots and so on, but that is not the focus of this post.
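For reference, a populated site_config might look something like the following if you were pinning a runtime stack. The Node version here is purely illustrative, not something this walkthrough depends on:

```hcl
site_config {
  # Pin the runtime for the Web App; any supported stack works here
  application_stack {
    node_version = "20-lts"
  }
  # Keep the app warm; supported on Standard tiers and above
  always_on = true
}
```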
Adding the Autoscale Setting
Now for the interesting part. The azurerm_monitor_autoscale_setting resource is what drives autoscale on Azure, and it is the same resource you would use for a Virtual Machine Scale Set. The target resource is the App Service Plan.
Here is a basic CPU-based autoscale configuration:
resource "azurerm_monitor_autoscale_setting" "webapp_autoscale" {
name = "autoscale-webapp-weu"
resource_group_name = azurerm_resource_group.webapp_rg.name
location = azurerm_resource_group.webapp_rg.location
target_resource_id = azurerm_service_plan.webapp_plan.id
enabled = true
profile {
name = "default"
capacity {
default = 2
minimum = 2
maximum = 10
}
rule {
metric_trigger {
metric_name = "CpuPercentage"
metric_namespace = "microsoft.web/serverfarms"
metric_resource_id = azurerm_service_plan.webapp_plan.id
time_grain = "PT1M"
statistic = "Average"
time_window = "PT5M"
time_aggregation = "Average"
operator = "GreaterThan"
threshold = 70
}
scale_action {
direction = "Increase"
type = "ChangeCount"
value = "1"
cooldown = "PT5M"
}
}
rule {
metric_trigger {
metric_name = "CpuPercentage"
metric_namespace = "microsoft.web/serverfarms"
metric_resource_id = azurerm_service_plan.webapp_plan.id
time_grain = "PT1M"
statistic = "Average"
time_window = "PT5M"
time_aggregation = "Average"
operator = "LessThan"
threshold = 30
}
scale_action {
direction = "Decrease"
type = "ChangeCount"
value = "1"
cooldown = "PT10M"
}
}
}
notification {
email {
send_to_subscription_administrator = false
send_to_subscription_co_administrator = false
custom_emails = ["alerts@example.com"]
}
}
}
There is a lot going on, so let's break it down.
The capacity block
Default - The instance count the plan starts at when the autoscale setting is first applied. This is also what Azure falls back to if the rules ever fail to evaluate
Minimum - The floor. Even if every rule says scale in, the plan will never drop below this number
Maximum - The ceiling. Scale out will stop here no matter how busy things get
Setting minimum = 2 is a common pattern for production workloads, as running a single instance removes any redundancy. You are one restart or platform event away from downtime.
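If you promote this configuration between environments, the floor and ceiling are natural candidates for variables rather than hard-coded literals. A minimal sketch, with variable names of my own choosing:

```hcl
variable "min_instances" {
  type    = number
  default = 2
}

variable "max_instances" {
  type    = number
  default = 10
}
```

The capacity block then becomes:

```hcl
capacity {
  default = var.min_instances
  minimum = var.min_instances
  maximum = var.max_instances
}
```

That way a dev environment can run with a floor of one while production keeps its redundancy, without duplicating the autoscale resource.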
The scale-out rule
Metric_name - For App Service Plans this is CpuPercentage. This catches people out because in the Azure Portal the same metric is displayed as "CPU Percentage". Terraform expects the raw metric name
Metric_namespace - Setting this to microsoft.web/serverfarms is optional but recommended. There is a long-running issue where Azure occasionally throws an UnsupportedMetric error if the namespace is not explicit, and setting it removes the flakiness
Metric_resource_id - The App Service Plan ID. This tells the rule which resource to read the metric from
Time_grain - How often the metric is sampled. PT1M is one minute, which is the lowest supported value
Statistic - How the samples are combined across the instances in the plan within each time grain. Average is what you want for CPU
Time_window - How far back Azure looks when evaluating the rule. PT5M means the last five minutes
Time_aggregation - How Azure combines the samples within the time window. Average is the sensible default for CPU
Operator and Threshold - The comparison that triggers the action. In this case, scale out when average CPU over the last five minutes is greater than 70%
Scale_action - What to do when the rule fires. We are increasing the count by one, with a five minute cooldown before the same rule can fire again
The scale-in rule
The second rule is the opposite of the scale-out. When average CPU drops below 30% over five minutes, remove one instance. The cooldown is deliberately longer than the scale-out cooldown, which is a pattern I always follow.
The reason is simple: scaling out is cheap and fast, scaling in is where things go wrong. If you scale in too aggressively and traffic ticks back up, you end up oscillating between instance counts, which is bad for your application and bad for your bill. Giving scale-in a longer cooldown smooths that out.
Notifications
The notification block is optional but genuinely useful. It sends an email whenever a scale action fires, which is a great way to build confidence in your rules when you first deploy them. After a week or two of watching them work, you can turn it off or route it into your alerting platform.
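Routing into an alerting platform is done with a webhook block alongside (or instead of) the email block in the same notification. The service_uri below is a placeholder for whatever endpoint your platform exposes:

```hcl
notification {
  email {
    custom_emails = ["alerts@example.com"]
  }
  # POSTs the scale event payload to your alerting endpoint
  webhook {
    service_uri = "https://example.com/autoscale-events"
  }
}
```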
Adding a Memory Rule
CPU is the obvious metric, but on App Service Plans memory pressure is just as common a reason to scale, especially for .NET or Node applications that lean on in-process caches. The good news is that adding more rules to the same profile is straightforward. Drop another rule block into the profile:
rule {
metric_trigger {
metric_name = "MemoryPercentage"
metric_namespace = "microsoft.web/serverfarms"
metric_resource_id = azurerm_service_plan.webapp_plan.id
time_grain = "PT1M"
statistic = "Average"
time_window = "PT5M"
time_aggregation = "Average"
operator = "GreaterThan"
threshold = 80
}
scale_action {
direction = "Increase"
type = "ChangeCount"
value = "1"
cooldown = "PT5M"
}
}
Azure evaluates each rule independently. If either CPU or memory crosses the threshold, scale-out fires. You do not need to combine them in a single rule. Note that the reverse applies to scale-in: when a profile has multiple scale-in rules, Azure only scales in when all of them are satisfied, which is another reason scale-in tends to be the slower, more conservative direction.
A Scheduled Profile for Business Hours
The metric-based profile handles reactive scaling, but a lot of workloads have predictable daily patterns. If you know traffic ramps up at 8am every weekday, you can pre-scale before the metrics catch up. The recurrence block inside a second profile handles this:
profile {
name = "business-hours"
capacity {
default = 4
minimum = 4
maximum = 10
}
recurrence {
timezone = "GMT Standard Time"
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
hours = [8]
minutes = [0]
}
}
During the recurrence window, the business-hours profile takes over and sets the minimum to four instances. Outside of those hours, the default profile is back in charge. You can still layer the same metric rules inside the business-hours profile if you want reactive scaling on top of the baseline bump.
One thing worth noting is that the recurrence block only defines the start of the window. Azure uses the next recurrence on the list to work out the end, so if you want this profile to only run 8am to 6pm Monday to Friday, you need a second profile with a recurrence at 6pm that reverts the capacity.
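As a sketch, that reverting profile could look like the following. The name is arbitrary, and the capacity values simply mirror the default profile so evenings and weekends drop back to the baseline:

```hcl
profile {
  name = "after-hours"
  capacity {
    default = 2
    minimum = 2
    maximum = 10
  }
  # Takes over at 6pm on weekdays, ending the business-hours window
  recurrence {
    timezone = "GMT Standard Time"
    days     = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
    hours    = [18]
    minutes  = [0]
  }
}
```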
Deploying the Configuration
With everything in place, we can deploy. Open a terminal in your project directory and run the following:
az login
terraform init
terraform plan
terraform apply
The terraform plan step is worth paying attention to here. Autoscale settings are one of those resources where a small typo in a metric name or namespace can cause the apply to succeed but the rules to never actually fire. Read through the plan and double-check the metric names and resource IDs before applying.
Once the apply completes, head into the Azure Portal, find the App Service Plan, and open the Scale out (App Service Plan) blade. You should see your rules listed under the Custom autoscale option, exactly as they were defined in Terraform.
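You can also confirm the deployed setting from the CLI rather than the Portal. Something like the following should echo back the profiles and rules as JSON (the resource names match the ones used in this walkthrough):

az monitor autoscale show --resource-group webapp-autoscale-rg --name autoscale-webapp-weu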
A Known Issue
There is a known issue with the azurerm_monitor_autoscale_setting resource on provider versions at or above 3.87.0 where the case of serverFarms in the target resource ID changed from lowercase to mixed case. On upgrade, Terraform reads the state file, compares it to the new ID format, and wants to destroy and recreate the autoscale setting.
The fix is usually a terraform apply -refresh-only to update the state without actually making any changes, or a terraform state rm followed by an import. If you see a plan that wants to replace an autoscale setting for no obvious reason after a provider upgrade, this is likely what you are looking at.
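In practice the refresh-only route is the first thing to try, and the state rm plus import is the fallback. Given that this whole issue is about ID casing, pull the exact resource ID from Azure rather than typing it by hand:

terraform apply -refresh-only

If the plan still wants to replace the setting, re-import it using the ID reported by az monitor autoscale show --query id:

terraform state rm azurerm_monitor_autoscale_setting.webapp_autoscale
terraform import azurerm_monitor_autoscale_setting.webapp_autoscale "<autoscale-setting-resource-id>"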
Wrapping Up
Autoscale is one of those features that is easy to set up once and then forget about, but it is worth revisiting the rules occasionally as your application changes. A CPU threshold that made sense when the app was smaller might be too aggressive once you have added caching, or too relaxed once you have introduced a heavier workload.
In terms of what we have built here, the key pieces are a metric-based profile for reactive scaling, a scheduled profile for predictable traffic patterns, and notifications to keep an eye on what the rules are doing. All defined in code, versioned alongside the rest of your infrastructure, and easy to promote between environments.





