How to Configure Azure Web Apps Autoscaling using Terraform

One of the reasons people pick Azure App Service in the first place is the scalability when things get busy. That part is true, but only if someone has actually configured it to do so. Out of the box, an App Service Plan will sit on whatever instance count you gave it and happily watch your application fall over at 100% CPU.
Autoscale rules are where that gets fixed. In this post, we will walk through configuring autoscaling for Azure Web Apps using Terraform, covering the App Service Plan, the autoscale rules themselves, and the small gotchas that tend to catch people out the first time.
What You Need
All you need to follow this post is:
Terraform installed
Azure CLI installed
VS Code (or your IDE of choice)
An Azure Subscription to deploy to
One thing to point out is that autoscale is not available on the Basic, Free, or Shared tiers (only on Standard, Premium, or Isolated). If you try to attach an autoscale setting to one of those, Azure will reject it, and Terraform will surface the error during apply. We will be using an S1 plan in this walkthrough, which is the cheapest tier that supports it.
Terraform Configuration
First, create a directory for this project and open it within your chosen IDE. Then, create a file and call it main.tf. This is where we will configure everything we need.
Let's start with the provider and the resource group:
provider "azurerm" {
features {}
}
resource "azurerm_resource_group" "webapp_rg" {
name = "webapp-autoscale-rg"
location = "West Europe"
}
Then we need an App Service Plan and a Web App sitting within the resource group.
resource "azurerm_service_plan" "webapp_plan" {
name = "asp-webapp-weu"
resource_group_name = azurerm_resource_group.webapp_rg.name
location = azurerm_resource_group.webapp_rg.location
os_type = "Linux"
sku_name = "S1"
}
resource "azurerm_linux_web_app" "webapp" {
name = "app-jamescookdev-weu"
resource_group_name = azurerm_resource_group.webapp_rg.name
location = azurerm_service_plan.webapp_plan.location
service_plan_id = azurerm_service_plan.webapp_plan.id
site_config {}
}
Nothing exciting here. A Linux S1 plan and a Web App bound to it. In a real deployment you would have app settings, a runtime stack, deployment slots and so on, but that is not the focus of this post.
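For reference, a populated site_config might look something like the following if you were pinning a runtime stack. The Node version here is purely illustrative, not something this walkthrough depends on:

```hcl
site_config {
  # Pin the runtime for the Web App; any supported stack works here
  application_stack {
    node_version = "20-lts"
  }
  # Keep the app warm; supported on Standard tiers and above
  always_on = true
}
```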
Adding the Autoscale Setting
Now for the interesting part. The azurerm_monitor_autoscale_setting resource is what drives autoscale on Azure, and it is the same resource you would use for a Virtual Machine Scale Set. The target resource is the App Service Plan.
Here is a basic CPU-based autoscale configuration:
resource "azurerm_monitor_autoscale_setting" "webapp_autoscale" {
name = "autoscale-webapp-weu"
resource_group_name = azurerm_resource_group.webapp_rg.name
location = azurerm_resource_group.webapp_rg.location
target_resource_id = azurerm_service_plan.webapp_plan.id
enabled = true
profile {
name = "default"
capacity {
default = 2
minimum = 2
maximum = 10
}
rule {
metric_trigger {
metric_name = "CpuPercentage"
metric_namespace = "microsoft.web/serverfarms"
metric_resource_id = azurerm_service_plan.webapp_plan.id
time_grain = "PT1M"
statistic = "Average"
time_window = "PT5M"
time_aggregation = "Average"
operator = "GreaterThan"
threshold = 70
}
scale_action {
direction = "Increase"
type = "ChangeCount"
value = "1"
cooldown = "PT5M"
}
}
rule {
metric_trigger {
metric_name = "CpuPercentage"
metric_namespace = "microsoft.web/serverfarms"
metric_resource_id = azurerm_service_plan.webapp_plan.id
time_grain = "PT1M"
statistic = "Average"
time_window = "PT5M"
time_aggregation = "Average"
operator = "LessThan"
threshold = 30
}
scale_action {
direction = "Decrease"
type = "ChangeCount"
value = "1"
cooldown = "PT10M"
}
}
}
notification {
email {
send_to_subscription_administrator = false
send_to_subscription_co_administrator = false
custom_emails = ["alerts@example.com"]
}
}
}
There is a lot going on, so let's break it down.
The capacity block
Default - The instance count the plan starts at when the autoscale setting is first applied. This is also what Azure falls back to if the rules ever fail to evaluate
Minimum - The floor. Even if every rule says scale in, the plan will never drop below this number
Maximum - The ceiling. Scale out will stop here no matter how busy things get
Setting minimum = 2 is a common pattern for production workloads, as running a single instance removes any redundancy. You are one restart or platform event away from downtime.
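If you promote this configuration between environments, the floor and ceiling are natural candidates for variables rather than hard-coded literals. A minimal sketch, with variable names of my own choosing:

```hcl
variable "min_instances" {
  type    = number
  default = 2
}

variable "max_instances" {
  type    = number
  default = 10
}
```

The capacity block then becomes:

```hcl
capacity {
  default = var.min_instances
  minimum = var.min_instances
  maximum = var.max_instances
}
```

That way a dev environment can run with a floor of one while production keeps its redundancy, without duplicating the autoscale resource.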
The scale-out rule
Metric_name - For App Service Plans this is CpuPercentage. This catches people out because in the Azure Portal the same metric is displayed as "CPU Percentage". Terraform expects the raw metric name
Metric_namespace - Setting this to microsoft.web/serverfarms is optional but recommended. There is a long-running issue where Azure occasionally throws an UnsupportedMetric error if the namespace is not explicit, and setting it removes the flakiness
Metric_resource_id - The App Service Plan ID. This tells the rule which resource to read the metric from
Time_grain - How often the metric is sampled. PT1M is one minute, which is the lowest supported value
Statistic - How the samples are combined across the instances in the plan within each time grain. Average is what you want for CPU
Time_window - How far back Azure looks when evaluating the rule. PT5M means the last five minutes
Time_aggregation - How Azure combines the samples within the time window. Average is the sensible default for CPU
Operator and Threshold - The comparison that triggers the action. In this case, scale out when average CPU over the last five minutes is greater than 70%
Scale_action - What to do when the rule fires. We are increasing the count by one, with a five minute cooldown before the same rule can fire again
The scale-in rule
The second rule is the opposite of the scale-out. When average CPU drops below 30% over five minutes, remove one instance. The cooldown is deliberately longer than the scale-out cooldown, which is a pattern I always follow.
The reason is simple: scaling out is cheap and fast, scaling in is where things go wrong. If you scale in too aggressively and traffic ticks back up, you end up oscillating between instance counts, which is bad for your application and bad for your bill. Giving scale-in a longer cooldown smooths that out.
Notifications
The notification block is optional but genuinely useful. It sends an email whenever a scale action fires, which is a great way to build confidence in your rules when you first deploy them. After a week or two of watching them work, you can turn it off or route it into your alerting platform.
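Routing into an alerting platform is done with a webhook block alongside (or instead of) the email block in the same notification. The service_uri below is a placeholder for whatever endpoint your platform exposes:

```hcl
notification {
  email {
    custom_emails = ["alerts@example.com"]
  }
  # POSTs the scale event payload to your alerting endpoint
  webhook {
    service_uri = "https://example.com/autoscale-events"
  }
}
```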
Adding a Memory Rule
CPU is the obvious metric, but on App Service Plans memory pressure is just as common a reason to scale, especially for .NET or Node applications that lean on in-process caches. The good news is that adding more rules to the same profile is straightforward. Drop another rule block into the profile:
rule {
metric_trigger {
metric_name = "MemoryPercentage"
metric_namespace = "microsoft.web/serverfarms"
metric_resource_id = azurerm_service_plan.webapp_plan.id
time_grain = "PT1M"
statistic = "Average"
time_window = "PT5M"
time_aggregation = "Average"
operator = "GreaterThan"
threshold = 80
}
scale_action {
direction = "Increase"
type = "ChangeCount"
value = "1"
cooldown = "PT5M"
}
}
Azure evaluates each rule independently. If either CPU or memory crosses the threshold, scale-out fires. You do not need to combine them in a single rule. Note that the reverse applies to scale-in: when a profile has multiple scale-in rules, Azure only scales in when all of them are satisfied, which is another reason scale-in tends to be the slower, more conservative direction.
A Scheduled Profile for Business Hours
The metric-based profile handles reactive scaling, but a lot of workloads have predictable daily patterns. If you know traffic ramps up at 8am every weekday, you can pre-scale before the metrics catch up. The recurrence block inside a second profile handles this:
profile {
name = "business-hours"
capacity {
default = 4
minimum = 4
maximum = 10
}
recurrence {
timezone = "GMT Standard Time"
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
hours = [8]
minutes = [0]
}
}
During the recurrence window, the business-hours profile takes over and sets the minimum to four instances. Outside of those hours, the default profile is back in charge. You can still layer the same metric rules inside the business-hours profile if you want reactive scaling on top of the baseline bump.
One thing worth noting is that the recurrence block only defines the start of the window. Azure uses the next recurrence on the list to work out the end, so if you want this profile to only run 8am to 6pm Monday to Friday, you need a second profile with a recurrence at 6pm that reverts the capacity.
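As a sketch, that reverting profile could look like the following. The name is arbitrary, and the capacity values simply mirror the default profile so evenings and weekends drop back to the baseline:

```hcl
profile {
  name = "after-hours"
  capacity {
    default = 2
    minimum = 2
    maximum = 10
  }
  # Takes over at 6pm on weekdays, ending the business-hours window
  recurrence {
    timezone = "GMT Standard Time"
    days     = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
    hours    = [18]
    minutes  = [0]
  }
}
```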
Deploying the Configuration
With everything in place, we can deploy. Open a terminal in your project directory and run the following:
az login
terraform init
terraform plan
terraform apply
The terraform plan step is worth paying attention to here. Autoscale settings are one of those resources where a small typo in a metric name or namespace can cause the apply to succeed but the rules to never actually fire. Read through the plan and double-check the metric names and resource IDs before applying.
Once the apply completes, head into the Azure Portal, find the App Service Plan, and open the Scale out (App Service Plan) blade. You should see your rules listed under the Custom autoscale option, exactly as they were defined in Terraform.
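You can also confirm the deployed setting from the CLI rather than the Portal. Something like the following should echo back the profiles and rules as JSON (the resource names match the ones used in this walkthrough):

az monitor autoscale show --resource-group webapp-autoscale-rg --name autoscale-webapp-weu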
A Known Issue
There is a known issue with the azurerm_monitor_autoscale_setting resource on provider versions at or above 3.87.0 where the case of serverFarms in the target resource ID changed from lowercase to mixed case. On upgrade, Terraform reads the state file, compares it to the new ID format, and wants to destroy and recreate the autoscale setting.
The fix is usually a terraform apply -refresh-only to update the state without actually making any changes, or a terraform state rm followed by an import. If you see a plan that wants to replace an autoscale setting for no obvious reason after a provider upgrade, this is likely what you are looking at.
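In practice the refresh-only route is the first thing to try, and the state rm plus import is the fallback. Given that this whole issue is about ID casing, pull the exact resource ID from Azure rather than typing it by hand:

terraform apply -refresh-only

If the plan still wants to replace the setting, re-import it using the ID reported by az monitor autoscale show --query id:

terraform state rm azurerm_monitor_autoscale_setting.webapp_autoscale
terraform import azurerm_monitor_autoscale_setting.webapp_autoscale "<autoscale-setting-resource-id>"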
Wrapping Up
Autoscale is one of those features that is easy to set up once and then forget about, but it is worth revisiting the rules occasionally as your application changes. A CPU threshold that made sense when the app was smaller might be too aggressive once you have added caching, or too relaxed once you have introduced a heavier workload.
In terms of what we have built here, the key pieces are a metric-based profile for reactive scaling, a scheduled profile for predictable traffic patterns, and notifications to keep an eye on what the rules are doing. All defined in code, versioned alongside the rest of your infrastructure, and easy to promote between environments.





