May 20, 2025

Why Autoscaling Matters
Autoscaling ensures that your application can handle varying levels of traffic without manual intervention. It helps with:
- Cost Optimization: Scale down during periods of low demand to save on infrastructure costs.
- Performance: Scale up during peak traffic to maintain responsiveness.
- Reliability: Automatically recover from failures by provisioning new instances as needed.
Understanding the Building Blocks
AKS (Azure Kubernetes Service)
AKS is a managed Kubernetes service that simplifies the deployment, management, and scaling of containerized applications. It supports features like node pools, the cluster autoscaler, and integration with Azure Monitor.
Knative Serving
Knative Serving provides autoscaling capabilities for HTTP-based workloads. It supports:
- Scale-to-zero when there's no traffic
- Rapid scale-up in response to incoming HTTP requests
Knative supports two types of autoscalers:
- KPA (Knative Pod Autoscaler): the default, based on concurrency or requests per second (RPS)
- HPA (Horizontal Pod Autoscaler): optional, based on CPU/memory metrics
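For example, the autoscaler class and scaling metric can be selected per revision through annotations. A minimal sketch (the annotation keys are Knative's; the service name is illustrative):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello-kpa                  # illustrative name
spec:
  template:
    metadata:
      annotations:
        # KPA is the default class; switch to hpa.autoscaling.knative.dev
        # to scale on CPU/memory instead
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
        # KPA metric: "concurrency" (default) or "rps"
        autoscaling.knative.dev/metric: "rps"
        autoscaling.knative.dev/target: "100"
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```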
KEDA (Optional)
KEDA (Kubernetes Event-driven Autoscaling) is useful for event-driven workloads (e.g., Azure Service Bus, Kafka). It can complement Knative for hybrid autoscaling scenarios.
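To illustrate, a KEDA ScaledObject can scale a plain Deployment based on an Azure Service Bus queue. A hedged sketch (the resource names and queue are placeholders; the trigger fields follow KEDA's azure-servicebus scaler):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-scaler              # illustrative
spec:
  scaleTargetRef:
    name: orders-processor         # the Deployment to scale
  minReplicaCount: 0               # scale to zero when the queue is empty
  maxReplicaCount: 10
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: orders          # placeholder queue
        messageCount: "20"         # target messages per replica
      authenticationRef:
        name: servicebus-auth      # TriggerAuthentication holding the connection string
```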
Pod vs. Node Autoscaling
Pod Autoscaling (Knative)
Knative handles pod-level autoscaling based on HTTP traffic. It dynamically adjusts the number of pods depending on:
- Request concurrency (e.g., 100 requests per pod)
- Requests per second (RPS)
- CPU usage (if HPA mode is enabled)
Knative can even scale to zero when there's no traffic, making it ideal for bursty or event-driven workloads.
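As a rough sketch of the sizing rule (a simplification that ignores the KPA's panic mode, averaging windows, and min/max bounds): the desired pod count is roughly the observed concurrency divided by the per-pod target, rounded up.

```shell
# Simplified KPA sizing rule: desired pods = ceil(observed concurrency / per-pod target)
observed=500   # concurrent requests currently in flight
target=50      # autoscaling.knative.dev/target per pod
desired=$(( (observed + target - 1) / target ))
echo "desired pods: $desired"
```

With 500 in-flight requests and a target of 50 per pod, the autoscaler would aim for about 10 pods.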
Node Autoscaling (AKS)
AKS uses the Cluster Autoscaler for node-level autoscaling. It ensures that the underlying infrastructure can support the number of pods requested by Knative.
- If there's insufficient capacity for new pods, AKS adds nodes.
- If nodes are underutilized, AKS removes them to reduce cost.
Together, these layers ensure both application responsiveness and infrastructure efficiency.
Real Example: Deployment and Testing
Step 1: Create an AKS Cluster
az group create -l centralus -n myResourceGroup
az aks create --resource-group myResourceGroup --name myAKSCluster --enable-managed-identity --enable-aad --location centralus --node-count 2 --enable-addons monitoring --generate-ssh-keys
Add a new node pool and enable the cluster autoscaler:
az aks nodepool add --resource-group myResourceGroup --cluster-name myAKSCluster --name newpool --node-vm-size Standard_D4s_v3 --enable-cluster-autoscaler --min-count 1 --max-count 5
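If the bounds need adjusting later, the cluster autoscaler limits on an existing node pool can be changed with az aks nodepool update. A sketch, reusing the same resource names:

```shell
# Raise the autoscaler ceiling on the existing pool (names match the cluster above)
az aks nodepool update --resource-group myResourceGroup --cluster-name myAKSCluster --name newpool --update-cluster-autoscaler --min-count 1 --max-count 10
```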
Step 2: Connect to AKS
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster --overwrite-existing
Verify:
kubectl get nodes
Step 3: Install Knative Serving Core
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.18.0/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.18.0/serving-core.yaml
Verify:
kubectl get pods -n knative-serving
Ensure all pods are in Running or Completed state.
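Instead of polling manually, a single command can block until the control-plane pods are ready (a sketch; the timeout value is arbitrary):

```shell
# Wait until every knative-serving pod reports Ready, or fail after 3 minutes
kubectl wait --for=condition=Ready pod --all -n knative-serving --timeout=180s
```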
Step 4: Install Istio (Ingress for Knative)
Install Istio components
kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.18.0/istio.yaml
Install Knative-Istio integration
kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.18.0/net-istio.yaml
Verify Istio installation:
kubectl get pods -n istio-system
Expected pods:
- istiod
- istio-ingressgateway
Check the Istio IngressGateway Service (with public IP):
kubectl get svc istio-ingressgateway -n istio-system
Ensure the EXTERNAL-IP field has a public IP (not <pending>), as shown in the screenshot below.
Verify Knative uses Istio:
# Linux / Git Bash
kubectl get configmap config-network -n knative-serving -o yaml | grep ingress-class
# Windows
kubectl get configmap config-network -n knative-serving -o yaml | findstr ingress-class
Expected output: the ingress class should be istio.ingress.networking.knative.dev.
Step 6: Deploy a Sample Microservice with Autoscaling
Create autoscale-service.yaml:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-demo
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "50"
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "10"
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
          env:
            - name: TARGET
              value: "Knative Autoscaler"
Apply it:
kubectl apply -f autoscale-service.yaml
Verify:
kubectl get ksvc autoscale-demo
Ensure READY is True and the URL is populated.
Sample output:
Optional: Configure a Custom Domain (For Simpler Access)
If you want to access Knative services like http://autoscale-demo.myapps.com directly:
- Point a domain/subdomain to the Istio EXTERNAL-IP.
- Update the config-domain ConfigMap:
kubectl edit configmap config-domain -n knative-serving
Add:
myapps.com: ""
Deploy the service again. It will now be available under:
http://autoscale-demo.default.myapps.com
Step 7: Load Test the Service
Load testing is done by sending 500 concurrent requests for 60 seconds to the autoscale-demo web page. You should observe Knative scaling up pods in response to the load.
Using Internal Cluster URL
kubectl run -i --tty --rm loadgen --image=williamyeh/hey --restart=Never -- -z 60s -c 500 http://autoscale-demo.default.svc.cluster.local
This command starts a temporary pod in the cluster and runs the load test from inside it.
Test with hey using the external endpoint
Install hey (https://github.com/rakyll/hey) to generate HTTP traffic.
Find Istio's external IP:
kubectl get svc istio-ingressgateway -n istio-system
Run the hey load test using the host header and the external IP:
hey -z 60s -c 500 -host autoscale-demo.default.example.com http://72.152.40.239
Replace the IP with the EXTERNAL-IP from above, and replace autoscale-demo.default.example.com with the host from the Knative URL if you have configured a custom domain.
Note: autoscale-demo.default.example.com is the default domain assigned by Knative when no custom domain is configured (the built-in example.com default).
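Before running the full load test, a single request can confirm that routing works (the IP and host reflect the example values above):

```shell
# One request through the Istio gateway, routed by the Host header
curl -H "Host: autoscale-demo.default.example.com" http://72.152.40.239
```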
Optional (if you want to avoid Host headers): Use Magic DNS (like nip.io)
Since the EXTERNAL-IP in this example is 72.152.40.239, you can use nip.io like this:
Edit the Knative config map:
kubectl edit configmap config-domain -n knative-serving
And replace the domain entry with (the key must include the IP so that nip.io resolves it back to the gateway):
data:
  72.152.40.239.nip.io: ""
Then your Knative URL will become:
http://autoscale-demo.default.72.152.40.239.nip.io
So you can directly run:
curl http://autoscale-demo.default.72.152.40.239.nip.io
# Or
hey -z 60s -c 500 http://autoscale-demo.default.72.152.40.239.nip.io
No Host header is needed.
Step 8: Monitor Autoscaling
kubectl get pods -l serving.knative.dev/service=autoscale-demo -w
Observe how the number of pods increases during load and scales back down when idle.
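The autoscaler's own view (desired vs. actual scale) can also be inspected through Knative's internal PodAutoscaler resource; assuming the CRD's short name kpa is available, a sketch:

```shell
# Show the KPA's desired/actual scale for services in the default namespace
kubectl get kpa -n default
```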
Pods before starting Load test:
Running load test:
Pods while running load test:
Pods few minutes after running load test:
Step 9: (Optional) Clean Up
kubectl delete ksvc autoscale-demo
Conclusion
Combining Knative with AKS provides a robust, cloud-native platform for running microservices that scale dynamically based on real-time HTTP traffic. Whether you're building a FinOps-aligned landing zone or simply optimizing for elasticity, this setup offers:
- Elasticity: Scale from zero to high traffic effortlessly.
- Efficiency: Pay only for what you consume.
- Simplicity: Use YAML and native Kubernetes constructs to manage autoscaling.