Traffic surges, whether from a product launch or Black Friday shopping, shouldn't crash your application or max out your storage and RAM capacity. This is where smart autoscaling comes in. It keeps applications fast and costs in check by adding or removing application instances as demand changes. Rather than guessing a fixed size, the platform watches live signals and adjusts resources automatically.
Modern applications require more than just throwing additional servers at a problem. They need sophisticated scaling strategies that can distinguish between temporary spikes and sustained growth, scale individual components independently, and maintain performance across multiple cloud environments. The key lies in understanding two fundamental scaling approaches: horizontal scaling and cluster autoscaling.
Application scaling isn't a one-size-fits-all solution. Different workloads require different strategies, and the most effective scaling implementations combine multiple approaches to create resilient, cost-effective systems.
Horizontal scaling focuses on adding more instances of your application containers to distribute load. Rather than upgrading to a more powerful server, horizontal scaling creates additional identical application instances that share the incoming traffic. This approach works exceptionally well for stateless applications where any available instance can handle each request.
Cluster scaling, on the other hand, operates at the infrastructure layer, automatically adjusting the number of compute nodes available to your applications. When your application instances need more resources than your current cluster can provide, cluster autoscaling provisions additional nodes. Conversely, when demand decreases, it removes underutilized nodes to reduce costs.
You're probably wondering, what about vertical scaling then? Vertical scaling takes a different approach by increasing the power of existing servers, adding more CPU, RAM, or storage to the machines you already have. Instead of running more copies of your app, you allocate more resources to your current app instances to handle the increased load. This works well for applications that cannot be easily split into multiple instances, such as databases or applications with complex state management. However, vertical scaling has its limits; there's only so much CPU and RAM that can be added to a single server, and it doesn't provide the same fault tolerance as horizontal scaling, since you're still relying on individual machines.
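The fault-tolerance difference is easy to see with a little arithmetic. Here's a minimal sketch (the 1,000-rps and 200-rps figures are made-up illustration numbers, not benchmarks): one big vertically scaled server loses all capacity when it fails, while five smaller horizontal instances lose only a fifth.

```python
# Illustrative only: compares remaining capacity after a machine failure
# for a vertical (one big box) vs. horizontal (many small boxes) setup.

def capacity_after_failure(instances, per_instance_capacity, failed=1):
    """Remaining request capacity (e.g. requests/sec) after `failed`
    instances go down."""
    survivors = max(instances - failed, 0)
    return survivors * per_instance_capacity

# Vertical: one 1,000-rps server. Horizontal: five 200-rps servers.
print(capacity_after_failure(1, 1000))  # vertical: nothing left
print(capacity_after_failure(5, 200))   # horizontal: 800 rps still serving
```

Same total capacity in both setups, but only the horizontal one degrades gracefully.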
Let's dive into an example we can all relate to.
Think of a road trip. Normally, your family of 4 travels comfortably in one sedan. Today, you're organizing a reunion and need to transport 20 people to the beach. Vertical scaling is getting a bigger vehicle; trade your sedan for a large bus that can carry all 20 people in one trip.
On the other hand, horizontal scaling involves keeping your sedan and getting four more identical cars. Now you have five sedans that can carry a total of 20 people (4 people per car). If one car has trouble, the other four can still make the trip, and people can redistribute between cars.
With cluster autoscaling, you recruit three more family members who can drive. Now each of your five cars has a driver, and all 20 people can travel efficiently.
Let's take a closer look at cluster autoscalers.
In traditional cloud environments, cluster autoscaling operates at the infrastructure layer, automatically adjusting the number of compute nodes in response to application demands. When your applications need more resources than your current nodes can provide, a cluster autoscaler provisions additional nodes. When demand decreases, it removes underutilized nodes to reduce costs.
The cluster autoscaler continuously monitors resource requests and scheduling needs across your infrastructure. Modern implementations utilize algorithms that take into account factors such as pending pod scheduling requests, node utilization patterns, and application resource requirements when making scaling decisions. This prevents both resource shortages and wasteful over-provisioning.
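To make that decision loop concrete, here is a toy sketch of a single autoscaler pass. It is not any provider's actual algorithm; the 4-core node shape and 50% utilization floor are assumptions for illustration. It captures the two moves described above: add nodes when workloads can't be scheduled, drain underutilized nodes otherwise.

```python
import math

NODE_CORES = 4.0  # hypothetical fixed node shape for this sketch

def autoscale_step(pending_cpu_requests, nodes, low_util=0.5):
    """One decision pass of a toy cluster autoscaler.

    pending_cpu_requests: CPU (cores) requested by pods that cannot schedule.
    nodes: {name: {"capacity": cores, "used": cores}}
    Returns ("add", count), ("remove", [names]), or ("hold", None).
    """
    if pending_cpu_requests:
        # Scale out: add just enough nodes to fit the unschedulable requests.
        count = math.ceil(sum(pending_cpu_requests) / NODE_CORES)
        return ("add", count)

    # Scale in: drain nodes running below the utilization floor,
    # but always keep at least one node in the cluster.
    idle = [n for n, s in nodes.items() if s["used"] / s["capacity"] < low_util]
    removable = idle[: max(0, len(nodes) - 1)]
    if removable:
        return ("remove", removable)
    return ("hold", None)
```

A real autoscaler also weighs pod disruption budgets, node affinities, and scale-down cooldowns, which is exactly the operational complexity discussed below.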
The integration between horizontal pod scaling and cluster autoscaling creates a complete scaling ecosystem in Kubernetes environments. When applications need more instances but existing nodes lack capacity, pods remain pending until the cluster autoscaler provisions additional nodes to accommodate them.
However, managing cluster autoscaling requires expertise in Kubernetes operations, cloud provider integrations, and careful configuration of scaling policies, node pools, and cost controls. This operational complexity is one reason why many teams prefer managed platforms that automatically handle infrastructure scaling.
By “hyperscaler,” we mean the large cloud providers, such as AWS, Google Cloud, and Microsoft Azure, that run massive fleets of machines across multiple regions. Their value is simple: near-instant capacity and a global footprint when you need it. Their global network of data centers, advanced orchestration capabilities, and managed services create the perfect environment for implementing intelligent scaling strategies. However, with their services, you still have to wire up node groups, autoscalers, and metrics yourself.
Upsun operates across AWS, Azure, and Google Cloud Platform, giving your applications access to hyperscaler infrastructure while abstracting away the complexity of managing multiple cloud environments. This multicloud approach provides several advantages for scaling operations.
First, it eliminates vendor lock-in concerns that often limit scaling decisions. Your applications can leverage the best features from each hyperscaler without compromising on architecture. AWS might offer the most mature autoscaling services, Google Cloud might provide the best Kubernetes integration, and Azure might deliver superior enterprise integration. With Upsun, you don't have to choose just one.
Second, multicloud deployment enables geographic distribution that improves both performance and resilience. Your European users can be served from Azure's European regions while your Asian traffic routes through Google Cloud's Asia-Pacific infrastructure. Serve users from the provider and region you choose. If you require multi-region high availability, run projects in multiple regions and place a CDN or DNS routing layer in front.
Third, cost optimization becomes more sophisticated in a multicloud environment. Different hyperscalers offer varying pricing models for compute, storage, and networking. Intelligent workload placement can significantly reduce operational expenses while maintaining performance requirements.
Upsun provides flexibility with both horizontal and vertical scaling, allowing you to choose the right approach for each situation. In the Console today, autoscaling is driven by average CPU. Memory-based autoscaling is on the roadmap. During traffic spikes, Console autoscaling automatically adds or removes application instances within the rules you set. When you need more power per instance, you can adjust CPU, RAM, and disk for each container, including databases and caches.
Horizontal scaling on Upsun
Horizontal scaling on Upsun is handled through a built-in autoscaling feature. Upsun adds or removes application instances to match live demand. Rather than having to watch your app and manually add more instances when traffic picks up, Upsun tracks your app’s average CPU and adjusts instance count within rules you set.
Here's how it works: Upsun watches the average CPU across all your app instances. If CPU stays at 80% or higher for 5 minutes straight, it automatically spins up another instance to help handle the load. If CPU drops below 20% and stays there for 5 minutes, it removes extra instances. Default instance limits are typically 1–8 per environment, although the exact values vary by region.
You can set this up right from the Console by clicking "Configure resources" and then "Enable" under the autoscaling column. From there, you can configure the limits within which autoscaling operates.
Once autoscaling is enabled, you can no longer set instance counts manually; Upsun handles that within the limits you set. You can still adjust the amount of CPU, RAM, and disk each instance receives.
Keep in mind that autoscaling currently works only for applications.
A quick example: If your shop typically runs two app instances, the autoscaler will add instances once the CPU usage exceeds 80% for 5 minutes and continue until demand settles or it reaches your configured maximum. Treat “8” as a standard default cap, not a guarantee, because caps vary by region.
The beauty of this approach lies in its seamless operation. Since horizontal scaling adds or removes instances, each scaling action triggers a deployment, which consumes build minutes. If your app scales frequently, build minute usage can climb. Keep evaluation periods sensible and avoid overly aggressive scaling settings to limit churn and control costs.
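The build-minute point is worth a quick illustration. The sketch below (assumed numbers, not Upsun data or Upsun's evaluation logic) counts how many scaling actions a spiky CPU trace triggers under short versus long sustained-load windows: the longer window simply never fires on short bursts.

```python
# Rough illustration of why evaluation windows matter for scaling churn.

def count_scaling_actions(cpu_trace, window, high=80.0, low=20.0):
    """Count scale events: CPU at or above `high` (or below `low`) for
    `window` consecutive samples triggers one action, then counters reset."""
    actions, run_hi, run_lo = 0, 0, 0
    for cpu in cpu_trace:
        run_hi = run_hi + 1 if cpu >= high else 0
        run_lo = run_lo + 1 if cpu < low else 0
        if run_hi >= window or run_lo >= window:
            actions += 1
            run_hi = run_lo = 0
    return actions

# A spiky trace: short bursts above 80% separated by quiet dips.
trace = [90, 90, 10, 10, 90, 90, 10, 10, 90, 90, 90, 90, 10]
print(count_scaling_actions(trace, window=2))  # short window: many actions
print(count_scaling_actions(trace, window=5))  # longer window: none at all
```

Every one of those actions would be a deployment consuming build minutes, which is why sustained-load windows beat hair-trigger thresholds.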
Sometimes, you don't need more instances; you just need to give your existing instances more power. With vertical scaling, you're basically giving your existing app a hardware upgrade: more CPU, extra RAM, or additional storage space, rather than spinning up additional instances.
This approach works really well when you're dealing with databases or apps that don't work well being split across multiple instances. Consider this: running multiple database instances sounds great until you realize you now have to keep all that data in sync, which becomes complicated quickly. It's much simpler to allocate more RAM and CPU power to your single database instance.
Upsun offers four different container profiles with various combinations of CPU and RAM, depending on what your app needs.
You can adjust these resources through the Console or the CLI at any time. Saving vertical changes redeploys the environment (a brief interruption), and each instance receives the full CPU and RAM you select. This is a great fit for databases, caches, and other services that can't easily split their work across multiple instances: rather than juggling data consistency across several database instances, you simply allocate more RAM and CPU to the one you have.
You can mix and match, use vertical scaling for your database and horizontal scaling for your web app, all within the same project. Upsun handles resource allocation per environment, allowing you to allocate more power to your production database than to your development one, while keeping costs reasonable.
Upsun operates differently from traditional Kubernetes setups. While cluster autoscalers apply to self-managed clusters, Upsun handles all infrastructure scaling automatically behind the scenes. As a user, you focus on horizontal and vertical scaling of your applications, while Upsun manages the underlying infrastructure capacity for you.
Getting autoscaling right isn't just about turning on a feature; you need to understand how your app actually behaves, select the right metrics to monitor, and continually adjust settings based on what actually happens in production. Platforms like Upsun make this much easier by handling the complex infrastructure stuff automatically, so you can focus on building great apps instead of wrestling with scaling configurations.
As your applications grow, your scaling approach should grow with them. Putting smart autoscaling in place pays off in better performance, reduced costs, and the ability to adapt quickly when your business needs change or new opportunities arise.
The apps that will succeed are those that can scale smartly, adjusting to demand automatically while keeping both performance and costs in check. With the right platform and approach, your applications can handle whatever growth throws at them.