The Future of Compute and Scaling

Written by Kim — a technical futurist sharing real-time thoughts on where compute is taking us.

I don’t write in polished essays; I write when something won’t leave my head. Lately I’ve been practicing solution architecture on the AWS free tier and researching the roadmaps of Google Cloud, AWS, and Azure, and whether there are new players (there are). But more importantly, my topic is compute and scaling, not just as a system design problem, but as a defining question for the next era of computing.

Everyone's building fast, vibe-coding, spinning up MVPs, launching products overnight. But here's the question that matters: Can it scale?

Whether it’s a cloud app, a GPU-heavy workload, a VPS, a pipeline, or even a business model, if it can’t scale, it can’t survive.

Even content creators understand this better than most engineers. A YouTuber uploads once and their content scales forever: zero infrastructure, zero maintenance. It’s pure leverage.

Meanwhile, in engineering, we still overbuild, overintegrate, and underestimate one thing: the future is compute.

Our laptops are always running, and everyone is constantly prompting...but few realize how much power this consumes. OpenAI has to scale massively behind the scenes, yet most users aren’t aware of the cost. I’ve tried running larger models on my own homelab, and I don’t see how OpenAI can keep this kind of latency at scale without major infrastructure advances.

This is not just about infrastructure. This is about how we build, how we collaborate, and how we think. In a world where everything becomes compute—your tools, your assistants, your outputs—scaling is not optional. It’s the core literacy of the modern engineer.

Why This Isn’t Just Another Scaling Blog

Most scaling content teaches how to autoscale, how to set up Kubernetes, how to reduce downtime. That’s useful—but it's not the point anymore.

What I'm exploring here is where we're going:

  • A world where AI workloads dominate infrastructure.
  • A world where compute is the most precious and expensive resource.
  • A world where scale decisions define product success or failure.

Today, people are still wiring together services, using microservices like Lego blocks. But eventually they’ll realize: it’s not about how many services you connect. It’s about what actually scales—sustainably, predictably, and profitably.

Refactoring and microservices aren’t always the answer. Sometimes they’re the obstacle.

That’s why scaling today is not just about handling traffic. It’s about:

  • Scaling compute efficiently
  • Scaling logic meaningfully
  • Scaling across systems, teams, and costs

What’s New, and Who’s Leading the Charge

1. A Resurgence of AI-Driven Infrastructure

Hyperscalers like AWS, Google, Microsoft, and Oracle are massively scaling data centers to serve AI workloads:

  • Amazon has pledged A$20B (~US $13B) in Australia alone for AI-ready infrastructure.
  • Oracle forecasts over 70% growth in cloud infrastructure next year, with $25B in new capex.
  • Cisco is repositioning around AI-first data centers.
  • Microsoft continues scaling, shifting from general-purpose to AI-inference-focused builds.

2. The Rise of AI-First Cloud Providers & Neoclouds

  • CoreWeave went public with $1.5B raised and 250K GPUs powering OpenAI and more.
  • Vultr is scaling fast after a $3.5B valuation and an expansion into GPU hosting.
  • Nvidia is now a platform, building its own neocloud powered by Blackwell chips.

3. Serverless, WASM, and eBPF Innovation

  • Serverless is evolving with smart databases like SkySQL that autoscale and cold-start instantly.
  • Kubernetes overlays (like Knative) enable event-driven scale-to-zero functions.
  • Fermyon brings WebAssembly-native scaling to production.
  • eBPF and tools like Cilium reshape how networking and security scale under pressure.

4. Hybrid, Multi-Cloud, Edge & Sovereignty

  • Enterprises now embrace “Cloud-Smart” strategies—balancing cloud, bare metal, and edge.
  • Gartner predicts that 75% of enterprise data will be created and processed outside traditional data centers by 2025.
  • Some platforms are testing agentic AI and quantum scheduling for infrastructure orchestration.

5. Hardware Shift: The Rise of Arm

  • Arm Neoverse is on track to power nearly 50% of all hyperscaler compute by end of 2025.
  • Custom silicon (like Nvidia Grace Blackwell) is now a strategic advantage for every cloud vendor.

6. The Emerging Triad: Hyperscalers, Smart On-Prem, and Smart Bare Metal

  • The future of scaling won't be cloud-only. Hyperscalers provide global compute, but the rise of smart on-prem clusters and managed bare-metal boxes gives teams the power to optimize for data sovereignty, latency, and cost.
  • Bare metal is no longer "dumb metal." It’s containerized, observable, and increasingly AI-integrated.
  • On-prem is becoming smarter, using autoscaling orchestration tools and dedicated ML workloads while still offering full control.

Key Players at a Glance

  • AWS, Azure, GCP: hyperscale leaders with major AI capex. Differentiator: custom chips, ML stacks, global infra scale.
  • Oracle: scaling fast to catch up on AI. Differentiator: $25B in capex, enterprise AI push.
  • Cisco: enabling AI-ready infrastructure. Differentiator: networking plus secure edge expansion.
  • CoreWeave, Vultr: new wave of GPU-first cloud providers. Differentiator: high GPU density, optimized for LLMs and inference.
  • Nvidia: chip-to-cloud disruptor. Differentiator: vertical integration, Blackwell-powered cloud.
  • Fermyon: pioneering WebAssembly autoscaling. Differentiator: WASM-first scale orchestration.
  • Kubernetes/CNCF: core orchestration backbone. Differentiator: serverless, mesh, and eBPF innovation.

Why This Matters for Tech Teams

1. Scaling is Driven by Compute, Not Just Traffic

  • Workloads are increasingly compute-heavy (e.g., large language models, generative AI, real-time inference).
  • Scaling today means provisioning GPUs, TPUs, Arm chips, and memory bandwidth—not just CPU threads or web servers.
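
To make that concrete, here is a back-of-the-envelope sketch (my own arithmetic, not any vendor's sizing guide) of why LLM serving is a memory-and-accelerator problem before it is a CPU problem. The 80 GB card size and 1.2x overhead factor are illustrative assumptions:

```python
import math

def min_gpus_for_model(params_billions: float,
                       bytes_per_param: int = 2,      # fp16/bf16 weights
                       overhead_factor: float = 1.2,  # rough allowance for KV cache, activations
                       gpu_memory_gb: int = 80):      # e.g. one 80 GB accelerator
    """Back-of-the-envelope: memory footprint and GPU count just to hold the weights."""
    weights_gb = params_billions * bytes_per_param    # 1e9 params at N bytes each = N GB
    total_gb = weights_gb * overhead_factor
    return total_gb, math.ceil(total_gb / gpu_memory_gb)

# A 70B-parameter model in fp16 is 140 GB of weights alone; with overhead it
# already spills across three 80 GB cards before any batching or replication.
total_gb, gpus = min_gpus_for_model(70)
print(total_gb, gpus)
```

No amount of horizontal web-server scaling helps here; the unit of scale is the accelerator itself.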

2. Power Consumption Is Now a Bottleneck

  • Data centers are facing energy constraints. The cost of electricity, cooling, and heat management is rising.
  • Major providers (AWS, Azure, GCP) are investing in green energy and liquid cooling to stay operational at scale.

3. AI Workloads Are Changing the Stack

  • Traditional autoscaling based on CPU or HTTP load doesn’t work for LLMs and inference tasks.
  • Scaling AI requires predictive job scheduling, GPU-aware orchestration, and high-speed networking (InfiniBand, NVLink).
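
A toy illustration of what "GPU-aware" means in practice (my own sketch, not any real scheduler's API): placement is driven by GPU memory, a dimension that CPU- or HTTP-based autoscalers never see:

```python
from dataclasses import dataclass

@dataclass
class Gpu:
    name: str
    total_gb: int
    used_gb: int = 0

    @property
    def free_gb(self) -> int:
        return self.total_gb - self.used_gb

def place(jobs: list[tuple[str, int]], gpus: list[Gpu]) -> dict[str, str]:
    """Best-fit placement: each job goes on the GPU with the least free memory
    that still fits it. A real scheduler would also weigh interconnect topology,
    data locality, and preemption."""
    placement = {}
    for job, need_gb in sorted(jobs, key=lambda j: -j[1]):  # largest jobs first
        candidates = [g for g in gpus if g.free_gb >= need_gb]
        if not candidates:
            placement[job] = "pending"  # must queue; a CPU-based autoscaler can't see why
            continue
        best = min(candidates, key=lambda g: g.free_gb)
        best.used_gb += need_gb
        placement[job] = best.name
    return placement

gpus = [Gpu("gpu-a", 80), Gpu("gpu-b", 40)]
jobs = [("llm-70b-shard", 70), ("embedder", 8), ("reranker", 30)]
print(place(jobs, gpus))
```

The job names and sizes are hypothetical; the point is that the scheduling signal is memory on the device, not load on the host.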

4. Kubernetes + Serverless Are the New Norm

  • Kubernetes is the backbone of modern scaling. It enables horizontal scaling, resilience, and self-healing workloads.
  • Serverless (e.g., AWS Lambda, GCP Cloud Functions) continues to rise for event-driven tasks with unpredictable load.
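
At its core, the Kubernetes Horizontal Pod Autoscaler boils down to one documented formula, sketched here in Python (the 0.1 tolerance band mirrors the controller's default):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Kubernetes HPA core formula:
    desired = ceil(current * currentMetric / targetMetric),
    with a tolerance band around 1.0 to avoid flapping on small deviations."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target; don't scale
    return math.ceil(current_replicas * ratio)

# 4 pods averaging 90% CPU against a 50% target: scale out to 8.
print(desired_replicas(4, 90, 50))
```

The same formula works for custom metrics (queue depth, GPU utilization), which is exactly how teams bend the HPA toward AI workloads.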

5. Hybrid & Multicloud Architectures Are Standardizing

  • Enterprises are adopting hybrid cloud (on-prem + cloud) to manage costs, data laws, and latency needs.
  • Multicloud setups are used to avoid vendor lock-in and improve redundancy.

6. Edge Computing Is Critical for Low Latency

  • 75% of enterprise data is predicted to be processed outside traditional data centers by 2025 (Gartner).
  • Edge scaling allows faster inference for applications like autonomous vehicles, smart factories, and AR/VR.

7. Arm and GPU Chips Are Taking Over

  • Cloud providers are shifting to Arm (e.g., AWS Graviton) for efficiency and GPUs (e.g., Nvidia Blackwell) for performance.
  • Arm-based servers are cheaper, cooler, and optimized for parallel workloads.

8. Data Sovereignty & Compliance Drive On-Prem Adoption

  • Regulatory pressure in the EU, China, and other regions forces companies to host and scale compute locally.
  • This creates demand for cloud-like orchestration on bare metal and private cloud solutions.

9. Autoscaling Is No Longer Reactive—It’s Predictive

  • Systems are moving toward AI-driven autoscaling, predicting demand before it occurs.
  • This is essential for maintaining latency SLAs with large AI models.
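
A minimal sketch of the idea, assuming a simple least-squares trend (real predictive autoscalers use far richer models, with seasonality and ML; the 100 requests-per-second-per-replica capacity is a made-up number):

```python
import math

def forecast_next(samples: list[float]) -> float:
    """Fit a least-squares linear trend over recent samples and predict the next point."""
    n = len(samples)
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    denom = sum((x - x_mean) ** 2 for x in range(n))
    slope = sum((x - x_mean) * (y - y_mean)
                for x, y in zip(range(n), samples)) / denom
    return y_mean + slope * (n - x_mean)  # extrapolate one step ahead

def replicas_for(rps_forecast: float, rps_per_replica: float = 100) -> int:
    return max(1, math.ceil(rps_forecast / rps_per_replica))

# Request rate climbing steadily: provision for the forecast, not the present.
history = [200, 260, 320, 380, 440]
print(replicas_for(forecast_next(history)))
```

A reactive autoscaler would still be sized for 440 rps when 500 arrives; the predictive one has the replicas warm before the spike.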

10. Latency Is the New Uptime

  • For AI-powered apps, sub-second latency is as important as high availability.
  • Scaling decisions are now optimized for latency zones, GPU proximity, and cold start times.
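
A small sketch of why tail latency, not the average, drives these decisions; the sample values are made up:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile; good enough for a sketch."""
    s = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

# Nine fast requests and one slow one, e.g. a cold start or a distant GPU zone.
latencies_ms = [120, 95, 110, 130, 900, 105, 115, 100, 125, 140]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
# The median looks healthy; the p99 exposes the outlier that breaks the SLA.
print(p50, p99)
```

Uptime dashboards stay green through all of this, which is the point: latency SLAs need their own scaling signals.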

And most importantly:

Scaling is emotional.

It’s not just the system that gets overwhelmed. It’s your team.

Scaling wrong adds cost, complexity, burnout. Scaling right gives clarity, speed, and room to think.
I’ve scaled systems that looked fine on paper but wore teams out. The system worked, yet the cost was hidden. That’s when I learned: scale isn’t just throughput, it’s also trust and reduced frustration.

Final Thoughts

If the last era was about building apps, the next one is about compute and scalability.

The future will be:

  • AI-first
  • Energy-constrained
  • Cost-sensitive
  • Multi-cloud
  • Queued, optimized, orchestrated
  • Supported by hyperscalers, smart on-prem, and smart bare-metal infrastructure that address today’s cloud problems and frustrations

Whether you’re writing backend code, deploying LLMs, or building tools for others...understanding scaling across compute layers will be your greatest technical leverage.

Not everything needs to scale. But what you choose to scale will shape your team’s performance, your infrastructure’s limits, and your product’s ability to survive what's coming next.

The future is compute. Scaling is how we survive it.
