What’s new in Cloud Run at Next ‘26

2026-04-22 03:00 GMT

From vibe-coded and large-scale apps to AI models and agents, Cloud Run delivers on-demand compute with zero overhead and pay-per-use pricing for all of your workloads. Last year, the number of external active developers and applications on Cloud Run doubled, with more new customers and apps coming to Cloud Run in 2025 than in its first 6 years combined! 
Today, we’re announcing new features and improvements to Cloud Run to help you run your workloads:

Build and deploy full-stack apps in Google AI Studio with Cloud Run, Firestore, and user authentication.

Build, scale, govern, and optimize reliable AI agents with the all-new Gemini Enterprise Agent Platform and Cloud Run.

Enable easy deployments from developers and agents with Cloud Run’s fully managed remote MCP server.

Combine high-performance inference and serverless compute with NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs on Cloud Run.

Empowering the new era of developers
For decades, software development has had a steep learning curve, but thanks to AI, anyone can be a digital builder. With Cloud Run, you can go from prototype to deployed app in seconds.
Build full-stack apps in Google AI Studio
AI Studio now supports full-stack applications with server-side code, a Firestore database, and user authentication. Deploy your vibe-coded apps to Cloud Run with a single click, now generally available.
Cloud Run’s fully managed remote MCP server
To make it even easier for developers and agents to deploy code, we are launching an official remote Cloud Run MCP (Model Context Protocol) server, giving you the tools to manage and deploy apps. Now GA.
Billing caps
Soon, you’ll be able to define your maximum spend per month. If your bill reaches this amount, your Cloud Run resources will be deactivated.
“Cloud Run has been one of the best technical choices we made in our deployments platform. It is our primary target, powering us to over 1 million live projects being hosted on Replit.” – Scott Kennedy, VP of Engineering, Replit
Embracing the agentic era
Like human developers, AI agents need access to a compute environment to perform their tasks. Cloud Run’s on-demand compute gives cloud-based AI agents a place to take complex actions.
Cloud Run integration with Gemini Enterprise Agent Platform
Through its integration with Cloud Run, Agent Platform helps agents transition from experimental environments into fully managed, production-grade systems without having to rebuild them. Now in preview with select customers.
Cloud Run instances
Traditionally, Cloud Run services, jobs, and worker pools have been opinionated ways to manage Cloud Run infrastructure. Now, we are giving you access to the underlying primitive: you can create individual Cloud Run instances. Coupled with Cloud Storage volume mounts, these instances are ideal for hosting long-running background agents like OpenClaw in one simple command:

gcloud run instances create \
  --image alpine/openclaw:latest \
  --port 18789 \
  --memory 4Gi \
  --default-url \
  --add-volume mount-path=/home/node/.openclaw,type=cloud-storage,bucket=$BUCKET_NAME

This functionality is available in preview to select customers.
Cloud Run sandboxes
Agents often need a safe place to execute code or other commands as quickly as possible. Coming soon, while processing a single request, you will be able to very quickly spin up an ephemeral sandbox that’s strictly isolated from your agent code, using a built-in sandbox tool:

const express = require('express');
const { exec } = require('child_process');

const app = express();
app.use(express.json());

app.post('/execute', (req, res) => {
  const escapedCode = req.body.code.replace(/"/g, '\\"');

  // Run the submitted code inside the isolated sandbox, not the agent process.
  exec(`sandbox do -- /usr/bin/python3 -c "${escapedCode}"`, (e, stdout, stderr) => {
    res.send({ stdout, stderr });
  });
});
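One caveat worth noting: the handler above escapes only double quotes, so shell metacharacters such as $(…) or backticks in untrusted input would still be interpreted. A minimal sketch of a more robust approach, assuming you must build a shell string at all, is to wrap the code in single quotes (the helper name `shellQuoteSingle` is hypothetical, not part of any Cloud Run API):

```javascript
// Hypothetical helper, shown only to illustrate single-quote escaping:
// inside single quotes the shell treats every character literally, so
// the only character that needs handling is the single quote itself,
// which we emit as: close quote, escaped literal quote, reopen quote.
function shellQuoteSingle(code) {
  return "'" + code.replace(/'/g, "'\\''") + "'";
}

console.log(shellQuoteSingle('print("hi")')); // safe to splice into a command
```

Passing the code as an argv element (e.g. via `execFile`) avoids shell parsing entirely and is generally safer than any string escaping.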

“Cloud Run’s concurrency model has been instrumental in simplifying our AI workloads for our customer service AI tool, Lumi. Using Cloud Run alongside Gemini and AlloyDB, we’ve created a unified action layer that enables real-time call summarization and flow guidance, leading to improved first-call resolution rates and faster onboarding for our contact center team.” – Edward Wright, Head of Engineering, Virgin Media O2 UK
Automatic scaling for high-demand applications
Cloud Run automatically scales to meet your demand, making it a great fit for large customers that need to serve and respond instantly to heavy traffic spikes.
SSH support for Cloud Run
Developers can now gain secure shell (SSH) access directly into a running Cloud Run container, enabling advanced troubleshooting and inspection of the container’s file system on the fly. Now in preview with select customers. Open a secure interactive shell session with a simple command:

gcloud run services ssh SERVICE

Cloud Run service bindings
Coming soon, unlock seamless service-to-service communication for your scalable microservices architectures with Cloud Run service bindings.
“Cloud Run’s serverless architecture empowers us to meet exponentially growing demand through near-instant scaling, while its streamlined developer experience simplifies building and running our applications.” – Mimi Chen, Member of Technical Staff, Anthropic
Running AI models
From serving custom models with GPUs to training and fine-tuning models with jobs, you can use Cloud Run for your most AI-intensive workloads.
Support for NVIDIA RTX PRO 6000 Blackwell GPUs on Cloud Run
We’re bringing the serverless experience to high-end inference with support for NVIDIA RTX PRO™ 6000 Blackwell GPUs on Cloud Run, now GA. This means you can serve models with 70B+ parameters without having to manage any underlying infrastructure, including scaling to zero when the resource is not in use.
Ephemeral disk
With per-instance temporary disk storage, workloads can process large files or use scratch space without eating into your container memory. Now in preview, ephemeral disk storage is created when an instance starts and deleted when it stops.
“Cloud Run has fundamentally changed how we manage our model deployments. By moving to a usage-based, scale-to-zero model, we’ve eliminated idle GPU costs for low-traffic models. We are now running over 17 model variants in production across multiple regions, each independently deployable and isolated, without the burden of capacity planning or fleet management.” – Ajay Nair, Global VP, Elastic
On-demand compute for every workload
Whether you’re a seasoned software developer or a vibe coder looking to deploy the next viral app, Cloud Run delivers on-demand compute for everyone and every workload. Get started today.