Network Namespaces vs Docker: A Performance Deep Dive
Why Colony uses Linux network namespaces instead of Docker containers, with real performance comparisons.
People hear “isolated development environments” and immediately assume Docker. It’s the default answer. It’s also the wrong answer for what we’re building.
Docker is brilliant. It revolutionized deployment. But for Colony’s use case — orchestrating AI coding agents — Docker solves problems we don’t have while adding overhead we can’t afford.
Let’s talk about what Docker actually does, why we use raw Linux network namespaces instead, and what the performance difference looks like in practice.
Docker Isn’t One Thing
Docker is a stack:
- Namespaces for process isolation
- Cgroups for resource limits
- Union filesystem for layered images
- Docker daemon managing lifecycle
- Image registry for distribution
This is great engineering. It’s why Docker took over the world. But when you’re building dev tooling for AI agents, you realize you only need one of these: network isolation.
What AI Agents Actually Need
Agents need to:
- Read and write thousands of files quickly (no filesystem overhead)
- Run processes without conflicts (network isolation)
- Start fast (agents spawn frequently)
- Access native tooling (no need to install languages inside images)
Docker can cover most of this. But it also gives you image layers, union filesystems, daemon-based lifecycle management, and image building. We don’t need any of that.
The Performance Tax
Here’s a Docker setup for a typical dev environment:
```shell
docker run -it --rm \
  -v "$(pwd)":/workspace \
  -p 3000:3000 \
  node:20-alpine \
  /bin/sh
```
- Startup time: 800-1200ms cold start
- Memory overhead: ~150MB base + runtime
- Filesystem I/O: union filesystem adds 10-15% latency on metadata operations
Now the network namespace approach:
```shell
ip netns add colony-abc123
ip netns exec colony-abc123 /bin/sh
```
- Startup time: <50ms
- Memory overhead: ~2-5MB (just namespace metadata)
- Filesystem I/O: native. Zero overhead.
The difference compounds. An AI agent running npm install with 10,000 files sees measurable slowdown in Docker compared to native filesystem access.
The Filesystem Problem
This matters more than you’d think.
Docker’s union filesystem is optimized for images (immutable layers) and copy-on-write (fast container creation). But AI agents don’t work with immutable images. They work in mutable source trees.
When an agent runs:
- `git clone` (thousands of files)
- `npm install` (scanning `node_modules` repeatedly)
- `grep -r` across the codebase
- Test suites writing temp files
Every operation hits the union filesystem’s metadata overhead. File access goes through:
1. Container filesystem layer
2. OverlayFS metadata lookup (which layer has this file?)
3. Host filesystem
With namespaces, it’s just step 3. Direct host filesystem access.
Real numbers: Running npm test in a TypeScript project with ~500 test files:
- Docker (bind mount): ~8.2 seconds
- Native filesystem: ~6.8 seconds
That’s 17% faster just by removing the filesystem abstraction. For agents running hundreds of test iterations daily, this adds up.
Resource Limits? We Don’t Need Them Yet
Docker uses cgroups to prevent one container from eating all CPU or memory. Critical in multi-tenant production systems.
Colony doesn’t need this yet. Users run colonies on their own machines or dedicated dev servers. If a colony’s agent maxes out CPU compiling Rust code, that’s expected. It’s their workload.
When we need resource limits (for cloud-hosted Colony), we’ll add cgroups directly:
```shell
cgcreate -g cpu,memory:/colony-abc123
cgset -r cpu.max=200000 colony-abc123   # 2 cores
cgset -r memory.max=4G colony-abc123
```
We’ll add it when we need it, without taking on Docker’s other overhead.
Network Isolation: The Part We Actually Use
Network namespaces are our core isolation primitive. Each colony gets:
- Its own network stack
- Its own loopback interface
- Virtual ethernet pairs (veth) to the host
- Its own routing table and firewall rules
This lets us:
- Run identical services on the same port across multiple colonies
- Use Caddy to route `api-4001.colony.local` to the right colony
- Isolate network failures between colonies
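As a rough sketch, that routing can be a small Caddyfile entry per colony. The hostname and upstream port here are illustrative; `10.200.1.2` matches the veth address we assign inside the namespace:

```caddyfile
# Route a per-colony hostname to a service inside that colony's namespace.
api-4001.colony.local {
	reverse_proxy 10.200.1.2:4001
}
```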
Here’s how we create a colony’s network namespace:
```gleam
pub fn create_namespace(name: String) -> Result(Namespace, Error) {
  case exec("ip", ["netns", "add", name]) {
    Ok(_) -> {
      let host_veth = "veth-" <> name
      let colony_veth = "veth-colony"
      // Create a veth pair and move one end into the new namespace
      exec("ip", ["link", "add", host_veth, "type", "veth", "peer", "name", colony_veth])
      exec("ip", ["link", "set", colony_veth, "netns", name])
      // Configure and bring up the colony side
      exec("ip", ["netns", "exec", name, "ip", "addr", "add", "10.200.1.2/24", "dev", colony_veth])
      exec("ip", ["netns", "exec", name, "ip", "link", "set", colony_veth, "up"])
      // Configure and bring up the host side so traffic can actually flow
      exec("ip", ["addr", "add", "10.200.1.1/24", "dev", host_veth])
      exec("ip", ["link", "set", host_veth, "up"])
      Ok(Namespace(name: name, veth: host_veth))
    }
    Error(e) -> Error(e)
  }
}
```
Total time: <50ms. Try spinning up a Docker container with custom networking that fast.
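You can get a feel for the startup cost yourself, no root required, by pairing a network namespace with a user namespace via `unshare(1)`. This is a quick sketch, assuming a Linux system where unprivileged user namespaces are enabled (the default on most modern distros):

```shell
# Run a command inside a fresh user+network namespace.
# --map-root-user maps our UID to root inside it, so `ip` has the
# privileges it needs; the namespace vanishes when the command exits.
unshare --user --map-root-user --net ip link show lo

# Wrap it in `time` to see how cheap namespace creation is:
time unshare --user --map-root-user --net true
```

The `ip link show lo` output shows the namespace’s own loopback interface, down and unconfigured, which confirms the network stack is isolated from the host’s.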
When Docker IS the Right Choice
Docker isn’t wrong. It’s optimized for a different problem.
Use Docker when you need:
- Reproducible builds. Shipping the exact environment to production that you tested locally.
- Multi-tenant SaaS. Resource limits and strong isolation are critical.
- Image distribution. Users run `docker run myapp` and it works.
- Orchestration platforms. Kubernetes, ECS, and Cloud Run speak Docker/OCI.
- Batteries-included networking. Docker Compose’s service discovery is excellent.
Colony doesn’t need any of this. We’re building development environments, not deployment artifacts. Native performance matters more than portability.
The Hybrid Future
As Colony matures, we might use both:
- Namespaces for local development (fast, lightweight)
- Docker/Podman for cloud colonies (stronger isolation, resource limits)
The beauty of our architecture is that the isolation mechanism is pluggable. The OTP actor managing a colony doesn’t care whether it’s using namespaces or containers.
Benchmark Summary
For a typical workflow (clone repo, install deps, run tests):
| Metric | Docker (bind mount) | Network Namespace | Improvement |
|---|---|---|---|
| Cold start | 800-1200ms | <50ms | 16-24x faster |
| Memory overhead | ~150MB | ~2-5MB | 30-75x lighter |
| Filesystem I/O | +10-15% latency | Native | 10-15% faster |
| Port isolation | ✅ | ✅ | Equal |
| Resource limits | ✅ | ❌ (for now) | Docker wins |
For AI agents that spawn frequently, iterate rapidly, and hammer the filesystem, namespaces win.
The Bottom Line
Docker is an amazing tool. But it’s optimized for deployment, not development velocity.
By using only the Linux primitives we actually need (network namespaces), Colony gets:
- Sub-50ms cold starts
- Native filesystem performance
- Minimal memory overhead
- Direct access to the host’s toolchain
If you’re building tooling for fast-iteration workflows, question whether you actually need all of Docker’s complexity. Sometimes the kernel’s primitives are enough.
Want to see network namespaces powering real AI development workflows? Join the waitlist for Colony.