Network Namespaces vs Docker: A Performance Deep Dive
Why Colony uses Linux network namespaces instead of Docker containers, with real performance comparisons.
People hear “isolated development environments” and immediately assume Docker. It’s the default answer. It’s also the wrong answer for what we’re building.
Docker is brilliant. It revolutionized deployment. But for Colony’s use case — orchestrating AI coding agents — Docker solves problems we don’t have while adding overhead we can’t afford.
Let’s talk about what Docker actually does, why we use raw Linux network namespaces instead, and what the performance difference looks like in practice.
Docker Isn’t One Thing
Docker is a stack:
- Namespaces for process isolation
- Cgroups for resource limits
- Union filesystem for layered images
- Docker daemon managing lifecycle
- Image registry for distribution
This is great engineering. It’s why Docker took over the world. But when you’re building dev tooling for AI agents, you realize you only need one of these: network isolation.
What AI Agents Actually Need
Agents need to:
- Read and write thousands of files quickly (no filesystem overhead)
- Run processes without conflicts (network isolation)
- Start fast (agents spawn frequently)
- Access native tooling (no need to install languages inside images)
Docker can cover most of this. But it also gives you image layers, union filesystems, daemon-based lifecycle management, and image building. We don’t need any of that.
The Performance Tax
Here’s a Docker setup for a typical dev environment:
```shell
docker run -it --rm \
  -v "$(pwd)":/workspace \
  -p 3000:3000 \
  node:20-alpine \
  /bin/sh
```
- Startup time: 800-1200ms cold start
- Memory overhead: ~150MB base + runtime
- Filesystem I/O: union filesystem adds 10-15% latency on metadata operations
Now the network namespace approach:
```shell
ip netns add colony-abc123
ip netns exec colony-abc123 /bin/sh
```
- Startup time: <50ms
- Memory overhead: ~2-5MB (just namespace metadata)
- Filesystem I/O: native. Zero overhead.
The difference compounds. An AI agent running npm install with 10,000 files sees measurable slowdown in Docker compared to native filesystem access.
The Filesystem Problem
This matters more than you’d think.
Docker’s union filesystem is optimized for images (immutable layers) and copy-on-write (fast container creation). But AI agents don’t work with immutable images. They work in mutable source trees.
When an agent runs:
- `git clone` (thousands of files)
- `npm install` (scanning `node_modules` repeatedly)
- `grep -r` across the codebase
- Test suites writing temp files
Every operation hits the union filesystem’s metadata overhead. File access goes through:
1. Container filesystem layer
2. OverlayFS metadata lookup (which layer has this file?)
3. Host filesystem
With namespaces, it’s just step 3. Direct host filesystem access.
Real numbers: Running npm test in a TypeScript project with ~500 test files:
- Docker (bind mount): ~8.2 seconds
- Native filesystem: ~6.8 seconds
That’s 17% faster just by removing the filesystem abstraction. For agents running hundreds of test iterations daily, this adds up.
Resource Limits? We Don’t Need Them Yet
Docker uses cgroups to prevent one container from eating all CPU or memory. Critical in multi-tenant production systems.
Colony doesn’t need this yet. Users run colonies on their own machines or dedicated dev servers. If a colony’s agent maxes out CPU compiling Rust code, that’s expected. It’s their workload.
When we need resource limits (for cloud-hosted Colony), we’ll add cgroups directly:
```shell
cgcreate -g cpu,memory:/colony-abc123
cgset -r cpu.max=200000 colony-abc123   # 2 cores
cgset -r memory.max=4G colony-abc123
```
We’ll add it when we need it, without taking on Docker’s other overhead.
Network Isolation: The Part We Actually Use
Network namespaces are our core isolation primitive. Each colony gets:
- Its own network stack
- Its own loopback interface
- Virtual ethernet pairs (veth) to the host
- Its own routing table and firewall rules
This lets us:
- Run identical services on the same port across multiple colonies
- Use Caddy to route `api-4001.colony.local` to the right colony
- Isolate network failures between colonies
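As a rough sketch, that routing can be a small Caddyfile entry per colony. The hostname and upstream port here are illustrative; `10.200.1.2` matches the veth address we assign inside the namespace:

```caddyfile
# Route a per-colony hostname to a service inside that colony's namespace.
api-4001.colony.local {
	reverse_proxy 10.200.1.2:4001
}
```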
Here’s how we create a colony’s network namespace:
```gleam
pub fn create_namespace(name: String) -> Result(Namespace, Error) {
  case exec("ip", ["netns", "add", name]) {
    Ok(_) -> {
      let host_veth = "veth-" <> name
      let colony_veth = "veth-colony"
      // Create a veth pair and move one end into the new namespace
      exec("ip", ["link", "add", host_veth, "type", "veth", "peer", "name", colony_veth])
      exec("ip", ["link", "set", colony_veth, "netns", name])
      // Configure and bring up the colony side
      exec("ip", ["netns", "exec", name, "ip", "addr", "add", "10.200.1.2/24", "dev", colony_veth])
      exec("ip", ["netns", "exec", name, "ip", "link", "set", colony_veth, "up"])
      // Configure and bring up the host side so traffic can actually flow
      exec("ip", ["addr", "add", "10.200.1.1/24", "dev", host_veth])
      exec("ip", ["link", "set", host_veth, "up"])
      Ok(Namespace(name: name, veth: host_veth))
    }
    Error(e) -> Error(e)
  }
}
```
Total time: <50ms. Try spinning up a Docker container with custom networking that fast.
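You can get a feel for the startup cost yourself, no root required, by pairing a network namespace with a user namespace via `unshare(1)`. This is a quick sketch, assuming a Linux system where unprivileged user namespaces are enabled (the default on most modern distros):

```shell
# Run a command inside a fresh user+network namespace.
# --map-root-user maps our UID to root inside it, so `ip` has the
# privileges it needs; the namespace vanishes when the command exits.
unshare --user --map-root-user --net ip link show lo

# Wrap it in `time` to see how cheap namespace creation is:
time unshare --user --map-root-user --net true
```

The `ip link show lo` output shows the namespace’s own loopback interface, down and unconfigured, which confirms the network stack is isolated from the host’s.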
When Docker IS the Right Choice
Docker isn’t wrong. It’s optimized for a different problem.
Use Docker when you need:
- Reproducible builds. Shipping the exact environment to production that you tested locally.
- Multi-tenant SaaS. Resource limits and strong isolation are critical.
- Image distribution. Users run `docker run myapp` and it works.
- Orchestration platforms. Kubernetes, ECS, and Cloud Run speak Docker/OCI.
- Batteries-included networking. Docker Compose’s service discovery is excellent.
Colony doesn’t need any of this. We’re building development environments, not deployment artifacts. Native performance matters more than portability.
The Hybrid Future
As Colony matures, we might use both:
- Namespaces for local development (fast, lightweight)
- Docker/Podman for cloud colonies (stronger isolation, resource limits)
The beauty of our architecture is that the isolation mechanism is pluggable. The OTP actor managing a colony doesn’t care whether it’s using namespaces or containers.
Benchmark Summary
For a typical workflow (clone repo, install deps, run tests):
| Metric | Docker (bind mount) | Network Namespace | Improvement |
|---|---|---|---|
| Cold start | 800-1200ms | <50ms | 16-24x faster |
| Memory overhead | ~150MB | ~2-5MB | 30-75x lighter |
| Filesystem I/O | +10-15% latency | Native | 10-15% faster |
| Port isolation | ✅ | ✅ | Equal |
| Resource limits | ✅ | ❌ (for now) | Docker wins |
For AI agents that spawn frequently, iterate rapidly, and hammer the filesystem, namespaces win.
The Bottom Line
Docker is an amazing tool. But it’s optimized for deployment, not development velocity.
By using only the Linux primitives we actually need (network namespaces), Colony gets:
- Sub-50ms cold starts
- Native filesystem performance
- Minimal memory overhead
- Direct access to the host’s toolchain
If you’re building tooling for fast-iteration workflows, question whether you actually need all of Docker’s complexity. Sometimes the kernel’s primitives are enough.
Want to see network namespaces powering real AI development workflows? Join the waitlist for Colony.