Use Docker Compose to run an LLM app with a local proxy service

Introduction

When you ship an LLM app to production, the model client rarely runs alone: it talks to a proxy that handles rate limiting and key rotation, a cache that absorbs repeat prompts, and sometimes a monitoring sidecar. Wiring these together with raw docker run commands becomes unmanageable after the second service: you forget the network flag, mismatch a hostname, or start the app before the proxy is ready and watch the first requests fail with 502s. Docker Compose fixes this by declaring the entire stack (services, networks, volumes, health checks, startup order) in a single compose.yaml. By the end of this lesson you'll be able to write a Compose file for a Gemini-backed LLM app talking to a local proxy and a Redis cache, bring the stack up with a single command, and verify each service is healthy before sending traffic.

Key Terminology

service: a named container definition inside compose.yaml; each service becomes one or more containers sharing the same image, env vars, and network attachments, and is reachable by its service name on the Compose network.
compose network: the user-defined bridge network Compose creates so services can reach each other by service name as a DNS hostname (e.g. the app calls http://api-proxy:9090), with no IPs or --link flags needed.
health check: a command Docker runs inside a container at intervals; the result (healthy / unhealthy) drives depends_on: condition: service_healthy so dependents only start once their dependencies are ready.
named volume: a Docker-managed storage object referenced by name (e.g. redis-data) that survives docker compose down; only down --volumes deletes it, so cached LLM responses persist across restarts.
local proxy service: a sidecar container (here Nginx) sitting between the app and the external model API, giving you a stable internal hostname, retry/rate-limit policy, and a swap-point for mocking the model in tests.

Concepts

Why a multi-service topology for LLM apps

The Gemini API has per-key rate limits, charges per token, and occasionally returns transient errors. Putting a proxy between your app and the external API gives you one place to retry, throttle, and rotate keys — and a clean swap-point to switch the backend from real Gemini to a mock during testing, without touching application code. The platform proxy used in this course's labs is exactly this pattern. A Redis cache absorbs repeat prompts so identical requests don't cost a token round-trip. The concrete wiring of these services lives in the Code Walkthrough.
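
To make the "cache absorbs repeat prompts" idea concrete, here is a minimal cache-aside sketch. It is an illustration under stated assumptions, not the lesson's application code: the function names (cache_key, generate), the SHA-256 key scheme, and the dict standing in for Redis are all hypothetical. Only the gemini: key prefix comes from the lesson itself.

```python
import hashlib
import json


def cache_key(prompt: str, prefix: str = "gemini:") -> str:
    """Derive a deterministic key so identical prompts map to the same entry."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return f"{prefix}{digest}"


def generate(prompt, cache, call_backend):
    """Cache-aside lookup: return the cached response, or call the backend once."""
    key = cache_key(prompt)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    response = call_backend(prompt)
    # A real Redis client would typically store this with an expiry (TTL).
    cache[key] = json.dumps(response)
    return response


if __name__ == "__main__":
    calls = []

    def fake_backend(prompt):
        # Hypothetical stand-in for the request that goes through the proxy.
        calls.append(prompt)
        return {"text": f"echo: {prompt}"}

    cache = {}  # dict used as a stand-in for Redis
    generate("What is a container image?", cache, fake_backend)
    generate("What is a container image?", cache, fake_backend)
    print(len(calls))  # prints 1: the second request was served from the cache
```

Because the backend is passed in as a callable, the same function works against the real proxy or a mock, which is the swap-point described above.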

All three services live on the same Compose-managed network (llm-net) and address each other by service name. No hardcoded IPs, no localhost. When you migrate this stack to Kubernetes in later chapters, those service names map directly to Kubernetes Service objects, so the topology survives the transition almost unchanged.
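
On the application side, addressing by service name usually means the code never contains a host or IP at all and reads its endpoints from the environment. The sketch below assumes the two variables that compose.yaml sets for the app (GEMINI_API_URL and REDIS_URL); the load_endpoints helper is hypothetical.

```python
import os
from urllib.parse import urlparse


def load_endpoints(env=os.environ):
    """Read service endpoints from the environment instead of hardcoding them."""
    proxy = urlparse(env["GEMINI_API_URL"])
    redis = urlparse(env["REDIS_URL"])
    return {
        "proxy_host": proxy.hostname,  # a service name, resolved by Docker's DNS
        "proxy_port": proxy.port,
        "redis_host": redis.hostname,
        "redis_port": redis.port,
        "redis_db": int(redis.path.lstrip("/") or 0),
    }


if __name__ == "__main__":
    sample = {
        "GEMINI_API_URL": "http://api-proxy:9090",
        "REDIS_URL": "redis://redis:6379/0",
    }
    print(load_endpoints(sample)["proxy_host"])  # prints api-proxy
```

Moving the stack to another environment then only changes the values of those variables, not the code.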

Declarative orchestration: health checks and startup order

The two Compose primitives that turn a pile of containers into a reliable stack are healthcheck and depends_on: condition: service_healthy. Without a health condition, Compose waits at most for a dependency's container to start, not for the process inside it to be ready: the app tries to call the proxy before Nginx has finished parsing its config, and the first wave of requests fails with connection refused. With service_healthy, Compose blocks the app's start until the proxy's wget --spider check against /healthz and Redis's redis-cli ping both pass. The payoff is a stack that comes up clean every time, and a single command (docker compose up) you can safely put in onboarding docs.
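
The interplay of interval, start_period, and retries is easier to see in a small simulation. This is a simplified model of how a container's health status evolves, written for illustration only. It assumes check number i runs (i + 1) * interval seconds after start and ignores how long each check takes. The function names are hypothetical.

```python
def health_status(check_results, interval, start_period, retries):
    """Return the status after a sequence of check outcomes (True = exit code 0).

    Simplified model: failures inside start_period are not counted while the
    container is still starting, and `retries` consecutive counted failures
    flip the container to unhealthy.
    """
    status = "starting"
    failures = 0
    for i, passed in enumerate(check_results):
        elapsed = (i + 1) * interval
        if passed:
            status = "healthy"
            failures = 0
        elif elapsed <= start_period and status == "starting":
            continue  # grace period: this failure is ignored
        else:
            failures += 1
            if failures >= retries:
                status = "unhealthy"
    return status


def dependent_may_start(statuses):
    """service_healthy gating: every dependency must report healthy."""
    return all(status == "healthy" for status in statuses.values())


if __name__ == "__main__":
    proxy = health_status([False, True], interval=10, start_period=5, retries=3)
    redis = health_status([True], interval=10, start_period=0, retries=3)
    print(proxy, redis)  # prints: healthy healthy
    print(dependent_may_start({"api-proxy": proxy, "redis": redis}))  # prints: True
```

In this model a single failed probe does not mark a service unhealthy; it takes retries failures in a row, which is why a slow first start usually recovers on its own.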

Code Walkthrough

The two snippets below turn the architecture above into a runnable stack: a compose.yaml declaring the three services with their health checks and startup conditions, and the lifecycle commands you use to bring the stack up, inspect it, and tear it down.

```yaml
name: gemini-llm-stack

services:
  api-proxy:
    image: nginx:1.25-alpine
    ports:
      - "9090:9090"
    volumes:
      - ./proxy/nginx.conf:/etc/nginx/conf.d/default.conf:ro
    healthcheck:
      # 127.0.0.1 rather than localhost: inside Alpine, localhost can resolve
      # to ::1, which fails if nginx.conf only listens on IPv4.
      test: ["CMD", "wget", "--spider", "-q", "http://127.0.0.1:9090/healthz"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 5s
    networks: [llm-net]

  redis:
    image: redis:7-alpine
    # Cap cache memory and evict least-recently-used keys when full
    command: redis-server --maxmemory 128mb --maxmemory-policy allkeys-lru
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
    networks: [llm-net]

  gemini-app:
    build:
      context: .
      dockerfile: Dockerfile
    image: gemini-app:local
    ports:
      - "8000:8000"
    env_file:
      - .env.development
    environment:
      # Service names double as hostnames on llm-net
      - GEMINI_API_URL=http://api-proxy:9090
      - REDIS_URL=redis://redis:6379/0
    depends_on:
      # Start the app only after both dependencies report healthy
      api-proxy:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "python", "-c",
        "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    networks: [llm-net]

networks:
  llm-net:
    driver: bridge

volumes:
  redis-data:
```

The name: field pins the project name, so containers, networks, and volumes get the same gemini-llm-stack prefix no matter which directory the repository is checked out into (by default Compose derives the project name from the directory name). The api-proxy service bind-mounts nginx.conf read-only (:ro); its health check uses wget --spider because BusyBox wget is available in Alpine-based images, whereas curl is not part of the Alpine base image. The redis service's command: override caps memory at 128 MB with an LRU eviction policy, a good fit for caching repeat LLM prompts. The gemini-app service is the orchestration payoff: depends_on with condition: service_healthy blocks startup until both dependencies pass their checks, and the environment block addresses the proxy and Redis by service name. Docker's embedded DNS resolves each name to the right container IP on the llm-net bridge.
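
The gemini-app health check assumes the app answers GET /health on port 8000 with a success status. The lesson does not show the application code, so the following is only a minimal standard-library sketch of such an endpoint. The HealthHandler and make_server names and the {"status": "ok"} body are hypothetical, and a real app would more likely expose this route through its web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with 200 so the container health check can pass."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep periodic probe requests out of the application logs


def make_server(host="0.0.0.0", port=8000):
    """Build the server; call serve_forever() on the result to start it."""
    return HTTPServer((host, port), HealthHandler)
```

Calling make_server().serve_forever() starts it on port 8000. The probe in compose.yaml only looks at the exit code of the python -c command: urlopen raises on a 4xx/5xx response or a refused connection, which makes the command exit non-zero and counts as a failed check.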

```bash
# Build images and start the stack in the background
docker compose up --build --detach

# Verify all three services are healthy
docker compose ps

# Stream logs from all services (Ctrl+C stops following; the stack keeps running)
docker compose logs --follow --tail 50

# Exercise the full request path: client -> app -> proxy -> Gemini
curl -s -X POST http://localhost:8000/generate \
    -H "Content-Type: application/json" \
    -d '{"prompt": "What is a container image?"}' \
    | python3 -m json.tool

# Confirm Redis cached the response
docker compose exec redis redis-cli KEYS "gemini:*"

# Clean shutdown (removes containers and the network, preserves named volumes)
docker compose down

# Full cleanup: also delete named volumes and any orphaned containers
docker compose down --volumes --remove-orphans
```

docker compose ps is the first sanity check — every service should show healthy in the STATUS column. If gemini-app shows unhealthy, the application logs (docker compose logs gemini-app) almost always point at a missing GEMINI_API_KEY in .env.development. The curl exercises the whole chain in one shot: a successful JSON response proves networking, DNS, env vars, and proxy auth all line up. The follow-up redis-cli KEYS "gemini:*" confirms the app actually wrote to the cache rather than silently bypassing it.

You'll know it works when docker compose ps shows all three services as healthy, the curl to /generate returns a JSON body, and redis-cli KEYS "gemini:*" lists at least one cached entry.
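
If you want to script the first of those checks, docker compose ps can emit JSON via --format json. The sketch below parses that output and reports which expected services are not healthy. It assumes each entry carries Service and Health fields and that the output is either a JSON array or one object per line, depending on the Compose version, so verify both against your installed version. The helper names are hypothetical.

```python
import json


def parse_ps_json(output: str):
    """Parse `docker compose ps --format json` output into a list of dicts.

    Accepts either a single JSON array or one JSON object per line.
    """
    text = output.strip()
    if not text:
        return []
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def unhealthy_services(containers, expected):
    """Return expected service names that are missing or not reporting healthy."""
    health = {c.get("Service"): c.get("Health") for c in containers}
    return sorted(name for name in expected if health.get(name) != "healthy")


if __name__ == "__main__":
    # Sample output written by hand for illustration, not captured from a real run.
    sample = (
        '{"Service": "api-proxy", "Health": "healthy"}\n'
        '{"Service": "redis", "Health": "healthy"}\n'
        '{"Service": "gemini-app", "Health": "starting"}\n'
    )
    expected = ["api-proxy", "redis", "gemini-app"]
    print(unhealthy_services(parse_ps_json(sample), expected))  # prints ['gemini-app']
```

In a real script the input would come from running the command, for example with subprocess.run(["docker", "compose", "ps", "--format", "json"], capture_output=True, text=True).stdout.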

Do's and Don'ts

Do's

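✓Do declare a healthcheck on every service you gate on with depends_on. condition: service_healthy is only as good as the check behind it: if the dependency has no healthcheck (in compose.yaml or baked into its image), Compose has nothing to wait on and docker compose up fails instead of starting the app, while the short form depends_on: [api-proxy] only waits for the container to start, so your app races the proxy.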
✓Do address services by name, not IP or localhost — http://api-proxy:9090 works inside the Compose network and migrates cleanly to a Kubernetes Service; hardcoded IPs and localhost do neither.
✓Do keep secrets in env_file: and topology in environment: — bulk config (API keys) lives in .env.development, while Compose-specific values like GEMINI_API_URL=http://api-proxy:9090 live in compose.yaml so the topology is self-documenting.

Don'ts

✗Don't expose internal services to the host unless you must — publishing Redis on 6379:6379 is fine for local debugging, but in any shared environment remove the ports: so only the app can reach the cache.
✗Don't run docker compose down --volumes casually — it deletes the redis-data volume and every cached response with it; reach for plain down unless you actually want a clean slate.
✗Don't skip --build after editing the app's source or Dockerfile — Compose reuses the tagged local image (gemini-app:local) and will restart the stack with stale code unless you force a rebuild.
