Use Docker Compose to run an LLM app with a local proxy service
Introduction
When you ship an LLM app to production, the model client is never alone — it talks to a proxy that handles rate limiting and key rotation, a cache that absorbs repeat prompts, and sometimes a monitoring sidecar. Wiring these together with raw docker run commands becomes unmanageable after the second service: you forget the network flag, mismatch a hostname, or start the app before the proxy is ready and watch every first request 502. Docker Compose fixes this by declaring the entire stack — services, networks, volumes, health checks, startup order — in a single compose.yaml. By the end of this lesson you'll be able to write a Compose file for a Gemini-backed LLM app talking to a local proxy and a Redis cache, bring the stack up with a single command, and verify each service is healthy before sending traffic.
Key Terminology
- service: a named container definition inside compose.yaml; each service becomes one or more containers sharing the same image, env vars, and network attachments, and is reachable by its service name on the Compose network.
- compose network: the user-defined bridge network Compose creates so services can reach each other by service name as a DNS hostname (e.g. the app calls http://api-proxy:9090), with no IPs or --link flags needed.
- health check: a command Compose runs inside a container at intervals; the result (healthy/unhealthy) drives depends_on: condition: service_healthy so dependents only start once their supports are ready.
- named volume: a Docker-managed storage object referenced by name (e.g. redis-data) that survives docker compose down; only down --volumes deletes it, so cached LLM responses persist across restarts.
- local proxy service: a sidecar container (here Nginx) sitting between the app and the external model API, giving you a stable internal hostname, retry/rate-limit policy, and a swap-point for mocking the model in tests.
Concepts
Why a multi-service topology for LLM apps
The Gemini API has per-key rate limits, charges per token, and occasionally returns transient errors. Putting a proxy between your app and the external API gives you one place to retry, throttle, and rotate keys — and a clean swap-point to switch the backend from real Gemini to a mock during testing, without touching application code. The platform proxy used in this course's labs is exactly this pattern. A Redis cache absorbs repeat prompts so identical requests don't cost a token round-trip. The concrete wiring of these services lives in the Code Walkthrough.
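The caching idea above can be sketched as a cache-aside lookup. This is a minimal illustration, not the course app's actual code: generate_cached, cache_key, and fake_model are hypothetical names, the "gemini:" key prefix is an assumed convention, and a plain dict stands in for Redis so the sketch runs without any services.

```python
import hashlib
from typing import Callable, MutableMapping


def cache_key(prompt: str) -> str:
    """Build a deterministic key for a prompt (the "gemini:" prefix is an assumption)."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return f"gemini:{digest}"


def generate_cached(
    prompt: str,
    cache: MutableMapping[str, str],
    call_model: Callable[[str], str],
) -> str:
    """Return the cached response for a prompt; call the model only on a miss."""
    key = cache_key(prompt)
    if key in cache:
        return cache[key]
    response = call_model(prompt)
    cache[key] = response
    return response


# Demo with an in-memory dict and a fake model instead of Redis and the proxy.
calls: list[str] = []


def fake_model(prompt: str) -> str:
    calls.append(prompt)
    return f"echo: {prompt}"


store: dict[str, str] = {}
first = generate_cached("What is a container image?", store, fake_model)
second = generate_cached("What is a container image?", store, fake_model)
print(first == second, len(calls))  # True 1: the second call was served from the cache
```

In the real stack the mapping would be a Redis client and call_model would send the request through the proxy; the control flow stays the same.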
All three services live on the same Compose-managed network (llm-net) and address each other by service name. No hardcoded IPs, no localhost. When you migrate this stack to Kubernetes in later chapters, those service names map directly to Kubernetes Service objects, so the topology survives the transition almost unchanged.
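One way the app might consume those service names is to read them from environment variables rather than embedding hosts in code. The sketch below assumes the variable names GEMINI_API_URL and REDIS_URL from the Compose file; load_endpoints is a hypothetical helper, and the defaults are only illustrative.

```python
import os
from typing import Mapping
from urllib.parse import urlsplit


def load_endpoints(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve proxy and cache endpoints from env vars that hold service-name URLs."""
    api = urlsplit(env.get("GEMINI_API_URL", "http://api-proxy:9090"))
    cache = urlsplit(env.get("REDIS_URL", "redis://redis:6379/0"))
    return {
        "proxy_host": api.hostname,
        "proxy_port": api.port,
        "redis_host": cache.hostname,
        "redis_port": cache.port,
        # The path component of a redis:// URL carries the database index.
        "redis_db": int(cache.path.lstrip("/") or 0),
    }


endpoints = load_endpoints(
    {"GEMINI_API_URL": "http://api-proxy:9090", "REDIS_URL": "redis://redis:6379/0"}
)
print(endpoints["proxy_host"], endpoints["redis_host"])  # api-proxy redis
```

Because the hostnames arrive through configuration, pointing the same code at a Kubernetes Service later is a matter of changing the variable values, not the application.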
Declarative orchestration: health checks and startup order
The two Compose primitives that turn a pile of containers into a reliable stack are healthcheck and depends_on with condition: service_healthy. Without depends_on, Compose starts every service at once; with plain depends_on, it waits only for the dependency's container to start, not for the process inside it to be ready. Either way, the app can call the proxy before Nginx has finished loading its config, and the first wave of requests fails with connection refused. With service_healthy, Compose holds back the app's start until the proxy's wget --spider check against /healthz and Redis's redis-cli ping both pass. The payoff is a stack that comes up in the right order every time, and a single command (docker compose up) you can safely put in onboarding docs.
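To make the gating rule concrete, here is a simplified model of how probe results turn into a health status and how that status gates a dependent. It is a sketch of the documented behaviour (a success marks the container healthy, and retries consecutive failures mark it unhealthy), not Docker's implementation; it ignores interval, timeout, and start_period, and health_status and can_start are hypothetical names.

```python
from typing import Iterable, Mapping


def health_status(probe_results: Iterable[bool], retries: int = 3) -> str:
    """Fold a sequence of probe outcomes into starting / healthy / unhealthy."""
    status = "starting"
    consecutive_failures = 0
    for ok in probe_results:
        if ok:
            status = "healthy"
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= retries:
                status = "unhealthy"
    return status


def can_start(dependency_statuses: Mapping[str, str]) -> bool:
    """A dependent gated on service_healthy starts only when every dependency is healthy."""
    return all(status == "healthy" for status in dependency_statuses.values())


statuses = {
    "api-proxy": health_status([False, True]),  # slow start, then ready
    "redis": health_status([True]),
}
print(statuses, can_start(statuses))
```

Swapping the api-proxy results for three failures in a row yields unhealthy, and can_start returns False, which mirrors the app never being started.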
Code Walkthrough
The two snippets below turn the architecture above into a runnable stack: a compose.yaml declaring the three services with their health checks and startup conditions, and the lifecycle commands you use to bring the stack up, inspect it, and tear it down.
```yaml
name: gemini-llm-stack

services:
  api-proxy:
    image: nginx:1.25-alpine
    ports:
      - "9090:9090"
    volumes:
      - ./proxy/nginx.conf:/etc/nginx/conf.d/default.conf:ro
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:9090/healthz"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 5s
    networks: [llm-net]

  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 128mb --maxmemory-policy allkeys-lru
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
    networks: [llm-net]

  gemini-app:
    build:
      context: .
      dockerfile: Dockerfile
    image: gemini-app:local
    ports:
      - "8000:8000"
    env_file:
      - .env.development
    environment:
      - GEMINI_API_URL=http://api-proxy:9090
      - REDIS_URL=redis://redis:6379/0
    depends_on:
      api-proxy:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "python", "-c",
             "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    networks: [llm-net]

networks:
  llm-net:
    driver: bridge

volumes:
  redis-data:
```
The name: field gives the project a fixed prefix for its containers, network, and volume, so resource names stay the same no matter which directory the stack is launched from. The api-proxy service bind-mounts nginx.conf read-only (:ro); its health check uses wget --spider, which BusyBox provides in Alpine-based images, so the check needs no extra packages. The redis service's command: override caps memory at 128 MB with an LRU eviction policy, a good fit for caching repeat LLM prompts. The gemini-app service is the orchestration payoff: depends_on with condition: service_healthy blocks startup until both supports pass their checks, and the environment block addresses the proxy and Redis by service name, which Compose's internal DNS resolves to the right container IP on the llm-net bridge.
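The gemini-app health check only works if the app answers on /health. The lesson does not show the app's code, so the following is a standard-library sketch of what such an endpoint and its probe could look like; HealthHandler and probe are hypothetical names, and a real app would more likely expose the route through its web framework.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answer GET /health with 200 and a small JSON body; everything else is 404."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def probe(url: str) -> bool:
    """Mirror the Compose check: urlopen raises on connection errors and 4xx/5xx."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False


# Demo on an ephemeral local port instead of the app's port 8000.
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()
healthy = probe(f"http://127.0.0.1:{port}/health")
missing = probe(f"http://127.0.0.1:{port}/nope")
server.shutdown()
server.server_close()
print(healthy, missing)  # True False
```

In the Compose file the same urlopen call runs through python -c, where an uncaught exception produces a non-zero exit code and the check counts as a failure.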
```bash
# Build images and start the stack in the background
docker compose up --build --detach

# Verify all three services are healthy
docker compose ps

# Stream logs from all services
docker compose logs --follow --tail 50

# Exercise the full request path: client -> app -> proxy -> Gemini
curl -s -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is a container image?"}' \
  | python -m json.tool

# Confirm Redis cached the response
docker compose exec redis redis-cli KEYS "gemini:*"

# Clean shutdown (preserve volumes)
docker compose down

# Full cleanup including volumes and networks
docker compose down --volumes --remove-orphans
```
docker compose ps is the first sanity check: every service should show (healthy) in the STATUS column. If gemini-app shows unhealthy, start with the application logs (docker compose logs gemini-app); a common cause is a missing GEMINI_API_KEY in .env.development. The curl exercises the whole chain in one shot: a successful JSON response proves networking, DNS, env vars, and proxy auth all line up. The follow-up redis-cli KEYS "gemini:*" confirms the app actually wrote to the cache rather than silently bypassing it.
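A missing key is easier to diagnose when the app refuses to start and names the problem. The sketch below shows one way to fail fast at startup; it assumes the three variable names used in this lesson, and missing_settings and require_settings are hypothetical helpers rather than part of the course app.

```python
import os
from typing import Mapping, Sequence

# Variable names taken from this lesson; treating all three as required is an assumption.
REQUIRED_SETTINGS = ("GEMINI_API_KEY", "GEMINI_API_URL", "REDIS_URL")


def missing_settings(
    env: Mapping[str, str] = os.environ,
    required: Sequence[str] = REQUIRED_SETTINGS,
) -> list[str]:
    """Return the names of required settings that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]


def require_settings(env: Mapping[str, str] = os.environ) -> None:
    """Raise a descriptive error at startup instead of failing on the first request."""
    missing = missing_settings(env)
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))


# Demo with an explicit mapping so the result does not depend on the real environment.
example_env = {
    "GEMINI_API_URL": "http://api-proxy:9090",
    "REDIS_URL": "redis://redis:6379/0",
}
print(missing_settings(example_env))  # ['GEMINI_API_KEY']
```

Calling require_settings() before the server starts listening makes the cause show up as the last line of docker compose logs gemini-app.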
You'll know it works when docker compose ps shows all three services as healthy, the curl to /generate returns a JSON body, and redis-cli KEYS "gemini:*" lists at least one cached entry.
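The health part of that checklist can be scripted. The sketch below parses the output of docker compose ps --format json and reports which expected services are not healthy. It assumes each entry carries Service and Health fields and that the output is either a JSON array or one JSON object per line, depending on the Compose version; verify both against your installation. unhealthy_services and read_ps are hypothetical helpers, and the sample text is made up for the demo.

```python
import json
import subprocess
from typing import Iterable


def parse_ps_json(text: str) -> list[dict]:
    """Accept either a JSON array or newline-delimited JSON objects."""
    text = text.strip()
    if not text:
        return []
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def unhealthy_services(text: str, expected: Iterable[str]) -> list[str]:
    """List expected services that are absent or whose Health is not 'healthy'."""
    health = {row.get("Service"): row.get("Health") for row in parse_ps_json(text)}
    return sorted(name for name in expected if health.get(name) != "healthy")


def read_ps() -> str:
    """Run the real command; not called in the demo because it needs a running stack."""
    result = subprocess.run(
        ["docker", "compose", "ps", "--format", "json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Made-up sample in the assumed one-object-per-line shape.
sample = "\n".join([
    '{"Service": "api-proxy", "State": "running", "Health": "healthy"}',
    '{"Service": "redis", "State": "running", "Health": "healthy"}',
    '{"Service": "gemini-app", "State": "running", "Health": "starting"}',
])
print(unhealthy_services(sample, ["api-proxy", "redis", "gemini-app"]))  # ['gemini-app']
```

An empty list means every expected service reported healthy; anything else names the services to inspect with docker compose logs.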
Do's and Don'ts
Do's
- ✓ Do declare a healthcheck on every service you depends_on. condition: service_healthy is only as good as the check it gates on; a dependency with no healthcheck never reports healthy, so the condition cannot be met and the app does not start.
- ✓ Do address services by name, not IP or localhost. http://api-proxy:9090 works inside the Compose network and migrates cleanly to a Kubernetes Service; hardcoded IPs and localhost do neither.
- ✓ Do keep secrets in env_file: and topology in environment:. Secrets and bulk config (API keys) live in .env.development, while Compose-specific values like GEMINI_API_URL=http://api-proxy:9090 live in compose.yaml so the topology is self-documenting.
Don'ts
- ✗ Don't expose internal services to the host unless you must. Publishing Redis on 6379:6379 is fine for local debugging, but in any shared environment remove the ports: so only the app can reach the cache.
- ✗ Don't run docker compose down --volumes casually. It deletes the redis-data volume and every cached response with it; reach for plain down unless you actually want a clean slate.
- ✗ Don't skip --build after editing the app's source or Dockerfile. Compose reuses the tagged local image (gemini-app:local) and will restart the stack with stale code unless you force a rebuild.