Write a Python app that calls the Gemini API and returns structured responses
Build a Python FastAPI service that sends prompts to the Gemini API and returns the response as structured JSON. This is the app you will containerize throughout the chapter.
Introduction
When you wire a Python service to the Gemini API for the first time, the temptation is to drop the API key into a constant, paste a request into your route handler, and ship it. That works once — until you need to deploy it. Hardcoded secrets break the moment you containerize, ad-hoc error handling makes every Gemini hiccup look like a 500, and a per-request client init burns latency on every call. Teams that skip this discipline end up rebuilding the image to rotate a key and chasing intermittent failures that have no consistent error shape.
By the end of this lesson you'll be able to write a Python application that loads its Gemini configuration from environment variables, initializes the client exactly once via FastAPI's lifespan hook, sends prompts to the Gemini API, and returns structured JSON responses with predictable error semantics — ready to drop into a container in the next lesson.
Key Terminology
- Gemini API: Google's HTTPS endpoint for the Gemini family of large language models. This lesson's application sends a prompt to it and returns the generated text; knowing its request/response shape is what makes the wrapper code below sensible.
- google-generativeai: The official Python SDK that wraps the Gemini HTTP API, exposing GenerativeModel.generate_content for prompt-in / text-out calls. The GeminiClient class below is a thin layer over it.
- FastAPI: The async Python web framework used here to expose a /generate endpoint so external callers can submit prompts over HTTP without speaking the Gemini SDK directly.
- Pydantic BaseSettings: The configuration loader (shipped in the pydantic-settings package) that reads the Gemini API key, model name, and timeout from environment variables and fails fast at startup if any required value is missing.
- lifespan context: A FastAPI hook that runs setup and teardown around the app's lifetime. Used here to construct the Gemini client once at startup rather than on every request.
Concepts
Layered Architecture
The service splits into three layers, each with a single responsibility. FastAPI handles HTTP and Pydantic validation; a GeminiClient class owns API communication and error translation; a Settings module loads configuration from environment variables. This separation keeps the secret (the API key) out of the request-handling logic and lets the call shape (timeout, max tokens) be tuned through configuration without touching code.
If validation rejects the input (a missing prompt, an empty string, or a prompt over the 10,000-character cap), FastAPI returns a 422 before any Gemini call is made, which keeps quota usage predictable.
Configuration via Environment Variables
BaseSettings from pydantic_settings declares each config field with a type and a default, and the numeric fields add range validators (ge/le). When Settings is instantiated at startup, it reads values from environment variables, falling back to a .env file for anything the environment does not set (convenient in development). Required fields use ... as their default, so a missing GEMINI_API_KEY causes startup to fail immediately rather than producing an opaque runtime error on the first request. (See Code Walkthrough.)
Lifespan-Managed Client Initialization
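FastAPI's lifespan async context manager runs once around the app's life: the code before yield runs at startup, the code after it at shutdown. The example uses it to construct one GeminiClient and stash it on app.state. Every request reuses that client (and the SDK's underlying connection pool) instead of re-initializing on every call. It also means configuration errors surface at startup rather than during user traffic: a missing GEMINI_API_KEY stops the app from booting. An invalid or revoked key, however, still only fails on the first Gemini call, because constructing the client does not contact the API.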
Error Translation at the Boundary
The Gemini SDK can throw on network timeouts, auth failures, and rate limits. The generate method wraps the call in try/except and re-raises as HTTPException(502), mapping every SDK failure mode to a single predictable HTTP response. A separate guard handles the case where Gemini returns a response with no parts (typically safety-filtered output) and produces the same 502 with a clear message instead of the SDK's opaque ValueError.
Code Walkthrough
This walkthrough demonstrates the four concepts together: environment-driven configuration via BaseSettings, lifespan-bootstrapped client initialization, request validation, and error translation at the boundary. The first snippet declares Settings; the second wires it into a FastAPI app with a GeminiClient.
```python
# settings.py
from functools import lru_cache

from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Environment variables take precedence; .env is a development fallback.
    model_config = SettingsConfigDict(env_file=".env", case_sensitive=False)

    gemini_api_key: str = Field(..., description="API key for Gemini")  # required
    gemini_model: str = Field(default="gemini-1.5-flash")
    request_timeout: int = Field(default=30, ge=5, le=120)  # seconds
    max_output_tokens: int = Field(default=1024, ge=1, le=8192)
    temperature: float = Field(default=0.7, ge=0.0, le=2.0)


@lru_cache
def get_settings() -> Settings:
    # One shared Settings instance per process.
    return Settings()
```
gemini_api_key uses ... as its default, which makes it required — the app fails fast at startup if GEMINI_API_KEY is unset. The ge/le validators block invalid numeric values before they reach Gemini. @lru_cache ensures one shared Settings instance across the process.
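The @lru_cache decorator is what turns get_settings into a process-wide singleton. This stdlib-only sketch swaps the Pydantic class for a hypothetical frozen dataclass (AppSettings) so the caching behaviour is visible without dependencies. One consequence worth knowing: environment changes after the first call are ignored until get_settings.cache_clear() is called, which is mainly useful in tests.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class AppSettings:
    """Hypothetical stand-in for the Pydantic Settings class."""

    gemini_model: str
    request_timeout: int


@lru_cache
def get_settings() -> AppSettings:
    # Reads the environment on the first call only; later calls return the cached object.
    return AppSettings(
        gemini_model=os.environ.get("GEMINI_MODEL", "gemini-1.5-flash"),
        request_timeout=int(os.environ.get("REQUEST_TIMEOUT", "30")),
    )


if __name__ == "__main__":
    first = get_settings()
    os.environ["REQUEST_TIMEOUT"] = "90"
    print(get_settings() is first)  # still the cached instance
    get_settings.cache_clear()      # e.g. between test cases
    print(get_settings().request_timeout)
```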
```python
# main.py
from contextlib import asynccontextmanager

import google.generativeai as genai
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field

from settings import Settings, get_settings


class PromptRequest(BaseModel):
    # Validated at the edge: FastAPI returns 422 before any Gemini call.
    prompt: str = Field(..., min_length=1, max_length=10000)
    temperature: float | None = Field(default=None, ge=0.0, le=2.0)


class GeminiClient:
    def __init__(self, settings: Settings):
        genai.configure(api_key=settings.gemini_api_key)
        self._model = genai.GenerativeModel(settings.gemini_model)
        self._settings = settings

    def generate(self, prompt: str, temperature: float | None = None) -> dict:
        config = genai.types.GenerationConfig(
            max_output_tokens=self._settings.max_output_tokens,
            # Compare against None so an explicit temperature of 0.0 is respected.
            temperature=(
                temperature if temperature is not None else self._settings.temperature
            ),
        )
        try:
            response = self._model.generate_content(
                prompt,
                generation_config=config,
                request_options={"timeout": self._settings.request_timeout},
            )
            # .parts can itself raise ValueError when the prompt was blocked
            # (no candidates at all), so read it inside the try.
            has_parts = bool(response.parts)
        except Exception as exc:  # timeouts, auth failures, rate limits, ...
            raise HTTPException(status_code=502, detail=str(exc)) from exc

        # A safety-filtered candidate has no parts; .text would raise ValueError.
        if not has_parts:
            raise HTTPException(status_code=502, detail="Empty response from Gemini")

        return {
            "model": self._settings.gemini_model,
            "prompt": prompt,
            "generated_text": response.text,
        }


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Build one client at startup; a missing GEMINI_API_KEY fails here.
    app.state.client = GeminiClient(get_settings())
    yield


app = FastAPI(title="Gemini LLM Service", lifespan=lifespan)


@app.post("/generate")
def generate_text(request: PromptRequest):
    # Plain def: the SDK call blocks, so FastAPI runs this in its threadpool
    # instead of stalling the event loop.
    return app.state.client.generate(
        prompt=request.prompt,
        temperature=request.temperature,
    )


@app.get("/health")
async def health_check():
    # Never touches Gemini, so probes are cheap.
    return {"status": "healthy"}
```
PromptRequest enforces a 1-to-10,000-character prompt at the edge, so an empty or oversized request never reaches Gemini. The lifespan context builds the GeminiClient once at startup and stores it on app.state. Inside generate, the try/except translates any SDK exception into HTTP 502, and the response.parts check catches the safety-filtered case before .text would raise. The /health endpoint returns without calling Gemini, giving container probes a cheap target.
You'll know it works when, after exporting GEMINI_API_KEY and running uvicorn main:app --reload, a POST to http://localhost:8000/generate with Content-Type: application/json and body {"prompt":"Say hello in one short sentence."} returns a 200 with a non-empty generated_text field.
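If you would rather script that check than type a curl command, here is a minimal client sketch using only the standard library. It assumes the service is running locally on uvicorn's default port 8000 as described above; build_generate_request and post_prompt are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: uvicorn's default host and port


def build_generate_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for the /generate endpoint."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_prompt(prompt: str, timeout: float = 60.0) -> dict:
    """Send the prompt and return the decoded JSON body."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    result = post_prompt("Say hello in one short sentence.")
    assert result["generated_text"], "expected a non-empty generated_text"
    print(result["generated_text"])
```

urlopen raises urllib.error.HTTPError for the 422 and 502 responses described earlier, so a validation or Gemini failure shows up as an exception whose code attribute carries the status.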
Do's and Don'ts
Do's
- ✓ Do load the API key from an environment variable. It keeps the secret out of source control and lets you swap keys per environment without rebuilding the image.
- ✓ Do initialize the Gemini client once in the lifespan hook. It reuses the SDK's connection pool and makes a missing API key fail at startup instead of mid-traffic.
- ✓ Do translate SDK exceptions into HTTP 502. Callers get a single, predictable failure mode whether the underlying issue was a timeout, auth failure, or rate limit.
Don'ts
- ✗ Don't hardcode the API key or model name. Every environment change becomes an image rebuild, and a leaked image leaks production credentials.
- ✗ Don't call response.text without first checking response.parts. A safety-filtered Gemini response is a valid object with no parts, and .text raises an opaque ValueError.
- ✗ Don't reuse the /generate endpoint for liveness probes. Every probe burns Gemini quota; expose a separate /health route that returns without calling the API.
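For container probes, a tiny script that hits /health, never /generate, is usually enough. This sketch uses only the standard library and assumes the service listens on localhost:8000; is_healthy is a hypothetical helper. Its exit code follows the convention Docker HEALTHCHECK and Kubernetes exec probes use (0 healthy, nonzero unhealthy).

```python
import http.client
import json
import sys


def is_healthy(host: str = "localhost", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True only if GET /health answers 200 with {"status": "healthy"}."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/health")
        resp = conn.getresponse()
        if resp.status != 200:
            return False
        body = json.loads(resp.read())
        return isinstance(body, dict) and body.get("status") == "healthy"
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, malformed HTTP, or a non-JSON body.
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    sys.exit(0 if is_healthy() else 1)
```

Because it talks to the socket through http.client rather than urllib, it ignores any HTTP_PROXY variables that happen to be set in the container.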