Write a Python app that calls the Gemini API and returns structured responses

Build a Python service that sends prompts to the Gemini API and returns the response as structured JSON. This is the app you will containerize throughout the chapter.

Write a Python application that calls the Gemini API to generate text

Introduction

When you wire a Python service to the Gemini API for the first time, the temptation is to drop the API key into a constant, paste a request into your route handler, and ship it. That works until you need to deploy it. Hardcoded secrets break the moment you containerize, ad-hoc error handling makes every Gemini hiccup look like a 500, and a per-request client init burns latency on every call. Teams that skip this discipline end up rebuilding the image to rotate a key and chasing intermittent failures that have no consistent error shape.

By the end of this lesson you'll be able to write a Python application that loads its Gemini configuration from environment variables, initializes the client exactly once via FastAPI's lifespan hook, sends prompts to the Gemini API, and returns structured JSON responses with predictable error semantics — ready to drop into a container in the next lesson.

Key Terminology

Gemini API — Google's HTTPS endpoint for the Gemini family of large language models. This lesson's application sends a prompt to it and returns the generated text; knowing its request/response shape is what makes the wrapper code below sensible.
google-generativeai — The official Python SDK that wraps the Gemini HTTP API, exposing GenerativeModel.generate_content for prompt-in / text-out calls. The GeminiClient class below is a thin layer over it.
FastAPI — The async Python web framework used here to expose a /generate endpoint so external callers can submit prompts over HTTP without speaking the Gemini SDK directly.
Pydantic BaseSettings — The configuration loader that reads the Gemini API key, model name, and timeout from environment variables and fails fast at startup if any required value is missing.
lifespan context — A FastAPI hook that runs setup and teardown around the app's lifetime. Used here to construct the Gemini client once at startup rather than on every request.

Concepts

Layered Architecture

The service splits into three layers, each with a single concern. FastAPI handles HTTP and request validation through Pydantic models; a GeminiClient class owns API communication and error translation; a Settings module loads configuration from environment variables. This separation keeps the secret (the API key) out of the request-handling logic and lets the call shape (timeout, max tokens) be tuned without touching code.

If validation rejects the input — missing prompt, empty string, prompt over the cap — FastAPI returns a 422 before any Gemini call is made, which keeps quota usage predictable.
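The rejection rules can be exercised without starting a server. The sketch below assumes Pydantic v2 and reuses the prompt constraints from this lesson's request model; the helper name rejection_reason is hypothetical and exists only for illustration. FastAPI turns the same validation errors into a 422 response.

```python
from pydantic import BaseModel, Field, ValidationError


class PromptRequest(BaseModel):
    # Same constraints as the lesson's request model.
    prompt: str = Field(..., min_length=1, max_length=10000)


def rejection_reason(payload: dict):
    """Return the first Pydantic error type, or None if the payload is valid."""
    try:
        PromptRequest(**payload)
    except ValidationError as exc:
        return exc.errors()[0]["type"]
    return None
```

Calling rejection_reason({}) reports a missing field, an empty prompt reports a too-short string, and a prompt above the cap reports a too-long string. None of these cases would reach the Gemini call.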

Configuration via Environment Variables

BaseSettings from pydantic_settings declares each config field with a type and, where it applies, a default and numeric range validators. At startup it reads actual values from environment variables, falling back to a .env file in development. Required fields are declared with ... (Ellipsis) in place of a default, so a missing GEMINI_API_KEY causes startup to fail immediately rather than producing an opaque runtime error on the first request. (See Code Walkthrough.)

Lifespan-Managed Client Initialization

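FastAPI's lifespan async context manager runs once around the app's life. The example uses it to construct one GeminiClient and stash it on app.state. Every request reuses that client, and with it the SDK's underlying connection pool, instead of re-initializing on every call. Because Settings is loaded in the same hook, a missing API key stops the app at startup rather than during user traffic. Constructing the client does not itself call Gemini, so a key that is present but invalid still surfaces only on the first request.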
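The mechanics of a lifespan-style context manager can be shown with the standard library alone. This is a sketch, not FastAPI itself: StandInClient is a hypothetical placeholder for GeminiClient, and a SimpleNamespace plays the role of the app object.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


class StandInClient:
    """Hypothetical placeholder for GeminiClient; counts constructions."""
    built = 0

    def __init__(self):
        StandInClient.built += 1


@asynccontextmanager
async def lifespan(app):
    app.state.client = StandInClient()  # setup: runs once, before any request
    yield                               # the app serves requests here
    app.state.client = None             # teardown: runs once, at shutdown


async def simulate(n_requests: int):
    """Return (distinct clients seen by handlers, clients built)."""
    StandInClient.built = 0
    app = SimpleNamespace(state=SimpleNamespace())
    seen = set()
    async with lifespan(app):
        for _ in range(n_requests):
            seen.add(id(app.state.client))  # what each handler would read
    return len(seen), StandInClient.built
```

Running asyncio.run(simulate(5)) shows that five simulated requests all read the same client object and that only one client was ever constructed.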

Error Translation at the Boundary

The Gemini SDK can raise on network timeouts, auth failures, and rate limits. The generate method wraps the call in try/except and re-raises as HTTPException(502), mapping every SDK failure mode to a single predictable HTTP response. A separate guard handles the case where Gemini returns a response with no parts (typically safety-filtered output) and produces the same 502 with a clear message instead of the SDK's opaque ValueError.
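The translation pattern itself is small enough to sketch without any dependencies. In the example below, BoundaryError is a stand-in for fastapi.HTTPException and flaky_sdk_call is a hypothetical upstream call; neither belongs to the Gemini SDK. The point is only that different failure types leave the boundary with the same shape.

```python
class BoundaryError(Exception):
    """Stand-in for fastapi.HTTPException so the sketch has no dependencies."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def call_upstream(fn, *args, **kwargs):
    """Run an upstream call; re-raise any failure as one 502-shaped error."""
    try:
        return fn(*args, **kwargs)
    except Exception as exc:
        # `from exc` keeps the original error attached for logging.
        raise BoundaryError(502, f"{type(exc).__name__}: {exc}") from exc


def flaky_sdk_call(kind: str) -> str:
    """Hypothetical upstream call that fails in different ways."""
    if kind == "timeout":
        raise TimeoutError("deadline exceeded")
    if kind == "auth":
        raise PermissionError("API key not valid")
    return "ok"


def status_for(kind: str) -> int:
    """Status code a caller would observe for a given failure kind."""
    try:
        call_upstream(flaky_sdk_call, kind)
    except BoundaryError as err:
        return err.status_code
    return 200
```

Both the timeout and the auth failure come back as 502, while a successful call passes through untouched. A finer-grained mapping (for example, a distinct status for rate limits) would be a design choice beyond what this lesson describes.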

Code Walkthrough

This walkthrough demonstrates the four concepts together: environment-driven configuration via BaseSettings, lifespan-bootstrapped client initialization, request validation, and error translation at the boundary. The first snippet declares Settings; the second wires it into a FastAPI app with a GeminiClient.

```python
# settings.py
from functools import lru_cache

from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Environment variables take priority; .env is the development fallback.
    model_config = SettingsConfigDict(env_file=".env", case_sensitive=False)

    # Required: no default, so startup fails if GEMINI_API_KEY is unset.
    gemini_api_key: str = Field(..., description="API key for Gemini")
    gemini_model: str = Field(default="gemini-1.5-flash")
    request_timeout: int = Field(default=30, ge=5, le=120)  # seconds
    max_output_tokens: int = Field(default=1024, ge=1, le=8192)
    temperature: float = Field(default=0.7, ge=0.0, le=2.0)


@lru_cache
def get_settings() -> Settings:
    # Cached so every caller shares one Settings instance.
    return Settings()
```

gemini_api_key uses ... as its default, which makes it required — the app fails fast at startup if GEMINI_API_KEY is unset. The ge/le validators block invalid numeric values before they reach Gemini. @lru_cache ensures one shared Settings instance across the process.

```python
# main.py
from contextlib import asynccontextmanager

import google.generativeai as genai
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field

from settings import get_settings


class PromptRequest(BaseModel):
    prompt: str = Field(..., min_length=1, max_length=10000)
    temperature: float | None = Field(default=None, ge=0.0, le=2.0)


class GeminiClient:
    def __init__(self, settings):
        genai.configure(api_key=settings.gemini_api_key)
        self._model = genai.GenerativeModel(settings.gemini_model)
        self._settings = settings

    def generate(self, prompt: str, temperature: float | None = None) -> dict:
        # Compare against None: `temperature or default` would discard a valid 0.0.
        if temperature is None:
            temperature = self._settings.temperature
        config = genai.types.GenerationConfig(
            max_output_tokens=self._settings.max_output_tokens,
            temperature=temperature,
        )
        try:
            response = self._model.generate_content(
                prompt,
                generation_config=config,
                request_options={"timeout": self._settings.request_timeout},
            )
        except Exception as exc:
            # Any SDK failure (timeout, auth, rate limit) becomes one 502.
            raise HTTPException(status_code=502, detail=str(exc)) from exc

        # A blocked prompt can come back with no candidates, and a filtered
        # completion with no parts; check both before touching .text.
        if not response.candidates or not response.parts:
            raise HTTPException(status_code=502, detail="Empty response from Gemini")

        return {
            "model": self._settings.gemini_model,
            "prompt": prompt,
            "generated_text": response.text,
        }


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Runs once at startup: load settings and build the shared client.
    app.state.client = GeminiClient(get_settings())
    yield


app = FastAPI(title="Gemini LLM Service", lifespan=lifespan)


# Plain `def`: FastAPI runs it in a threadpool, so the blocking SDK call
# does not stall the event loop.
@app.post("/generate")
def generate_text(request: PromptRequest):
    return app.state.client.generate(
        prompt=request.prompt,
        temperature=request.temperature,
    )


@app.get("/health")
async def health_check():
    return {"status": "healthy"}
```

PromptRequest enforces a 1-to-10,000-character prompt at the edge, so an empty or oversized request never reaches Gemini. The lifespan context builds the GeminiClient once at startup and stores it on app.state. Inside generate, the try/except translates any SDK exception into HTTP 502, and the response.parts check catches the safety-filtered case before .text would raise. The /health endpoint returns without calling Gemini, giving container probes a cheap target.

You'll know it works when, after exporting GEMINI_API_KEY and running uvicorn main:app --reload, a POST to http://localhost:8000/generate with Content-Type: application/json and body {"prompt":"Say hello in one short sentence."} returns a 200 with a non-empty generated_text field.
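One way to run that check from Python is sketched below using only the standard library. The function names build_generate_request and call_generate are hypothetical, and the base URL assumes uvicorn's default port on localhost as in the paragraph above.

```python
import json
import urllib.request


def build_generate_request(prompt: str, base_url: str = "http://localhost:8000"):
    """Build the POST /generate request described above without sending it."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_generate(prompt: str) -> dict:
    """Send the request to a running service and decode the JSON response."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

With the service running, call_generate("Say hello in one short sentence.") should return a dictionary whose generated_text value is non-empty. A 4xx or 5xx response raises urllib.error.HTTPError, which makes a 422 or 502 easy to spot.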

Do's and Don'ts

Do's

✓Do load the API key from an environment variable — keeps the secret out of source control and lets you swap keys per environment without rebuilding the image.
✓Do initialize the Gemini client once in the lifespan hook — reuses the SDK's connection pool and surfaces a missing API key at startup instead of mid-traffic.
✓Do translate SDK exceptions into HTTP 502 — gives callers a single, predictable failure mode whether the underlying issue was a timeout, auth failure, or rate limit.

Don'ts

✗Don't hardcode the API key or model name — every environment change becomes an image rebuild, and a leaked image leaks production credentials.
✗Don't call response.text without first checking response.parts — a safety-filtered Gemini response is a valid object with no parts, and .text raises an opaque ValueError.
✗Don't reuse the /generate endpoint for liveness probes — every probe burns Gemini quota; expose a separate /health route that returns without calling the API.
