Free lesson

Integrate pgvector with SQLAlchemy ORM

You declare Vector columns on SQLAlchemy 2.0 models, run async sessions with asyncpg, and parameterise hybrid queries with proper plan caching.

~25 min read · Free to read — no subscription required.

Integrate pgvector with SQLAlchemy ORM

Most production GenAI codebases reach a point where raw psycopg.fetch calls multiply into hundreds of nearly-identical strings. SQLAlchemy is the standard Python answer to that sprawl, and the pgvector.sqlalchemy adapter teaches the ORM how to round-trip the vector type as a regular column. Done well, you get type-checked queries, automatic connection pooling, and migrations that survive code review. Done poorly, you ship N+1 queries that pull megabytes of vectors per page load. This section is about doing it well.

Introduction

When you scale a GenAI service past a few endpoints, the cost of skipping the ORM shows up as repeated SQL strings, missing connection pooling, and untyped query results that explode at runtime. Wiring pgvector.sqlalchemy into SQLAlchemy 2.0 lets vectors live alongside ordinary columns—same models, same migrations, same async session boundary—instead of forcing a parallel raw-SQL access path. Get this layer wrong and a single list query will pull megabytes of float arrays into every request, throttle the connection pool, and accidentally serialize embeddings to the browser. By the end you'll be able to declare a vector-bearing model with an HNSW index, query it with the typed expression API, exclude embeddings from list reads, and generate Alembic migrations that round-trip the Vector type without hand-editing.

Key Terminology

  • pgvector.sqlalchemy adapter — the bridge package that registers Vector as a SQLAlchemy column type so the ORM round-trips vector(N) like any other column instead of falling back to raw SQL.
  • HNSW index — a graph-based approximate-nearest-neighbour index pgvector exposes through postgresql_using="hnsw"; without it declared on the column, similarity queries hit a sequential scan and collapse under production load.
  • AsyncSession — SQLAlchemy 2.0's async unit-of-work scoped to a single FastAPI request; pairs with async_sessionmaker(expire_on_commit=False) so post-commit attribute access does not trigger an extra round-trip.
  • load_only — a query option that whitelists columns to materialise; on list reads it omits the 1536-dim embedding, which would otherwise pull ~6 KB per row across the wire for data the client never sees.
  • Alembic render_item hook — the autogen extension point that teaches Alembic to emit Vector(N) and its import line so generated migrations apply without manual editing.

Concepts

Production Considerations

  • Forgetting pool_pre_ping=True produces flaky errors after every database restart.
  • Naming a column metadata instead of metadata_ collides with DeclarativeBase.metadata and the import passes silently — until query time when SQLAlchemy raises InvalidRequestError.
  • expire_on_commit=True (the default) makes every post-commit attribute access an extra round-trip; turn it off in async code.
  • Defining the model without the index works in dev (PostgreSQL falls back to seq scan) but blows up in prod under load.
  • Returning ORM objects directly from a FastAPI endpoint serialises the embedding into JSON. Always go through a Pydantic response model that omits it.

Code Walkthrough

Now that you understand how the pgvector.sqlalchemy adapter, HNSW index, and AsyncSession fit together conceptually, let's wire them into a single working layer.

The model below registers Vector(1536) as a typed column and co-locates the HNSW index so it travels with every migration. Note that the Python attribute is named metadata_ while the database column is named "metadata" — this sidesteps the DeclarativeBase.metadata name collision that passes silently at import time but raises InvalidRequestError at query time. The async engine is configured alongside the model so the session factory is ready to use immediately.

Code snippetpython
1from datetime import datetime 2from uuid import UUID, uuid4 3from pgvector.sqlalchemy import Vector 4from sqlalchemy import String, DateTime, Index, func 5from sqlalchemy.dialects.postgresql import JSONB, UUID as PgUUID 6from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column 7from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine 8 9class Base(DeclarativeBase): 10 pass 11 12class Document(Base): 13 __tablename__ = "documents" 14 15 id: Mapped[UUID] = mapped_column(PgUUID(as_uuid=True), primary_key=True, default=uuid4) 16 content: Mapped[str] = mapped_column(String, nullable=False) 17 metadata_: Mapped[dict] = mapped_column("metadata", JSONB, default=dict) 18 embedding: Mapped[list[float]] = mapped_column(Vector(1536), nullable=False) 19 created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=func.now()) 20 21 __table_args__ = ( 22 Index( 23 "documents_embedding_hnsw_idx", 24 "embedding", 25 postgresql_using="hnsw", 26 postgresql_with={"m": 16, "ef_construction": 64}, 27 postgresql_ops={"embedding": "vector_cosine_ops"}, 28 ), 29 ) 30 31engine = create_async_engine( 32 "postgresql+asyncpg://user:pass@db:5432/vectordb", 33 pool_size=10, 34 max_overflow=10, 35 pool_pre_ping=True, 36) 37AsyncSessionLocal = async_sessionmaker(engine, expire_on_commit=False, class_=AsyncSession)

pool_pre_ping=True issues a cheap probe before handing out a pooled connection, catching dropped connections after a database restart. Setting expire_on_commit=False keeps ORM objects readable after commit() — the SQLAlchemy default triggers a refresh round-trip on every post-commit attribute access, which stalls async handlers waiting on network I/O that could have been avoided entirely.

With the session factory in place, you can write typed similarity queries and list queries that leave the embedding on the database side of the wire:

Code snippetpython
1from sqlalchemy import select 2from sqlalchemy.orm import load_only 3from sqlalchemy.ext.asyncio import AsyncSession 4 5async def top_k(session: AsyncSession, qvec: list[float], k: int = 10): 6 stmt = ( 7 select(Document, Document.embedding.cosine_distance(qvec).label("distance")) 8 .order_by(Document.embedding.cosine_distance(qvec)) 9 .limit(k) 10 ) 11 result = await session.execute(stmt) 12 return result.all() 13 14async def list_documents(session: AsyncSession): 15 stmt = ( 16 select(Document) 17 .options(load_only(Document.id, Document.content, Document.metadata_)) 18 .limit(50) 19 ) 20 result = await session.execute(stmt) 21 return result.scalars().all()

cosine_distance(qvec) emits the SQL operator embedding <=> $1 — identical to a hand-written expression but with type checking and IDE autocomplete. The load_only call on the list query whitelists only the columns the client actually needs; a 1536-dim embedding is roughly 6 KB per row, so omitting it from a 50-row page response trims nearly 300 KB per request.

To keep Alembic migrations working, add a render_item hook in alembic/env.py that teaches the autogenerator to emit Vector(N) together with its import line. Import pgvector.sqlalchemy.Vector at the top of env.py, then implement render_item to inject from pgvector.sqlalchemy import Vector into the generated migration's import block. Without this hook, autogenerate either silently drops the column type or emits raw DDL that fails on the next alembic upgrade.

Confirm that running alembic revision --autogenerate -m "add documents" produces a migration file containing Vector(1536) and the HNSW index definition without any manual editing required.

Architecture: where the ORM sits

Code snippet mermaid
Loading diagram...
  • Lines 1-12: A clear lane separation. Pydantic owns the wire shape, the ORM owns the persistence shape, the service layer owns the translation. Mixing these layers is how Document.embedding accidentally ships back to the browser as a 1536-element float array.

Do's and Don'ts

Now that you have worked through the implementation, the practices below separate a durable approach from a fragile one.

Do's

  1. Do name the Python attribute metadata_ and map it to the "metadata" column via mapped_column("metadata", JSONB) — the DeclarativeBase.metadata attribute exists on every ORM base class, so an attribute literally called metadata passes silently at import time and raises InvalidRequestError only when a query is executed.
  2. Do use load_only(Document.id, Document.content, Document.metadata_) on every list query — a 1536-dimension embedding occupies roughly 6 KB per row, so including embedding in a 50-row page response transmits nearly 300 KB of float arrays that the client never displays; load_only keeps that data on the database side of the wire.
  3. Do register a render_item hook in alembic/env.py that injects from pgvector.sqlalchemy import Vector and emits Vector(N) in autogenerated migrations — without it, Alembic either silently drops the vector column type or emits raw DDL that fails on the next alembic upgrade, requiring manual editing of every migration file.

Don'ts

  1. Don't leave expire_on_commit at its SQLAlchemy default of True in async_sessionmaker — every post-commit() attribute access triggers a lazy-refresh round-trip over the network, blocking the async event loop on I/O that expire_on_commit=False eliminates entirely.
  2. Don't write hand-coded embedding <=> $1 SQL strings for similarity queries instead of Document.embedding.cosine_distance(qvec) — the raw string bypasses SQLAlchemy's type-checking and IDE autocomplete, making dimension mismatches between the stored Vector(1536) and the query vector silent until the database returns an error.
  3. Don't omit pool_pre_ping=True from create_async_engine — without it, stale connections recycled after a PostgreSQL restart are handed out from the pool without validation, surfacing as OperationalError on the first live query rather than being replaced proactively by the pre-ping probe.

Keep going with GenAI Platform Engineering

Create a free account to track your progress and open this lesson in the full learning view. Subscribe to unlock the entire path — every goal, the hands-on labs, quizzes, and your verifiable skill graph — from . Cancel anytime.