Free lesson
Integrate pgvector with SQLAlchemy ORM
You declare Vector columns on SQLAlchemy 2.0 models, run async sessions with asyncpg, and parameterise hybrid queries with proper plan caching.
~25 min read · Free to read — no subscription required.
Integrate pgvector with SQLAlchemy ORM
Most production GenAI codebases reach a point where raw psycopg.fetch calls multiply into hundreds of nearly-identical strings. SQLAlchemy is the standard Python answer to that sprawl, and the pgvector.sqlalchemy adapter teaches the ORM how to round-trip the vector type as a regular column. Done well, you get type-checked queries, automatic connection pooling, and migrations that survive code review. Done poorly, you ship N+1 queries that pull megabytes of vectors per page load. This section is about doing it well.
Introduction
When you scale a GenAI service past a few endpoints, the cost of skipping the ORM shows up as repeated SQL strings, missing connection pooling, and untyped query results that explode at runtime. Wiring pgvector.sqlalchemy into SQLAlchemy 2.0 lets vectors live alongside ordinary columns—same models, same migrations, same async session boundary—instead of forcing a parallel raw-SQL access path. Get this layer wrong and a single list query will pull megabytes of float arrays into every request, throttle the connection pool, and accidentally serialize embeddings to the browser. By the end you'll be able to declare a vector-bearing model with an HNSW index, query it with the typed expression API, exclude embeddings from list reads, and generate Alembic migrations that round-trip the Vector type without hand-editing.
Key Terminology
- pgvector.sqlalchemy adapter — the bridge package that registers
Vectoras a SQLAlchemy column type so the ORM round-tripsvector(N)like any other column instead of falling back to raw SQL. - HNSW index — a graph-based approximate-nearest-neighbour index pgvector exposes through
postgresql_using="hnsw"; without it declared on the column, similarity queries hit a sequential scan and collapse under production load. - AsyncSession — SQLAlchemy 2.0's
asyncunit-of-work scoped to a single FastAPI request; pairs withasync_sessionmaker(expire_on_commit=False)so post-commit attribute access does not trigger an extra round-trip. load_only— a query option that whitelists columns to materialise; on list reads it omits the 1536-dim embedding, which would otherwise pull ~6 KB per row across the wire for data the client never sees.- Alembic
render_itemhook — the autogen extension point that teaches Alembic to emitVector(N)and itsimportline so generated migrations apply without manual editing.
Concepts
Production Considerations
- Forgetting
pool_pre_ping=Trueproduces flaky errors after every database restart. - Naming a column
metadatainstead ofmetadata_collides withDeclarativeBase.metadataand theimportpasses silently — until query time when SQLAlchemy raisesInvalidRequestError. expire_on_commit=True(the default) makes every post-commit attribute access an extra round-trip; turn it off inasynccode.- Defining the model without the index works in dev (PostgreSQL falls back to seq scan) but blows up in prod under load.
- Returning ORM objects directly from a FastAPI endpoint serialises the embedding into JSON. Always go through a Pydantic response model that omits it.
Code Walkthrough
Now that you understand how the pgvector.sqlalchemy adapter, HNSW index, and AsyncSession fit together conceptually, let's wire them into a single working layer.
The model below registers Vector(1536) as a typed column and co-locates the HNSW index so it travels with every migration. Note that the Python attribute is named metadata_ while the database column is named "metadata" — this sidesteps the DeclarativeBase.metadata name collision that passes silently at import time but raises InvalidRequestError at query time. The async engine is configured alongside the model so the session factory is ready to use immediately.
Code snippetpython
1from datetime import datetime 2from uuid import UUID, uuid4 3from pgvector.sqlalchemy import Vector 4from sqlalchemy import String, DateTime, Index, func 5from sqlalchemy.dialects.postgresql import JSONB, UUID as PgUUID 6from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column 7from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine 8 9class Base(DeclarativeBase): 10 pass 11 12class Document(Base): 13 __tablename__ = "documents" 14 15 id: Mapped[UUID] = mapped_column(PgUUID(as_uuid=True), primary_key=True, default=uuid4) 16 content: Mapped[str] = mapped_column(String, nullable=False) 17 metadata_: Mapped[dict] = mapped_column("metadata", JSONB, default=dict) 18 embedding: Mapped[list[float]] = mapped_column(Vector(1536), nullable=False) 19 created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), server_default=func.now()) 20 21 __table_args__ = ( 22 Index( 23 "documents_embedding_hnsw_idx", 24 "embedding", 25 postgresql_using="hnsw", 26 postgresql_with={"m": 16, "ef_construction": 64}, 27 postgresql_ops={"embedding": "vector_cosine_ops"}, 28 ), 29 ) 30 31engine = create_async_engine( 32 "postgresql+asyncpg://user:pass@db:5432/vectordb", 33 pool_size=10, 34 max_overflow=10, 35 pool_pre_ping=True, 36) 37AsyncSessionLocal = async_sessionmaker(engine, expire_on_commit=False, class_=AsyncSession)
pool_pre_ping=True issues a cheap probe before handing out a pooled connection, catching dropped connections after a database restart. Setting expire_on_commit=False keeps ORM objects readable after commit() — the SQLAlchemy default triggers a refresh round-trip on every post-commit attribute access, which stalls async handlers waiting on network I/O that could have been avoided entirely.
With the session factory in place, you can write typed similarity queries and list queries that leave the embedding on the database side of the wire:
Code snippetpython
1from sqlalchemy import select 2from sqlalchemy.orm import load_only 3from sqlalchemy.ext.asyncio import AsyncSession 4 5async def top_k(session: AsyncSession, qvec: list[float], k: int = 10): 6 stmt = ( 7 select(Document, Document.embedding.cosine_distance(qvec).label("distance")) 8 .order_by(Document.embedding.cosine_distance(qvec)) 9 .limit(k) 10 ) 11 result = await session.execute(stmt) 12 return result.all() 13 14async def list_documents(session: AsyncSession): 15 stmt = ( 16 select(Document) 17 .options(load_only(Document.id, Document.content, Document.metadata_)) 18 .limit(50) 19 ) 20 result = await session.execute(stmt) 21 return result.scalars().all()
cosine_distance(qvec) emits the SQL operator embedding <=> $1 — identical to a hand-written expression but with type checking and IDE autocomplete. The load_only call on the list query whitelists only the columns the client actually needs; a 1536-dim embedding is roughly 6 KB per row, so omitting it from a 50-row page response trims nearly 300 KB per request.
To keep Alembic migrations working, add a render_item hook in alembic/env.py that teaches the autogenerator to emit Vector(N) together with its import line. Import pgvector.sqlalchemy.Vector at the top of env.py, then implement render_item to inject from pgvector.sqlalchemy import Vector into the generated migration's import block. Without this hook, autogenerate either silently drops the column type or emits raw DDL that fails on the next alembic upgrade.
Confirm that running alembic revision --autogenerate -m "add documents" produces a migration file containing Vector(1536) and the HNSW index definition without any manual editing required.
Architecture: where the ORM sits
Code snippet mermaid
Loading diagram...
- Lines 1-12: A clear lane separation. Pydantic owns the wire shape, the ORM owns the persistence shape, the service layer owns the translation. Mixing these layers is how
Document.embeddingaccidentally ships back to the browser as a 1536-element float array.
Do's and Don'ts
Now that you have worked through the implementation, the practices below separate a durable approach from a fragile one.
Do's
- ✓Do name the Python attribute
metadata_and map it to the"metadata"column viamapped_column("metadata", JSONB)— theDeclarativeBase.metadataattribute exists on every ORM baseclass, so an attribute literally calledmetadatapasses silently atimporttime and raisesInvalidRequestErroronly when a query is executed. - ✓Do use
load_only(Document.id, Document.content, Document.metadata_)on every list query — a 1536-dimension embedding occupies roughly 6 KB per row, so includingembeddingin a 50-row page response transmits nearly 300 KB of float arrays that the client never displays;load_onlykeeps that data on the database side of the wire. - ✓Do register a
render_itemhook inalembic/env.pythat injectsfrom pgvector.sqlalchemy import Vectorand emitsVector(N)in autogenerated migrations — without it, Alembic either silently drops the vector column type or emits raw DDL that fails on the nextalembic upgrade, requiring manual editing of every migration file.
Don'ts
- ✗Don't leave
expire_on_commitat its SQLAlchemy default ofTrueinasync_sessionmaker— every post-commit()attribute access triggers a lazy-refresh round-trip over the network, blocking theasyncevent loop on I/O thatexpire_on_commit=Falseeliminates entirely. - ✗Don't write hand-coded
embedding <=> $1SQL strings for similarity queries instead ofDocument.embedding.cosine_distance(qvec)— the raw string bypasses SQLAlchemy's type-checking and IDE autocomplete, making dimension mismatches between the storedVector(1536)and the query vector silent until the database returns an error. - ✗Don't omit
pool_pre_ping=Truefromcreate_async_engine— without it, stale connections recycled after a PostgreSQL restart are handed out from the pool without validation, surfacing asOperationalErroron the first live query rather than being replaced proactively by the pre-ping probe.
Keep going with GenAI Platform Engineering
Create a free account to track your progress and open this lesson in the full learning view. Subscribe to unlock the entire path — every goal, the hands-on labs, quizzes, and your verifiable skill graph — from . Cancel anytime.