Prerequisites
- Familiarity with PostgreSQL administration, including table creation, indexing, and VACUUM operations
- Understanding of vector embeddings and approximate nearest neighbor (ANN) search concepts
- Experience deploying Helm charts on Kubernetes and managing persistent volume claims
- Working knowledge of Prometheus metric types: counters, gauges, histograms, and summaries
- Comfort writing Python with asyncio for concurrent I/O operations
- Familiarity with LiteLLM for unified embedding model API access
- Understanding of Argo Workflows for scheduling CronJobs and defining DAG-based workflows
- Experience with Grafana dashboard creation, including PromQL query authoring
- Completion of Chapter 37 (FinOps Reporting and Governance) or equivalent knowledge of cost-aware infrastructure patterns
Learning Goals
- Deploy pgvector and build an embedding ingestion pipeline with operational monitoring. You will start by deploying PostgreSQL with the pgvector extension inside your vCluster, creating embedding tables with HNSW indexes optimized for approximate nearest neighbor queries. On top of this storage layer, you will build an EmbeddingIngestionPipeline that reads documents from a source queue, chunks them using a configurable strategy, calls an embedding model through LiteLLM, and stores the resulting vectors in pgvector. Every stage of the pipeline will be instrumented from the start: throughput measured in documents per second, per-stage latency broken down across chunking, embedding, and storage, error counters categorized by type, and queue depth tracking to detect backpressure before it becomes an outage (a schema-and-pipeline sketch follows this goal).
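The sketch below shows what this storage-plus-pipeline layer might look like in miniature. It is a sketch under stated assumptions: asyncpg, litellm, and prometheus_client as the client libraries, a `document_embeddings` table holding 1536-dimension vectors from `text-embedding-3-small`, and metric names chosen for illustration; none of these names are mandated by the chapter.

```python
import asyncio

import asyncpg
import litellm
from prometheus_client import Counter, Gauge, Histogram, start_http_server

# DDL for the storage layer: a vector column sized for the embedding model
# plus an HNSW index tuned for cosine-distance ANN queries. Table name,
# dimensions, and index parameters are illustrative choices.
SCHEMA = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS document_embeddings (
    id         BIGSERIAL PRIMARY KEY,
    doc_id     TEXT NOT NULL,
    chunk      TEXT NOT NULL,
    embedding  vector(1536) NOT NULL,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS document_embeddings_hnsw
    ON document_embeddings USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
"""

# Instrumentation from the start: throughput, per-stage latency, queue depth.
DOCS_PROCESSED = Counter("pipeline_documents_total", "Documents processed", ["status"])
STAGE_LATENCY = Histogram("pipeline_stage_seconds", "Per-stage latency", ["stage"])
QUEUE_DEPTH = Gauge("pipeline_queue_depth", "Documents waiting in the source queue")


def chunk(text: str, size: int = 800) -> list[str]:
    # Simplest fixed-size strategy; a real pipeline makes this configurable.
    with STAGE_LATENCY.labels("chunking").time():
        return [text[i : i + size] for i in range(0, len(text), size)]


async def embed_and_store(conn: asyncpg.Connection, doc_id: str, text: str) -> None:
    chunks = chunk(text)
    with STAGE_LATENCY.labels("embedding").time():
        # LiteLLM returns an OpenAI-shaped embedding response across providers.
        resp = await litellm.aembedding(model="text-embedding-3-small", input=chunks)
    with STAGE_LATENCY.labels("storage").time():
        await conn.executemany(
            "INSERT INTO document_embeddings (doc_id, chunk, embedding) "
            "VALUES ($1, $2, $3::vector)",
            [
                (doc_id, c, "[" + ",".join(map(str, item["embedding"])) + "]")
                for c, item in zip(chunks, resp.data)
            ],
        )
    DOCS_PROCESSED.labels("ok").inc()


async def main() -> None:
    start_http_server(9000)  # expose /metrics for Prometheus to scrape
    conn = await asyncpg.connect("postgresql://app@pgvector.vcluster:5432/embeddings")
    await conn.execute(SCHEMA)
    pending = [("doc-1", "some document text")]  # stand-in for the source queue
    QUEUE_DEPTH.set(len(pending))
    for doc_id, text in pending:
        await embed_and_store(conn, doc_id, text)
        QUEUE_DEPTH.dec()
    await conn.close()


asyncio.run(main())
```

Labeling the latency histogram by stage keeps the whole breakdown in one metric, which pays off later when the Grafana panels slice latency across chunking, embedding, and storage.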
- Implement pipeline throughput tracking and failure detection with alerting. With the pipeline running, you will build the failure detection layer. Embedding pipelines fail in ways that traditional monitoring misses: API rate limits that cause silent backpressure, pgvector connection failures during high-throughput bursts, and quality errors like dimension mismatches or NaN values that corrupt the index without raising exceptions. You will define detection criteria and alerting thresholds for each failure type, building a system that catches problems before they propagate downstream (a validation sketch follows this goal).
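A minimal sketch of the quality-error half of that detection layer, assuming the same prometheus_client style as the previous sketch; the expected dimension and the error-type labels are illustrative, not names this chapter fixes.

```python
# Quality checks for errors that would corrupt the index without raising
# exceptions downstream. EXPECTED_DIM must match the vector(N) column.
import math

from prometheus_client import Counter

EXPECTED_DIM = 1536  # assumption: matches the vector(1536) column above
PIPELINE_ERRORS = Counter(
    "pipeline_errors_total", "Pipeline errors by type", ["type"]
)


def validate_embedding(vec: list[float]) -> None:
    """Reject bad vectors before they reach pgvector, counting each failure type."""
    if len(vec) != EXPECTED_DIM:
        PIPELINE_ERRORS.labels("dimension_mismatch").inc()
        raise ValueError(f"expected {EXPECTED_DIM} dims, got {len(vec)}")
    if any(math.isnan(x) or math.isinf(x) for x in vec):
        PIPELINE_ERRORS.labels("nan_or_inf").inc()
        raise ValueError("embedding contains NaN or Inf values")
```

From there, alerting is a thresholding exercise: for example, paging on a sustained nonzero `rate(pipeline_errors_total[5m])`, or on a queue-depth gauge that keeps growing across consecutive scrape intervals, which is how rate-limit backpressure tends to surface; the specific thresholds are yours to define in this goal.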
- Build a reprocessing workflow for failed or stale embeddings. Failed documents cannot simply be dropped. You will build a reprocessing workflow backed by a failed_documents table that tracks every document that failed embedding along with its failure reason and retry history. An Argo CronJob will implement exponential backoff retries -- 5 minutes, 30 minutes, 2 hours, then 24 hours -- ensuring transient failures are recovered automatically while persistent failures are escalated (a retry-script sketch follows this goal).
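As a sketch of what the CronJob's payload might look like: the failed_documents columns and the `retry_count < 4` escalation cutoff below are assumptions; only the 5-minute / 30-minute / 2-hour / 24-hour backoff ladder comes from the goal above.

```python
# The reprocessing script an Argo CronJob might run on a schedule.
import asyncio

import asyncpg

FAILED_DOCS_SCHEMA = """
CREATE TABLE IF NOT EXISTS failed_documents (
    doc_id         TEXT PRIMARY KEY,
    failure_reason TEXT NOT NULL,
    retry_count    INT NOT NULL DEFAULT 0,
    last_failed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""

# Postgres arrays are 1-based, so retry_count 0 maps to the first interval.
DUE_FOR_RETRY = """
SELECT doc_id, failure_reason, retry_count
FROM failed_documents
WHERE retry_count < 4  -- beyond the ladder, escalate to a human instead
  AND last_failed_at
      + (ARRAY['5 minutes','30 minutes','2 hours','24 hours']::interval[])
        [retry_count + 1] < now()
"""


async def main() -> None:
    conn = await asyncpg.connect("postgresql://app@pgvector.vcluster:5432/embeddings")
    await conn.execute(FAILED_DOCS_SCHEMA)
    for row in await conn.fetch(DUE_FOR_RETRY):
        # Re-enqueue into the ingestion pipeline; on another failure the
        # pipeline bumps retry_count and last_failed_at.
        print(f"retrying {row['doc_id']} (attempt {row['retry_count'] + 1})")
    await conn.close()


asyncio.run(main())
```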
- Create pipeline health dashboards with freshness SLA tracking. The final goal ties everything together into operational visibility. You will build Grafana dashboards with panels for pipeline throughput with capacity lines, queue depth trends that should remain bounded, per-stage latency breakdowns, error rates by type, freshness SLA compliance percentages, and reprocessing queue age distributions. You will define and track pipeline SLOs -- 99% of documents embedded within one hour of source update, less than 0.1% failure rate, and p95 per-document processing under 30 seconds -- and build capacity planning projections that forecast when current infrastructure will hit its limits (example PromQL panel queries follow this goal).
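To make the dashboard work concrete, the mapping below collects the kind of PromQL expressions those panels might run. This is a sketch under assumptions: the metric names come from the earlier sketches, and a `pipeline_oldest_pending_seconds` freshness gauge is assumed to be exported by the pipeline; the chapter does not mandate these names.

```python
# PromQL expressions for the Grafana panels described in the final goal.
PANEL_QUERIES = {
    # Throughput in docs/sec; plot a constant capacity line alongside it.
    "throughput": 'rate(pipeline_documents_total{status="ok"}[5m])',
    # Queue depth should stay bounded; sustained growth means backpressure.
    "queue_depth": "pipeline_queue_depth",
    # Per-stage p95 latency; the 30 s SLO applies to the end-to-end sum.
    "stage_latency_p95": (
        "histogram_quantile(0.95, "
        "sum by (le, stage) (rate(pipeline_stage_seconds_bucket[5m])))"
    ),
    # Failure ratio; the SLO target is < 0.1% of documents.
    "error_rate": (
        "sum(rate(pipeline_errors_total[1h])) "
        "/ sum(rate(pipeline_documents_total[1h]))"
    ),
    # Freshness SLA: fraction of the day where the oldest pending document
    # was under one hour old (assumes the pipeline exports this gauge).
    "freshness_sla": (
        "avg_over_time((pipeline_oldest_pending_seconds < bool 3600)[1d:1m])"
    ),
}
```

For the capacity planning projections, a `predict_linear(pipeline_queue_depth[6h], 86400)`-style forecast over queue depth or table growth is a common starting point before building anything more elaborate.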