Vector Database Integration with pgvector
October 18, 2025 · Vector Database
From Embeddings to Retrieval: Building LLM4Sβs Semantic Memory
After completing the Embedding Engine in Phase 1, the next major milestone in my GSoC journey was to give LLM4S semantic memory: the ability to store embeddings, search them efficiently, and retrieve the most meaningful information when a query is issued.
This meant building a vector database layer, and for LLM4S we chose something open-source, battle-tested, and developer-friendly:
PostgreSQL + pgvector
This article walks through why pgvector, what the system does, how I implemented it phase-by-phase, where it fits inside LLM4S, and the challenges that forced architectural improvements.
Why pgvector? (Not FAISS, Not Milvus, Not Pinecone)
Most RAG systems default to FAISS or cloud-hosted vector stores.
We intentionally avoided both for three reasons:
1. Developer Experience
Every Scala/Java/Python developer already knows PostgreSQL.
Using pgvector meant no new infrastructure to learn.
2. Portability and Open Source
No vendor lock-in.
No API limits.
Everything runs locally or in production through the same interface.
3. Future Flexibility
pgvector allows:
cosine similarity
L2 distance
inner product
HNSW + IVFFLAT indexing
As LLM4S evolves, we can choose indexing strategies without rewriting the whole system.
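To make that concrete, here is a minimal sketch of the three distance operators and both index types, wrapped in plain JDBC. The `embeddings` table name, connection settings, and the 3-dimensional toy vectors are placeholders for readability, not the real LLM4S schema:

```scala
import java.sql.DriverManager

object PgvectorOperatorsDemo {
  def main(args: Array[String]): Unit = {
    // Placeholder connection to a local PostgreSQL with pgvector installed.
    val conn = DriverManager.getConnection(
      "jdbc:postgresql://localhost:5432/llm4s", "postgres", "postgres")
    val stmt = conn.createStatement()

    // pgvector exposes one operator per distance metric:
    //   <->  L2 (Euclidean) distance
    //   <#>  negative inner product (negated so that smaller = closer)
    //   <=>  cosine distance
    val rs = stmt.executeQuery(
      """SELECT id, embedding <=> '[0.1,0.2,0.3]' AS cosine_dist
        |FROM embeddings
        |ORDER BY embedding <=> '[0.1,0.2,0.3]'
        |LIMIT 5""".stripMargin)

    // Either index type accelerates the ORDER BY ... LIMIT k pattern above:
    //   CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
    //   CREATE INDEX ON embeddings USING hnsw    (embedding vector_cosine_ops);

    while (rs.next()) println(s"${rs.getInt("id")} -> ${rs.getDouble("cosine_dist")}")
    conn.close()
  }
}
```

Because the metric is chosen per query (and per index opclass), switching from cosine to inner product later is a one-line change, which is exactly the flexibility argument above.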
Bottom line: pgvector gave us the right mix of control, performance, and simplicity.
What the Vector Database Layer Does
Phase 2 transformed LLM4S from "embedding-only" into a full retrieval system:
Stores document embeddings with metadata
Supports semantic search via similarity
Retrieves top-k relevant chunks
Combines filters + metadata + vector similarity
Is fully provider-agnostic (OpenAI/VoyageAI/local embeddings)
In short: this is the "memory backend" for all future RAG and agent workflows.
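To illustrate the provider-agnostic point, here is a minimal sketch of the interface idea. These signatures are hypothetical, not the published LLM4S API; the key design property is that the store only ever sees vectors and metadata, so OpenAI, VoyageAI, or local embeddings can all sit above it:

```scala
// Hypothetical provider-agnostic store interface: nothing here knows
// which embedding provider produced the vectors.
final case class EmbeddedChunk(
  docId: String,
  chunkId: String,
  vector: Array[Float],
  metadata: Map[String, String])

trait VectorStore {
  def upsert(chunks: Seq[EmbeddedChunk]): Int   // returns rows written
  def search(query: Array[Float], k: Int,
             filter: Map[String, String] = Map.empty): Seq[EmbeddedChunk]
}
```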
Architecture (WordPress-Safe ASCII Diagram)
+---------------------------+
|     Embedding Engine      |
|   (OpenAI / VoyageAI)     |
+-------------+-------------+
              | vectors + metadata
              v
+------------------------------------------------------+
|                    pgvector Layer                    |
|  +------------------------------------------------+  |
|  |              Embeddings Table                  |  |
|  |  id | doc_id | chunk_id | vector | metadata    |  |
|  +------------------------------------------------+  |
|                                                      |
|  Indexing:   IVFFLAT / HNSW                          |
|  Similarity: cosine / L2 / inner product             |
+--------------------------+---------------------------+
                           |
                           v
+------------------------------------------------------+
|                   Retrieval Module                   |
|  - Embed query                                       |
|  - Search top-k vectors                              |
|  - Rank by similarity                                |
|  - Return text chunks + metadata                     |
+--------------------------+---------------------------+
                           |
                           v
+------------------------------------------------------+
|               Downstream Integrations                |
|  - RAG Context Builder                               |
|  - Agent Memory Retrieval                            |
|  - Conversation Context Injection                    |
+------------------------------------------------------+
DBx: The Three-Phase Journey to LLM4S Semantic Retrieval
Phase 1 of my GSoC project delivered the Embedding Engine: the system that turns documents into clean, semantic vectors.
Phase 2 was about building the memory system that makes those vectors useful.
This became a three-phase architecture called DBx:
1. DBx-Core: minimal vector storage
2. DBx-Mid: indexing, performance, hybrid filters
3. DBx-Full: full retrieval engine powering RAG + agents
In this article, I explain why DBx exists, what each phase does, how the system is designed, and where it fits inside LLM4S.
Why the Database Layer Needed Three Phases
I could have dumped embeddings into PostgreSQL and called it a day, but that would have failed long-term.
Enterprise RAG and agent memory require:
scalable storage
fast similarity search
metadata filtering
hybrid ranking
consistent dimensions
high recall and predictable latency
Trying to build everything in one step would have produced a fragile system.
So I broke it into three strict phases, DBx-Core → DBx-Mid → DBx-Full, each one adding functionality on top of a stable base.
What DBx Actually Is
DBx is the structured, three-stage database layer for LLM4S, specifically designed to power:
semantic search
vector-based retrieval
RAG context building
agent memory systems
structured knowledge bases
Everything builds on top of the Embedding Engine from Phase 1.
DBx-Core: Establishing the Foundation
Goal: Make LLM4S capable of storing embeddings reliably.
What I built:
A clean PostgreSQL schema
pgvector column with fixed-dimension enforcement
Metadata storage (JSONB)
Insertion API
Basic querying utilities
Duplicate prevention using vector hashing
Why it matters:
Without Core, everything else collapses. Stability first.
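Here is a hedged sketch of what a Core-style schema and insert path can look like. All names (table, columns, helper functions) are illustrative; the actual DBx schema lives in the LLM4S repo:

```scala
// Sketch of a DBx-Core-style storage layer (illustrative, not the exact
// LLM4S API). Assumes PostgreSQL with the pgvector extension enabled.
import java.sql.{Connection, DriverManager}
import java.security.MessageDigest

object VectorStoreCore {
  // Fixed dimension, enforced both in Scala and by the column type.
  val Dim = 1536

  val schema: String =
    s"""CREATE EXTENSION IF NOT EXISTS vector;
       |CREATE TABLE IF NOT EXISTS embeddings (
       |  id        BIGSERIAL PRIMARY KEY,
       |  doc_id    TEXT NOT NULL,
       |  chunk_id  TEXT NOT NULL,
       |  embedding vector($Dim) NOT NULL,    -- dimension checked by pgvector
       |  metadata  JSONB DEFAULT '{}'::jsonb,
       |  vec_hash  TEXT UNIQUE               -- duplicate prevention
       |);""".stripMargin

  /** SHA-256 of the vector contents, used to skip duplicate inserts. */
  def vectorHash(v: Array[Float]): String =
    MessageDigest.getInstance("SHA-256")
      .digest(v.mkString(",").getBytes("UTF-8"))
      .map("%02x".format(_)).mkString

  def insert(conn: Connection, docId: String, chunkId: String,
             vec: Array[Float], metaJson: String): Unit = {
    require(vec.length == Dim, s"expected $Dim dims, got ${vec.length}")
    val ps = conn.prepareStatement(
      """INSERT INTO embeddings (doc_id, chunk_id, embedding, metadata, vec_hash)
        |VALUES (?, ?, ?::vector, ?::jsonb, ?)
        |ON CONFLICT (vec_hash) DO NOTHING""".stripMargin)
    ps.setString(1, docId)
    ps.setString(2, chunkId)
    ps.setString(3, vec.mkString("[", ",", "]"))  // pgvector literal: [x,y,z]
    ps.setString(4, metaJson)
    ps.setString(5, vectorHash(vec))
    ps.executeUpdate()
    ps.close()
  }
}
```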
DBx-Mid: Speed, Filters, Indexing
This is where the system becomes useful.
Key capabilities added:
IVFFLAT indexing for scalable similarity search
Optional HNSW for high-speed interactive retrieval
JSONB-based metadata filtering
Hybrid search: vector + keyword
Query latency improvements
Chunk pruning and batch operations
Why it matters:
Real RAG systems need fast search under load, not academic demos.
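A sketch of the hybrid-query idea, continuing the illustrative schema above (again, not the actual DBx API): a JSONB metadata predicate narrows the candidate set, and the similarity `ORDER BY` does the ranking.

```scala
// Illustrative DBx-Mid-style hybrid query: JSONB metadata filter combined
// with index-assisted vector similarity.
import java.sql.Connection
import scala.collection.mutable.ListBuffer

object FilteredSearch {
  def topK(conn: Connection, queryVec: Array[Float],
           author: String, k: Int): List[(String, Double)] = {
    val ps = conn.prepareStatement(
      """SELECT chunk_id, embedding <=> ?::vector AS distance
        |FROM embeddings
        |WHERE metadata ->> 'author' = ?    -- JSONB metadata filter
        |ORDER BY embedding <=> ?::vector   -- similarity ranking
        |LIMIT ?""".stripMargin)
    val lit = queryVec.mkString("[", ",", "]")  // pgvector literal form
    ps.setString(1, lit)
    ps.setString(2, author)
    ps.setString(3, lit)
    ps.setInt(4, k)
    val rs = ps.executeQuery()
    val out = ListBuffer.empty[(String, Double)]
    while (rs.next()) out += ((rs.getString("chunk_id"), rs.getDouble("distance")))
    ps.close()
    out.toList
  }
}
```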
DBx-Full: The Complete Retrieval Engine
This phase turns the vector database into a RAG brain.
Features added:
Query embedding → similarity search → top-k retrieval
Final scoring + ranking
RAG-ready context packaging
Agent memory injection
Failover behavior when search confidence is low
Clean API for any LLM4S developer to use
This is where semantic search becomes production-ready.
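A hedged sketch of the full retrieval flow, with hypothetical names (the real LLM4S API differs): embed the query, search top-k, apply a confidence threshold for failover, and package the survivors as RAG-ready context.

```scala
// Hypothetical DBx-Full-style retrieval pipeline. Scores are similarities
// (higher = better), e.g. 1 - cosine distance.
final case class RetrievedChunk(chunkId: String, text: String, score: Double)

trait Embedder { def embed(text: String): Array[Float] }  // Phase 1 engine
trait VectorSearch { def topK(v: Array[Float], k: Int): List[RetrievedChunk] }

class Retriever(embedder: Embedder, store: VectorSearch,
                minScore: Double = 0.75) {

  /** Returns a ready-to-inject context block, or None when confidence is low. */
  def retrieveContext(query: String, k: Int = 5): Option[String] = {
    val hits      = store.topK(embedder.embed(query), k)
    val confident = hits.filter(_.score >= minScore)
    if (confident.isEmpty) None                  // failover: caller decides
    else Some(confident.sortBy(-_.score)         // rank best-first
      .map(c => s"[${c.chunkId}] ${c.text}")
      .mkString("\n---\n"))
  }
}
```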
Where DBx Sits Inside LLM4S
DBx is the bridge between embeddings and intelligence.
1. RAG pipeline
Query → embed → retrieve chunks → feed LLM.
2. Agent memory
Agents pull historical context via DBx-Full.
3. Document QA
Chunk-level metadata enables filtering by author, type, or date.
4. Knowledge bases
A unified embedding + storage + retrieval pipeline.
DBx makes LLM4S capable of "recalling" information semantically.
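As a hypothetical usage example, wiring the `Retriever` sketch from above into a RAG prompt (`myEmbedder` and `myStore` are assumed instances, not real LLM4S objects):

```scala
// Hypothetical RAG wiring using the Retriever sketched earlier.
val retriever = new Retriever(myEmbedder, myStore)  // assumed instances
val prompt = retriever.retrieveContext("What changed in DBx-Mid?") match {
  case Some(ctx) => s"Answer using only this context:\n$ctx\n\nQuestion: What changed in DBx-Mid?"
  case None      => "No confident matches; answer without retrieved context."
}
```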
Key Challenges (and Solutions)
1. Latency issues for large corpora
→ Tuned IVFFLAT lists + probes (see the tuning sketch after this list)
→ Added caching + pruning
2. Schema evolution complexity
→ Migration scripts + backward compatibility
3. Providers with mismatched dimensions
→ Dimension registry + strict validation
4. Query failures under load
→ Retry logic + SQL-level safeguards
5. Designing APIs that are simple but powerful
→ Layered design (Core → Mid → Full)
→ Strong typing in Scala
Each push made DBx more robust and more maintainable.
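The fix for challenge 1 is easiest to see in SQL. A minimal sketch of IVFFLAT tuning (the index name and the numbers are assumptions, not measured LLM4S settings): `lists` partitions the index at build time, and the `ivfflat.probes` setting trades recall for latency at query time.

```scala
import java.sql.Connection

// Illustrative IVFFLAT tuning. Rule of thumb from the pgvector docs:
// lists ~ rows / 1000 for smaller datasets.
def tuneIvfflat(conn: Connection): Unit = {
  val st = conn.createStatement()
  st.execute(
    """CREATE INDEX IF NOT EXISTS embeddings_ivfflat
      |ON embeddings USING ivfflat (embedding vector_cosine_ops)
      |WITH (lists = 100)""".stripMargin)
  // More probes = higher recall, higher latency (pgvector default is 1).
  st.execute("SET ivfflat.probes = 10")
  st.close()
}
```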
What DBx Unlocks
Fast, accurate semantic search
End-to-end RAG for LLM4S
Multi-step agent workflows
Stable knowledge base for enterprise use
Foundation for future multimodal retrieval
LLM4S now has true semantic memory.
Related Pull Requests
- PR #246 (DBx Core): Initial scaffolding for a provider-agnostic Vector Store layer