Gopi Trinadh Maddikunta

Copyright © 2025 GT Groups. All rights reserved.

Vector Database Integration with pgvector

📅 October 18, 2025 – Vector Database

From Embeddings to Retrieval: Building LLM4S’s Semantic Memory

After I completed the Embedding Engine in Phase 1, the next major milestone in my GSoC journey was to give LLM4S semantic memory – the ability to store embeddings, search them efficiently, and retrieve the most meaningful information when a query is issued.

This meant building a vector database layer, and for LLM4S we chose something open-source, battle-tested, and developer-friendly:

PostgreSQL + pgvector

This article walks through why we chose pgvector, what the system does, how I implemented it phase by phase, where it fits inside LLM4S, and the challenges that forced architectural improvements.

Why pgvector? (Not FAISS, Not Milvus, Not Pinecone)

Most RAG systems default to FAISS or cloud-hosted vector stores.
We intentionally avoided both for three reasons:

1. Developer Experience

Every Scala/Java/Python developer already knows PostgreSQL.
Using pgvector meant there was no new infrastructure to learn.

2. Portability and Open Source

No vendor lock-in.
No API limits.
Everything runs locally or in production through the same interface.

3. Future Flexibility

pgvector allows:

  • cosine similarity

  • L2 distance

  • inner product

  • HNSW + IVFFLAT indexing

As LLM4S evolves, we can choose indexing strategies without rewriting the whole system.
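
To make that flexibility concrete, here is a small sketch, assuming an embeddings table with a pgvector column named embedding (both names are illustrative). pgvector exposes one query operator per distance metric:

```scala
// A sketch, assuming an "embeddings" table with a pgvector column "embedding".
val byL2     = "SELECT id FROM embeddings ORDER BY embedding <-> ?::vector LIMIT 5" // L2 distance
val byCosine = "SELECT id FROM embeddings ORDER BY embedding <=> ?::vector LIMIT 5" // cosine distance
val byInner  = "SELECT id FROM embeddings ORDER BY embedding <#> ?::vector LIMIT 5" // negative inner product
```

Switching metrics (or the index behind them) is a one-line change in SQL, not a rewrite of the storage layer.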

Bottom line: pgvector gave us the right mix of control, performance, and simplicity.

What the Vector Database Layer Does

Phase 2 transformed LLM4S from “embedding-only” into a full retrieval system:

  • Stores document embeddings with metadata

  • Supports semantic search via similarity

  • Retrieves top-k relevant chunks

  • Combines filters + metadata + vector similarity

  • Is fully provider-agnostic (OpenAI/VoyageAI/local embeddings)

In short: this is the “memory backend” for all future RAG and agent workflows.
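
To make the shape of this layer concrete, here is a minimal sketch of what a provider-agnostic store API can look like. The names (VectorStore, SearchHit, upsert, search) are illustrative, not LLM4S’s exact interfaces:

```scala
// Hypothetical API sketch – illustrative names, not the exact LLM4S trait.
final case class SearchHit(
  docId:    String,
  chunkId:  String,
  score:    Double,              // similarity score; higher is better
  metadata: Map[String, String]
)

trait VectorStore {
  /** Store one chunk's embedding together with its metadata. */
  def upsert(docId: String, chunkId: String, vector: Array[Float], metadata: Map[String, String]): Unit

  /** Top-k similarity search, optionally constrained by metadata filters. */
  def search(query: Array[Float], topK: Int, filter: Map[String, String] = Map.empty): Seq[SearchHit]
}
```

Because the API only ever sees Array[Float], any embedding provider can sit behind it.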

Architecture (WordPress-Safe ASCII Diagram)

┌───────────────────────────┐
│     Embedding Engine      │
│    (OpenAI / VoyageAI)    │
└─────────────┬─────────────┘
              │ vectors + metadata
              ▼
┌────────────────────────────────────────────────┐
│                 pgvector Layer                 │
│ ┌────────────────────────────────────────────┐ │
│ │              Embeddings Table              │ │
│ │ id | doc_id | chunk_id | vector | metadata │ │
│ └────────────────────────────────────────────┘ │
│                                                │
│ Indexing:   IVFFLAT / HNSW                     │
│ Similarity: cosine / L2 / inner product        │
└────────────────────────┬───────────────────────┘
                         │
                         ▼
┌────────────────────────────────────────────────┐
│                Retrieval Module                │
│ – Embed query                                  │
│ – Search top-k vectors                         │
│ – Rank by similarity                           │
│ – Return text chunks + metadata                │
└────────────────────────┬───────────────────────┘
                         │
                         ▼
┌────────────────────────────────────────────────┐
│            Downstream Integrations             │
│ – RAG Context Builder                          │
│ – Agent Memory Retrieval                       │
│ – Conversation Context Injection               │
└────────────────────────────────────────────────┘

DBx: The Three-Phase Journey to LLM4S Semantic Retrieval

Phase 1 of my GSoC project delivered the Embedding Engine: the system that turns documents into clean, semantic vectors.

Phase 2 was about building the memory system that makes those vectors useful.

This became a three-phase architecture called DBx:

✅ DBx-Core – minimal vector storage
✅ DBx-Mid – indexing, performance, hybrid filters
✅ DBx-Full – full retrieval engine powering RAG + agents

In this article, I explain why DBx exists, what each phase does, how the system is designed, and where it fits inside LLM4S.

Why the Database Layer Needed Three Phases

I could have dumped embeddings into PostgreSQL and called it a day, but that would have failed long-term.

Enterprise RAG and agent memory require:

  • scalable storage

  • fast similarity search

  • metadata filtering

  • hybrid ranking

  • consistent dimensions

  • high recall and predictable latency

Trying to build everything in one step would have produced a fragile system.

So I broke it into three strict phases – DBx-Core → DBx-Mid → DBx-Full – each one adding functionality and stability on top of the last.

What DBx Actually Is

DBx is the structured, three-stage database layer for LLM4S, specifically designed to power:

  • semantic search

  • vector-based retrieval

  • RAG context building

  • agent memory systems

  • structured knowledge bases

Everything builds on top of the Embedding Engine from Phase 1.

✅ DBx-Core – Establishing the Foundation

Goal: Make LLM4S capable of storing embeddings reliably.

What I built:

  • A clean PostgreSQL schema

  • pgvector column with fixed-dimension enforcement

  • Metadata storage (JSONB)

  • Insertion API

  • Basic querying utilities

  • Duplicate prevention using vector hashing

Why it matters:
Without Core, everything else collapses. Stability first.
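
To give a feel for what DBx-Core boils down to, here is an illustrative version of the schema. The table and column names are examples, and the 1536 dimension is a placeholder that must match the embedding provider:

```scala
// Illustrative DDL – a sketch, not the exact LLM4S schema.
val coreSchema: String =
  """CREATE EXTENSION IF NOT EXISTS vector;
    |
    |CREATE TABLE IF NOT EXISTS embeddings (
    |  id          BIGSERIAL PRIMARY KEY,
    |  doc_id      TEXT         NOT NULL,
    |  chunk_id    TEXT         NOT NULL,
    |  embedding   vector(1536) NOT NULL,              -- pgvector enforces the dimension
    |  metadata    JSONB        NOT NULL DEFAULT '{}', -- flexible per-chunk metadata
    |  vector_hash TEXT         NOT NULL UNIQUE        -- duplicate prevention via hashing
    |);""".stripMargin
```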

✅ DBx-Mid – Speed, Filters, Indexing

This is where the system becomes useful.

Key capabilities added:

  • IVFFLAT indexing for scalable similarity search

  • Optional HNSW for high-speed interactive retrieval

  • JSONB-based metadata filtering

  • Hybrid search: vector + keyword

  • Query latency improvements

  • Chunk pruning and batch operations

Why it matters:
Real RAG systems need fast search under load, not academic demos.
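
The sketches below show the flavor of what DBx-Mid adds (table and column names are again illustrative). The first statement builds an IVFFLAT index over cosine distance; the second combines a JSONB metadata filter with vector ordering in a single hybrid query:

```scala
// IVFFLAT index for approximate nearest-neighbour search (the lists value is illustrative).
val ivfflatIndex =
  """CREATE INDEX IF NOT EXISTS embeddings_ivfflat_idx
    |  ON embeddings USING ivfflat (embedding vector_cosine_ops)
    |  WITH (lists = 100);""".stripMargin

// Hybrid search: JSONB containment filter + cosine-distance ordering in one query.
val hybridSearch =
  """SELECT doc_id, chunk_id, metadata,
    |       1 - (embedding <=> ?::vector) AS score
    |  FROM embeddings
    | WHERE metadata @> ?::jsonb
    | ORDER BY embedding <=> ?::vector
    | LIMIT ?;""".stripMargin
```

An HNSW index is the same idea with USING hnsw; it trades longer build times and more memory for faster, higher-recall reads.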

✅ DBx-Full – The Complete Retrieval Engine

This phase turns the vector database into a RAG brain.

Features added:

  • Query embedding → similarity search → top-k retrieval

  • Final scoring + ranking

  • RAG-ready context packaging

  • Agent memory injection

  • Fallback behavior when search confidence is low

  • Clean API for any LLM4S developer to use

This is where semantic search becomes production-ready.
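
Here is a minimal sketch of that flow. The embed and search helpers, the RetrievedChunk type, and the default thresholds are all hypothetical:

```scala
final case class RetrievedChunk(docId: String, chunkId: String, text: String, score: Double)

// Sketch of the DBx-Full retrieval flow (hypothetical helpers, illustrative defaults).
def retrieve(
  query:    String,
  topK:     Int    = 5,
  minScore: Double = 0.75
)(embed: String => Array[Float], search: (Array[Float], Int) => Seq[RetrievedChunk]): Seq[RetrievedChunk] = {
  val queryVector = embed(query)              // 1. embed the query
  val hits        = search(queryVector, topK) // 2. top-k similarity search
  hits.filter(_.score >= minScore)            // 3. an empty result tells the caller to fall back
}
```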

Where DBx Sits Inside LLM4S

DBx is the bridge between embeddings and intelligence.

1. RAG pipeline

Query → embed → retrieve chunks → feed LLM.
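
As an illustration of the “feed LLM” step, the helper below packs retrieved chunks into a prompt, reusing the hypothetical RetrievedChunk type from the DBx-Full sketch above (the prompt wording is illustrative):

```scala
// Pack top-k chunks into an LLM prompt – a sketch, not LLM4S's exact context builder.
def buildPrompt(question: String, chunks: Seq[RetrievedChunk]): String = {
  val context = chunks
    .map(c => s"[${c.docId}/${c.chunkId}] ${c.text}")
    .mkString("\n")
  s"""Answer using only the context below.
     |
     |Context:
     |$context
     |
     |Question: $question""".stripMargin
}
```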

2. Agent memory

Agents pull historical context via DBx-Full.

3. Document QA

Chunk-level metadata enables filtering by author, type, or date.

4. Knowledge bases

A unified embedding + storage + retrieval pipeline.

DBx makes LLM4S capable of “recalling” information semantically.

Key Challenges (and Solutions)

1. Latency issues for large corpora

→ Tuned IVFFLAT lists + probes
→ Added caching + pruning
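
For reference, lists is fixed at index build time (see the DBx-Mid sketch above), while probes is a per-session knob. The value below is illustrative, not the one LLM4S ships with:

```scala
// More probes => higher recall but higher latency; tune against your corpus.
val tuneProbes = "SET ivfflat.probes = 10;" // illustrative value
```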

2. Schema evolution complexity

→ Migration scripts + backward compatibility

3. Providers with mismatched dimensions

→ Dimension registry + strict validation
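
A sketch of what such a registry can look like; the provider names and dimensions are examples only:

```scala
// Hypothetical dimension registry: reject vectors that don't match their provider.
object DimensionRegistry {
  private val dims = Map("openai" -> 1536, "voyageai" -> 1024) // example values

  def validate(provider: String, v: Array[Float]): Either[String, Array[Float]] =
    dims.get(provider) match {
      case Some(d) if v.length == d => Right(v)
      case Some(d)                  => Left(s"expected $d dims for $provider, got ${v.length}")
      case None                     => Left(s"unknown provider: $provider")
    }
}
```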

4. Query failures under load

→ Retry logic + SQL-level safeguards
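
A sketch of the retry idea; the exception type, delay, and attempt count are assumptions, not LLM4S’s actual safeguards:

```scala
// Hypothetical retry helper for transient SQL failures (values are illustrative).
def withRetry[A](attempts: Int)(op: => A): A =
  try op
  catch {
    case _: java.sql.SQLTransientException if attempts > 1 =>
      Thread.sleep(200) // simple fixed backoff; real code might use exponential delays
      withRetry(attempts - 1)(op)
  }
```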

5. Designing APIs that are simple but powerful

→ Layered design (Core → Mid → Full)
→ Strong typing in Scala

Each of these fixes made DBx more robust and more maintainable.

✅ What DBx Unlocks

→ Fast, accurate semantic search
→ End-to-end RAG for LLM4S
→ Multi-step agent workflows
→ Stable knowledge base for enterprise use
→ Foundation for future multimodal retrieval

LLM4S now has true semantic memory.


Related Pull Requests

  • PR #246 – DBx Core: Initial scaffolding for a provider-agnostic Vector Store layer