Gopi Trinadh Maddikunta

Copyright © 2025 GT Groups. All rights reserved.

Vector Database Integration with pgvector

📅 October 18, 2025 – Vector Database

From Embeddings to Retrieval: Building LLM4S’s Semantic Memory

After I completed the Embedding Engine in Phase 1, the next major milestone in my GSoC journey was to give LLM4S semantic memory – the ability to store embeddings, search them efficiently, and retrieve the most meaningful information when a query is issued.

This meant building a vector database layer, and for LLM4S we chose something open-source, battle-tested, and developer-friendly:

PostgreSQL + pgvector

This article walks through why we chose pgvector, what the system does, how I implemented it phase by phase, where it fits inside LLM4S, and the challenges that forced architectural improvements.

Why pgvector? (Not FAISS, Not Milvus, Not Pinecone)

Most RAG systems default to FAISS or cloud-hosted vector stores.
We intentionally avoided both for three reasons:

1. Developer Experience

Every Scala/Java/Python developer already knows PostgreSQL.
Using pgvector meant there was no new infrastructure to learn.

2. Portability and Open Source

No vendor lock-in.
No API limits.
Everything runs locally or in production through the same interface.

3. Future Flexibility

pgvector allows:

  • cosine similarity

  • L2 distance

  • inner product

  • HNSW + IVFFLAT indexing

As LLM4S evolves, we can choose indexing strategies without rewriting the whole system.
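
To make that flexibility concrete, here is a small sketch, assuming an embeddings table with a pgvector column named embedding (both names are illustrative). pgvector exposes one query operator per distance metric:

```scala
// A sketch, assuming an "embeddings" table with a pgvector column "embedding".
val byL2     = "SELECT id FROM embeddings ORDER BY embedding <-> ?::vector LIMIT 5" // L2 distance
val byCosine = "SELECT id FROM embeddings ORDER BY embedding <=> ?::vector LIMIT 5" // cosine distance
val byInner  = "SELECT id FROM embeddings ORDER BY embedding <#> ?::vector LIMIT 5" // negative inner product
```

Switching metrics (or the index behind them) is a one-line change in SQL, not a rewrite of the storage layer.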

Bottom line: pgvector gave us the right mix of control, performance, and simplicity.

What the Vector Database Layer Does

Phase 2 transformed LLM4S from “embedding-only” into a full retrieval system:

  • Stores document embeddings with metadata

  • Supports semantic search via similarity

  • Retrieves top-k relevant chunks

  • Combines filters + metadata + vector similarity

  • Is fully provider-agnostic (OpenAI/VoyageAI/local embeddings)

In short: this is the “memory backend” for all future RAG and agent workflows.
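
To make the shape of this layer concrete, here is a minimal sketch of what a provider-agnostic store API can look like. The names (VectorStore, SearchHit, upsert, search) are illustrative, not LLM4S’s exact interfaces:

```scala
// Hypothetical API sketch – illustrative names, not the exact LLM4S trait.
final case class SearchHit(
  docId:    String,
  chunkId:  String,
  score:    Double,              // similarity score; higher is better
  metadata: Map[String, String]
)

trait VectorStore {
  /** Store one chunk's embedding together with its metadata. */
  def upsert(docId: String, chunkId: String, vector: Array[Float], metadata: Map[String, String]): Unit

  /** Top-k similarity search, optionally constrained by metadata filters. */
  def search(query: Array[Float], topK: Int, filter: Map[String, String] = Map.empty): Seq[SearchHit]
}
```

Because the API only ever sees Array[Float], any embedding provider can sit behind it.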

Architecture (WordPress-Safe ASCII Diagram)

┌───────────────────────────┐
│     Embedding Engine      │
│    (OpenAI / VoyageAI)    │
└─────────────┬─────────────┘
              │ vectors + metadata
              ▼
┌────────────────────────────────────────────────┐
│                 pgvector Layer                 │
│ ┌────────────────────────────────────────────┐ │
│ │              Embeddings Table              │ │
│ │ id | doc_id | chunk_id | vector | metadata │ │
│ └────────────────────────────────────────────┘ │
│                                                │
│ Indexing:   IVFFLAT / HNSW                     │
│ Similarity: cosine / L2 / inner product        │
└────────────────────────┬───────────────────────┘
                         │
                         ▼
┌────────────────────────────────────────────────┐
│                Retrieval Module                │
│ – Embed query                                  │
│ – Search top-k vectors                         │
│ – Rank by similarity                           │
│ – Return text chunks + metadata                │
└────────────────────────┬───────────────────────┘
                         │
                         ▼
┌────────────────────────────────────────────────┐
│            Downstream Integrations             │
│ – RAG Context Builder                          │
│ – Agent Memory Retrieval                       │
│ – Conversation Context Injection               │
└────────────────────────────────────────────────┘

DBx: The Three-Phase Journey to LLM4S Semantic Retrieval

Phase 1 of my GSoC project delivered the Embedding Engine: the system that turns documents into clean, semantic vectors.

Phase 2 was about building the memory system that makes those vectors useful.

This became a three-phase architecture called DBx:

✅ DBx-Core – minimal vector storage
✅ DBx-Mid – indexing, performance, hybrid filters
✅ DBx-Full – full retrieval engine powering RAG + agents

In this article, I explain why DBx exists, what each phase does, how the system is designed, and where it fits inside LLM4S.

Why the Database Layer Needed Three Phases

I could have dumped embeddings into PostgreSQL and called it a day, but that would have failed long-term.

Enterprise RAG and agent memory require:

  • scalable storage

  • fast similarity search

  • metadata filtering

  • hybrid ranking

  • consistent dimensions

  • high recall and predictable latency

Trying to build everything in one step would have produced a fragile system.

So I broke it into three strict phases – DBx-Core → DBx-Mid → DBx-Full – each one adding functionality and stability on top of the last.

What DBx Actually Is

DBx is the structured, three-stage database layer for LLM4S, specifically designed to power:

  • semantic search

  • vector-based retrieval

  • RAG context building

  • agent memory systems

  • structured knowledge bases

Everything builds on top of the Embedding Engine from Phase 1.

✅ DBx-Core – Establishing the Foundation

Goal: Make LLM4S capable of storing embeddings reliably.

What I built:

  • A clean PostgreSQL schema

  • pgvector column with fixed-dimension enforcement

  • Metadata storage (JSONB)

  • Insertion API

  • Basic querying utilities

  • Duplicate prevention using vector hashing

Why it matters:
Without Core, everything else collapses. Stability first.
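
To give a feel for what DBx-Core boils down to, here is an illustrative version of the schema. The table and column names are examples, and the 1536 dimension is a placeholder that must match the embedding provider:

```scala
// Illustrative DDL – a sketch, not the exact LLM4S schema.
val coreSchema: String =
  """CREATE EXTENSION IF NOT EXISTS vector;
    |
    |CREATE TABLE IF NOT EXISTS embeddings (
    |  id          BIGSERIAL PRIMARY KEY,
    |  doc_id      TEXT         NOT NULL,
    |  chunk_id    TEXT         NOT NULL,
    |  embedding   vector(1536) NOT NULL,              -- pgvector enforces the dimension
    |  metadata    JSONB        NOT NULL DEFAULT '{}', -- flexible per-chunk metadata
    |  vector_hash TEXT         NOT NULL UNIQUE        -- duplicate prevention via hashing
    |);""".stripMargin
```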

✅ DBx-Mid – Speed, Filters, Indexing

This is where the system becomes useful.

Key capabilities added:

  • IVFFLAT indexing for scalable similarity search

  • Optional HNSW for high-speed interactive retrieval

  • JSONB-based metadata filtering

  • Hybrid search: vector + keyword

  • Query latency improvements

  • Chunk pruning and batch operations

Why it matters:
Real RAG systems need fast search under load, not academic demos.
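
The sketches below show the flavor of what DBx-Mid adds (table and column names are again illustrative). The first statement builds an IVFFLAT index over cosine distance; the second combines a JSONB metadata filter with vector ordering in a single hybrid query:

```scala
// IVFFLAT index for approximate nearest-neighbour search (the lists value is illustrative).
val ivfflatIndex =
  """CREATE INDEX IF NOT EXISTS embeddings_ivfflat_idx
    |  ON embeddings USING ivfflat (embedding vector_cosine_ops)
    |  WITH (lists = 100);""".stripMargin

// Hybrid search: JSONB containment filter + cosine-distance ordering in one query.
val hybridSearch =
  """SELECT doc_id, chunk_id, metadata,
    |       1 - (embedding <=> ?::vector) AS score
    |  FROM embeddings
    | WHERE metadata @> ?::jsonb
    | ORDER BY embedding <=> ?::vector
    | LIMIT ?;""".stripMargin
```

An HNSW index is the same idea with USING hnsw; it trades longer build times and more memory for faster, higher-recall reads.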

✅ DBx-Full – The Complete Retrieval Engine

This phase turns the vector database into a RAG brain.

Features added:

  • Query embedding → similarity search → top-k retrieval

  • Final scoring + ranking

  • RAG-ready context packaging

  • Agent memory injection

  • Fallback behavior when search confidence is low

  • Clean API for any LLM4S developer to use

This is where semantic search becomes production-ready.
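
Here is a minimal sketch of that flow. The embed and search helpers, the RetrievedChunk type, and the default thresholds are all hypothetical:

```scala
final case class RetrievedChunk(docId: String, chunkId: String, text: String, score: Double)

// Sketch of the DBx-Full retrieval flow (hypothetical helpers, illustrative defaults).
def retrieve(
  query:    String,
  topK:     Int    = 5,
  minScore: Double = 0.75
)(embed: String => Array[Float], search: (Array[Float], Int) => Seq[RetrievedChunk]): Seq[RetrievedChunk] = {
  val queryVector = embed(query)              // 1. embed the query
  val hits        = search(queryVector, topK) // 2. top-k similarity search
  hits.filter(_.score >= minScore)            // 3. an empty result tells the caller to fall back
}
```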

Where DBx Sits Inside LLM4S

DBx is the bridge between embeddings and intelligence.

1. RAG pipeline

Query → embed → retrieve chunks → feed LLM.
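
As an illustration of the “feed LLM” step, the helper below packs retrieved chunks into a prompt, reusing the hypothetical RetrievedChunk type from the DBx-Full sketch above (the prompt wording is illustrative):

```scala
// Pack top-k chunks into an LLM prompt – a sketch, not LLM4S's exact context builder.
def buildPrompt(question: String, chunks: Seq[RetrievedChunk]): String = {
  val context = chunks
    .map(c => s"[${c.docId}/${c.chunkId}] ${c.text}")
    .mkString("\n")
  s"""Answer using only the context below.
     |
     |Context:
     |$context
     |
     |Question: $question""".stripMargin
}
```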

2. Agent memory

Agents pull historical context via DBx-Full.

3. Document QA

Chunk-level metadata enables filtering by author, type, or date.

4. Knowledge bases

A unified embedding + storage + retrieval pipeline.

DBx makes LLM4S capable of “recalling” information semantically.

Key Challenges (and Solutions)

1. Latency issues for large corpora

→ Tuned IVFFLAT lists + probes
→ Added caching + pruning
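
For reference, lists is fixed at index build time (see the DBx-Mid sketch above), while probes is a per-session knob. The value below is illustrative, not the one LLM4S ships with:

```scala
// More probes => higher recall but higher latency; tune against your corpus.
val tuneProbes = "SET ivfflat.probes = 10;" // illustrative value
```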

2. Schema evolution complexity

→ Migration scripts + backward compatibility

3. Providers with mismatched dimensions

→ Dimension registry + strict validation
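
A sketch of what such a registry can look like; the provider names and dimensions are examples only:

```scala
// Hypothetical dimension registry: reject vectors that don't match their provider.
object DimensionRegistry {
  private val dims = Map("openai" -> 1536, "voyageai" -> 1024) // example values

  def validate(provider: String, v: Array[Float]): Either[String, Array[Float]] =
    dims.get(provider) match {
      case Some(d) if v.length == d => Right(v)
      case Some(d)                  => Left(s"expected $d dims for $provider, got ${v.length}")
      case None                     => Left(s"unknown provider: $provider")
    }
}
```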

4. Query failures under load

→ Retry logic + SQL-level safeguards
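
A sketch of the retry idea; the exception type, delay, and attempt count are assumptions, not LLM4S’s actual safeguards:

```scala
// Hypothetical retry helper for transient SQL failures (values are illustrative).
def withRetry[A](attempts: Int)(op: => A): A =
  try op
  catch {
    case _: java.sql.SQLTransientException if attempts > 1 =>
      Thread.sleep(200) // simple fixed backoff; real code might use exponential delays
      withRetry(attempts - 1)(op)
  }
```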

5. Designing APIs that are simple but powerful

→ Layered design (Core → Mid → Full)
→ Strong typing in Scala

Each of these fixes made DBx more robust and more maintainable.

✅ What DBx Unlocks

→ Fast, accurate semantic search
→ End-to-end RAG for LLM4S
→ Multi-step agent workflows
→ Stable knowledge base for enterprise use
→ Foundation for future multimodal retrieval

LLM4S now has true semantic memory.


Related Pull Requests

  • PR #246 – DBx Core: Initial scaffolding for a provider-agnostic Vector Store layer