BUD
The complete engineering guide to building a self-hosted, multi-platform AI assistant from scratch — no LangChain, no abstractions, every single line explained.
Built by Gopi Trinadh · Learned from Dr. Raj Dandekar of Vizuara
First Principles — What Even Is an AI Agent?
Before writing a single line of code, let’s ground ourselves in the foundational concepts. If you understand these five ideas, you can build anything.
Chatbot vs. Agent — The Core Distinction
A chatbot is a text-in, text-out function. It has no memory, no tools, no autonomy. Ask it today and it won’t remember tomorrow. It can’t check your calendar, file a bug, or search your documents.
An agent operates on a fundamentally different loop. It perceives (receives events from Slack, Discord, Teams, Web), reasons (decides what action to take), acts (calls tools via MCP), and remembers (persists context across sessions). BUD is an agent.
How LLMs Actually Work (The 60-Second Version)
Large Language Models like Claude are neural networks trained on massive text data. Given a sequence of tokens, they predict the next most probable token. At scale — billions of parameters — this simple mechanism produces reasoning, code, tool invocations, and nuanced conversation.
The critical insight for agent builders: LLMs don’t execute code or call APIs. They generate structured text (JSON) that describes which tool to call with what arguments. Your code then executes the tool and feeds the result back. This is the “agent loop.”
The LLM is the brain, not the hands. It decides what to do but cannot act alone. Your code is the body — the hands that call APIs, the eyes that read databases, the memory that persists between conversations. Building an agent means building this body.
What Are Embeddings?
Embeddings convert text into dense numerical vectors — arrays of floating-point numbers that encode semantic meaning. Similar meanings produce vectors that are geometrically close in high-dimensional space. This is how BUD can search conversations by meaning, not just keywords.
BUD uses all-MiniLM-L6-v2 from sentence-transformers — runs locally with zero API cost, produces 384-dimensional vectors, sub-50ms latency. These power the RAG pipeline.
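The "geometrically close" idea reduces to cosine similarity between vectors. As a minimal sketch, here is the measure semantic search relies on, computed over hand-made toy vectors (real embeddings are 384-dimensional floats from the model, not these 3-d examples):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" (illustrative values, not real model output)
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # high: similar meaning
print(cosine_similarity(cat, invoice))  # low: unrelated meaning
```

Semantic search is just this comparison run against every stored vector, which is exactly what the vector database does efficiently.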
Model Context Protocol (MCP)
MCP is what makes BUD’s tool system scalable. Instead of hardcoding every integration, MCP defines a standard protocol (JSON-RPC over stdio) where tool servers announce their capabilities and the AI client discovers them automatically. Think USB for AI tools — plug any MCP server into any MCP client.
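The wire format is ordinary JSON-RPC 2.0. As a rough sketch (message shapes follow the protocol description above; the example tool is hypothetical), a `tools/list` exchange looks like this:

```python
import json

# Client → server: ask what tools exist
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Server → client: announce capabilities (hypothetical example tool)
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "create_issue",
                "description": "Create a GitHub issue",
                "inputSchema": {
                    "type": "object",
                    "properties": {"title": {"type": "string"}},
                    "required": ["title"],
                },
            }
        ]
    },
}

# Over stdio, each message travels as one serialized JSON document
wire = json.dumps(request)
assert json.loads(wire)["method"] == "tools/list"
```

Because every server speaks this same shape, the client never needs server-specific code: it reads `inputSchema` and hands the tool straight to the LLM.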
Why Build From Scratch? (No LangChain, No Frameworks)
BUD deliberately avoids LangChain, LlamaIndex, CrewAI, and every other framework. Not because they’re bad — because the goal is understanding. When you build the agent loop yourself, you understand exactly why retry logic matters, why memory needs layers, why RAG needs chunking strategies. In an interview, you can explain every component because you built every component.
Depth over convenience. Every file in this project is a learning resource. Every design decision is documented with the “why”, not just the “what”. A framework hides complexity. BUD exposes it.
System Architecture — The Complete Blueprint
BUD’s architecture follows five principles: educational-first, minimal folders, graceful degradation, file-based transparency, and single-process async.
Project File Structure
Graceful Degradation: If GitHub MCP is down, BUD still works — it just can’t create issues. If RAG is empty, Claude answers from its own knowledge. If Slack is disconnected, Discord keeps running. Nothing crashes the whole system.
The Agent Loop — Heart of BUD
Everything runs through one function: process_message(). This is the brain of the entire system.
The Core Code — agent/core.py
This is the most important file in the entire project. Let’s walk through the key function:
```python
async def process_message(self, user_msg: str, channel: str, user_id: str) -> str:
    # Step 1: Classify intent — avoid expensive API calls for simple messages
    intent = classify_intent(user_msg)
    if intent == "greeting":
        return random_greeting(user_id)  # $0 — no API call needed

    # Step 2: Assemble context from all memory layers
    memory_context = await self.memory.get_context(user_id, channel)
    rag_results = await self.rag.search(user_msg, top_k=5)
    tools = await self.mcp_manager.get_all_tools()

    # Step 3: Build the system prompt
    system = build_system_prompt(
        soul=self.memory.soul,
        memory=memory_context,
        rag=rag_results,
        tools_hint=summarize_tools(tools)
    )

    # Step 4: Call Claude with full context
    messages = [{"role": "user", "content": user_msg}]
    final_text = "Sorry, I couldn't finish that request."  # fallback if all rounds are used

    # Step 5: The Tool Loop — up to MAX_TOOL_ROUNDS iterations
    for _ in range(5):  # Safety cap: MAX_TOOL_ROUNDS = 5
        response = await self.call_claude(
            system=system,
            messages=messages,
            tools=tools,
            timeout=30.0  # Triple-timeout protection
        )
        # If Claude returns text → we're done!
        if response.stop_reason == "end_turn":
            final_text = response.content[0].text
            break
        # If Claude returns tool_use → execute and loop
        if response.stop_reason == "tool_use":
            tool_call = extract_tool_call(response)
            result = await self.mcp_manager.execute(
                tool_call.name, tool_call.input
            )
            # Feed result back to Claude for the next iteration
            # (tool_result_block wraps tool_call.id + result; helper omitted here)
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": [tool_result_block]})

    # Step 6: Remember this interaction
    await self.memory.update(user_id, user_msg, final_text)
    await self.rag.index_message(user_msg, channel)
    return final_text
```
Triple-timeout protection: Per-API call timeout (30s), per-tool execution timeout (15s), and overall message timeout (60s). If any tier fires, BUD returns a graceful fallback message instead of hanging.
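Those tiers can nest with plain `asyncio.wait_for`. A sketch under stated assumptions: the timeout values match the text, the function names are illustrative stand-ins, and the tool timeout is shrunk so the demo finishes instantly:

```python
import asyncio

API_TIMEOUT, MESSAGE_TIMEOUT = 30.0, 60.0
FALLBACK = "Sorry, that took too long. Please try again."

async def call_api():            # stand-in for the Claude API call
    await asyncio.sleep(0.01)
    return "ok"

async def run_tool():            # stand-in for an MCP tool execution
    await asyncio.sleep(10)      # simulate a hung tool
    return "tool result"

async def handle_message() -> str:
    # Tier 1: each API call gets its own budget (30s in the text)
    reply = await asyncio.wait_for(call_api(), timeout=API_TIMEOUT)
    # Tier 2: each tool execution gets a tighter budget (15s in the text,
    # shrunk to 0.05s here so the demo runs fast)
    try:
        await asyncio.wait_for(run_tool(), timeout=0.05)
    except asyncio.TimeoutError:
        reply = FALLBACK         # degrade gracefully instead of hanging
    return reply

async def main() -> str:
    # Tier 3: the whole message is capped no matter what happens inside
    return await asyncio.wait_for(handle_message(), timeout=MESSAGE_TIMEOUT)

print(asyncio.run(main()))
```

`asyncio.wait_for` cancels the inner task when the deadline fires, so the hung tool does not keep running in the background.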
Intent Classification — Saving 85% on API Costs
Before calling Claude’s API ($3/MTok of input for Sonnet), BUD checks whether the message even needs an LLM. Simple greetings, thank-yous, and status checks are handled by a zero-cost heuristic classifier:
```python
def classify_intent(text: str) -> str:
    lower = text.lower().strip()
    # Greetings — no API needed
    greetings = {"hi", "hello", "hey", "morning", "sup", "yo"}
    if lower in greetings:
        return "greeting"
    # Tool-required tasks — needs Claude + MCP
    tool_markers = ["create issue", "github", "notion", "schedule", "remind"]
    if any(m in lower for m in tool_markers):
        return "tool_task"
    # Complex questions — needs Claude
    if len(text.split()) > 5 or "?" in text:
        return "complex"
    return "simple"  # Use Haiku (cheaper model)
```
This simple heuristic routes ~40% of messages away from the expensive API entirely, and another ~30% to the cheaper Haiku model, achieving that 85% cost reduction — from $15/month to $2/month.
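Back-of-envelope, the savings compound roughly like this. The routing fractions come from the text; the 2x token-budgeting factor is an assumption for illustration, not a measured value:

```python
# Relative input price per token: Sonnet = 1.0, Haiku = 0.25 / 3
SONNET, HAIKU = 1.0, 0.25 / 3.0

# Routing fractions from the text
f_free, f_haiku, f_sonnet = 0.40, 0.30, 0.30

# Assumed token-budgeting factor: paid calls use about half the tokens
budget_factor = 0.5

cost_ratio = budget_factor * (f_haiku * HAIKU + f_sonnet * SONNET)
reduction = 1.0 - cost_ratio
print(f"{reduction:.0%}")  # roughly 84% with these assumptions
```

With these illustrative numbers the three techniques together land in the same ballpark as the quoted 85% figure.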
MCP Servers & Tool System
BUD runs 4 MCP servers offering 24+ tools. Each server is a standalone process communicating via JSON-RPC over stdio.
GitHub MCP
Create issues, search repos, list PRs, get file contents, manage labels. 8 tools.
Notion MCP
Create pages, query databases, search workspace, update blocks. 6 tools.
Slack Tools MCP
Search messages, get channel history, list channels, send scheduled messages. 5 tools.
File Operations MCP
Read, write, list, search files on local filesystem. 5 tools.
How MCP Discovery Works
```python
# BUD starts each MCP server as a subprocess
async def connect_mcp_server(self, server_path: str):
    process = await asyncio.create_subprocess_exec(
        "python", server_path,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE
    )
    # Step 1: Ask the server what tools it has
    request = {"jsonrpc": "2.0", "method": "tools/list", "id": 1}
    tools = await self.send_rpc(process, request)
    # Step 2: Register all discovered tools
    for tool in tools:
        self.registry[tool["name"]] = {
            "process": process,
            "schema": tool["inputSchema"]
        }
    # Now Claude can see and use these tools automatically!
```
Adding a new integration takes minutes: write a new MCP server, drop it in mcp_servers/, and BUD auto-discovers it on next restart. No changes to core code. No new API wrappers. Just plug and play.
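A new server can be very small. Here is a rough sketch of the request-handling core of a toy MCP-style server; the `echo` tool and the dispatch logic are mine for illustration, and a real server would also implement initialization, `tools/call`, and the stdio framing:

```python
import json

# The one tool this toy server announces (hypothetical)
TOOLS = [{
    "name": "echo",
    "description": "Echo the input back",
    "inputSchema": {"type": "object",
                    "properties": {"text": {"type": "string"}}},
}]

def handle_request(req: dict) -> dict:
    """Answer a single JSON-RPC request dict with a response dict."""
    if req.get("method") == "tools/list":
        return {"jsonrpc": "2.0", "id": req["id"],
                "result": {"tools": TOOLS}}
    return {"jsonrpc": "2.0", "id": req.get("id"),
            "error": {"code": -32601, "message": "Method not found"}}

# A real server would loop over stdin lines, e.g.:
#   for line in sys.stdin:
#       print(json.dumps(handle_request(json.loads(line))), flush=True)
resp = handle_request({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
print(resp["result"]["tools"][0]["name"])
```

Everything integration-specific lives inside the handler; the protocol surface BUD talks to stays identical across servers.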
The 5-Layer Memory System
Human memory isn’t one thing — it’s multiple systems. BUD mirrors this with five complementary layers.
Why File-Based Memory?
Every memory file is plain Markdown you can read, edit, and version control. No hidden databases, no opaque vector stores for core identity. If BUD starts giving weird responses, you can open MEMORY.md in a text editor and see exactly what it “knows.” This is radical transparency by design.
```markdown
<!-- data/memory/SOUL.md — BUD's personality -->
## Core Identity
You are BUD, a helpful AI assistant built by Gopi Trinadh.
You are concise, proactive, and technically precise.

## Communication Style
- Use Slack-native formatting (bold, code blocks, lists)
- Be brief unless the user asks for detail
- Always confirm before destructive actions (deleting, overwriting)

## Boundaries
- Never share API keys or tokens in chat
- Redirect medical/legal questions to professionals
- If unsure, say so — don't hallucinate
```
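Because memory is just Markdown on disk, the read/write layer can be a few lines of `pathlib`. A sketch, with the file layout assumed from the example above and the helper names my own:

```python
from pathlib import Path
import tempfile

def load_soul(memory_dir: Path) -> str:
    """SOUL.md is read verbatim into the system prompt."""
    return (memory_dir / "SOUL.md").read_text(encoding="utf-8")

def remember(memory_dir: Path, fact: str) -> None:
    """Append a learned fact to MEMORY.md as a plain bullet."""
    with (memory_dir / "MEMORY.md").open("a", encoding="utf-8") as f:
        f.write(f"- {fact}\n")

# Demo in a throwaway directory
memory_dir = Path(tempfile.mkdtemp())
(memory_dir / "SOUL.md").write_text("## Core Identity\nYou are BUD.\n")
(memory_dir / "MEMORY.md").write_text("## Known Facts\n")
remember(memory_dir, "User prefers short answers")
print(load_soul(memory_dir))
print((memory_dir / "MEMORY.md").read_text())
```

Anything you can do here, you can also do by hand in a text editor, which is the whole point of the design.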
RAG Pipeline — Search by Meaning
Retrieval-Augmented Generation lets BUD search past conversations and documents semantically — not by keywords, but by meaning.
The RAG Engine Code
```python
from time import time
from uuid import uuid4

import chromadb
from sentence_transformers import SentenceTransformer

class RAGEngine:
    def __init__(self):
        # Local model — runs on CPU, zero API cost
        self.embedder = SentenceTransformer("all-MiniLM-L6-v2")
        self.db = chromadb.PersistentClient(path="data/chromadb")
        self.collection = self.db.get_or_create_collection("messages")

    async def index_message(self, text: str, channel: str):
        embedding = self.embedder.encode(text).tolist()
        self.collection.add(
            documents=[text],
            embeddings=[embedding],
            metadatas=[{"channel": channel, "ts": time()}],
            ids=[str(uuid4())]
        )

    async def search(self, query: str, top_k: int = 5) -> list:
        embedding = self.embedder.encode(query).tolist()
        results = self.collection.query(
            query_embeddings=[embedding],
            n_results=top_k
        )
        return results["documents"][0]  # Top matching messages
```
| Metric | Value | Detail |
|---|---|---|
| Embedding Model | all-MiniLM-L6-v2 | 384 dimensions, 22M params |
| Vector DB | ChromaDB (local) | Persistent, file-backed |
| Index Capacity | 1,000+ messages | Scales linearly |
| Retrieval Accuracy | 92% | Top-5 relevance |
| Search Latency | <50ms (P95: 45ms) | On Mac Mini M4 |
| API Cost | $0 | Everything runs locally |
Multi-Platform Integration
BUD serves 5 platforms simultaneously from a single async process using asyncio.gather() — no microservices needed.
Slack
Socket Mode — persistent WebSocket. No public URL needed. Reacts to mentions, DMs, and threads.
Discord
Discord.py gateway — real-time events. Supports slash commands, mentions, and DMs.
Microsoft Teams
Bot Framework SDK — works inside Teams channels and 1:1 chats. Enterprise-ready.
Google Chat
Pub/Sub or HTTP handler. Works in Google Workspace environments.
Web UI
FastAPI + WebSocket — browser-based chat with real-time streaming and voice I/O.
The Single-Process Architecture
```python
# main.py — BUD's entry point
async def main():
    agent = await create_agent()  # Initialize core + MCP + RAG
    # Launch ALL platforms simultaneously
    await asyncio.gather(
        start_slack(agent),        # Slack Socket Mode
        start_discord(agent),      # Discord gateway
        start_teams(agent),        # Teams bot framework
        start_google_chat(agent),  # Google Chat handler
        start_web(agent),          # FastAPI + WebSocket
        start_heartbeat(agent),    # Scheduler for cron tasks
        return_exceptions=True     # Crash isolation!
    )

asyncio.run(main())
```
return_exceptions=True is the secret weapon. If Discord crashes, Slack keeps running. If Teams throws an error, the web UI is unaffected. Each platform is an independent coroutine. This is production resilience without Kubernetes complexity.
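The crash-isolation behaviour is easy to see in isolation. In this sketch one coroutine raises, yet the other still completes, and the exception comes back as a value in the results list instead of tearing down the event loop:

```python
import asyncio

async def discord_sim():
    raise RuntimeError("gateway dropped")  # simulated platform crash

async def slack_sim():
    await asyncio.sleep(0.01)
    return "slack still running"

async def main():
    return await asyncio.gather(
        discord_sim(),
        slack_sim(),
        return_exceptions=True,  # exceptions become return values
    )

results = asyncio.run(main())
print(results)
```

Without `return_exceptions=True`, the first exception would propagate out of `gather` and cancel the sibling coroutines.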
Code Deep-Dive — Selected Components
Let’s examine the trickiest parts: the streaming WebSocket, the memory manager, and the system prompt builder.
Real-Time Token Streaming (WebSocket)
The web UI doesn’t wait for the full response — it streams tokens as Claude generates them, creating a “typing” effect:
```python
@app.websocket("/ws")
async def websocket_endpoint(ws: WebSocket):
    await ws.accept()
    while True:
        user_msg = await ws.receive_text()
        # Stream response token-by-token
        async with client.messages.stream(
            model="claude-sonnet-4-20250514",
            messages=[{"role": "user", "content": user_msg}],
            system=system_prompt,
            max_tokens=4096
        ) as stream:
            async for text in stream.text_stream:
                await ws.send_json({
                    "type": "token",
                    "content": text
                })
        await ws.send_json({"type": "done"})
```
System Prompt Builder — Dynamic Context Assembly
```python
def build_system_prompt(soul, memory, rag, tools_hint) -> str:
    sections = [
        soul,                                 # SOUL.md — who you are
        f"\n## Current Knowledge\n{memory}",  # MEMORY.md
    ]
    if rag:
        sections.append(
            "\n## Relevant Context (from past conversations)\n"
            + "\n".join(f"- {r}" for r in rag)
        )
    if tools_hint:
        sections.append(
            f"\n## Available Tools\n{tools_hint}"
        )
    return "\n".join(sections)
```
Deploying BUD on a Mac Mini
The Mac Mini M4 is the recommended hardware: silent, 12W power draw, developer-friendly, and powerful enough for all BUD workloads.
Mac Mini M4 — BUD’s Recommended Home
Silent operation. 12W idle. Always-on AI assistant server.
Why Dedicated Hardware Over Cloud?
| Factor | Cloud (AWS/GCP) | Mac Mini |
|---|---|---|
| Monthly Cost | $15–50/month recurring | $0 after purchase |
| Latency | Variable (region-dependent) | Consistent, local |
| Data Privacy | Data leaves your network | Everything stays home |
| Control | Provider-limited | Full root access |
| Setup Time | Minutes | ~2 hours (one-time) |
Step-by-Step Mac Mini Setup
Install Developer Tools
```bash
# Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install Python 3.12, Node.js, Git
brew install python@3.12 node git
```
Clone & Configure BUD
```bash
git clone https://github.com/GOPITRINADH3561/Project_OpenClaw.git
cd Project_OpenClaw

# Create virtual environment
python3.12 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
nano .env  # Add ANTHROPIC_API_KEY, SLACK_BOT_TOKEN, etc.
```
Test Run
```bash
python main.py
# You should see:
# ✅ Agent core initialized
# ✅ MCP servers connected (24 tools)
# ✅ RAG engine loaded (all-MiniLM-L6-v2)
# ✅ Slack connected
# ✅ Discord connected
# ✅ Web UI at http://localhost:8080
```
Auto-Start on Boot with launchd
```xml
<!-- ~/Library/LaunchAgents/com.bud.assistant.plist -->
<?xml version="1.0"?>
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.bud.assistant</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/you/Project_OpenClaw/.venv/bin/python</string>
        <string>/Users/you/Project_OpenClaw/main.py</string>
    </array>
    <key>RunAtLoad</key> <true/>
    <key>KeepAlive</key> <true/>
</dict>
</plist>
```
```bash
# Load the service — BUD starts on every boot
launchctl load ~/Library/LaunchAgents/com.bud.assistant.plist
```
Remote Access with Tailscale
```bash
# Install Tailscale — access BUD from anywhere
brew install tailscale
sudo tailscale up

# Now access BUD's Web UI from any device:
# http://100.x.y.z:8080
```
Benchmarked with 5 platforms active and 1,000+ indexed messages:
| Operation | Average | P95 |
|---|---|---|
| Simple message (no tools) | 1.5s | 3.0s |
| Single tool call | 3.5s | 6.0s |
| Multi-tool complex task | 6.0s | 10.0s |
| RAG retrieval (5 results) | 25ms | 45ms |
| Intent classification | <1ms | 1ms |
| Bot startup (warm) | 3s | 6s |
| Idle RAM usage | 180 MB | — |
Cost Analysis & Performance
BUD delivers 85% cost reduction versus naive API usage, and runs cheaper than any SaaS alternative.
Monthly Cost Comparison
How We Achieved 85% Reduction
Intent Classification
~40% of messages (greetings, thanks) handled with zero API calls.
Two-Model Routing
Simple questions → Haiku ($0.25/MTok). Complex → Sonnet ($3/MTok). 30% savings.
Token Budgeting
Dynamic max_tokens based on question complexity. Short question = short budget.
Conditional RAG
Only query ChromaDB when the message likely needs historical context. Saves embedding compute.
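Token budgeting, for instance, can be a one-liner policy keyed on rough question complexity. A sketch; the thresholds here are illustrative, not BUD's actual values:

```python
def max_tokens_for(text: str) -> int:
    """Pick a max_tokens budget from rough question complexity."""
    words = len(text.split())
    if words <= 10:
        return 256   # short question, short answer
    if words <= 50:
        return 1024  # medium request
    return 4096      # long, complex request gets the full budget

print(max_tokens_for("What time is it?"))
print(max_tokens_for(" ".join(["word"] * 30)))
```

Since output tokens are the expensive side of most API pricing, capping the budget on short questions trims cost without visibly changing answers.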
Usage Tiers
| Usage | Messages/Day | Monthly API | Annual (incl. hardware) |
|---|---|---|---|
| Light (personal) | 10 | $1.80 | $28/yr (after Year 1) |
| Medium (small team) | 50 | $9.60 | $121/yr |
| Heavy (active team) | 200 | $42.00 | $510/yr |
Future Impacts & What We’re Building Next
BUD isn’t finished — it’s a living platform. Here’s the roadmap of what’s actively being built and deployed soon.
Multi-Platform Agent Core
5-platform support, 24+ MCP tools, 5-layer memory, RAG pipeline, real-time streaming, voice I/O. Production-ready.
Local LLM Fallback (Ollama)
When Claude API is unreachable or for sensitive conversations, BUD will fall back to a local model via Ollama — zero data leaves your machine.
Computer Vision Pipeline
Upload images to BUD and get analysis — receipts, whiteboards, documents. Powered by Claude’s multimodal capabilities.
Docker Compose One-Click
Single docker compose up to run BUD + all MCP servers + ChromaDB + web UI. Zero manual setup.
HuggingFace Spaces Demo
Live web demo on HuggingFace where anyone can try BUD’s agent capabilities without installing anything.
Multi-Agent Orchestration
BUD spawning specialized sub-agents for complex tasks — a coding agent, a research agent, a writing agent — all coordinated by the core.
The Bigger Picture — Why This Matters
BUD represents a fundamental shift in how we interact with AI. Instead of visiting a website to chat with a model, the model comes to you — inside the tools you already use. The AI becomes ambient infrastructure, always available across every communication channel.
We’re building toward a world where every team and every individual has a personal AI that knows their context (memory), can take action (tools), lives where they work (multi-platform), and keeps getting smarter (RAG indexing). BUD is the open-source blueprint for that future.
Development Timeline
Foundation — First Principles Learning
Studied AI agent architecture from scratch. Completed Dr. Raj Dandekar’s Vizuara Labs coursework on MCP, embeddings, and tool-use patterns.
Core Build — Agent + MCP + RAG + Memory
Built the complete agent core from first principles. Implemented all 4 MCP servers, the 5-layer memory system, and RAG pipeline. Integrated 5 chat platforms.
Docker + HuggingFace + CI/CD
Containerize everything. Deploy live demo. GitHub Actions for automated testing and deployment.
Local LLM + Vision + Multi-Agent
Ollama fallback for offline mode. Image understanding. Sub-agent coordination for complex workflows.
Plugin Marketplace + Community
Open MCP server marketplace. Community-contributed tools. Self-updating capability.
How to Contribute
BUD is fully open source. The entire codebase, documentation, and this blog are available for anyone to learn from, fork, and extend:
HuggingFace
Live demo space coming soon — try BUD without installing anything.
Full Documentation
84-page technical blog, annotated README, inline code comments on every file.
Technology Stack at a Glance
| Layer | Technology | Why This Choice |
|---|---|---|
| AI Engine | Anthropic Claude (Sonnet + Haiku) | Best tool-use, long context, streaming |
| Tool Protocol | Model Context Protocol (MCP) | Standard, extensible, discoverable |
| Embeddings | all-MiniLM-L6-v2 (local) | Zero cost, fast, good accuracy |
| Vector DB | ChromaDB (persistent) | Simple, local, file-backed |
| Web Framework | FastAPI + WebSocket | Async-native, fast, great DX |
| Slack | Socket Mode (slack-bolt) | No public URL needed |
| Discord | discord.py | Mature, full-featured |
| Teams | Bot Framework SDK | Official Microsoft integration |
| Scheduling | APScheduler + SQLite | Persistent jobs, survives restarts |
| Memory | Markdown files + deque | Transparent, editable, version-controllable |
| Language | Python 3.12 (100% async) | Ecosystem, readability, async support |
| Deployment | Mac Mini M4 + launchd | Silent, 12W, always-on |
“Build everything from scratch. Understand every layer. Then — and only then — you can deploy with confidence and explain every decision in an interview.”
— Project BUD Design Manifesto
