BUD
The complete engineering guide to building a self-hosted, multi-platform AI assistant from scratch — no LangChain, no abstractions, every single line explained.
Built by Gopi Trinadh · Learned from Dr. Raj Dandekar of Vizuara
First Principles — What Even Is an AI Agent?
Before writing a single line of code, let’s ground ourselves in the foundational concepts. If you understand these five ideas, you can build anything.
Chatbot vs. Agent — The Core Distinction
A chatbot is a text-in, text-out function. It has no memory, no tools, no autonomy. Ask it today and it won’t remember tomorrow. It can’t check your calendar, file a bug, or search your documents.
An agent operates on a fundamentally different loop. It perceives (receives events from Slack, Discord, Teams, Web), reasons (decides what action to take), acts (calls tools via MCP), and remembers (persists context across sessions). BUD is an agent.
How LLMs Actually Work (The 60-Second Version)
Large Language Models like Claude are neural networks trained on massive text data. Given a sequence of tokens, they predict the next most probable token. At scale — billions of parameters — this simple mechanism produces reasoning, code, tool invocations, and nuanced conversation.
The critical insight for agent builders: LLMs don’t execute code or call APIs. They generate structured text (JSON) that describes which tool to call with what arguments. Your code then executes the tool and feeds the result back. This is the “agent loop.”
The LLM is the brain, not the hands. It decides what to do but cannot act alone. Your code is the body — the hands that call APIs, the eyes that read databases, the memory that persists between conversations. Building an agent means building this body.
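The loop described above can be sketched in a few lines. This toy version is illustrative only: `fake_model` stands in for the LLM (it hardcodes one tool call, then answers), and the `"tool"` role is a simplification of the real API's message format.

```python
# Toy agent loop: perceive -> reason -> act -> remember, with a stubbed model.
def fake_model(messages):
    # A real LLM returns structured text describing a tool call; this stub
    # asks for one tool, then answers once it sees the tool's result.
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_use", "name": "get_time", "input": {}}
    return {"type": "text", "text": "It is 12:00."}

TOOLS = {"get_time": lambda _args: "12:00"}  # the "hands": code that acts

def agent_loop(user_msg: str, max_rounds: int = 5) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_rounds):            # safety cap, as in BUD's core
        reply = fake_model(messages)
        if reply["type"] == "text":        # model is done reasoning
            return reply["text"]
        result = TOOLS[reply["name"]](reply["input"])         # act
        messages.append({"role": "tool", "content": result})  # feed back
    return "Tool-round limit reached."

print(agent_loop("What time is it?"))  # → It is 12:00.
```

The model never executes `get_time` itself; it only names it. The loop's job is to run the tool and feed the result back until the model produces plain text.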
What Are Embeddings?
Embeddings convert text into dense numerical vectors — arrays of floating-point numbers that encode semantic meaning. Similar meanings produce vectors that are geometrically close in high-dimensional space. This is how BUD can search conversations by meaning, not just keywords.
BUD uses all-MiniLM-L6-v2 from sentence-transformers: it runs locally at zero API cost, produces 384-dimensional vectors, and returns results in under 50ms. These vectors power the RAG pipeline.
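A tiny worked example with hand-made 4-dimensional vectors (real all-MiniLM-L6-v2 embeddings have 384 dimensions, but the geometry is identical):

```python
import math

def cosine(a, b):
    # Cosine similarity: 1.0 for identical directions, near 0 for unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings": similar meanings get geometrically close vectors
king = [0.9, 0.8, 0.1, 0.0]
queen = [0.85, 0.82, 0.12, 0.05]
banana = [0.05, 0.1, 0.9, 0.8]

assert cosine(king, queen) > cosine(king, banana)
```

Semantic search is just this comparison at scale: embed the query, then return the stored vectors with the highest similarity.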
Model Context Protocol (MCP)
MCP is what makes BUD’s tool system scalable. Instead of hardcoding every integration, MCP defines a standard protocol (JSON-RPC over stdio) where tool servers announce their capabilities and the AI client discovers them automatically. Think USB for AI tools — plug any MCP server into any MCP client.
Why Build From Scratch? (No LangChain, No Frameworks)
BUD deliberately avoids LangChain, LlamaIndex, CrewAI, and every other framework. Not because they’re bad — because the goal is understanding. When you build the agent loop yourself, you understand exactly why retry logic matters, why memory needs layers, why RAG needs chunking strategies. In an interview, you can explain every component because you built every component.
Depth over convenience. Every file in this project is a learning resource. Every design decision is documented with the “why”, not just the “what”. A framework hides complexity. BUD exposes it.
System Architecture — The Complete Blueprint
BUD’s architecture follows five principles: educational-first, minimal folders, graceful degradation, file-based transparency, and single-process async.
Project File Structure
Graceful Degradation: If GitHub MCP is down, BUD still works — it just can’t create issues. If RAG is empty, Claude answers from its own knowledge. If Slack is disconnected, Discord keeps running. Nothing crashes the whole system.
The Agent Loop — Heart of BUD
Everything runs through one function: process_message(). This is the brain of the entire system.
The Core Code — agent/core.py
This is the most important file in the entire project. Let’s walk through the key function:
<span class="hl-kw">async def</span> <span class="hl-fn">process_message</span>(self, user_msg: <span class="hl-type">str</span>, channel: <span class="hl-type">str</span>, user_id: <span class="hl-type">str</span>) -> <span class="hl-type">str</span>:
<span class="hl-cm"># Step 1: Classify intent — avoid expensive API calls for simple messages</span>
intent = <span class="hl-fn">classify_intent</span>(user_msg)
<span class="hl-kw">if</span> intent == <span class="hl-str">"greeting"</span>:
<span class="hl-kw">return</span> <span class="hl-fn">random_greeting</span>(user_id) <span class="hl-cm"># $0 — no API call needed</span>
<span class="hl-cm"># Step 2: Assemble context from all memory layers</span>
memory_context = <span class="hl-kw">await</span> self.memory.<span class="hl-fn">get_context</span>(user_id, channel)
rag_results = <span class="hl-kw">await</span> self.rag.<span class="hl-fn">search</span>(user_msg, top_k=<span class="hl-num">5</span>)
tools = <span class="hl-kw">await</span> self.mcp_manager.<span class="hl-fn">get_all_tools</span>()
<span class="hl-cm"># Step 3: Build the system prompt</span>
system = <span class="hl-fn">build_system_prompt</span>(
soul=self.memory.soul,
memory=memory_context,
rag=rag_results,
tools_hint=<span class="hl-fn">summarize_tools</span>(tools)
)
<span class="hl-cm"># Step 4: Call Claude with full context</span>
messages = [{<span class="hl-str">"role"</span>: <span class="hl-str">"user"</span>, <span class="hl-str">"content"</span>: user_msg}]
<span class="hl-cm"># Step 5: The Tool Loop — up to MAX_TOOL_ROUNDS iterations</span>
<span class="hl-kw">for</span> round <span class="hl-kw">in</span> <span class="hl-fn">range</span>(<span class="hl-num">5</span>): <span class="hl-cm"># Safety cap: max 5 rounds</span>
response = <span class="hl-kw">await</span> self.<span class="hl-fn">call_claude</span>(
system=system,
messages=messages,
tools=tools,
timeout=<span class="hl-num">30.0</span> <span class="hl-cm"># Triple-timeout protection</span>
)
<span class="hl-cm"># If Claude returns text → we're done!</span>
<span class="hl-kw">if</span> response.stop_reason == <span class="hl-str">"end_turn"</span>:
final_text = response.content[<span class="hl-num">0</span>].text
<span class="hl-kw">break</span>
<span class="hl-cm"># If Claude returns tool_use → execute and loop</span>
<span class="hl-kw">if</span> response.stop_reason == <span class="hl-str">"tool_use"</span>:
tool_call = <span class="hl-fn">extract_tool_call</span>(response)
result = <span class="hl-kw">await</span> self.mcp_manager.<span class="hl-fn">execute</span>(
tool_call.name, tool_call.input
)
<span class="hl-cm"># Feed result back to Claude for next iteration</span>
messages.<span class="hl-fn">append</span>({<span class="hl-str">"role"</span>: <span class="hl-str">"assistant"</span>, <span class="hl-str">"content"</span>: response.content})
messages.<span class="hl-fn">append</span>({<span class="hl-str">"role"</span>: <span class="hl-str">"user"</span>, <span class="hl-str">"content"</span>: [tool_result_block]})
<span class="hl-cm"># Step 6: Remember this interaction</span>
<span class="hl-kw">await</span> self.memory.<span class="hl-fn">update</span>(user_id, user_msg, final_text)
<span class="hl-kw">await</span> self.rag.<span class="hl-fn">index_message</span>(user_msg, channel)
<span class="hl-kw">return</span> final_text
Triple-timeout protection: Per-API call timeout (30s), per-tool execution timeout (15s), and overall message timeout (60s). If any tier fires, BUD returns a graceful fallback message instead of hanging.
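The tiering can be sketched with nested `asyncio.wait_for` calls. The timeouts below are scaled down so the example runs instantly; the structure (a strict per-tool timeout inside a looser overall timeout, with a fallback message on expiry) is the point, not the exact numbers.

```python
import asyncio

async def call_tool(delay: float) -> str:
    await asyncio.sleep(delay)          # stands in for a real MCP tool call
    return "tool result"

async def _pipeline(tool_delay: float) -> str:
    # Inner tier: per-tool timeout, stricter than the overall one
    return await asyncio.wait_for(call_tool(tool_delay), timeout=0.2)

async def handle_message(tool_delay: float) -> str:
    try:
        # Outer tier: overall message timeout
        return await asyncio.wait_for(_pipeline(tool_delay), timeout=0.5)
    except asyncio.TimeoutError:
        # Any tier firing lands here: graceful fallback instead of hanging
        return "Sorry, that took too long. Please try again."

print(asyncio.run(handle_message(0.05)))  # fast tool → "tool result"
print(asyncio.run(handle_message(1.0)))   # slow tool → fallback message
```

Because `wait_for` cancels the awaited task on expiry, a stuck tool cannot leak a running coroutine either.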
Intent Classification — Saving 85% on API Costs
Before calling Claude’s API ($3/MTok for Sonnet), BUD checks whether the message even needs an LLM. Simple greetings, thank-yous, and status checks are handled by a zero-cost heuristic classifier:
<span class="hl-kw">def</span> <span class="hl-fn">classify_intent</span>(text: <span class="hl-type">str</span>) -> <span class="hl-type">str</span>:
lower = text.<span class="hl-fn">lower</span>().<span class="hl-fn">strip</span>()
<span class="hl-cm"># Greetings — no API needed</span>
greetings = {<span class="hl-str">"hi"</span>, <span class="hl-str">"hello"</span>, <span class="hl-str">"hey"</span>, <span class="hl-str">"morning"</span>, <span class="hl-str">"sup"</span>, <span class="hl-str">"yo"</span>}
<span class="hl-kw">if</span> lower <span class="hl-kw">in</span> greetings:
<span class="hl-kw">return</span> <span class="hl-str">"greeting"</span>
<span class="hl-cm"># Tool-required tasks — needs Claude + MCP</span>
tool_markers = [<span class="hl-str">"create issue"</span>, <span class="hl-str">"github"</span>, <span class="hl-str">"notion"</span>, <span class="hl-str">"schedule"</span>, <span class="hl-str">"remind"</span>]
<span class="hl-kw">if</span> <span class="hl-fn">any</span>(m <span class="hl-kw">in</span> lower <span class="hl-kw">for</span> m <span class="hl-kw">in</span> tool_markers):
<span class="hl-kw">return</span> <span class="hl-str">"tool_task"</span>
<span class="hl-cm"># Complex questions — needs Claude</span>
<span class="hl-kw">if</span> <span class="hl-fn">len</span>(text.<span class="hl-fn">split</span>()) > <span class="hl-num">5</span> <span class="hl-kw">or</span> <span class="hl-str">"?"</span> <span class="hl-kw">in</span> text:
<span class="hl-kw">return</span> <span class="hl-str">"complex"</span>
<span class="hl-kw">return</span> <span class="hl-str">"simple"</span> <span class="hl-cm"># Use Haiku (cheaper model)</span>
This simple heuristic routes ~40% of messages away from the expensive API entirely, and another ~30% to the cheaper Haiku model, achieving that 85% cost reduction — from $15/month to $2/month.
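One plausible way those numbers compose (the routing fractions and the token-budgeting factor below are assumptions for illustration, not measured values):

```python
SONNET, HAIKU = 3.00, 0.25   # $/MTok input prices, as cited in the text
free, haiku, sonnet = 0.40, 0.30, 0.30   # assumed share of messages per route
budget_factor = 0.5          # assumed saving from dynamic token budgets

# Cost relative to sending every message to Sonnet with no budgeting
relative_cost = haiku * budget_factor * (HAIKU / SONNET) + sonnet * budget_factor
print(f"~{1 - relative_cost:.0%} cheaper than all-Sonnet")  # ~84% cheaper
```

The exact percentage depends on your traffic mix; the structural insight is that the cheapest API call is the one you never make.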
MCP Servers & Tool System
BUD runs 4 MCP servers offering 24+ tools. Each server is a standalone process communicating via JSON-RPC over stdio.
GitHub MCP
Create issues, search repos, list PRs, get file contents, manage labels. 8 tools.
Notion MCP
Create pages, query databases, search workspace, update blocks. 6 tools.
Slack Tools MCP
Search messages, get channel history, list channels, send scheduled messages. 5 tools.
File Operations MCP
Read, write, list, search files on local filesystem. 5 tools.
How MCP Discovery Works
<span class="hl-cm"># BUD starts each MCP server as a subprocess</span>
<span class="hl-kw">async def</span> <span class="hl-fn">connect_mcp_server</span>(self, server_path: <span class="hl-type">str</span>):
process = <span class="hl-kw">await</span> asyncio.<span class="hl-fn">create_subprocess_exec</span>(
<span class="hl-str">"python"</span>, server_path,
stdin=asyncio.subprocess.PIPE,
stdout=asyncio.subprocess.PIPE
)
<span class="hl-cm"># Step 1: Ask the server what tools it has</span>
request = {<span class="hl-str">"jsonrpc"</span>: <span class="hl-str">"2.0"</span>, <span class="hl-str">"method"</span>: <span class="hl-str">"tools/list"</span>, <span class="hl-str">"id"</span>: <span class="hl-num">1</span>}
tools = <span class="hl-kw">await</span> self.<span class="hl-fn">send_rpc</span>(process, request)
<span class="hl-cm"># Step 2: Register all discovered tools</span>
<span class="hl-kw">for</span> tool <span class="hl-kw">in</span> tools:
self.registry[tool[<span class="hl-str">"name"</span>]] = {
<span class="hl-str">"process"</span>: process,
<span class="hl-str">"schema"</span>: tool[<span class="hl-str">"inputSchema"</span>]
}
<span class="hl-cm"># Now Claude can see and use these tools automatically!</span>
Adding a new integration takes minutes: write a new MCP server, drop it in mcp_servers/, and BUD auto-discovers it on next restart. No changes to core code. No new API wrappers. Just plug and play.
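A new server can be tiny. The sketch below is a hypothetical, stripped-down stdio server with a single `echo` tool; the real MCP protocol also includes an initialization handshake and richer result envelopes, omitted here for brevity.

```python
import json
import sys

# Hypothetical one-tool server: advertises "echo", then executes it on request
TOOLS = [{
    "name": "echo",
    "description": "Return the input text unchanged",
    "inputSchema": {"type": "object",
                    "properties": {"text": {"type": "string"}}},
}]

def handle_request(req: dict) -> dict:
    if req["method"] == "tools/list":
        return {"jsonrpc": "2.0", "id": req["id"], "result": {"tools": TOOLS}}
    if req["method"] == "tools/call":     # only one tool, so no name dispatch
        text = req["params"]["arguments"]["text"]
        return {"jsonrpc": "2.0", "id": req["id"], "result": text}
    return {"jsonrpc": "2.0", "id": req["id"],
            "error": {"code": -32601, "message": "unknown method"}}

# A real server would then loop over stdin:
#   for line in sys.stdin:
#       print(json.dumps(handle_request(json.loads(line))), flush=True)
```

The client never needs to know what `echo` does ahead of time; `tools/list` tells it everything, which is exactly what makes discovery automatic.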
The 5-Layer Memory System
Human memory isn’t one thing — it’s multiple systems. BUD mirrors this with five complementary layers.
Why File-Based Memory?
Every memory file is plain Markdown you can read, edit, and version control. No hidden databases, no opaque vector stores for core identity. If BUD starts giving weird responses, you can open MEMORY.md in a text editor and see exactly what it “knows.” This is radical transparency by design.
<span class="hl-cm"># data/memory/SOUL.md — BUD's personality</span>
<span class="hl-str">## Core Identity</span>
You are BUD, a helpful AI assistant built by Gopi Trinadh.
You are concise, proactive, and technically precise.
<span class="hl-str">## Communication Style</span>
- Use Slack-native formatting (bold, code blocks, lists)
- Be brief unless the user asks for detail
- Always confirm before destructive actions (deleting, overwriting)
<span class="hl-str">## Boundaries</span>
- Never share API keys or tokens in chat
- Redirect medical/legal questions to professionals
- If unsure, say so — don't hallucinate
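Loading these files is deliberately boring. A sketch (the file names follow the layout described above; a missing file degrades to an empty section rather than crashing):

```python
from pathlib import Path

def load_memory(base: Path) -> dict:
    # Each layer is just a Markdown file you can open and edit by hand
    files = {"soul": "SOUL.md", "memory": "MEMORY.md"}
    out = {}
    for key, name in files.items():
        path = base / name
        # Graceful degradation: a missing file becomes an empty section
        out[key] = path.read_text() if path.exists() else ""
    return out

mem = load_memory(Path("data/memory"))   # edit SOUL.md, reload, done
```

No migrations, no schema, no client library. `git diff` on the memory directory shows you exactly how BUD's knowledge changed and when.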
RAG Pipeline — Search by Meaning
Retrieval-Augmented Generation lets BUD search past conversations and documents semantically — not by keywords, but by meaning.
The RAG Engine Code
<span class="hl-kw">from</span> sentence_transformers <span class="hl-kw">import</span> SentenceTransformer
<span class="hl-kw">import</span> chromadb
<span class="hl-kw">class</span> <span class="hl-type">RAGEngine</span>:
<span class="hl-kw">def</span> <span class="hl-fn">__init__</span>(self):
<span class="hl-cm"># Local model — runs on CPU, zero API cost</span>
self.embedder = <span class="hl-fn">SentenceTransformer</span>(<span class="hl-str">"all-MiniLM-L6-v2"</span>)
self.db = chromadb.<span class="hl-fn">PersistentClient</span>(path=<span class="hl-str">"data/chromadb"</span>)
self.collection = self.db.<span class="hl-fn">get_or_create_collection</span>(<span class="hl-str">"messages"</span>)
<span class="hl-kw">async def</span> <span class="hl-fn">index_message</span>(self, text: <span class="hl-type">str</span>, channel: <span class="hl-type">str</span>):
embedding = self.embedder.<span class="hl-fn">encode</span>(text).<span class="hl-fn">tolist</span>()
self.collection.<span class="hl-fn">add</span>(
documents=[text],
embeddings=[embedding],
metadatas=[{<span class="hl-str">"channel"</span>: channel, <span class="hl-str">"ts"</span>: <span class="hl-fn">time</span>()}],
ids=[<span class="hl-fn">str</span>(<span class="hl-fn">uuid4</span>())]
)
<span class="hl-kw">async def</span> <span class="hl-fn">search</span>(self, query: <span class="hl-type">str</span>, top_k: <span class="hl-type">int</span> = <span class="hl-num">5</span>) -> <span class="hl-type">list</span>:
embedding = self.embedder.<span class="hl-fn">encode</span>(query).<span class="hl-fn">tolist</span>()
results = self.collection.<span class="hl-fn">query</span>(
query_embeddings=[embedding],
n_results=top_k
)
<span class="hl-kw">return</span> results[<span class="hl-str">"documents"</span>][<span class="hl-num">0</span>] <span class="hl-cm"># Top matching messages</span>
| Metric | Value | Detail |
|---|---|---|
| Embedding Model | all-MiniLM-L6-v2 | 384 dimensions, 22M params |
| Vector DB | ChromaDB (local) | Persistent, file-backed |
| Index Capacity | 1,000+ messages | Scales linearly |
| Retrieval Accuracy | 92% | Top-5 relevance |
| Search Latency | <50ms (P95: 45ms) | On Mac Mini M4 |
| API Cost | $0 | Everything runs locally |
Multi-Platform Integration
BUD serves 5 platforms simultaneously from a single async process using asyncio.gather() — no microservices needed.
Slack
Socket Mode — persistent WebSocket. No public URL needed. Reacts to mentions, DMs, and threads.
Discord
Discord.py gateway — real-time events. Supports slash commands, mentions, and DMs.
Microsoft Teams
Bot Framework SDK — works inside Teams channels and 1:1 chats. Enterprise-ready.
Google Chat
Pub/Sub or HTTP handler. Works in Google Workspace environments.
Web UI
FastAPI + WebSocket — browser-based chat with real-time streaming and voice I/O.
The Single-Process Architecture
<span class="hl-cm"># main.py — BUD's entry point</span>
<span class="hl-kw">async def</span> <span class="hl-fn">main</span>():
agent = <span class="hl-kw">await</span> <span class="hl-fn">create_agent</span>() <span class="hl-cm"># Initialize core + MCP + RAG</span>
<span class="hl-cm"># Launch ALL platforms simultaneously</span>
<span class="hl-kw">await</span> asyncio.<span class="hl-fn">gather</span>(
<span class="hl-fn">start_slack</span>(agent), <span class="hl-cm"># Slack Socket Mode</span>
<span class="hl-fn">start_discord</span>(agent), <span class="hl-cm"># Discord gateway</span>
<span class="hl-fn">start_teams</span>(agent), <span class="hl-cm"># Teams bot framework</span>
<span class="hl-fn">start_google_chat</span>(agent), <span class="hl-cm"># Google Chat handler</span>
<span class="hl-fn">start_web</span>(agent), <span class="hl-cm"># FastAPI + WebSocket</span>
<span class="hl-fn">start_heartbeat</span>(agent), <span class="hl-cm"># Scheduler for cron tasks</span>
return_exceptions=<span class="hl-kw">True</span> <span class="hl-cm"># Crash isolation!</span>
)
asyncio.<span class="hl-fn">run</span>(<span class="hl-fn">main</span>())
return_exceptions=True is the secret weapon. If Discord crashes, Slack keeps running. If Teams throws an error, the web UI is unaffected. Each platform is an independent coroutine. This is production resilience without Kubernetes complexity.
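The isolation is easy to see in miniature. Here one coroutine raises while the others finish; with `return_exceptions=True`, `gather` hands the exception back as a value instead of cancelling its siblings:

```python
import asyncio

async def platform(name: str) -> str:
    return f"{name} running"

async def crashing_platform() -> str:
    raise RuntimeError("gateway died")

async def main():
    # The failure is returned in-place; the other coroutines still complete
    return await asyncio.gather(
        platform("slack"),
        crashing_platform(),
        platform("web"),
        return_exceptions=True,
    )

results = asyncio.run(main())
print(results)  # ['slack running', RuntimeError('gateway died'), 'web running']
```

Without the flag, the first exception would propagate and cancel the remaining coroutines, taking every platform down together.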
Code Deep-Dive — Selected Components
Let’s examine the trickiest parts: the streaming WebSocket, the memory manager, and the system prompt builder.
Real-Time Token Streaming (WebSocket)
The web UI doesn’t wait for the full response — it streams tokens as Claude generates them, creating a “typing” effect:
<span class="hl-dec">@app.websocket</span>(<span class="hl-str">"/ws"</span>)
<span class="hl-kw">async def</span> <span class="hl-fn">websocket_endpoint</span>(ws: WebSocket):
<span class="hl-kw">await</span> ws.<span class="hl-fn">accept</span>()
<span class="hl-kw">while</span> <span class="hl-kw">True</span>:
user_msg = <span class="hl-kw">await</span> ws.<span class="hl-fn">receive_text</span>()
<span class="hl-cm"># Stream response token-by-token</span>
<span class="hl-kw">async with</span> client.messages.<span class="hl-fn">stream</span>(
model=<span class="hl-str">"claude-sonnet-4-20250514"</span>,
messages=[{<span class="hl-str">"role"</span>: <span class="hl-str">"user"</span>, <span class="hl-str">"content"</span>: user_msg}],
system=system_prompt,
max_tokens=<span class="hl-num">4096</span>
) <span class="hl-kw">as</span> stream:
<span class="hl-kw">async for</span> text <span class="hl-kw">in</span> stream.<span class="hl-fn">text_stream</span>:
<span class="hl-kw">await</span> ws.<span class="hl-fn">send_json</span>({
<span class="hl-str">"type"</span>: <span class="hl-str">"token"</span>,
<span class="hl-str">"content"</span>: text
})
<span class="hl-kw">await</span> ws.<span class="hl-fn">send_json</span>({<span class="hl-str">"type"</span>: <span class="hl-str">"done"</span>})
System Prompt Builder — Dynamic Context Assembly
<span class="hl-kw">def</span> <span class="hl-fn">build_system_prompt</span>(soul, memory, rag, tools_hint) -> <span class="hl-type">str</span>:
sections = [
soul, <span class="hl-cm"># SOUL.md — who you are</span>
<span class="hl-str">f"\n## Current Knowledge\n{memory}"</span>, <span class="hl-cm"># MEMORY.md</span>
]
<span class="hl-kw">if</span> rag:
sections.<span class="hl-fn">append</span>(
<span class="hl-str">f"\n## Relevant Context (from past conversations)\n"</span>
+ <span class="hl-str">"\n"</span>.<span class="hl-fn">join</span>(<span class="hl-str">f"- {r}"</span> <span class="hl-kw">for</span> r <span class="hl-kw">in</span> rag)
)
<span class="hl-kw">if</span> tools_hint:
sections.<span class="hl-fn">append</span>(
<span class="hl-str">f"\n## Available Tools\n{tools_hint}"</span>
)
<span class="hl-kw">return</span> <span class="hl-str">"\n"</span>.<span class="hl-fn">join</span>(sections)
Deploying BUD on a Mac Mini
The Mac Mini M4 is the recommended hardware: silent, 12W power draw, developer-friendly, and powerful enough for all BUD workloads.
Mac Mini M4 — BUD’s Recommended Home
Silent operation. 12W idle. Always-on AI assistant server.
Why Dedicated Hardware Over Cloud?
| Factor | Cloud (AWS/GCP) | Mac Mini |
|---|---|---|
| Monthly Cost | $15–50/month recurring | $0 after purchase |
| Latency | Variable (region-dependent) | Consistent, local |
| Data Privacy | Data leaves your network | Everything stays home |
| Control | Provider-limited | Full root access |
| Setup Time | Minutes | ~2 hours (one-time) |
Step-by-Step Mac Mini Setup
Install Developer Tools
<span class="hl-cm"># Install Homebrew</span>
/bin/bash -c <span class="hl-str">"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"</span>
<span class="hl-cm"># Install Python 3.12, Node.js, Git</span>
brew install python@<span class="hl-num">3.12</span> node git
Clone & Configure BUD
git clone https://github.com/GOPITRINADH3561/Project_OpenClaw.git
<span class="hl-kw">cd</span> Project_OpenClaw
<span class="hl-cm"># Create virtual environment</span>
python3.12 -m venv .venv
<span class="hl-kw">source</span> .venv/bin/activate
<span class="hl-cm"># Install dependencies</span>
pip install -r requirements.txt
<span class="hl-cm"># Configure API keys</span>
cp .env.example .env
nano .env <span class="hl-cm"># Add ANTHROPIC_API_KEY, SLACK_BOT_TOKEN, etc.</span>
Test Run
python main.py
<span class="hl-cm"># You should see:</span>
<span class="hl-cm"># ✅ Agent core initialized</span>
<span class="hl-cm"># ✅ MCP servers connected (24 tools)</span>
<span class="hl-cm"># ✅ RAG engine loaded (all-MiniLM-L6-v2)</span>
<span class="hl-cm"># ✅ Slack connected</span>
<span class="hl-cm"># ✅ Discord connected</span>
<span class="hl-cm"># ✅ Web UI at http://localhost:8080</span>
Auto-Start on Boot with launchd
<span class="hl-cm"><!-- ~/Library/LaunchAgents/com.bud.assistant.plist --></span>
<span class="hl-dec"><?xml</span> version="1.0"<span class="hl-dec">?></span>
<span class="hl-kw"><plist</span> version="1.0"<span class="hl-kw">></span>
<span class="hl-kw"><dict></span>
<span class="hl-kw"><key></span>Label<span class="hl-kw"></key></span>
<span class="hl-str"><string>com.bud.assistant</string></span>
<span class="hl-kw"><key></span>ProgramArguments<span class="hl-kw"></key></span>
<span class="hl-kw"><array></span>
<span class="hl-str"><string>/Users/you/Project_OpenClaw/.venv/bin/python</string></span>
<span class="hl-str"><string>/Users/you/Project_OpenClaw/main.py</string></span>
<span class="hl-kw"></array></span>
<span class="hl-kw"><key></span>RunAtLoad<span class="hl-kw"></key></span> <span class="hl-kw"><true/></span>
<span class="hl-kw"><key></span>KeepAlive<span class="hl-kw"></key></span> <span class="hl-kw"><true/></span>
<span class="hl-kw"></dict></span>
<span class="hl-kw"></plist></span>
<span class="hl-cm"># Load the service — BUD starts on every boot</span>
launchctl load ~/Library/LaunchAgents/com.bud.assistant.plist
Remote Access with Tailscale
<span class="hl-cm"># Install Tailscale — access BUD from anywhere</span>
brew install tailscale
sudo tailscale up
<span class="hl-cm"># Now access BUD's Web UI from any device:</span>
<span class="hl-cm"># http://100.x.y.z:8080</span>
Benchmarked with 5 platforms active and 1,000+ indexed messages:
| Operation | Average | P95 |
|---|---|---|
| Simple message (no tools) | 1.5s | 3.0s |
| Single tool call | 3.5s | 6.0s |
| Multi-tool complex task | 6.0s | 10.0s |
| RAG retrieval (5 results) | 25ms | 45ms |
| Intent classification | <1ms | 1ms |
| Bot startup (warm) | 3s | 6s |
| Idle RAM usage | 180 MB | — |
Cost Analysis & Performance
BUD delivers an ~85% cost reduction versus naive API usage, and runs cheaper than comparable SaaS assistants.
Monthly Cost Comparison
How We Achieved 85% Reduction
Intent Classification
~40% of messages (greetings, thanks) handled with zero API calls.
Two-Model Routing
Simple questions → Haiku ($0.25/MTok). Complex → Sonnet ($3/MTok). 30% savings.
Token Budgeting
Dynamic max_tokens based on question complexity. Short question = short budget.
Conditional RAG
Only query ChromaDB when the message likely needs historical context. Saves embedding compute.
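As a concrete example of the token-budgeting idea, a word-count heuristic like the one below picks `max_tokens` per request. The thresholds are assumptions for illustration; the real values are tuning decisions:

```python
def pick_max_tokens(text: str) -> int:
    # Short question, short budget: output tokens are what you pay for
    words = len(text.split())
    if words <= 8:
        return 256     # quick factual answer
    if words <= 40:
        return 1024    # normal response
    return 4096        # long, detailed request

assert pick_max_tokens("what time is it") == 256
```

Capping `max_tokens` also bounds worst-case latency, since generation time scales with output length.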
Usage Tiers
| Usage | Messages/Day | Monthly API | Annual (incl. hardware) |
|---|---|---|---|
| Light (personal) | 10 | $1.80 | $28/yr (after Year 1) |
| Medium (small team) | 50 | $9.60 | $121/yr |
| Heavy (active team) | 200 | $42.00 | $510/yr |
Future Impacts & What We’re Building Next
BUD isn’t finished — it’s a living platform. Here’s the roadmap of what’s actively being built and deployed soon.
Multi-Platform Agent Core
5-platform support, 24+ MCP tools, 5-layer memory, RAG pipeline, real-time streaming, voice I/O. Production-ready.
Local LLM Fallback (Ollama)
When Claude API is unreachable or for sensitive conversations, BUD will fall back to a local model via Ollama — zero data leaves your machine.
Computer Vision Pipeline
Upload images to BUD and get analysis — receipts, whiteboards, documents. Powered by Claude’s multimodal capabilities.
Docker Compose One-Click
Single docker compose up to run BUD + all MCP servers + ChromaDB + web UI. Zero manual setup.
HuggingFace Spaces Demo
Live web demo on HuggingFace where anyone can try BUD’s agent capabilities without installing anything.
Multi-Agent Orchestration
BUD spawning specialized sub-agents for complex tasks — a coding agent, a research agent, a writing agent — all coordinated by the core.
The Bigger Picture — Why This Matters
BUD represents a fundamental shift in how we interact with AI. Instead of visiting a website to chat with a model, the model comes to you — inside the tools you already use. The AI becomes ambient infrastructure, always available across every communication channel.
We’re building toward a world where every team and every individual has a personal AI that knows their context (memory), can take action (tools), lives where they work (multi-platform), and keeps getting smarter (RAG indexing). BUD is the open-source blueprint for that future.
Development Timeline
Foundation — First Principles Learning
Studied AI agent architecture from scratch. Completed Dr. Raj Dandekar’s Vizuara Labs coursework on MCP, embeddings, and tool-use patterns.
Core Build — Agent + MCP + RAG + Memory
Built the complete agent core from first principles. Implemented all 4 MCP servers, the 5-layer memory system, and RAG pipeline. Integrated 5 chat platforms.
Docker + HuggingFace + CI/CD
Containerize everything. Deploy live demo. GitHub Actions for automated testing and deployment.
Local LLM + Vision + Multi-Agent
Ollama fallback for offline mode. Image understanding. Sub-agent coordination for complex workflows.
Plugin Marketplace + Community
Open MCP server marketplace. Community-contributed tools. Self-updating capability.
How to Contribute
BUD is fully open source. The entire codebase, documentation, and this blog are available for anyone to learn from, fork, and extend:
HuggingFace
Live demo space coming soon — try BUD without installing anything.
Full Documentation
84-page technical blog, annotated README, inline code comments on every file.
Technology Stack at a Glance
| Layer | Technology | Why This Choice |
|---|---|---|
| AI Engine | Anthropic Claude (Sonnet + Haiku) | Best tool-use, long context, streaming |
| Tool Protocol | Model Context Protocol (MCP) | Standard, extensible, discoverable |
| Embeddings | all-MiniLM-L6-v2 (local) | Zero cost, fast, good accuracy |
| Vector DB | ChromaDB (persistent) | Simple, local, file-backed |
| Web Framework | FastAPI + WebSocket | Async-native, fast, great DX |
| Slack | Socket Mode (slack-bolt) | No public URL needed |
| Discord | discord.py | Mature, full-featured |
| Teams | Bot Framework SDK | Official Microsoft integration |
| Scheduling | APScheduler + SQLite | Persistent jobs, survives restarts |
| Memory | Markdown files + deque | Transparent, editable, version-controllable |
| Language | Python 3.12 (100% async) | Ecosystem, readability, async support |
| Deployment | Mac Mini M4 + launchd | Silent, 12W, always-on |
“Build everything from scratch. Understand every layer. Then — and only then — you can deploy with confidence and explain every decision in an interview.”
— Project BUD Design Manifesto