BUD
The complete engineering guide to building a self-hosted, multi-platform AI assistant from scratch — no LangChain, no abstractions, every single line explained.
Built by Gopi Trinadh · Learned from Dr. Raj Dandekar of Vizuara
First Principles — What Even Is an AI Agent?
Before writing a single line of code, let’s ground ourselves in the foundational concepts. If you understand these five ideas, you can build anything.
Chatbot vs. Agent — The Core Distinction
A chatbot is a text-in, text-out function. It has no memory, no tools, no autonomy. Ask it today and it won’t remember tomorrow. It can’t check your calendar, file a bug, or search your documents.
An agent operates on a fundamentally different loop. It perceives (receives events from Slack, Discord, Teams, Web), reasons (decides what action to take), acts (calls tools via MCP), and remembers (persists context across sessions). BUD is an agent.
How LLMs Actually Work (The 60-Second Version)
Large Language Models like Claude are neural networks trained on massive text data. Given a sequence of tokens, they predict the next most probable token. At scale — billions of parameters — this simple mechanism produces reasoning, code, tool invocations, and nuanced conversation.
The critical insight for agent builders: LLMs don’t execute code or call APIs. They generate structured text (JSON) that describes which tool to call with what arguments. Your code then executes the tool and feeds the result back. This is the “agent loop.”
The LLM is the brain, not the hands. It decides what to do but cannot act alone. Your code is the body — the hands that call APIs, the eyes that read databases, the memory that persists between conversations. Building an agent means building this body.
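The loop described above can be sketched in a few lines. This toy version is illustrative only: `fake_model` stands in for the LLM (it hardcodes one tool call, then answers), and the `"tool"` role is a simplification of the real API's message format.

```python
# Toy agent loop: perceive -> reason -> act -> remember, with a stubbed model.
def fake_model(messages):
    # A real LLM returns structured text describing a tool call; this stub
    # asks for one tool, then answers once it sees the tool's result.
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_use", "name": "get_time", "input": {}}
    return {"type": "text", "text": "It is 12:00."}

TOOLS = {"get_time": lambda _args: "12:00"}  # the "hands": code that acts

def agent_loop(user_msg: str, max_rounds: int = 5) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_rounds):            # safety cap, as in BUD's core
        reply = fake_model(messages)
        if reply["type"] == "text":        # model is done reasoning
            return reply["text"]
        result = TOOLS[reply["name"]](reply["input"])         # act
        messages.append({"role": "tool", "content": result})  # feed back
    return "Tool-round limit reached."

print(agent_loop("What time is it?"))  # → It is 12:00.
```

The model never executes `get_time` itself; it only names it. The loop's job is to run the tool and feed the result back until the model produces plain text.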
What Are Embeddings?
Embeddings convert text into dense numerical vectors — arrays of floating-point numbers that encode semantic meaning. Similar meanings produce vectors that are geometrically close in high-dimensional space. This is how BUD can search conversations by meaning, not just keywords.
BUD uses all-MiniLM-L6-v2 from sentence-transformers: it runs locally at zero API cost, produces 384-dimensional vectors, and returns results in under 50ms. These vectors power the RAG pipeline.
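A tiny worked example with hand-made 4-dimensional vectors (real all-MiniLM-L6-v2 embeddings have 384 dimensions, but the geometry is identical):

```python
import math

def cosine(a, b):
    # Cosine similarity: 1.0 for identical directions, near 0 for unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings": similar meanings get geometrically close vectors
king = [0.9, 0.8, 0.1, 0.0]
queen = [0.85, 0.82, 0.12, 0.05]
banana = [0.05, 0.1, 0.9, 0.8]

assert cosine(king, queen) > cosine(king, banana)
```

Semantic search is just this comparison at scale: embed the query, then return the stored vectors with the highest similarity.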
Model Context Protocol (MCP)
MCP is what makes BUD’s tool system scalable. Instead of hardcoding every integration, MCP defines a standard protocol (JSON-RPC over stdio) where tool servers announce their capabilities and the AI client discovers them automatically. Think USB for AI tools — plug any MCP server into any MCP client.
Why Build From Scratch? (No LangChain, No Frameworks)
BUD deliberately avoids LangChain, LlamaIndex, CrewAI, and every other framework. Not because they’re bad — because the goal is understanding. When you build the agent loop yourself, you understand exactly why retry logic matters, why memory needs layers, why RAG needs chunking strategies. In an interview, you can explain every component because you built every component.
Depth over convenience. Every file in this project is a learning resource. Every design decision is documented with the “why”, not just the “what”. A framework hides complexity. BUD exposes it.
System Architecture — The Complete Blueprint
BUD’s architecture follows five principles: educational-first, minimal folders, graceful degradation, file-based transparency, and single-process async.
Project File Structure
Graceful Degradation: If GitHub MCP is down, BUD still works — it just can’t create issues. If RAG is empty, Claude answers from its own knowledge. If Slack is disconnected, Discord keeps running. Nothing crashes the whole system.
The Agent Loop — Heart of BUD
Everything runs through one function: process_message(). This is the brain of the entire system.
The Core Code — agent/core.py
This is the most important file in the entire project. Let’s walk through the key function:
<span class="hl-kw">async def</span> <span class="hl-fn">process_message</span>(self, user_msg: <span class="hl-type">str</span>, channel: <span class="hl-type">str</span>, user_id: <span class="hl-type">str</span>) -> <span class="hl-type">str</span>:
<span class="hl-cm"># Step 1: Classify intent — avoid expensive API calls for simple messages</span>
intent = <span class="hl-fn">classify_intent</span>(user_msg)
<span class="hl-kw">if</span> intent == <span class="hl-str">"greeting"</span>:
<span class="hl-kw">return</span> <span class="hl-fn">random_greeting</span>(user_id) <span class="hl-cm"># $0 — no API call needed</span>
<span class="hl-cm"># Step 2: Assemble context from all memory layers</span>
memory_context = <span class="hl-kw">await</span> self.memory.<span class="hl-fn">get_context</span>(user_id, channel)
rag_results = <span class="hl-kw">await</span> self.rag.<span class="hl-fn">search</span>(user_msg, top_k=<span class="hl-num">5</span>)
tools = <span class="hl-kw">await</span> self.mcp_manager.<span class="hl-fn">get_all_tools</span>()
<span class="hl-cm"># Step 3: Build the system prompt</span>
system = <span class="hl-fn">build_system_prompt</span>(
soul=self.memory.soul,
memory=memory_context,
rag=rag_results,
tools_hint=<span class="hl-fn">summarize_tools</span>(tools)
)
<span class="hl-cm"># Step 4: Call Claude with full context</span>
messages = [{<span class="hl-str">"role"</span>: <span class="hl-str">"user"</span>, <span class="hl-str">"content"</span>: user_msg}]
<span class="hl-cm"># Step 5: The Tool Loop — up to MAX_TOOL_ROUNDS iterations</span>
<span class="hl-kw">for</span> round <span class="hl-kw">in</span> <span class="hl-fn">range</span>(<span class="hl-num">5</span>): <span class="hl-cm"># Safety cap: max 5 rounds</span>
response = <span class="hl-kw">await</span> self.<span class="hl-fn">call_claude</span>(
system=system,
messages=messages,
tools=tools,
timeout=<span class="hl-num">30.0</span> <span class="hl-cm"># Triple-timeout protection</span>
)
<span class="hl-cm"># If Claude returns text → we're done!</span>
<span class="hl-kw">if</span> response.stop_reason == <span class="hl-str">"end_turn"</span>:
final_text = response.content[<span class="hl-num">0</span>].text
<span class="hl-kw">break</span>
<span class="hl-cm"># If Claude returns tool_use → execute and loop</span>
<span class="hl-kw">if</span> response.stop_reason == <span class="hl-str">"tool_use"</span>:
tool_call = <span class="hl-fn">extract_tool_call</span>(response)
result = <span class="hl-kw">await</span> self.mcp_manager.<span class="hl-fn">execute</span>(
tool_call.name, tool_call.input
)
<span class="hl-cm"># Feed result back to Claude for next iteration</span>
messages.<span class="hl-fn">append</span>({<span class="hl-str">"role"</span>: <span class="hl-str">"assistant"</span>, <span class="hl-str">"content"</span>: response.content})
messages.<span class="hl-fn">append</span>({<span class="hl-str">"role"</span>: <span class="hl-str">"user"</span>, <span class="hl-str">"content"</span>: [tool_result_block]})
<span class="hl-cm"># Step 6: Remember this interaction</span>
<span class="hl-kw">await</span> self.memory.<span class="hl-fn">update</span>(user_id, user_msg, final_text)
<span class="hl-kw">await</span> self.rag.<span class="hl-fn">index_message</span>(user_msg, channel)
<span class="hl-kw">return</span> final_text
Triple-timeout protection: Per-API call timeout (30s), per-tool execution timeout (15s), and overall message timeout (60s). If any tier fires, BUD returns a graceful fallback message instead of hanging.
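The tiering can be sketched with nested `asyncio.wait_for` calls. The timeouts below are scaled down so the example runs instantly; the structure (a strict per-tool timeout inside a looser overall timeout, with a fallback message on expiry) is the point, not the exact numbers.

```python
import asyncio

async def call_tool(delay: float) -> str:
    await asyncio.sleep(delay)          # stands in for a real MCP tool call
    return "tool result"

async def _pipeline(tool_delay: float) -> str:
    # Inner tier: per-tool timeout, stricter than the overall one
    return await asyncio.wait_for(call_tool(tool_delay), timeout=0.2)

async def handle_message(tool_delay: float) -> str:
    try:
        # Outer tier: overall message timeout
        return await asyncio.wait_for(_pipeline(tool_delay), timeout=0.5)
    except asyncio.TimeoutError:
        # Any tier firing lands here: graceful fallback instead of hanging
        return "Sorry, that took too long. Please try again."

print(asyncio.run(handle_message(0.05)))  # fast tool → "tool result"
print(asyncio.run(handle_message(1.0)))   # slow tool → fallback message
```

Because `wait_for` cancels the awaited task on expiry, a stuck tool cannot leak a running coroutine either.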
Intent Classification — Saving 85% on API Costs
Before calling Claude’s API ($3/MTok for Sonnet), BUD checks whether the message even needs an LLM. Simple greetings, thank-yous, and status checks are handled by a zero-cost heuristic classifier:
<span class="hl-kw">def</span> <span class="hl-fn">classify_intent</span>(text: <span class="hl-type">str</span>) -> <span class="hl-type">str</span>:
lower = text.<span class="hl-fn">lower</span>().<span class="hl-fn">strip</span>()
<span class="hl-cm"># Greetings — no API needed</span>
greetings = {<span class="hl-str">"hi"</span>, <span class="hl-str">"hello"</span>, <span class="hl-str">"hey"</span>, <span class="hl-str">"morning"</span>, <span class="hl-str">"sup"</span>, <span class="hl-str">"yo"</span>}
<span class="hl-kw">if</span> lower <span class="hl-kw">in</span> greetings:
<span class="hl-kw">return</span> <span class="hl-str">"greeting"</span>
<span class="hl-cm"># Tool-required tasks — needs Claude + MCP</span>
tool_markers = [<span class="hl-str">"create issue"</span>, <span class="hl-str">"github"</span>, <span class="hl-str">"notion"</span>, <span class="hl-str">"schedule"</span>, <span class="hl-str">"remind"</span>]
<span class="hl-kw">if</span> <span class="hl-fn">any</span>(m <span class="hl-kw">in</span> lower <span class="hl-kw">for</span> m <span class="hl-kw">in</span> tool_markers):
<span class="hl-kw">return</span> <span class="hl-str">"tool_task"</span>
<span class="hl-cm"># Complex questions — needs Claude</span>
<span class="hl-kw">if</span> <span class="hl-fn">len</span>(text.<span class="hl-fn">split</span>()) > <span class="hl-num">5</span> <span class="hl-kw">or</span> <span class="hl-str">"?"</span> <span class="hl-kw">in</span> text:
<span class="hl-kw">return</span> <span class="hl-str">"complex"</span>
<span class="hl-kw">return</span> <span class="hl-str">"simple"</span> <span class="hl-cm"># Use Haiku (cheaper model)</span>
This simple heuristic routes ~40% of messages away from the expensive API entirely, and another ~30% to the cheaper Haiku model, achieving that 85% cost reduction — from $15/month to $2/month.
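One plausible way those numbers compose (the routing fractions and the token-budgeting factor below are assumptions for illustration, not measured values):

```python
SONNET, HAIKU = 3.00, 0.25   # $/MTok input prices, as cited in the text
free, haiku, sonnet = 0.40, 0.30, 0.30   # assumed share of messages per route
budget_factor = 0.5          # assumed saving from dynamic token budgets

# Cost relative to sending every message to Sonnet with no budgeting
relative_cost = haiku * budget_factor * (HAIKU / SONNET) + sonnet * budget_factor
print(f"~{1 - relative_cost:.0%} cheaper than all-Sonnet")  # ~84% cheaper
```

The exact percentage depends on your traffic mix; the structural insight is that the cheapest API call is the one you never make.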
MCP Servers & Tool System
BUD runs 4 MCP servers offering 24+ tools. Each server is a standalone process communicating via JSON-RPC over stdio.
GitHub MCP
Create issues, search repos, list PRs, get file contents, manage labels. 8 tools.
Notion MCP
Create pages, query databases, search workspace, update blocks. 6 tools.
Slack Tools MCP
Search messages, get channel history, list channels, send scheduled messages. 5 tools.
File Operations MCP
Read, write, list, search files on local filesystem. 5 tools.
How MCP Discovery Works
<span class="hl-cm"># BUD starts each MCP server as a subprocess</span>
<span class="hl-kw">async def</span> <span class="hl-fn">connect_mcp_server</span>(self, server_path: <span class="hl-type">str</span>):
process = <span class="hl-kw">await</span> asyncio.<span class="hl-fn">create_subprocess_exec</span>(
<span class="hl-str">"python"</span>, server_path,
stdin=asyncio.subprocess.PIPE,
stdout=asyncio.subprocess.PIPE
)
<span class="hl-cm"># Step 1: Ask the server what tools it has</span>
request = {<span class="hl-str">"jsonrpc"</span>: <span class="hl-str">"2.0"</span>, <span class="hl-str">"method"</span>: <span class="hl-str">"tools/list"</span>, <span class="hl-str">"id"</span>: <span class="hl-num">1</span>}
tools = <span class="hl-kw">await</span> self.<span class="hl-fn">send_rpc</span>(process, request)
<span class="hl-cm"># Step 2: Register all discovered tools</span>
<span class="hl-kw">for</span> tool <span class="hl-kw">in</span> tools:
self.registry[tool[<span class="hl-str">"name"</span>]] = {
<span class="hl-str">"process"</span>: process,
<span class="hl-str">"schema"</span>: tool[<span class="hl-str">"inputSchema"</span>]
}
<span class="hl-cm"># Now Claude can see and use these tools automatically!</span>
Adding a new integration takes minutes: write a new MCP server, drop it in mcp_servers/, and BUD auto-discovers it on next restart. No changes to core code. No new API wrappers. Just plug and play.
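A new server can be tiny. The sketch below is a hypothetical, stripped-down stdio server with a single `echo` tool; the real MCP protocol also includes an initialization handshake and richer result envelopes, omitted here for brevity.

```python
import json
import sys

# Hypothetical one-tool server: advertises "echo", then executes it on request
TOOLS = [{
    "name": "echo",
    "description": "Return the input text unchanged",
    "inputSchema": {"type": "object",
                    "properties": {"text": {"type": "string"}}},
}]

def handle_request(req: dict) -> dict:
    if req["method"] == "tools/list":
        return {"jsonrpc": "2.0", "id": req["id"], "result": {"tools": TOOLS}}
    if req["method"] == "tools/call":     # only one tool, so no name dispatch
        text = req["params"]["arguments"]["text"]
        return {"jsonrpc": "2.0", "id": req["id"], "result": text}
    return {"jsonrpc": "2.0", "id": req["id"],
            "error": {"code": -32601, "message": "unknown method"}}

# A real server would then loop over stdin:
#   for line in sys.stdin:
#       print(json.dumps(handle_request(json.loads(line))), flush=True)
```

The client never needs to know what `echo` does ahead of time; `tools/list` tells it everything, which is exactly what makes discovery automatic.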
The 5-Layer Memory System
Human memory isn’t one thing — it’s multiple systems. BUD mirrors this with five complementary layers.
Why File-Based Memory?
Every memory file is plain Markdown you can read, edit, and version control. No hidden databases, no opaque vector stores for core identity. If BUD starts giving weird responses, you can open MEMORY.md in a text editor and see exactly what it “knows.” This is radical transparency by design.
<span class="hl-cm"># data/memory/SOUL.md — BUD's personality</span>
<span class="hl-str">## Core Identity</span>
You are BUD, a helpful AI assistant built by Gopi Trinadh.
You are concise, proactive, and technically precise.
<span class="hl-str">## Communication Style</span>
- Use Slack-native formatting (bold, code blocks, lists)
- Be brief unless the user asks for detail
- Always confirm before destructive actions (deleting, overwriting)
<span class="hl-str">## Boundaries</span>
- Never share API keys or tokens in chat
- Redirect medical/legal questions to professionals
- If unsure, say so — don't hallucinate
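Loading these files is deliberately boring. A sketch (the file names follow the layout described above; a missing file degrades to an empty section rather than crashing):

```python
from pathlib import Path

def load_memory(base: Path) -> dict:
    # Each layer is just a Markdown file you can open and edit by hand
    files = {"soul": "SOUL.md", "memory": "MEMORY.md"}
    out = {}
    for key, name in files.items():
        path = base / name
        # Graceful degradation: a missing file becomes an empty section
        out[key] = path.read_text() if path.exists() else ""
    return out

mem = load_memory(Path("data/memory"))   # edit SOUL.md, reload, done
```

No migrations, no schema, no client library. `git diff` on the memory directory shows you exactly how BUD's knowledge changed and when.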
RAG Pipeline — Search by Meaning
Retrieval-Augmented Generation lets BUD search past conversations and documents semantically — not by keywords, but by meaning.
The RAG Engine Code
<span class="hl-kw">from</span> sentence_transformers <span class="hl-kw">import</span> SentenceTransformer
<span class="hl-kw">import</span> chromadb
<span class="hl-kw">class</span> <span class="hl-type">RAGEngine</span>:
<span class="hl-kw">def</span> <span class="hl-fn">__init__</span>(self):
<span class="hl-cm"># Local model — runs on CPU, zero API cost</span>
self.embedder = <span class="hl-fn">SentenceTransformer</span>(<span class="hl-str">"all-MiniLM-L6-v2"</span>)
self.db = chromadb.<span class="hl-fn">PersistentClient</span>(path=<span class="hl-str">"data/chromadb"</span>)
self.collection = self.db.<span class="hl-fn">get_or_create_collection</span>(<span class="hl-str">"messages"</span>)
<span class="hl-kw">async def</span> <span class="hl-fn">index_message</span>(self, text: <span class="hl-type">str</span>, channel: <span class="hl-type">str</span>):
embedding = self.embedder.<span class="hl-fn">encode</span>(text).<span class="hl-fn">tolist</span>()
self.collection.<span class="hl-fn">add</span>(
documents=[text],
embeddings=[embedding],
metadatas=[{<span class="hl-str">"channel"</span>: channel, <span class="hl-str">"ts"</span>: <span class="hl-fn">time</span>()}],
ids=[<span class="hl-fn">str</span>(<span class="hl-fn">uuid4</span>())]
)
<span class="hl-kw">async def</span> <span class="hl-fn">search</span>(self, query: <span class="hl-type">str</span>, top_k: <span class="hl-type">int</span> = <span class="hl-num">5</span>) -> <span class="hl-type">list</span>:
embedding = self.embedder.<span class="hl-fn">encode</span>(query).<span class="hl-fn">tolist</span>()
results = self.collection.<span class="hl-fn">query</span>(
query_embeddings=[embedding],
n_results=top_k
)
<span class="hl-kw">return</span> results[<span class="hl-str">"documents"</span>][<span class="hl-num">0</span>] <span class="hl-cm"># Top matching messages</span>
| Metric | Value | Detail |
|---|---|---|
| Embedding Model | all-MiniLM-L6-v2 | 384 dimensions, 22M params |
| Vector DB | ChromaDB (local) | Persistent, file-backed |
| Index Capacity | 1,000+ messages | Scales linearly |
| Retrieval Accuracy | 92% | Top-5 relevance |
| Search Latency | <50ms (P95: 45ms) | On Mac Mini M4 |
| API Cost | $0 | Everything runs locally |
Multi-Platform Integration
BUD serves 5 platforms simultaneously from a single async process using asyncio.gather() — no microservices needed.
Slack
Socket Mode — persistent WebSocket. No public URL needed. Reacts to mentions, DMs, and threads.
Discord
Discord.py gateway — real-time events. Supports slash commands, mentions, and DMs.
Microsoft Teams
Bot Framework SDK — works inside Teams channels and 1:1 chats. Enterprise-ready.
Google Chat
Pub/Sub or HTTP handler. Works in Google Workspace environments.
Web UI
FastAPI + WebSocket — browser-based chat with real-time streaming and voice I/O.
The Single-Process Architecture
<span class="hl-cm"># main.py — BUD's entry point</span>
<span class="hl-kw">async def</span> <span class="hl-fn">main</span>():
agent = <span class="hl-kw">await</span> <span class="hl-fn">create_agent</span>() <span class="hl-cm"># Initialize core + MCP + RAG</span>
<span class="hl-cm"># Launch ALL platforms simultaneously</span>
<span class="hl-kw">await</span> asyncio.<span class="hl-fn">gather</span>(
<span class="hl-fn">start_slack</span>(agent), <span class="hl-cm"># Slack Socket Mode</span>
<span class="hl-fn">start_discord</span>(agent), <span class="hl-cm"># Discord gateway</span>
<span class="hl-fn">start_teams</span>(agent), <span class="hl-cm"># Teams bot framework</span>
<span class="hl-fn">start_google_chat</span>(agent), <span class="hl-cm"># Google Chat handler</span>
<span class="hl-fn">start_web</span>(agent), <span class="hl-cm"># FastAPI + WebSocket</span>
<span class="hl-fn">start_heartbeat</span>(agent), <span class="hl-cm"># Scheduler for cron tasks</span>
return_exceptions=<span class="hl-kw">True</span> <span class="hl-cm"># Crash isolation!</span>
)
asyncio.<span class="hl-fn">run</span>(<span class="hl-fn">main</span>())
return_exceptions=True is the secret weapon. If Discord crashes, Slack keeps running. If Teams throws an error, the web UI is unaffected. Each platform is an independent coroutine. This is production resilience without Kubernetes complexity.
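The isolation is easy to see in miniature. Here one coroutine raises while the others finish; with `return_exceptions=True`, `gather` hands the exception back as a value instead of cancelling its siblings:

```python
import asyncio

async def platform(name: str) -> str:
    return f"{name} running"

async def crashing_platform() -> str:
    raise RuntimeError("gateway died")

async def main():
    # The failure is returned in-place; the other coroutines still complete
    return await asyncio.gather(
        platform("slack"),
        crashing_platform(),
        platform("web"),
        return_exceptions=True,
    )

results = asyncio.run(main())
print(results)  # ['slack running', RuntimeError('gateway died'), 'web running']
```

Without the flag, the first exception would propagate and cancel the remaining coroutines, taking every platform down together.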
Code Deep-Dive — Selected Components
Let’s examine the trickiest parts: the streaming WebSocket, the memory manager, and the system prompt builder.
Real-Time Token Streaming (WebSocket)
The web UI doesn’t wait for the full response — it streams tokens as Claude generates them, creating a “typing” effect:
<span class="hl-dec">@app.websocket</span>(<span class="hl-str">"/ws"</span>)
<span class="hl-kw">async def</span> <span class="hl-fn">websocket_endpoint</span>(ws: WebSocket):
<span class="hl-kw">await</span> ws.<span class="hl-fn">accept</span>()
<span class="hl-kw">while</span> <span class="hl-kw">True</span>:
user_msg = <span class="hl-kw">await</span> ws.<span class="hl-fn">receive_text</span>()
<span class="hl-cm"># Stream response token-by-token</span>
<span class="hl-kw">async with</span> client.messages.<span class="hl-fn">stream</span>(
model=<span class="hl-str">"claude-sonnet-4-20250514"</span>,
messages=[{<span class="hl-str">"role"</span>: <span class="hl-str">"user"</span>, <span class="hl-str">"content"</span>: user_msg}],
system=system_prompt,
max_tokens=<span class="hl-num">4096</span>
) <span class="hl-kw">as</span> stream:
<span class="hl-kw">async for</span> text <span class="hl-kw">in</span> stream.<span class="hl-fn">text_stream</span>:
<span class="hl-kw">await</span> ws.<span class="hl-fn">send_json</span>({
<span class="hl-str">"type"</span>: <span class="hl-str">"token"</span>,
<span class="hl-str">"content"</span>: text
})
<span class="hl-kw">await</span> ws.<span class="hl-fn">send_json</span>({<span class="hl-str">"type"</span>: <span class="hl-str">"done"</span>})
System Prompt Builder — Dynamic Context Assembly
<span class="hl-kw">def</span> <span class="hl-fn">build_system_prompt</span>(soul, memory, rag, tools_hint) -> <span class="hl-type">str</span>:
sections = [
soul, <span class="hl-cm"># SOUL.md — who you are</span>
<span class="hl-str">f"\n## Current Knowledge\n{memory}"</span>, <span class="hl-cm"># MEMORY.md</span>
]
<span class="hl-kw">if</span> rag:
sections.<span class="hl-fn">append</span>(
<span class="hl-str">f"\n## Relevant Context (from past conversations)\n"</span>
+ <span class="hl-str">"\n"</span>.<span class="hl-fn">join</span>(<span class="hl-str">f"- {r}"</span> <span class="hl-kw">for</span> r <span class="hl-kw">in</span> rag)
)
<span class="hl-kw">if</span> tools_hint:
sections.<span class="hl-fn">append</span>(
<span class="hl-str">f"\n## Available Tools\n{tools_hint}"</span>
)
<span class="hl-kw">return</span> <span class="hl-str">"\n"</span>.<span class="hl-fn">join</span>(sections)
Deploying BUD on a Mac Mini
The Mac Mini M4 is the recommended hardware: silent, 12W power draw, developer-friendly, and powerful enough for all BUD workloads.
Mac Mini M4 — BUD’s Recommended Home
Silent operation. 12W idle. Always-on AI assistant server.
Why Dedicated Hardware Over Cloud?
| Factor | Cloud (AWS/GCP) | Mac Mini |
|---|---|---|
| Monthly Cost | $15–50/month recurring | $0 after purchase |
| Latency | Variable (region-dependent) | Consistent, local |
| Data Privacy | Data leaves your network | Everything stays home |
| Control | Provider-limited | Full root access |
| Setup Time | Minutes | ~2 hours (one-time) |
Step-by-Step Mac Mini Setup
Install Developer Tools
<span class="hl-cm"># Install Homebrew</span>
/bin/bash -c <span class="hl-str">"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"</span>
<span class="hl-cm"># Install Python 3.12, Node.js, Git</span>
brew install python@<span class="hl-num">3.12</span> node git
Clone & Configure BUD
git clone https://github.com/GOPITRINADH3561/Project_OpenClaw.git
<span class="hl-kw">cd</span> Project_OpenClaw
<span class="hl-cm"># Create virtual environment</span>
python3.12 -m venv .venv
<span class="hl-kw">source</span> .venv/bin/activate
<span class="hl-cm"># Install dependencies</span>
pip install -r requirements.txt
<span class="hl-cm"># Configure API keys</span>
cp .env.example .env
nano .env <span class="hl-cm"># Add ANTHROPIC_API_KEY, SLACK_BOT_TOKEN, etc.</span>
Test Run
python main.py
<span class="hl-cm"># You should see:</span>
<span class="hl-cm"># ✅ Agent core initialized</span>
<span class="hl-cm"># ✅ MCP servers connected (24 tools)</span>
<span class="hl-cm"># ✅ RAG engine loaded (all-MiniLM-L6-v2)</span>
<span class="hl-cm"># ✅ Slack connected</span>
<span class="hl-cm"># ✅ Discord connected</span>
<span class="hl-cm"># ✅ Web UI at http://localhost:8080</span>
Auto-Start on Boot with launchd
<span class="hl-cm"><!-- ~/Library/LaunchAgents/com.bud.assistant.plist --></span>
<span class="hl-dec"><?xml</span> version="1.0"<span class="hl-dec">?></span>
<span class="hl-kw"><plist</span> version="1.0"<span class="hl-kw">></span>
<span class="hl-kw"><dict></span>
<span class="hl-kw"><key></span>Label<span class="hl-kw"></key></span>
<span class="hl-str"><string>com.bud.assistant</string></span>
<span class="hl-kw"><key></span>ProgramArguments<span class="hl-kw"></key></span>
<span class="hl-kw"><array></span>
<span class="hl-str"><string>/Users/you/Project_OpenClaw/.venv/bin/python</string></span>
<span class="hl-str"><string>/Users/you/Project_OpenClaw/main.py</string></span>
<span class="hl-kw"></array></span>
<span class="hl-kw"><key></span>RunAtLoad<span class="hl-kw"></key></span> <span class="hl-kw"><true/></span>
<span class="hl-kw"><key></span>KeepAlive<span class="hl-kw"></key></span> <span class="hl-kw"><true/></span>
<span class="hl-kw"></dict></span>
<span class="hl-kw"></plist></span>
<span class="hl-cm"># Load the service — BUD starts on every boot</span>
launchctl load ~/Library/LaunchAgents/com.bud.assistant.plist
Remote Access with Tailscale
<span class="hl-cm"># Install Tailscale — access BUD from anywhere</span>
brew install tailscale
sudo tailscale up
<span class="hl-cm"># Now access BUD's Web UI from any device:</span>
<span class="hl-cm"># http://100.x.y.z:8080</span>
Benchmarked with 5 platforms active and 1,000+ indexed messages:
| Operation | Average | P95 |
|---|---|---|
| Simple message (no tools) | 1.5s | 3.0s |
| Single tool call | 3.5s | 6.0s |
| Multi-tool complex task | 6.0s | 10.0s |
| RAG retrieval (5 results) | 25ms | 45ms |
| Intent classification | <1ms | 1ms |
| Bot startup (warm) | 3s | 6s |
| Idle RAM usage | 180 MB | — |
Cost Analysis & Performance
BUD delivers an ~85% cost reduction versus naive API usage, and runs cheaper than comparable SaaS assistants.
Monthly Cost Comparison
How We Achieved 85% Reduction
Intent Classification
~40% of messages (greetings, thanks) handled with zero API calls.
Two-Model Routing
Simple questions → Haiku ($0.25/MTok). Complex → Sonnet ($3/MTok). 30% savings.
Token Budgeting
Dynamic max_tokens based on question complexity. Short question = short budget.
Conditional RAG
Only query ChromaDB when the message likely needs historical context. Saves embedding compute.
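As a concrete example of the token-budgeting idea, a word-count heuristic like the one below picks `max_tokens` per request. The thresholds are assumptions for illustration; the real values are tuning decisions:

```python
def pick_max_tokens(text: str) -> int:
    # Short question, short budget: output tokens are what you pay for
    words = len(text.split())
    if words <= 8:
        return 256     # quick factual answer
    if words <= 40:
        return 1024    # normal response
    return 4096        # long, detailed request

assert pick_max_tokens("what time is it") == 256
```

Capping `max_tokens` also bounds worst-case latency, since generation time scales with output length.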
Usage Tiers
| Usage | Messages/Day | Monthly API | Annual (incl. hardware) |
|---|---|---|---|
| Light (personal) | 10 | $1.80 | $28/yr (after Year 1) |
| Medium (small team) | 50 | $9.60 | $121/yr |
| Heavy (active team) | 200 | $42.00 | $510/yr |
Future Impacts & What We’re Building Next
BUD isn’t finished — it’s a living platform. Here’s the roadmap of what’s actively being built and deployed soon.
Multi-Platform Agent Core
5-platform support, 24+ MCP tools, 5-layer memory, RAG pipeline, real-time streaming, voice I/O. Production-ready.
Local LLM Fallback (Ollama)
When Claude API is unreachable or for sensitive conversations, BUD will fall back to a local model via Ollama — zero data leaves your machine.
Computer Vision Pipeline
Upload images to BUD and get analysis — receipts, whiteboards, documents. Powered by Claude’s multimodal capabilities.
Docker Compose One-Click
Single docker compose up to run BUD + all MCP servers + ChromaDB + web UI. Zero manual setup.
HuggingFace Spaces Demo
Live web demo on HuggingFace where anyone can try BUD’s agent capabilities without installing anything.
Multi-Agent Orchestration
BUD spawning specialized sub-agents for complex tasks — a coding agent, a research agent, a writing agent — all coordinated by the core.
The Bigger Picture — Why This Matters
BUD represents a fundamental shift in how we interact with AI. Instead of visiting a website to chat with a model, the model comes to you — inside the tools you already use. The AI becomes ambient infrastructure, always available across every communication channel.
We’re building toward a world where every team and every individual has a personal AI that knows their context (memory), can take action (tools), lives where they work (multi-platform), and keeps getting smarter (RAG indexing). BUD is the open-source blueprint for that future.
Development Timeline
Foundation — First Principles Learning
Studied AI agent architecture from scratch. Completed Dr. Raj Dandekar’s Vizuara Labs coursework on MCP, embeddings, and tool-use patterns.
Core Build — Agent + MCP + RAG + Memory
Built the complete agent core from first principles. Implemented all 4 MCP servers, the 5-layer memory system, and RAG pipeline. Integrated 5 chat platforms.
Docker + HuggingFace + CI/CD
Containerize everything. Deploy live demo. GitHub Actions for automated testing and deployment.
Local LLM + Vision + Multi-Agent
Ollama fallback for offline mode. Image understanding. Sub-agent coordination for complex workflows.
Plugin Marketplace + Community
Open MCP server marketplace. Community-contributed tools. Self-updating capability.
How to Contribute
BUD is fully open source. The entire codebase, documentation, and this blog are available for anyone to learn from, fork, and extend:
HuggingFace
Live demo space coming soon — try BUD without installing anything.
Full Documentation
84-page technical blog, annotated README, inline code comments on every file.
Technology Stack at a Glance
| Layer | Technology | Why This Choice |
|---|---|---|
| AI Engine | Anthropic Claude (Sonnet + Haiku) | Best tool-use, long context, streaming |
| Tool Protocol | Model Context Protocol (MCP) | Standard, extensible, discoverable |
| Embeddings | all-MiniLM-L6-v2 (local) | Zero cost, fast, good accuracy |
| Vector DB | ChromaDB (persistent) | Simple, local, file-backed |
| Web Framework | FastAPI + WebSocket | Async-native, fast, great DX |
| Slack | Socket Mode (slack-bolt) | No public URL needed |
| Discord | discord.py | Mature, full-featured |
| Teams | Bot Framework SDK | Official Microsoft integration |
| Scheduling | APScheduler + SQLite | Persistent jobs, survives restarts |
| Memory | Markdown files + deque | Transparent, editable, version-controllable |
| Language | Python 3.12 (100% async) | Ecosystem, readability, async support |
| Deployment | Mac Mini M4 + launchd | Silent, 12W, always-on |
“Build everything from scratch. Understand every layer. Then — and only then — you can deploy with confidence and explain every decision in an interview.”
— Project BUD Design Manifesto