BUD
The complete engineering guide to building a self-hosted, multi-platform AI assistant from scratch — no LangChain, no abstractions, every single line explained.
Built by Gopi Trinadh · Learned from Dr. Raj Dandekar of Vizuara
First Principles — What Even Is an AI Agent?
Before writing a single line of code, let’s ground ourselves in the foundational concepts. If you understand these five ideas, you can build anything.
Chatbot vs. Agent — The Core Distinction
A chatbot is a text-in, text-out function. It has no memory, no tools, no autonomy. Ask it today and it won’t remember tomorrow. It can’t check your calendar, file a bug, or search your documents.
An agent operates on a fundamentally different loop. It perceives (receives events from Slack, Discord, Teams, Web), reasons (decides what action to take), acts (calls tools via MCP), and remembers (persists context across sessions). BUD is an agent.
How LLMs Actually Work (The 60-Second Version)
Large Language Models like Claude are neural networks trained on massive text data. Given a sequence of tokens, they predict the next most probable token. At scale — billions of parameters — this simple mechanism produces reasoning, code, tool invocations, and nuanced conversation.
The critical insight for agent builders: LLMs don’t execute code or call APIs. They generate structured text (JSON) that describes which tool to call with what arguments. Your code then executes the tool and feeds the result back. This is the “agent loop.”
The LLM is the brain, not the hands. It decides what to do but cannot act alone. Your code is the body — the hands that call APIs, the eyes that read databases, the memory that persists between conversations. Building an agent means building this body.
What Are Embeddings?
Embeddings convert text into dense numerical vectors — arrays of floating-point numbers that encode semantic meaning. Similar meanings produce vectors that are geometrically close in high-dimensional space. This is how BUD can search conversations by meaning, not just keywords.
BUD uses all-MiniLM-L6-v2 from sentence-transformers — runs locally with zero API cost, produces 384-dimensional vectors, sub-50ms latency. These power the RAG pipeline.
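The "geometrically close" idea reduces to cosine similarity between vectors. As a minimal sketch, here is the measure semantic search relies on, computed over hand-made toy vectors (real embeddings are 384-dimensional floats from the model, not these 3-d examples):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" (illustrative values, not real model output)
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # high: similar meaning
print(cosine_similarity(cat, invoice))  # low: unrelated meaning
```

Semantic search is just this comparison run against every stored vector, which is exactly what the vector database does efficiently.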
Model Context Protocol (MCP)
MCP is what makes BUD’s tool system scalable. Instead of hardcoding every integration, MCP defines a standard protocol (JSON-RPC over stdio) where tool servers announce their capabilities and the AI client discovers them automatically. Think USB for AI tools — plug any MCP server into any MCP client.
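The wire format is ordinary JSON-RPC 2.0. As a rough sketch (message shapes follow the protocol description above; the example tool is hypothetical), a `tools/list` exchange looks like this:

```python
import json

# Client → server: ask what tools exist
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Server → client: announce capabilities (hypothetical example tool)
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "create_issue",
                "description": "Create a GitHub issue",
                "inputSchema": {
                    "type": "object",
                    "properties": {"title": {"type": "string"}},
                    "required": ["title"],
                },
            }
        ]
    },
}

# Over stdio, each message travels as one serialized JSON document
wire = json.dumps(request)
assert json.loads(wire)["method"] == "tools/list"
```

Because every server speaks this same shape, the client never needs server-specific code: it reads `inputSchema` and hands the tool straight to the LLM.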
Why Build From Scratch? (No LangChain, No Frameworks)
BUD deliberately avoids LangChain, LlamaIndex, CrewAI, and every other framework. Not because they’re bad — because the goal is understanding. When you build the agent loop yourself, you understand exactly why retry logic matters, why memory needs layers, why RAG needs chunking strategies. In an interview, you can explain every component because you built every component.
Depth over convenience. Every file in this project is a learning resource. Every design decision is documented with the “why”, not just the “what”. A framework hides complexity. BUD exposes it.
System Architecture — The Complete Blueprint
BUD’s architecture follows five principles: educational-first, minimal folders, graceful degradation, file-based transparency, and single-process async.
Project File Structure
Graceful Degradation: If GitHub MCP is down, BUD still works — it just can’t create issues. If RAG is empty, Claude answers from its own knowledge. If Slack is disconnected, Discord keeps running. Nothing crashes the whole system.
The Agent Loop — Heart of BUD
Everything runs through one function: process_message(). This is the brain of the entire system.
The Core Code — agent/core.py
This is the most important file in the entire project. Let’s walk through the key function:
```python
async def process_message(self, user_msg: str, channel: str, user_id: str) -> str:
    # Step 1: Classify intent — avoid expensive API calls for simple messages
    intent = classify_intent(user_msg)
    if intent == "greeting":
        return random_greeting(user_id)  # $0 — no API call needed

    # Step 2: Assemble context from all memory layers
    memory_context = await self.memory.get_context(user_id, channel)
    rag_results = await self.rag.search(user_msg, top_k=5)
    tools = await self.mcp_manager.get_all_tools()

    # Step 3: Build the system prompt
    system = build_system_prompt(
        soul=self.memory.soul,
        memory=memory_context,
        rag=rag_results,
        tools_hint=summarize_tools(tools)
    )

    # Step 4: Call Claude with full context
    messages = [{"role": "user", "content": user_msg}]
    final_text = "Sorry, I couldn't finish that request."  # fallback if all rounds are used

    # Step 5: The Tool Loop — up to MAX_TOOL_ROUNDS iterations
    for _ in range(5):  # Safety cap: MAX_TOOL_ROUNDS = 5
        response = await self.call_claude(
            system=system,
            messages=messages,
            tools=tools,
            timeout=30.0  # Triple-timeout protection
        )
        # If Claude returns text → we're done!
        if response.stop_reason == "end_turn":
            final_text = response.content[0].text
            break
        # If Claude returns tool_use → execute and loop
        if response.stop_reason == "tool_use":
            tool_call = extract_tool_call(response)
            result = await self.mcp_manager.execute(
                tool_call.name, tool_call.input
            )
            # Feed result back to Claude for the next iteration
            # (tool_result_block wraps tool_call.id + result; helper omitted here)
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": [tool_result_block]})

    # Step 6: Remember this interaction
    await self.memory.update(user_id, user_msg, final_text)
    await self.rag.index_message(user_msg, channel)
    return final_text
```
Triple-timeout protection: Per-API call timeout (30s), per-tool execution timeout (15s), and overall message timeout (60s). If any tier fires, BUD returns a graceful fallback message instead of hanging.
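Those tiers can nest with plain `asyncio.wait_for`. A sketch under stated assumptions: the timeout values match the text, the function names are illustrative stand-ins, and the tool timeout is shrunk so the demo finishes instantly:

```python
import asyncio

API_TIMEOUT, MESSAGE_TIMEOUT = 30.0, 60.0
FALLBACK = "Sorry, that took too long. Please try again."

async def call_api():            # stand-in for the Claude API call
    await asyncio.sleep(0.01)
    return "ok"

async def run_tool():            # stand-in for an MCP tool execution
    await asyncio.sleep(10)      # simulate a hung tool
    return "tool result"

async def handle_message() -> str:
    # Tier 1: each API call gets its own budget (30s in the text)
    reply = await asyncio.wait_for(call_api(), timeout=API_TIMEOUT)
    # Tier 2: each tool execution gets a tighter budget (15s in the text,
    # shrunk to 0.05s here so the demo runs fast)
    try:
        await asyncio.wait_for(run_tool(), timeout=0.05)
    except asyncio.TimeoutError:
        reply = FALLBACK         # degrade gracefully instead of hanging
    return reply

async def main() -> str:
    # Tier 3: the whole message is capped no matter what happens inside
    return await asyncio.wait_for(handle_message(), timeout=MESSAGE_TIMEOUT)

print(asyncio.run(main()))
```

`asyncio.wait_for` cancels the inner task when the deadline fires, so the hung tool does not keep running in the background.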
Intent Classification — Saving 85% on API Costs
Before calling Claude’s API ($3/MTok of input for Sonnet), BUD checks whether the message even needs an LLM. Simple greetings, thank-yous, and status checks are handled by a zero-cost heuristic classifier:
```python
def classify_intent(text: str) -> str:
    lower = text.lower().strip()
    # Greetings — no API needed
    greetings = {"hi", "hello", "hey", "morning", "sup", "yo"}
    if lower in greetings:
        return "greeting"
    # Tool-required tasks — needs Claude + MCP
    tool_markers = ["create issue", "github", "notion", "schedule", "remind"]
    if any(m in lower for m in tool_markers):
        return "tool_task"
    # Complex questions — needs Claude
    if len(text.split()) > 5 or "?" in text:
        return "complex"
    return "simple"  # Use Haiku (cheaper model)
```
This simple heuristic routes ~40% of messages away from the expensive API entirely, and another ~30% to the cheaper Haiku model, achieving that 85% cost reduction — from $15/month to $2/month.
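Back-of-envelope, the savings compound roughly like this. The routing fractions come from the text; the 2x token-budgeting factor is an assumption for illustration, not a measured value:

```python
# Relative input price per token: Sonnet = 1.0, Haiku = 0.25 / 3
SONNET, HAIKU = 1.0, 0.25 / 3.0

# Routing fractions from the text
f_free, f_haiku, f_sonnet = 0.40, 0.30, 0.30

# Assumed token-budgeting factor: paid calls use about half the tokens
budget_factor = 0.5

cost_ratio = budget_factor * (f_haiku * HAIKU + f_sonnet * SONNET)
reduction = 1.0 - cost_ratio
print(f"{reduction:.0%}")  # roughly 84% with these assumptions
```

With these illustrative numbers the three techniques together land in the same ballpark as the quoted 85% figure.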
MCP Servers & Tool System
BUD runs 4 MCP servers offering 24+ tools. Each server is a standalone process communicating via JSON-RPC over stdio.
GitHub MCP
Create issues, search repos, list PRs, get file contents, manage labels. 8 tools.
Notion MCP
Create pages, query databases, search workspace, update blocks. 6 tools.
Slack Tools MCP
Search messages, get channel history, list channels, send scheduled messages. 5 tools.
File Operations MCP
Read, write, list, search files on local filesystem. 5 tools.
How MCP Discovery Works
```python
# BUD starts each MCP server as a subprocess
async def connect_mcp_server(self, server_path: str):
    process = await asyncio.create_subprocess_exec(
        "python", server_path,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE
    )
    # Step 1: Ask the server what tools it has
    request = {"jsonrpc": "2.0", "method": "tools/list", "id": 1}
    tools = await self.send_rpc(process, request)
    # Step 2: Register all discovered tools
    for tool in tools:
        self.registry[tool["name"]] = {
            "process": process,
            "schema": tool["inputSchema"]
        }
    # Now Claude can see and use these tools automatically!
```
Adding a new integration takes minutes: write a new MCP server, drop it in mcp_servers/, and BUD auto-discovers it on next restart. No changes to core code. No new API wrappers. Just plug and play.
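A new server can be very small. Here is a rough sketch of the request-handling core of a toy MCP-style server; the `echo` tool and the dispatch logic are mine for illustration, and a real server would also implement initialization, `tools/call`, and the stdio framing:

```python
import json

# The one tool this toy server announces (hypothetical)
TOOLS = [{
    "name": "echo",
    "description": "Echo the input back",
    "inputSchema": {"type": "object",
                    "properties": {"text": {"type": "string"}}},
}]

def handle_request(req: dict) -> dict:
    """Answer a single JSON-RPC request dict with a response dict."""
    if req.get("method") == "tools/list":
        return {"jsonrpc": "2.0", "id": req["id"],
                "result": {"tools": TOOLS}}
    return {"jsonrpc": "2.0", "id": req.get("id"),
            "error": {"code": -32601, "message": "Method not found"}}

# A real server would loop over stdin lines, e.g.:
#   for line in sys.stdin:
#       print(json.dumps(handle_request(json.loads(line))), flush=True)
resp = handle_request({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
print(resp["result"]["tools"][0]["name"])
```

Everything integration-specific lives inside the handler; the protocol surface BUD talks to stays identical across servers.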
The 5-Layer Memory System
Human memory isn’t one thing — it’s multiple systems. BUD mirrors this with five complementary layers.
Why File-Based Memory?
Every memory file is plain Markdown you can read, edit, and version control. No hidden databases, no opaque vector stores for core identity. If BUD starts giving weird responses, you can open MEMORY.md in a text editor and see exactly what it “knows.” This is radical transparency by design.
```markdown
<!-- data/memory/SOUL.md — BUD's personality -->
## Core Identity
You are BUD, a helpful AI assistant built by Gopi Trinadh.
You are concise, proactive, and technically precise.

## Communication Style
- Use Slack-native formatting (bold, code blocks, lists)
- Be brief unless the user asks for detail
- Always confirm before destructive actions (deleting, overwriting)

## Boundaries
- Never share API keys or tokens in chat
- Redirect medical/legal questions to professionals
- If unsure, say so — don't hallucinate
```
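Because memory is just Markdown on disk, the read/write layer can be a few lines of `pathlib`. A sketch, with the file layout assumed from the example above and the helper names my own:

```python
from pathlib import Path
import tempfile

def load_soul(memory_dir: Path) -> str:
    """SOUL.md is read verbatim into the system prompt."""
    return (memory_dir / "SOUL.md").read_text(encoding="utf-8")

def remember(memory_dir: Path, fact: str) -> None:
    """Append a learned fact to MEMORY.md as a plain bullet."""
    with (memory_dir / "MEMORY.md").open("a", encoding="utf-8") as f:
        f.write(f"- {fact}\n")

# Demo in a throwaway directory
memory_dir = Path(tempfile.mkdtemp())
(memory_dir / "SOUL.md").write_text("## Core Identity\nYou are BUD.\n")
(memory_dir / "MEMORY.md").write_text("## Known Facts\n")
remember(memory_dir, "User prefers short answers")
print(load_soul(memory_dir))
print((memory_dir / "MEMORY.md").read_text())
```

Anything you can do here, you can also do by hand in a text editor, which is the whole point of the design.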
RAG Pipeline — Search by Meaning
Retrieval-Augmented Generation lets BUD search past conversations and documents semantically — not by keywords, but by meaning.
The RAG Engine Code
```python
from time import time
from uuid import uuid4

import chromadb
from sentence_transformers import SentenceTransformer

class RAGEngine:
    def __init__(self):
        # Local model — runs on CPU, zero API cost
        self.embedder = SentenceTransformer("all-MiniLM-L6-v2")
        self.db = chromadb.PersistentClient(path="data/chromadb")
        self.collection = self.db.get_or_create_collection("messages")

    async def index_message(self, text: str, channel: str):
        embedding = self.embedder.encode(text).tolist()
        self.collection.add(
            documents=[text],
            embeddings=[embedding],
            metadatas=[{"channel": channel, "ts": time()}],
            ids=[str(uuid4())]
        )

    async def search(self, query: str, top_k: int = 5) -> list:
        embedding = self.embedder.encode(query).tolist()
        results = self.collection.query(
            query_embeddings=[embedding],
            n_results=top_k
        )
        return results["documents"][0]  # Top matching messages
```
| Metric | Value | Detail |
|---|---|---|
| Embedding Model | all-MiniLM-L6-v2 | 384 dimensions, 22M params |
| Vector DB | ChromaDB (local) | Persistent, file-backed |
| Index Capacity | 1,000+ messages | Scales linearly |
| Retrieval Accuracy | 92% | Top-5 relevance |
| Search Latency | <50ms (P95: 45ms) | On Mac Mini M4 |
| API Cost | $0 | Everything runs locally |
Multi-Platform Integration
BUD serves 5 platforms simultaneously from a single async process using asyncio.gather() — no microservices needed.
Slack
Socket Mode — persistent WebSocket. No public URL needed. Reacts to mentions, DMs, and threads.
Discord
Discord.py gateway — real-time events. Supports slash commands, mentions, and DMs.
Microsoft Teams
Bot Framework SDK — works inside Teams channels and 1:1 chats. Enterprise-ready.
Google Chat
Pub/Sub or HTTP handler. Works in Google Workspace environments.
Web UI
FastAPI + WebSocket — browser-based chat with real-time streaming and voice I/O.
The Single-Process Architecture
```python
# main.py — BUD's entry point
async def main():
    agent = await create_agent()  # Initialize core + MCP + RAG
    # Launch ALL platforms simultaneously
    await asyncio.gather(
        start_slack(agent),        # Slack Socket Mode
        start_discord(agent),      # Discord gateway
        start_teams(agent),        # Teams bot framework
        start_google_chat(agent),  # Google Chat handler
        start_web(agent),          # FastAPI + WebSocket
        start_heartbeat(agent),    # Scheduler for cron tasks
        return_exceptions=True     # Crash isolation!
    )

asyncio.run(main())
```
return_exceptions=True is the secret weapon. If Discord crashes, Slack keeps running. If Teams throws an error, the web UI is unaffected. Each platform is an independent coroutine. This is production resilience without Kubernetes complexity.
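The crash-isolation behaviour is easy to see in isolation. In this sketch one coroutine raises, yet the other still completes, and the exception comes back as a value in the results list instead of tearing down the event loop:

```python
import asyncio

async def discord_sim():
    raise RuntimeError("gateway dropped")  # simulated platform crash

async def slack_sim():
    await asyncio.sleep(0.01)
    return "slack still running"

async def main():
    return await asyncio.gather(
        discord_sim(),
        slack_sim(),
        return_exceptions=True,  # exceptions become return values
    )

results = asyncio.run(main())
print(results)
```

Without `return_exceptions=True`, the first exception would propagate out of `gather` and cancel the sibling coroutines.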
Code Deep-Dive — Selected Components
Let’s examine the trickiest parts: the streaming WebSocket, the memory manager, and the system prompt builder.
Real-Time Token Streaming (WebSocket)
The web UI doesn’t wait for the full response — it streams tokens as Claude generates them, creating a “typing” effect:
```python
@app.websocket("/ws")
async def websocket_endpoint(ws: WebSocket):
    await ws.accept()
    while True:
        user_msg = await ws.receive_text()
        # Stream response token-by-token
        async with client.messages.stream(
            model="claude-sonnet-4-20250514",
            messages=[{"role": "user", "content": user_msg}],
            system=system_prompt,
            max_tokens=4096
        ) as stream:
            async for text in stream.text_stream:
                await ws.send_json({
                    "type": "token",
                    "content": text
                })
        await ws.send_json({"type": "done"})
```
System Prompt Builder — Dynamic Context Assembly
```python
def build_system_prompt(soul, memory, rag, tools_hint) -> str:
    sections = [
        soul,                                 # SOUL.md — who you are
        f"\n## Current Knowledge\n{memory}",  # MEMORY.md
    ]
    if rag:
        sections.append(
            "\n## Relevant Context (from past conversations)\n"
            + "\n".join(f"- {r}" for r in rag)
        )
    if tools_hint:
        sections.append(
            f"\n## Available Tools\n{tools_hint}"
        )
    return "\n".join(sections)
```
Deploying BUD on a Mac Mini
The Mac Mini M4 is the recommended hardware: silent, 12W power draw, developer-friendly, and powerful enough for all BUD workloads.
Mac Mini M4 — BUD’s Recommended Home
Silent operation. 12W idle. Always-on AI assistant server.
Why Dedicated Hardware Over Cloud?
| Factor | Cloud (AWS/GCP) | Mac Mini |
|---|---|---|
| Monthly Cost | $15–50/month recurring | $0 after purchase |
| Latency | Variable (region-dependent) | Consistent, local |
| Data Privacy | Data leaves your network | Everything stays home |
| Control | Provider-limited | Full root access |
| Setup Time | Minutes | ~2 hours (one-time) |
Step-by-Step Mac Mini Setup
Install Developer Tools
```bash
# Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install Python 3.12, Node.js, Git
brew install python@3.12 node git
```
Clone & Configure BUD
```bash
git clone https://github.com/GOPITRINADH3561/Project_OpenClaw.git
cd Project_OpenClaw

# Create virtual environment
python3.12 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
nano .env  # Add ANTHROPIC_API_KEY, SLACK_BOT_TOKEN, etc.
```
Test Run
```bash
python main.py
# You should see:
# ✅ Agent core initialized
# ✅ MCP servers connected (24 tools)
# ✅ RAG engine loaded (all-MiniLM-L6-v2)
# ✅ Slack connected
# ✅ Discord connected
# ✅ Web UI at http://localhost:8080
```
Auto-Start on Boot with launchd
```xml
<!-- ~/Library/LaunchAgents/com.bud.assistant.plist -->
<?xml version="1.0"?>
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.bud.assistant</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Users/you/Project_OpenClaw/.venv/bin/python</string>
        <string>/Users/you/Project_OpenClaw/main.py</string>
    </array>
    <key>RunAtLoad</key> <true/>
    <key>KeepAlive</key> <true/>
</dict>
</plist>
```
```bash
# Load the service — BUD starts on every boot
launchctl load ~/Library/LaunchAgents/com.bud.assistant.plist
```
Remote Access with Tailscale
```bash
# Install Tailscale — access BUD from anywhere
brew install tailscale
sudo tailscale up

# Now access BUD's Web UI from any device:
# http://100.x.y.z:8080
```
Benchmarked with 5 platforms active and 1,000+ indexed messages:
| Operation | Average | P95 |
|---|---|---|
| Simple message (no tools) | 1.5s | 3.0s |
| Single tool call | 3.5s | 6.0s |
| Multi-tool complex task | 6.0s | 10.0s |
| RAG retrieval (5 results) | 25ms | 45ms |
| Intent classification | <1ms | 1ms |
| Bot startup (warm) | 3s | 6s |
| Idle RAM usage | 180 MB | — |
Cost Analysis & Performance
BUD delivers 85% cost reduction versus naive API usage, and runs cheaper than any SaaS alternative.
Monthly Cost Comparison
How We Achieved 85% Reduction
Intent Classification
~40% of messages (greetings, thanks) handled with zero API calls.
Two-Model Routing
Simple questions → Haiku ($0.25/MTok). Complex → Sonnet ($3/MTok). 30% savings.
Token Budgeting
Dynamic max_tokens based on question complexity. Short question = short budget.
Conditional RAG
Only query ChromaDB when the message likely needs historical context. Saves embedding compute.
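Token budgeting, for instance, can be a one-liner policy keyed on rough question complexity. A sketch; the thresholds here are illustrative, not BUD's actual values:

```python
def max_tokens_for(text: str) -> int:
    """Pick a max_tokens budget from rough question complexity."""
    words = len(text.split())
    if words <= 10:
        return 256   # short question, short answer
    if words <= 50:
        return 1024  # medium request
    return 4096      # long, complex request gets the full budget

print(max_tokens_for("What time is it?"))
print(max_tokens_for(" ".join(["word"] * 30)))
```

Since output tokens are the expensive side of most API pricing, capping the budget on short questions trims cost without visibly changing answers.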
Usage Tiers
| Usage | Messages/Day | Monthly API | Annual (incl. hardware) |
|---|---|---|---|
| Light (personal) | 10 | $1.80 | $28/yr (after Year 1) |
| Medium (small team) | 50 | $9.60 | $121/yr |
| Heavy (active team) | 200 | $42.00 | $510/yr |
Future Impacts & What We’re Building Next
BUD isn’t finished — it’s a living platform. Here’s the roadmap of what’s actively being built and deployed soon.
Multi-Platform Agent Core
5-platform support, 24+ MCP tools, 5-layer memory, RAG pipeline, real-time streaming, voice I/O. Production-ready.
Local LLM Fallback (Ollama)
When Claude API is unreachable or for sensitive conversations, BUD will fall back to a local model via Ollama — zero data leaves your machine.
Computer Vision Pipeline
Upload images to BUD and get analysis — receipts, whiteboards, documents. Powered by Claude’s multimodal capabilities.
Docker Compose One-Click
Single docker compose up to run BUD + all MCP servers + ChromaDB + web UI. Zero manual setup.
HuggingFace Spaces Demo
Live web demo on HuggingFace where anyone can try BUD’s agent capabilities without installing anything.
Multi-Agent Orchestration
BUD spawning specialized sub-agents for complex tasks — a coding agent, a research agent, a writing agent — all coordinated by the core.
The Bigger Picture — Why This Matters
BUD represents a fundamental shift in how we interact with AI. Instead of visiting a website to chat with a model, the model comes to you — inside the tools you already use. The AI becomes ambient infrastructure, always available across every communication channel.
We’re building toward a world where every team and every individual has a personal AI that knows their context (memory), can take action (tools), lives where they work (multi-platform), and keeps getting smarter (RAG indexing). BUD is the open-source blueprint for that future.
Development Timeline
Foundation — First Principles Learning
Studied AI agent architecture from scratch. Completed Dr. Raj Dandekar’s Vizuara Labs coursework on MCP, embeddings, and tool-use patterns.
Core Build — Agent + MCP + RAG + Memory
Built the complete agent core from first principles. Implemented all 4 MCP servers, the 5-layer memory system, and RAG pipeline. Integrated 5 chat platforms.
Docker + HuggingFace + CI/CD
Containerize everything. Deploy live demo. GitHub Actions for automated testing and deployment.
Local LLM + Vision + Multi-Agent
Ollama fallback for offline mode. Image understanding. Sub-agent coordination for complex workflows.
Plugin Marketplace + Community
Open MCP server marketplace. Community-contributed tools. Self-updating capability.
How to Contribute
BUD is fully open source. The entire codebase, documentation, and this blog are available for anyone to learn from, fork, and extend:
HuggingFace
Live demo space coming soon — try BUD without installing anything.
Full Documentation
84-page technical blog, annotated README, inline code comments on every file.
Technology Stack at a Glance
| Layer | Technology | Why This Choice |
|---|---|---|
| AI Engine | Anthropic Claude (Sonnet + Haiku) | Best tool-use, long context, streaming |
| Tool Protocol | Model Context Protocol (MCP) | Standard, extensible, discoverable |
| Embeddings | all-MiniLM-L6-v2 (local) | Zero cost, fast, good accuracy |
| Vector DB | ChromaDB (persistent) | Simple, local, file-backed |
| Web Framework | FastAPI + WebSocket | Async-native, fast, great DX |
| Slack | Socket Mode (slack-bolt) | No public URL needed |
| Discord | discord.py | Mature, full-featured |
| Teams | Bot Framework SDK | Official Microsoft integration |
| Scheduling | APScheduler + SQLite | Persistent jobs, survives restarts |
| Memory | Markdown files + deque | Transparent, editable, version-controllable |
| Language | Python 3.12 (100% async) | Ecosystem, readability, async support |
| Deployment | Mac Mini M4 + launchd | Silent, 12W, always-on |
“Build everything from scratch. Understand every layer. Then — and only then — you can deploy with confidence and explain every decision in an interview.”
— Project BUD Design Manifesto
