AI Engineering · Memory Systems · RAG

Your AI Agent Has Amnesia. ByteRover Fixes It Without a Vector Database

How I built a customer support agent that remembers everything — using plain markdown files, BM25 search, and zero embeddings.

Altaf Shaikh · Backend Developer · 2025 · 12 min read
01 — The Problem

RAG promised AI memory. It lied.

We've been building RAG systems the same way for two years. Chunk your docs. Embed them into vectors. Store in a vector database. At query time, compute similarity and retrieve the top-k chunks. Feed to LLM. Hope for the best.

It works. Until it doesn't. And when it stops working, you have no idea why.

I've built RAG pipelines with ChromaDB, Pinecone, and Weaviate. I've implemented HyDE, reranking, query rewriting, MMR, and hybrid search. Every optimisation added complexity and every deployment added new failure modes I couldn't debug. The core problem was never the retrieval algorithm — it was that the knowledge store was a black box.

You cannot cat an embedding vector and understand what your agent knows. You cannot open Pinecone in VS Code and read what it stored. When retrieval fails silently, you are left guessing.

Three real pain points that drove me to look for something different:

Traditional RAG Pain Points
① Similarity Search Degrades Silently
As your knowledge base grows, cosine similarity loses precision. A query about "refund policy" starts returning chunks about "return window" and "shipping" because everything is semantically close. You don't get an error — you get subtly wrong answers.
② Your Agent Forgets Between Sessions
Traditional RAG has no persistent agent memory. Every session starts cold. The agent doesn't remember what it learned last time, what conventions it figured out, or what the customer told it yesterday.
③ The Knowledge Store is a Black Box
Vector databases store opaque numerical representations. When retrieval fails you cannot inspect what was stored, how it was chunked, or why a query matched the wrong document. Debugging means adding more logs, not reading the data.

Then I came across the PageIndex approach — a deceptively simple idea that turned my thinking upside down. Instead of embedding documents, what if an LLM simply read them, reasoned about them, and stored the knowledge in a human-readable hierarchy? No similarity search. No vectors. Just files.

ByteRover productionises exactly this idea. And after using it for a few weeks, I'm not going back.

02 — The Idea

No embeddings. No similarity search. Just files.

The insight is embarrassingly simple. When you want to find information in a technical manual, you don't compute cosine similarity between your query and every page. You open the table of contents, navigate to the right chapter, then the right section.

ByteRover makes your AI agent do the same thing — but for any knowledge base you give it.

Vector RAG vs ByteRover — Side by Side

Traditional RAG:
📄 Raw Document → ✂️ Chunk into 512 tokens → 🔢 Compute Embeddings → 🗄️ Store in Vector DB → 🔍 Query arrives → 📐 Cosine Similarity (⚠️ black box) → 📦 Top-k chunks (maybe wrong) → 🤖 LLM generates answer

ByteRover:
📄 Raw Document → 🧠 LLM reads & reasons → 🗂️ Organises into hierarchy → 📁 Stored as .md files → 🔍 Query arrives → ⚡ BM25 + Cache (transparent) → 📄 Right markdown file → 🤖 LLM generates answer

The critical difference: everything ByteRover stores is human-readable. You can open .brv/context-tree/ in your file explorer and read exactly what your agent knows. You can git diff what changed after a curation. You can delete a file if the knowledge is wrong. It's just files.
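
For example, after a curation pass you can inspect and diff the tree with nothing but standard tools (assuming you keep .brv/ under version control):

terminal
# Read exactly what the agent knows about refund timelines
cat .brv/context-tree/payments/refunds/refund_timelines.md

# See what the last curation changed
git diff .brv/context-tree/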

Key Insight

ByteRover is not a retrieval optimisation on top of vector search. It is a fundamentally different approach: LLM-curated, file-based knowledge that you can read, edit, version-control, and debug without any special tooling.

03 — How It Works

The architecture, explained simply

ByteRover has two main operations: Curation (writing to memory) and Query (reading from memory). Both run entirely on your local machine. No cloud required.

The Context Tree — Local File Structure

All knowledge lives in .brv/context-tree/ as a three-level hierarchy. Every node is a folder with a context.md describing its purpose. Every piece of knowledge is a plain markdown file.

Context Tree Structure
.brv/context-tree/
├── orders/                              # Domain
│   ├── context.md                       # auto-generated overview
│   ├── _index.md                        # auto-generated summary
│   ├── cancellation/                    # Topic
│   │   ├── context.md
│   │   └── order_cancellation_policy.md
│   └── wrong-items/
│       └── wrong_item_resolution.md
├── payments/
│   ├── context.md
│   ├── payment-methods/
│   │   └── supported_payment_methods.md
│   └── refunds/
│       ├── refund_timelines.md
│       └── cod_refund_policy.md
├── shipping/
│   ├── context.md
│   └── delivery-timelines/
│       └── city_delivery_windows.md
└── returns/
    ├── context.md
    └── return-window/
        └── return_eligibility.md

Curation — Writing to Memory

When you call brv curate -d docs/, ByteRover's LLM runs in a sandbox with a ToolsSDK. It reads your documents, reasons about the content, and writes structured knowledge using five atomic operations:

Curation Operations

ADD — Create a new entry
UPDATE — Modify an existing entry
UPSERT — Add or update
MERGE — Combine entries
DELETE — Remove an entry

The LLM gets feedback on every operation. If a MERGE fails, it sees the error and retries with a corrected approach.
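
To make those operations concrete, here is the rough shape of one as I picture it (a hypothetical illustration with made-up field names, not ByteRover's actual ToolsSDK interface):

curation op (hypothetical)
// Hypothetical curation operation; the field names are my own guesses.
const op = {
  type: "UPSERT", // one of ADD / UPDATE / UPSERT / MERGE / DELETE
  path: "payments/refunds/refund_timelines.md", // target file in the context tree
  content: "# Refund Timelines\n\nUPI refunds take 2–3 business days...",
};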

Query — The 5-Tier Retrieval Pipeline

This is where ByteRover really shines. Every query goes through a tiered strategy that starts with the cheapest possible path and escalates only when needed. Most queries resolve in under 200ms — without touching an LLM at all.

ByteRover 5-Tier Query Strategy

TIER 0 — Exact Cache (~0ms): MD5 fingerprint match — same query, cached result (60s TTL)
TIER 1 — Fuzzy Cache (~50ms): ≥60% token similarity to a recent query
TIER 2 — BM25 Direct (~200ms): Full-text search — if the top result scores high, return immediately. No LLM.
TIER 3 — LLM Pre-fetch (<5s): BM25 results injected as context for a single LLM synthesis call
TIER 4 — Agentic Loop (8–15s): Full multi-step reasoning: reads files, follows relations, iterates
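
Tier 1's fuzzy match is plain token overlap, which is easy to picture. A minimal sketch of such a check (my own illustration; ByteRover's exact similarity measure isn't shown here, so treat this as the idea, not the implementation):

similarity sketch (illustrative)
// Rough token-overlap similarity: 1.0 means identical token sets.
function tokenSimilarity(a, b) {
  const tokens = (s) => new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const A = tokens(a);
  const B = tokens(b);
  const overlap = [...A].filter((t) => B.has(t)).length;
  return overlap / Math.max(A.size, B.size);
}

// tokenSimilarity("UPI refund policy", "refund policy for UPI") → 0.75, a Tier 1 hit (≥ 0.6)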

Compound score = (0.6 × BM25) + (0.25 × importance) + (0.15 × recency) × tierBoost
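
My reading of that formula as a runnable sketch (two assumptions on my part: tierBoost scales the whole weighted sum, and all three inputs are normalised to 0–1):

score sketch (illustrative)
// Illustrative compound ranking score, not ByteRover's source code.
// bm25, importance and recency are assumed normalised to [0, 1].
function compoundScore({ bm25, importance, recency, tierBoost = 1 }) {
  return (0.6 * bm25 + 0.25 * importance + 0.15 * recency) * tierBoost;
}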

Session Learning — Memory That Grows

After every session, ByteRover automatically extracts durable knowledge from the conversation — patterns you used, decisions you made, preferences you expressed — and persists them as agent memories. Your agent gets smarter over time without any extra work.

Two Layers of Memory

ByteRover Memory — what ShopEase policies say. Persists forever across all sessions.
Session History — what this customer said today. Lives in the process, resets between sessions.

04 — Building the Agent

A customer support agent with ByteRover memory

Let's build a real thing. A customer support agent for ShopEase — a fictional Indian e-commerce platform. The agent will answer questions about orders, payments, shipping, and refunds using ByteRover as its memory layer and Groq (Llama 3.3 70B) for responses.

No vector database. No embeddings. The entire retrieval happens through brv query — a subprocess call that usually returns the right markdown context in under a second.

Agent Architecture

👤 Customer Question — "How long does a UPI refund take?"
   │ spawn subprocess
   ▼
🧠 ByteRover Memory — brv query → 5-tier retrieval → returns markdown context
   │ context + history
   ▼
Groq · Llama 3.3 70B — system prompt + retrieved context + conversation history
   │ grounded answer
   ▼
💬 Support Response — "UPI refunds take 2–3 business days…"

Project Structure

project structure
support-agent/
├── docs/
│   ├── faq.md                # ShopEase FAQ
│   ├── shipping-policy.md    # Shipping rules
│   └── refund-policy.md      # Refund rules
├── src/
│   ├── curate.js             # Feeds docs → ByteRover
│   ├── query.js              # Queries ByteRover memory
│   └── agent.js              # Main chat loop
├── .env.example
└── package.json

Step 1 — Install ByteRover

terminal
# Install ByteRover CLI (runs a local daemon)
curl -fsSL https://www.byterover.dev/install.sh | sh

# Verify it's running
brv status

# Install project dependencies
npm install

Step 2 — The Query Helper

This is the core of the integration. brv query runs as a subprocess against the local ByteRover daemon. It returns plain text — the retrieved context from your markdown knowledge tree. Twenty lines of code. No SDK. No API key.

src/query.js
import { exec } from "child_process";
import { promisify } from "util";

const execAsync = promisify(exec);

/**
 * Queries ByteRover's local context tree.
 * ByteRover's 5-tier strategy handles routing:
 *   Tier 0/1 → cache hit     ~0ms
 *   Tier 2   → BM25 direct   ~200ms  (no LLM)
 *   Tier 3/4 → LLM reasoning <15s
 */
export async function queryMemory(question) {
  try {
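    // Escape double quotes for the shell. For untrusted input,
    // child_process.execFile("brv", ["query", question]) would avoid
    // shell interpolation entirely.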
    const sanitized = question.replace(/"/g, '\\"');
    const { stdout, stderr } = await execAsync(
      `brv query "${sanitized}"`,
      { timeout: 30000 } // 30s covers Tier 4 agentic loop
    );

    if (stderr && !stdout) {
      console.error("[ByteRover] Error:", stderr);
      return null;
    }

    return stdout.trim();
  } catch (err) {
    console.error("[ByteRover] Query failed:", err.message);
    return null;
  }
}
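
A quick sanity check of the helper, in a scratch file of your own (assuming the daemon is running and your docs are already curated):

scratch file (example)
import { queryMemory } from "./query.js";

// Top-level await works because the project uses ES modules
const context = await queryMemory("How long does a UPI refund take?");
console.log(context ?? "no context returned");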

Step 3 — Curate Your Docs

Run this once before starting the agent. ByteRover reads all files in docs/, reasons about the content, and organises everything into the context tree. After this, you can cat .brv/context-tree/payments/refunds/refund_timelines.md and read exactly what your agent knows.

src/curate.js
import { exec } from "child_process";
import { promisify } from "util";
import path from "path";
import { fileURLToPath } from "url";

const execAsync = promisify(exec);
const __dirname = path.dirname(fileURLToPath(import.meta.url));
const DOCS_DIR = path.resolve(__dirname, "../docs");

async function curateDocs() {
  console.log("🔄 Curating support docs into ByteRover memory...");

  const { stdout } = await execAsync(
    // -d flag: curate an entire directory at once (path quoted in case it contains spaces)
    `brv curate "ShopEase support docs: FAQs, shipping, refunds" -d "${DOCS_DIR}"`,
    { timeout: 120000 } // 2 min — LLM reasoning over all files
  );

  console.log(stdout);
  console.log("✅ Done. Run `npm run chat` to start the agent.");
  console.log("   Tip: run `brv vc status` to see what was stored.");
}

curateDocs().catch(console.error);

Step 4 — The Main Agent

src/agent.js
import Groq from "groq-sdk";
import readline from "readline";
import { queryMemory } from "./query.js";
import "dotenv/config";

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });
const MODEL = "llama-3.3-70b-versatile";

// Session memory — what this customer said today
const conversationHistory = [];

const SYSTEM_PROMPT = `You are a helpful customer support agent for ShopEase.
Use ONLY the retrieved context to answer. Never invent policies.
If context doesn't cover the question, direct to 1800-123-4567.`;

async function chat(userMessage) {
  // Step 1 — Query ByteRover for relevant context
  process.stdout.write("\n🔍 Querying ByteRover memory...");
  const context = await queryMemory(userMessage);
  process.stdout.write(" done\n\n");

  if (!context) return "Knowledge base unavailable. Call 1800-123-4567.";

  // Step 2 — Inject context into the message
  const enrichedMessage = `RETRIEVED CONTEXT:
---
${context}
---
CUSTOMER QUESTION: ${userMessage}`;

  // Keep only the raw question in history; otherwise the injected
  // context accumulates and bloats every later turn
  const messages = [
    { role: "system", content: SYSTEM_PROMPT },
    ...conversationHistory,
    { role: "user", content: enrichedMessage },
  ];
  conversationHistory.push({ role: "user", content: userMessage });

  // Step 3 — Call Groq with full conversation history
  const response = await groq.chat.completions.create({
    model: MODEL,
    messages,
    temperature: 0.3, // Low — factual support, not creative
    max_tokens: 512,
  });

  const reply = response.choices[0].message.content;

  // Store reply in history (without injected context)
  conversationHistory.push({ role: "assistant", content: reply });
  return reply;
}

async function main() {
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });

  console.log("ShopEase Support Agent (ByteRover + Groq Llama 3.3 70B)");
  console.log('Type your question. Type "exit" to quit.\n');

  const ask = () => {
    rl.question("You: ", async (input) => {
      const msg = input.trim();
      if (!msg) return ask();
      if (msg === "exit") { rl.close(); return; }

      const reply = await chat(msg);
      console.log(`Agent: ${reply}\n`);
      ask();
    });
  };
  ask();
}

main();

Running It

terminal
# 1. Curate docs into ByteRover memory (once)
npm run curate

# 2. Inspect what ByteRover stored (optional but satisfying)
brv vc status
brv query "refund policy UPI"

# 3. Start the agent
npm run chat

The wow moment: After running npm run curate, open .brv/context-tree/ in your file explorer. You'll see your support docs organised into readable markdown files by domain and topic — exactly what your agent knows, readable by a human, editable with any text editor.

05 — Does It Actually Work?

92.8% accuracy. 1.6s latency. Flash model beats Pro.

ByteRover was evaluated on LongMemEval-S — an ICLR 2025 benchmark that buries 1–3 relevant sessions inside 48 distractor sessions per question, across a context tree of 23,867 documents. GPT-4o given the full conversation history scores 60.6%. ByteRover scores 92.8%.

92.8% — Overall Accuracy (LongMemEval-S)
1.6s — p50 Latency (cold, no cache)
98.7% — Knowledge Update (tracking accuracy)
96.1% — LoCoMo Overall (vs 89.6% Hindsight)

The most interesting finding: Run 1 used Gemini Flash (cheapest). Run 2 used Gemini Pro (expensive). Run 1 won — 92.8% vs 92.2%. The architecture does the heavy lifting, not the model. This is the entire thesis of ByteRover: build the retrieval right and you don't need to throw expensive models at the problem.

06 — Honest Tradeoffs

When to use it. When not to.

ByteRover is not a drop-in replacement for all RAG use cases. Be honest about the fit.

✓ Use ByteRover when

Your agent needs persistent memory across sessions
You want to onboard a codebase your AI coding agent should understand
You're working with private/regulated data — local-first, no cloud dependency
You want to debug why retrieval failed — just open the markdown file
Your knowledge base has clear structure (policies, docs, conventions)

✗ Skip it when

You need pure semantic similarity search over millions of unstructured docs
One-shot queries with no repeated patterns — curation overhead not worth it
Real-time data that changes faster than you can re-curate
Multi-session synthesis is your primary use case — ByteRover is still maturing here

07 — Wrapping Up

The model is interchangeable. The memory architecture isn't.

I've been building RAG systems for two years. The assumption I carried throughout — that embeddings and vector similarity search are the only serious approach — turned out to be wrong.

ByteRover showed me a different model: curate knowledge once, query it intelligently forever. Files you can read. Structure you can understand. Memory that persists across sessions. A retrieval system that escalates through tiers rather than blindly computing similarity over everything.

The support agent we built in this post has three files of actual code. The memory system is handled by ByteRover. The response generation is handled by Groq. The only thing you write is the glue — and the glue is twenty lines.

The bottom line: When your agent makes a mistake, you shouldn't have to guess why. You should be able to open a markdown file and read what it knew. ByteRover makes that possible.

Try it yourself. Install ByteRover, point it at any folder of docs, run brv query, and see what comes back. The context tree it builds will tell you everything.

get started
# Install ByteRover
curl -fsSL https://www.byterover.dev/install.sh | sh

# Clone this article's code
git clone https://github.com/altafshaikh/byterover-support-agent

# Curate your own docs
brv curate -d ./your-docs/