How I built a customer support agent that remembers everything — using plain markdown files, BM25 search, and zero embeddings.
We've been building RAG systems the same way for two years. Chunk your docs. Embed them into vectors. Store in a vector database. At query time, compute similarity and retrieve the top-k chunks. Feed to LLM. Hope for the best.
It works. Until it doesn't. And when it stops working, you have no idea why.
I've built RAG pipelines with ChromaDB, Pinecone, and Weaviate. I've implemented HyDE, reranking, query rewriting, MMR, and hybrid search. Every optimisation added complexity and every deployment added new failure modes I couldn't debug. The core problem was never the retrieval algorithm — it was that the knowledge store was a black box.
You cannot cat an embedding vector and understand what your agent knows. You cannot open Pinecone in VS Code and read what it stored. When retrieval fails silently, you are left guessing.
Three real pain points drove me to look for something different: retrieval that fails silently with no way to debug it, a knowledge store I cannot open and read, and pipeline complexity that compounds with every optimisation.
Then I came across the PageIndex approach — a deceptively simple idea that turned my thinking upside down. Instead of embedding documents, what if an LLM simply read them, reasoned about them, and stored the knowledge in a human-readable hierarchy? No similarity search. No vectors. Just files.
ByteRover productionises exactly this idea. And after using it for a few weeks, I'm not going back.
The insight is embarrassingly simple. When you want to find information in a technical manual, you don't compute cosine similarity between your query and every page. You open the table of contents, navigate to the right chapter, then the right section.
ByteRover makes your AI agent do the same thing — but for any knowledge base you give it.
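The analogy is easy to make concrete. Here is a toy sketch (hypothetical data, plain nested objects, nothing ByteRover-specific) of what table-of-contents navigation looks like in code: you descend chapter, then section, by name, and no similarity math is involved.

```javascript
// Toy contrast (hypothetical data, not ByteRover's actual format):
// the "table of contents" is just nested objects, and lookup is
// navigation by name — chapter, then section. No vectors involved.
const manual = {
  payments: {
    refunds: "Refunds are credited within 5-7 business days.",
    upi: "UPI payments settle instantly.",
  },
  shipping: {
    metros: "Metro orders ship within 2 business days.",
  },
};

// Descend the hierarchy like a human reader following the contents page.
function lookup(tree, segments) {
  return segments.reduce((node, key) => node?.[key], tree);
}

console.log(lookup(manual, ["payments", "refunds"]));
// → "Refunds are credited within 5-7 business days."
```

The point of the toy: the lookup is cheap, deterministic, and fully inspectable, which is exactly the property the hierarchical context tree preserves.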
The critical difference: everything ByteRover stores is human-readable. You can open .brv/context-tree/ in your file explorer and read exactly what your agent knows. You can git diff what changed after a curation. You can delete a file if the knowledge is wrong. It's just files.
ByteRover is not a retrieval optimisation on top of vector search. It is a fundamentally different approach: LLM-curated, file-based knowledge that you can read, edit, version-control, and debug without any special tooling.
ByteRover has two main operations: Curation (writing to memory) and Query (reading from memory). Both run entirely on your local machine. No cloud required.
All knowledge lives in .brv/context-tree/ as a three-level hierarchy. Every node is a folder with a context.md describing its purpose. Every piece of knowledge is a plain markdown file.
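Because the tree is ordinary folders and markdown, a few lines of Node can enumerate everything the agent knows. This is an illustrative sketch: `listKnowledge` is a hypothetical helper, not part of ByteRover, and the exact folder layout will vary.

```javascript
// Illustrative only: since the context tree is plain folders and
// markdown, ordinary fs calls can walk it. `listKnowledge` is a
// hypothetical helper, not part of the ByteRover CLI.
import { readdirSync, statSync } from "fs";
import path from "path";

function listKnowledge(dir) {
  const files = [];
  for (const name of readdirSync(dir)) {
    const full = path.join(dir, name);
    if (statSync(full).isDirectory()) {
      files.push(...listKnowledge(full)); // recurse into child nodes
    } else if (name.endsWith(".md")) {
      files.push(full); // context.md and knowledge files alike
    }
  }
  return files;
}

// Usage: listKnowledge(".brv/context-tree/") returns every markdown
// file, i.e. the complete, human-readable contents of the memory.
```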
When you call brv curate -d docs/, ByteRover's LLM runs in a sandbox with a ToolsSDK. It reads your documents, reasons about the content, and writes structured knowledge using five atomic operations.
The LLM gets feedback on every operation. If a MERGE fails, it sees the error and retries with a corrected approach.
This is where ByteRover really shines. Every query goes through a tiered strategy that starts with the cheapest possible path and escalates only when needed. Most queries resolve in under 200ms — without touching an LLM at all.
Compound score = [(0.6 × BM25) + (0.25 × importance) + (0.15 × recency)] × tierBoost
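A minimal sketch of that scoring in JavaScript, assuming the three inputs are normalised to [0, 1] and that tierBoost scales the whole blended score. The real BM25 implementation and boost values are ByteRover internals.

```javascript
// Minimal sketch of the compound score. Assumptions: inputs are
// normalised to [0, 1], and tierBoost multiplies the blended score.
// The actual BM25 scoring and boost values are ByteRover internals.
function compoundScore({ bm25, importance, recency, tierBoost = 1 }) {
  const blended = 0.6 * bm25 + 0.25 * importance + 0.15 * recency;
  return blended * tierBoost;
}

// Recency acts as a tie-breaker: same text match, same importance,
// but the fresher memory ranks higher.
const fresh = compoundScore({ bm25: 0.9, importance: 0.8, recency: 1.0 });
const stale = compoundScore({ bm25: 0.9, importance: 0.8, recency: 0.1 });
console.log(fresh > stale); // true
```

The heavy weighting on BM25 keeps the ranking anchored to lexical relevance, with importance and recency nudging otherwise-equal candidates apart.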
After every session, ByteRover automatically extracts durable knowledge from the conversation — patterns you used, decisions you made, preferences you expressed — and persists them as agent memories. Your agent gets smarter over time without any extra work.
Let's build a real thing. A customer support agent for ShopEase — a fictional Indian e-commerce platform. The agent will answer questions about orders, payments, shipping, and refunds using ByteRover as its memory layer and Groq (Llama 3.3 70B) for responses.
No vector database. No embeddings. The entire retrieval happens through brv query — a subprocess call that returns the right markdown context in under a second.
```
support-agent/
├── docs/
│   ├── faq.md               # ShopEase FAQ
│   ├── shipping-policy.md   # Shipping rules
│   └── refund-policy.md     # Refund rules
├── src/
│   ├── curate.js            # Feeds docs → ByteRover
│   ├── query.js             # Queries ByteRover memory
│   └── agent.js             # Main chat loop
├── .env.example
└── package.json
```
```shell
# Install ByteRover CLI (runs a local daemon)
curl -fsSL https://www.byterover.dev/install.sh | sh

# Verify it's running
brv status

# Install project dependencies
npm install
```
This is the core of the integration. brv query runs as a subprocess against the local ByteRover daemon. It returns plain text — the retrieved context from your markdown knowledge tree. Twenty lines of code. No SDK. No API key.
```javascript
import { exec } from "child_process";
import { promisify } from "util";

const execAsync = promisify(exec);

/**
 * Queries ByteRover's local context tree.
 * ByteRover's 5-tier strategy handles routing:
 *   Tier 0/1 → cache hit     ~0ms
 *   Tier 2   → BM25 direct   ~200ms (no LLM)
 *   Tier 3/4 → LLM reasoning <15s
 */
export async function queryMemory(question) {
  try {
    const sanitized = question.replace(/"/g, '\\"');
    const { stdout, stderr } = await execAsync(
      `brv query "${sanitized}"`,
      { timeout: 30000 } // 30s covers Tier 4 agentic loop
    );
    if (stderr && !stdout) {
      console.error("[ByteRover] Error:", stderr);
      return null;
    }
    return stdout.trim();
  } catch (err) {
    console.error("[ByteRover] Query failed:", err.message);
    return null;
  }
}
```
Run this once before starting the agent. ByteRover reads all files in docs/, reasons about the content, and organises everything into the context tree. After this, you can cat .brv/context-tree/payments/refunds/refund_timelines.md and read exactly what your agent knows.
```javascript
import { exec } from "child_process";
import { promisify } from "util";
import path from "path";
import { fileURLToPath } from "url";

const execAsync = promisify(exec);
const __dirname = path.dirname(fileURLToPath(import.meta.url));
const DOCS_DIR = path.resolve(__dirname, "../docs");

async function curateDocs() {
  console.log("🔄 Curating support docs into ByteRover memory...");
  const { stdout } = await execAsync(
    // -d flag: curate entire directory at once
    `brv curate "ShopEase support docs: FAQs, shipping, refunds" -d ${DOCS_DIR}`,
    { timeout: 120000 } // 2 min — LLM reasoning over all files
  );
  console.log(stdout);
  console.log("✅ Done. Run `npm run chat` to start the agent.");
  console.log("   Tip: run `brv vc status` to see what was stored.");
}

curateDocs().catch(console.error);
```
```javascript
import Groq from "groq-sdk";
import readline from "readline";
import { queryMemory } from "./query.js";
import "dotenv/config";

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });
const MODEL = "llama-3.3-70b-versatile";

// Session memory — what this customer said today
const conversationHistory = [];

const SYSTEM_PROMPT = `You are a helpful customer support agent for ShopEase.
Use ONLY the retrieved context to answer. Never invent policies.
If context doesn't cover the question, direct to 1800-123-4567.`;

async function chat(userMessage) {
  // Step 1 — Query ByteRover for relevant context
  process.stdout.write("\n🔍 Querying ByteRover memory...");
  const context = await queryMemory(userMessage);
  process.stdout.write(" done\n\n");

  if (!context) return "Knowledge base unavailable. Call 1800-123-4567.";

  // Step 2 — Inject context into the message
  const enrichedMessage = `RETRIEVED CONTEXT:
---
${context}
---

CUSTOMER QUESTION: ${userMessage}`;

  conversationHistory.push({ role: "user", content: enrichedMessage });

  // Step 3 — Call Groq with full conversation history
  const response = await groq.chat.completions.create({
    model: MODEL,
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      ...conversationHistory,
    ],
    temperature: 0.3, // Low — factual support, not creative
    max_tokens: 512,
  });

  const reply = response.choices[0].message.content;

  // Store reply in history (without injected context)
  conversationHistory.push({ role: "assistant", content: reply });
  return reply;
}

async function main() {
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });

  console.log("ShopEase Support Agent (ByteRover + Groq Llama 3.3 70B)");
  console.log('Type your question. Type "exit" to quit.\n');

  const ask = () => {
    rl.question("You: ", async (input) => {
      const msg = input.trim();
      if (!msg) return ask();
      if (msg === "exit") {
        rl.close();
        return;
      }
      const reply = await chat(msg);
      console.log(`Agent: ${reply}\n`);
      ask();
    });
  };
  ask();
}

main();
```
```shell
# 1. Curate docs into ByteRover memory (once)
npm run curate

# 2. Inspect what ByteRover stored (optional but satisfying)
brv vc status
brv query "refund policy UPI"

# 3. Start the agent
npm run chat
```
After npm run curate finishes, open .brv/context-tree/ in your file explorer. You'll see your support docs organised into readable markdown files by domain and topic — exactly what your agent knows, readable by a human and editable with any text editor.
ByteRover was evaluated on LongMemEval-S — an ICLR 2025 benchmark that buries 1-3 relevant sessions inside 48 distractor sessions per question, across a context tree of 23,867 documents. GPT-4o given the full conversation history scores 60.6%. ByteRover scores 92.8%.
The most interesting finding: Run 1 used Gemini Flash (cheapest). Run 2 used Gemini Pro (expensive). Run 1 won — 92.8% vs 92.2%. The architecture does the heavy lifting, not the model. This is the entire thesis of ByteRover: build the retrieval right and you don't need to throw expensive models at the problem.
ByteRover is not a drop-in replacement for all RAG use cases. Be honest about the fit.
I've been building RAG systems for two years. The assumption I carried throughout — that embeddings and vector similarity search are the only serious approach — turned out to be wrong.
ByteRover showed me a different model: curate knowledge once, query it intelligently forever. Files you can read. Structure you can understand. Memory that persists across sessions. A retrieval system that escalates through tiers rather than blindly computing similarity over everything.
The support agent we built in this post has three files of actual code. The memory system is handled by ByteRover. The response generation is handled by Groq. The only thing you write is the glue — and the glue is twenty lines.
Try it yourself. Install ByteRover, point it at any folder of docs, run brv query, and see what comes back. The context tree it builds will tell you everything.
```shell
# Install ByteRover
curl -fsSL https://www.byterover.dev/install.sh | sh

# Clone this article's code
git clone https://github.com/altafshaikh/byterover-support-agent

# Curate your own docs
brv curate -d ./your-docs/
```