Day 0: Project Setup & My Coding Agent Workflow

Building an open-source knowledge management system for Hugo and Stefan's LLM course students - documenting my setup and workflow with coding agents.


📺 Video:

📂 Repository: buildwai_kms


What Am I Building and Why?

Hey! So I'm building an AI-powered knowledge management system (or "ragbot") for students in Hugo and Stefan's course. I love this course - taking it for the third time actually, and helping the teachers this time.

The Problem: We already have pipelines that take workshop transcripts and create wikis for students to recap the workshops. But I wanted to do something built in public, something with an open-source model.

My Two Motivations:

  1. For the course: Create a knowledge management system for students using workshop transcripts
  2. For my startup: I'm building a local knowledge management system and currently just using major vendor APIs (Claude/GPT). I want to integrate open-source models.

The Credits Situation: As part of the course, we get Modal credits. I've taken so many courses I have $2,500+ in Modal credits that I've never used. I've never actually deployed an open-source model, so this is my excuse to finally do it!

My Beginner's Disclaimer

Important: I don't have a software engineering background. If you're a software engineer watching this, it might be a bit rough. I'm a beginner leaning heavily on coding agents, trying to do things properly. That's part of why I'm building in public - it forces me to understand what I'm doing instead of producing "slop" (which I've done a lot of).

If you're comfortable with your coding environment and don't like relying on coding agents, you might want to skip ahead to Day 1.

My Coding Agent Setup (The Important Part!)

This changes every time I start a new project, but here's my current setup that might be helpful:

The Three-Agent Approach

I use three different agents pointed at the same repo:

  1. Claude Code in Cursor (two-pane setup)
  2. Codex (also in Cursor)
  3. Droid (Factory.ai's terminal agent) in a separate window

Why Droid is tricky: Unlike Codex and Claude that work within the repo, I have to sit Droid outside the repository and point it in, otherwise it gobbles up too much context and gets screwed. (If anyone has tips for this, please let me know!)

Why three agents? Even if I used three instances of Claude Code, having fresh eyes is helpful. Different models are better at different things, and having to explain something three different ways helps me understand it better.

My Documentation System

The Context Document: Similar to a CLAUDE.md, but since I'm using three different agents, it's the most compressed representation of my thinking. With coding agents, I always end up saying "more simple, more simple, don't understand, make it more simple" - so we get down to this condensed context document.

Agent Kickstart: This is what I copy-paste into each agent before starting work:

  • Read the context document
  • Read the project overview
  • Create a new Git branch (never work on main)
  • Always create a PR
  • Confirm with me before starting work

Project Structure I Created

BUILDWAI_KMS/
├── content/           # Daily progress logs (this file!)
├── docs/              # Project documentation
│   ├── agent_kickstart.md    # What I paste to agents
│   ├── project-overview.md   # The living plan
│   └── dev/                  # PR templates
└── context.md         # High-level compressed goals

The Technical Plan (From 5 Steps to 7)

Here's what we figured out after many hours with Claude, Codex, and Droid. Started with 14 steps, whittled down to 5 (now expanded to 7):

The Model Decision: Why Qwen3-4B?

Original idea: I wanted to try a different model, maybe one of the Zhipu models.

Final decision: Qwen3-4B-Instruct-2507 (the "no think" variant)

Why Qwen3-4B?

  • Fits on the smaller GPU for Modal (A10G with 24GB VRAM)
  • That's what Ben used in his demonstration (so I can annoy him with questions!)
  • The models coming out of China are so good right now
  • 4B parameters, not 3B - perfect size for learning

The Ben Connection

There's this guy Ben who gave a guest lecture from Modal - he's fantastic. He just built and shared an open-source RAG bot that's voice-enabled. Voice might be a bit ambitious for us initially, but maybe we'll get there.

I want to copy his architecture: He used Qwen + ChromaDB. ChromaDB is also a sponsor of the course, so it's perfect to use ChromaDB as the backend knowledge base for the workshops and embeddings.

The Plan: Copy that system without voice at first, then build a simple chat interface. I have a bit of experience with Next.js, so that's probably the route.

My Planning Process

I like to spend more time planning because I'm not great with software engineering. I need to go back and forth many times to make sure I understand each step.

Bonus step: I put the plan into Grok voice mode and go for a walk. Grok's voice mode is really good for hashing out plans!

The Process:

  1. Each step in the overview becomes a PR
  2. Each PR hopefully becomes a day in the project
  3. I drop the kickstart message to an agent
  4. Agent reads context, creates branch, we hash out the plan
  5. I make sure I understand everything top to bottom

The Simple 7-Step Plan

The Goal: Go from zero to a working knowledge management system that course students can actually use.

Step 1: Learn Modal + Download Qwen Weights

  • Download Qwen3-4B weights from Hugging Face to my local computer
  • Upload to Modal volume (I want to understand how Modal works)
  • Why manually? Educational value over convenience - I want to see the weight files!

Step 2: Deploy vLLM Service

  • Get vLLM running with the uploaded model on A10G GPU
  • Test it works with curl commands
  • Goal: Basic inference endpoint working
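vLLM exposes an OpenAI-compatible API, so the "test it works" step is just a POST to `/v1/chat/completions`. Here's a hedged sketch in Python instead of curl; the base URL and served model name are placeholders for whatever the Modal deploy actually gives you.

```python
def chat_payload(prompt: str, model: str = "qwen3-4b") -> dict:
    """Request body for vLLM's OpenAI-compatible /v1/chat/completions endpoint.
    The model name must match whatever name the server was started with."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 200,
    }


def ask(base_url: str, prompt: str) -> str:
    """Send one chat turn to the deployed endpoint and return the reply text."""
    # Deferred import; base_url would be the Modal web endpoint for the vLLM app.
    import requests  # pip install requests

    resp = requests.post(
        f"{base_url}/v1/chat/completions",
        json=chat_payload(prompt),
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

The equivalent curl is a one-liner posting the same JSON body, so once this works from Python the curl smoke test is essentially free.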

Step 3: Set Up ChromaDB Backend

  • Copy Ben's ChromaDB + embeddings setup
  • Use all-MiniLM-L6-v2 for embeddings
  • Ingest workshop transcripts and test search
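A minimal ingest sketch for Step 3, assuming naive fixed-size chunking (chunk size and overlap are my choices, not Ben's). ChromaDB's default embedding function happens to be all-MiniLM-L6-v2, which matches the plan.

```python
def chunk_transcript(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Naive fixed-size chunking with overlap - enough for a first ingest."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks


def ingest(transcript: str, name: str = "workshops"):
    """Chunk one transcript and add it to a persistent Chroma collection."""
    # Deferred import; the persistence path is my choice. ChromaDB embeds
    # with all-MiniLM-L6-v2 by default, so no embedding function is passed.
    import chromadb  # pip install chromadb

    client = chromadb.PersistentClient(path="chroma_db")
    col = client.get_or_create_collection(name)
    chunks = chunk_transcript(transcript)
    col.add(documents=chunks, ids=[f"{name}-{i}" for i in range(len(chunks))])
    return col
```

Testing search afterwards is just `col.query(query_texts=["..."], n_results=3)` - if the top hits look like the right transcript passages, the backend is working.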

Step 4: Build Simple Chat Interface

  • Create Next.js frontend (what I know)
  • Route between Modal RAG system OR Claude/GPT APIs
  • Start simple: Just text chat, no voice yet
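The frontend will be Next.js, but the interesting part is the routing decision, so here's that idea as a language-agnostic Python sketch. All URLs are hypothetical placeholders; the real ones come from the Modal deploy and the vendor SDKs.

```python
# Hypothetical endpoints - stand-ins for the real Modal deploy and vendor APIs.
BACKENDS = {
    "modal-rag": "https://example--qwen-rag.modal.run",
    "claude": "https://api.anthropic.com",
    "openai": "https://api.openai.com",
}


def pick_backend(provider: str) -> str:
    """The chat UI sends a provider flag; resolve it to a base URL so the
    same interface can compare self-hosted RAG against vendor APIs."""
    if provider not in BACKENDS:
        raise ValueError(f"Unknown provider: {provider}")
    return BACKENDS[provider]
```

Keeping the switch server-side means the UI never needs to know which backend answered - useful later when comparing responses in the evaluation step.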

Step 5: Add Evaluation System

  • Integrate Pydantic Logfire for tracking performance
  • Set up dashboards to compare RAG vs API responses
  • Question: Is self-hosted worth the complexity?
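To answer "is self-hosted worth it?" you need latency alongside each response. A minimal sketch of that instrumentation, assuming a plain timing decorator; in the real setup the same wrapper would open a Logfire span instead so the comparison shows up on a dashboard.

```python
import time
from functools import wraps


def timed(fn):
    """Return (result, seconds) so RAG-vs-API calls can be compared on latency."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        return result, time.perf_counter() - start
    return wrapper


def instrument_with_logfire():
    """Sketch of the Logfire version - spans instead of raw timings."""
    # Deferred import so the decorator above works without the package.
    import logfire  # pip install logfire

    logfire.configure()
    with logfire.span("rag-vs-api comparison"):
        ...  # call both backends here and let the span capture timing
```

Same pattern either way: wrap the call, record how long it took, tag which backend answered.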

Step 6: Add Voice (The Ambitious Part)

  • Integrate Pipecat framework like Ben's setup
  • Voice-to-voice interaction with <1 second latency
  • Maybe we'll get there!

Step 7: Scale Content Sources

  • Add Discord Q&A from course community
  • Jupyter notebooks, more course materials
  • Automated refresh pipeline

My "Dumb" Questions (The Learning Part)

I've shared these embarrassing beginner questions in the repo because they help me understand:

What is Hugging Face? I only vaguely remembered what Hugging Face was, and I'd never actually downloaded weights before.

How does Modal work? Do they own the GPUs? Spoiler: They don't own the GPUs. They're also renting GPUs (from AWS/etc.) and making it simple for us.

Is Qwen3 a reasoning model? No, it's not a reasoning model. (I was confusing things)

Can I use tools with it? And other basic questions that help build understanding.

Yeah, these are really basic questions that are a little embarrassing to share, but that's what you've got to do when you're learning.

What's Next?

Day 1: I'm going to actually download the Qwen weights from Hugging Face and look at them (always wanted to do that!), then upload to Modal volume and get the model running.

Cost Reality Check: It's super quick and actually quite cheap. Cost me a couple bucks... actually cost me $12 because I screwed up the initial setup (which I'll share in Day 1).

For Course Students: If anybody wants to follow along, you're more than welcome to! I'm open-sourcing everything.

Resources


This was raw and unrehearsed (as you probably noticed), but hopefully helpful for understanding my beginner's approach to building with AI coding agents. See you in Day 1!