The $7/Month Bot That Takes Our Course, Finds the Bugs, and Fixes Them Itself

by Alien Brain Trust AI Learning

We have a bot named J Calone. Every morning at 10:00 UTC, it enrolls in our Secure Prompt Builder course, clones the student repo, reads every module, runs the test scripts, validates the skills, and sends us a report card. When something fails, we type /fix in Telegram and the bot writes the code, commits it, and opens a pull request. Then we type /approve 2 and it merges.

Total infrastructure cost: $7/month.
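
The scheduling is nothing fancier than cron on the instance. A minimal sketch, assuming a hypothetical wrapper script at /opt/course-bot/daily-run.sh (the instance clock runs UTC):

# crontab on the t4g.micro: daily student simulation at 10:00 UTC
0 10 * * * /opt/course-bot/daily-run.sh >> /var/log/course-bot/daily.log 2>&1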

What Problem We Were Solving

We’re building a free course that teaches prompt security through hands-on exercises. The course has 5 modules, 23 prompt templates, a jailbreak test suite, and 6 Claude Code skills. That’s a lot of surface area for bugs.

The classic approach: manually walk through the course after every change and hope you catch the broken link, the missing file, the truncated content. We were doing this. It wasn’t working. We’d fix Module 3’s test scripts and accidentally break Module 1’s onboarding path.

We needed a student who takes the course every single day, never gets bored, and tells us exactly what’s broken.

How It Works: 6 Phases of Simulated Enrollment

The bot runs a 678-line bash script that simulates the complete student journey:

  1. Enrollment: hits the Cloudflare Worker endpoint and validates the form response. Example catch: the enrollment URL returning 404 after a deploy.
  2. GitHub Access: accepts the invite, clones the private repo, and verifies its structure. Example catch: PAT permissions wrong after rotation.
  3. Email Sequence: checks that every file referenced in the onboarding emails exists. Example catch: FAQ.md missing (referenced but never created).
  4. Module Content: AI evaluation of clarity, completeness, and actionability. Example catch: content truncated at 3KB, jargon left undefined.
  5. Test Scripts: runs npm install and test-runner.js from Module 3. Example catch: a package.json dependency mismatch.
  6. Skills Validation: checks that all 6 Claude Code skills have the correct structure. Example catch: the path prefix wrong for the student repo layout.
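
The structure of the script is simpler than 678 lines suggests: one small function per phase, a PASS/FAIL line per phase in the report. A condensed sketch, with names, URLs, and paths invented for illustration (not the actual script):

#!/usr/bin/env bash
# Condensed sketch of the phase loop; names, URLs, and paths are illustrative.
# No `set -e` on purpose: a failing phase should be recorded, not abort the run.
set -uo pipefail

REPORT="/tmp/student-report-$(date +%F).md"
REPO_DIR="/tmp/secure-prompt-vault"
: > "$REPORT"

phase() {                                   # label + command -> PASS/FAIL line
  local name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "- $name: PASS" >> "$REPORT"
  else
    echo "- $name: FAIL" >> "$REPORT"
  fi
}

# 1. Enrollment: the Cloudflare Worker endpoint should answer (URL is a placeholder)
phase "Enrollment"     curl -fsS "https://enroll.example.workers.dev/"
# 2. GitHub access: clone the student repo with the bot's PAT
phase "GitHub Access"  git clone --depth 1 \
  "https://${GITHUB_PAT:-}@github.com/base-bit/secure-prompt-vault.git" "$REPO_DIR"
# 3. Email sequence: files the onboarding emails reference must exist
phase "Email Sequence" test -f "$REPO_DIR/FAQ.md"
# 5. Test scripts: Module 3's suite should install and run (module path is a guess)
phase "Test Scripts"   bash -c "cd '$REPO_DIR/module-3' && npm install --silent && node test-runner.js"
# Phase 4 (the AI evaluation) is sketched separately below; phase 6 walks the skills directories.

cat "$REPORT"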

Phase 4 is where it gets interesting. The bot uses Claude Haiku to evaluate each module on a 1-10 scale across three dimensions: Is the content clear? Is it complete? Can a student actually do the exercises? That’s not a file-exists check. That’s an AI reading comprehension test on our courseware, every day.
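
Under the hood, Phase 4 is one Claude Haiku call per module. A minimal sketch with curl (the model ID, module path, and prompt wording are illustrative; the rubric dimensions are the ones above):

# Evaluate one module file with Claude Haiku (ANTHROPIC_API_KEY pulled from SSM)
MODULE_FILE="/tmp/secure-prompt-vault/module-1/README.md"   # hypothetical path
CONTENT=$(head -c 20000 "$MODULE_FILE")                     # the 20KB eval window

jq -n --arg content "$CONTENT" '{
  model: "claude-3-haiku-20240307",
  max_tokens: 500,
  messages: [{
    role: "user",
    content: ("You are a student working through this course module. Score it 1-10 for clarity, completeness, and whether the exercises are actually doable. Reply as JSON with keys clarity, completeness, actionability, notes.\n\n---\n" + $content)
  }]
}' | curl -sS https://api.anthropic.com/v1/messages \
      -H "x-api-key: ${ANTHROPIC_API_KEY:-}" \
      -H "anthropic-version: 2023-06-01" \
      -H "content-type: application/json" \
      -d @- | jq -r '.content[0].text'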

The Infrastructure: Absurdly Cheap

AWS t4g.micro (ARM64)     $5-7/month
EBS 10GB gp3              $1.60/month
Claude Haiku API           $0.30-1.50/month
Data transfer              ~$0
─────────────────────────────────────
Total                      ~$7-9/month

The instance runs in a VPC with SSM access (no SSH, no open ports). Secrets are in SSM Parameter Store encrypted with KMS. The bot’s GitHub PAT is a classic token scoped to the student repo only. Terraform manages the whole stack.
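
At runtime the bot pulls its credentials straight from Parameter Store, so nothing secret lives on disk. For example (parameter names are illustrative):

# Fetch secrets from SSM Parameter Store (KMS-encrypted) into the environment
export GITHUB_PAT=$(aws ssm get-parameter --name "/course-bot/github-pat" \
  --with-decryption --query "Parameter.Value" --output text)
export ANTHROPIC_API_KEY=$(aws ssm get-parameter --name "/course-bot/anthropic-api-key" \
  --with-decryption --query "Parameter.Value" --output text)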

We evaluated Ollama for local inference, but Claude Haiku at $0.25/million input tokens made the math simple. The API cost for daily evaluation of 5 modules is less than a dollar a month.
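
The back-of-envelope version, assuming the current ~20KB read per module and roughly 4 characters per token:

# Rough monthly API cost: 5 modules/day, ~20KB each, ~4 chars/token, $0.25 per 1M input tokens
tokens_per_day=$(( 5 * 20000 / 4 ))            # ~25,000 input tokens/day
tokens_per_month=$(( tokens_per_day * 30 ))    # ~750,000 input tokens/month
echo "scale=4; $tokens_per_month * 0.25 / 1000000" | bc   # .1875 -> about $0.19 of input, plus pennies of output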

The Autonomous Agent: /fix

This is the part that changed everything.

The original bot was a reporter. It ran tests, generated a markdown report, and we read it. Useful, but we still had to manually fix every issue. For a solo operation, that meant context-switching from whatever we were working on to go fix a missing heading in Module 4.

Now the bot has an autonomous agent with 8 sandboxed tools:

  • read_file / write_file / edit_file — direct file operations
  • list_files / search_files — navigation
  • run_shell — allowlisted commands only (git, ls, grep, find, cat, diff, tree)
  • git_ops — branch, commit, push (locked to bot/ prefix)
  • create_pull_request — GitHub API
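
The run_shell and git_ops constraints are easiest to see as code. A condensed sketch of the checks, with variable names invented for illustration:

# run_shell: refuse anything whose first word isn't on the allowlist
ALLOWED="git ls find grep cat head tail wc diff tree"   # full list from the safety layers below
cmd_name=$(echo "$REQUESTED_CMD" | awk '{print $1}')
case " $ALLOWED " in
  *" $cmd_name "*) ;;                                   # allowed, continue
  *) echo "blocked: $cmd_name not in allowlist" >&2; exit 1 ;;
esac

# git_ops: only the student repo remote, only bot/ branches
remote_url=$(git -C "$REPO_DIR" remote get-url origin)
case "$remote_url" in
  *secure-prompt-vault*) ;;                             # expected remote
  *) echo "blocked: unexpected remote $remote_url" >&2; exit 1 ;;
esac
case "$BRANCH" in
  bot/*) git -C "$REPO_DIR" push origin "$BRANCH" ;;
  *)     echo "blocked: branch must start with bot/" >&2; exit 1 ;;
esac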

The workflow is three Telegram messages:

You:  /fix
Bot:  [15-25 turns of reading, editing, committing]
Bot:  PR #3 created: https://github.com/base-bit/secure-prompt-vault/pull/3

You:  /approve 3
Bot:  PR #3 merged successfully

That’s it. The bot reads its own test report, identifies the failures, reads the problematic files, writes fixes, creates a branch (bot/fix-2026-02-15), pushes, and opens a PR with a description of what it changed and why.
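
Both ends of that exchange are thin wrappers over the GitHub REST API. A sketch, assuming main is the base branch:

# create_pull_request: open a PR from the bot's branch
jq -n '{
  title: "bot: fixes from the daily student report",
  head: "bot/fix-2026-02-15",
  base: "main",
  body: "Automated fixes for failures in the latest report."
}' | curl -sS -X POST \
      -H "Authorization: Bearer ${GITHUB_PAT:-}" \
      -H "Accept: application/vnd.github+json" \
      https://api.github.com/repos/base-bit/secure-prompt-vault/pulls \
      -d @- | jq '{number, html_url}'

# /approve 3: merge PR #3 after human review
curl -sS -X PUT \
  -H "Authorization: Bearer ${GITHUB_PAT:-}" \
  -H "Accept: application/vnd.github+json" \
  https://api.github.com/repos/base-bit/secure-prompt-vault/pulls/3/merge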

7 Safety Layers (Because Autonomous Agents Need Guardrails)

An AI agent with write access to your production repo needs constraints:

  1. Filesystem isolation — Path resolution locked to the student repo directory
  2. Command allowlist — Only git, ls, find, grep, cat, head, tail, wc, diff, tree
  3. Git remote validation — Origin must contain “secure-prompt-vault”
  4. Account scope — PAT only has access to the student repo
  5. Branch naming — Must start with bot/ (can’t push to main)
  6. Turn limit — Max 40 agent turns per session (prevents loops)
  7. Human approval gate — PRs require explicit /approve to merge

The bot can’t delete branches, force push, modify CI/CD, or touch any repo except the student-facing one. It operates on a principle we borrowed from enterprise security: least privilege, blast radius containment, human-in-the-loop for irreversible actions.

What the Bot Has Actually Fixed

Real fixes driven by bot reports and the /fix agent:

The 3KB eval window bug. The bot’s content evaluation was reading only 3KB of each module file. Every module was getting truncated mid-sentence, and the AI evaluator was correctly noting “content appears incomplete.” We increased the window to 20KB and scores jumped from 5/10 to 7/10 across all modules in one commit.

The missing FAQ.md. Our onboarding email sequence referenced a FAQ file. The file didn’t exist. A human tester might miss this because they’d skip the email and go straight to Module 1. The bot follows the exact path a new student would, so it caught it immediately.

Module 3 Getting Started guide. The test suite module assumed students knew how to set up Node.js and install dependencies. The bot’s clarity score for Module 3 was consistently low. The fix added a step-by-step getting started section. Score: 5/10 to 7/10.

Skills path prefix. Our Claude Code skills referenced files with 01-Course Content/Secure-Prompt-Vault/ prefixed paths. That’s the private repo structure. Students clone a repo where modules are at the root. Every skill was pointing to non-existent paths. The bot caught this because it actually clones the student repo and validates the paths.

Security outcome validation. The test suite was checking that prompts existed but not that they worked. The bot now runs actual jailbreak tests and validates that secure prompts produce low risk scores. This caught a regression where a hardened prompt was accidentally overwritten with its insecure predecessor.
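
In spirit, the outcome check looks something like this; the test-runner flags and output parsing are invented here for illustration, the real suite lives in Module 3:

# Don't just check the prompt file exists: run it through the jailbreak suite
# and assert the risk score stays low (flags and output format are hypothetical)
cd /tmp/secure-prompt-vault/module-3
node test-runner.js --prompt ../prompts/support-bot-hardened.md > /tmp/jailbreak-result.txt
risk=$(grep -oE '[0-9]+' /tmp/jailbreak-result.txt | head -1)   # naive parse of the first score printed
if [ "${risk:-100}" -gt 30 ]; then
  echo "FAIL: hardened prompt scored risk ${risk:-unknown} (expected <= 30)"
fi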

The Score Trajectory

Module scores over time (10-point scale, AI-evaluated):

Module               Dec 2025   Jan 2026   Feb 2026
1 - Foundations      4/10       5/10       7/10
2 - Prompt Library   5/10       6/10       7/10
3 - Testing          4/10       5/10       7/10
4 - Case Studies     3/10       4/10       7/10
5 - Team Audit       3/10       4/10       7/10

Every point of improvement maps to a specific commit. The bot doesn’t give vague feedback. It says “Module 4 references ‘prompt injection’ without defining it in the glossary” and we fix it.

Why This Matters Beyond Our Course

The pattern here isn’t specific to courseware. It’s:

  1. Automated user simulation — A bot that follows the same path as your actual users
  2. AI-powered evaluation — Not just “does this file exist” but “does this content make sense”
  3. Autonomous remediation — The same AI that finds the problem can fix it
  4. Human approval gate — Trust but verify, one Telegram message to merge or reject

This works for documentation, onboarding flows, API getting-started guides, developer experience — anywhere the gap between “what you ship” and “what users experience” matters.

The total investment was about 2 weeks of setup (Terraform, scripts, Telegram integration, safety layers) and $7/month ongoing. The bot has been running daily for 2 months. It’s found and fixed more issues than our manual QA caught in the preceding 3 months.

The Commands

For the curious, here’s the full Telegram interface:

/run       — Trigger immediate test (bypasses cron)
/report    — Latest test report summary
/scores    — Module evaluation scores
/status    — Instance health (uptime, disk, memory)
/fix       — Autonomous agent: read report, fix issues, create PR
/fix <msg> — Fix specific issue (skip report parsing)
/prs       — List open bot PRs
/approve N — Merge PR #N
/reject N  — Close PR #N

Plus free-form chat — you can ask the bot questions about the course and it answers using Claude with the course content as context.

What’s Next

The bot currently evaluates content quality but doesn’t complete exercises. Next phase: the bot actually does the Module 1 checklist, spots the vulnerability in Module 4’s case studies, and fills out the Module 5 audit template. Full student simulation, not just content review.

We’re also looking at running a fresh enrollment test monthly with a brand new GitHub account — simulating a true first-time student with zero context.


The entire bot infrastructure is in 02-Infrastructure/ of our private repo. Terraform, bash scripts, Telegram bot, all of it. The course it tests is free at base-bit/secure-prompt-vault.