The $7/Month Bot That Takes Our Course, Finds the Bugs, and Fixes Them Itself
Meta Description: An autonomous AI bot simulates a student taking our security course daily, finds issues, and creates PRs to fix them — all for $7/month on AWS.
We have a bot named J Calone. Every morning at 10:00 UTC, it enrolls in our Secure Prompt Builder course, clones the student repo, reads every module, runs the test scripts, validates the skills, and sends us a report card. When something fails, we type /fix in Telegram and the bot writes the code, commits it, and opens a pull request. Then we type /approve 2 and it merges.
Total infrastructure cost: $7/month.
What Problem We Were Solving
We’re building a free course that teaches prompt security through hands-on exercises. The course has 5 modules, 23 prompt templates, a jailbreak test suite, and 6 Claude Code skills. That’s a lot of surface area for bugs.
The classic approach: manually walk through the course after every change and hope you catch the broken link, the missing file, the truncated content. We were doing this. It wasn’t working. We’d fix Module 3’s test scripts and accidentally break Module 1’s onboarding path.
We needed a student who takes the course every single day, never gets bored, and tells us exactly what’s broken.
How It Works: 6 Phases of Simulated Enrollment
The bot runs a 678-line bash script that simulates the complete student journey:
| Phase | What It Tests | Example Catches |
|---|---|---|
| 1. Enrollment | Hits the Cloudflare Worker endpoint, validates form response | Enrollment URL returning 404 after deploy |
| 2. GitHub Access | Accepts invite, clones private repo, verifies structure | PAT permissions wrong after rotation |
| 3. Email Sequence | Checks that files referenced in onboarding emails exist | FAQ.md missing (referenced but never created) |
| 4. Module Content | AI evaluation of clarity, completeness, actionability | Content truncated at 3KB, jargon undefined |
| 5. Test Scripts | Runs npm install and test-runner.js from Module 3 | Package.json dependency mismatch |
| 6. Skills Validation | Checks all 6 Claude Code skills have correct structure | Path prefix wrong for student repo layout |
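For a sense of shape, here is a heavily condensed sketch of the first three phases. The function names, enrollment URL, and referenced filenames are illustrative stand-ins, not the production values:

```bash
#!/usr/bin/env bash
# Condensed sketch of the daily simulation driver; names, URLs, and paths are placeholders.
set -euo pipefail

REPORT=/tmp/student-report.md
STUDENT_REPO_DIR=/tmp/secure-prompt-vault

phase_enrollment() {
  # Phase 1: the enrollment endpoint should answer with HTTP 200
  local status
  status=$(curl -s -o /dev/null -w '%{http_code}' "https://enroll.example.workers.dev/" || echo 000)
  [[ "$status" == "200" ]] || echo "- FAIL: enrollment endpoint returned $status" >> "$REPORT"
}

phase_github_access() {
  # Phase 2: clone the student repo exactly as a new student would
  rm -rf "$STUDENT_REPO_DIR"
  git clone --depth 1 "https://x-access-token:${GITHUB_PAT}@github.com/base-bit/secure-prompt-vault.git" \
    "$STUDENT_REPO_DIR" || echo "- FAIL: could not clone student repo" >> "$REPORT"
}

phase_email_sequence() {
  # Phase 3: every file the onboarding emails reference must actually exist
  for f in README.md FAQ.md; do
    [[ -f "$STUDENT_REPO_DIR/$f" ]] || echo "- FAIL: onboarding references missing file: $f" >> "$REPORT"
  done
}

echo "# Student simulation $(date -u +%F)" > "$REPORT"
phase_enrollment
phase_github_access
phase_email_sequence
# Phases 4-6 (AI content evaluation, test scripts, skills validation) follow the same pattern.
```

Every phase appends failures to the same markdown report; that report is what /report summarizes in Telegram and what /fix later parses.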
Phase 4 is where it gets interesting. The bot uses Claude Haiku to evaluate each module on a 1-10 scale across three dimensions: Is the content clear? Is it complete? Can a student actually do the exercises? That’s not a file-exists check. That’s an AI reading comprehension test on our courseware, every day.
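A minimal sketch of what one Phase 4 evaluation call can look like, using the Anthropic Messages API from bash. The model ID, module path, prompt wording, and 20KB window are assumptions for illustration, not our exact implementation:

```bash
# Sketch of a Phase 4 content evaluation call (model, path, and prompt are illustrative).
MODULE_TEXT=$(head -c 20000 "modules/01-foundations/README.md")

curl -s https://api.anthropic.com/v1/messages \
  -H "x-api-key: ${ANTHROPIC_API_KEY}" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d "$(jq -n --arg text "$MODULE_TEXT" '{
    model: "claude-3-5-haiku-20241022",
    max_tokens: 300,
    messages: [{
      role: "user",
      content: ("You are a student taking this course module. Rate it 1-10 for clarity, completeness, and actionability, then list anything that blocks you from doing the exercises.\n\n---\n" + $text)
    }]
  }')" | jq -r '.content[0].text'
```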
The Infrastructure: Absurdly Cheap
| Item | Cost |
|---|---|
| AWS t4g.micro (ARM64) | $5-7/month |
| EBS 10GB gp3 | $1.60/month |
| Claude Haiku API | $0.30-1.50/month |
| Data transfer | ~$0 |
| **Total** | **~$7-9/month** |
The instance runs in a VPC with SSM access (no SSH, no open ports). Secrets are in SSM Parameter Store encrypted with KMS. The bot’s GitHub PAT is a classic token scoped to the student repo only. Terraform manages the whole stack.
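Fetching those secrets at runtime is a single AWS CLI call per parameter. A sketch, with made-up parameter names rather than our real ones:

```bash
# Pull decrypted secrets from SSM Parameter Store at startup (parameter names are illustrative).
GITHUB_PAT=$(aws ssm get-parameter \
  --name "/student-bot/github-pat" \
  --with-decryption \
  --query 'Parameter.Value' \
  --output text)

TELEGRAM_TOKEN=$(aws ssm get-parameter \
  --name "/student-bot/telegram-token" \
  --with-decryption \
  --query 'Parameter.Value' \
  --output text)
```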
We evaluated Ollama for local inference but Claude Haiku at $0.25/million input tokens made the math simple. The API cost for daily evaluation of 5 modules is less than a dollar a month.
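The back-of-the-envelope math, assuming roughly 5,000 input tokens per module (about 20KB of markdown), which is our guess rather than a measured figure:

```bash
# 5 modules x 30 days x ~5,000 tokens = 750,000 input tokens/month at $0.25 per million
awk 'BEGIN { printf "input cost: ~$%.2f/month\n", (5 * 30 * 5000 / 1e6) * 0.25 }'
# Output tokens add a few cents on top; the total stays comfortably under a dollar.
```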
The Autonomous Agent: /fix
This is the part that changed everything.
The original bot was a reporter. It ran tests, generated a markdown report, and we read it. Useful, but we still had to manually fix every issue. For a solo operation, that meant context-switching from whatever we were working on to go fix a missing heading in Module 4.
Now the bot has an autonomous agent with 8 sandboxed tools:
- read_file / write_file / edit_file — direct file operations
- list_files / search_files — navigation
- run_shell — allowlisted commands only (git, ls, grep, find, cat, diff, tree)
- git_ops — branch, commit, push (locked to bot/ prefix)
- create_pull_request — GitHub API
The whole workflow is a short Telegram exchange, two commands from us:
You: /fix
Bot: [15-25 turns of reading, editing, committing]
Bot: PR #3 created: https://github.com/base-bit/secure-prompt-vault/pull/3
You: /approve 3
Bot: PR #3 merged successfully
That’s it. The bot reads its own test report, identifies the failures, reads the problematic files, writes fixes, creates a branch (bot/fix-2026-02-15), pushes, and opens a PR with a description of what it changed and why.
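Under the hood, the git_ops and create_pull_request tools reduce to ordinary git commands plus one GitHub API call. A simplified sketch; the branch name, PR title, and body are whatever the agent generated that day, and the merge method and wording here are assumptions:

```bash
# Simplified view of what git_ops + create_pull_request do on the agent's behalf.
BRANCH="bot/fix-$(date -u +%F)"

cd "$STUDENT_REPO_DIR"
git checkout -b "$BRANCH"
git add -A
git commit -m "fix: address issues from today's student-simulation report"
git push origin "$BRANCH"

# Open the PR via the GitHub REST API (title and body are illustrative).
curl -s -X POST "https://api.github.com/repos/base-bit/secure-prompt-vault/pulls" \
  -H "Authorization: Bearer ${GITHUB_PAT}" \
  -H "Accept: application/vnd.github+json" \
  -d '{
    "title": "Bot fixes from daily student simulation",
    "head": "'"$BRANCH"'",
    "base": "main",
    "body": "Automated fixes; see the daily report for details."
  }'
```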
7 Safety Layers (Because Autonomous Agents Need Guardrails)
An AI agent with write access to your production repo needs constraints:
- Filesystem isolation — Path resolution locked to the student repo directory
- Command allowlist — Only git, ls, find, grep, cat, head, tail, wc, diff, tree
- Git remote validation — Origin must contain “secure-prompt-vault”
- Account scope — PAT only has access to the student repo
- Branch naming — Must start with bot/ (can’t push to main)
- Turn limit — Max 40 agent turns per session (prevents loops)
- Human approval gate — PRs require explicit /approve to merge
The bot can’t delete branches, force push, modify CI/CD, or touch any repo except the student-facing one. It operates on a principle we borrowed from enterprise security: least privilege, blast radius containment, human-in-the-loop for irreversible actions.
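Two of these layers are small enough to show. A sketch of the run_shell allowlist and the branch-name check; the function and variable names are ours for illustration, not the production code:

```bash
# run_shell guard: refuse anything whose first word isn't on the allowlist.
ALLOWED_CMDS=(git ls find grep cat head tail wc diff tree)

run_shell() {
  local cmd_name
  cmd_name=$(awk '{print $1}' <<< "$1")
  for allowed in "${ALLOWED_CMDS[@]}"; do
    if [[ "$cmd_name" == "$allowed" ]]; then
      (cd "$STUDENT_REPO_DIR" && bash -c "$1")
      return
    fi
  done
  echo "blocked: '$cmd_name' is not on the allowlist" >&2
  return 1
}

# git_ops guard: only bot/ branches ever get pushed.
push_branch() {
  [[ "$1" == bot/* ]] || { echo "blocked: branch must start with bot/" >&2; return 1; }
  git -C "$STUDENT_REPO_DIR" push origin "$1"
}
```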
What the Bot Has Actually Fixed
Real fixes driven by bot reports and the /fix agent:
The 3KB eval window bug. The bot’s content evaluation was reading only 3KB of each module file. Every module was getting truncated mid-sentence, and the AI evaluator was correctly noting “content appears incomplete.” We increased the window to 20KB and scores jumped from 5/10 to 7/10 across all modules in one commit.
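The fix itself is the one-line kind, assuming the window is applied with head -c as in the earlier sketch (our script's actual variable names differ):

```bash
# Before: only the first 3KB of each module reached the evaluator,
# so every module looked truncated mid-sentence.
MODULE_TEXT=$(head -c 3000 "$module_file")

# After: 20KB comfortably covers the longest module.
MODULE_TEXT=$(head -c 20000 "$module_file")
```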
The missing FAQ.md. Our onboarding email sequence referenced a FAQ file. The file didn’t exist. A human tester might miss this because they’d skip the email and go straight to Module 1. The bot follows the exact path a new student would, so it caught it immediately.
Module 3 Getting Started guide. The test suite module assumed students knew how to set up Node.js and install dependencies. The bot’s clarity score for Module 3 was consistently low. The fix added a step-by-step getting started section. Score: 5/10 to 7/10.
Skills path prefix. Our Claude Code skills referenced files with 01-Course Content/Secure-Prompt-Vault/ prefixed paths. That’s the private repo structure. Students clone a repo where modules are at the root. Every skill was pointing to non-existent paths. The bot caught this because it actually clones the student repo and validates the paths.
Security outcome validation. The test suite was checking that prompts existed but not that they worked. The bot now runs actual jailbreak tests and validates that secure prompts produce low risk scores. This caught a regression where a hardened prompt was accidentally overwritten with its insecure predecessor.
The Score Trajectory
Module scores over time (10-point scale, AI-evaluated):
| Module | Dec 2025 | Jan 2026 | Feb 2026 |
|---|---|---|---|
| 1 - Foundations | 4/10 | 5/10 | 7/10 |
| 2 - Prompt Library | 5/10 | 6/10 | 7/10 |
| 3 - Testing | 4/10 | 5/10 | 7/10 |
| 4 - Case Studies | 3/10 | 4/10 | 7/10 |
| 5 - Team Audit | 3/10 | 4/10 | 7/10 |
Every point of improvement maps to a specific commit. The bot doesn’t give vague feedback. It says “Module 4 references ‘prompt injection’ without defining it in the glossary” and we fix it.
Why This Matters Beyond Our Course
The pattern here isn’t specific to courseware. It’s:
- Automated user simulation — A bot that follows the same path as your actual users
- AI-powered evaluation — Not just “does this file exist” but “does this content make sense”
- Autonomous remediation — The same AI that finds the problem can fix it
- Human approval gate — Trust but verify, one Telegram message to merge or reject
This works for documentation, onboarding flows, API getting-started guides, developer experience — anywhere the gap between “what you ship” and “what users experience” matters.
The total investment was about 2 weeks of setup (Terraform, scripts, Telegram integration, safety layers) and $7/month ongoing. The bot has been running daily for 2 months. It’s found and fixed more issues than our manual QA caught in the preceding 3 months.
The Commands
For the curious, here’s the full Telegram interface:
/run — Trigger immediate test (bypasses cron)
/report — Latest test report summary
/scores — Module evaluation scores
/status — Instance health (uptime, disk, memory)
/fix — Autonomous agent: read report, fix issues, create PR
/fix <msg> — Fix specific issue (skip report parsing)
/prs — List open bot PRs
/approve N — Merge PR #N
/reject N — Close PR #N
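The approval gate is equally thin: /approve N boils down to one GitHub merge call. A sketch, with the squash merge method as an assumption:

```bash
# /approve N handler, reduced to its essence: merge PR #N in the student repo.
approve_pr() {
  local pr_number="$1"
  curl -s -X PUT \
    "https://api.github.com/repos/base-bit/secure-prompt-vault/pulls/${pr_number}/merge" \
    -H "Authorization: Bearer ${GITHUB_PAT}" \
    -H "Accept: application/vnd.github+json" \
    -d '{"merge_method": "squash"}'
}
```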
Plus free-form chat — you can ask the bot questions about the course and it answers using Claude with the course content as context.
What’s Next
The bot currently evaluates content quality but doesn’t complete exercises. Next phase: the bot actually does the Module 1 checklist, spots the vulnerability in Module 4’s case studies, and fills out the Module 5 audit template. Full student simulation, not just content review.
We’re also looking at running a fresh enrollment test monthly with a brand new GitHub account — simulating a true first-time student with zero context.
The entire bot infrastructure is in 02-Infrastructure/ of our private repo. Terraform, bash scripts, Telegram bot, all of it. The course it tests is free at base-bit/secure-prompt-vault.