Our $7/Month Simulated Student Catches What We Can't

by Alien Brain Trust AI Learning

Meta Description: We built a simulated student on a $7/month EC2 instance that enrolls in our course, follows onboarding emails, and grades every module. It found 15 issues we missed.

You can’t QA your own course. You wrote it. You know where everything is. You skip the confusing parts because they aren’t confusing to you.

So we built a student.

The Problem with Testing Your Own Course

We launched the Secure Prompt Vault course—5 modules teaching prompt security. We read every file. We tested every skill. We thought it was ready.

It wasn’t.

A file referenced in onboarding email 3 didn’t exist. Module 3 mentioned a tool (/sp t) without explaining how to install it. Module 5 promised a 9-point checklist but only delivered 4 points. The word “security blanket” appeared in every module but was never defined.

We missed all of it. Because we knew what we meant.

The Simulated Student

The bot runs on a t4g.micro ARM64 instance in us-east-1. Cost: $7.01/month.

Every morning at 10:00 UTC, it:

  1. Enrolls — Hits our Cloudflare Worker enrollment endpoint with test data
  2. Accepts the GitHub invite — Uses the GitHub API to accept the repo collaboration invite
  3. Pulls the repo — Gets the latest course content
  4. Follows the emails — Checks that every file referenced in onboarding emails actually exists
  5. Grades each module — Sends content to Claude Haiku for evaluation on clarity, completeness, and actionability
  6. Sends a report — Telegram message with pass/fail counts and scores

The whole run takes about 90 seconds.
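For a sense of the shape of it, here is a stripped-down sketch of how a run like this can aggregate results across phases. The check names and structure are illustrative, not the actual script:

    # run_checks.py -- minimal sketch of a daily run that aggregates pass/fail
    # results across phases (check names are illustrative, not the real ones)
    import time

    def check_endpoint_reachable():
        # Phase 1 stand-in: the real check POSTs to the enrollment Worker
        return True

    def check_repo_pulled():
        # Phase 2 stand-in: the real check runs `git pull` and inspects the exit code
        return True

    CHECKS = [
        ("enrollment endpoint reachable", check_endpoint_reachable),
        ("repo pulled and up to date", check_repo_pulled),
    ]

    def main():
        started = time.time()
        passed = failed = 0
        for name, check in CHECKS:
            ok = check()
            passed += ok
            failed += not ok
            print(f"{'PASS' if ok else 'FAIL'}  {name}")
        print(f"{passed} passed, {failed} failed in {time.time() - started:.1f}s")

    if __name__ == "__main__":
        main()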

What It Tests

The test script runs 21 checks across 4 phases:

Phase 1: Enrollment Flow

  • Is the enrollment endpoint reachable?
  • Does it validate required fields?
  • Does form submission work?
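A minimal sketch of what Phase 1 could look like, assuming a placeholder Worker URL and field names (the real endpoint and validation rules differ):

    # phase1_enrollment.py -- sketch of the enrollment-flow checks; the URL and
    # payload fields are placeholders, not the production endpoint
    import requests

    ENROLL_URL = "https://example-worker.example.workers.dev/enroll"  # placeholder

    def enrollment_checks():
        results = {}
        try:
            r = requests.post(ENROLL_URL, json={}, timeout=10)
        except requests.RequestException:
            return {"reachable": False}
        results["reachable"] = True
        # Required-field validation: an empty body should be rejected with a 4xx
        results["validates_fields"] = r.status_code in (400, 422)
        # A well-formed test submission should go through
        r = requests.post(
            ENROLL_URL,
            json={"email": "qa-bot@example.com", "name": "QA Bot"},
            timeout=10,
        )
        results["submission_ok"] = r.status_code == 200
        return results

    if __name__ == "__main__":
        for name, ok in enrollment_checks().items():
            print(f"{'PASS' if ok else 'FAIL'}  {name}")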

Phase 2: GitHub Access

  • Can the bot accept the repo invite?
  • Is the repo accessible?
  • Does git pull work?
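Phase 2 leans on the GitHub REST API's repository-invitations endpoints. A sketch, with the token source, repo name, and local path as placeholders:

    # phase2_github.py -- sketch of the GitHub access checks using the REST API's
    # repository-invitations endpoints; repo name and paths are placeholders
    import os
    import subprocess
    import requests

    API = "https://api.github.com"
    HEADERS = {
        "Authorization": f"Bearer {os.environ['GITHUB_PAT']}",
        "Accept": "application/vnd.github+json",
    }

    def accept_pending_invites():
        # List the bot account's pending repo invitations and accept each one
        invites = requests.get(f"{API}/user/repository_invitations",
                               headers=HEADERS, timeout=10).json()
        for invite in invites:
            requests.patch(f"{API}/user/repository_invitations/{invite['id']}",
                           headers=HEADERS, timeout=10)
        return len(invites)

    def repo_accessible(full_name="owner/course-repo"):  # placeholder repo
        r = requests.get(f"{API}/repos/{full_name}", headers=HEADERS, timeout=10)
        return r.status_code == 200

    def pull_latest(path="/home/ubuntu/course-repo"):  # placeholder checkout path
        return subprocess.run(["git", "-C", path, "pull"],
                              capture_output=True).returncode == 0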

Phase 3: Email Instructions

  • Does every file mentioned in onboarding emails exist?
  • Are module directories where emails say they are?
  • Do installation guides exist where promised?
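A sketch of how Phase 3 can work. It assumes the onboarding emails live on disk as plain-text templates and that file references look like FAQ.md or MODULE-1/README.md; the real matching rules may be stricter:

    # phase3_email_refs.py -- sketch of the email-instruction checks; template
    # location, repo path, and the reference pattern are all assumptions
    import re
    from pathlib import Path

    REPO = Path("/home/ubuntu/course-repo")    # placeholder checkout location
    EMAILS = Path("/home/ubuntu/onboarding")   # placeholder email templates

    # Matches path-like references such as FAQ.md or MODULE-1/INSTALLATION.md
    PATH_PATTERN = re.compile(r"\b[\w./-]+\.(?:md|txt|sh|py)\b")

    def missing_references():
        missing = []
        for email in sorted(EMAILS.glob("*.txt")):
            for ref in PATH_PATTERN.findall(email.read_text()):
                if not (REPO / ref).exists():
                    missing.append((email.name, ref))
        return missing

    if __name__ == "__main__":
        for email, ref in missing_references():
            print(f"FAIL  {email} references {ref}, which does not exist")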

Phase 4: AI Evaluation

  • Each module gets scored 0-10 on clarity, completeness, and actionability
  • The AI acts as a “complete beginner” and flags jargon, missing context, and broken references
  • Specific feedback on what confused it and what’s broken
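Here is roughly what a Phase 4 evaluation call can look like with the Anthropic Python SDK. The rubric wording is ours for illustration; the model id is the one noted in the footnote at the end of this post:

    # phase4_eval.py -- sketch of the per-module AI evaluation; the rubric text is
    # illustrative, and the whole file is sent rather than a fixed window
    from pathlib import Path

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    RUBRIC = (
        "You are a complete beginner taking this course. Score the module 0-10 on "
        "clarity, completeness, and actionability. Flag jargon, missing context, "
        "and broken references. Reply with the three scores and a short list of issues."
    )

    def grade_module(path):
        content = Path(path).read_text()  # the whole module, not the first 3KB
        message = client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=1024,
            messages=[{"role": "user", "content": f"{RUBRIC}\n\n---\n\n{content}"}],
        )
        return message.content[0].text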

The Scores Tell the Story

First run: 5/10 average. Every module flagged as “truncated” and “incomplete.”

Turned out the eval script was only reading the first 3KB of each file. The files are 13-21KB. The AI graded the introduction and assumed the rest was missing.

After fixing the eval window and adding key-definition glossaries:

Module                          Before   After
1: Secure Prompt Foundations    5/10     7/10
2: Safe Prompt Library          4/10     7/10
3: Automated Testing            4/10     7/10
4: Real-World Case Studies      5/10     7/10
5: Team Audit Checklist         5/10     7/10

The Infrastructure Stack

Nothing fancy. That’s the point.

  • EC2 t4g.micro — ARM64, 1 vCPU, 1GB RAM. Runs the test script and the Telegram bot.
  • S3 — Stores the test script and bot code. Deploy = upload to S3 + SSM restart.
  • SSM Parameter Store — Holds the Anthropic API key and GitHub PAT. Encrypted with KMS.
  • Telegram Bot API — Sends reports. Receives commands. No web server needed.
  • Cron — 0 10 * * * as ubuntu user. That’s it.

No containers. No Kubernetes. No Lambda. No orchestration layer. One bash script, one Python bot, one cron entry.
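For context, the Telegram piece really is that small. A sketch, with the token, chat id, and message wording as placeholders; long polling via getUpdates is why no web server is needed:

    # report.py -- sketch of the Telegram reporting and command loop; token,
    # chat id, and message wording are placeholders
    import os

    import requests

    TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]
    CHAT_ID = os.environ["TELEGRAM_CHAT_ID"]
    API = f"https://api.telegram.org/bot{TOKEN}"

    def send_report(passed, failed, scores):
        lines = [f"QA run: {passed} passed, {failed} failed"]
        lines += [f"Module {m}: {s}/10" for m, s in scores.items()]
        requests.post(f"{API}/sendMessage",
                      json={"chat_id": CHAT_ID, "text": "\n".join(lines)},
                      timeout=10)

    def poll_commands(offset=None):
        # Long polling: the bot asks Telegram for updates instead of exposing a
        # webhook, so the instance needs no web server or open inbound ports
        r = requests.get(f"{API}/getUpdates",
                         params={"timeout": 30, "offset": offset}, timeout=40)
        return r.json().get("result", [])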

What the Bot Actually Found

Over the first week, the bot identified:

  • 2 missing files referenced in onboarding emails (FAQ.md, INSTALLATION.md)
  • 5 modules with undefined jargon (“security blanket,” “jailbreak suite,” “OWASP LLM Top 10”)
  • 3 incomplete sections that cut off mid-paragraph
  • 1 missing Getting Started guide in Module 3
  • 4 cross-module references that assumed knowledge from unread modules

Every one of these would have confused a real student. None of them were obvious to us as course authors.

The Cost Breakdown

Component                             Monthly Cost
EC2 t4g.micro (on-demand)             $7.01
S3 storage (scripts + reports)        $0.02
SSM Parameter Store (3 params)        Free
KMS (1 key, ~30 decryptions/month)    $1.00
Anthropic API (Haiku, daily evals)    $1-3
Total                                 ~$10/month

For context: a single hour of manual QA testing costs more than a month of the bot running.

Lessons from Building the Bot

Start with the student’s path, not the content. Our first test version just checked if files existed. The real value came when we made it follow the actual onboarding email sequence—enroll, get invite, pull repo, find Module 1. That’s when the missing pieces showed up.

AI evaluation is noisy but directional. Claude Haiku’s scores vary by 1-2 points between runs. Don’t chase 10/10. Use it to find the 4/10 modules that need real work.

The cheapest instance is usually enough. We almost upgraded to t4g.small “for headroom.” Glad we didn’t. The bot uses 200MB of RAM. The instance has 1GB. Plenty.

Secrets management matters even for bots. The GitHub PAT and Anthropic key live in SSM Parameter Store, encrypted with KMS. The bot decrypts them at startup. No .env files on disk. No secrets in code. This isn’t paranoia—it’s the same security hygiene we teach in the course.
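At startup, that fetch can be as small as this; the parameter names are placeholders:

    # secrets.py -- sketch of pulling secrets from SSM Parameter Store at startup;
    # parameter names are placeholders
    import boto3

    ssm = boto3.client("ssm", region_name="us-east-1")

    def load_secret(name):
        # WithDecryption=True makes SSM call KMS on our behalf; the plaintext
        # value only ever lives in process memory, never on disk
        return ssm.get_parameter(Name=name, WithDecryption=True)["Parameter"]["Value"]

    ANTHROPIC_API_KEY = load_secret("/qa-bot/anthropic-api-key")
    GITHUB_PAT = load_secret("/qa-bot/github-pat")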

What’s Next

The bot now has autonomous fix capabilities (that’s tomorrow’s post). But the QA pipeline alone—daily tests, Telegram reports, trend tracking—has already caught more bugs than a month of manual review.

If you’re building a course, a product, or any content pipeline: build the student first. Then build the content.


Infrastructure costs based on AWS us-east-1 on-demand pricing as of February 2026. AI evaluation scores from Claude Haiku (claude-haiku-4-5-20251001).