Choosing the Right Tool: Lessons from 72 Hours of Infrastructure Hell
Choosing the Right Tool: Lessons from 72 Hours of Infrastructure Hell
Meta Description: We wasted 72 hours on Azure before 20-minute Cloudflare fix. Here’s the decision framework we should have used.
Three days fighting Azure. Twenty minutes on Cloudflare. The technical details are in our previous posts. This one is about the decision-making failure that cost us three days.
The Mistake We Made
We chose Azure Static Web Apps because:
- We already had an Azure account
- We’d used Azure Functions before
- The documentation looked straightforward
- It was “free tier”
None of these are good reasons to choose a tool. They’re reasons of convenience and familiarity, not fit.
The Questions We Should Have Asked
1. What’s the Minimum Viable Infrastructure?
Our requirements:
- Receive HTTP POST request
- Validate JSON body
- Make two API calls (Airtable, GitHub)
- Return JSON response
That’s a single function. Not a “Static Web App with Managed Functions and GitHub Integration.” Not a platform. A function.
The simplest solutions:
- Cloudflare Workers (what we used)
- Vercel Serverless Functions
- Netlify Functions
- AWS Lambda with API Gateway
- A $5/month VPS running Express
Azure Static Web Apps adds: deployment pipelines, configuration sprawl, portal/workflow sync, managed build systems. For what? A 100-line function.
Lesson: Match complexity to requirements. If you need a function, use a function platform, not a website platform with functions bolted on.
2. How Quickly Can I Verify the Deploy Pipeline?
This should have been our first test:
export default {
async fetch(request) {
return new Response('Hello World');
}
};
Deploy it. Hit the endpoint. Confirm you see “Hello World.”
With Cloudflare Workers, this takes 2 minutes.
With Azure Static Web Apps, we didn’t have a working deploy pipeline after 72 hours. That should have been our signal on hour 2, not hour 72.
Lesson: Before writing business logic, prove you can deploy and see changes. If deploy is broken, nothing else matters.
3. Where Does Configuration Live?
In Cloudflare Workers, configuration lives in two places:
wrangler.toml(project settings)- Dashboard (secrets)
That’s it.
In Azure Static Web Apps, configuration can be in:
staticwebapp.config.json(routing, headers)host.json(function runtime settings)function.json(per-function bindings).github/workflows/*.yml(build/deploy settings)- Azure Portal → Configuration (overrides? defaults? who knows)
- Azure Portal → Environment Variables
- Implicit defaults that aren’t documented
When something breaks, where do you look? Everywhere. And some settings override others in undocumented ways.
Lesson: Fewer configuration surfaces = faster debugging. Every config file is a potential source of mismatch.
4. What Do Error Messages Look Like?
Cloudflare Worker error:
INVALID_VALUE_FOR_COLUMN: Cannot parse value "testuser" for field GitHub User Name
Azure Static Web Apps error:
The content server has rejected the request with: BadRequest
Reason: No matching Static Web App was found or the api key was invalid.
The first tells you exactly what’s wrong (field type mismatch). The second could mean anything—wrong token, deleted resource, misconfigured workflow, wrong region, expired secret.
Lesson: Try to break the tool intentionally. See what error messages look like. If they’re useless, debugging production issues will be painful.
5. How Do I Add Secrets?
Cloudflare: Dashboard → Settings → Variables and Secrets → Add → Encrypt.
Azure Static Web Apps (our journey):
- Started with environment variables in portal
- Learned about Azure Key Vault for “proper” secret management
- Needed Managed Identity to connect Key Vault to Functions
- Realized Free tier might not support Managed Identity
- Gave up and used plain environment variables
- Environment variables didn’t propagate to deployed functions
- Considered using GitHub Secrets + workflow variables
- Never got far enough to test this because deploys kept failing
Lesson: Secret management should be trivial for simple use cases. If it requires multiple services and identity systems, you’re overengineered.
The Decision Framework We Use Now
Before choosing infrastructure, we now ask:
Fit Questions
- What’s the smallest thing that could work? (Start there)
- Does this tool add capabilities I’ll actually use? (If not, it’s complexity tax)
- Can I deploy hello world in under 10 minutes? (If not, red flag)
Debug Questions
- How many places can configuration live? (Fewer = better)
- What do error messages look like? (Test this intentionally)
- How do I view logs in production? (If it’s complicated, problems will be too)
Escape Questions
- How painful is migration if this doesn’t work? (Low lock-in = safety)
- What’s my fallback option? (Know it before you start)
- At what point do I abandon this and try the fallback? (Set a timebox)
If we’d asked these questions before starting, we would have:
- Noticed Azure Static Web Apps has configuration in 6+ places
- Tested deploy with hello world (and discovered it was broken)
- Set a 4-hour timebox before trying alternatives
- Had Cloudflare Workers as a documented fallback
Instead, we spent 72 hours because “we’d already invested time” and “Azure should work.”
The Sunk Cost Problem
This was our real failure. By hour 8, we should have asked: “Is continuing to debug Azure the best use of time?”
We didn’t because:
- We’d already written the function code (sunk cost)
- We’d already learned the config file formats (sunk cost)
- We’d “almost” figured out the deploy (sunk cost)
- Switching felt like admitting defeat (ego)
None of these are rational reasons to continue. The function code was portable. The config knowledge was worthless if deploys didn’t work. “Almost working” after 8 hours means fundamentally broken.
Lesson: Set explicit checkpoints. “If this isn’t working in 4 hours, we try the next option.” Treat it as process, not failure.
What We’d Do Differently
-
Timebox the first attempt. 4 hours max before evaluating alternatives.
-
Test deploy before writing code. Hello world first. Always.
-
Document the fallback before starting. “If Azure doesn’t work, we’ll try Cloudflare Workers.” Having it written down makes the switch easier.
-
Count configuration surfaces. More than 3? Probably overengineered for simple use cases.
-
Value portability. Standard JavaScript functions are portable. Platform-specific bindings aren’t. Write portable code when possible.
The Silver Lining
We now have:
- A working enrollment system on Cloudflare
- A blog post series documenting the journey
- A decision framework for future infrastructure choices
- A strong opinion about Azure Static Web Apps (not for simple APIs)
And the Learn Labs course is live, which was the whole point.
Sometimes the long way teaches you more than the short way would have. But we’d still prefer to learn lessons in hours, not days.
Related posts: