The LLM Playbook: Beyond Code Gen and Into System Orchestration

Categories: AI/LLM, Tutorial
Tags: AI, LLM, SWE

I write this post, like many of the things I write, primarily for myself. It started as a messy collection of disparate notes and a bloated folder of browser bookmarks. But the longer I build alongside AI, the more I realize the need to take a step back and actively review the process.

As software engineers in the AI age, everything we do is shifting away from pure syntax and moving toward process, architecture, and orchestration. This article is my formal playbook for approaching technical problems when you have an LLM, quite literally, at your fingertips.

AI started as an advanced line-completer. Essentially, it was a syntax helper to speed up typing out what we had already decided in our heads. Those days are firmly in the past. Now we have LLMs capable of reasoning across entire systems. We basically have a team of eager entry-level engineers at our disposal twenty-four seven.

Side note: We still desperately need real human entry-level engineers. Hit me up, I’d love to grab a coffee and discuss exactly why.

Here is the TL;DR. Once you understand how to treat AI as an architectural collaborator rather than a magic wand, this high-level checklist is honestly all you need to reference before cracking open a ticket.

The TL;DR Playbook

  • Step 1: Establish the Baseline & Verify Intent: Gather telemetry, verify the real requirements, and avoid the “garbage in, garbage out” context trap.
  • Step 2: Define the Architectural Boundaries: Map the macro-architecture first, then narrow the context to a precise, localized micro-scope.
  • Step 3: The Collaborative Blueprint: Pitch your hypothesis to the AI, pressure-test it via iterative feedback, and co-author the execution plan.
  • Step 4: Hardening & Code Review: Hunt for edge cases, write regression tests, and bake in observability.

Step 1: Establish the Baseline & Verify Intent

Get your head in the game first. Regardless of whether you’re diving into a greenfield repository or touching a legacy monolith you’ve worked on for years, you have to align your mental model of the system with reality before any code is written.

Jumping straight into code generation is a massive trap. LLMs are inherently people-pleasers; if you feed them incomplete context or unverified assumptions, they will confidently hallucinate an equally broken solution that looks incredibly convincing.

At a junior level, this step stops you from spending days building the completely wrong feature. At a senior level, this is where you map telemetry, verify business constraints, and separate surface-level symptoms from the actual architectural root cause.

The Execution

  • Gather the Telemetry: Grab the exact ticket requirements, raw error logs, APM/observability alerts, or user reports.
  • Audit the Chronology: Look at the history. How long has this been happening? Did the issue coincide with an upstream dependency bump, a quiet infrastructure tweak, or a sudden spike in traffic?
  • Isolate the Delta: Clearly define the gap between what the system is currently doing versus what it is expected to do.

The Baseline Prompt

We need to tell the LLM who it is and give it the tools to succeed. Along with the prompt below, feed it any structural data you have on hand. Think dependency files (package.json, requirements.txt, or composer.json), configuration files (Dockerfile, vite.config.ts), explicit framework runtime versions, or a quick directory layout.
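Gathering that structural data by hand gets old fast, so here is a minimal sketch of how it could be scripted. The file list, the `collect_context` name, and the character cap are my own assumptions for illustration; swap in whatever your stack actually uses.

```python
from pathlib import Path

# Hypothetical list of "structural" files worth sharing; adjust for your stack.
CONTEXT_FILES = (
    "package.json",
    "requirements.txt",
    "composer.json",
    "Dockerfile",
    "vite.config.ts",
)


def collect_context(
    repo_root: str,
    names: tuple[str, ...] = CONTEXT_FILES,
    max_chars: int = 4000,
) -> str:
    """Concatenate whichever structural files exist into one labeled text block."""
    root = Path(repo_root)
    sections = []
    for name in names:
        path = root / name
        if not path.is_file():
            continue  # skip files this project does not have
        text = path.read_text(encoding="utf-8", errors="replace")
        if len(text) > max_chars:
            # Cap each file so one huge lockfile-style blob cannot drown the rest.
            text = text[:max_chars] + "\n[truncated]"
        sections.append(f"--- {name} ---\n{text}")
    return "\n\n".join(sections)
```

Calling `collect_context(".")` from the repository root returns a single string you can paste in alongside the ticket details.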

Because LLMs love to agree with you, you have to explicitly order them to be critical and skeptical. This is the exact baseline prompt I kick off with (it works incredibly well if you drop it into a custom project instructions file like a CLAUDE.md).

I am prepping to work on a task. Before writing any code, I need to establish a strong baseline understanding of the current system state and verify the true engineering requirements. 

Here is the context I have (Ticket details, logs, or requirements):
[PASTE YOUR TICKET/LOGS/DATA HERE]

Act as a Senior Systems Architect and analyze this data. Answer these three questions for me:
1. Based on this information, what are the hidden technical assumptions I might be making about how the system currently behaves?
2. What critical telemetry, observability metrics, or historical context (e.g., traffic changes, recent deployments elsewhere) are missing that I should verify before proceeding?
3. How can we cleanly separate the immediate surface symptoms from the potential systemic root cause?
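If you reuse this prompt a lot, a tiny guard can enforce the “garbage in, garbage out” rule for you. This is only a sketch under my own assumptions: `fill_prompt` is a hypothetical helper, and the only thing taken from the prompt above is the placeholder text.

```python
# The placeholder string as it appears in the baseline prompt.
PLACEHOLDER = "[PASTE YOUR TICKET/LOGS/DATA HERE]"


def fill_prompt(template: str, context: str) -> str:
    """Insert real context into the prompt, refusing to send an empty one."""
    if PLACEHOLDER not in template:
        raise ValueError("template has no context placeholder")
    if not context.strip():
        # An empty context invites the model to guess, which is the trap
        # this step is meant to avoid.
        raise ValueError("refusing to build a prompt with empty context")
    return template.replace(PLACEHOLDER, context.strip())
```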

Step 2: Define the Architectural Boundaries (Macro & Micro Scope)

Now we’re cooking with fire. The AI is sufficiently skeptical, and now we set the table.

The biggest mistake developers make at this stage is dumping an entire codebase into an LLM or blindly attaching every file in their workspace. While modern models have massive context windows, throwing too much noise at them can dilute attention and degrade retrieval, especially for details buried in the middle of a long prompt. This is sometimes called the “lost in the middle” phenomenon. A senior approach is about strategic containment. We need to map out the macro-architecture first, and then zoom in on a precise, localized micro-scope.

The Execution

  • Map the Macro-Topology: Start by giving the AI a bird’s-eye view of your project’s geography. This means showing it the directory structure, entry points, and primary configuration files.
  • Refine to the Micro-Scope: Once the AI understands the layout, isolate the specific neighborhood where the changes will live. If you are dealing with a latency issue in a specific microservice or adding a feature to a precise data-fetching layer, truncate your context to only those relevant modules.
  • Audit the System Lore: Briefly feed the AI any local lore: READMEs, markdown notes, or architectural decision records (ADRs). These might hold historical hints or strict team constraints.
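For the macro-topology step, a depth-limited directory listing is usually enough. Here is a rough sketch of one way to produce it without dragging in noise; the ignore list and the `render_tree` name are assumptions of mine, not a standard.

```python
import os

# Directories that usually add noise rather than signal (assumed defaults).
IGNORED_DIRS = {".git", "node_modules", "__pycache__", "dist", "build", ".venv"}


def render_tree(root: str, max_depth: int = 2) -> str:
    """Return an indented listing of `root`, limited to `max_depth` levels."""
    lines = [os.path.basename(os.path.abspath(root)) + "/"]

    def walk(directory: str, depth: int) -> None:
        if depth > max_depth:
            return  # stop descending: deeper levels belong to the micro-scope
        # Directories first, then files, each group sorted by name.
        entries = sorted(os.scandir(directory), key=lambda e: (e.is_file(), e.name))
        for entry in entries:
            if entry.is_dir() and entry.name in IGNORED_DIRS:
                continue
            indent = "  " * depth
            if entry.is_dir():
                lines.append(f"{indent}{entry.name}/")
                walk(entry.path, depth + 1)
            else:
                lines.append(f"{indent}{entry.name}")

    walk(root, 1)
    return "\n".join(lines)
```

Start shallow (`max_depth=2`) for the bird’s-eye view, then rerun it against only the subdirectory you care about when you narrow to the micro-scope.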

The Scope Audit Prompt

Before you ask the AI to change anything, ask it to analyze how your project is wired together. We want to identify the architectural blast radius before we write code.

Act as a Principal Engineer. I am going to provide the structural blueprint of the system I'm working on so we can isolate our scope of work. 

Here is the directory structure and core architectural layout:
[PASTE OUTPUT OF A `tree` COMMAND, MONOREPO STRUCTURE, OR KEY ENTRY POINTS]

Here are the specific files/modules handling the logic in question:
[PASTE LOGIC/CODE SAMPLES FOR THE LOCALIZED SERVICE OR RELEVANT COMPONENT]

Based on this architecture:
1. What is the potential blast radius if I modify this localized scope? Which upstream or downstream files are at the highest risk for side effects or breaking changes?
2. Are there any architectural patterns, hidden dependencies, or constraints visible in this layout that I need to strictly adhere to?
3. What helper utilities, existing middleware, or internal abstractions already exist in this project that I should leverage instead of rewriting from scratch?

Step 3: The Collaborative Blueprint

This is where the magic actually happens. At this stage, most developers make the mistake of asking the AI to just “write the code” for the feature or fix. If you do that, you’re abdicating your job as an engineer and handing the steering wheel to a statistical model.

Instead, you need to use the AI as an architectural sounding board. You pitch your proposed solution, your hypothesis, and your architectural approach, and then you force the LLM to aggressively tear it apart. You want it to act like a cynical, highly pedantic Principal Engineer conducting a design review.

Watch out for confirmation bias here: if your fundamental approach is flawed, a people-pleasing model might just try to optimize your bad plan instead of steering you to the right one. Force it to speak up.

The Execution

  • Formulate Your Hypothesis: Clearly state how you intend to solve the problem or build the feature. Break down your logic step by step.
  • Force a Friction Analysis: Explicitly ask the AI to hunt for edge cases, race conditions, memory bottlenecks, or state synchronization issues that your plan might have overlooked.
  • Co-Author the Incremental Plan: Do not let the AI emit a massive, single-file code dump. Instruct it to break the implementation down into small, isolated, testable milestones. If you can’t verify correctness at step one, you shouldn’t be moving to step two.
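The “verify step one before step two” rule above can be made mechanical. Below is a small sketch of a gated plan runner; `Stage` and `run_plan` are hypothetical names, and in practice each `verify` would wrap a real test command or check.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Stage:
    name: str
    apply: Callable[[], None]   # makes the change for this milestone
    verify: Callable[[], bool]  # returns True only if the milestone works


def run_plan(stages: list[Stage]) -> tuple[list[str], Optional[str]]:
    """Run stages in order and stop at the first one that fails verification.

    Returns the names of the verified stages and the name of the stage that
    failed (or None if every stage passed).
    """
    completed: list[str] = []
    for stage in stages:
        stage.apply()
        if not stage.verify():
            return completed, stage.name  # later stages never run
        completed.append(stage.name)
    return completed, None
```

The point of the design is the early return: a failed check blocks everything after it, so a broken milestone never gets buried under later changes.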

The Blueprint Prompt

By presenting your solution first, you dictate the technical direction while using the AI to catch your blind spots.

I have analyzed the system scope, and I have a proposed implementation plan for this task. I want you to act as a cynical, highly critical Principal Staff Engineer and review my approach. Do not blindly agree with me; look for flaws.

My proposed plan is:
[INSERT YOUR STEP-BY-STEP IMPLEMENTATION PLAN OR HYPOTHESIS]

Review this approach and answer the following:
1. What hidden regressions, edge cases, or architectural friction points (e.g., race conditions, scale bottlenecks, unhandled error states) am I introducing with this plan? Is my fundamental engineering pattern correct, or should I be looking at an alternative architecture entirely?
2. Are there alternative engineering patterns or cleaner implementations that would accomplish this more elegantly?
3. Convert this validated approach into a 3-stage, incremental execution plan. Each stage must be isolated, small, and include a clear way for me to verify or test that it works before moving to the next stage.

Step 4: Hardening & Code Review

You’ve written the code, the incremental steps worked, and your local test suite is green. Naturally, it’s time to open a Pull Request and take a much-needed break, right?

Not yet. A coffee top-off and a quick mental break are a good idea here, though. This is the exact moment where bugs slip into production because we are too close to our own logic to see the obvious.

Instead of treating the AI as an autocomplete tool, this is where you treat it as a ruthless, tireless quality assurance engineer. The goal here isn’t just to make sure the code works right now; it’s to make sure that six months from now, when a teammate (or future you) has to modify this file at 4:45 PM on a Friday, it doesn’t blow up in their face.

The Execution

  • Enforce Defensive Constraints: Have the AI inspect the raw code changes specifically looking for things humans gloss over when they’re tired: type safety, silent failures, unhandled promise rejections, or missing boundary checks.
  • Write the Regression Traps: Don’t just ask the AI to “write tests.” Ask it to identify the most fragile parts of your new logic and generate targeted integration or unit tests designed to break it.
  • Bake In the Smoke Signals: If this logic fails in production, how will you know? Use the AI to audit your logging and observability. If a service drops a connection or receives bad payload data, you want an explicit, readable alert in your APM (Application Performance Monitoring) tool, not a cryptic “undefined is not a function” buried deep in a massive log dump.
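To make those three bullets concrete, here is a hedged sketch of what “defensive constraints plus smoke signals” might look like for a single function. The order payload, the `quantity` field, the 1 to 1000 range, and every name in it are invented for illustration.

```python
import json
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


class PayloadError(ValueError):
    """Raised when an incoming payload fails validation."""


def parse_quantity(raw: str) -> int:
    """Extract a validated `quantity` from a JSON payload, logging every rejection."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        logger.error("order payload is not valid JSON: %s", exc)
        raise PayloadError("invalid JSON") from exc
    if not isinstance(data, dict):
        logger.error("order payload is not an object: %r", type(data).__name__)
        raise PayloadError("payload must be a JSON object")
    quantity = data.get("quantity")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int):
        logger.error("order payload has non-integer quantity: %r", quantity)
        raise PayloadError("quantity must be an integer")
    if not 1 <= quantity <= 1000:  # assumed business limits
        logger.error("order quantity out of range: %d", quantity)
        raise PayloadError("quantity out of range")
    return quantity
```

The regression traps then aim at the boundaries rather than the happy path: 0 and 1001, `true` instead of a number, a JSON array instead of an object, and input that is not JSON at all.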

The Hardening Prompt

When you hand your code to the AI for a final pass, explicitly tell it to ignore pedantic formatting arguments (that’s what your linter is for) and focus entirely on structural durability.

The implementation is complete and working locally. I need a brutal, principal-level code review focused entirely on software durability, security, and edge-case handling. Ignore minor formatting or variable naming style arguments. 

Here is the code I wrote:
[PASTE YOUR NEW/MODIFIED CODE HERE]

Act as a strict QA Automation and Security Engineer and audit this code:
1. What specific edge cases, memory leaks, unhandled errors, or typing vulnerabilities did I leave exposed?
2. If this code fails silently or crashes in production, what observability is missing? Suggest exactly where we need to add explicit logs, metrics, or alerts so we can triage issues instantly.
3. Write 3 targeted, high-value unit or integration test cases designed specifically to stress-test the most fragile boundaries of this new logic.

Step 4.5: The Automation Layer (How We Standardize This at Scale)

Everything I just walked you through is incredibly powerful when you’re manually working through a ticket on your second monitor. But as a senior engineer, the ultimate goal is to take a solid process and turn it into automation so the whole workflow scales.

For my personal projects, I have automated this final hardening step. I built a custom /review-pr command (which sits cleanly in my .cursor/SKILL.md file) that handles it via a multi-agent pipeline right from my local development environment.

Instead of relying on a single, generic AI model to catch everything, executing that command kicks off a parallel, coordinated effort behind the scenes:

  • An Orchestrator Agent analyzes the raw git diff to map out the blast radius and system topology.
  • It spawns a specialized Security Sub-Agent whose only job in life is to ruthlessly hunt for vulnerabilities, credentials exposure, and validation gaps.
  • It simultaneously spins up a Performance Sub-Agent focused strictly on runtime efficiency, memory footprint, and potential database bottlenecks.
  • Finally, a Critic Agent cross-references their findings, aggressively weeds out false positives, and outputs a single, clean architectural review.
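To show the shape of that pipeline, here is a toy sketch of the fan-out and merge pattern. It is not my actual /review-pr implementation: the two sub-agents are plain string-matching stubs standing in for model calls, and the `Finding` type, confidence scores, and threshold are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    agent: str
    line: int          # 1-based line number within the diff text
    message: str
    confidence: float  # 0.0 to 1.0, as reported by the agent


Agent = Callable[[str], list[Finding]]


def security_agent(diff: str) -> list[Finding]:
    """Stub: flags added lines that look like hard-coded credentials."""
    return [
        Finding("security", n, "possible hard-coded credential", 0.9)
        for n, line in enumerate(diff.splitlines(), start=1)
        if line.startswith("+") and "password=" in line.lower()
    ]


def performance_agent(diff: str) -> list[Finding]:
    """Stub: flags added lines that contain an unbounded SELECT *."""
    return [
        Finding("performance", n, "unbounded SELECT *", 0.6)
        for n, line in enumerate(diff.splitlines(), start=1)
        if line.startswith("+") and "select *" in line.lower()
    ]


def critic(findings: list[Finding], min_confidence: float = 0.5) -> list[Finding]:
    """Drop low-confidence findings and order the rest by diff line."""
    kept = [f for f in findings if f.confidence >= min_confidence]
    return sorted(kept, key=lambda f: (f.line, f.agent))


def review(diff: str, agents: list[Agent]) -> list[Finding]:
    """Orchestrator: run every sub-agent on the same diff in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        batches = list(pool.map(lambda agent: agent(diff), agents))
    merged = [finding for batch in batches for finding in batch]
    return critic(merged)
```

Swapping a stub for a real model call changes the body of one function, not the orchestration, which is the main reason to keep the agents behind a single `Agent` signature.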

By encoding these engineering standards directly into automated, local developer tools, we eliminate “prompt drift.” We’re ensuring that every single line of code gets the exact same principal-level scrutiny before it ever gets staged for a final commit or peer review.

Conclusion: Driving the Machine

The thread connecting this entire framework is simple: You are the pilot; the AI is the engine.

If you use LLMs as a fancy copy-paste machine to generate code you don’t fully understand, you are playing engineering roulette. You’re trading temporary velocity today for a massive technical debt headache tomorrow (and maybe even some scary vulnerabilities).

But when you shift your mindset and treat the AI as a hyper-competent, slightly cynical architectural partner, everything changes. Suddenly, you aren’t just coding faster. You are leveraging a tool to manage massive cognitive load, map system boundaries, stress-test your design logic, and ruthlessly audit your implementation before another human ever looks at it.

As software engineering continues to evolve, we will be measured less by our pure typing speed and more by our process, our system architecture, and our ability to orchestrate complex solutions.

So the next time you crack open a ticket, don’t just ask the AI for the answer. Show it your blueprint, tell it to be critical, and force it to help you build a more durable system. Future you will thank you for it.
