Vibe Coding with Control: A Workflow with Cline
2025-06-19, by Rola Labs

Natural-language coding and how it came about
“Vibe coding” emerged in early 2025, coined by Andrej Karpathy to describe coding by feel, with AI copilots handling the details. Instead of writing precise instructions, developers describe intent — “Add a search bar,” “Shrink the padding” — and let LLMs generate or refactor code. The term quickly went viral as tools like GPT-4o, Claude 3, Cursor, and Devin crossed a threshold where this style became viable.
In Andrej Karpathy's framing, software development has moved through three major eras:
- Software 1.0: Classical programming, where humans wrote explicit instructions line-by-line in languages like C, Java, and Python.
- Software 2.0: The deep learning era, where neural networks learned behaviors from data instead of explicit instructions.
- Software 3.0: The age of Large Language Models (LLMs), where we "program" using natural language instructions (prompts), and models generate code based on intent.
Vibe coding is the practical manifestation of Software 3.0: we iterate conversationally with AI agents to transform intent into working code.
Current Vibe Coding Tools: Quick Overview
| Tool | Pros | Cons |
|---|---|---|
| Cursor | Deep IDE integration, structured “Cursor Rules” for context, tight feedback loop | Can struggle in very large repos; cross-file logic sometimes brittle |
| GitHub Copilot (Chat/Voice) | Excellent autocomplete, natural-language refactors, improving voice interface | Limited memory for complex workflows, global reasoning still weaker |
| Devin (Cognition) | Full agentic workflow—plans, runs shell commands, opens PRs | Early phase; brittle if tasks aren’t well-scoped |
| Replit AI / Ghostwriter | Instant web-based prototyping, hobby-friendly | Less suitable for production-level infra |
| Cline | Autonomous VS Code agent: edits files, runs terminals & headless browser, snapshots workspace, uses MCP tools | Still needs careful context setup and human review; can spiral when refactoring across many edits |
While many tools exist to support vibe coding, we chose Cline because of the following:
- Open-source & fully owned — no platform lock-in; we self-host and pay only for API calls.
- Easy integration — lightweight VSCode extension that fits directly into our existing workflow.
- Customizable & extendable — maintain Cline Rules easily; integrate external tools via MCP servers as the project grows.
Our Vibe Coding Workflow
We follow a system built around clear principles for AI-assisted coding. Cline serves as the platform we use to enforce these principles, but the methodology applies universally.
1. Start With Ruthless Clarity
"The worst bug is building the wrong thing perfectly."
Before any coding begins:
- We ensure that the problem statement, feature requirements, architectural considerations, and possible edge cases are fully specified in natural language.
- Use AI assistants like ChatGPT (GPT-4o) or Gemini Pro to collaboratively refine the problem definition into a detailed Markdown spec.
- This spec becomes part of the repo, functioning as both prompt input and future documentation.
💡 In Cline: The Markdown spec is stored in the repo. Cline reads this spec during task execution to stay anchored to intent.
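As a sketch, such a spec skeleton might look like the following (the headings are our own convention, not a Cline requirement):

```markdown
# Feature Spec: <feature name>

## Problem Statement
What user problem this solves, in one or two sentences.

## Requirements
- Functional requirements, one per bullet
- Non-functional constraints (performance, accessibility, etc.)

## Architectural Considerations
Affected modules, data flow, and any new dependencies.

## Edge Cases
Known failure modes, empty states, and concurrency concerns.

## Out of Scope
Explicitly excluded work, to keep the task atomic.
```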
2. Establish and Maintain Project Context
One of the biggest weaknesses of AI coding assistants is their limited project-wide memory. Each code generation request needs to be anchored in context to ensure architectural consistency.
We address this with two complementary approaches:
- Blueprint-first context anchoring: We maintain a continuously updated `blueprint.md` file in every project repo. This serves as the canonical reference for:
  - System architecture
  - API schemas
  - Design tokens and UI systems
  - Shared component libraries
  - External integrations and dependencies

  The blueprint evolves alongside the codebase. As new features are implemented, architectural changes are documented and reflected in this file, ensuring that both humans and AI share a single source of project truth.
- Pre-load system design context into generation: Before any new generation request, we selectively pre-feed models with:
  - Relevant sections of `blueprint.md`
  - Current API schemas
  - Design system files
  - Shared utility libraries

  This allows the AI to generate code that fits seamlessly into the system’s existing structure, without hallucinating inconsistent patterns or duplicating logic unnecessarily.
💡 In Cline: We configure Cline to read the blueprint and pin critical files into the context window before Act cycles.
# Blueprint Maintenance
**Objective:** Read, create, maintain, and modify `blueprint.md`, which outlines the project's architecture, design decisions, core logic, and file references to guide development and onboarding.
**Trigger:** At the start of every new task, ask the user whether you should read the blueprint, and act on the response. Also trigger when a significant milestone or a set of related architectural changes within a task is complete, before moving to a distinctly new phase or feature set, or just before the intended final `attempt_completion`. Make this offer only after the user has understood the relevant changes; if further significant architectural changes are requested after a blueprint-update offer, offer again once those subsequent changes are complete.
**Process:**
- **At the Start**
1. **Ask User if you should read the blueprint file:** Use the `ask_followup_question` tool, whenever a new task is assigned, to check with the user if you should read the blueprint.
2. **Follow user directives:** Per the response provided, read the blueprint file if directed; otherwise skip it and continue with the task.
- **Before Completion**
1. **Offer Blueprint Changes via Tool:** Use the `ask_followup_question` tool as the final step before the intended `attempt_completion`. The tool use should be similar to:
```xml
<ask_followup_question>
<question>I've completed the planned steps for the task. Before I finalize with attempt_completion, would you like me to analyse the changes made and suggest potential updates to the blueprint docs?</question>
<options>["Yes, Propose updates for Blueprint", "No, complete the task now"]</options>
</ask_followup_question>
```
2. **Await User Choice:**
* If the user selects **"No, complete the task now"**: Proceed immediately with the `attempt_completion` tool.
    * If the user selects **"Yes, Propose updates for Blueprint"**: Proceed to step 3.
3. **If User Chooses Update:**
a. **Identify Changes:** Check if the changes affect architecture, logic, structure, stack, design system, or external integrations. For every significant update, create a 1-3 sentence summary of what changed and why, being concise and clear.
    b. **Read Existing Blueprint:** If `blueprint.md` exists in the current repo, identify the corresponding section(s) in the document that need to be updated. If it doesn't exist, create a new document.
c. **Formulate & Propose Changes:** Generate specific, actionable changes for the *content* of the `blueprint.md` document. Prioritize changes directly addressing non-trivial updates. Use `replace_in_file` diff blocks when practical, otherwise describe changes clearly.
d. **Await User Action on Suggestions:** Ask the user if they agree with the proposed changes and if they'd like me to apply them *now* using the appropriate tool (`replace_in_file` or `write_to_file`). **Crucially, wait for explicit user approval before applying any changes to the blueprint document.** Apply changes if approved.
    e. **Update Logs:** Log updates at the bottom of the document, under the `Updates Log` header, in the following format: `**TimeStamp**: Single line reflecting the change`. This maintains the update history; then proceed to `attempt_completion`.
**Constraint:** Do not offer updates if:
* The updates are trivial (e.g., typo fixes, comment formatting, etc.)
* The current mode is PLAN MODE (update offer should only happen in ACT mode just before completion).
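The blueprint pre-loading idea above can be sketched in a few lines. This is a minimal illustration, not Cline's internal mechanism: we assume the blueprint uses `## `-level headings to delimit sections, and the function name is our own.

```python
def extract_sections(blueprint: str, wanted: list[str]) -> str:
    """Pull only the requested top-level sections out of a blueprint
    document, so a prompt carries the relevant context without the
    whole file. Sections are assumed to be delimited by '## ' headings."""
    sections: dict[str, str] = {}
    current, buf = None, []
    for line in blueprint.splitlines():
        if line.startswith("## "):
            if current is not None:
                sections[current] = "\n".join(buf).strip()
            current, buf = line[3:].strip(), []
        elif current is not None:
            buf.append(line)
    if current is not None:
        sections[current] = "\n".join(buf).strip()
    # Preserve the caller's requested order; silently skip missing sections.
    picked = [f"## {name}\n{sections[name]}" for name in wanted if name in sections]
    return "\n\n".join(picked)
```

In practice we would prepend the returned string to the generation prompt, so the model sees only the architecture and schema sections relevant to the task at hand.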
3. Using Predesigned Prompts and Tool Augmentations
While prompt engineering can be done ad hoc, we’ve found that standardizing reusable instruction templates vastly improves stability, correctness, and model behavior over time.
Rules can be easily updated as we observe model behavior, allowing for continuous self-improvement of the assistant.
Tool Augmentations with MCP Servers
Cline also supports integration with external MCP servers, giving the assistant access to powerful external capabilities beyond pure generation:
- Live internet search for retrieving up-to-date documentation
- External browser access for rendering live API responses
- Grounded RAG-style retrieval of internal documentation
- Specialized validators for linting, type-checking, and tests
These tools significantly enhance both planning and debugging, allowing us to offload certain reasoning steps to reliable external processes, rather than relying purely on language model speculation.
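For illustration, an MCP server entry in Cline's MCP settings file typically has the following shape; the server name, package, and environment variable here are examples only, so consult the MCP documentation for the tools you actually use:

```json
{
  "mcpServers": {
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": { "BRAVE_API_KEY": "<your-key>" }
    }
  }
}
```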
4. Break Complex Tasks into Atomic Steps
Complex features are always broken into discrete, independently verifiable subtasks:
- First, design the overall plan with clear milestones.
- Each subtask should involve limited file changes, clear input/output expectations, and, if applicable, accompanying tests.
💡 In Cline: Our Complex Task Breakdown rule ensures that large features are broken into actionable steps, and prevents uncontrolled multi-file edits.
# Complex Task Breakdown Rule
**Objective:** For complex tasks, propose breaking them down into smaller, verifiable steps and manage execution step-by-step with user verification.
**Trigger:** When a generated task plan involves multiple steps, modifications across multiple files, and a high estimated lines-of-code (LOC) count, before asking the user to toggle to ACT mode.
**Process:**
1. **Propose Breakdown:** Use `ask_followup_question` to ask the user if they want to break down the task.
```xml
<ask_followup_question>
<question>This task appears complex, involving multiple steps and file modifications. Would you like me to break it down into smaller, independently verifiable steps and work through them one by one?</question>
<options>["Yes, break it down", "No, proceed with the full plan"]</options>
</ask_followup_question>
```
2. **Await User Choice:**
* If user selects **"No, proceed with the full plan"**: Acknowledge and proceed with the original plan, asking the user to toggle to ACT mode.
* If user selects **"Yes, break it down"**: Proceed to step 3.
3. **Breakdown and Document:**
a. Analyze the complex plan and break it into smaller, atomic steps.
b. Create a temporary Markdown file in `/tmp` (e.g., `/tmp/cline_task_breakdown_[timestamp].md`) listing the broken-down steps with brief descriptions and required context.
c. Inform the user that the breakdown document has been created and list the first step.
4. **Execute Steps Iteratively:**
a. Work on the current step using appropriate tools in ACT mode.
b. Upon completion of a step's implementation:
i. Identify a verification method (e.g., "Check the console for log messages from function X", "Verify component Y renders correctly in the browser", "Run test Z").
ii. Use `ask_followup_question` to inform the user of the verification method and ask for confirmation that the step works.
```xml
<ask_followup_question>
<question>Step [step_number] is complete. Please verify the implementation by [verification_method]. Has this step been successfully verified?</question>
<options>["Yes, verified", "No, there's an issue"]</options>
</ask_followup_question>
```
c. **Await User Verification:**
i. If user selects **"No, there's an issue"**: Troubleshoot the issue based on user feedback. Once resolved, re-request verification.
ii. If user selects **"Yes, verified"**: Mark the step as complete in the temporary document (optional, but good practice). Proceed to the next step in the document. If there are more steps, inform the user of the next step. If all steps are complete, proceed to step 5.
5. **Completion:** Once all steps are verified, inform the user that the broken-down task is complete and proceed with the final `attempt_completion`. Mention the temporary file and suggest its deletion.
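The breakdown document above can be tracked mechanically. As a sketch (the checklist format and function names are our illustration; the rule itself only requires brief step descriptions), a parser over a `- [ ]` / `- [x]` checklist might look like:

```python
def parse_breakdown(md: str) -> list[dict]:
    """Parse a task-breakdown markdown file into steps. Each step is a
    '- [ ]' (pending) or '- [x]' (verified) checklist line."""
    steps = []
    for line in md.splitlines():
        line = line.strip()
        if line.startswith("- ["):
            done = line[3].lower() == "x"   # char between the brackets
            steps.append({"title": line[6:].strip(), "verified": done})
    return steps

def next_step(steps: list[dict]):
    """Return the first unverified step, or None when all are done."""
    return next((s for s in steps if not s["verified"]), None)
```

Marking a step `[x]` only after the user confirms verification keeps the file an honest record of what has actually been checked, not just what has been generated.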
5. Diagnose First, Then Escalate
"Fix the problem. Not the symptom."
When code errors arise:
- First, manually review the failing code to understand the root cause.
- If the issue is obvious, we either fix it directly or prompt the assistant with additional precise context.
- If unclear, allow the assistant one controlled fix attempt.
- If failure persists, revert to a known good state and re-approach with new context.
💡 In Cline: We leverage Cline’s Checkpoints for instant rollback to earlier states, keeping the codebase clean from accumulated patches.
6. Debug Like a Detective, Not a Gambler
AI agents tend to throw speculative fixes at bugs. We follow a structured debugging workflow:
- Understand the problem
- Create a hypothesis of why it might be happening, and plan information collection to validate it.
- Collect logs, stack traces, or relevant error output.
- Use AI models to help reason through hypotheses and only then apply a surgical fix.
💡 In Cline: We switch into Plan Mode for debugging sessions, using our Debugging Strategy Rule to structure diagnosis before any fix attempt.
# General Debugging Strategy
**Objective:** Provide a structured approach for diagnosing and resolving various types of software issues, from logical errors to state synchronization problems.
**Trigger:** When a user reports unexpected behavior, errors, or inconsistencies in the application.
**Process:**
1. **Understand the Problem:**
* Clarify the exact symptoms and reproduction steps from the user.
* Formulate a clear hypothesis about the potential cause of the issue (e.g., "Incorrect data processing," "UI not updating correctly," "API call failing," "State synchronization problem").
2. **Plan Diagnostic Logging/Inspection:**
* Identify key areas in the code relevant to the hypothesis (e.g., function inputs/outputs, state variables, API responses, component lifecycle methods).
* Use `console.log` (or equivalent debugging tools like breakpoints) to output:
* Relevant variable values at different stages.
* Execution flow indicators (e.g., "Entering function X," "Exiting loop Y").
* Timestamps (if timing is suspected).
* The specific code location for clarity.
* For UI issues, consider using browser developer tools to inspect element properties, styles, and component states.
* When using custom UI components, always consult their interface/props definitions to ensure correct usage.
* When working with state management libraries like Redux, ensure that components using hooks like `useSelector` or `useDispatch` are properly wrapped within the Redux `Provider`.
3. **Instruct User to Reproduce and Provide Information:**
* Ask the user to reproduce the issue with the added logging/inspection steps enabled.
* Request the full console output, network tab details, or screenshots as relevant to the problem.
4. **Analyze Information and Confirm Hypothesis:**
* Examine the provided logs and other diagnostic information.
* Compare observed behavior with expected behavior based on the hypothesis.
* Confirm or refine the hypothesis based on the evidence.
5. **Propose Fix (Only After Confirmation):**
* If the hypothesis is confirmed, propose a targeted fix that directly addresses the identified root cause.
* If the hypothesis is not confirmed, iterate by refining the hypothesis and diagnostic steps.
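The diagnostic-logging step above benefits from structure: tagging every log line with the hypothesis it tests makes the output filterable per hypothesis. A minimal sketch (the helper name and format are our own, not part of the rule):

```python
import time

def debug_probe(hypothesis: str, location: str, **values) -> str:
    """Emit a structured diagnostic line tying observed values to the
    hypothesis under test, so logs can be grepped per hypothesis."""
    payload = " ".join(f"{k}={v!r}" for k, v in sorted(values.items()))
    line = f"[{time.strftime('%H:%M:%S')}] [{hypothesis}] {location}: {payload}"
    print(line)
    return line
```

For example, `debug_probe("H1: stale cart state", "CartView.render", cart_total=3, synced=False)` produces one line that records where we looked, what we saw, and which hypothesis the evidence bears on.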
7. Keep Diffs Small and Human-Reviewable
"Humans are the bottleneck, not the AI."
Even if the AI can generate large blocks of code, we cap each request’s scope:
- Limit how many files are modified at once, keeping changes small per commit.
- Document changes with inline comments and write detailed commit messages.
- Cap the number of LOC updated per commit.
This makes review manageable, limits downstream bugs, and reduces unnecessary rework.
💡 In Cline: We enforce these constraints by default as part of our Act loop boundaries.
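The LOC cap can also be enforced outside the agent, e.g. in a pre-commit check. A minimal sketch that parses `git diff --numstat` output (the cap of 300 is an arbitrary example; tune it to your team):

```python
def changed_loc(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output.
    Binary files report '-' for both counts and are skipped."""
    total = 0
    for line in numstat.strip().splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added == "-" or deleted == "-":
            continue
        total += int(added) + int(deleted)
    return total

def within_cap(numstat: str, cap: int = 300) -> bool:
    """Gate a commit: True if the diff stays under the LOC budget."""
    return changed_loc(numstat) <= cap
```

Wired into a pre-commit hook, this turns "keep diffs small" from a convention into a hard boundary the Act loop cannot silently cross.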
8. Automate Self-Improvement
Continuous improvement applies not just to the code, but to the assistant itself:
- After each coding session, we reflect on what rules, prompts, or context setups worked or failed.
- Update the assistant’s operating rules based on actual session history.
💡 In Cline: Our Self-Reflecting Rule automatically reviews every completed session and proposes improvements to the rule set.
# Self-Improving Cline Reflection
**Objective:** Offer opportunities to continuously improve `.clinerules` based on user interactions and feedback.
**Trigger:** Just before the intended final `attempt_completion` for any task that involved user feedback provided at any point during the conversation, or involved multiple non-trivial steps (e.g., multiple file edits, complex logic generation).
**Process:**
1. **Offer Reflection via Tool:** Use the `ask_followup_question` tool as the final step before the intended `attempt_completion`. The tool use should be similar to:
```xml
<ask_followup_question>
<question>I've completed the planned steps for the task. Before I finalize with attempt_completion, would you like me to reflect on our interaction and suggest potential improvements to the active `.clinerules`?</question>
<options>["Yes, reflect first", "No, complete the task now"]</options>
</ask_followup_question>
```
2. **Await User Choice:**
* If the user selects **"No, complete the task now"**: Proceed immediately with the `attempt_completion` tool.
* If the user selects **"Yes, reflect first"**: Proceed to step 3.
3. **If User Chooses Reflection:**
a. **Review Interaction:** Synthesize all feedback provided by the user throughout the entire conversation history for the task. Analyze how this feedback relates to the active `.clinerules` and identify areas where modified instructions could have improved the outcome or better aligned with user preferences.
b. **Identify Active Rules:** List the specific global and workspace `.clinerules` files active during the task.
c. **Formulate & Propose Improvements:** Generate specific, actionable suggestions for improving the *content* of the relevant active rule files. Prioritize suggestions directly addressing user feedback. Use `replace_in_file` diff blocks when practical, otherwise describe changes clearly.
d. **Await User Action on Suggestions:** Ask the user if they agree with the proposed improvements and if they'd like me to apply them *now* using the appropriate tool (`replace_in_file` or `write_to_file`). Apply changes if approved, then proceed to `attempt_completion`.
**Constraint:** Do not offer reflection if:
* No `.clinerules` were active.
* The task was very simple and involved no feedback.
* The current mode is PLAN MODE (reflection offer should only happen in ACT mode just before completion).
The issues with vibe coding — and what’s likely next
Where the wobbles show up today
| Issue | What it looks like in the wild | Why it matters |
|---|---|---|
| Run-away technical debt | Mixed-style, undocumented patches pile up as each LLM “fix” layers new files over the last one. Zencoder lists it as the #1 risk for 2025 vibe projects. (zencoder.ai) | Refactors and onboarding time balloon, erasing the early speed gains. |
| Security gaps | Prompt-injection, leaked system prompts, and “Excessive Agency” sit in the new OWASP LLM Top 10 (LLM06). (genai.owasp.org) | A single unsafe generation can expose prod data or trigger supply-chain exploits. |
| API & logic hallucinations | A recent FSE ’25 paper shows 31% of LLM errors come from “calling functions that don’t exist”; their MARIN framework had to cut hallucinations by 67% to make code usable. (arxiv.org) | Debug time soars because errors are semantically, not syntactically, wrong. |
| Bug-density paradox | A Stanford study cited by WIRED found devs using assistants shipped more bugs while believing their code was safer. (wired.com) | Over-confidence plus hidden faults is a risk multiplier in prod systems. |
| Repo bloat & context drift | ARSTURN chronicles “spaghetti repos” where each prompt adds redundant helpers, assets and test stubs. (arsturn.com) | Even simple fixes turn into archaeology sessions; CI minutes (and bills) spike. |
| Skill atrophy & oversight fatigue | WIRED likens it to airline pilots over-trusting autopilot—engineers delegate, skills rust. (wired.com) | Teams that can’t audit the AI eventually can’t ship without it. |
| Cost opacity | Long context windows + retries make pay-as-you-go bills unpredictable; devs report “$300 day-one surprises” on Reddit threads. (reddit.com) | Unpredictable spend makes AI-assisted work hard to budget and justify. |
What the next two years probably bring
- Guard-railed pipelines become table stakes. Expect IDEs and agent layers to ship OWASP-style scanners that auto-block LLM01–LLM05 errors before code hits `main`. Early versions already appear in GitHub’s “Copilot Security” beta. (wired.com)
- Context gets smarter, not just larger. Research like MARIN uses hierarchical dependency maps instead of brute-force RAG to feed only compile-time-relevant code into the model, slashing hallucinations and token spend. (arxiv.org)
- Blueprint-driven coding workflows. Lightweight “living design docs” (Cline Blueprint, Cursor Ask-Doc) will act as the canonical memory so that any agent can rehydrate project context instantly, mitigating drift and duplicated helpers. (arsturn.com)
- Hybrid agent loops dominate. Fully autonomous dev agents stay experimental; Deloitte predicts only 25% of Gen-AI companies will run agentic pilots in 2025, rising to 50% by 2027. Human-in-the-loop plan → act → review models will remain standard for production code. (deloitte.com)
- Local & small models for inner-loop work. To tame cost spikes and handle private code, teams shift simple tasks (lint, boilerplate, unit-test stubs) to 7–13B-parameter SLMs running on-prem, reserving big-ticket API calls for complex reasoning. Industry surveys already show 20× cost savings when mixing models at prompt time. (arxiv.org)
- Continuous self-reflection becomes productised. Early self-auditing rules (like our Cline “reflect after every task” rule) foreshadow agents that rewrite their own prompts, improve style guides, and add regression tests when they trip on the same class of bug twice.
- Regulation & provenance layers emerge. EU AI Act clauses on generated-code traceability will drive tooling that watermarks or hashes every AI diff, giving auditors a paper trail from prompt → commit.
Bottom line: the vibe-coding wave isn’t crashing—it’s maturing. Speed remains the headline feature, but the winners will be the teams that bolt on structure, security and cost discipline before letting the vibes flow.