
Ephemeral Apps Are Almost Here

AI Software Engineering Architecture

About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.

I recently built a Harvest clone in 18 minutes, a Trello clone in 19 minutes, and a Confluence clone in 16 minutes. All three were generated entirely by Claude Opus 4.6 from requirements documents. All run in Docker. All work.

The interesting thing is what happens to your relationship with code when regenerating an entire application costs less than a cup of coffee and takes less time than drinking one.

The Shift

Building apps and services now centers on writing precise descriptions (requirements documents and technical design documents) and then letting AI generate the entire application. The code becomes an intermediate artifact, like object files during compilation. You edit the source and recompile, not the object files.

Compilation reached that inflection point decades ago; applications themselves are approaching it now. The requirements document is the source. The running application is the compiled output. The code in between is just a transient byproduct of the build process.

What I Mean by Ephemeral

I've started treating some of my applications as ephemeral: today's version of the app is whatever was generated from the latest version of the requirements. Adding a new screen means updating the requirements document and regenerating the entire application.

This would have sounded impractical a year ago. But the economics have changed. When generating 7,000 lines of working code costs a few dollars in API tokens and takes 19 minutes, the expense of regeneration is no longer material. The code is disposable. The requirements are the investment.

Here is the workflow I've settled into: I have a requirements document and a technical design document for each application. When I want to change something, I update the requirements, hand both documents to Claude, and tell it to build the app. It does. I test it. If something needs fixing, I refine the requirements and regenerate. The code never accumulates the kind of entropy that makes traditional codebases increasingly painful to maintain over time, because there is no long-lived codebase. Every build is a clean room.
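That loop (generate, test, refine the requirements, regenerate) can be sketched in a few lines. This is only an illustration: `build_until_green`, `generate_app`, and `run_tests` are hypothetical stand-ins for "hand the documents to the model" and "run the generated test suite," not a real API.

```python
# Sketch of the regenerate-until-green loop described above.
# NOTE: `generate_app` and `run_tests` are hypothetical stand-ins;
# this illustrates the workflow, it is not a real API.
def build_until_green(requirements, design, generate_app, run_tests, max_rounds=3):
    for attempt in range(1, max_rounds + 1):
        app = generate_app(requirements, design)  # clean-room build every time
        failures = run_tests(app)
        if not failures:
            return app, attempt
        # Fix the input, not the output: fold the failures back into the spec.
        requirements += "\nClarification needed: " + "; ".join(failures)
    raise RuntimeError("requirements still ambiguous after max_rounds builds")
```

The important property is the second-to-last comment: a failing test never leads to patching the generated code, only to sharpening the requirements before the next clean-room build.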

The Requirements Are the Product

This inverts a relationship that has been stable for decades. In traditional software development, the code is the product and the spec is a planning artifact that gets stale the moment development begins. In ephemeral development, the requirements document is the product. The code is a disposable rendering of that document, regenerated as needed, potentially by a different AI model each time.

Version control shifts too. You still version the code, since it remains useful for debugging and diffing between generations. But the requirements document is your source of truth. You invest your time in it, iterate on it, and protect it.

Writing Requirements Is the Hard Part

If the code is ephemeral, then the quality of the requirements determines the quality of the application. This is where the real skill lies now.

I posted on LinkedIn recently about my process for this:

When I am creating something new, something I really want to turn out well, I give the AI a short description of the app I want to build and then ask it to assume the role of an analyst and interview me. I have it ask me as many questions as needed to get a full and complete set of requirements, including UI, accessibility, security, authentication, compliance, onboarding, workflows, admin UI, frameworks, and anything else. The AI then begins a very long and tedious process of questioning; it is very thorough and asks excellent questions. Often, it asks me about design details or edge cases I hadn't even thought of. This process can take hours but, in the end, it generates the finest requirements document you could hope for. That document is the deliverable. This is where I invest my time because I now know that I can regenerate the code at any time.

The AI-as-analyst pattern forces completeness. A human writing requirements alone will skip things they consider obvious: error states, empty states, edge cases in date handling, what happens when the user clicks Back, what the loading state looks like. The AI asks about all of it. Think of it as pair programming for requirements instead of code.

For the Trello clone, I took this even further: I had Claude write the requirements document and the technical design document, then build the application from both. My total input was two prompts. The AI interviewed itself, designed the architecture, and then implemented it. The result was a 6,800-line application with 52 tests, drag-and-drop, markdown editing, dark mode, and a command palette. From two sentences of human input.

The Language Question

If requirements documents become the primary input to software development, an uncomfortable question follows: does this only work in English?

Today's leading AI models (Claude, GPT, Gemini) were trained predominantly on English-language data, with programming tutorials, Stack Overflow answers, API documentation, and open-source codebases that are overwhelmingly in English. When I write a requirements document in English, the model draws on that vast training corpus to infer intent, apply best practices, and generate idiomatic code.

But what about a product owner in Tokyo writing requirements in Japanese? Or a startup in São Paulo writing in Portuguese? The models do understand these languages and can generate code from non-English prompts. But the quality gap is real. English-language requirements benefit from tighter alignment with the model's training distribution. A requirement like "the dashboard should lazy-load widgets as the user scrolls" maps directly to patterns the model has seen thousands of times in English-language React tutorials. The same requirement in Japanese may produce correct code, but the model has seen fewer examples of that specific pattern described in Japanese, so it may make different (sometimes worse) architectural choices.

This is a temporary problem. Training datasets are becoming more multilingual with every generation. And the requirements document itself is a structured, technical artifact, closer to a specification than to prose literature. The more structured and precise the document, the less the natural language matters. A well-organized requirements document with clear data models, explicit workflows, and unambiguous acceptance criteria will produce good results regardless of whether the headings are in English, Japanese, or Spanish.

For now, English remains the lingua franca of AI-assisted development, just as it has been the lingua franca of programming itself. If you're writing requirements in another language and getting inconsistent results, consider writing the technical sections (data models, API specifications, architecture decisions) in English, even if the feature descriptions and user stories are in your native language. This hybrid approach combines your domain expertise expressed naturally with technical precision in the language the model knows best.

The Compilation Analogy

Consider how software compilation works today. An engineer writes source code in a high-level language. They click Build. A compiler transforms their code into machine code, a lower-level representation that the machine can execute. The engineer never opens the compiled binary in a hex editor to make changes. If there's a bug, they fix the source code and recompile. If they want a new feature, they write more source code and recompile. The compiled output is ephemeral: regenerated on every build, never manually modified, treated as a disposable artifact of the build process.

Now replace "source code" with "requirements document," "compiler" with "AI model," and "machine code" with "application source code." The workflow is nearly identical:

Traditional Compilation                              | Ephemeral App Generation
-----------------------------------------------------|-----------------------------------------------------------
Engineer writes source code                          | Designer/engineer writes requirements
Clicks "Build" / "Compile"                           | Clicks "Generate" / hands to AI
Compiler transforms source to machine code           | AI transforms requirements to application code
Tests the compiled executable                        | Tests the generated application
Finds a bug → edits source, recompiles               | Finds an issue → edits requirements, regenerates
Wants a new feature → writes more source, recompiles | Wants a new screen → describes it in requirements, regenerates
Never edits the compiled binary                      | Never edits the generated code
Build time: seconds to minutes                       | Build time: minutes (soon: seconds)

The parallel holds because it is the same pattern at a different level of abstraction. In both cases, a human works in a high-level representation (source code or requirements), a tool transforms it into a lower-level representation (machine code or application code), and the lower-level representation is treated as disposable output. The discipline is the same: invest your time in the input, trust the build process, and fix the input when the output is wrong.

The only real difference today is speed. Compilation takes seconds. Generation takes minutes. But that gap is closing fast.

The analogy breaks down in one important way: compilation is deterministic; generation is not. If I take the same requirements document and generate an application twice, I get two different codebases. Different variable names, different component structures, sometimes different libraries. Both implement the same requirements. Both work. But they are not the same code.

Why LLMs Produce Different Code Each Time

This non-determinism is a fundamental property of how large language models work. An LLM generates code one token at a time, with each token selected probabilistically from a distribution of likely next tokens, rather than looking up the correct code the way a compiler looks up the correct machine instruction. The model assigns probabilities to thousands of possible continuations at each step, and a sampling process (controlled by parameters like temperature) introduces randomness into which token is actually chosen.

When the model is deciding what to name a variable, there might be a dozen reasonable options: boardList, boards, allBoards, boardData. Each has a similar probability. The one that gets selected depends on the random seed at that moment. And that single early choice cascades: if the variable is called boardList instead of boards, every subsequent reference to it throughout the codebase will differ. Multiply this by thousands of naming decisions, structural choices, and library preferences, and you get a codebase that's functionally equivalent but structurally distinct on every generation.
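A toy sampler makes that cascade concrete. The logits below are invented for illustration (real models score tens of thousands of tokens), but the mechanics are the same: softmax over scores, a weighted random draw, and a temperature near zero collapsing the draw to the single most likely token.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Scale scores by temperature, then normalize into probabilities.
    scaled = [score / temperature for score in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Invented logits for four equally reasonable variable names.
candidates = ["boardList", "boards", "allBoards", "boardData"]
logits = [2.1, 2.0, 1.9, 1.7]

def sample_name(seed, temperature=1.0):
    # random.choices performs the weighted draw an LLM sampler does per token.
    rng = random.Random(seed)
    return rng.choices(candidates, weights=softmax(logits, temperature), k=1)[0]

# Different seeds pick different names; a near-zero temperature always
# collapses to the single highest-scoring candidate.
picks = {sample_name(seed) for seed in range(40)}
greedy = sample_name(0, temperature=1e-6)
```

Because the four probabilities are nearly equal, different runs genuinely land on different names, and every downstream reference inherits that choice.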

Even with temperature set to zero (fully deterministic sampling), different runs can produce different outputs due to floating-point non-determinism in GPU computations and differences in how the model's attention layers process the context. In practice, no two generations of a non-trivial application will be identical.

This is strange if you come from a traditional engineering mindset. Compile the same C file twice with the same compiler and flags and you get identical binaries. Generate the same app twice and you get two different apps that do the same thing. It's as if your compiler produced functionally equivalent but structurally distinct machine code on every run.

In practice, this matters less than you'd expect. If the code is truly ephemeral (never edited directly, always regenerated from requirements) then the specific variable names and component boundaries are irrelevant. What matters is that the generated application satisfies the requirements and passes the tests. The code is an implementation detail of the build process, and like all implementation details, you should avoid depending on its specifics.

It does mean that diffing between generations is sometimes meaningless. You can't always look at a git diff between Tuesday's generation and Wednesday's generation and understand what changed, because the AI might have restructured half the codebase for no functional reason. The meaningful diff is between the requirements documents, not the code.

The Consistency Problem

A practical tension exists here. While the code itself can vary freely between generations, certain aspects of the application must remain stable unless the requirements change. Users expect branding consistency: the logo in the same place, the same color scheme, the same navigation structure. If you regenerate the app to add a new report page and the sidebar navigation moves from left to right, or the primary button color shifts from blue to green, that is a regression even though nothing in the requirements changed.

Today's models do not guarantee this kind of visual and structural consistency between generations. A future improvement to the ephemeral model would be deterministic anchoring of certain application properties (layout conventions, branding elements, navigation patterns, component styling) so that these remain stable across regenerations unless explicitly changed in the requirements. This amounts to a design system that the AI must respect, baked into the requirements as constraints rather than suggestions. The problem is solvable, and solving it would make the ephemeral model far more practical for applications with real users.
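One way to express that anchoring is a small set of design tokens in the requirements, checked mechanically after every regeneration. The token names and values here are invented for illustration:

```python
# Hypothetical design tokens: branding constraints the generator must
# satisfy on every regeneration. Names and values are invented.
DESIGN_TOKENS = {
    "brand.primary": "#2563eb",
    "nav.position": "left",
    "font.family": "Inter",
}

def check_consistency(generated, tokens=DESIGN_TOKENS):
    # Return a human-readable violation for every token the build drifted from.
    return [
        f"{key}: expected {want!r}, got {generated.get(key)!r}"
        for key, want in tokens.items()
        if generated.get(key) != want
    ]

stable_build = {"brand.primary": "#2563eb", "nav.position": "left", "font.family": "Inter"}
drifted_build = {"brand.primary": "#16a34a", "nav.position": "left", "font.family": "Inter"}
```

`check_consistency(stable_build)` comes back empty; the drifted build reports that the primary color changed, which could fail the build before users ever see green buttons.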

Even with that caveat, the workflow pattern is converging with traditional compilation: the layer of abstraction that engineers primarily work in is moving up from code to requirements.

The Timeline

Right now, regenerating a full-stack application takes 15-20 minutes with Claude Opus 4.6. That is fast enough to be practical (I regenerate my apps today) but still too slow to feel like compilation. You can't iterate at the speed of thought when each cycle takes 20 minutes.

Consider the trajectory:

  • 2024: Generating a working app from a prompt was unreliable. It usually needed significant manual fixes. You couldn't walk away.
  • 2025: Generating a working app from a detailed requirements document takes 15-20 minutes and usually works on the first try. You can walk away.
  • 2026-2027: Build times will compress as models get faster and inference costs continue dropping. Speculative: full app regeneration in 2-5 minutes.
  • Beyond: Regeneration in the time it takes to refill your coffee mug. Seconds, not minutes.

When regeneration takes seconds, the workflow changes fundamentally. Product owners and designers will work directly in requirements documents. They will describe a new screen, click Build, and see it running. The feedback loop between intent and result will be almost instantaneous. The distinction between "designing" an app and "building" an app will collapse.

What Happens to Software Engineers?

Software engineers will still matter, but the job description shifts.

The engineers who thrive will be the ones who are also excellent writers of requirements. Beyond writing a user story, this means specifying complex systems: data models, state machines, error handling, security boundaries, performance constraints, accessibility standards, edge cases. The ability to think precisely about what software should do, and articulate it completely enough that an AI can build it, becomes the core skill.

Engineers also remain essential for:

  • Architecture decisions that requirements documents can't fully capture: choosing between event-driven and request-response, deciding on consistency models, evaluating infrastructure tradeoffs
  • Debugging generated code when the AI produces something subtly wrong. Understanding why the code doesn't match the intent requires deep engineering knowledge.
  • Performance optimization when the generated code is correct but slow: profiling, identifying bottlenecks, and specifying performance requirements precisely enough that the next generation avoids the same bottlenecks
  • Security review of generated code. AI models can and do produce code with security vulnerabilities, and catching these requires the same expertise it always has.
  • Infrastructure and deployment, including Terraform configurations, CI/CD pipelines, monitoring, and alerting. The AI can generate these too (and does, in my workflow), but someone needs to understand what is being provisioned and why.

The engineers who struggle will be the ones whose primary value is translating requirements into code, because that is exactly the task being automated. If your job is taking a Jira ticket and writing the React component it describes, the timeline for that job is measured in years, not decades.

The Implications

If applications become ephemeral (regenerated from requirements on demand) several things change:

Technical debt disappears as a concept. Every build is a clean generation with no accumulated cruft. No legacy code to maintain. No "we should refactor this someday" conversations. The requirements document either describes what you want or it needs updating. If it needs updating, you update it and regenerate.

Framework lock-in weakens. Your requirements document has no coupling to React, or Flask, or PostgreSQL. Today you generate a React app. Tomorrow you might generate a SwiftUI app from the same requirements. The day after, maybe something that doesn't exist yet. The requirements are portable in a way that code never is.

Every new AI model is a free upgrade. When a new model is released (faster, more capable, better at code generation) you simply regenerate your application using the new model. The output will be cleaner code, better-performing functions, more idiomatic patterns, and more thorough test coverage. Every improvement in AI code generation flows directly into your application the next time you rebuild it. In traditional development, benefiting from better tooling requires a conscious refactoring effort. In ephemeral development, you get the improvements simply by regenerating. Your requirements stay the same; the quality of the generated output ratchets upward with each model generation.

The "rewrite vs. refactor" debate ends. You always rewrite. Every generation is a rewrite. The cost of a rewrite drops to nearly zero, so the question of whether to invest in incremental improvement versus starting fresh answers itself.

Onboarding new team members gets easier. Reading a well-written requirements document takes a fraction of the time needed to read a complex codebase. When the requirements are the source of truth, a new team member can understand the entire system by reading a document, not by spelunking through thousands of lines of code across dozens of files.

Testing strategy changes. You stop writing unit tests for implementation details (those change every generation) and focus entirely on integration and end-to-end tests that validate the requirements. The test suite becomes a machine-readable version of the requirements document, which is what tests should have been all along.
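As a sketch of what that looks like, here is a hypothetical requirement ("deleting a board archives it rather than destroying it") validated at the behavior level. `BoardStore` is a stand-in for whatever the model generates this week; the assertions touch only the requirement, so any regeneration with different internals must still pass them unchanged.

```python
# Hypothetical requirement: "Deleting a board archives it; archived boards
# are excluded from the default listing but remain recoverable."
# BoardStore stands in for whatever implementation the model generates.
class BoardStore:
    def __init__(self):
        self._boards = {}  # id -> {"name": str, "archived": bool}

    def create(self, board_id, name):
        self._boards[board_id] = {"name": name, "archived": False}

    def delete(self, board_id):
        self._boards[board_id]["archived"] = True  # archive, never destroy

    def restore(self, board_id):
        self._boards[board_id]["archived"] = False

    def list_active(self):
        return [b["name"] for b in self._boards.values() if not b["archived"]]

# These checks validate the requirement, not the implementation details.
store = BoardStore()
store.create(1, "Roadmap")
store.delete(1)
assert store.list_active() == []
store.restore(1)
assert store.list_active() == ["Roadmap"]
```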

The translation chasm disappears. In certain domains (financial modeling, actuarial science, quantitative research, scientific simulations) there has always been a painful gap between the domain expert who understands the math and the software engineer who implements it. A financial analyst specifies a Monte Carlo simulation with stochastic volatility models, mean-reversion parameters, and correlation matrices. A software engineer translates that specification into Python or C++. Every step of that translation is an opportunity for misinterpretation. The engineer doesn't fully understand the finance. The analyst doesn't fully understand the code. Bugs hide in the gap between them, and they are the worst kind of bugs: the code runs fine but produces subtly wrong numbers.

Ephemeral generation eliminates the middleman. The analyst who understands the Black-Scholes variations, the Greeks, the term structure models: that person writes the requirements. They describe the formulas, the edge cases, the numerical precision requirements, the validation checks. The AI generates the implementation. The analyst can verify the output against known results without ever reading a line of code. If the numbers don't match, they refine the requirements and regenerate. The domain expert becomes the developer, not because they learned to code, but because the barrier between domain knowledge and working software has been removed.
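A minimal sketch of that verification step, deliberately simplified to constant volatility: price a call option by Monte Carlo and check it against the closed-form Black-Scholes value the analyst already trusts. Parameter values are illustrative.

```python
import math
import random

def bs_call(spot, strike, rate, vol, t):
    # Closed-form Black-Scholes price: the known result the analyst trusts.
    norm_cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * t) * norm_cdf(d2)

def mc_call(spot, strike, rate, vol, t, paths=200_000, seed=42):
    # Monte Carlo under geometric Brownian motion (constant volatility,
    # kept deliberately simple for this sketch).
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol**2) * t
    shock = vol * math.sqrt(t)
    total = 0.0
    for _ in range(paths):
        terminal = spot * math.exp(drift + shock * rng.gauss(0.0, 1.0))
        total += max(terminal - strike, 0.0)
    return math.exp(-rate * t) * total / paths

# Illustrative parameters: at-the-money call, 5% rate, 20% vol, 1 year.
analytic = bs_call(100, 100, 0.05, 0.2, 1.0)   # ~10.45
simulated = mc_call(100, 100, 0.05, 0.2, 1.0)
```

If `simulated` drifts outside the Monte Carlo error bars around `analytic`, the analyst refines the requirements and regenerates, never opening the implementation.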

This applies anywhere complex domain knowledge gets lost in translation to code: bioinformatics pipelines, structural engineering simulations, pharmacokinetic models, energy grid optimization. The people who understand the problem best are finally the ones who can build the solution directly.

Jupyter Notebooks Were a Preview

A predecessor to this model has been hiding in plain sight: Jupyter notebooks.

Data scientists and researchers have been working this way for years. A Jupyter notebook interleaves prose descriptions (explaining the methodology, the assumptions, the mathematical reasoning) with executable code cells that implement each step. The notebook is the requirements document and the implementation simultaneously. You read the markdown cell that says "Apply a 30-day rolling average to smooth the signal" and immediately below it is the code that does exactly that.
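That markdown cell's intent fits in a code cell of a few lines, which is exactly why the prose, not the code, is the durable part:

```python
def rolling_average(values, window=30):
    # "Apply a 30-day rolling average to smooth the signal" -- the markdown
    # cell's intent, implemented the way its code cell would be. Early points
    # average over the partial window rather than being dropped.
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

Delete this cell and any competent regeneration of it, human or AI, will differ in names and structure yet satisfy the same sentence above it.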

Jupyter notebooks are essentially ephemeral development at the cell level. Researchers routinely delete a code cell, rewrite the description of what they want, and regenerate the implementation, sometimes by hand, increasingly with AI assistance. The notebook is versioned and shared. The code cells are treated as somewhat disposable; the markdown cells explaining the intent are the durable part.

The ephemeral app model extends this pattern from individual code cells to entire applications. Instead of interleaving requirements and code in a single notebook, you separate them entirely: the requirements document is one artifact, the generated application is another. The philosophy is the same: the description of what you want is the primary artifact, and the code that implements it is secondary, regenerable, ephemeral.

If you've ever worked in a Jupyter notebook and found yourself spending more time on the markdown cells than the code cells (making sure the reasoning is clear, the methodology is documented, the assumptions are explicit) you've already been practicing ephemeral development. You just didn't have a name for it yet.

The Defect-Free Prerequisite

This entire premise depends on code generation being defect-free.

Suppose a button is clicked and nothing happens when it was supposed to show a popup. The requirements were correct; the AI simply failed to implement them. You would have to either debug the generated code (which defeats the purpose of treating it as ephemeral) or regenerate and hope the next attempt gets it right (which is unreliable if the model has a blind spot).

The ephemeral app model only works when the AI can reliably translate requirements into working code and verify that it works. The AI must be able to test and verify everything in the application, from backend API responses to frontend user interactions. The bar goes beyond "does the code compile" to "does clicking this button actually show the popup, with the right content, in the right position, dismissable by the right actions."

Today, we are partially there. Claude writes backend tests that cover API endpoints and data integrity. It writes Playwright E2E tests that cover navigation, form submission, and basic user workflows. But the test coverage is far from exhaustive. In my Harvest clone, Claude wrote 32 tests. A thorough QA engineer would have written 200. The gaps between tested and untested behavior are where defects hide, and those defects break the ephemeral model because they require manual intervention to diagnose and fix.

The full vision (where you never touch the generated code, where every regeneration produces a working application) requires AI that can test every interaction, every edge case, every error state, every visual layout, every accessibility requirement, and every performance target. It requires AI that can look at a rendered screen and judge whether it matches the design intent. It requires AI that can simulate real user behavior and catch the bugs that only appear when you click things in the wrong order.

A year ago, this was unimaginable. Today, it is inevitable. The trajectory of AI-assisted testing (visual regression testing, AI-driven E2E test generation, model-based testing that explores state spaces automatically) is converging with AI code generation. When those two capabilities merge completely, the ephemeral app model stops being aspirational and becomes the default way software is built.
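Model-based testing is the piece of that trajectory easiest to sketch: describe the UI as a toy state machine (states and transitions invented here for illustration), explore every reachable state, and check an invariant in each, instead of scripting one happy path by hand.

```python
from collections import deque

# A toy model of UI states and transitions (invented for illustration).
TRANSITIONS = {
    "list": {"open_card": "card", "open_palette": "palette"},
    "card": {"close": "list", "edit": "editing"},
    "editing": {"save": "card", "cancel": "card"},
    "palette": {"dismiss": "list"},
}

def explore(start="list"):
    # Breadth-first walk over every state reachable from `start`.
    seen, frontier = {start}, deque([start])
    while frontier:
        state = frontier.popleft()
        # Invariant: no dead ends -- every state offers at least one action.
        assert TRANSITIONS[state], f"dead-end state: {state}"
        for target in TRANSITIONS[state].values():
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return seen

reachable = explore()
```

The same exploration catches "click things in the wrong order" bugs automatically, because every ordering of transitions is just another path through the graph.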

We're Not There Yet

Ephemeral app generation works today for a specific category of software:

  • Single-user tools with straightforward data models (my Single Serving Applications)
  • CRUD applications where the business logic is well-understood
  • Greenfield projects where there's no existing data or integrations to preserve

It does not yet work well for:

  • Large, complex systems with hundreds of screens and intricate business rules
  • Systems with critical state where data migration between generations is non-trivial
  • Real-time systems with demanding performance requirements
  • Systems that integrate with many external services where the integration points are fragile

The gap will close. Models will get better at maintaining consistency across large codebases. Data migration between generations will become a solved problem (or databases will be generated alongside the code, with schema continuity enforced by the requirements). But today, ephemeral generation is practical for small-to-medium applications. For enterprise systems, we are still in the "traditional compilation" era, editing source code directly and rebuilding incrementally.

The Bottom Line

The most important artifact in software development is now the requirements document. The code is ephemeral: generated, tested, deployed, and regenerated when the requirements change. The requirements are durable: versioned, iterated, and maintained as the single source of truth.

If you're a software engineer, the most valuable skill you can develop right now is learning to write requirements so precise and complete that an AI can build the entire application from them without asking a single clarifying question.

If you're a product owner or designer, the tools that will matter most in the next few years are whatever tools emerge for writing, managing, and versioning requirements documents that serve as build inputs for AI code generation.

The era of ephemeral apps is almost here. The engineers and organizations that adapt first, shifting their investment from code to requirements, from maintenance to regeneration, from frameworks to specifications, will have a significant advantage.

The requirements are the product. Everything else is a build artifact.

Let's Build Something!

I help teams ship cloud infrastructure that actually works at scale. Whether you're modernizing a legacy platform, designing a multi-region architecture from scratch, or figuring out how AI fits into your engineering workflow, I've seen your problem before. Let me help.

Currently taking on select consulting engagements through Vantalect.