Background
Replit's company mission is to empower software creators. Historically, this has meant building an IDE with beginner-friendly abstractions and escape hatches to more powerful tooling. However, given the rapidly increasing power of LLMs in early 2024, our CEO Amjad felt it was time to take the leap toward our ultimate product vision: a tool for software creation that does not require an understanding of coding principles.
I was asked to lead design for this product. The Replit Agent lets users build software purely with natural language – it takes care of writing all the code, setting up the environment, and working collaboratively with the user toward building the intended app. It is worth noting that the Agent's ability to do all this is largely due to Replit's existing primitives for easy developer environment management, which also enable the user to take control when the Agent is done.
Workflow
The Agent workflow is fairly straightforward. After being given an initial description of an app, the Agent first generates a plan for the app it intends to build. If the user approves this plan, the Agent enters a thinking and execution loop in which it reads the dev environment, evaluates the best next course of action, and performs that action – generating code, making a change to the environment, or requesting further user input. Along the way, the Agent proactively checks its work with the user – as with the rest of the Replit platform, collaboration is built in.
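To make the shape of that loop concrete, here is a minimal Python sketch of the plan, approve, then read/evaluate/act cycle. It is not Replit's implementation: every name in it (run_agent, Action, ActionKind, Session, and the callback parameters) is hypothetical, and the LLM, the environment, and the user are all stood in for by plain callables.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Optional


class ActionKind(Enum):
    """The three kinds of step named in the workflow, plus a stop signal."""
    WRITE_CODE = auto()
    CHANGE_ENVIRONMENT = auto()
    ASK_USER = auto()
    DONE = auto()  # assumption: the evaluator decides when to stop


@dataclass
class Action:
    kind: ActionKind
    detail: str = ""


@dataclass
class Session:
    plan: list[str]
    log: list[str] = field(default_factory=list)


def run_agent(
    description: str,
    make_plan: Callable[[str], list[str]],
    approve: Callable[[list[str]], bool],
    read_environment: Callable[[], str],
    choose_action: Callable[[list[str], str], Action],
    perform: Callable[[Action], None],
    max_steps: int = 50,
) -> Optional[Session]:
    """Plan first, then loop: read the environment, pick an action, do it."""
    plan = make_plan(description)
    if not approve(plan):
        return None  # the user rejected the plan, so nothing is executed

    session = Session(plan=plan)
    for _ in range(max_steps):  # hard cap so the sketch always terminates
        state = read_environment()
        action = choose_action(plan, state)
        if action.kind is ActionKind.DONE:
            break
        perform(action)
        session.log.append(f"{action.kind.name}: {action.detail}")
    return session
```

In this sketch, asking the user for input is just another action kind, which is one simple way to model an agent that checks its work with the user inside the same loop rather than as a separate phase.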
Abilities
In designing the Agent, I was given a green field to determine all of the abilities and mechanics it needed to do its job successfully. Although this list of abilities will only grow with time, I am quite pleased with all that we developed and shipped, including:
- Reading, creating, and updating files
- Installing packages and OS-level modules
- Running shell commands and binding them to the Workspace run button
- Reviewing runtime output (both text-based console output and UIs)
- Evaluating UI output via screenshotting and vision
- Determining if the user is better suited to evaluate an output and asking them to do so
- Periodic checkpointing via git commits
- Reverting to previous checkpoints mid-session and changing course
- Updating the top-level plan due to a course correction
- Presenting results to the user (idea credit to Szymon Kaliski)
- Requesting environment variables from the user without adding them into context (which would be unsafe)
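The last ability in the list can be illustrated with a small Python sketch: secret values go straight from the user into the process environment, and only their names are ever turned into text the model could see. This is an assumption about one way to structure it, not a description of how the Agent works internally; SecretStore and its methods are hypothetical names.

```python
from __future__ import annotations

import os
from typing import Callable


class SecretStore:
    """Holds user-provided values and exposes only their names to the model."""

    def __init__(self, prompt_user: Callable[[str], str]) -> None:
        # prompt_user is a hypothetical callback; in a terminal it could be
        # getpass.getpass, in a UI it would be a form field.
        self._prompt_user = prompt_user
        self._values: dict[str, str] = {}

    def request(self, name: str) -> str:
        """Ask the user for a value and return a context-safe confirmation."""
        self._values[name] = self._prompt_user(name)
        return f"Secret {name} is set."  # the value itself is never echoed

    def context_summary(self) -> list[str]:
        """What the model is allowed to know: names only."""
        return sorted(self._values)

    def environment(self) -> dict[str, str]:
        """Environment for child processes: os.environ plus the secrets."""
        return {**os.environ, **self._values}
```

A command run on the user's behalf would then receive the secrets through something like subprocess.run(cmd, env=store.environment()), while the prompt only ever includes the output of context_summary().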