Runloop is the batteries-included platform for building and optimizing AI-driven software engineering agents.

With the Runloop platform, you get:

  • Fast, isolated, snapshottable virtual machines for executing agents & agent tools (Devboxes).
  • Team-shareable templates for launching new devboxes with custom configuration (Blueprints), and suspend-and-resume support (Snapshots), both sketched in the example below.
  • Zero-configuration code repository integration (Code Mounts) and a fully-featured, ready-to-use language server (Code Understanding APIs).
  • Turnkey benchmarking and evaluation services for fine-tuning your agent’s behavior (Code Scenarios).
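
For example, here is a minimal sketch of the Blueprint and Snapshot flow with the TypeScript SDK. The method and parameter names used here (blueprints.create, blueprint_id, snapshotDisk) are illustrative assumptions; check the API reference for the exact shapes.

import Runloop from '@runloop/api-client';

const runloopClient = new Runloop({
    bearerToken: process.env.RUNLOOP_API_KEY,
});

// Build a reusable Blueprint from a Dockerfile so every new devbox starts preconfigured.
// (Illustrative: method and parameter names may differ; see the API reference.)
const blueprint = await runloopClient.blueprints.create({
    name: 'python-agent-base',
    dockerfile: 'FROM python:3.12-slim\nRUN pip install pytest',
});

// Launch a devbox from the Blueprint, then snapshot its disk so the state can be resumed later.
const devbox = await runloopClient.devboxes.createAndAwaitRunning({ blueprint_id: blueprint.id });
const snapshot = await runloopClient.devboxes.snapshotDisk(devbox.id);
console.log(`Snapshot ${snapshot.id} can be used to launch new devboxes with this disk state.`);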

Whether you are trying to build an AI agent that can respond to pull requests, or an AI agent that can generate new UI components, Runloop makes it possible to get from zero to POC in just a few lines of code.

Why Runloop?

Our mission at Runloop is to keep you focused on the things that differentiate your AI agent. Leave the building blocks to us and spend your time on what actually matters.

As your agent evolves, your needs will evolve too. Runloop is designed for builders at every stage:

Stage | Why Runloop
Prototyping
  • Zero infrastructure worries using managed, instant-on devboxes.
  • Build, deploy, learn, and iterate quickly.
Production
  • Team-shared blueprints and projects.
  • 24/7/365 managed platform and on-call team.
  • SOC 2 compliant.
Growth
  • Benchmarking and evaluation stack to monitor and fine-tune your agent’s performance.

Use cases

Our customers are already leveraging Runloop to build AI agents that can:

  • Respond to Pull Requests and enhance the code review process
  • Enable users to chat with and navigate their codebase
  • Generate new test cases for existing codebases
  • Act as pair programmers
  • Generate new UI components for their frontend

Have a use case that we didn’t cover? Send us an email at support@runloop.ai to learn more about how Runloop can help you build AI agents.

Core Components of Runloop

Devboxes

Devboxes are isolated, cloud-based development environments that can be controlled by AI agents via the Runloop API. You can give an agent access to a devbox so it can run and test code in a safe, isolated environment.

import Runloop from '@runloop/api-client';
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const runloopClient = new Runloop({
    bearerToken: process.env.RUNLOOP_API_KEY,
});

// Create an isolated Devbox for the agent to use
const devbox = await runloopClient.devboxes.createAndAwaitRunning();
// Get the Runloop tool representations for the Devbox and convert them to Vercel AI tools
const runloopDevboxShellTools = runloopClient.devboxes.tools.shellTools(devbox.id);
const runloopDevboxFileTools = runloopClient.devboxes.tools.fileTools(devbox.id);

// Use the Vercel AI SDK to create a simple agent that uses the Devbox to code a game
const { text: answer } = await generateText({
    model: openai('gpt-4o-2024-08-06'),
    tools: {
        ...runloopDevboxShellTools,
        ...runloopDevboxFileTools
    },
    maxSteps: 10,
    system:
        'You are an expert python coder that specializes in making CLI games.',
    prompt:
        'Create a CLI game that is a guessing game where the user has to guess a number between 1 and 100. Write the python script in the file `game.py`. The program should be callable from the command line via `python game.py`. Once you have generated the program, run it and print the output to stdout.'
});

console.log(`ANSWER: ${answer}`);
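
When the agent finishes, it’s good practice to clean up the Devbox so you aren’t paying for idle compute. A minimal sketch, assuming a devboxes.shutdown method as named here (snapshotting first is also an option if you want to resume the environment later):

// Tear down the Devbox once the agent run is complete.
await runloopClient.devboxes.shutdown(devbox.id);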

Code Understanding APIs

Code Understanding APIs are currently in beta. Please contact us at support@runloop.ai to get access.

A critical part of making AI SWE agents work reliably is giving them the right context to solve the problem. In many cases, this means extracting context from the existing codebase: pulling function signatures, finding tests that cover a specific segment of code, or understanding which files are often edited together. For example, one common heuristic that helps AI agents navigate codebases is the Repository Map used by Aider. Writing your own Repository Map-style heuristic can be difficult because it requires static analysis of the codebase, and other heuristics are harder still, relying on information gathered from the runtime dataflow of the codebase. The Code Understanding APIs aim to make it possible to create these kinds of heuristics in just a few lines of code.

import Runloop from '@runloop/api-client';

const runloopClient = new Runloop({
    bearerToken: process.env.RUNLOOP_API_KEY,
});

// First, create a repository connection so Runloop can pre-index the codebase we will run on:
const repositoryConnectionView = await runloopClient.repositories.createAndAwaitIndexing({
    name: 'repository-name',
    owner: 'repository-owner',
});

// Now let's use the repository connection to recreate the Repository Map heuristic:
// 1. List all the code files in the repository
const codeFiles = await runloopClient.repositories.codeFiles(repositoryConnectionView.id);
// 2. Use the file viewer and its query syntax to view only the method signatures in each file
const filesWithMethodSignaturesOnly = await runloopClient.repositories.fileViewer(repositoryConnectionView.id, {
    files: codeFiles,
    // Query against the AST so we only get class and method nodes, limited to signatures and comments
    query: 'class(signatures, comments) || method(signatures, comments)'
});

// Or we can simply use the built-in Repository Map heuristic:
const repositoryMap = await runloopClient.repositories.repositoryMap(repositoryConnectionView.id);
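
The results of these calls are plain data that you can drop straight into your agent’s context. For example, continuing the Vercel AI SDK pattern from the Devboxes example above (illustrative glue code; the exact structure returned by repositoryMap may differ, so we simply serialize it):

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Give the model the repository map as context before asking a codebase-level question.
const { text: plan } = await generateText({
    model: openai('gpt-4o-2024-08-06'),
    system: `You are an expert software engineer. Here is a map of the repository:\n${JSON.stringify(repositoryMap, null, 2)}`,
    prompt: 'Which files would you need to change to add request retries to the HTTP client, and why?',
});
console.log(plan);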

Code Scenarios

Code Scenarios APIs are currently in beta. Please contact us at support@runloop.ai to get access.

Tuning the behavior of your AI agent is a critical part of making it work reliably, yet going from POC to production is where most AI agents fail. Code Scenarios is a set of benchmarking and evaluation tools that help you understand and improve your AI agent’s behavior in a methodical way. For example, with Code Scenarios:

  • You can run your agent against well-known benchmarks such as SWE-bench, or create your own custom benchmarks
  • You can record live production agent traces to monitor agent performance, or use the traces to create new benchmarks
  • You can create custom Reward Models based on production traces and benchmark data to fine-tune your agent’s behavior

import Runloop from '@runloop/api-client';

const runloopClient = new Runloop({
  bearerToken: process.env.RUNLOOP_API_KEY,
});

// 1. Create new Benchmarks or use existing ones like SWE-bench
const myBenchmark = await runloopClient.benchmarks.create({
    benchmark: 'UI Component Generation',
    testCases: [
        {
            name: 'Create a Login Page',
            problemStatement: 'Create a login page that allows users to login to the application. Call the component `LoginPage` and export it from the file `src/components/LoginPage.tsx`.',
            // Configure a specific starting Devbox environment for your Agent to use as part of a benchmark test
            environment: 'DEFAULT_DEVBOX',
            outputContractRules: [
                {
                    type: 'typescript',
                    files: ["expected_snapshot.png"],
                    validate: (output) => {
                        // Validate that the storybook snapshot is updated and roughly matches expected_snapshot.png
                    }
                }
            ]
        }
    ]
});

// 2. Run your agent against the benchmark
const benchmarkRun = await runloopClient.benchmarks.beginTestRun(myBenchmark.id);
// For each test case, run our agent and report the completion against the benchmark run
// (illustrative: the exact report shape may differ in the beta API)
for (const testCase of myBenchmark.testCases) {
    const agentOutput = await myAgent.run({
        prompt: testCase.problemStatement,
        environment: testCase.environment
    });
    await runloopClient.benchmarks.reportTestCaseRun(benchmarkRun.id, {
        testCase: testCase.name,
        output: agentOutput
    });
}