Create a Scenario.
Create a Scenario, a repeatable AI coding evaluation test that defines the starting environment as well as evaluation success criteria.
import Runloop from '@runloop/api-client';

const client = new Runloop({
  bearerToken: process.env['RUNLOOP_API_KEY'], // This is the default and can be omitted
});

async function main() {
  const scenario = await client.scenarios.create({
    input_context: { problem_statement: 'problem_statement' },
    is_public: true,
    name: 'name',
    scoring_contract: {
      scoring_function_parameters: [
        {
          name: 'name',
          scorer: { pattern: 'pattern', search_directory: 'search_directory', type: 'ast_grep_scorer' },
          weight: 1, // weights of all scoring functions should sum to 1.0
        },
      ],
    },
  });

  // The response is a ScenarioView; print the ID of the new Scenario.
  console.log(scenario.id);
}

main();
{
  "id": "<string>",
  "name": "<string>",
  "environment": {
    "blueprint_id": "<string>",
    "snapshot_id": "<string>",
    "prebuilt_id": "<string>",
    "launch_parameters": {
      "launch_commands": [
        "<string>"
      ],
      "resource_size_request": "X_SMALL",
      "keep_alive_time_seconds": 123,
      "available_ports": [
        123
      ],
      "after_idle": {
        "idle_time_seconds": 123,
        "on_idle": "shutdown"
      },
      "custom_cpu_cores": 123,
      "custom_gb_memory": 123
    },
    "working_directory": "<string>"
  },
  "input_context": {
    "problem_statement": "<string>",
    "additional_context": {}
  },
  "scoring_contract": {
    "scoring_function_parameters": [
      {
        "name": "<string>",
        "scorer": {
          "lang": "<string>",
          "search_directory": "<string>",
          "pattern": "<string>",
          "type": "ast_grep_scorer"
        },
        "weight": 123
      }
    ]
  },
  "metadata": {},
  "reference_output": "<string>",
  "is_public": true
}
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
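The header format above can be sketched as a small helper for cases where a request is issued without the SDK. This is a minimal sketch: the function name authHeaders is hypothetical and not part of the Runloop client, and it assumes the standard HTTP Authorization header is the one carrying the bearer token. The SDK client shown earlier takes the token through bearerToken instead.

```typescript
// Hypothetical helper (not part of the Runloop SDK): builds the headers for a
// raw HTTP request using the "Bearer <token>" form described above.
function authHeaders(token: string): Record<string, string> {
  const trimmed = token.trim();
  if (trimmed.length === 0) {
    throw new Error('auth token must not be empty');
  }
  // Assumption: the token travels in the standard Authorization header.
  return { Authorization: `Bearer ${trimmed}` };
}

// 'example-token' is a placeholder, not a real credential.
console.log(authHeaders('example-token').Authorization);
```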
Body
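name: Name of the scenario.
scoring_contract: The scoring contract for the Scenario.
scoring_contract.scoring_function_parameters: A list of scoring functions used to evaluate the Scenario. Each entry is a ScoringFunction, which specifies a method of scoring a Scenario.
scoring_contract.scoring_function_parameters[].name: Name of the scoring function. Names must only contain [a-zA-Z0-9_-].
scoring_contract.scoring_function_parameters[].scorer: The scoring function to use for evaluating this scenario. The type field determines which built-in function to use.
scoring_contract.scoring_function_parameters[].scorer.search_directory: The path to search.
scoring_contract.scoring_function_parameters[].scorer.pattern: AST pattern to match. The pattern is passed to ast-grep on the command line surrounded by double quotes ("), so make sure to use proper escaping (for example, $$$).
scoring_contract.scoring_function_parameters[].scorer.type: ast_grep_scorer
scoring_contract.scoring_function_parameters[].scorer.lang: The language of the pattern.
scoring_contract.scoring_function_parameters[].weight: Weight to apply to the scoring function's score. Weights of all scoring functions should sum to 1.0.
is_public: Whether this scenario is public.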
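The two constraints stated above (names limited to [a-zA-Z0-9_-], weights summing to 1.0) can be checked locally before sending the request. The sketch below is illustrative only: checkScoringFunctions and ScoringFunctionInput are hypothetical names, not SDK exports, and rejecting empty names and using a floating-point tolerance are both assumptions rather than documented server behavior.

```typescript
// Hypothetical client-side check; field names mirror the request body above.
interface ScoringFunctionInput {
  name: string;
  weight: number;
}

// Assumption: an empty name is treated as invalid (the "+" quantifier).
const NAME_PATTERN = /^[a-zA-Z0-9_-]+$/;

// Returns a list of human-readable problems; an empty list means the
// documented name and weight constraints are satisfied.
function checkScoringFunctions(fns: ScoringFunctionInput[]): string[] {
  const problems: string[] = [];
  for (const fn of fns) {
    if (!NAME_PATTERN.test(fn.name)) {
      problems.push(`invalid scoring function name: ${JSON.stringify(fn.name)}`);
    }
  }
  const total = fns.reduce((sum, fn) => sum + fn.weight, 0);
  // Compare with a small tolerance so sums such as 0.1 + 0.2 + 0.7 are
  // accepted despite floating-point rounding.
  if (Math.abs(total - 1.0) > 1e-9) {
    problems.push(`weights sum to ${total}, expected 1.0`);
  }
  return problems;
}

// Example scorer names are placeholders.
console.log(checkScoringFunctions([
  { name: 'uses_async', weight: 0.6 },
  { name: 'no-console', weight: 0.4 },
]));
```

Returning a list of problems rather than throwing on the first one lets a caller report every issue in a contract at once.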
environment: The Environment in which the Scenario will run.
environment.blueprint_id: Use the blueprint with matching ID.
environment.snapshot_id: Use the snapshot with matching ID.
environment.prebuilt_id: Use the prebuilt with matching ID.
environment.launch_parameters: Optional launch parameters to apply to the devbox environment at launch.
environment.launch_parameters.launch_commands: Set of commands to be run at launch time, before the entrypoint process is run.
environment.launch_parameters.resource_size_request: Manual resource configuration for Devbox. If not set, defaults will be used. Available options: X_SMALL, SMALL, MEDIUM, LARGE, X_LARGE, XX_LARGE, CUSTOM_SIZE.
environment.launch_parameters.keep_alive_time_seconds: Time in seconds after which Devbox will automatically shut down. Default is 1 hour.
environment.launch_parameters.available_ports: A list of ports to make available on the Devbox. Only ports made available will be surfaced to create tunnels via the 'createTunnel' API.
environment.launch_parameters.after_idle: Configure Devbox lifecycle based on idle activity. If after_idle is set, Devbox will ignore keep_alive_time_seconds.
environment.launch_parameters.custom_cpu_cores: Custom resource size: number of CPU cores. Must be a multiple of 2.
environment.launch_parameters.custom_gb_memory: Custom memory size in Gi. Must be a multiple of 2.
environment.working_directory: The working directory where the agent is expected to fulfill the scenario. Scoring functions also run from the working directory.
metadata: User-defined metadata to attach to the scenario for organization.
reference_output: A string representation of the reference output that solves the scenario. Commonly the result of a git diff or a sequence of command actions to apply to the environment.
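Two of the launch parameter rules above interact in ways that are easy to miss: custom sizes must be multiples of 2, and keep_alive_time_seconds is ignored once after_idle is set. The following sketch flags both cases before a request is made. The names LaunchParameters, AfterIdle, and checkLaunchParameters are hypothetical, and requiring the custom sizes to be positive integers is an assumption beyond what the field descriptions state.

```typescript
// Hypothetical shapes mirroring the launch_parameters fields described above.
interface AfterIdle {
  idle_time_seconds: number;
  on_idle: string;
}

interface LaunchParameters {
  keep_alive_time_seconds?: number;
  after_idle?: AfterIdle;
  custom_cpu_cores?: number;
  custom_gb_memory?: number;
}

// Assumption: sizes must also be positive integers, not only multiples of 2.
function isPositiveMultipleOfTwo(n: number): boolean {
  return Number.isInteger(n) && n > 0 && n % 2 === 0;
}

// Returns notes about values that conflict with the documented rules.
function checkLaunchParameters(p: LaunchParameters): string[] {
  const notes: string[] = [];
  if (p.custom_cpu_cores !== undefined && !isPositiveMultipleOfTwo(p.custom_cpu_cores)) {
    notes.push('custom_cpu_cores must be a multiple of 2');
  }
  if (p.custom_gb_memory !== undefined && !isPositiveMultipleOfTwo(p.custom_gb_memory)) {
    notes.push('custom_gb_memory must be a multiple of 2');
  }
  if (p.after_idle !== undefined && p.keep_alive_time_seconds !== undefined) {
    notes.push('keep_alive_time_seconds is ignored because after_idle is set');
  }
  return notes;
}

console.log(checkLaunchParameters({
  custom_cpu_cores: 3,
  keep_alive_time_seconds: 3600,
  after_idle: { idle_time_seconds: 600, on_idle: 'shutdown' },
}));
```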
Response
A ScenarioView represents a repeatable AI coding evaluation test, complete with initial environment and scoring contract.
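id: The ID of the Scenario.
name: The name of the Scenario.
scoring_contract: The scoring contract for the Scenario.
scoring_contract.scoring_function_parameters: A list of scoring functions used to evaluate the Scenario. Each entry is a ScoringFunction, which specifies a method of scoring a Scenario.
scoring_contract.scoring_function_parameters[].name: Name of the scoring function. Names must only contain [a-zA-Z0-9_-].
scoring_contract.scoring_function_parameters[].scorer: The scoring function to use for evaluating this scenario. The type field determines which built-in function to use.
scoring_contract.scoring_function_parameters[].scorer.search_directory: The path to search.
scoring_contract.scoring_function_parameters[].scorer.pattern: AST pattern to match. The pattern is passed to ast-grep on the command line surrounded by double quotes ("), so make sure to use proper escaping (for example, $$$).
scoring_contract.scoring_function_parameters[].scorer.type: ast_grep_scorer
scoring_contract.scoring_function_parameters[].scorer.lang: The language of the pattern.
scoring_contract.scoring_function_parameters[].weight: Weight to apply to the scoring function's score. Weights of all scoring functions should sum to 1.0.
metadata: User-defined metadata to attach to the scenario for organization.
environment: The Environment in which the Scenario is run.
environment.blueprint_id: Use the blueprint with matching ID.
environment.snapshot_id: Use the snapshot with matching ID.
environment.prebuilt_id: Use the prebuilt with matching ID.
environment.launch_parameters: Optional launch parameters to apply to the devbox environment at launch.
environment.launch_parameters.launch_commands: Set of commands to be run at launch time, before the entrypoint process is run.
environment.launch_parameters.resource_size_request: Manual resource configuration for Devbox. If not set, defaults will be used. Available options: X_SMALL, SMALL, MEDIUM, LARGE, X_LARGE, XX_LARGE, CUSTOM_SIZE.
environment.launch_parameters.keep_alive_time_seconds: Time in seconds after which Devbox will automatically shut down. Default is 1 hour.
environment.launch_parameters.available_ports: A list of ports to make available on the Devbox. Only ports made available will be surfaced to create tunnels via the 'createTunnel' API.
environment.launch_parameters.after_idle: Configure Devbox lifecycle based on idle activity. If after_idle is set, Devbox will ignore keep_alive_time_seconds.
environment.launch_parameters.custom_cpu_cores: Custom resource size: number of CPU cores. Must be a multiple of 2.
environment.launch_parameters.custom_gb_memory: Custom memory size in Gi. Must be a multiple of 2.
environment.working_directory: The working directory where the agent is expected to fulfill the scenario. Scoring functions also run from the working directory.
reference_output: A string representation of the reference output that solves the scenario. Commonly the result of a git diff or a sequence of command actions to apply to the environment.
is_public: Whether this scenario is public.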
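To make the response shape concrete, the sketch below types a subset of the fields listed above and turns a response into a one-line summary. It is illustrative only: ScenarioSummaryInput and describeScenario are hypothetical names, the SDK provides its own response types, and the IDs in the sample are made up.

```typescript
// Illustrative subset of the response fields listed above. This interface is
// an assumption for demonstration and is not exported by the SDK.
interface ScenarioSummaryInput {
  id: string;
  name: string;
  environment?: {
    blueprint_id?: string;
    snapshot_id?: string;
    prebuilt_id?: string;
  };
  scoring_contract: {
    scoring_function_parameters: { name: string; weight: number }[];
  };
}

// Builds a short, human-readable description of a Scenario response.
function describeScenario(s: ScenarioSummaryInput): string {
  const env = s.environment ?? {};
  let source = 'no environment ID set';
  if (env.blueprint_id) source = `blueprint ${env.blueprint_id}`;
  else if (env.snapshot_id) source = `snapshot ${env.snapshot_id}`;
  else if (env.prebuilt_id) source = `prebuilt ${env.prebuilt_id}`;

  const scorers = s.scoring_contract.scoring_function_parameters
    .map((f) => `${f.name} (${f.weight})`)
    .join(', ');
  return `${s.name} [${s.id}] on ${source}; scorers: ${scorers}`;
}

// Sample payload with made-up IDs, shaped like the JSON response above.
const sampleResponse: ScenarioSummaryInput = JSON.parse(
  '{"id":"scn_example","name":"demo","environment":{"snapshot_id":"snp_example"},' +
    '"scoring_contract":{"scoring_function_parameters":[{"name":"uses_async","weight":1}]}}',
);
console.log(describeScenario(sampleResponse));
```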