
Agent Studio: build, test, deploy, and manage AI agents on one canvas


Updated May 25, 2026

DevRev Editorial


The lifecycle gap that kills agents

Most agent tooling covers one phase. You get a builder with no deployment story. Or an orchestration framework with no evaluation system. Or a monitoring tool that can’t roll back a bad release.

Agent Studio covers the full loop:

Build. Define agent capabilities through reusable skills – natural language composition for business users, custom Python for developers. The same canvas supports both. Non-technical teams describe behaviors in plain language. Engineers add sophisticated logic only where needed.

Test. A playground for quick iteration, plus bulk evaluation against datasets with LLM-as-judge scoring. Upload input-output pairs, run the agent against the full set, and get scores for correctness, completeness, and task success.

Deploy. Version agents like code. Canary deployments with gradual rollout. One-click rollback on performance degradation. Publish new versions without touching the live agent. Every archived version is restorable from history.

Manage. Real-time metrics on cost, accuracy, and usage. Session traces showing step-by-step reasoning. Automated daily benchmarks ensuring quality thresholds hold. If performance degrades, you know before your customers do.
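The bulk evaluation described under Test can be pictured as a loop over input-output pairs with a judge scoring each answer. What follows is a minimal sketch under stated assumptions, not Agent Studio's implementation: `Example`, `toy_judge`, `evaluate`, and `demo_agent` are hypothetical names, the dataset is made up for illustration, and the judge is a token-overlap stand-in for an LLM so the sketch runs offline.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class Example:
    input: str
    expected: str


# Hypothetical judge signature: (input, expected, actual) -> scores in [0, 1].
Judge = Callable[[str, str, str], dict]


def toy_judge(question: str, expected: str, actual: str) -> dict:
    """Stand-in for an LLM judge: scores by token overlap with the expected answer."""
    exp_tokens = set(expected.lower().split())
    act_tokens = set(actual.lower().split())
    overlap = len(exp_tokens & act_tokens) / len(exp_tokens) if exp_tokens else 0.0
    return {
        "correctness": overlap,
        "completeness": overlap,
        "task_success": 1.0 if overlap == 1.0 else 0.0,
    }


def evaluate(agent: Callable[[str], str], dataset: list, judge: Judge) -> dict:
    """Run the agent on every example and average each metric over the set."""
    per_example = [judge(ex.input, ex.expected, agent(ex.input)) for ex in dataset]
    metrics = list(per_example[0].keys()) if per_example else []
    return {m: mean(scores[m] for scores in per_example) for m in metrics}


# Illustrative dataset, invented for this sketch.
dataset = [
    Example("What is the refund window?", "30 days"),
    Example("Which plan includes SSO?", "Enterprise plan"),
]


def demo_agent(prompt: str) -> str:
    """Hypothetical agent that gets the second question wrong."""
    answers = {
        "What is the refund window?": "30 days",
        "Which plan includes SSO?": "Pro plan",
    }
    return answers[prompt]


report = evaluate(demo_agent, dataset, toy_judge)
print(report)
```

In a real setup the judge would call a model and return the same three scores; the aggregation step stays the same.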

Three types of skills that turn agents into actors

Without skills, an agent only answers questions. With skills, it takes action.

Tools are built-in DevRev actions with configurable parameters – creating tickets, updating records, sending messages. Each operation has input fields the agent auto-fills from conversation context. Immediate, single-step operations.

NL Skills are sub-agents with plan-based reasoning. An NL Skill receives a natural-language objective, decomposes it into steps, and executes them autonomously. They handle multi-step work beyond a single tool call – the kind of reasoning that separates real AI agents from chatbots.

Workflows are deterministic automation sequences built visually. Combine conditional logic, loops, delays, AI reasoning nodes, and external integrations on one canvas. Workflows themselves become agent skills – an agent in conversation can trigger a workflow, execute the full logic, and return results seamlessly.

All skills run under an “Execute as User” permission model: the agent acts with the permissions of the user it represents. No privilege escalation. Full audit trail.
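One way to picture the “Execute as User” model: before any skill runs, the agent checks the permissions of the user it represents, and every attempt is logged whether or not it is allowed. This is only a sketch under assumptions. `User`, `AgentSession`, `run_skill`, and the permission strings are hypothetical and do not reflect DevRev's actual API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    name: str
    permissions: frozenset


@dataclass
class AgentSession:
    acting_for: User
    audit_log: list = field(default_factory=list)

    def run_skill(self, skill: str, required_permission: str) -> bool:
        """Allow the skill only if the represented user holds the permission."""
        allowed = required_permission in self.acting_for.permissions
        # Every attempt is recorded, including denied ones.
        self.audit_log.append(
            {
                "user": self.acting_for.name,
                "skill": skill,
                "permission": required_permission,
                "allowed": allowed,
            }
        )
        return allowed


support_rep = User("dana", frozenset({"ticket:create", "ticket:update"}))
session = AgentSession(acting_for=support_rep)

can_create = session.run_skill("create_ticket", "ticket:create")
can_delete = session.run_skill("delete_account", "account:delete")
print(can_create, can_delete, len(session.audit_log))
```

The agent holds no permissions of its own in this sketch, so there is nothing to escalate to: it can do only what the user could do directly.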

160+ native nodes, serverless and durable

The workflow engine is where Agent Studio’s architecture becomes visible. 160+ native nodes across five categories:

Triggers (48 types) – ticket created, timer, API call, SLA breach, or agent skill invocation. Workflows start from real business events.

Control – If/Else branching, parallel routing, For Each loops, While loops (up to 1,000 iterations), variable management. Full programmatic control without code.

Actions (100+) – CRUD on any object, HTTP calls, MCP remote tools, hybrid search, SQL queries, custom Python execution.

AI nodes – Ask AI (normal and reasoning modes), delegate to another agent, classify objects, evaluate sentiment. LLM decisions embedded in deterministic flows.

Blocking – Sleep for a duration, until a date, or until an external event occurs. Workflows can pause for days or months and resume exactly where they left off.

The engine is serverless and durable – it doesn’t lose state. A workflow that pauses waiting for customer approval on a Tuesday resumes seamlessly when the approval arrives on Friday.
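The Tuesday-to-Friday example can be sketched as a workflow whose entire state is a small serializable record: the run stops at a blocking step, the state is persisted, and a later event resumes it from the same step. This is an illustrative model only. `STEPS`, `run`, and the state fields are assumptions, and the real engine's persistence mechanism is not described in this article.

```python
import json
from typing import Optional

# Hypothetical three-step workflow with one blocking step.
STEPS = ["create_quote", "wait_for_approval", "send_contract"]


def run(state: dict, event: Optional[str] = None) -> dict:
    """Advance the workflow until it finishes or blocks; return the new state."""
    state = dict(state)
    while state["step"] < len(STEPS):
        name = STEPS[state["step"]]
        if name == "wait_for_approval" and event != "approved":
            state["status"] = "waiting"  # block here; nothing is lost
            return state
        state["done"] = state["done"] + [name]
        state["step"] += 1
    state["status"] = "completed"
    return state


# Tuesday: the workflow runs until it needs approval, then is checkpointed.
tuesday = run({"step": 0, "done": [], "status": "running"})
checkpoint = json.dumps(tuesday)  # stands in for durable storage

# Friday: the approval event arrives and the workflow resumes from the checkpoint.
friday = run(json.loads(checkpoint), event="approved")
print(tuesday["status"], friday["status"], friday["done"])
```

Because the state survives as data rather than as a live process, the length of the pause does not matter to the resume step.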

Custom Python in a sandboxed environment

The Execute Code node bridges no-code and full programmatic control. Write Python directly within workflows for data transformation, complex calculations, or multi-variable routing logic.

Runtime constraints keep things safe: 120-second timeout, 256 MB memory, 512 KB output limit. Standard libraries available (json, datetime, requests, regex, math). System access blocked (no os, subprocess, or threading). Your code runs in isolated sandboxes – never touching other tenants or production infrastructure.
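As an illustration of the multi-variable routing logic such a node might hold, here is a short sketch that uses only `json` from the libraries listed above. The function names, ticket fields, and queue names are hypothetical, and how the Execute Code node actually passes inputs and collects outputs is not specified here. The size check simply mirrors the 512 KB output limit.

```python
import json

MAX_OUTPUT_BYTES = 512 * 1024  # the output limit quoted above


def route_ticket(ticket: dict) -> dict:
    """Pick a queue from severity, customer tier, and ticket age (hypothetical fields)."""
    severity = ticket.get("severity", "low")
    tier = ticket.get("customer_tier", "standard")
    hours_open = ticket.get("hours_open", 0)

    if severity == "critical" or (tier == "enterprise" and hours_open > 24):
        queue = "escalations"
    elif tier == "enterprise":
        queue = "priority"
    else:
        queue = "general"
    return {"queue": queue, "severity": severity, "tier": tier}


def check_output_size(result: dict) -> int:
    """Return the serialized size in bytes; raise if it would exceed the limit."""
    size = len(json.dumps(result).encode("utf-8"))
    if size > MAX_OUTPUT_BYTES:
        raise ValueError(f"output is {size} bytes, over the {MAX_OUTPUT_BYTES}-byte limit")
    return size


result = route_ticket({"severity": "low", "customer_tier": "enterprise", "hours_open": 30})
size = check_output_size(result)
print(result["queue"], size)
```

Keeping the routing in one pure function makes it easy to test outside the sandbox before pasting it into a workflow.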

Observability that actually explains what happened

Once agents are live, the Observe tab provides two complementary views:

Analytics – aggregate dashboards showing resolution rates, conversation volumes, and quality over time. Tracks cost, accuracy, latency, and token usage. Filterable by time range.

Sessions – individual conversation traces. Inspect step-by-step reasoning, understand tool usage, diagnose failures. Each session shows the full execution detail with tool inputs and outputs.

Performance targets are built into production monitoring: mean end-to-end score above 0.75, time to first token under 4 seconds, total response time under 15 seconds. An automated CI benchmark runs daily.
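A daily benchmark gate against these three targets could look like the sketch below. The thresholds come from the text. Everything else is an assumption: the sample records are invented, `check_run` is a hypothetical name, and the article does not say whether the latency targets apply per request or to an aggregate, so this sketch requires every sample to meet them.

```python
from statistics import mean

# Thresholds quoted in the text above.
TARGETS = {"min_mean_score": 0.75, "max_ttft_s": 4.0, "max_total_s": 15.0}


def check_run(samples: list) -> dict:
    """Compare one benchmark run against the targets; True means the target holds."""
    return {
        "score": mean(s["score"] for s in samples) > TARGETS["min_mean_score"],
        # Assumption: latency targets must hold for every sample in the run.
        "ttft": all(s["ttft_s"] < TARGETS["max_ttft_s"] for s in samples),
        "total": all(s["total_s"] < TARGETS["max_total_s"] for s in samples),
    }


# Invented sample data for illustration.
samples = [
    {"score": 0.9, "ttft_s": 1.2, "total_s": 8.0},
    {"score": 0.7, "ttft_s": 4.5, "total_s": 12.0},
]

status = check_run(samples)
failed = [name for name, ok in status.items() if not ok]
print(status, failed)
```

Here the run passes on mean score and total response time but fails on time to first token, which is the kind of result that would flag a regression before customers notice it.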

L1–L4: a taxonomy for enterprise agent capabilities

Agent Studio proposes a novel framework for benchmarking enterprise AI agents – inspired by TPC database benchmarks and autonomous driving levels:

L1 – Reactive. Deterministic retrieval and execution. If A, then B. Objectively right or wrong. May span multiple systems but every operation is rule-based.

L2 – Analytical. Multi-step reasoning with judgment calls. The agent interprets, synthesizes, and infers rather than just following joins.

L3 – Strategic. Proactive detection and cross-system coordination. The agent acts on signals without being explicitly prompted.

L4 – Autonomous. Self-directed optimization over extended periods with minimal human input. Novel pattern recognition and open-ended execution.

This taxonomy gives enterprises a shared language for evaluating agent maturity – both internally and across vendors.
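To make the four levels concrete, they can be written as an ordered enumeration with a rough classifier on top. The level names follow the taxonomy above. The task flags (`requires_judgment`, `proactive`, `self_directed`) and the `classify` rules are hypothetical simplifications for illustration, not the framework's actual criteria.

```python
from enum import IntEnum


class AgentLevel(IntEnum):
    REACTIVE = 1    # L1: deterministic retrieval and execution
    ANALYTICAL = 2  # L2: multi-step reasoning with judgment calls
    STRATEGIC = 3   # L3: proactive detection, cross-system coordination
    AUTONOMOUS = 4  # L4: self-directed optimization over extended periods


def classify(task: dict) -> AgentLevel:
    """Map hypothetical task flags to the highest level the task demands."""
    if task.get("self_directed"):
        return AgentLevel.AUTONOMOUS
    if task.get("proactive"):
        return AgentLevel.STRATEGIC
    if task.get("requires_judgment"):
        return AgentLevel.ANALYTICAL
    return AgentLevel.REACTIVE


lookup = classify({})
triage = classify({"requires_judgment": True})
churn_watch = classify({"requires_judgment": True, "proactive": True})
print(lookup.name, triage.name, churn_watch.name)
```

Using an ordered enum means levels can be compared directly, for example to check whether an agent rated L2 is enough for a task that demands L3.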

From natural language to production in minutes

Start by describing what you want the agent to do. Agent Studio generates the initial structure. Extend with custom code only where your business demands it. The canvas a business analyst uses to define agent behavior is the same one a developer uses to add Python logic, AWS AgentCore integrations, or MCP connections.

Business users build the first version. Developers harden it. Operations teams monitor it. One platform, one canvas, one lifecycle.

Agent Studio is generally available as part of Computer by DevRev. The full technical architecture, including the L1–L4 benchmarking framework, is detailed in the Agent Studio whitepaper.
