Agent Studio: build, test, deploy, and manage AI agents on one canvas
90% of AI agents fail within 90 days. Not because the models are wrong – because teams have no infrastructure to move agents from prototype to production safely.
Every enterprise has the same story: someone builds a demo agent in a weekend. It works in the playground. Then it hits production and breaks in ways nobody anticipated – hallucinating on edge cases, executing actions without proper permissions, degrading silently with no alerting. There’s no versioning. No rollback. No way to know if yesterday’s agent was better than today’s.
Agent Studio solves the lifecycle problem, not the model problem. It’s enterprise agent infrastructure built on DevRev’s Knowledge Graph – where you build skills in natural language or code, test against real datasets, deploy with canary rollouts, and manage production agents with full observability.
The lifecycle gap that kills agents
Most agent tooling covers one phase. You get a builder with no deployment story. Or an orchestration framework with no evaluation system. Or a monitoring tool that can’t roll back a bad release.
Agent Studio covers the full loop:
Build. Define agent capabilities through reusable skills – natural language composition for business users, custom Python for developers. The same canvas supports both. Non-technical teams describe behaviors in plain language. Engineers add sophisticated logic only where needed.
Test. A playground for quick iteration, plus bulk evaluation against datasets with LLM-as-judge scoring. Upload input-output pairs, run the agent against the full set, and get scored on correctness, completeness, and task success.
Deploy. Version agents like code. Canary deployments with gradual rollout. One-click rollback on performance degradation. Publish new versions without touching the live agent. Every archived version is restorable from history.
Manage. Real-time metrics on cost, accuracy, and usage. Session traces showing step-by-step reasoning. Automated daily benchmarks ensuring quality thresholds hold. If performance degrades, you know before your customers do.
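To make the Test phase concrete, here is a minimal sketch of bulk evaluation over input-output pairs. It is not Agent Studio's API: the Example type, the evaluate function, and the toy agent are hypothetical names chosen for illustration, and a simple exact-match function stands in for the LLM judge so the example runs offline.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical types: the article does not describe the real dataset or judge API.
@dataclass
class Example:
    prompt: str
    expected: str

# A judge maps (expected, actual) to a score in [0, 1] for each criterion.
Judge = Callable[[str, str], Dict[str, float]]

CRITERIA = ("correctness", "completeness", "task_success")

def evaluate(agent: Callable[[str], str], dataset: List[Example], judge: Judge) -> Dict[str, float]:
    """Run the agent on every example and average the judge's scores per criterion."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    rows = [judge(ex.expected, agent(ex.prompt)) for ex in dataset]
    return {c: mean(row[c] for row in rows) for c in CRITERIA}

def exact_match_judge(expected: str, actual: str) -> Dict[str, float]:
    """Stand-in for an LLM judge: 1.0 on a case-insensitive exact match, else 0.0."""
    hit = 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0
    return {c: hit for c in CRITERIA}

def toy_agent(prompt: str) -> str:
    """Toy agent that knows exactly one answer."""
    return {"capital of France?": "Paris"}.get(prompt, "unknown")

dataset = [Example("capital of France?", "paris"), Example("capital of Peru?", "Lima")]
report = evaluate(toy_agent, dataset, exact_match_judge)
```

Swapping exact_match_judge for a function that calls a model keeps the rest of the loop unchanged, which is the point of passing the judge in as a parameter.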
Three types of skills that turn agents into actors
Without skills, an agent only answers questions. With skills, it takes action.
Tools are built-in DevRev actions with configurable parameters – creating tickets, updating records, sending messages. Each operation has input fields the agent auto-fills from conversation context. Immediate, single-step operations.
NL Skills are sub-agents with plan-based reasoning. An NL Skill receives a natural-language objective, decomposes it into steps, and executes autonomously. For multi-step work beyond a single tool call – the kind of reasoning that separates real AI agents from chatbots.
Workflows are deterministic automation sequences built visually. Combine conditional logic, loops, delays, AI reasoning nodes, and external integrations on one canvas. Workflows themselves become agent skills – an agent in conversation can trigger a workflow, execute the full logic, and return results seamlessly.
All skills run under an “Execute as User” permission model: the agent acts with the permissions of the user it represents. No privilege escalation. Full audit trail.
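The permission model can be pictured with a small sketch. Everything here is an assumption made for illustration: the User and AuditLog types, the permission strings, and the execute_as_user function are hypothetical and do not reflect DevRev's actual implementation.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple

@dataclass(frozen=True)
class User:
    name: str
    permissions: FrozenSet[str]

@dataclass
class AuditLog:
    entries: List[Tuple[str, str, str]] = field(default_factory=list)

    def record(self, user: str, action: str, outcome: str) -> None:
        self.entries.append((user, action, outcome))

def execute_as_user(user: User, action: str, required: str, audit: AuditLog) -> bool:
    """Allow the action only if the represented user holds the required permission.

    The agent has no permissions of its own, so it cannot exceed the user's.
    Every attempt is logged, whether it is allowed or denied.
    """
    allowed = required in user.permissions
    audit.record(user.name, action, "allowed" if allowed else "denied")
    return allowed

audit = AuditLog()
support = User("support-rep", frozenset({"ticket:create"}))
can_create = execute_as_user(support, "create_ticket", "ticket:create", audit)
can_delete = execute_as_user(support, "delete_account", "account:delete", audit)
```

In this sketch the denied attempt still lands in the audit log, which is what makes the trail complete rather than a record of successes only.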
160+ native nodes, serverless and durable
The workflow engine is where Agent Studio’s architecture becomes visible. 160+ native nodes across five categories:
Triggers (48 types) – ticket created, timer, API call, SLA breach, or agent skill invocation. Workflows start from real business events.
Control – If/Else branching, parallel routing, For Each loops, While loops (up to 1,000 iterations), variable management. Full programmatic control without code.
Actions (100+) – CRUD on any object, HTTP calls, MCP remote tools, hybrid search, SQL queries, custom Python execution.
AI nodes – Ask AI (normal and reasoning modes), delegate to another agent, classify objects, evaluate sentiment. LLM decisions embedded in deterministic flows.
Blocking – Sleep for a duration, until a date, or until an external event occurs. Workflows can pause for days or months and resume exactly where they left off.
The engine is serverless and durable – it doesn’t lose state. A workflow that pauses waiting for customer approval on a Tuesday resumes seamlessly when the approval arrives on Friday.
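One way to picture durability is a workflow whose entire state is plain data that can be written to storage while it waits. The sketch below assumes a hypothetical three-step workflow and JSON as the persistence format; the article does not describe how the real engine stores state.

```python
import json
from typing import Optional

# Hypothetical step names, used only for illustration.
STEPS = ["create_quote", "wait_for_approval", "send_contract"]

def run(state: dict, event: Optional[str] = None) -> dict:
    """Advance the workflow until it finishes or blocks on an external event."""
    state = dict(state)  # work on a copy so the caller's state is not mutated
    while state["step"] < len(STEPS):
        name = STEPS[state["step"]]
        if name == "wait_for_approval" and event != "approved":
            state["status"] = "paused"
            return state
        state["done"] = state["done"] + [name]
        state["step"] += 1
    state["status"] = "completed"
    return state

# Tuesday: run until the workflow blocks, then persist its state as JSON.
saved = json.dumps(run({"step": 0, "done": [], "status": "running"}))

# Friday: a different process reloads the state and resumes on approval.
final = run(json.loads(saved), event="approved")
```

Because nothing lives in process memory between the two calls, the process that resumes the workflow does not need to be the one that paused it.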
Custom Python in a sandboxed environment
The Execute Code node bridges no-code and full programmatic control. Write Python directly within workflows for data transformation, complex calculations, or multi-variable routing logic.
Runtime constraints keep things safe: 120-second timeout, 256 MB memory, 512 KB output limit. Standard libraries available (json, datetime, requests, regex, math). System access blocked (no os, subprocess, or threading). Your code runs in isolated sandboxes – never touching other tenants or production infrastructure.
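As an illustration of the kind of logic that fits inside these limits, the sketch below scores and routes a ticket using only json, datetime, math, and re. The ticket fields, the scoring rule, and the function names are assumptions; how the Execute Code node actually passes inputs and returns outputs is not covered in this article.

```python
import json
import math
import re
from datetime import datetime, timezone

MAX_OUTPUT_BYTES = 512 * 1024  # the 512 KB output limit quoted above

def route_ticket(ticket: dict, now: datetime) -> dict:
    """Score a ticket from its age and title keywords, then pick a queue."""
    created = datetime.fromisoformat(ticket["created_at"])
    age_hours = (now - created).total_seconds() / 3600
    urgent = bool(re.search(r"\b(outage|down|urgent)\b", ticket["title"], re.IGNORECASE))
    # Log-scaled age so very old tickets do not dominate the score.
    score = round(math.log1p(age_hours) + (5 if urgent else 0), 2)
    queue = "escalation" if score >= 5 else "standard"
    return {"id": ticket["id"], "score": score, "queue": queue}

def serialize(result: dict) -> str:
    """Serialize the result and fail early if it would exceed the output limit."""
    payload = json.dumps(result)
    if len(payload.encode("utf-8")) > MAX_OUTPUT_BYTES:
        raise ValueError("output exceeds 512 KB")
    return payload

# Hypothetical ticket used only to exercise the function.
now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
ticket = {"id": "T-1", "title": "Checkout is DOWN", "created_at": "2025-01-02T09:00:00+00:00"}
routed = route_ticket(ticket, now)
payload = serialize(routed)
```

The code avoids os, subprocess, and threading entirely, so it would not trip the blocked-module restriction described above.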
Observability that actually explains what happened
Once agents are live, the Observe tab provides two complementary views:
Analytics – aggregate dashboards showing resolution rates, conversation volumes, and quality over time. Tracks cost, accuracy, latency, and token usage. Filterable by time range.
Sessions – individual conversation traces. Inspect step-by-step reasoning, understand tool usage, diagnose failures. Each session shows the full execution detail with tool inputs and outputs.
Performance targets are built into production monitoring: mean end-to-end score above 0.75, time to first token under 4 seconds, total response time under 15 seconds. An automated CI benchmark runs daily.
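A daily benchmark gate against these targets could look like the sketch below. The Run type and check_benchmark function are hypothetical, and one detail is an assumption: the article does not say whether the latency targets apply per request or to an aggregate, so this version checks the slowest run.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

MIN_MEAN_SCORE = 0.75  # mean end-to-end score must be above this
MAX_TTFT_S = 4.0       # time to first token must be under this
MAX_TOTAL_S = 15.0     # total response time must be under this

@dataclass
class Run:
    score: float
    ttft_s: float
    total_s: float

def check_benchmark(runs: List[Run]) -> List[str]:
    """Return a list of violated targets; an empty list means the benchmark passed."""
    if not runs:
        raise ValueError("no benchmark runs to check")
    violations = []
    avg = mean(r.score for r in runs)
    if avg <= MIN_MEAN_SCORE:
        violations.append(f"mean score {avg:.2f} is not above {MIN_MEAN_SCORE}")
    # Assumption: latency targets are checked against the slowest run.
    worst_ttft = max(r.ttft_s for r in runs)
    if worst_ttft >= MAX_TTFT_S:
        violations.append(f"time to first token {worst_ttft:.1f}s is not under {MAX_TTFT_S}s")
    worst_total = max(r.total_s for r in runs)
    if worst_total >= MAX_TOTAL_S:
        violations.append(f"total response time {worst_total:.1f}s is not under {MAX_TOTAL_S}s")
    return violations

violations = check_benchmark([Run(0.9, 1.2, 8.0), Run(0.7, 3.5, 16.0)])
```

Returning every violation rather than stopping at the first makes the daily report more useful when several targets slip at once.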
L1–L4: a taxonomy for enterprise agent capabilities
Agent Studio proposes a novel framework for benchmarking enterprise AI agents – inspired by TPC database benchmarks and autonomous driving levels:
L1 – Reactive. Deterministic retrieval and execution. If A, then B. Objectively right or wrong. May span multiple systems but every operation is rule-based.
L2 – Analytical. Multi-step reasoning with judgment calls. The agent interprets, synthesizes, and infers rather than just following joins.
L3 – Strategic. Proactive detection and cross-system coordination. The agent acts on signals without being explicitly prompted.
L4 – Autonomous. Self-directed optimization over extended periods with minimal human input. Novel pattern recognition and open-ended execution.
This taxonomy gives enterprises a shared language for evaluating agent maturity – both internally and across vendors.
From natural language to production in minutes
Start by describing what you want the agent to do. Agent Studio generates the initial structure. Extend with custom code only where your business demands it. The canvas a business analyst uses to define agent behavior is the same one a developer uses to add Python logic, AWS AgentCore integrations, or MCP connections.
Business users build the first version. Developers harden it. Operations teams monitor it. One platform, one canvas, one lifecycle.
Agent Studio is generally available as part of Computer by DevRev. The full technical architecture, including the L1–L4 benchmarking framework, is detailed in the Agent Studio whitepaper.

Anirudh Shenoy

Michael Machado

Akhil Kintali

Abhinav Singh