Databricks Launches Agent Bricks to Scale AI Agents

Serge Bulaev

Databricks has launched Agent Bricks, a new service that helps companies quickly build, test, and improve AI agents. The service, shown at the 2025 Data + AI Summit, helps keep costs low and makes it easy to deploy reliable AI agents even as demand grows. It uses smart judges and test data to evaluate agents automatically.

Databricks has launched Agent Bricks, a new service designed to build, evaluate, and scale production AI agents on the Databricks Data Intelligence Platform. Unveiled at the 2025 Data + AI Summit, the beta release accelerates the path from prototype to governed, cost-effective deployment.

Many enterprises struggle to keep AI agents reliable under heavy traffic. According to a LangChain survey, only 52% of teams use formal evaluation, and fewer than a third are satisfied with their safety guardrails. Agent Bricks addresses these issues with automated LLM judges, synthetic data generation, and serverless endpoints that scale to zero.

What the new service delivers

Agent Bricks is a managed service on the Databricks platform that streamlines the creation, testing, and deployment of reliable, production-ready AI agents. It automates agent evaluation using custom benchmarks and LLM judges to ensure high performance, security, and cost control for enterprise-specific tasks and data.

  • Automated Benchmarking: Generates task-aware synthetic benchmarks that reflect your unique data.
  • AI-Powered Evaluation: Uses LLM judges to grade agent outputs for accuracy, containment, and cost.
  • Performance Optimization: Systematically searches across prompts, fine-tuning options, and models to achieve target KPIs.
  • Native Databricks Integration: Integrates seamlessly with Databricks Unity Catalog for security and vector search.

How the workflow operates

  1. Define the Task: Connect enterprise data sources and describe a business objective, such as invoice data extraction.
  2. Generate Candidates: Agent Bricks automatically creates and tests dozens of agent variations against domain-specific metrics.
  3. Evaluate and Refine: Automated judges identify underperforming agents, while subject-matter experts can provide targeted feedback to refine performance.
  4. Deploy with Confidence: The top-performing agent is promoted to a managed endpoint, with complete data lineage captured for auditing.
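
The four steps above amount to a search-and-select loop. The sketch below is a minimal illustration in plain Python and does not call any Agent Bricks or Databricks API. The names `Candidate`, `judge_score`, `evaluate`, and `select_best` are hypothetical, and the token-overlap scorer is only a stand-in for an LLM judge.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types for illustration; not part of any Databricks API.
Agent = Callable[[str], str]
Benchmark = List[Tuple[str, str]]  # (input text, expected answer) pairs


@dataclass
class Candidate:
    name: str
    agent: Agent


def judge_score(output: str, expected: str) -> float:
    """Stand-in for an LLM judge: fraction of expected tokens found in the output."""
    expected_tokens = expected.lower().split()
    if not expected_tokens:
        return 0.0
    output_tokens = set(output.lower().split())
    hits = sum(1 for token in expected_tokens if token in output_tokens)
    return hits / len(expected_tokens)


def evaluate(candidate: Candidate, benchmark: Benchmark) -> float:
    """Mean judge score of one candidate over the whole benchmark."""
    if not benchmark:
        raise ValueError("benchmark must contain at least one example")
    scores = [judge_score(candidate.agent(text), expected) for text, expected in benchmark]
    return sum(scores) / len(scores)


def select_best(candidates: List[Candidate], benchmark: Benchmark) -> Tuple[Candidate, float]:
    """Score every candidate and return the top performer with its score."""
    scored = [(evaluate(candidate, benchmark), candidate) for candidate in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Toy benchmark and toy "agents" (plain functions) for the invoice example.
benchmark = [
    ("Invoice 17 total: 120 USD", "120 USD"),
    ("Invoice 18 total: 75 EUR", "75 EUR"),
]
candidates = [
    Candidate("echo-last-two-words", lambda text: " ".join(text.split()[-2:])),
    Candidate("echo-first-word", lambda text: text.split()[0]),
]

best, score = select_best(candidates, benchmark)
print(best.name, score)
```

In a real deployment the candidates would differ in prompt, model, or fine-tuning rather than in string slicing, and the judge would be a model call, but the selection logic keeps the same shape.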

Early adopters are already reporting significant gains. The North Dakota University System reduced manual tuning time by 30 days for its legislative document parsing workflow, as noted on the official Databricks product page. Similarly, RV-parts manufacturer Lippert accelerated the deployment of a multi-agent knowledge assistant built to answer queries from its standard operating procedures.

Why governance is baked in

Robust governance is a core pillar of the service, addressing a common obstacle in AI deployment. With frameworks often changing faster than compliance cycles, Agent Bricks provides stability by locking evaluations, prompts, and scoring logic into versioned artifacts that align with ISO 27001 controls. A new Judge Builder also allows companies to codify internal policies without writing code, a key feature in its governed evaluation capabilities detailed by Prolifics.
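
One common way to lock an evaluation definition into a versioned artifact is to fingerprint its contents, so that any change to a prompt, rubric, or threshold yields a new version identifier. The article does not describe how Agent Bricks stores these artifacts, so the sketch below is an assumption made for illustration: the function `artifact_version` and the dictionary fields are hypothetical, and only the Python standard library is used.

```python
import hashlib
import json


def artifact_version(evaluation: dict) -> str:
    """Return a deterministic SHA-256 fingerprint of an evaluation definition."""
    # Sorting keys makes the fingerprint independent of dictionary ordering.
    canonical = json.dumps(evaluation, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical evaluation definition: prompt, scoring rubric, pass threshold.
evaluation_v1 = {
    "prompt": "Extract the invoice total.",
    "judge_rubric": "Score 1 if the total matches the reference, else 0.",
    "threshold": 0.85,
}

# Same content in a different key order should map to the same version.
evaluation_v1_reordered = {
    "threshold": 0.85,
    "judge_rubric": "Score 1 if the total matches the reference, else 0.",
    "prompt": "Extract the invoice total.",
}

# Changing any field should produce a different version.
evaluation_v2 = dict(evaluation_v1, threshold=0.90)

v1 = artifact_version(evaluation_v1)
v1_reordered = artifact_version(evaluation_v1_reordered)
v2 = artifact_version(evaluation_v2)

print(v1 == v1_reordered)  # same content, same version
print(v1 == v2)            # changed threshold, different version
```

Recording this identifier alongside each evaluation run lets an auditor confirm which exact prompt and scoring logic produced a given result.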

Early use cases gaining traction

Use case               | Key metric       | Reported benchmark
Information extraction | Goal accuracy    | 85% on supplier PDFs
Knowledge assistant    | Containment rate | 90% on SOP queries
Text transformation    | Latency          | Under 400 ms at 50 QPS

These initial benchmarks meet the enterprise-grade evaluation targets outlined in Dataiku's 2025 guidance, which calls for 70-90% containment and hallucination rates below 5% for most production use cases.
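
The targets cited above can be checked mechanically once interactions are logged. The following is a minimal sketch, assuming containment rate means the share of interactions resolved without human escalation and hallucination rate means the share flagged as unsupported. The `Interaction` record, the function names, and the default thresholds (the 70% lower bound and the 5% ceiling from the guidance) are illustrative assumptions, not an Agent Bricks API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interaction:
    escalated: bool      # True if the query was handed off to a human
    hallucinated: bool   # True if a judge or reviewer flagged unsupported content


def containment_rate(log: List[Interaction]) -> float:
    """Share of interactions resolved without escalation."""
    return sum(1 for item in log if not item.escalated) / len(log)


def hallucination_rate(log: List[Interaction]) -> float:
    """Share of interactions flagged as hallucinated."""
    return sum(1 for item in log if item.hallucinated) / len(log)


def meets_targets(
    log: List[Interaction],
    min_containment: float = 0.70,    # assumed lower bound of the 70-90% range
    max_hallucination: float = 0.05,  # assumed ceiling; rate must stay below it
) -> bool:
    """True when containment is high enough and hallucinations are rare enough."""
    if not log:
        raise ValueError("log must contain at least one interaction")
    return (
        containment_rate(log) >= min_containment
        and hallucination_rate(log) < max_hallucination
    )


# Toy log: 20 interactions, 2 escalated, none hallucinated.
good_log = [Interaction(escalated=(i < 2), hallucinated=False) for i in range(20)]

# Toy log: 20 interactions, none escalated, 2 hallucinated.
bad_log = [Interaction(escalated=False, hallucinated=(i < 2)) for i in range(20)]

print(containment_rate(good_log), meets_targets(good_log))
print(hallucination_rate(bad_log), meets_targets(bad_log))
```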

Agent Bricks is available in open beta on Azure and AWS through the first half of 2026. The service uses a pay-as-you-go pricing model based on token consumption and judge executions. At the summit keynote, Databricks CEO Ali Ghodsi described the launch as "a whole new way of building and deploying AI agents that can reason on your data," highlighting the demand for tools that deliver audited business value from AI.


What is Agent Bricks and why did Databricks build it?

Agent Bricks is a Databricks product that automatically builds, evaluates, and optimizes production-grade AI agents grounded in your own enterprise data. It exists because many agents built today never make it to production: only 57% of organizations have agents running live, and fewer than one in three teams are happy with their quality. Agent Bricks tackles this gap by turning a high-level task description into a governed, cost-optimized agent without manual prompt tuning or weeks of experimentation.

How does Agent Bricks ensure my agent will actually work on real tasks?

The platform creates task-specific evaluations and LLM judges automatically, then generates synthetic data that mirrors your real documents so the agent is tested on scenarios it will meet in production. Metrics like containment rate, goal accuracy, and hallucination rate are tracked from day one, and the system keeps iterating through techniques such as prompt engineering, fine-tuning, and Agent Learning from Human Feedback (ALHF) until business-set thresholds are met. The North Dakota University System, for example, saved 30 days of manual optimization when parsing legislative documents.

What governance features are baked in?

Governance is not an afterthought. Agent Bricks inherits Databricks' Unity Catalog security, provides full audit trails, and ships with Agent-as-a-Judge, Tunable Judges, and Judge Builder so compliance teams can set and track policies inside the same UI developers use. Endpoints automatically scale to zero after three idle days to control cost, while lineage and rollback buttons satisfy ISO 27001 and SOC 2 requirements without extra tooling.

Which business problems can I hand to an Agent Bricks agent today?

Typical starter use cases include:
- Information extraction - turn supplier PDFs into structured product tables for retail teams
- Knowledge assistant - answer engineers with cited snippets from manuals instead of generic chatbot text
- Multi-agent orchestration - chain agents to summarize support calls and surface insight dashboards, as demonstrated by Lippert

Each template is pre-optimized for quality-cost balance and can be deployed with one click to a Databricks serverless endpoint.

How do I try it and what should I prepare?

Agent Bricks entered public beta in June 2025 and is available today on Azure and AWS Databricks workspaces. Bring a task description, a data connection (PDF folders, Delta tables, vector indexes), and a success metric such as "≥85% of questions answered without escalation." A five-minute demo video walks through the full flow, and you can open the free trial directly inside your existing Databricks console.

Written by

Serge Bulaev

Founder & CEO of Creative Content Crafts and creator of Co.Actor — an AI tool that helps employees grow their personal brand and their companies too.