OpenAI, Broadcom unveil Jalapeño chip: 4x faster, 50% cheaper AI inference

OpenAI and Broadcom have introduced a new Jalapeño chip that may offer up to 4 times higher efficiency and about 50 percent lower cost for AI tasks compared to current GPUs, according to their own tests. These results are self-reported and still need to be confirmed by outside sources. The chip is designed to save energy and work better with AI models, and it appears to fit a new trend of companies making their own chips to reduce costs. Experts suggest that these custom chips could make AI much cheaper in the next few years, though using them may also require new skills and careful planning. Whether Jalapeño's benefits will hold up in real-world use is still uncertain until more independent tests are done.

OpenAI and Broadcom have unveiled Jalapeño, a custom-built inference processor designed specifically for large language model workloads. The announcement marks OpenAI's first significant step into hardware vertical integration, with the chip promising substantial improvements in cost efficiency and performance for AI inference.

What is the Jalapeño chip and who developed it?

The Jalapeño chip is a custom-built AI processor, an Application-Specific Integrated Circuit (ASIC), co-developed by OpenAI and Broadcom. Manufactured by TSMC, it is specifically designed for AI inference tasks to improve efficiency and lower costs compared to general-purpose hardware.

Jalapeño is a custom Application-Specific Integrated Circuit (ASIC) co-developed by OpenAI and Broadcom, with manufacturing handled by TSMC on a 3nm process. The chip represents a nine-month development cycle - remarkably fast for silicon of this complexity - and is built around a systolic array architecture optimized for general Transformer/LLM inference workloads. It integrates HBM memory directly on-package to eliminate data bottlenecks and is programmed using custom kernels optimized for its systolic array architecture.

Engineering samples are already operating in labs, with production deployment expected in late 2026.

How does Jalapeño's performance compare to existing AI hardware like NVIDIA's H100?

According to self-reported internal benchmarks, Jalapeño achieves:

Metric	Claimed Performance vs. NVIDIA H100
Throughput per watt	Approximately 2.4x higher for inference workloads
Inference cost	~50% cheaper
Latency	~66% lower (2.9x improvement), with improved stability

These figures come with an important caveat: no independent third-party technical report or full benchmark suite has been published as of mid-2026. OpenAI has promised detailed specifications and verified results in the coming months.

Why did OpenAI choose to build custom silicon instead of using off-the-shelf GPUs?

The decision reflects a broader industry shift toward hardware-software co-design. General-purpose GPUs like the H100 carry inefficiencies when running transformer workloads - graphics pipelines and flexible compute units are overbuilt for the specific math operations LLMs require. By hardwiring systolic arrays for attention and feed-forward layers, Jalapeño eliminates this overhead.

This follows a pattern among hyperscalers: Google (TPU v7 "Ironwood"), AWS (Trainium 3), Microsoft (Maia 200), and Meta (MTIA v4 "Santa Barbara") have all deployed custom accelerators. Industry reports suggest NVIDIA's dominance in the inference market faces growing competition as specialized silicon gains adoption across production workloads.

What does Jalapeño mean for enterprise AI costs and deployment strategies?

The implications are significant for organizations running large-scale language models:

Cost structure shift: Custom ASICs offer potential cost advantages over GPUs for inference, with the gap potentially widening as model architectures stabilize
From renting to owning: As inference costs fall sharply, self-hosted AI stacks are becoming more economically compelling than per-seat cloud API access
Hardware-agnostic architectures: Enterprises increasingly need model-agnostic platforms that can route inference to whatever silicon offers the best price-performance, rather than locking into single-vendor ecosystems

Industry data shows inference workloads represent the primary cost driver for enterprise AI spending rather than training - making efficiency at this layer critical to overall AI economics.

What are the limitations and unknowns about Jalapeño?

Several important constraints should be noted:

Inference-only: Jalapeño is not designed for model training - it targets inference operations for LLM workloads
Self-reported metrics: All performance claims remain unverified by independent testing pending OpenAI's promised technical report
Availability timeline: Production deployment is still months away (late 2026), with no confirmed pricing or external customer access model

For enterprises evaluating AI infrastructure today, the strategic imperative is flexibility: building stacks that can adapt as new specialized accelerators like Jalapeño reach market, rather than making long-term commitments to any single hardware architecture.