Decart raises $300M, partners with Nvidia to optimize AI models

Decart raised about $300 million in early 2026 and is working with Nvidia to optimize AI models for different types of chips. The company says its platform may shorten the time needed to tune AI models and can work across hardware from Nvidia, Amazon, and Google. Early reports suggest Decart's tools might cut costs substantially, but it is uncertain whether those results will hold as technology changes. Amazon is also testing Decart's platform, which may signal confidence in its ability to run across many chips. Whether the cost savings persist as chips and memory systems grow more complex will likely determine how many companies adopt it.

With a new funding round and a strategic partnership with Nvidia, Decart is positioned to help companies optimize AI models across a wide array of hardware. The Decart Optimization Stack (DOS) is gaining traction among AI teams seeking efficient ways to run large models on accelerators from Nvidia, Google, and Amazon. The company's platform promises to condense months of complex kernel tuning into weeks, a claim detailed on the company's website. This article explores Decart's technology, the industry challenges it addresses, and its growing market validation.

What is the Decart Optimization Stack (DOS)?

Decart's Optimization Stack (DOS) is a software platform built to make AI models run faster and more cheaply across diverse hardware. It helps developers tune complex models for chips from Nvidia, Google, and Amazon, aiming to dramatically reduce the costs and timelines for AI inference and training.

Decart defines its product as a "vertically integrated inference and training platform" for demanding AI workloads, including LLMs, agentic systems, and video models. The stack combines proprietary compilers, custom kernels, and hardware-aware model design. According to the Alpha Partners blog, early customers report significant cost reductions when using AWS Trainium and other accelerators.

Core Platform Capabilities

Hardware-Agnostic Compilation: Targets a wide range of accelerators, including GPUs, TPUs, AWS Trainium, and AMD silicon.
Real-Time Performance: Engineered to meet sub-100ms latency requirements for interactive agentic and video applications.
Integrated Performance Monitoring: Features built-in tools to benchmark kernel performance and track regressions across different hardware generations.
Continuous Updates: The platform's compiler and kernel libraries are consistently updated to support new silicon releases.
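
As a rough illustration of what latency benchmarking and regression tracking can involve, the Python sketch below checks a batch of latency samples against a 100 ms budget and a stored baseline. It is not Decart's tooling: the function names, the 10% tolerance, and the choice of the 99th percentile are all assumptions made for this example.

```python
from statistics import quantiles

LATENCY_BUDGET_MS = 100.0  # the sub-100ms target mentioned above


def p99(samples_ms):
    """Return the 99th-percentile latency of a list of samples (in ms)."""
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    return quantiles(samples_ms, n=100)[98]


def check_latency(samples_ms, baseline_p99_ms, tolerance=0.10):
    """Compare measured latency with the budget and a previous baseline.

    `baseline_p99_ms` would come from an earlier run, for example on an
    older hardware generation. `tolerance` is a hypothetical threshold:
    a slowdown of more than 10% is flagged as a regression.
    """
    current = p99(samples_ms)
    return {
        "p99_ms": current,
        "within_budget": current < LATENCY_BUDGET_MS,
        "regressed": current > baseline_p99_ms * (1 + tolerance),
    }
```

In practice the samples would be collected per kernel and per chip, so that a regression on one hardware generation is not hidden by gains on another.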

Addressing Hardware Fragmentation and Industry Constraints

The primary challenge Decart addresses is running software consistently across a fragmented hardware landscape. According to industry reports, toolchains often vary significantly between GPU and NPU vendors. This complexity is compounded by memory bandwidth limitations and interconnect bottlenecks. Furthermore, constraints on power and data center capacity are driving a shift toward hybrid deployments that mix cloud, edge, and sovereign infrastructure.

Funding Round and Strategic Alliances

Decart raised about $300 million in its latest round, which includes Nvidia as both an investor and a strategic partner. Amazon has also signed on as a strategic customer, which may signal confidence from a major cloud provider in the Decart stack's ability to perform across different chip architectures.

What This Means for AI Developers

Decart claims its DOS platform can unlock peak performance from any chip and cut serving costs by an order of magnitude. If these claims hold up, developer teams grappling with GPU scarcity or strict latency budgets could:

Drastically shorten model optimization timelines.
Maximize throughput on their current hardware infrastructure.
Achieve lower per-inference costs, making new applications economically viable.
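
To make the per-inference cost claim concrete, here is a back-of-the-envelope calculation in Python. The hourly price and throughput figures are hypothetical placeholders, not numbers reported by Decart or its customers. The point is only that, at a fixed hardware price, a tenfold throughput gain means a tenfold drop in cost per request.

```python
def cost_per_million_requests(hourly_price_usd: float, requests_per_second: float) -> float:
    """Serving cost in USD per one million requests on a single accelerator."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    requests_per_hour = requests_per_second * 3600
    return hourly_price_usd * 1_000_000 / requests_per_hour


# Hypothetical inputs, chosen only to illustrate the arithmetic.
baseline = cost_per_million_requests(hourly_price_usd=36.0, requests_per_second=100)
optimized = cost_per_million_requests(hourly_price_usd=36.0, requests_per_second=1000)

# Ratio of the two costs; equals the throughput ratio when the price is fixed.
reduction_factor = baseline / optimized
```

Real deployments complicate this picture, since utilization, batching, and memory limits all affect achieved throughput, but the same ratio is what an "order of magnitude" cost claim ultimately has to show.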

Future Outlook and Key Signals

According to industry reports, software-driven optimizations like quantization and advanced compiler tuning are becoming critical drivers of AI performance. For Decart, the key test will be whether it can sustain its reported cost reductions as hardware evolves. Its long-term enterprise adoption and future funding will likely depend on how effectively the DOS platform adapts to new chiplet designs and more complex memory hierarchies.

How much funding has Decart raised?

Decart raised about $300 million in early 2026, with Nvidia joining as an investor and business partner. The company has raised capital across multiple rounds, positioning it as a well-funded player in the AI optimization space.

Why is the Nvidia partnership significant?

Beyond the capital injection, Nvidia's dual role as investor and hardware partner accelerates co-engineering on GPU kernels, compilers, and memory-aware scheduling for real-time AI workloads. The collaboration lets Decart tune models specifically for Nvidia GPUs while still offering cross-vendor portability through its Decart Optimization Stack (DOS).

How does Decart claim to cut AI inference costs?

The company says DOS has significantly reduced its cost of generating content, and customers report substantial cost reductions on AWS Trainium and other accelerators. Decart attributes the gains to hardware-aware quantization, pruned and distilled models, and custom kernel compilation that squeezes more throughput out of each chip.
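
For readers unfamiliar with quantization, the Python sketch below shows the general idea using textbook symmetric int8 quantization: weights are mapped to small integers plus a single scale factor, which reduces memory traffic at the cost of a small rounding error. This is a generic illustration, not Decart's hardware-aware method, and the function names are made up for this example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would work.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


# Illustrative values only.
weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Each restored value differs from the original by at most about half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now needs one byte instead of four (for 32-bit floats), which is where much of the throughput gain from quantization typically comes from.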

Who are Decart's enterprise customers?

In addition to signing Amazon as a strategic customer, Decart states it already has contracts with several of the world's largest cloud providers, AI laboratories, and hyperscale companies. Use cases span media generation, commerce, advertising, and physical AI applications.

What makes Decart's stack different from other optimizers?

DOS is described as a vertically integrated inference and training platform that covers hardware-aware model design, kernel tooling, proprietary compilers, and inference optimization. It is purpose-built for real-time, low-latency agents, video, and world models, rather than legacy batch-oriented pipelines, and supports GPUs, TPUs, Trainium, AMD, and emerging chiplets in a single workflow.