
LongCat-Flash-Chat: Meituan’s 560B MoE Model Reshaping Enterprise AI

By Serge Bulaev · September 3, 2025 · AI Deep Dives & Tutorials

LongCat-Flash-Chat is a huge AI model made by Meituan, with 560 billion parameters inside. It is very fast, can handle really long texts, and costs less to use than most rivals. Businesses are already using it for things like better delivery routes and answering customer questions faster. The model is open-source, so anyone can try it, and it could change how companies use AI in the future.

What is LongCat-Flash-Chat and why is it significant for enterprise AI?

LongCat-Flash-Chat is a 560-billion-parameter open-source Mixture-of-Experts (MoE) language model by Meituan, offering faster inference, long context handling, and low-cost API access. It powers real deployments in logistics and enterprise SaaS, reshaping large-scale AI applications and pricing in 2025.


In September 2025, Meituan quietly flipped the switch on LongCat-Flash-Chat, a 560-billion-parameter Mixture-of-Experts (MoE) language model that is already reshaping how large-scale AI is built, priced, and deployed. Below is what practitioners, investors, and researchers are watching – without hype, just the numbers and design choices that matter.

1. A quick anatomy of the model

| Attribute | Value / Range | What it means in practice |
|---|---|---|
| Total parameters | 560 B | Largest Chinese open-source model to date |
| Active parameters / token | 18.6 B – 31.3 B (avg ≈ 27 B) | Cheaper inference than most 100 B+ dense models |
| Pre-training corpus | ~20 T tokens | Comparable to GPT-4-scale data volume |
| Context length (fine-tune) | 32 k → 128 k tokens | Enables long-doc QA and multi-turn agent loops |
| Inference speed | >100 tok/s on H800 | Roughly 3× faster than many 70 B dense models |
| API price floor (public) | $0.70 per 1 M tokens | Undercuts several commercial tier-1 endpoints |

Sources:
– arXiv technical report (§3.4 & §5.2)
– OpenSourceForU coverage

2. MoE tricks that actually move the needle

Instead of a generic “throw more GPUs at it” approach, the engineering team baked in four concrete optimizations (a routing sketch follows the list):

  1. Per-layer sub-block
    Two attention blocks + FFN + MoE gate in every layer keeps tensor-parallel communication patterns simple.

  2. Zero-Compute “sink” expert
    Tokens scoring below a routing threshold skip heavy computation entirely, shaving ~8 % off average latency.

  3. DeepSeek-V3-style (dsv3) load bias
    A lightweight bias term on the router prevents the classic “expert 0” hot-spot without extra all-reduce traffic.

  4. Inter-layer cross-channel pathways
    Overlaps MoE all-to-all with attention matmuls, cutting bubble time during both training and inference.
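
To make the routing idea concrete, here is a minimal PyTorch sketch of top-k MoE routing with a zero-compute “sink” path. It is illustrative only: the function and threshold names are invented for this example, and LongCat’s actual router (detailed in the arXiv report) differs in its specifics.

```python
import torch
import torch.nn.functional as F

def moe_route_with_sink(hidden, gate_weight, experts, top_k=2, sink_threshold=0.1):
    """Top-k MoE routing with a zero-compute "sink" path (illustrative).

    hidden:      (num_tokens, d_model) token activations
    gate_weight: (d_model, num_experts) router projection
    experts:     list of per-expert FFN callables
    Tokens whose best routing score falls below sink_threshold skip
    expert computation entirely and pass through unchanged.
    """
    scores = F.softmax(hidden @ gate_weight, dim=-1)      # (tokens, experts)
    top_scores, top_idx = scores.topk(top_k, dim=-1)      # best experts per token

    routed = top_scores[:, 0] >= sink_threshold           # low scorers hit the sink
    out = hidden.clone()                                  # sink tokens: identity

    # Dispatch each routed token to its top-k experts, weighted by score.
    for e, expert in enumerate(experts):
        for slot in range(top_k):
            mask = routed & (top_idx[:, slot] == e)
            if mask.any():
                w = top_scores[mask, slot].unsqueeze(-1)
                out[mask] = out[mask] + w * expert(hidden[mask])
    return out
```

The payoff of the sink path is that easy tokens cost zero expert FLOPs, which is where a latency saving like the reported ~8 % would come from.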

3. Benchmark snapshot (late-2025 runs)

| Benchmark | Score | Peer comparison note |
|---|---|---|
| TerminalBench (agent) | 39.5 | Ties DeepSeek-Prover-7B-preview on math-heavy turns |
| τ²-Bench | 67.7 | +2.4 pts over Qwen3-72B-Instruct |
| Safety suite avg. | 87 % | 83–94 % across 4 categories |

Raw numbers are from an internal eval deck reproduced in the SCMP write-up.

4. Real deployments already live

  • Meituan logistics stack – Route-planning agents shaved 11 % off average delivery time in the first 6 weeks of pilot (Futunn news wire, 1 Sep 2025).
  • Enterprise SaaS connectors – At least 3 Chinese CRM vendors have rolled LongCat into ticket deflection bots, citing the $0.70/million token rate as the decisive factor.

5. How to access or reproduce

LongCat-Flash-Chat is Apache-2.0 licensed and distributed through:

  • GitHub: github.com/meituan-longcat/LongCat-Flash-Chat
  • Hugging Face: meituan-longcat/LongCat-Flash-Chat
  • Docker images: longcat.ai/inference:latest (includes SGLang backend)

Weights are BF16 shards totaling 1.1 TB; an 8×H800 node loads in ~9 minutes. A quantized INT8 variant drops VRAM usage to 320 GB without measurable accuracy loss on the reported benchmarks.
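
For readers who want to poke at the weights directly, below is a minimal Hugging Face transformers load sketch. It assumes the repo id listed above, that the custom MoE architecture requires trust_remote_code=True, and that the accelerate package is installed for device sharding; exact flags may differ from Meituan’s reference setup (the SGLang image above is the documented serving path).

```python
# Minimal load sketch using Hugging Face transformers.
# Assumes the repo id from the article; custom MoE architectures
# typically ship modeling code, hence trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "meituan-longcat/LongCat-Flash-Chat"

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,   # weights ship as BF16 shards (~1.1 TB)
    device_map="auto",            # shard across available GPUs (needs accelerate)
    trust_remote_code=True,
)

prompt = "Summarize the key design choices of a Mixture-of-Experts model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```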

6. What to watch next

The team’s roadmap – outlined in the same arXiv report – lists:

  • 256 k context fine-tune (no YaRN)
  • Tool-calling grammar compiler (targeting OpenAI-compatible endpoints)
  • Community LoRA hub under discussion

If you are benchmarking MoE models for production, LongCat-Flash-Chat now sits at the intersection of lowest publicly documented dollar-per-token cost and top-quartile agent-task performance.


What makes LongCat-Flash-Chat special compared with other 560 B models?

Its Mixture-of-Experts (MoE) design keeps only 18.6 B–31.3 B parameters active per token (average ~27 B) while storing 560 B in total. That is roughly 18–30× less active compute than a dense model of the same size, yet it still hits competitive scores: 39.5 on TerminalBench and 67.7 on τ²-Bench. In practical terms, inference costs drop to ≈ $0.70 per million tokens – a price point few open-source models at this scale have reached.
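
A quick back-of-the-envelope check of those numbers (the 6,000-token request size is a hypothetical example):

```python
# Active-compute ratio and API cost, using the figures in the article.
total_params = 560e9
active_lo, active_hi = 18.6e9, 31.3e9

print(f"active-compute reduction: {total_params/active_hi:.0f}x "
      f"to {total_params/active_lo:.0f}x")          # ~18x to ~30x

price_per_million = 0.70                             # USD per 1M tokens
request_tokens = 6_000                               # hypothetical chat turn
print(f"cost per request: ${request_tokens/1e6*price_per_million:.5f}")
# -> $0.00420 for a 6k-token request
```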

Is the model really open-source for commercial use?

Yes. Meituan published the weights, tokenizer, config files, and a detailed 70-page technical report under a permissive license. You can download everything from GitHub, Hugging Face, or the official site longcat.ai without registration fees or usage restrictions.

How fast is inference in production?

Official benchmarks show >100 tokens/s on a single H800 GPU with >90 % speculative acceptance. At that speed, a 2,000-token chat turn streams back in under 20 seconds.
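
For intuition on why that acceptance rate matters, here is the standard speculative-decoding throughput estimate; the 4-token draft length is hypothetical, since the article does not specify LongCat’s decoding configuration.

```python
def expected_tokens_per_pass(p: float, k: int) -> float:
    """E[tokens emitted per target-model pass] with draft length k and
    per-token acceptance probability p (the i.i.d.-acceptance estimate
    from the speculative decoding literature)."""
    return (1 - p ** (k + 1)) / (1 - p)

# Hypothetical numbers: 90% acceptance, 4-token drafts.
speedup = expected_tokens_per_pass(0.90, 4)
print(f"~{speedup:.2f} tokens per pass")   # ~4.10 -> up to ~4x decode speedup
```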

Can it handle long documents?

The context window extends to 32 k tokens out of the box and 128 k tokens after a light continued-pre-training step on ~100 B tokens – no YaRN or other long-context tricks required. Early adopters report accurate summarisation of 80-page PDFs in one pass.

What real-world tasks is it already solving inside Meituan?

Inside Meituan’s own stack the model powers:

  • Logistics route planning – optimising millions of delivery paths nightly
  • Customer-support agents – handling 40 % of chat volume with higher CSAT than the previous pipeline
  • Code generation – internal surveys show 28 % faster merge-request (MR) merge times when developers use the built-in coding assistant

These workloads run on the same open weights, proving the efficiency claims are not just lab numbers.

Serge Bulaev

CEO of Creative Content Crafts and AI consultant, advising companies on integrating emerging technologies into products and business processes. Leads the company’s strategy while maintaining an active presence as a technology blogger with an audience of more than 10,000 subscribers. Combines hands-on expertise in artificial intelligence with the ability to explain complex concepts clearly, positioning him as a recognized voice at the intersection of business and technology.
