Baseten built the world's fastest API for GLM-5.2
It achieves over 280 tokens per second through optimizations like NVFP4 quantization, KV-aware routing, PD disaggregation, and Multi-Token Prediction.
AI model inference platform
baseten.co
In short: Baseten expanded its inference platform with frontier model support and technical optimizations while entering reported talks for a $1 billion raise.
Revenue grew 20x and inference volume 40x, with participation from Altimeter, Conviction, and Spark.
The feature updates models incrementally without doubling GPU spend and offers pause, resume, and rollback controls.
Mercury 2 runs at over 1,000 tokens per second on NVIDIA GPUs at half the cost of comparable models, with a 90% cost reduction for Augment Code.
The round was led by U.S.-based investors Sands Capital and Wellington Management, Baseten said late on Monday...
AI startup Baseten has recently been in talks with investors to raise $1 billion at an $11 billion valuation including the money, according to a...
Nvidia invested $150 million in Baseten, which helps companies deploy and run large AI models.