Cerebras Brings High-Speed AI Inference to Hugging Face Hub

AI hardware company Cerebras has partnered with Hugging Face to integrate its powerful inference capabilities into the Hugging Face Hub, providing over 5 million developers with access to models running on Cerebras' CS-3 system.

The integration, now available on the Hugging Face platform, delivers inference at more than 2,000 tokens per second. Recent benchmarks show that models like Llama 3.3 70B running on Cerebras' system can exceed 2,200 tokens per second, significantly outperforming leading GPU-based solutions.

"By making Cerebras Inference available through Hugging Face, we are enabling developers to access alternative infrastructure for open source AI models," said Andrew Feldman, CEO of Cerebras.

For developers using the Hugging Face platform, the integration offers a streamlined way to leverage Cerebras' technology. Users can simply select "Cerebras" as their inference provider to instantly access what the companies describe as "one of the industry's fastest inference capabilities."

According to the companies, demand for high-speed, high-accuracy AI inference is growing, particularly for test-time compute and agentic AI applications. Open source models optimized for Cerebras' CS-3 architecture enable faster and more precise AI reasoning, with claimed speed gains of 10 to 70 times over GPU-based solutions.

"Cerebras has been a leader in inference speed and performance, and we're thrilled to partner to bring this industry-leading inference on open source models to our developer community," said Julien Chaumond, CTO of Hugging Face.

Developers can access Cerebras-powered AI inference by selecting supported models on Hugging Face, such as Llama 3.3 70B, and choosing Cerebras as their inference provider.
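
For developers who prefer to make that selection in code rather than through the Hub's model-page widget, the sketch below shows how it might look with the huggingface_hub Python client. It assumes a client library version with inference-provider support and a valid Hugging Face access token; the token value and prompt are placeholders, and the provider string and model ID follow the article's description rather than official documentation.

```python
from huggingface_hub import InferenceClient

# Route the request through Cerebras by naming it as the inference provider.
# "hf_xxx" is a placeholder; substitute your own Hugging Face access token.
client = InferenceClient(provider="cerebras", api_key="hf_xxx")

# Llama 3.3 70B is one of the supported models mentioned in the article.
completion = client.chat_completion(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[
        {"role": "user", "content": "Summarize test-time compute in one sentence."}
    ],
    max_tokens=200,
)

print(completion.choices[0].message.content)
```

Because the response mirrors the familiar chat-completion format, existing Hugging Face client code should need little more than the provider argument changed to switch its requests onto Cerebras hardware.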

By John K. Waters, Campus Technology