
Way to run DeepSeek's 671B AI model without expensive GPUs discovered

Image source: Aristal, Pixabay
Hugging Face engineer Matthew Carrigan recently shared on X a method for running DeepSeek's advanced R1 model locally with 8-bit quantization and without expensive GPUs, at a reported hardware cost of about $6,000. The key? Plenty of memory rather than vast reserves of computing power.

Launched on January 20, 2025, DeepSeek-R1 is a 671B parameter Mixture-of-Experts (MoE) model with 37B active parameters per token. Designed for advanced reasoning, it supports 128K token inputs and generates up to 32K tokens. Thanks to its MoE architecture, it delivers top-tier performance while using fewer resources than traditional dense models.
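To picture how that MoE routing works, here is a minimal Python sketch of top-k expert routing: a router picks a few experts per token, so only a fraction of the total parameters is exercised at each step. The expert count, hidden size, and top-k value are illustrative placeholders, not DeepSeek-R1's actual configuration.

```python
import numpy as np

# Toy Mixture-of-Experts layer: the router picks the top-k experts per token,
# so only a fraction of the total parameters is used for any given token.
# Sizes here are illustrative placeholders, not DeepSeek-R1's real config.
num_experts, top_k, hidden = 8, 2, 16
rng = np.random.default_rng(0)

router_w = rng.standard_normal((hidden, num_experts))                 # router weights
experts = [rng.standard_normal((hidden, hidden)) for _ in range(num_experts)]

def moe_forward(token_vec):
    scores = token_vec @ router_w              # router logits, one per expert
    chosen = np.argsort(scores)[-top_k:]       # indices of the top-k experts
    gate = np.exp(scores[chosen])
    gate /= gate.sum()                         # softmax over the chosen experts
    # Only the selected experts run; all others are skipped entirely.
    return sum(g * (token_vec @ experts[i]) for g, i in zip(gate, chosen))

out = moe_forward(rng.standard_normal(hidden))
print(out.shape)  # (16,) -- one token processed using just 2 of the 8 experts
```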

Independent testing suggests that the R1 language model achieves performance comparable to OpenAI's o1, positioning it as a competitive alternative in high-stakes AI applications. Let's find out what we need to run it locally.

The hardware

This build centers on dual AMD EPYC CPUs and 768GB of DDR5 RAM, with no expensive GPUs needed.
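A quick back-of-the-envelope calculation shows why memory, not compute, is the constraint: at 8-bit quantization each parameter takes roughly one byte, so the weights alone need about 670GB, which just fits into 768GB. The sketch below works that out; the headroom left for the KV cache and operating system is an estimate, not a measured figure.

```python
# Rough sizing check for running DeepSeek-R1 at Q8 entirely from RAM.
total_params = 671e9            # total parameters
bytes_per_param = 1             # ~1 byte per parameter at 8-bit (Q8) quantization
weights_gb = total_params * bytes_per_param / 1e9
ram_gb = 768                    # installed DDR5
print(f"weights: ~{weights_gb:.0f} GB")                        # ~671 GB, i.e. the ~700 GB download
print(f"headroom: ~{ram_gb - weights_gb:.0f} GB for KV cache and OS")  # estimate, not measured
```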

Software & Setup

Once the system is assembled, Linux and llama.cpp need to be installed in order to run the model. A crucial BIOS tweak, setting the number of NUMA groups to 0, doubles RAM efficiency for better performance. The full ~700GB of DeepSeek-R1 weights can then be downloaded from Hugging Face.
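As a rough illustration of that flow, the sketch below fetches quantized weights with the Hugging Face hub client and loads them through the llama-cpp-python bindings (a Python wrapper around llama.cpp, used here instead of the command-line tools). The repository ID is a placeholder, and the context size and thread count are assumptions to tune for your own hardware.

```python
# Sketch: fetch Q8 GGUF weights and load them via llama-cpp-python.
# The repo ID and file pattern are placeholders; n_ctx and n_threads are
# assumptions to adjust for the machine at hand.
import glob
import os

from huggingface_hub import snapshot_download
from llama_cpp import Llama

model_dir = snapshot_download(
    repo_id="your-org/DeepSeek-R1-Q8-GGUF",      # placeholder repository ID
    allow_patterns=["*Q8_0*.gguf"],              # download only the Q8 shards
    local_dir="models/deepseek-r1-q8",
)

# Multi-part GGUF models are loaded by pointing at the first shard;
# llama.cpp picks up the remaining files automatically.
first_shard = sorted(glob.glob(os.path.join(model_dir, "*Q8_0*.gguf")))[0]

llm = Llama(
    model_path=first_shard,
    n_ctx=8192,       # context window; raise it if RAM allows
    n_threads=64,     # tune to the machine's core count
)

result = llm("Explain Mixture-of-Experts routing in one sentence.", max_tokens=128)
print(result["choices"][0]["text"])
```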

Performance

This setup generates 6-8 tokens per second—not bad for a fully local high-end AI model. It skips GPUs entirely, but that's intentional. Running Q8 quantization (for high quality) on GPUs would require 700GB+ of VRAM, costing over $100K. Despite its raw power, the entire system consumes under 400W, making it surprisingly efficient.
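That figure lines up with a simple memory-bandwidth model: each decoding step only has to stream the ~37B active parameters from RAM, so throughput is roughly bandwidth divided by active bytes per token. The estimate below assumes an effective bandwidth value for a dual-socket DDR5 system; it is a ballpark, not a measurement.

```python
# Back-of-the-envelope decode-rate estimate for CPU-only MoE inference.
# The bandwidth value is an assumed ballpark for a dual-socket DDR5 EPYC
# system, not a measured number.
active_params = 37e9        # parameters activated per token (MoE)
bytes_per_param = 1         # ~1 byte per parameter at Q8
bandwidth = 300e9           # assumed effective RAM bandwidth in bytes/s
tokens_per_second = bandwidth / (active_params * bytes_per_param)
print(f"~{tokens_per_second:.1f} tokens/s")  # ~8.1, the same order as the observed 6-8
```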

For those who want full control over frontier AI, with no cloud and no restrictions, this is a game changer. It proves that high-end AI can be run locally and fully open source while prioritizing data privacy, minimizing exposure to breaches, and eliminating reliance on external systems.

Daniel Miron, 2025-02-04 (Update: 2025-02-04)