
Groq presents specialized language processing unit significantly faster than Nvidia's AI accelerators

Groq LPU (Image Source: Groq)
The LPU Inference Engine from Groq is designed to be considerably faster than GPGPUs at processing LLM data. To achieve this, the LPU is tailored to the sequential nature of language processing and is paired with on-chip SRAM instead of DRAM or HBM.

While Nvidia is currently enjoying outstanding profits as it rides the AI wave fueled by the increasing demand for compute GPUs, the market could become more decentralized as more companies step in to provide viable alternative AI processors. We have already seen efforts from several companies in this regard, including AMD, d-Matrix, OpenAI and Samsung. Notably, quite a few engineers who helped design Google’s tensor processing unit (TPU) are now involved in independent AI projects that promise to outclass Nvidia’s solutions. Samsung, for instance, recently announced that its new AGI Computing Lab opening in Silicon Valley is led by former Google TPU developer Dr. Woo Dong-hyuk. Another key engineer who helped develop the Google TPU is Jonathan Ross, now the CEO of a new company called Groq. Drawing on the experience accumulated at Google, Ross brings innovation to the AI accelerator market with the world’s first Language Processing Unit (LPU).

Groq's LPU is specifically designed to process large language models (LLMs) and has clear advantages over general-purpose GPUs or NPUs. Groq initially developed the Tensor Stream Processor (TSP), which was later rebranded as the Language Processing Unit to reflect its focus on inference for generative AI workloads. Because it targets LLMs exclusively, the LPU is much more streamlined than a GPGPU, allowing for simplified scheduling hardware with lower latency, sustained throughput and increased efficiency.

Consequently, the LPU reduces the time needed to compute each word, so sequences of text can be generated much faster. Another key improvement is that the LPU eliminates the need for expensive HBM, relying instead on just 230 MB of SRAM per chip with 80 TB/s of bandwidth, which makes it considerably faster than traditional GPGPU solutions. Groq's architecture is also scalable: multiple LPUs can be interconnected to provide more processing power for larger, more complex LLMs.
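
To illustrate why memory bandwidth matters so much here, the sketch below gives a rough, roofline-style upper bound on decode throughput in Python. Only the 80 TB/s SRAM bandwidth comes from Groq's figures; the model size, weight precision and HBM bandwidth are assumptions chosen purely for illustration.

```python
# Illustrative estimate: autoregressive decoding is typically memory-bound,
# because every generated token must stream the model weights once.
# Only the 80 TB/s SRAM figure is from the article; everything else is assumed.

def tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode throughput if each token reads all weights once."""
    return bandwidth_bytes_per_s / model_bytes

# Assumed example model: 70B parameters at 1 byte per parameter (8-bit weights).
model_bytes = 70e9

hbm_bw = 3.35e12   # ballpark HBM bandwidth of a current high-end accelerator (assumed)
sram_bw = 80e12    # 80 TB/s per-chip SRAM bandwidth cited for the LPU

print(f"HBM-bound limit:  ~{tokens_per_second(model_bytes, hbm_bw):.0f} tokens/s")
print(f"SRAM-bound limit: ~{tokens_per_second(model_bytes, sram_bw):.0f} tokens/s")
# Note: a single LPU holds only 230 MB of SRAM, so a model of this size would
# have to be sharded across many interconnected chips, as the article notes.
```

This is only a first-order bound; in practice, scaling across many chips, interconnect overhead and compute limits all shave off some of that headroom.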

To demonstrate how much faster the LPU Inference Engine is compared to GPUs, Groq provides a video comparison of its own chatbot, which can switch between the Llama 2 and Mixtral LLMs, against OpenAI's ChatGPT. Groq claims that the LPU generates the text in a fraction of a second, with roughly three quarters of the total response time spent retrieving relevant information rather than generating tokens.
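
Readers who want to run a similar speed check themselves could time a completion against Groq's hosted inference service. The minimal sketch below uses Groq's OpenAI-style Python client; the package name, model identifier and response fields reflect the public API at the time of writing and are best treated as assumptions that may change.

```python
# Minimal sketch: time a single completion against Groq's hosted LPU inference.
# Assumes the `groq` Python package is installed and GROQ_API_KEY is set;
# the model id "mixtral-8x7b-32768" is an assumption and may change.
import os
import time

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

start = time.perf_counter()
response = client.chat.completions.create(
    model="mixtral-8x7b-32768",
    messages=[{"role": "user", "content": "Explain what an LPU is in two sentences."}],
)
elapsed = time.perf_counter() - start

text = response.choices[0].message.content
tokens = response.usage.completion_tokens  # OpenAI-style usage accounting
print(f"{tokens} tokens in {elapsed:.2f} s (~{tokens / elapsed:.0f} tokens/s)")
print(text)
```

Because the client mirrors OpenAI's chat-completions interface, the same script pointed at ChatGPT's API makes for a rough side-by-side latency comparison like the one in Groq's video.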

Single chip accelerator (Image Source: Groq)

Bogdan Solca, 2024-02-28 (Update: 2024-02-28)