The Next Big Shift in AI: Why It’s Not About Nvidia, According to Groq’s Founder
The artificial intelligence (AI) revolution has been synonymous with Nvidia for years. Its graphics processing units (GPUs) have powered the training of massive AI models, cementing Nvidia’s dominance with an 80% share of the high-end chip market and a market cap soaring past $3 trillion. But Jonathan Ross, founder and CEO of Groq, argues that the future of AI isn’t about Nvidia—or GPUs at all. In a recent Yahoo Finance Opening Bid podcast, Ross outlined why the next big shift in AI will be driven by inference, efficiency, and innovation beyond Nvidia’s ecosystem. Here’s why his vision matters and what it means for the AI landscape.
A New Era: From Training to Inference
AI development has two core phases: training, where models learn from vast datasets, and inference, where trained models generate responses or make predictions in real-world applications. Nvidia’s GPUs excel at training, crunching through enormous datasets with brute computational force. However, Ross believes the future lies in inference, as AI models are increasingly deployed in real-time applications like chatbots, medical diagnostics, and autonomous systems.
Groq’s Language Processing Units (LPUs) are designed specifically for inference, offering speed, affordability, and energy efficiency that GPUs can’t match for this purpose. Unlike Nvidia’s chips, which were originally built for graphics and later adapted for AI, LPUs are purpose-built for running large language models (LLMs). Ross claims Groq’s LPUs are up to four times faster, five times cheaper, and three times more energy-efficient than Nvidia’s GPUs for inference tasks. This focus on inference taps into a growing market—estimated at $39 billion in 2024 and projected to reach $60.7 billion by 2028—where speed and cost are critical.
The Bottleneck of GPUs
Nvidia’s GPUs, while powerful, face limitations in inference. Their architecture forces a trade-off between throughput (processing many tokens across a large batch of users) and interactivity (delivering fast responses to an individual user). Ross argues that Nvidia’s latest Blackwell chip, though a leap forward, still tops out around 50 tokens per second per user for interactive tasks, too slow for next-generation real-time AI such as conversational agents or digital twins. Groq’s LPUs, by contrast, are designed to sidestep this trade-off: an “assembly line” architecture partitions a model across multiple chips so that tokens stream through the pipeline with minimal queuing.
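To make the trade-off concrete, here is a toy model in Python. It is a sketch under stated assumptions, not vendor data and not Groq’s actual scheduling: it simply assumes a device emits some aggregate number of tokens per second and splits that evenly across a batch, with numbers made up to echo the 50 and 500 tokens-per-second figures in this article.

```python
# Toy model of the throughput-vs-interactivity trade-off. Assumption: a
# device produces some aggregate tokens/sec and divides it evenly across
# the batch of users it is serving. All numbers are illustrative.

def per_user_rate(aggregate_tps: float, batch_size: int) -> float:
    """Tokens/sec that each individual user sees."""
    return aggregate_tps / batch_size

# Assumed batch-oriented GPU: high total throughput, large batch.
gpu = per_user_rate(aggregate_tps=12_800, batch_size=256)

# Assumed pipelined LPU setup: the model is partitioned across chips so
# each request streams tokens without queuing behind a large batch.
lpu = per_user_rate(aggregate_tps=4_000, batch_size=8)

print(f"batch-oriented: {gpu:.0f} tokens/sec per user")  # ~50
print(f"pipelined:      {lpu:.0f} tokens/sec per user")  # ~500
```

The point of the toy model is that two designs can post similar aggregate throughput while one of them feels an order of magnitude faster to each individual user.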
Moreover, Nvidia’s GPUs are power-hungry, with some models targeting 1,000 watts per chip. That raises sustainability concerns, as data centers already consume vast amounts of electricity and water. Groq’s LPUs are built around on-chip SRAM, which Groq claims is roughly 100x faster than the HBM attached to GPUs, letting them deliver high performance at lower power consumption and making them a greener alternative. As businesses and governments prioritize energy efficiency, this could give Groq a competitive edge.
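A back-of-the-envelope calculation shows why memory speed dominates here: generating one token autoregressively streams roughly every model weight through the processor once, so tokens per second per user is capped near memory bandwidth divided by model size. The bandwidth and model figures below are assumed round numbers for illustration, not vendor specifications.

```python
# Why bandwidth caps interactive speed: each generated token reads
# (roughly) all model weights once, so tokens/sec <= bandwidth / model size.
# All figures are assumed round numbers, not vendor specs.

def tokens_per_sec_ceiling(bandwidth_tb_s: float, params_billions: float,
                           bytes_per_param: int = 2) -> float:
    model_bytes = params_billions * 1e9 * bytes_per_param  # 16-bit weights
    return bandwidth_tb_s * 1e12 / model_bytes

# Assumed 70B-parameter model in 16-bit precision: ~140 GB read per token.
print(f"HBM-class (~3 TB/s assumed):    {tokens_per_sec_ceiling(3, 70):.0f} tok/s ceiling")
print(f"SRAM fabric (~80 TB/s assumed): {tokens_per_sec_ceiling(80, 70):.0f} tok/s ceiling")
```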
The Printing Press of AI
Ross likens today’s AI landscape to the “printing press era”—a nascent stage where the technology’s potential is just beginning to unfold. He predicts that LLMs will soon make so few mistakes that they’ll be reliable enough for high-stakes fields like medicine and law. More excitingly, he foresees AI models evolving from picking “probable” answers to inventing novel solutions, much like Albert Einstein’s breakthroughs in physics. This shift from replication to creation could unlock new drugs, scientific discoveries, and creative outputs—use cases that demand fast, efficient inference.
Groq’s LPUs are positioned to power this transition. By focusing on smaller models (up to 70 billion parameters) and achieving speeds of 500–750 tokens per second, Groq enables real-time applications that Nvidia’s GPUs struggle to support. For instance, a demo by HyperWrite’s CEO showcased Groq serving Mixtral at nearly 500 tokens per second, producing “pretty much instantaneous” responses. Such performance opens doors to use cases like real-time financial trading, autonomous vehicles, and healthcare diagnostics, where latency is a dealbreaker.
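Simple arithmetic shows what those rates mean for a user waiting on a complete answer; the reply length below is an assumed figure.

```python
# Assumed reply length; token rates taken from the figures cited above.
RESPONSE_TOKENS = 300

for label, tps in [("~50 tok/s (interactive GPU figure)", 50),
                   ("~500 tok/s (Groq Mixtral demo)", 500)]:
    print(f"{label}: {RESPONSE_TOKENS / tps:.1f} s to stream a full reply")
```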
Challenging Nvidia’s Ecosystem
Nvidia’s dominance isn’t just about hardware; it’s about its CUDA software platform, which has locked developers into its ecosystem. Groq counters this by making its platform developer-friendly, offering OpenAI-compatible APIs that require just three lines of code to switch from other providers. Its GroqCloud platform, launched in 2024, hosts open-source LLMs like Meta’s Llama, and independent benchmarks by ArtificialAnalysis.ai confirm Groq’s superior speed for these models. Ross boldly predicted in 2024 that most startups would adopt Groq’s LPUs by year’s end, citing their cost and performance advantages.
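For illustration, here is roughly what that switch looks like. This is a minimal sketch assuming the OpenAI Python SDK (openai>=1.0) and Groq’s documented OpenAI-compatible endpoint; the model ID is a placeholder to check against GroqCloud’s current catalog.

```python
# Sketch of pointing an OpenAI-style app at GroqCloud. Assumes the
# OpenAI Python SDK and Groq's OpenAI-compatible endpoint; the model ID
# is a placeholder -- check GroqCloud for current model names.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",  # changed: point at GroqCloud
    api_key=os.environ["GROQ_API_KEY"],         # changed: Groq key, not OpenAI's
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",               # changed: a Groq-hosted model
    messages=[{"role": "user", "content": "Why does inference speed matter?"}],
)
print(response.choices[0].message.content)
```

Relative to an OpenAI deployment, only the base_url, api_key, and model values change, which is the substance of the three-lines claim.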
However, challenging Nvidia is no small feat. Nvidia’s roadmap, accelerated by AI-driven chip design, keeps competitors on their toes. Critics argue that Groq’s LPUs, while fast for smaller models, may face scalability issues with trillion-parameter models. Additionally, widespread adoption requires convincing developers to optimize for a new architecture, a hurdle given Nvidia’s entrenched ecosystem. Ross acknowledges this, noting, “We’re nowhere near Nvidia yet,” but sees an opening as businesses seek alternatives to Nvidia’s high costs and supply constraints.
Groq’s Momentum
Groq’s traction is undeniable. Founded in 2016 by Ross, a former Google engineer who co-designed the Tensor Processing Unit (TPU), the company has raised over $1 billion, including a $640 million Series D round in August 2024 led by BlackRock that valued it at $2.8 billion (Yahoo Finance has since reported a $3.5 billion valuation). Partnerships with Samsung for 4nm chip manufacturing, Carahsoft for government contracts, and Earth Wind & Power for European data centers signal ambitious scaling plans. Groq aims to deploy over 108,000 LPUs by Q1 2025, bolstered by recent additions such as Meta’s Yann LeCun as a technical advisor and former Intel executive Stuart Pann as COO.
Ross’s vision extends beyond competing with Nvidia. His recent trip to Saudi Arabia with tech executives and President Donald Trump underscores Groq’s global ambitions, including a $1.5 billion funding deal with the Kingdom to expand infrastructure. This aligns with Groq’s mission to democratize AI compute, making it accessible to startups, enterprises, and governments—not just tech giants.
The Road Ahead
Can Groq dethrone Nvidia? Probably not anytime soon. Nvidia’s ecosystem, scale, and innovation pace are formidable. But Groq doesn’t need to topple Nvidia to succeed—it needs to carve out a niche in the inference market, where demand for speed, affordability, and sustainability is surging. As Ross puts it, “Compute is the new oil,” and Groq’s LPUs are poised to fuel a wave of AI applications that GPUs weren’t designed for.
The AI shift Ross envisions is about more than chips; it’s about enabling a future where AI invents, solves, and transforms. If Groq delivers on its promise, it could redefine how we interact with AI, making Nvidia’s dominance a chapter in the story, not the conclusion. For now, as Ross says, we’re at the beginning of AI’s printing press era—and Groq is writing its own page.
Sources:
Yahoo Finance, “Groq’s founder on why AI’s next big shift isn’t about Nvidia”
Yahoo Finance, “Nvidia rival Groq makes a bold prediction on what’s next for AI”
VentureBeat, “AI chip race: Groq CEO takes on Nvidia”
Forbes, “The AI Chip Boom Saved This Tiny Startup”
Groq, “What NVIDIA Didn’t Say”
Trajectory Ventures on X