For most of the last decade, progress in artificial intelligence followed a simple rule: bigger is better. More parameters, more training data, more compute — each jump in scale unlocked new capabilities, and the leaderboard belonged to whoever could afford the largest model. That era is not over, but it is no longer the only story worth telling.
The case for going small
Small language models, often a fraction of the size of their general-purpose cousins, are quietly outperforming much larger systems on narrow, well-defined tasks. A customer-service bot trained only on a company’s own documentation does not need to know the history of the Roman Empire. It needs to answer questions about return policies accurately, quickly, and cheaply — and a focused model often does that better than a sprawling one.
The economics make the case on their own. Running a small model can cost a fraction of what a frontier-scale model costs per query, and it can run on far more modest hardware, sometimes even on-device. For a business deciding whether to embed AI into a product, the difference between cents and fractions of a cent per interaction, multiplied across millions of users, is not a rounding error. It is the difference between a sustainable feature and a budget line that gets cut.
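To make that arithmetic concrete, here is a minimal back-of-the-envelope sketch. Every figure in it (the per-query prices, the user count, the queries per user) is a hypothetical placeholder chosen for illustration, not a quoted price from any provider. Costs are kept in whole micro-dollars so the totals stay exact.

```python
# Back-of-the-envelope cost comparison. All numbers are hypothetical.
MICRO_DOLLARS_PER_DOLLAR = 1_000_000


def monthly_cost_usd(cost_per_query_micro: int, queries_per_user: int, users: int) -> int:
    """Return the monthly bill in whole dollars.

    cost_per_query_micro is the price of one query in micro-dollars
    (1 dollar = 1,000,000 micro-dollars), so the math stays in integers.
    """
    total_micro = cost_per_query_micro * queries_per_user * users
    return total_micro // MICRO_DOLLARS_PER_DOLLAR


# Assumed prices: 2 cents per query for a large model,
# a fifth of a cent per query for a small one.
LARGE_COST_MICRO = 20_000  # $0.02
SMALL_COST_MICRO = 2_000   # $0.002

USERS = 5_000_000          # assumed monthly active users
QUERIES_PER_USER = 10      # assumed queries per user per month

large_bill = monthly_cost_usd(LARGE_COST_MICRO, QUERIES_PER_USER, USERS)
small_bill = monthly_cost_usd(SMALL_COST_MICRO, QUERIES_PER_USER, USERS)
savings = large_bill - small_bill

print(f"large model: ${large_bill:,} per month")
print(f"small model: ${small_bill:,} per month")
print(f"difference:  ${savings:,} per month")
```

Under these assumed numbers the large model costs $1,000,000 a month and the small one $100,000. The absolute figures are invented; the point is that a tenfold gap per query becomes a tenfold gap in the monthly bill at any scale.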
What gets lost — and what doesn’t
Smaller models do trade away some general knowledge and some of the emergent reasoning ability that shows up at larger scale. They are not going to replace frontier models for open-ended research, creative writing across arbitrary domains, or tasks that require synthesizing unfamiliar information on the fly.
But a large share of real-world AI use is not that. It is classification, extraction, summarization within a known domain, and structured assistance: tasks where a smaller, fine-tuned model can match or exceed a general-purpose system, because its limited capacity goes entirely to the task at hand rather than to knowledge it does not need.
The practical shift already underway
Several companies building AI products report a similar pattern: prototype with the largest available model to establish what is possible, then, once the task is well understood, narrow down to the smallest model that reliably hits the quality bar. The large model becomes a research tool; the small model becomes the product.
This has quiet implications for accessibility too. Small models can run locally, without sending data to a third-party server, which matters for privacy-sensitive applications in health care, legal work, and finance. They can also run in places without reliable high-bandwidth internet, expanding who can realistically build on AI at all.
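The narrowing-down step described in this section can be sketched as a simple selection loop: score each candidate on a fixed evaluation set, smallest first, and stop at the first one that clears the bar. The code below is an illustrative sketch, not any company's actual process. The three "models" are toy keyword rules standing in for real systems, and the names `EXAMPLES`, `CANDIDATES`, and `smallest_passing` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Predictor = Callable[[str], str]
Candidate = Tuple[str, int, Predictor]  # (name, relative size, predict function)

# A tiny hand-written evaluation set for a hypothetical "is this a refund request?" task.
EXAMPLES: List[Tuple[str, str]] = [
    ("How do I get a refund?", "refund"),
    ("Can I return these shoes?", "refund"),
    ("What are your opening hours?", "other"),
    ("I want my money back", "refund"),
    ("Where is my order?", "other"),
]


def keyword_model(keywords: List[str]) -> Predictor:
    """Build a toy stand-in 'model' that answers 'refund' if any keyword appears."""
    def predict(text: str) -> str:
        lowered = text.lower()
        return "refund" if any(k in lowered for k in keywords) else "other"
    return predict


# Toy candidates; the size numbers are arbitrary and only define the ordering.
CANDIDATES: List[Candidate] = [
    ("large", 100, keyword_model(["refund", "return", "money back"])),
    ("tiny", 1, keyword_model([])),  # always answers "other"
    ("small", 10, keyword_model(["refund", "return"])),
]


def accuracy(predict: Predictor, examples: List[Tuple[str, str]]) -> float:
    """Fraction of examples where the prediction matches the label."""
    correct = sum(1 for text, label in examples if predict(text) == label)
    return correct / len(examples)


def smallest_passing(candidates: List[Candidate],
                     examples: List[Tuple[str, str]],
                     bar: float) -> Optional[str]:
    """Return the name of the smallest candidate whose accuracy meets the bar."""
    for name, _size, predict in sorted(candidates, key=lambda c: c[1]):
        if accuracy(predict, examples) >= bar:
            return name
    return None  # nothing clears the bar; stay with a larger model or rethink the task


print(smallest_passing(CANDIDATES, EXAMPLES, bar=0.8))
```

With the bar at 0.8 the loop settles on the "small" stand-in, which gets four of five examples right. Raising the bar to 0.9 pushes the choice up to "large". A real evaluation would need a far larger and more representative test set, and usually latency and cost thresholds alongside accuracy.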
The bottom line
The frontier-scale race will continue, and it should — those models are where genuinely new capabilities tend to surface first. But the next wave of AI products that actually reach people may not come from whoever has the biggest model. It may come from whoever figures out how small they can go without losing what matters.