Launching Not Diamond

July 30, 2024

Today, after months of development and testing, we’re making Not Diamond generally available. Not Diamond is the world’s most powerful AI model router: it automatically determines which LLM is best suited to respond to any query, improving LLM output quality while reducing costs and latency. Not Diamond is available to use out of the box through our API, and we are also releasing an interface that allows developers to train their own custom routers using their data.

Not Diamond sets new state-of-the-art standards on major benchmarks like GPQA, MMLU, HumanEval, Arena Hard, and Big Bench Hard. For any distribution of data, no model will outperform every other model on every single query. By combining multiple LLMs into a “meta-model” that learns when to call each LLM, we can outperform every individual model by routing effectively between them.
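To make the "meta-model" intuition concrete, here is a minimal sketch in Python. It is not Not Diamond's routing method. The model names and per-query outcomes are hypothetical values made up for illustration, and the router shown is an idealized oracle that always picks a model that answers correctly. A learned router would land somewhere between the best single model and this upper bound.

```python
from typing import Dict, List

# Hypothetical per-query correctness (1 = correct, 0 = wrong) for three
# made-up models on five made-up queries. Illustrative values only,
# not benchmark results.
CORRECT: Dict[str, List[int]] = {
    "model_a": [1, 1, 0, 0, 1],
    "model_b": [0, 1, 1, 0, 1],
    "model_c": [1, 0, 0, 1, 0],
}


def accuracy(outcomes: List[int]) -> float:
    """Fraction of queries answered correctly."""
    return sum(outcomes) / len(outcomes)


def best_single_model(correct: Dict[str, List[int]]) -> str:
    """Name of the model with the highest overall accuracy."""
    return max(correct, key=lambda name: accuracy(correct[name]))


def oracle_routed_outcomes(correct: Dict[str, List[int]]) -> List[int]:
    """Per query, an idealized router succeeds if any model answers correctly."""
    return [max(per_query) for per_query in zip(*correct.values())]


best = best_single_model(CORRECT)
routed = oracle_routed_outcomes(CORRECT)

print("best single model:", best, accuracy(CORRECT[best]))
print("oracle-routed accuracy:", accuracy(routed))
```

Because no single model in this toy data is correct on every query, routing per query can do at least as well as the best individual model, and often better.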

We understand that benchmarks are only weakly correlated with real-world performance. However, these results powerfully illustrate how for any distribution of data—including some of the most challenging benchmarks known to exist—Not Diamond will learn to route between LLMs to outperform each of them individually. AI model routing can significantly reduce costs and latency by leveraging smaller models without sacrificing quality. More meaningful, however, is the new frontier it opens for the performance and generalizability of LLMs.

Almost every day heralds the release of a new state-of-the-art LLM. Identifying the right model to use for a given application is incredibly challenging, and developers are forced to navigate a broad and confusing landscape of models with various strengths and weaknesses at varying price points, latencies, and context windows. We started working on Not Diamond after speaking with hundreds of developers trying and failing to solve this problem themselves. We knew there had to be a better way.

Thirty years ago, Yahoo tried to build the everything website: a single place on the internet where you could go for anything you needed. Google, on the other hand, took the opposite bet: that the future of the internet would be incredibly fragmented. And so they built a router, from search queries to websites, and ended up becoming the single “meta-website” for the entire internet.

We believe there’s a similar opportunity to build a “meta-model” for AI: a single interface between the application layer and the inference layer. Not only do we think this will be a similarly important and powerful part of the ecosystem to build, but to the extent we can shift the future towards one in which we leverage networks of specialized models rather than using a single giant monolithic model for everything, we can create a safer world to live in. It’s well established that small, specialized models can outperform larger models on narrow domains. Routing gives specialized models the robustness of general ones. This is not only more computationally efficient; we get significant interpretability and safety benefits as well.

We’re honored not only to launch Not Diamond today but also to announce our $2.3M pre-seed round led by Defy with backing from some of the world’s leading AI scientists, engineers, and executives: Jeff Dean (Google), Julien Chaumond (Hugging Face), Zack Kass (OpenAI), Ion Stoica (Anyscale, Databricks), Tom Preston-Werner (GitHub), Scott Belsky (Adobe), Jeff Weiner (LinkedIn), Eoghan McCabe (Intercom), Alex Chung (Giphy), Carl Rivera (Shopify), John Kim (PayPal), Nadim Hossain (Databricks), Amir Haghighat (Baseten), Aman Khan (Arize AI), Grant Miller (Replicated), and many more, along with additional institutional participation from Inovia Capital, 640 Oxford, VitalStage Ventures, and Karman VC.

If you’d like to try out Not Diamond, you can get started in less than five minutes. We’d love to hear what you think.