notdiamond-0001

December 12, 2023

notdiamond-0001 is our first model router available on Hugging Face. notdiamond-0001 takes any input and determines whether to send it to GPT-3.5 or GPT-4, optimizing for the highest accuracy while drastically reducing your costs and latency.

We’ve spent the last month working to make sure notdiamond-0001 meets the following five criteria:

  • Accurate: Trained on hundreds of thousands of data points from robust, cross-domain evaluation benchmarks, notdiamond-0001 outperforms GPT-4 by a factor of 1.51x when used as a router.
  • Transparent: When you call the notdiamond-0001 endpoint from your application, it returns a label for either GPT-3.5 or GPT-4. You determine which version of each model you want to use and make the calls client-side with your own keys.
  • Fast: notdiamond-0001 will determine which model to call in under 10 ms. That’s less than the time it takes for GPT-4 to stream a single token. And by routing queries to GPT-3.5 when appropriate, you can expect to see drastic net speedups in your model calls.
  • Secure: We don’t store any of your input data or use it for training.
  • Free: notdiamond-0001 is entirely free to download and use. We want to help everyone, everywhere maximize their LLM performance, and we believe making notdiamond-0001 free is the best way to support AI developers around the world.
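
The "Transparent" flow above (the router returns a label, your application makes the model call with its own keys) might be wired up roughly as follows. This is a minimal sketch, not the actual notdiamond-0001 interface: the label strings, the `dispatch` helper, `fake_router`, and the stub clients are all hypothetical stand-ins so the example runs without network access or API keys.

```python
from typing import Callable, Dict

# Hypothetical label spellings. The post says only that the router returns
# a label for GPT-3.5 or GPT-4, not what the label looks like.
LABEL_GPT35 = "gpt-3.5"
LABEL_GPT4 = "gpt-4"


def dispatch(
    prompt: str,
    router: Callable[[str], str],
    clients: Dict[str, Callable[[str], str]],
) -> str:
    """Ask the router for a label, then call the matching client-side function."""
    label = router(prompt)
    if label not in clients:
        raise ValueError(f"router returned unknown label: {label!r}")
    return clients[label](prompt)


def fake_router(prompt: str) -> str:
    # Placeholder rule for illustration only; not the real router's logic.
    return LABEL_GPT4 if len(prompt.split()) > 8 else LABEL_GPT35


# Stub clients. In a real application these would wrap your own model calls,
# made with your own keys and whichever model versions you choose.
clients = {
    LABEL_GPT35: lambda p: f"[gpt-3.5 stub] {p}",
    LABEL_GPT4: lambda p: f"[gpt-4 stub] {p}",
}

short_answer = dispatch("What is 2 + 2?", fake_router, clients)
long_answer = dispatch(
    "Prove that the sum of two even integers is always an even integer.",
    fake_router,
    clients,
)
```

Keeping the router and the model clients as separate callables mirrors the split described above: the routing decision is one step, and the model call stays under your control.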

To get started with notdiamond-0001, download it on Hugging Face.


Why we’re building Not Diamond

We believe in a multi-model future. The world won't have one single, giant model that everyone sends everything to—instead, there will be many foundation models, millions of fine-tuned variants of those models, and countless custom inference engines running on top of them. We believe this is not only a better future for AI, but a safer one as well. We started Not Diamond to enable this multi-model future, starting with safe and robust infrastructure for routing between models.

Why routing? Over the past months, we’ve talked to hundreds of developers and companies building on top of LLMs, from early-stage startups to Fortune 500 companies. For nearly everyone, model routing is a big, hairy, audacious problem. It sucks. Teams are using heuristics to route deterministically with if/else statements and regular expressions, trying to train their own classifiers to route inputs, A/B testing model selections, or handwriting prompts in an attempt to use slow and bulky agents as routers. When teams do manage to get a router working, it frequently breaks whenever the underlying models update. Meanwhile, those who have managed to build functional routers have seen huge gains in their product quality and margins. We decided there had to be a better way.
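
For a sense of why the hand-rolled approach is fragile, a deterministic router built from if/else statements and regular expressions tends to look something like the sketch below. The patterns, the length threshold, and the `heuristic_route` name are invented for illustration and are not taken from any particular team's code.

```python
import re

# Hypothetical keyword patterns that a team might treat as "hard" prompts.
CODE_PATTERN = re.compile(r"\b(def|class|select|traceback)\b", re.IGNORECASE)
MATH_PATTERN = re.compile(r"\b(prove|derive|integral|theorem)\b", re.IGNORECASE)
MAX_EASY_LENGTH = 500  # arbitrary character cutoff, assumed for this sketch


def heuristic_route(prompt: str) -> str:
    """Return a model label using fixed rules only."""
    if CODE_PATTERN.search(prompt):
        return "gpt-4"
    if MATH_PATTERN.search(prompt):
        return "gpt-4"
    if len(prompt) > MAX_EASY_LENGTH:
        return "gpt-4"
    return "gpt-3.5"


easy = heuristic_route("Say hello in French.")
hard = heuristic_route("Prove that the square root of 2 is irrational.")
# A simple question that happens to contain the keyword "class"
# is sent to the larger model anyway.
misrouted = heuristic_route("What class of drug is aspirin?")
```

The last call shows the weakness: keyword rules know nothing about how difficult a prompt actually is, so they misfire on ordinary wording and need manual retuning as prompts and models change.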

If you’re using GPT-4, notdiamond-0001 will lead to an immediate and drastic reduction in your inference costs and latency without any degradation in quality. Or, if you’re using GPT-3.5, you can enjoy a much higher response quality without significantly increasing your bill.

As a team, we've built venture-scale companies, developed products for billions of users, and published cutting-edge research in top AI journals, and we’re excited to be backed by some of the world's best AI developers and founders.

We’re actively hiring, so drop us a line if you want to help us build a multi-model future.