Build without limits

100K free routing requests per month through our API

Discovery

Free

Up to 100K monthly API routing requests

Train one custom router

Intelligent cost and latency tradeoffs

Joint prompt optimization support

Fallback rerouting

Get started

Possibility

$100/mo

Plus $0.001 per API routing request after the first 100K free

Everything in Discovery

Uncapped API routing requests

Unlimited custom routers

Enhanced data privacy with fuzzy hashing

Get started
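
As a rough illustration of the Possibility pricing above, a monthly bill can be sketched as a flat fee plus per-request overage. The helper name `estimate_possibility_bill` is hypothetical, and the reading that the $100 base fee covers the first 100K requests is an assumption, not a confirmed billing rule.

```python
from decimal import Decimal

BASE_FEE = Decimal("100")         # $100/mo, from the plan description
FREE_REQUESTS = 100_000           # first 100K routing requests are free
OVERAGE_RATE = Decimal("0.001")   # dollars per routing request beyond the free quota


def estimate_possibility_bill(requests: int) -> Decimal:
    """Estimate one month's bill on the Possibility plan (hypothetical helper)."""
    if requests < 0:
        raise ValueError("requests must be non-negative")
    # Only requests beyond the free quota are billed per request (assumption).
    billable = max(0, requests - FREE_REQUESTS)
    return BASE_FEE + OVERAGE_RATE * billable


if __name__ == "__main__":
    for n in (50_000, 100_000, 1_000_000):
        print(f"{n:>9,} requests -> ${estimate_possibility_bill(n):.2f}")
```

Under these assumptions, 1,000,000 routing requests in a month would come to $1,000: the $100 base plus $900 for the 900K requests beyond the free quota.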

Necessity

Custom pricing

Everything in Possibility

VPC deployments

Custom integration and router training support

Access and permissions management

Schedule a call

You can also use our chat app for free, or upgrade to pro for $20/month. We also regularly open-source new releases of our base router.

Frequently asked questions

Is Not Diamond a proxy?

Not Diamond is not a proxy. It only recommends which model to use; all requests to LLMs are then made client-side. You can call models through APIs, gateways, or locally, because Not Diamond is agnostic to your request orchestration pipeline.
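
The recommend-then-call flow described above can be sketched roughly as follows. Every name here (`recommend_model`, `call_small_model`, `call_large_model`, `route_and_call`) is a hypothetical stand-in rather than the Not Diamond SDK, and the length-based rule is only a placeholder for a trained router.

```python
from typing import Callable, Dict, List


def recommend_model(prompt: str, candidates: List[str]) -> str:
    """Hypothetical routing step: returns a model name and nothing else.

    It never forwards the prompt to an LLM. The length check below is a
    placeholder heuristic for illustration only.
    """
    return candidates[0] if len(prompt) < 200 else candidates[-1]


# Client-side callers you already own: provider SDKs, a gateway, or a local model.
# These stubs just echo the prompt so the sketch runs without network access.
def call_small_model(prompt: str) -> str:
    return f"[small-model] {prompt}"


def call_large_model(prompt: str) -> str:
    return f"[large-model] {prompt}"


CLIENTS: Dict[str, Callable[[str], str]] = {
    "small-model": call_small_model,
    "large-model": call_large_model,
}


def route_and_call(prompt: str) -> str:
    # Step 1: ask the router which model to use.
    model = recommend_model(prompt, list(CLIENTS))
    # Step 2: make the LLM request yourself, client-side.
    return CLIENTS[model](prompt)


if __name__ == "__main__":
    print(route_and_call("Summarize this sentence."))
```

Because the router returns only a name, swapping a provider SDK for a gateway or a local model means changing an entry in `CLIENTS`, not the routing step.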

How does Not Diamond determine which model to call?

Not Diamond is a highly specialized predictive model optimized for model routing. Trained on a large, cross-domain evaluation dataset, it accurately predicts which LLM will perform best for any input.

Does Not Diamond integrate with my data?

Not Diamond is designed to work seamlessly with your existing data and evaluation pipelines. You can upload any LLM evaluation dataset and within minutes you’ll get back a router optimized to your use case.

When is the right time to try out Not Diamond?

Not Diamond is designed for every stage of the development process. Our users include developers building on our API from day one all the way up to sophisticated enterprise teams routing every request in production.

Can I optimize prompts across different models?

Not Diamond makes it easy to leverage automatic prompt optimization frameworks like DSPy and SAMMO, or to use your own manually developed prompts for each LLM. Not Diamond will learn the best model and prompt combination for each query.

How is this different from simply using a single model?

You can think of Not Diamond as a “meta-model”, an ensemble of all the most powerful LLMs, which beats each individual model on quality while drastically reducing costs and latency.

Will Not Diamond add extra latency to my model calls?

Not Diamond’s routing inference takes under 100 ms, and by routing to faster LLMs when possible you can achieve a net speedup in your LLM calls. To avoid network latency and maximize speed, you can deploy Not Diamond directly to your infrastructure.
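
To see how a routing step can still yield a net speedup, here is a minimal back-of-the-envelope sketch. The function `expected_latency_ms` is hypothetical, and the traffic shares and model latencies in the example are made-up illustrative numbers, not measurements.

```python
from typing import Dict, Tuple


def expected_latency_ms(
    routing_overhead_ms: float,
    mix: Dict[str, Tuple[float, float]],
) -> float:
    """Average end-to-end latency when requests are split across models.

    `mix` maps a model name to (share of traffic, typical latency in ms).
    """
    total_share = sum(share for share, _ in mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    # Every request pays the routing overhead, then the latency of its model.
    return routing_overhead_ms + sum(share * ms for share, ms in mix.values())


if __name__ == "__main__":
    baseline_ms = 1200.0  # illustrative: every request goes to one slow model
    routed_ms = expected_latency_ms(
        100.0,  # upper bound on routing time quoted above
        {"fast-model": (0.6, 400.0), "slow-model": (0.4, 1200.0)},
    )
    print(f"baseline: {baseline_ms:.0f} ms, routed: {routed_ms:.0f} ms")
```

With these illustrative numbers the routed average is about 820 ms against a 1200 ms baseline; whether routing is a net win in practice depends on how much traffic can actually go to the faster model.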

Does Not Diamond support RAG and agent use cases?

Yes, Not Diamond is especially powerful for RAG and agent workflows. As highly diverse and unseen prompts propagate through the workflow, Not Diamond's routing improves quality, reliability, speed, and efficiency.

What languages does Not Diamond support?

Not Diamond is available through our Python SDK, TypeScript client, and our REST API, so you can leverage model routing within any stack.

Is Not Diamond SOC 2 compliant?

Not Diamond is currently in the process of securing SOC 2 compliance and will be fully compliant in 2025.

Try Not Diamond for free