Not Diamond is not a proxy. It only recommends which model to use; all requests to LLMs are then made client-side. You can call models through APIs, gateways, or locally, because Not Diamond is agnostic to your request orchestration pipelines.
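As a rough illustration of this client-side pattern, the sketch below keeps the routing decision separate from the LLM call itself. Everything in it is hypothetical: `recommend_model` is a toy stand-in for a routing call and is not the Not Diamond SDK, and the model names and caller functions are placeholders for your own clients.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a routing call; NOT the Not Diamond SDK.
# It only returns a model name and never calls an LLM itself.
def recommend_model(prompt: str, candidates: List[str]) -> str:
    # Toy rule for illustration: long prompts go to the first candidate,
    # everything else goes to the last one.
    return candidates[0] if len(prompt) > 200 else candidates[-1]

# Placeholder callers. In practice each would wrap your own API client,
# gateway, or locally hosted model.
def call_large_model(prompt: str) -> str:
    return f"[large-model] response to: {prompt}"

def call_small_model(prompt: str) -> str:
    return f"[small-model] response to: {prompt}"

CALLERS: Dict[str, Callable[[str], str]] = {
    "large-model": call_large_model,
    "small-model": call_small_model,
}

def route_and_call(prompt: str) -> str:
    # Step 1: ask the router which model to use (recommendation only).
    model = recommend_model(prompt, list(CALLERS))
    # Step 2: make the actual LLM request from the client side.
    return CALLERS[model](prompt)
```

Because the router only returns a name, swapping a hosted API for a local model means changing one entry in `CALLERS`, not the routing step.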
Not Diamond is a highly specialized predictive model optimized for model routing. Trained on a large, cross-domain evaluation dataset, it accurately predicts which LLM will perform best for any input.
Not Diamond is designed to work seamlessly with your existing data and evaluation pipelines. You can upload any LLM evaluation dataset and within minutes you’ll get back a router optimized to your use case.
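To make the idea of an evaluation dataset concrete, here is a minimal sketch of the kind of signal a router could learn from: per-prompt scores for each candidate model, reduced to a best-model label. The record layout, model names, and scores are invented for illustration; this is not Not Diamond's upload format or training procedure.

```python
from typing import Dict, List

# Hypothetical evaluation records: one score per candidate model per prompt.
EVAL_DATA: List[dict] = [
    {"prompt": "Summarize this contract.", "scores": {"model-a": 0.9, "model-b": 0.7}},
    {"prompt": "What is 2 + 2?", "scores": {"model-a": 1.0, "model-b": 1.0}},
    {"prompt": "Write a haiku about rain.", "scores": {"model-a": 0.6, "model-b": 0.8}},
]

def best_model_labels(records: List[dict], prefer: List[str]) -> Dict[str, str]:
    """Label each prompt with its top-scoring model.

    Ties are broken by the order of `prefer` (for example, cheapest first).
    Only models listed in `prefer` are considered.
    """
    labels: Dict[str, str] = {}
    for record in records:
        scores = record["scores"]
        considered = [m for m in prefer if m in scores]
        top = max(scores[m] for m in considered)
        # First model in preference order that reaches the top score.
        labels[record["prompt"]] = next(m for m in considered if scores[m] == top)
    return labels

# Assume model-b is cheaper, so it wins ties.
labels = best_model_labels(EVAL_DATA, prefer=["model-b", "model-a"])
```

Labels like these are one plausible training target for a router; a tie-breaking order is a simple way to express a cost preference when two models score equally.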
Not Diamond is designed for every stage of the development process. Our users include developers building on our API from day one all the way up to sophisticated enterprise teams routing every request in production.
Not Diamond makes it easy to leverage automatic prompt optimization frameworks like DSPy and SAMMO, or to use your own manually developed prompts for each LLM. Not Diamond will learn the best model and prompt combination for each query.
You can think of Not Diamond as a “meta-model”, an ensemble of all the most powerful LLMs, which beats each individual model on quality while drastically reducing costs and latency.
Not Diamond’s inference latency is under 100 ms, and by routing to faster LLMs when possible you can achieve net speedups in your LLM calls. To avoid network latency and maximize speed, you can deploy Not Diamond directly to your infrastructure.
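The net-speedup claim comes down to simple arithmetic: routing adds a fixed overhead to every call, which pays off once enough traffic goes to a faster model. The sketch below works through that trade-off. The 100 ms overhead matches the upper bound stated above; the model latencies and the traffic split are made-up numbers for illustration only.

```python
def mean_latency_ms(routing_overhead_ms: float, fast_ms: float,
                    slow_ms: float, fast_fraction: float) -> float:
    """Average end-to-end latency when a fraction of calls go to the fast model."""
    routed = fast_fraction * fast_ms + (1 - fast_fraction) * slow_ms
    return routing_overhead_ms + routed

def break_even_fast_fraction(routing_overhead_ms: float, fast_ms: float,
                             slow_ms: float) -> float:
    """Fraction of calls that must go to the fast model to offset the overhead.

    Derived from: overhead + f * fast + (1 - f) * slow = slow.
    """
    return routing_overhead_ms / (slow_ms - fast_ms)

# Hypothetical numbers: 100 ms routing overhead, an 800 ms fast model,
# a 2500 ms slow model, and half of all calls routed to the fast model.
routed_avg = mean_latency_ms(100, 800, 2500, fast_fraction=0.5)
threshold = break_even_fast_fraction(100, 800, 2500)
```

With these assumed numbers, the routed average is 1750 ms against 2500 ms for always calling the slow model, and routing breaks even once roughly 6% of calls go to the fast model.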
Yes, Not Diamond is especially powerful for RAG and agent workflows. As highly diverse and unseen prompts propagate through the workflow, Not Diamond's routing improves quality, reliability, speed, and efficiency.
Not Diamond is available through our Python SDK, TypeScript client, and our REST API, so you can leverage model routing within any stack.
Not Diamond is currently in the process of securing SOC 2 compliance and will be fully compliant in 2025.