Features

Leverage the best of every AI model
Automatic model routing

For any input, Not Diamond automatically determines which model is best-suited to respond, delivering state-of-the-art performance that beats every foundation model on every major benchmark.

from notdiamond import NotDiamond

# Define the Not Diamond routing client
client = NotDiamond()

# The best LLM is determined by Not Diamond
result, session_id, provider = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Concisely explain merge sort."}
    ],
    model=['openai/gpt-4o', 'anthropic/claude-3-5-sonnet-20240620']
)

print("LLM called: ", provider.model)  # The LLM routed to
print("LLM output: ", result.content)  # The LLM response

Intelligent cost and latency tradeoffs

By default, Not Diamond maximizes quality above all else. However, you can also define explicit cost and latency tradeoffs, optimizing for speed or cost savings by routing to faster, cheaper models whenever doing so doesn't degrade quality.

result, session_id, provider = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Concisely explain merge sort."}
    ],
    model=['openai/gpt-4o', 'openai/gpt-3.5-turbo'],
    tradeoff="cost"  # Use cheaper models without degrading quality
)
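The same parameter also accepts "latency" for when response speed matters more than cost; a sketch (the model list and prompt are illustrative):

```python
result, session_id, provider = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "Concisely explain merge sort."}
    ],
    model=['openai/gpt-4o', 'openai/gpt-3.5-turbo'],
    tradeoff="latency"  # Prefer faster models without degrading quality
)
```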

Train your own router

Not Diamond provides an elegant interface through which you can upload any evaluation dataset and receive back a customized router within minutes.

from notdiamond.toolkit import CustomRouter

# Initialize the CustomRouter object for training
trainer = CustomRouter()

# Train the router using your dataset
preference_id = trainer.fit(
    dataset=pzn_train,           # The dataset containing inputs, responses, and scores
    prompt_column="Input",       # Column name for the input prompts
    response_column="response",  # Column name for the model responses
    score_column="score"         # Column name for the scores
)

print("Custom router preference ID: ", preference_id)
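The returned preference ID then plugs straight into routing calls; a sketch reusing the client defined above (the model list and prompt are illustrative):

```python
result, session_id, provider = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "Concisely explain merge sort."}
    ],
    model=['openai/gpt-4o', 'anthropic/claude-3-5-sonnet-20240620'],
    preference_id=preference_id  # Route with your custom-trained router
)
```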

Hyper-personalization

Not Diamond can hyper-personalize routing decisions to each of your end-users' individual preferences in real time based on their feedback.

from notdiamond.metrics.metric import Metric

metric = Metric("accuracy")

result, session_id, provider = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a world class programmer."},
        {"role": "user", "content": "Write a merge sort in Python."}
    ],
    model=llm_providers,
    preference_id="YOUR_PREFERENCE_ID",
    metric=metric
)

# The user submits a thumbs up
score = metric.feedback(session_id=session_id, llm_config=provider, value=1)

Joint prompt optimization support

Not Diamond seamlessly supports prompt optimization workflows, both manual and automatic, so that you always call the best model with the best prompt.

from notdiamond.llms.config import LLMConfig

llm_providers = [
    LLMConfig(
        provider="openai",
        model="gpt-4o",
        system_prompt="Summarize this essay:"
    ),
    LLMConfig(
        provider="anthropic",
        model="claude-3-5-sonnet-20240620",
        system_prompt="Distill the essence of this document:"
    ),
]
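Passing these configs to the routing client means whichever model is selected receives its own tailored system prompt; a sketch, where essay_text stands in for your own document:

```python
result, session_id, provider = client.chat.completions.create(
    messages=[
        {"role": "user", "content": essay_text}  # essay_text: your input document
    ],
    model=llm_providers
)
```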

Fuzzy hashing and VPC deployments

Not Diamond is not a proxy: all LLM calls are made client-side. Bolster your data privacy by enabling similarity-preserving fuzzy hashing on our API, or deploy directly to your own infrastructure for maximum security.

result, session_id, provider = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a technical analyst."},
        {"role": "user", "content": "Review this confidential internal document..."}
    ],
    model=llm_providers,
    hash_content=True  # Turn on fuzzy hashing
)