💎 Not Diamond Go SDK

A one-line change to improve the reliability and uptime of LLM requests.

Note: currently supported providers:
- OpenAI models
- Azure models
- Vertex AI
✨ Features:
- Fall back to other models if one fails
- Load balance requests between multiple models
- Max retries and timeout for each model
- Exponential backoff strategy
- Retry based on HTTP status codes
- Average rolling latency fallback
📦 Installation
go get github.com/Not-Diamond/go-notdiamond
🚀 Basic Usage
Error handling is intentionally omitted from the example for simplicity.
Redis must be running and reachable from the machine that executes this code. It is expected on the default port 6379, which can be changed (see Customise Redis below).
// Get keys
openaiApiKey := ""
azureApiKey := ""
azureEndpoint := ""
vertexProjectID := ""           // Your Google Cloud project ID
vertexLocation := "us-central1" // Your Google Cloud region

// Create provider requests
openaiRequest := openai.NewRequest("https://api.openai.com/v1/chat/completions", openaiApiKey)
azureRequest := azure.NewRequest(azureEndpoint, azureApiKey)
vertexRequest := vertex.NewRequest(vertexProjectID, vertexLocation)

// Create config
config := model.Config{
	Clients: []http.Request{openaiRequest, azureRequest, vertexRequest},
	Models: model.OrderedModels{
		"vertex/gemini-pro",
		"azure/gpt-4o-mini",
		"openai/gpt-4o-mini",
	},
	MaxRetries: map[string]int{
		"vertex/gemini-pro":  2,
		"azure/gpt-4o-mini":  2,
		"openai/gpt-4o-mini": 2,
	},
	VertexProjectID: vertexProjectID,
	VertexLocation:  vertexLocation,
}

// Create transport (error handling omitted)
transport, _ := notdiamond.NewTransport(config)

// Create a standard http.Client that uses our transport
client := &http.Client{
	Transport: transport,
}

// Prepare payload
messages := []map[string]string{{"role": "user", "content": "Hello, how are you?"}}
payload := map[string]interface{}{"model": "gpt-4o-mini", "messages": messages}
jsonData, _ := json.Marshal(payload)

// Create request
req, _ := http.NewRequest("POST", "https://api.openai.com/v1/chat/completions", bytes.NewBuffer(jsonData))
req.Header.Set("Content-Type", "application/json")
req.Header.Set("Authorization", "Bearer "+openaiApiKey)

// Do the request via the standard http.Client with our transport
resp, _ := client.Do(req)
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)

// Final response
fmt.Println(string(body))
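With this setup, each request is tried against vertex/gemini-pro first and, if the call fails, falls back to azure/gpt-4o-mini and then openai/gpt-4o-mini, following the order given in Models.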
Load Balancing
You can configure load balancing between models using weights:

config := model.Config{
	// ... other config ...
	Models: model.WeightedModels{
		"vertex/gemini-pro": 0.4, // 40% of requests
		"azure/gpt-4":       0.3, // 30% of requests
		"openai/gpt-4":      0.3, // 30% of requests
	},
}
Max Retries
Configure a custom maximum number of retries for each model:

// Default max retries is 1
config := model.Config{
	// ... other config ...
	MaxRetries: map[string]int{
		"azure/gpt-4":  3, // 3 retries
		"openai/gpt-4": 2, // 2 retries
	},
}
Timeout
Configure a custom timeout (in seconds) for each model:

// Default timeout is 100 seconds
config := model.Config{
	// ... other config ...
	Timeout: map[string]float64{
		"azure/gpt-4":  10.0, // 10 seconds
		"openai/gpt-4": 5.0,  // 5 seconds
	},
}
Exponential Backoff
Configure custom initial backoff times (in seconds) for each model; the delay doubles on each retry:

// Default backoff is 1 second
config := model.Config{
	// ... other config ...
	Backoff: map[string]float64{
		"azure/gpt-4":  0.5, // Start with 0.5s, then 1s, 2s, etc.
		"openai/gpt-4": 1.0, // Start with 1s, then 2s, 4s, etc.
	},
}
Model-Specific Messages
You can configure system messages that will be prepended to user messages for specific models:

config := model.Config{
	// ... other config ...
	ModelMessages: map[string][]map[string]string{
		"azure/gpt-4": {
			{"role": "system", "content": "You are a helpful assistant."},
		},
		"openai/gpt-4": {
			{"role": "system", "content": "Respond concisely."},
		},
	},
}
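For example, with the configuration above, a request routed to azure/gpt-4 that carries only a user message is effectively sent with the system message prepended (an illustrative sketch, reusing the payload from the basic-usage example; this is not SDK output):

messages := []map[string]string{
	{"role": "system", "content": "You are a helpful assistant."}, // prepended by the SDK
	{"role": "user", "content": "Hello, how are you?"},            // original user message
}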
Status Code Retries
You can configure specific retry behavior for different HTTP status codes, either globally or per model.
Per-model retry behavior:

config := model.Config{
	// ... other config ...
	StatusCodeRetry: map[string]map[string]int{
		"openai/gpt-4": {
			"429": 3, // Retry rate-limit errors 3 times
			"500": 2, // Retry internal server errors 2 times
		},
	},
}

Global retry behavior:

config := model.Config{
	// ... other config ...
	StatusCodeRetry: map[string]int{
		"429": 3, // Retry rate-limit errors 3 times
	},
}
Average Rolling Latency Fallback
Configure a custom rolling-average latency threshold and recovery time for each model:

config := model.Config{
	// ... other config ...
	ModelLatency: map[string]model.RollingAverageLatency{
		"azure/gpt-4": {
			AvgLatencyThreshold: 3.2,             // Fall back to other models if the rolling average latency (seconds) exceeds this
			NoOfCalls:           10,              // Number of recent calls used to compute the average
			RecoveryTime:        3 * time.Second, // Time to wait before trying the model again
		},
	},
}
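With these settings, once the average latency of the last 10 calls to azure/gpt-4 exceeds 3.2 seconds, the model is taken out of rotation in favor of the other models; after the 3-second recovery time it is tried again.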
Model Limits
Configure global limits for model call tracking and recovery:

config := model.Config{
	// ... other config ...
	ModelLimits: model.ModelLimits{
		MaxNoOfCalls:    10000,
		MaxRecoveryTime: time.Hour * 24,
	},
}
Customise Redis
The SDK stores its metrics in Redis (see Redis Data Management below); point it at a custom instance like so:

config := model.Config{
	// ... other config ...
	Redis: model.RedisConfig{
		Addr:     "localhost:6379",
		Password: "password",
		DB:       0,
	},
}
Redis Data Management
The SDK uses Redis to store metrics for model performance tracking, including latency and error rates. To prevent Redis from accumulating excessive data over time, the following data management features are available:
Automatic Cleanup
- When a model exits a recovery period (due to latency or errors), old data is automatically cleaned up
- A periodic background cleanup process can run at configurable intervals
Configure Redis data management through environment variables:
# Redis Data Cleanup Configuration
ENABLE_REDIS_PERIODIC_CLEANUP=true # Enable/disable periodic background cleanup
REDIS_CLEANUP_INTERVAL=6h # How often to run cleanup (accepts Go duration format)
REDIS_DATA_RETENTION=24h # How long to keep data before cleanup
Or add these to your .env file:
# Redis Configuration
REDIS_ADDR=localhost:6379
REDIS_PASSWORD=
REDIS_DB=0
# Redis Data Cleanup Configuration
ENABLE_REDIS_PERIODIC_CLEANUP=true
REDIS_CLEANUP_INTERVAL=6h
REDIS_DATA_RETENTION=24h
The periodic cleanup process:
- Runs in a separate goroutine to avoid impacting application performance
- Identifies all models with data in Redis
- Removes data older than the specified retention period
- Logs cleanup activities for monitoring
Data Retention Policy
By default, the SDK retains 24 hours of data for both latency and error tracking. This allows for:
- Sufficient historical data for performance analysis
- Trend detection for model reliability
- Prevention of Redis memory growth in high-traffic scenarios
Error Rate Fallback
Configure custom error rate thresholds and recovery time for each model, with different thresholds for different status codes:
config := model.Config{
	// ... other config ...
	ModelErrorTracking: model.ModelErrorTracking{
		"openai/gpt-4": &model.RollingErrorTracking{
			StatusConfigs: map[int]*model.StatusCodeConfig{
				401: {
					ErrorThresholdPercentage: 80,              // Fall back if 80% of calls return 401
					NoOfCalls:                5,               // Number of recent calls to consider
					RecoveryTime:             1 * time.Minute, // Time to wait before retrying
				},
				500: {
					ErrorThresholdPercentage: 70, // Fall back if 70% of calls return 500
					NoOfCalls:                5,
					RecoveryTime:             1 * time.Minute,
				},
				502: {
					ErrorThresholdPercentage: 60, // Fall back if 60% of calls return 502
					NoOfCalls:                5,
					RecoveryTime:             1 * time.Minute,
				},
				429: {
					ErrorThresholdPercentage: 40, // Fall back if 40% of calls return 429 (rate limit)
					NoOfCalls:                5,
					RecoveryTime:             30 * time.Second,
				},
			},
		},
	},
}
The error tracking system will:
- Track all HTTP status codes for each model
- Calculate the error percentage for each status code over the last N calls
- Mark the model as unhealthy and fall back to other models if any status code's error percentage exceeds its threshold
- Try the model again once the recovery time has elapsed
This allows fine-grained control over error handling:
- Set different thresholds for different types of errors (e.g., more aggressive fallback for rate limits)
- Configure a different number of calls and recovery time per status code
- Track any HTTP status code you want to monitor
- Set different thresholds for different models based on their reliability
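For example, with the 429 settings above (NoOfCalls: 5, ErrorThresholdPercentage: 40), three rate-limited responses among the last five calls is a 60% error rate, which exceeds the 40% threshold and takes openai/gpt-4 out of rotation for the 30-second recovery time.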
Multi-Region Support
The SDK supports configuring multiple regions for Azure and Vertex AI to improve reliability and reduce latency. This allows you to:
- Fall back to different regions if one region experiences issues
- Load balance requests across multiple regions
- Configure region-specific settings for optimal performance
Model Naming Convention
For region-specific models, use the format provider/model/region:
- Azure: azure/gpt-4o-mini/eastus
- Vertex AI: vertex/gemini-pro/us-central1
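As a quick illustration of the convention (plain Go, not an SDK function), a region-qualified name decomposes as:

parts := strings.SplitN("azure/gpt-4o-mini/eastus", "/", 3)
provider, modelName, region := parts[0], parts[1], parts[2]
// provider = "azure", modelName = "gpt-4o-mini", region = "eastus"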
Azure Multi-Region Configuration
Azure requires explicit configuration of region endpoints in the AzureRegions map:
config := model.Config{
	// ... other config ...
	Models: model.OrderedModels{
		"azure/gpt-4o-mini/eastus",
		"azure/gpt-4o-mini/westeurope",
		"vertex/gemini-pro/us-central1",
	},
	AzureAPIVersion: "2023-05-15", // Required for Azure
	AzureRegions: map[string]string{
		"eastus":     "https://eastus.api.cognitive.microsoft.com",
		"westeurope": "https://westeurope.api.cognitive.microsoft.com",
	},
}
Vertex AI Multi-Region Configuration
Vertex AI uses the region directly in the API endpoint and doesn't require additional configuration:
config := model.Config{
	// ... other config ...
	Models: model.OrderedModels{
		"vertex/gemini-pro/us-central1",
		"vertex/gemini-pro/us-west1",
		"vertex/gemini-pro/europe-west4",
		"azure/gpt-4o-mini/eastus",
	},
	VertexProjectID: "your-project-id", // Required for Vertex AI
	VertexLocation:  "us-central1",     // Default location if not specified in the model name
}
Mixed Provider Configuration with Regions
You can combine providers and regions for comprehensive fallback strategies:
config := model.Config{
	// ... other config ...
	Models: model.OrderedModels{
		"vertex/gemini-pro/us-east4",    // Try Vertex in us-east4 first
		"azure/gpt-4o-mini/eastus",      // Then try Azure in eastus
		"vertex/gemini-pro/us-central1", // Then try Vertex in us-central1
		"azure/gpt-4o-mini/westeurope",  // Then try Azure in westeurope
	},
	AzureAPIVersion: "2023-05-15",
	AzureRegions: map[string]string{
		"eastus":     "https://eastus.api.cognitive.microsoft.com",
		"westeurope": "https://westeurope.api.cognitive.microsoft.com",
	},
	VertexProjectID: "your-project-id",
}
Region-Specific Settings
You can configure different settings for each region-specific model:
config := model.Config{
	// ... other config ...
	MaxRetries: map[string]int{
		"azure/gpt-4o-mini/eastus":      3,
		"azure/gpt-4o-mini/westeurope":  2,
		"vertex/gemini-pro/us-central1": 2,
	},
	Timeout: map[string]float64{
		"azure/gpt-4o-mini/eastus":      10.0,
		"azure/gpt-4o-mini/westeurope":  15.0,
		"vertex/gemini-pro/us-central1": 12.0,
	},
}
Parser
The parser takes a raw API response body and returns it in a structured format:

// Import from https://github.com/Not-Diamond/go-notdiamond/pkg/http/response
result, err := response.Parse(body, startTime)
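A minimal usage sketch, assuming the client and req from the basic-usage example plus the standard fmt, io, log, and time imports; the exact shape of result is whatever the response package defines, so it is printed generically here:

startTime := time.Now()
resp, _ := client.Do(req)
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)

// Parse the raw body into the package's structured result
result, err := response.Parse(body, startTime)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("%+v\n", result)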