Smart Caching
Proxle's smart caching automatically stores the response to each LLM request and serves it again for identical requests, saving you money during development and testing.
How It Works
- When a request comes through the proxy, Proxle computes a cache key from the request body, provider, model, and endpoint
- Non-semantic fields (like `stream`, `n`, `user`) are excluded from the cache key
- If a matching cache entry exists and hasn't expired, the cached response is returned immediately
- Cache hits are logged as requests with a reference to the original request
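You can see this end to end by sending the same request twice through the proxy. The sketch below assumes an OpenAI-compatible proxy base URL; the URL and API key are placeholders, not Proxle's actual values:

```python
import requests

# Placeholder proxy base URL and API key -- substitute your own Proxle
# endpoint and provider credentials.
PROXY_BASE = "https://proxy.example.com/openai/v1"
headers = {"Authorization": "Bearer sk-..."}

payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Summarize caching in one sentence."}],
}

# First call: forwarded to the provider and stored in the cache.
first = requests.post(f"{PROXY_BASE}/chat/completions", json=payload, headers=headers)
print(first.headers.get("X-Cache-Status"))  # "miss"

# Identical second call: served immediately from the cache.
second = requests.post(f"{PROXY_BASE}/chat/completions", json=payload, headers=headers)
print(second.headers.get("X-Cache-Status"))  # "hit"
```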
Enabling Caching
Caching is controlled per-project via the dashboard:
- Go to Dashboard > Settings > Cache
- Toggle caching on
- Set the TTL (time-to-live) in hours (1-720, default 24)
- Click Save
Cache Headers
Every proxied response includes cache status headers:
| Header | Value | Description |
|--------|-------|-------------|
| `X-Cache-Status` | `hit` or `miss` | Whether the response came from cache |
| `X-Cached-At` | ISO timestamp | When the cached response was originally stored (only on hits) |
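A small client-side helper can surface these headers, for example to report how old a cached response is. This is only a sketch; it assumes the response object exposes headers via a dict-like `.headers` attribute, as `requests.Response` does:

```python
from datetime import datetime, timezone

def describe_cache_status(response):
    """Summarize the cache headers on a proxied response."""
    status = response.headers.get("X-Cache-Status", "miss")
    if status != "hit":
        return "cache miss"

    # X-Cached-At is an ISO timestamp; normalize a trailing "Z" so
    # datetime.fromisoformat accepts it on older Python versions.
    raw = response.headers["X-Cached-At"].replace("Z", "+00:00")
    cached_at = datetime.fromisoformat(raw)
    if cached_at.tzinfo is None:
        cached_at = cached_at.replace(tzinfo=timezone.utc)

    age = datetime.now(timezone.utc) - cached_at
    return f"cache hit, stored {age.total_seconds():.0f}s ago"
```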
Viewing Cache Entries
Go to Dashboard > Settings > Cache to see:
- Active entries count
- Total cache hits across all entries
- Individual cache entry details (provider, model, hit count, expiry)
Invalidating Cache
Single Entry
Delete a specific cache entry from the cache settings page.
All Entries
Click Clear All Cache in the cache settings to invalidate all entries for your project.
Cache Key Algorithm
The cache key is a SHA-256 hash of:
- Provider (e.g., `openai`)
- Endpoint path (e.g., `/chat/completions`)
- Request body (normalized, excluding non-semantic fields)
Non-semantic fields excluded from the key:
- `stream` - Streaming preference doesn't affect the content
- `n` - Number of completions
- `user` - OpenAI's user tracking field
- `logprobs` - Log probability settings
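A rough sketch of how such a key could be derived is shown below. The exact normalization and the way the components are concatenated are Proxle internals and are assumed here; the point is that requests differing only in non-semantic fields hash to the same key:

```python
import hashlib
import json

# Fields that don't change the semantic content of the completion and are
# therefore excluded from the cache key.
NON_SEMANTIC_FIELDS = {"stream", "n", "user", "logprobs"}

def cache_key(provider: str, endpoint: str, body: dict) -> str:
    """Illustrative cache key: SHA-256 over provider, endpoint path,
    and a normalized request body (separator format is an assumption)."""
    normalized = {k: v for k, v in body.items() if k not in NON_SEMANTIC_FIELDS}
    # Sorting keys makes serialization order-independent, so bodies with the
    # same fields in a different order hash identically.
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    material = f"{provider}:{endpoint}:{canonical}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

# Two requests that differ only in non-semantic fields produce the same key.
a = cache_key("openai", "/chat/completions",
              {"model": "gpt-4o-mini",
               "messages": [{"role": "user", "content": "hi"}],
               "stream": True})
b = cache_key("openai", "/chat/completions",
              {"model": "gpt-4o-mini",
               "messages": [{"role": "user", "content": "hi"}],
               "n": 1})
assert a == b
```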
Best Practices
- Development: Set a long TTL (168 hours / 7 days) to maximize cache hits
- Testing: Use a shorter TTL (1-4 hours) to balance freshness and savings
- Production: Consider disabling caching or using very short TTLs
- Cache hits don't count against your request quota
- Streaming requests are not cached (only non-streaming responses)