Smart Caching

Proxle's smart caching automatically caches responses to identical LLM requests, saving you money during development and testing.

How It Works

  1. When a request comes through the proxy, Proxle computes a cache key from the request body, provider, model, and endpoint
  2. Non-semantic fields (like stream, n, user) are excluded from the cache key
  3. If a matching cache entry exists and hasn't expired, the cached response is returned immediately
  4. Cache hits are logged as requests with a reference to the original request
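
Conceptually, the flow above amounts to a keyed lookup with TTL expiry. The sketch below illustrates it in Python; the function names, cache structure, and `forward_to_provider` stub are illustrative, not Proxle's actual internals.

```python
import hashlib
import json
import time

# In-memory cache: key -> {"response", "expires_at", "hits"} (illustrative shape).
cache: dict[str, dict] = {}

def compute_cache_key(provider: str, endpoint: str, body: dict) -> str:
    # Simplified; see the Cache Key Algorithm section below for how
    # non-semantic fields are excluded.
    payload = json.dumps(body, sort_keys=True)
    return hashlib.sha256(f"{provider}|{endpoint}|{payload}".encode()).hexdigest()

def forward_to_provider(provider: str, endpoint: str, body: dict) -> dict:
    # Stand-in for the real upstream LLM call.
    return {"choices": [{"message": {"content": "..."}}]}

def handle_request(provider: str, endpoint: str, body: dict, ttl_hours: int = 24):
    key = compute_cache_key(provider, endpoint, body)
    entry = cache.get(key)
    if entry and entry["expires_at"] > time.time():
        entry["hits"] += 1  # a hit is logged with a reference to the original request
        return entry["response"], "hit"
    response = forward_to_provider(provider, endpoint, body)
    cache[key] = {
        "response": response,
        "expires_at": time.time() + ttl_hours * 3600,
        "hits": 0,
    }
    return response, "miss"
```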

Enabling Caching

Caching is controlled per-project via the dashboard:

  1. Go to Dashboard > Settings > Cache
  2. Toggle caching on
  3. Set the TTL (time-to-live) in hours (1-720, default 24)
  4. Click Save

Cache Headers

Every proxied response includes cache status headers:

| Header | Value | Description |
|--------|-------|-------------|
| X-Cache-Status | hit or miss | Whether the response came from cache |
| X-Cached-At | ISO timestamp | When the cached response was originally stored (only on hits) |
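
You can inspect these headers from any HTTP client. The sketch below assumes a proxy running at http://localhost:8080 with an OpenAI-style endpoint; adjust the URL and credentials for your setup.

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",   # assumed proxy address
    headers={"Authorization": "Bearer YOUR_KEY"},  # placeholder credentials
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)

print(resp.headers.get("X-Cache-Status"))  # "hit" or "miss"
if resp.headers.get("X-Cache-Status") == "hit":
    print(resp.headers["X-Cached-At"])     # ISO timestamp of the original store
```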

Viewing Cache Entries

Go to Dashboard > Settings > Cache to see:

  • Active entries count
  • Total cache hits across all entries
  • Individual cache entry details (provider, model, hit count, expiry)

Invalidating Cache

Single Entry

Delete a specific cache entry from the cache settings page.

All Entries

Click Clear All Cache in the cache settings to invalidate all entries for your project.

Cache Key Algorithm

The cache key is a SHA-256 hash of:

  • Provider (e.g., openai)
  • Endpoint path (e.g., /chat/completions)
  • Request body (normalized, excluding non-semantic fields)

Non-semantic fields excluded from the key:

  • stream - Streaming preference doesn't affect the content
  • n - Number of completions
  • user - OpenAI's user tracking field
  • logprobs - Log probability settings
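
As a concrete illustration, here is one way to compute such a key in Python. The exact normalization Proxle applies (key ordering, whitespace, field separator) isn't documented here, so the `json.dumps` settings below are assumptions.

```python
import hashlib
import json

# Fields listed above as excluded from the key.
NON_SEMANTIC_FIELDS = {"stream", "n", "user", "logprobs"}

def cache_key(provider: str, endpoint: str, body: dict) -> str:
    # Drop fields that don't affect response content, then serialize
    # deterministically so equivalent bodies hash identically.
    semantic = {k: v for k, v in body.items() if k not in NON_SEMANTIC_FIELDS}
    normalized = json.dumps(semantic, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{provider}|{endpoint}|{normalized}".encode()).hexdigest()

body = {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hi"}]}

# Requests differing only in a non-semantic field produce the same key:
assert cache_key("openai", "/chat/completions", body) == \
       cache_key("openai", "/chat/completions", {**body, "stream": True})
```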

Best Practices

  • Development: Set a long TTL (168 hours / 7 days) to maximize cache hits
  • Testing: Use a shorter TTL (1-4 hours) to balance freshness and savings
  • Production: Consider disabling caching or using very short TTLs
  • Cache hits don't count against your request quota
  • Streaming requests are not cached (only non-streaming responses)