Smart Caching

Proxle's smart caching automatically caches responses to identical LLM requests, saving you money during development and testing.

How It Works

  1. When a request comes through the proxy, Proxle computes a cache key from the request body, provider, model, and endpoint
  2. Non-semantic fields (like stream, n, user) are excluded from the cache key
  3. If a matching cache entry exists and hasn't expired, the cached response is returned immediately
  4. Cache hits are logged as requests with a reference to the original request
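
Conceptually, the flow above amounts to a keyed lookup with TTL expiry. The sketch below illustrates it in Python; the function names, cache structure, and `forward_to_provider` stub are illustrative, not Proxle's actual internals.

```python
import hashlib
import json
import time

# In-memory cache: key -> {"response", "expires_at", "hits"} (illustrative shape).
cache: dict[str, dict] = {}

def compute_cache_key(provider: str, endpoint: str, body: dict) -> str:
    # Simplified; see the Cache Key Algorithm section below for how
    # non-semantic fields are excluded.
    payload = json.dumps(body, sort_keys=True)
    return hashlib.sha256(f"{provider}|{endpoint}|{payload}".encode()).hexdigest()

def forward_to_provider(provider: str, endpoint: str, body: dict) -> dict:
    # Stand-in for the real upstream LLM call.
    return {"choices": [{"message": {"content": "..."}}]}

def handle_request(provider: str, endpoint: str, body: dict, ttl_hours: int = 24):
    key = compute_cache_key(provider, endpoint, body)
    entry = cache.get(key)
    if entry and entry["expires_at"] > time.time():
        entry["hits"] += 1  # a hit is logged with a reference to the original request
        return entry["response"], "hit"
    response = forward_to_provider(provider, endpoint, body)
    cache[key] = {
        "response": response,
        "expires_at": time.time() + ttl_hours * 3600,
        "hits": 0,
    }
    return response, "miss"
```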

Enabling Caching

Caching is controlled per-project via the dashboard:

  1. Go to Dashboard > Settings > Cache
  2. Toggle caching on
  3. Set the TTL (time-to-live) in hours (1-720, default 24)
  4. Click Save

Cache Headers

Every proxied response includes cache status headers:

| Header | Value | Description |
|--------|-------|-------------|
| X-Cache-Status | hit or miss | Whether the response came from cache |
| X-Cached-At | ISO timestamp | When the cached response was originally stored (only on hits) |
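
You can inspect these headers from any HTTP client. The sketch below assumes a proxy running at http://localhost:8080 with an OpenAI-style endpoint; adjust the URL and credentials for your setup.

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",   # assumed proxy address
    headers={"Authorization": "Bearer YOUR_KEY"},  # placeholder credentials
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)

print(resp.headers.get("X-Cache-Status"))  # "hit" or "miss"
if resp.headers.get("X-Cache-Status") == "hit":
    print(resp.headers["X-Cached-At"])     # ISO timestamp of the original store
```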

Viewing Cache Entries

Go to Dashboard > Settings > Cache to see:

  • Active entries count
  • Total cache hits across all entries
  • Individual cache entry details (provider, model, hit count, expiry)

Invalidating Cache

Single Entry

Delete a specific cache entry from the cache settings page.

All Entries

Click Clear All Cache in the cache settings to invalidate all entries for your project.

Cache Key Algorithm

The cache key is a SHA-256 hash of:

  • Provider (e.g., openai)
  • Endpoint path (e.g., /chat/completions)
  • Request body (normalized, excluding non-semantic fields)

Non-semantic fields excluded from the key:

  • stream - Streaming preference doesn't affect the content
  • n - Number of completions
  • user - OpenAI's user tracking field
  • logprobs - Log probability settings
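
As a concrete illustration, here is one way to compute such a key in Python. The exact normalization Proxle applies (key ordering, whitespace, field separator) isn't documented here, so the `json.dumps` settings below are assumptions.

```python
import hashlib
import json

# Fields listed above as excluded from the key.
NON_SEMANTIC_FIELDS = {"stream", "n", "user", "logprobs"}

def cache_key(provider: str, endpoint: str, body: dict) -> str:
    # Drop fields that don't affect response content, then serialize
    # deterministically so equivalent bodies hash identically.
    semantic = {k: v for k, v in body.items() if k not in NON_SEMANTIC_FIELDS}
    normalized = json.dumps(semantic, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{provider}|{endpoint}|{normalized}".encode()).hexdigest()

body = {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hi"}]}

# Requests differing only in a non-semantic field produce the same key:
assert cache_key("openai", "/chat/completions", body) == \
       cache_key("openai", "/chat/completions", {**body, "stream": True})
```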

Best Practices

  • Development: Set a long TTL (168 hours / 7 days) to maximize cache hits
  • Testing: Use a shorter TTL (1-4 hours) to balance freshness and savings
  • Production: Consider disabling caching or using very short TTLs
  • Cache hits don't count against your request quota
  • Streaming requests are not cached (only non-streaming responses)