Intelligent Routing · Semantic Caching · Per-Token Billing
┌──────────────────────────────────────────────────────────────┐
│                     SENTINEL-AI GATEWAY                      │
│                                                              │
│ ┌─────────┐   ┌────────────┐   ┌───────────────┐   ┌───────┐ │
│ │ Security│──▶│Smart Router│──▶│Semantic Cache │──▶│Billing│ │
│ │ Filter  │   │  Advisor   │   │ (Redis VDB)   │   │Service│ │
│ └────┬────┘   └─────┬──────┘   └───────┬───────┘   └───┬───┘ │
│      │              │                  │               │     │
│ API Key Auth Complexity Analysis  Cache Hit?       Deduct $  │
└──────────────────────────────────────────────────────────────┘
The Smart Router classifies each prompt as SIMPLE, REASONING, or HIGH_STAKES using regex-based heuristic analysis, then routes it to Gemini (cheapest), DeepSeek, or Claude (most capable). If a provider fails, the request automatically fails over to another one.
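The classification and failover flow can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual code: the regex patterns, the tier-to-provider mapping (assumed from the order in which the providers are listed), the failover order, and the `call` callback are all hypothetical.

```python
import re
from typing import Callable

# Hypothetical patterns; the gateway's real rules are not shown in this document.
HIGH_STAKES = re.compile(r"\b(legal|medical|diagnos\w*|contract|financial)\b", re.I)
REASONING = re.compile(r"\b(explain|prove|derive|step by step|why|compare)\b", re.I)

# Assumed mapping: cheapest model for SIMPLE, most capable for HIGH_STAKES.
PROVIDERS = {"SIMPLE": "gemini", "REASONING": "deepseek", "HIGH_STAKES": "claude"}
FALLBACK_ORDER = ["gemini", "deepseek", "claude"]  # assumed failover order


def classify(prompt: str) -> str:
    """Return the tier for a prompt; the most sensitive match wins."""
    if HIGH_STAKES.search(prompt):
        return "HIGH_STAKES"
    if REASONING.search(prompt):
        return "REASONING"
    return "SIMPLE"


def route(prompt: str, call: Callable[[str, str], str]) -> str:
    """Try the preferred provider first, then fail over to the others."""
    preferred = PROVIDERS[classify(prompt)]
    candidates = [preferred] + [p for p in FALLBACK_ORDER if p != preferred]
    last_error = None
    for provider in candidates:
        try:
            return call(provider, prompt)
        except Exception as err:  # any provider error triggers failover
            last_error = err
    raise RuntimeError("all providers failed") from last_error
```

Here `call(provider, prompt)` stands in for whatever client actually talks to each model. Checking HIGH_STAKES before REASONING means a prompt matching both tiers goes to the more capable model.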
Billing uses split-token pricing (input and output tokens are priced separately), with a configurable SaaS markup (default 1.20x) applied on top of the base cost. Balance deductions are atomic, which prevents race conditions between concurrent requests.
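As a rough sketch of the arithmetic: the billed amount is the base cost of input and output tokens multiplied by the markup. The prices and the model name below are placeholders, not real rates. The `Account` class uses an in-process lock purely to illustrate an atomic check-and-deduct; a real deployment would more likely rely on a database or Redis primitive, which this document does not specify.

```python
import threading
from decimal import Decimal

MARKUP = Decimal("1.20")  # default markup stated in the text

# Hypothetical base prices in USD per million tokens (placeholders, not real rates).
PRICES = {"example-model": {"input": Decimal("0.50"), "output": Decimal("1.50")}}


def billed_cost(model, input_tokens, output_tokens, markup=MARKUP):
    """Price input and output tokens separately, then apply the markup."""
    price = PRICES[model]
    base = (price["input"] * input_tokens
            + price["output"] * output_tokens) / Decimal(1_000_000)
    return base * markup


class Account:
    """In-memory stand-in for a customer balance (illustrative only)."""

    def __init__(self, balance):
        self._balance = Decimal(balance)
        self._lock = threading.Lock()

    def deduct(self, amount):
        """Check and subtract under one lock so concurrent calls cannot overdraw."""
        with self._lock:
            if amount > self._balance:
                return False
            self._balance -= amount
            return True

    @property
    def balance(self):
        with self._lock:
            return self._balance
```

With these placeholder prices, 1,000 input tokens and 500 output tokens have a base cost of $0.00125, so the billed amount at the default markup is $0.0015. `Decimal` is used instead of `float` so that repeated small deductions do not accumulate rounding error.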
The semantic cache uses a Redis vector store with a 95% similarity threshold. A prompt that is near-identical to a cached one is served from the cache immediately, at a fraction of a cent.
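The threshold check can be illustrated with a small in-memory stand-in for the vector store. This sketch assumes the 95% figure refers to cosine similarity between prompt embeddings, which the text does not state. It takes embeddings as plain lists of floats, and `SemanticCache` is a hypothetical name; the real system queries Redis rather than scanning a list.

```python
import math

THRESHOLD = 0.95  # similarity threshold stated in the text


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Linear-scan cache keyed by embedding (illustrative only)."""

    def __init__(self, threshold=THRESHOLD):
        self.threshold = threshold
        self._entries = []  # list of (embedding, response) pairs

    def put(self, embedding, response):
        self._entries.append((list(embedding), response))

    def get(self, embedding):
        """Return the best-matching response, or None below the threshold."""
        best_score, best_response = 0.0, None
        for cached, response in self._entries:
            score = cosine(embedding, cached)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

A miss (`None`) means the request goes on to a model provider, and the new response can then be stored with `put`.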
The API authenticates statelessly via the X-Sentinel-API-Key header. The dashboard uses session-based OAuth2. API keys are never stored in plaintext; only their SHA-256 hashes are kept.
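Storing only the hash means the server can verify a presented key without being able to recover it. A minimal sketch using Python's standard library follows. The function names are hypothetical, the `sk-` prefix is taken from the example key in the curl commands, and the key length is an assumption.

```python
import hashlib
import hmac
import secrets


def generate_api_key():
    """Return (plaintext key, SHA-256 hex digest); only the digest is persisted."""
    key = "sk-" + secrets.token_urlsafe(32)  # key length is an assumption
    return key, hashlib.sha256(key.encode("utf-8")).hexdigest()


def verify_api_key(presented, stored_digest):
    """Hash the presented key and compare it to the stored digest in constant time."""
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, stored_digest)
```

The plaintext key would be shown to the user once at creation time. Because each request carries the key itself, verification needs no server-side session, which fits the stateless design described above.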
Synchronous Completion
curl -X POST http://localhost:8080/api/v1/chat \
-H "Content-Type: application/json" \
-H "X-Sentinel-API-Key: sk-your-key-here" \
-d '{"prompt": "What is the capital of France?"}'

Streaming (SSE)
curl -N -X POST http://localhost:8080/api/v1/chat/stream \
-H "Content-Type: application/json" \
-H "X-Sentinel-API-Key: sk-your-key-here" \
-d '{"prompt": "Explain quicksort step by step."}'
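A client consuming the streaming endpoint has to split the response into events. The sketch below parses the generic Server-Sent Events framing: `data:` lines, with a blank line ending each event. The exact payload this gateway emits is not documented here, so treat the event contents as an assumption. `iter_sse_data` is a hypothetical helper name.

```python
def iter_sse_data(lines):
    """Yield the data payload of each event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            # Comment line, often used as a keep-alive; ignore it.
            continue
        elif line.startswith("data:"):
            value = line[5:]
            # One optional space after the colon is not part of the payload.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # Lenient: flush a trailing event that lacks a terminating blank line.
    if buffer:
        yield "\n".join(buffer)
```

The `lines` argument can come from any HTTP client that exposes the response body line by line, decoded as UTF-8. Multiple `data:` lines within one event are joined with newlines, as in the SSE specification.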