Sentinel-AI Documentation

Intelligent Routing · Semantic Caching · Per-Token Billing

Architecture Flow

┌─────────────────────────────────────────────────────────────────┐
│                       SENTINEL-AI GATEWAY                       │
│                                                                 │
│  ┌─────────┐   ┌────────────┐   ┌───────────────┐   ┌───────┐   │
│  │ Security│──▶│Smart Router│──▶│Semantic Cache │──▶│Billing│   │
│  │ Filter  │   │  Advisor   │   │  (Redis VDB)  │   │Service│   │
│  └────┬────┘   └─────┬──────┘   └───────┬───────┘   └───┬───┘   │
│       │              │                  │               │       │
│ API Key Auth   Complexity Analysis  Cache Hit?      Deduct $    │
└─────────────────────────────────────────────────────────────────┘

Core Features

🧠 Intelligent LLM Routing

The router automatically classifies each prompt as SIMPLE, REASONING, or HIGH_STAKES using regex-based heuristics, then routes it accordingly to Gemini (cheapest), DeepSeek, or Claude (most capable). Automatic failover is included.
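
To make the routing idea concrete, here is a minimal Python sketch of a regex-based classifier with ordered failover. It is an illustration only: the keyword patterns, the tier-to-provider order in ROUTES, and the classify and route helpers are all hypothetical, since the gateway's actual rules are not given in this document.

```python
import re

# Hypothetical keyword patterns; the gateway's real heuristics are not documented here.
HIGH_STAKES = re.compile(
    r"\b(legal|medical|diagnos\w*|contract|financial|compliance)\b", re.I
)
REASONING = re.compile(
    r"\b(explain|prove|derive|step by step|why|compare|analy[sz]e)\b", re.I
)

# Assumed preference order per tier: primary provider first, then failover candidates.
ROUTES = {
    "SIMPLE": ["gemini", "deepseek", "claude"],
    "REASONING": ["deepseek", "claude", "gemini"],
    "HIGH_STAKES": ["claude", "deepseek", "gemini"],
}


def classify(prompt: str) -> str:
    """Return the tier for a prompt, checking the most sensitive tier first."""
    if HIGH_STAKES.search(prompt):
        return "HIGH_STAKES"
    if REASONING.search(prompt):
        return "REASONING"
    return "SIMPLE"


def route(prompt: str, providers: dict):
    """Try each provider for the prompt's tier in order.

    `providers` maps a provider name to a callable taking the prompt and
    returning a completion. Any exception triggers failover to the next one.
    """
    last_error = None
    for name in ROUTES[classify(prompt)]:
        try:
            return name, providers[name](prompt)
        except Exception as exc:  # broad on purpose: any failure means "try the next one"
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```

With these example patterns, classify("What is the capital of France?") returns SIMPLE and classify("Explain quicksort step by step.") returns REASONING. Checking the HIGH_STAKES pattern first means a prompt that matches both patterns is sent to the stronger model.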

💰 Per-Token Billing

Split-token pricing, with a configurable SaaS markup (default 1.20x) applied on top of base costs. Balance deduction is atomic, which prevents race conditions when requests are billed concurrently.
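
The arithmetic can be sketched as follows. This example assumes that "split-token" means input and output tokens are priced at separate rates. The RATES table, the charge function, and the Account class are hypothetical, and the in-process lock only stands in for whatever atomic mechanism the billing service actually uses (for example, a conditional database update).

```python
import threading
from decimal import Decimal

MARKUP = Decimal("1.20")  # default markup stated in the documentation

# Hypothetical base rates in USD per 1M tokens: (input_rate, output_rate).
RATES = {"gemini": (Decimal("0.10"), Decimal("0.40"))}


def charge(model: str, input_tokens: int, output_tokens: int,
           markup: Decimal = MARKUP) -> Decimal:
    """Price input and output tokens separately, then apply the markup."""
    in_rate, out_rate = RATES[model]
    base = (in_rate * input_tokens + out_rate * output_tokens) / Decimal(1_000_000)
    return base * markup


class Account:
    """In-memory balance with an atomic check-and-deduct (illustrative only)."""

    def __init__(self, balance: str):
        self._balance = Decimal(balance)
        self._lock = threading.Lock()

    @property
    def balance(self) -> Decimal:
        return self._balance

    def deduct(self, amount: Decimal) -> bool:
        # The check and the subtraction happen under one lock, so two
        # concurrent requests cannot both spend the same remaining funds.
        with self._lock:
            if self._balance < amount:
                return False
            self._balance -= amount
            return True
```

Under these assumed rates, a request with 1,000 input tokens and 500 output tokens has a base cost of $0.0003 and is billed at $0.00036 after the 1.20x markup. Decimal is used instead of float so that sub-cent amounts do not accumulate rounding error.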

🗄️ Semantic Caching

Uses a Redis vector store with a 95% similarity threshold. Near-identical prompts are served directly from the cache at a fraction of a cent.
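
The threshold lookup can be illustrated with a small in-memory stand-in for the vector store. This sketch assumes the 95% figure refers to cosine similarity between prompt embeddings. The SemanticCache class is hypothetical, embeddings are passed in as plain lists of floats, and no Redis calls are made.

```python
import math

THRESHOLD = 0.95  # similarity threshold stated in the documentation


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Linear-scan cache keyed by embedding; a real vector store would use an index."""

    def __init__(self, threshold: float = THRESHOLD):
        self.threshold = threshold
        self._entries = []  # list of (embedding, response) pairs

    def put(self, embedding, response: str) -> None:
        self._entries.append((list(embedding), response))

    def get(self, embedding):
        """Return the response of the most similar entry, or None on a miss."""
        best_score, best_response = 0.0, None
        for stored, response in self._entries:
            score = cosine(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

A lookup that returns None is a cache miss, and the request would continue on to a model. A hit skips the model call entirely, which is where the cost saving comes from.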

⚡ Dual Authentication

The API uses stateless authentication via the X-Sentinel-API-Key header. The dashboard uses session-based OAuth2. API keys are stored only as SHA-256 hashes, never in plaintext.
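
Hash-only key storage can be sketched with the Python standard library. The generate_key, hash_key, and verify helpers are hypothetical names, and the sk- prefix is taken from the request examples in this document. How the gateway itself generates and looks up keys is not specified here.

```python
import hashlib
import hmac
import secrets


def generate_key() -> str:
    """Create a random API key. It is shown to the user once and never stored."""
    return "sk-" + secrets.token_urlsafe(32)


def hash_key(key: str) -> str:
    """Return the hex SHA-256 digest that would be persisted instead of the key."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def verify(presented_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare it to the stored digest in constant time."""
    return hmac.compare_digest(hash_key(presented_key), stored_hash)
```

Because only the digest is persisted, a leaked key table does not reveal usable keys. A plain unsalted hash is reasonable here only because the keys are long random strings rather than user-chosen passwords.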

Making Requests

Synchronous Completion

curl -X POST http://localhost:8080/api/v1/chat \
  -H "Content-Type: application/json" \
  -H "X-Sentinel-API-Key: sk-your-key-here" \
  -d '{"prompt": "What is the capital of France?"}'

Streaming (SSE)

curl -X POST http://localhost:8080/api/v1/chat/stream \
  -H "Content-Type: application/json" \
  -H "X-Sentinel-API-Key: sk-your-key-here" \
  -d '{"prompt": "Explain quicksort step by step."}'
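
The same two requests can be issued from Python using only the standard library. This is a sketch under assumptions: the build_request, chat, stream_chat, and iter_sse_data helpers are hypothetical, and the shape of the response bodies is not documented here, so the code returns raw text and raw SSE data payloads without interpreting them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # host and port from the curl examples


def build_request(path: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build the POST request used by both endpoints."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    headers = {"Content-Type": "application/json", "X-Sentinel-API-Key": api_key}
    return urllib.request.Request(BASE_URL + path, data=body, headers=headers, method="POST")


def iter_sse_data(lines):
    """Yield the data payload of each server-sent event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates the current event
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):  # comment or keep-alive line
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips a single leading space
        if field == "data":
            buffer.append(value)
    # An event not terminated by a blank line is dropped, as the SSE spec requires.


def chat(prompt: str, api_key: str) -> str:
    """Synchronous completion: return the raw response body as text."""
    with urllib.request.urlopen(build_request("/api/v1/chat", prompt, api_key)) as resp:
        return resp.read().decode("utf-8")


def stream_chat(prompt: str, api_key: str):
    """Streaming completion: yield each SSE data payload as it arrives."""
    req = build_request("/api/v1/chat/stream", prompt, api_key)
    with urllib.request.urlopen(req) as resp:
        yield from iter_sse_data(raw.decode("utf-8") for raw in resp)
```

For example, `for chunk in stream_chat("Explain quicksort step by step.", "sk-your-key-here"): print(chunk)` would print each payload as it arrives, assuming the gateway is running locally.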