# Dedicated Session Types

## The Problem

You have a session. But different use cases need different behavior:
- **Speed-critical** user interaction → needs high priority + caching
- **Background processing** → needs low priority to not block UI
- **Multiple concurrent tasks** → need custom priorities

Same session capabilities, different execution characteristics.

## The Solution

Wrapper sessions that add behavior without changing capabilities.

```
┌────────────────────────┐
│ Base InferenceSession  │
│ (from backend)         │
└────────────────────────┘
       ↓ wrap
┌────────────────────────────────────────────┐
│ PerformanceInferenceSession                │
│ BackgroundInferenceSession                 │
│ PrioritizedInferenceSession                │
└────────────────────────────────────────────┘
```

## Session Types

### Base Session

What you get directly from the backend:

```swift
let session = try await engine.createSession(configuration: config)
```

```
┌────────────────────────┐
│ InferenceSession       │
├────────────────────────┤
│ Priority: Default      │
│ Caching: No            │
│ Batch: No              │
└────────────────────────┘

Use for: Normal, everyday generation
```

### PerformanceInferenceSession

Maximum speed with caching:

```swift
let perfSession = try await engine.createPerformanceSession(configuration: config)
```

```
┌──────────────────────────────────┐
│ PerformanceInferenceSession      │
├──────────────────────────────────┤
│ Priority: HIGH                   │
│ Caching: YES                     │
│ Batch: YES                       │
└──────────────────────────────────┘

Use for: User-facing, speed-critical apps
```

**Features:**

```
1. Caching
   ────────
   prompt: "Hello"  → generates → caches
   prompt: "Hello"  → instant (from cache)

2. High Priority
   ────────────
   All calls run at Task.high priority
   Better CPU scheduling

3. Batch Processing
   ─────────────
   generateBatch(["Q1", "Q2", "Q3"])
   All in parallel, all cached

```

### BackgroundInferenceSession

Low priority for background work:

```swift
let bgSession = try await engine.createBackgroundSession(configuration: config)
```

```
┌──────────────────────────────────┐
│ BackgroundInferenceSession       │
├──────────────────────────────────┤
│ Priority: BACKGROUND             │
│ Caching: No                      │
│ Blocking UI: No                  │
└──────────────────────────────────┘

Use for: Pre-caching, batch analysis, non-urgent tasks
```

**Characteristics:**

```
• Runs with .background priority
• System can defer execution
• Won't block user interactions
• May be significantly slower
• Power-efficient
```

### PrioritizedInferenceSession

Custom priority level:

```swift
let midSession = try await engine.createPrioritizedSession(
    configuration: config,
    priority: .medium
)
```

```
┌──────────────────────────────────┐
│ PrioritizedInferenceSession      │
├──────────────────────────────────┤
│ Priority: YOUR CHOICE            │
│ Caching: No                      │
└──────────────────────────────────┘

Available priorities:
• .high
• .medium
• .low
• .background
• .utility
```

## Same API

All sessions implement `InferenceSession` protocol:

```
┌─────────────────────────────────────────┐
│ All Session Types Support:              │
├─────────────────────────────────────────┤
│ generate(prompt:preset:)                │
│ generate(prompt:generating:preset:)     │
│ stream(prompt:preset:)                  │
│ stream(prompt:generating:preset:)       │
│ prewarm(promptPrefix:)                  │
│ configuration (read-only)               │
└─────────────────────────────────────────┘
```

## Comparison Matrix

```
Feature              Base  Performance  Background  Prioritized
────────────────────────────────────────────────────────────────
Priority             Med   High         Low         Custom
Caching              No    Yes          No          No
Batch Support        No    Yes          No          No
Memory Overhead      Low   Higher       Low         Low
Speed                Med   Fastest      Slowest     Varies
UI Impact            Med   Low          Minimal     Varies
Power Efficiency     Med   Lower        Highest     Varies
────────────────────────────────────────────────────────────────
```

## Use Cases

### PerformanceInferenceSession

```
✅ User-facing chatbot
✅ Real-time data extraction
✅ Interactive UI elements
✅ Repeated queries (caching helps)
✅ Batch processing
✅ When speed matters most

❌ Background tasks (wastes priority)
❌ Non-deterministic output (cache useless)
❌ One-off queries (cache won't help much)
```

### BackgroundInferenceSession

```
✅ Pre-caching content
✅ Batch analysis overnight
✅ Periodic background updates
✅ Index generation
✅ When timing doesn't matter

❌ User-facing features
❌ Real-time responses
❌ Interactive UI
```

### PrioritizedInferenceSession

```
✅ Multiple concurrent sessions
✅ Need specific priority levels
✅ Coordinating work across priorities
✅ Fine-grained control

❌ When default priority is fine
❌ When you need caching (use Performance instead)
```

## Decision Guide

```
╔══════════════════════════╦═══════════════════════════════╗
║ Need                     ║ Use                           ║
╠══════════════════════════╬═══════════════════════════════╣
║ Maximum speed            ║ PerformanceInferenceSession   ║
║ Repeated queries         ║ PerformanceInferenceSession   ║
║ Batch processing         ║ PerformanceInferenceSession   ║
║ Background work          ║ BackgroundInferenceSession    ║
║ UI responsiveness        ║ BackgroundInferenceSession    ║
║ Pre-caching              ║ BackgroundInferenceSession    ║
║ Custom priority          ║ PrioritizedInferenceSession   ║
║ Multiple priority levels ║ Multiple Prioritized sessions ║
║ Default behavior         ║ Base InferenceSession         ║
╚══════════════════════════╩═══════════════════════════════╝
```
