# Aegis — Scoring System

Aegis uses a granular 0–100 scoring system to measure security coverage at multiple levels: individual techniques, tactics, threat actors, and the overall organization.

---

## Technique Score (0–100)

Each ATT&CK technique receives a composite score based on five weighted components:

| Component | Default Weight | Description |
|-----------|---------------|-------------|
| Tests Validated | 40% | Ratio of detected tests to total validated tests |
| Detection Rules | 20% | Number of active detection rules linked to the technique |
| D3FEND Coverage | 15% | Number of D3FEND defensive techniques mapped |
| Freshness | 15% | How recent the latest validated test is |
| Platform Diversity | 10% | Coverage across different platforms (Windows, Linux, macOS) |

### Tests Validated Component

```
score = (detected_tests / total_validated_tests) × weight
```

- Only tests in `validated` state are counted
- `detected` means `detection_result = "detected"`
- Example: 2 detected out of 3 validated → `2/3 × 40 = 26.7`

### Detection Rules Component

```
score = min(active_rules / 3, 1.0) × weight
```

- Counts active detection rules linked to the technique's `mitre_id`
- Three or more rules give full marks (the ratio is capped at 1.0)
- Example: 2 active rules → `2/3 × 20 = 13.3`
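
A short sketch of the capped ratio, assuming the number of active rules has already been counted for the technique. The function and parameter names are hypothetical.

```python
def detection_rules_score(active_rules: int, weight: float = 20.0, full_marks_at: int = 3) -> float:
    """Score grows linearly with the rule count and saturates at full_marks_at rules."""
    return min(active_rules / full_marks_at, 1.0) * weight


# Additional rules beyond the third do not raise the score.
for n in (0, 2, 3, 7):
    print(n, round(detection_rules_score(n), 1))
# 0 0.0
# 2 13.3
# 3 20.0
# 7 20.0
```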

### D3FEND Coverage Component

```
score = min(d3fend_mappings / 2, 1.0) × weight
```

- Counts D3FEND defensive technique mappings
- Two or more mappings give full marks
- Example: 1 mapping → `1/2 × 15 = 7.5`

### Freshness Component

```
days = (now - newest_validated_test.red_validated_at).days
score = max(0, 1.0 - days / 180) × weight
```

- 0 days old = full freshness score
- 180+ days old = 0 (completely stale)
- Linear decay between 0 and 180 days
- Example: test is 60 days old → `(1 - 60/180) × 15 = 10.0`
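
The linear decay can be sketched as below. The function name and parameters are hypothetical, and clamping a negative age (a timestamp in the future) to 0 days is an assumption that the formula above does not state.

```python
from datetime import datetime, timedelta, timezone


def freshness_score(red_validated_at: datetime, now: datetime,
                    weight: float = 15.0, horizon_days: int = 180) -> float:
    """Linear decay from full weight at 0 days to 0 at horizon_days and beyond."""
    # Assumption: a timestamp in the future is treated as 0 days old.
    days = max((now - red_validated_at).days, 0)
    return max(0.0, 1.0 - days / horizon_days) * weight


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
for age in (0, 60, 180, 200):
    print(age, round(freshness_score(now - timedelta(days=age), now), 1))
# 0 15.0
# 60 10.0
# 180 0.0
# 200 0.0
```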

### Platform Diversity Component

```
platforms_covered = unique platforms across validated tests
score = min(platforms_covered / 3, 1.0) × weight
```

- Counts unique platforms (windows, linux, macos) from validated tests
- Three or more platforms give full marks
- Example: windows + linux → `2/3 × 10 = 6.7`
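
As an illustration, the unique-platform count can be taken with a set. Normalizing case and whitespace before deduplicating is an assumption here, since the source does not say how platform names are stored.

```python
from typing import Iterable


def platform_diversity_score(test_platforms: Iterable[str], weight: float = 10.0,
                             full_marks_at: int = 3) -> float:
    """Score the number of distinct platforms seen across validated tests."""
    # Assumption: names are compared case-insensitively, blanks are ignored.
    unique = {p.strip().lower() for p in test_platforms if p and p.strip()}
    return min(len(unique) / full_marks_at, 1.0) * weight


# Two tests on Windows and one on Linux count as two platforms.
print(round(platform_diversity_score(["windows", "linux", "Windows"]), 1))  # 6.7
```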

### Example Calculation

A technique with:
- 2 of 3 validated tests detected, 2 active detection rules, 1 D3FEND mapping, newest validated test 60 days old, 2 platforms covered

```
Tests:     (2/3) × 40        = 26.7
Detection: (2/3) × 20        = 13.3
D3FEND:    (1/2) × 15        =  7.5
Freshness: (1 - 60/180) × 15 = 10.0
Platform:  (2/3) × 10        =  6.7
                               ─────
Total:                         64.2
```
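
The same calculation can be reproduced in a few lines of Python. This is a sketch that combines the five component formulas from this section; the `TechniqueInputs` dataclass and `technique_score` function are hypothetical names, and the weight keys mirror the ones used by the configuration API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TechniqueInputs:
    detected_tests: int
    validated_tests: int
    active_rules: int
    d3fend_mappings: int
    days_since_validation: int
    platforms_covered: int


DEFAULT_WEIGHTS = {
    "tests": 40, "detection_rules": 20, "d3fend": 15,
    "freshness": 15, "platform_diversity": 10,
}


def technique_score(x: TechniqueInputs, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Sum of the five component ratios, each multiplied by its weight."""
    ratios = {
        # Assumption: no validated tests means a ratio of 0.
        "tests": x.detected_tests / x.validated_tests if x.validated_tests else 0.0,
        "detection_rules": min(x.active_rules / 3, 1.0),
        "d3fend": min(x.d3fend_mappings / 2, 1.0),
        "freshness": max(0.0, 1.0 - x.days_since_validation / 180),
        "platform_diversity": min(x.platforms_covered / 3, 1.0),
    }
    return sum(ratios[name] * weights[name] for name in weights)


example = TechniqueInputs(detected_tests=2, validated_tests=3, active_rules=2,
                          d3fend_mappings=1, days_since_validation=60,
                          platforms_covered=2)
print(round(technique_score(example), 1))  # 64.2
```

Summing the unrounded components gives about 64.17, which rounds to the same 64.2 as adding the rounded rows in the table above.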

---

## Configuring Weights

Weights are configurable via environment variables or the admin API. They must sum to 100.

### Environment Variables

```env
SCORING_WEIGHT_TESTS=40
SCORING_WEIGHT_DETECTION_RULES=20
SCORING_WEIGHT_D3FEND=15
SCORING_WEIGHT_FRESHNESS=15
SCORING_WEIGHT_PLATFORM_DIVERSITY=10
```
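
A loader for these variables might look like the sketch below. It is not the Aegis implementation: the function name, the fallback to the default weights for unset variables, and the error message are assumptions.

```python
import os
from typing import Dict, Mapping

DEFAULTS = {
    "SCORING_WEIGHT_TESTS": 40,
    "SCORING_WEIGHT_DETECTION_RULES": 20,
    "SCORING_WEIGHT_D3FEND": 15,
    "SCORING_WEIGHT_FRESHNESS": 15,
    "SCORING_WEIGHT_PLATFORM_DIVERSITY": 10,
}


def load_weights(env: Mapping[str, str] = os.environ) -> Dict[str, float]:
    """Read the five weights, fall back to defaults, and enforce the sum of 100."""
    weights = {name: float(env.get(name, default)) for name, default in DEFAULTS.items()}
    total = sum(weights.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"scoring weights must sum to 100, got {total:g}")
    return weights


print(load_weights({}))  # every weight falls back to its default
try:
    # Raising one weight without lowering another breaks the sum.
    load_weights({"SCORING_WEIGHT_TESTS": "50"})
except ValueError as exc:
    print(exc)  # scoring weights must sum to 100, got 110
```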

### API Configuration

```bash
# Get current weights
GET /api/v1/scores/config

# Update weights (admin only)
PATCH /api/v1/scores/config
{
  "tests": 50,
  "detection_rules": 20,
  "d3fend": 10,
  "freshness": 10,
  "platform_diversity": 10
}
```

Note: Weight changes made through the API apply at runtime only and do not persist across restarts. To make a change permanent, update the `.env` file or the environment variables.
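
For scripting the update, a client could build the PATCH request as sketched below using only the Python standard library. The host name, the bearer-token header, and the `build_weight_update` helper are assumptions; the source only states that the endpoint is admin only, not how authentication works.

```python
import json
import urllib.request
from typing import Dict

BASE_URL = "https://aegis.example.internal"  # hypothetical host


def build_weight_update(weights: Dict[str, int], token: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH request for /api/v1/scores/config."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/scores/config",
        data=json.dumps(weights).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            # Assumption: the real authentication scheme may differ.
            "Authorization": f"Bearer {token}",
        },
    )


req = build_weight_update(
    {"tests": 50, "detection_rules": 20, "d3fend": 10,
     "freshness": 10, "platform_diversity": 10},
    token="REPLACE_ME",
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```

Checking the sum on the client side avoids a round trip for a payload that would violate the sum-to-100 rule.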

---

## Tactic Score

The tactic score is the **average** of all technique scores within that tactic:

```
tactic_score = mean(technique_scores for techniques in tactic)
```

The response also provides:
- `techniques_total` — number of techniques in the tactic
- `techniques_evaluated` — techniques with score > 0
- `techniques_by_status` — count by status (validated, partial, not_covered, not_evaluated)
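
A minimal sketch of the aggregation, assuming technique scores are available as a mapping from technique ID to score and that techniques scoring 0 are included in the mean. `techniques_by_status` is left out because the status thresholds are not defined in this section.

```python
from statistics import mean
from typing import Dict, Mapping


def tactic_summary(technique_scores: Mapping[str, float]) -> Dict[str, float]:
    """Aggregate per-technique scores into the tactic-level fields."""
    scores = list(technique_scores.values())
    return {
        # Assumption: an empty tactic scores 0 instead of raising an error.
        "tactic_score": mean(scores) if scores else 0.0,
        "techniques_total": len(scores),
        "techniques_evaluated": sum(1 for s in scores if s > 0),
    }


print(tactic_summary({"T1059": 60.0, "T1053": 0.0, "T1204": 90.0}))
# {'tactic_score': 50.0, 'techniques_total': 3, 'techniques_evaluated': 2}
```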

### API

```bash
GET /api/v1/scores/tactic/execution
GET /api/v1/scores/tactic/persistence
```

---

## Threat Actor Coverage Score

Measures how well the organization is covered against a specific threat actor:

```
actor_score = mean(technique_scores for techniques used by actor)
```

The response also provides:
- `techniques_total` — techniques attributed to the actor
- `techniques_covered` — techniques with score > 0
- `coverage_percentage` — percentage of the actor's techniques that are covered (`techniques_covered / techniques_total × 100`)
- `uncovered_techniques` — list of technique IDs with score = 0
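
The fields above can be derived from two inputs, as in this sketch: the list of technique IDs attributed to the actor and a mapping of technique scores. Treating a technique with no recorded score as 0 is an assumption, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Mapping


def actor_coverage(actor_techniques: Iterable[str],
                   technique_scores: Mapping[str, float]) -> Dict[str, object]:
    """Compute the actor score and coverage fields for one threat actor."""
    # Assumption: techniques without a recorded score count as 0.
    scores = {t: technique_scores.get(t, 0.0) for t in actor_techniques}
    total = len(scores)
    covered = sum(1 for s in scores.values() if s > 0)
    return {
        "actor_score": sum(scores.values()) / total if total else 0.0,
        "techniques_total": total,
        "techniques_covered": covered,
        "coverage_percentage": 100 * covered / total if total else 0.0,
        "uncovered_techniques": sorted(t for t, s in scores.items() if s == 0),
    }


result = actor_coverage(
    ["T1059", "T1566", "T1003", "T1021"],
    {"T1059": 80.0, "T1566": 40.0},
)
print(result["actor_score"], result["coverage_percentage"], result["uncovered_techniques"])
# 30.0 50.0 ['T1003', 'T1021']
```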

### API

```bash
GET /api/v1/scores/threat-actor/{actor_id}
```

---

## Organization Score

The top-level organizational security score is a weighted average of four sub-scores:

| Sub-score | Weight | Description |
|-----------|--------|-------------|
| Total Coverage | 40% | Average technique score across all evaluated techniques |
| Critical Coverage | 25% | Average score for techniques with high/critical severity templates |
| Detection Maturity | 20% | `(triggered_rules / total_active_rules) × 100` |
| Response Readiness | 15% | `(remediation_completed / remediation_total) × 100` |

```
org_score = total_coverage × 0.4
          + critical_coverage × 0.25
          + detection_maturity × 0.2
          + response_readiness × 0.15
```
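
As a worked example, the sketch below computes the two ratio-based sub-scores from raw counts and then applies the weights. The function names are hypothetical, and returning 0 for a sub-score whose denominator is 0 is an assumption.

```python
def percentage(part: int, whole: int) -> float:
    """part / whole as a 0-100 value; 0 when there is nothing to measure (assumption)."""
    return 100 * part / whole if whole else 0.0


def organization_score(total_coverage: float, critical_coverage: float,
                       triggered_rules: int, total_active_rules: int,
                       remediation_completed: int, remediation_total: int) -> float:
    detection_maturity = percentage(triggered_rules, total_active_rules)
    response_readiness = percentage(remediation_completed, remediation_total)
    return (total_coverage * 0.4
            + critical_coverage * 0.25
            + detection_maturity * 0.2
            + response_readiness * 0.15)


# 60 * 0.4 + 50 * 0.25 + 75 * 0.2 + 80 * 0.15 = 24 + 12.5 + 15 + 12
print(round(organization_score(60, 50, 30, 40, 8, 10), 1))  # 63.5
```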

### Caching

The organization score is cached in memory for 5 minutes. The cache is invalidated automatically when:
- A test is validated (state → `validated`)
- Scoring weights are updated via the API
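
The caching behaviour described above can be pictured as a single-value cache with a time-to-live and an explicit invalidation hook. This sketch is not the Aegis implementation: the class name is hypothetical, and it is not thread-safe.

```python
import time
from typing import Callable, Optional


class OrgScoreCache:
    """Hold one computed value for ttl_seconds, with manual invalidation."""

    def __init__(self, compute: Callable[[], float], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._compute = compute
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[float] = None
        self._expires_at: Optional[float] = None

    def get(self) -> float:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is empty, expired, or was invalidated: recompute.
            self._value = self._compute()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self) -> None:
        """Call when a test reaches the validated state or weights change."""
        self._expires_at = None


cache = OrgScoreCache(compute=lambda: 63.5)
print(cache.get())   # computed on first access
cache.invalidate()   # e.g. after a weight update
print(cache.get())   # recomputed
```

Passing the clock in as a parameter makes the expiry logic testable without waiting five minutes.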

### API

```bash
GET /api/v1/scores/organization
```

---

## Operational Metrics

In addition to coverage scores, Aegis tracks operational KPIs:

### Mean Time to Detect (MTTD)

Time from test execution start (`start_execution` audit entry) to red team submission (`submit_red`).

```
MTTD = mean(submit_red.timestamp - start_execution.timestamp) for all tests
```
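
One way to compute this from audit entries is sketched below. The `test_id`, `action`, and `timestamp` keys are hypothetical field names, and skipping tests that have a start entry but no submission yet is an assumption.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Iterable, Mapping, Optional


def mean_time_to_detect(audit_entries: Iterable[Mapping]) -> Optional[timedelta]:
    """Average of (submit_red - start_execution) over tests that have both entries."""
    starts, submits = {}, {}
    for entry in audit_entries:
        if entry["action"] == "start_execution":
            starts[entry["test_id"]] = entry["timestamp"]
        elif entry["action"] == "submit_red":
            submits[entry["test_id"]] = entry["timestamp"]
    # Assumption: tests without a submit_red entry are left out of the mean.
    durations = [(submits[t] - starts[t]).total_seconds() for t in starts if t in submits]
    return timedelta(seconds=mean(durations)) if durations else None


entries = [
    {"test_id": 1, "action": "start_execution", "timestamp": datetime(2026, 1, 5, 10, 0)},
    {"test_id": 1, "action": "submit_red", "timestamp": datetime(2026, 1, 5, 10, 30)},
    {"test_id": 2, "action": "start_execution", "timestamp": datetime(2026, 1, 5, 11, 0)},
    {"test_id": 2, "action": "submit_red", "timestamp": datetime(2026, 1, 5, 12, 30)},
    {"test_id": 3, "action": "start_execution", "timestamp": datetime(2026, 1, 5, 13, 0)},
]
print(mean_time_to_detect(entries))  # 1:00:00
```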

### Mean Time to Respond (MTTR)

Time from blue team evaluation (`blue_validated_at`) to remediation completion (`update_remediation` audit entry).

```
MTTR = mean(update_remediation.timestamp - blue_validated_at) for remediated tests
```

### Detection Efficacy

```
efficacy = (detected_tests / total_validated_tests) × 100
```

### Alert Fidelity

Ratio of true positive detections to total detection rule evaluations.

### Coverage Velocity

Rate at which new techniques are being covered over time (techniques covered per week).
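
As a rough sketch, velocity can be computed by bucketing the date on which each technique first became covered into ISO weeks. What counts as "first covered" is not defined in this section, so the input list is an assumption, as is the function name.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def coverage_velocity(first_covered: Iterable[date]) -> Dict[Tuple[int, int], int]:
    """Count newly covered techniques per (ISO year, ISO week)."""
    weeks = Counter(tuple(d.isocalendar()[:2]) for d in first_covered)
    return dict(sorted(weeks.items()))


# Two techniques first covered in ISO week 2 of 2026, one in week 3.
print(coverage_velocity([date(2026, 1, 5), date(2026, 1, 7), date(2026, 1, 14)]))
# {(2026, 2): 2, (2026, 3): 1}
```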

### Validation Throughput

Number of tests moving through the pipeline per time period.

### Rejection Rate

Percentage of tests rejected during dual validation.

### API

```bash
# All operational metrics
GET /api/v1/metrics/operational

# Weekly trend data
GET /api/v1/metrics/operational/trend?period=90d

# Breakdown by team
GET /api/v1/metrics/operational/by-team
```

---

## Score History

Weekly score snapshots for trend analysis:

```bash
GET /api/v1/scores/history?period=90d
# Returns weekly data points with: date, overall_score, total_coverage,
# critical_coverage, detection_maturity, response_readiness
```

Supported periods: `30d`, `90d`, `1y`

---

## Coverage Snapshots

Point-in-time captures of the complete coverage state for historical comparison:

```bash
# Create a snapshot
POST /api/v1/snapshots
{ "name": "Q1 2026 Baseline" }

# Compare two snapshots
GET /api/v1/snapshots/compare?a={snapshot_id_a}&b={snapshot_id_b}
# Returns: score_delta, improved techniques, worsened techniques, unchanged count
```
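
The comparison result can be illustrated with a small diff over two snapshots. The snapshot shape used here (an `overall_score` plus a `techniques` mapping) and the rule that a technique missing from one snapshot counts as 0 are assumptions, not the documented storage format.

```python
from typing import Dict


def compare_snapshots(a: Dict, b: Dict) -> Dict:
    """Diff snapshot b against baseline a, technique by technique."""
    ids = set(a["techniques"]) | set(b["techniques"])
    improved, worsened, unchanged = [], [], 0
    for tid in sorted(ids):
        # Assumption: a technique absent from a snapshot scores 0 there.
        before = a["techniques"].get(tid, 0.0)
        after = b["techniques"].get(tid, 0.0)
        if after > before:
            improved.append(tid)
        elif after < before:
            worsened.append(tid)
        else:
            unchanged += 1
    return {
        "score_delta": b["overall_score"] - a["overall_score"],
        "improved": improved,
        "worsened": worsened,
        "unchanged": unchanged,
    }


baseline = {"overall_score": 55.0,
            "techniques": {"T1059": 40.0, "T1566": 70.0, "T1003": 20.0}}
current = {"overall_score": 61.5,
           "techniques": {"T1059": 65.0, "T1566": 70.0, "T1003": 10.0}}
print(compare_snapshots(baseline, current))
# {'score_delta': 6.5, 'improved': ['T1059'], 'worsened': ['T1003'], 'unchanged': 1}
```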

The scheduler creates a snapshot automatically every Sunday at 00:00 and cleans up older snapshots so that only the most recent 52 (one year of weekly snapshots) are kept.
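
The schedule and retention rule can be sketched as two small helpers. Both function names are hypothetical, the time zone is not specified in the source, and whether manually created snapshots count toward the limit of 52 is left open here.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def next_sunday_midnight(now: datetime) -> datetime:
    """Next Sunday 00:00 strictly after now (Monday is weekday 0, Sunday is 6)."""
    days_ahead = (6 - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # It is already Sunday and midnight has passed: go to next week.
        candidate += timedelta(days=7)
    return candidate


def snapshots_to_delete(created_at: Iterable[datetime], keep: int = 52) -> List[datetime]:
    """Everything older than the newest `keep` snapshots."""
    return sorted(created_at, reverse=True)[keep:]


print(next_sunday_midnight(datetime(2026, 1, 7, 15, 30)))  # 2026-01-11 00:00:00

weekly = [datetime(2025, 1, 5) + timedelta(weeks=i) for i in range(54)]
print(len(snapshots_to_delete(weekly)))  # 2
```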