feat(phase-33): final polish V3 - navigation, performance, and documentation (T-238 to T-240)

2026-02-10 09:21:35 +01:00
parent 35983de67e
commit 14f8485f06
14 changed files with 1446 additions and 320 deletions
@@ -0,0 +1,285 @@
+# Aegis — Scoring System
+
+Aegis uses a granular 0–100 scoring system to measure security coverage at multiple levels: individual techniques, tactics, threat actors, and the overall organization.
+
+---
+
+## Technique Score (0–100)
+
+Each ATT&CK technique receives a composite score based on five weighted components:
+
+| Component | Default Weight | Description |
+|-----------|---------------|-------------|
+| Tests Validated | 40% | Ratio of detected tests to total validated tests |
+| Detection Rules | 20% | Number of active detection rules linked to the technique |
+| D3FEND Coverage | 15% | Number of D3FEND defensive techniques mapped |
+| Freshness | 15% | How recent the latest validated test is |
+| Platform Diversity | 10% | Coverage across different platforms (Windows, Linux, macOS) |
+
+### Tests Validated Component
+
+```
+score = (detected_tests / total_validated_tests) × weight
+```
+
+- Only tests in `validated` state are counted
+- `detected` means `detection_result = "detected"`
+- Example: 2 detected out of 3 validated → `2/3 × 40 = 26.7`
+
+### Detection Rules Component
+
+```
+score = min(active_rules / 3, 1.0) × weight
+```
+
+- Counts active detection rules linked to the technique's `mitre_id`
+- Three or more rules give full marks (the ratio is capped at 1.0)
+- Example: 2 active rules → `2/3 × 20 = 13.3`
+
+### D3FEND Coverage Component
+
+```
+score = min(d3fend_mappings / 2, 1.0) × weight
+```
+
+- Counts D3FEND defensive technique mappings
+- Two or more mappings give full marks (the ratio is capped at 1.0)
+- Example: 1 mapping → `1/2 × 15 = 7.5`
+
+### Freshness Component
+
+```
+days = (now - newest_validated_test.red_validated_at).days
+score = max(0, 1.0 - days / 180) × weight
+```
+
+- 0 days old = full freshness score
+- 180+ days old = 0 (completely stale)
+- Linear decay between 0 and 180 days
+- Example: test is 60 days old → `(1 - 60/180) × 15 = 10.0`
+
+### Platform Diversity Component
+
+```
+platforms_covered = unique platforms across validated tests
+score = min(platforms_covered / 3, 1.0) × weight
+```
+
+- Counts unique platforms (windows, linux, macos) from validated tests
+- Three or more platforms give full marks (the ratio is capped at 1.0)
+- Example: windows + linux → `2/3 × 10 = 6.7`
+
+### Example Calculation
+
+Consider a technique with 2 of 3 validated tests detected, 2 active detection rules, 1 D3FEND mapping, a newest validated test that is 60 days old, and 2 platforms covered:
+
+```
+Tests:     (2/3) × 40        = 26.7
+Detection: (2/3) × 20        = 13.3
+D3FEND:    (1/2) × 15        =  7.5
+Freshness: (1 - 60/180) × 15 = 10.0
+Platform:  (2/3) × 10        =  6.7
+                               ─────
+Total:                         64.2
+```
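+
+The five component formulas can be combined in a short Python sketch. The function and field names are illustrative and not taken from the Aegis codebase, and scoring zero for the tests and freshness components when a technique has no validated tests is an assumption.
+
+```python
+from dataclasses import dataclass
+
+# Default weights from the component table; keys mirror the config API payload.
+DEFAULT_WEIGHTS = {
+    "tests": 40.0,
+    "detection_rules": 20.0,
+    "d3fend": 15.0,
+    "freshness": 15.0,
+    "platform_diversity": 10.0,
+}
+
+
+@dataclass
+class TechniqueInputs:
+    """Hypothetical container for the raw counts behind one technique."""
+    detected_tests: int
+    validated_tests: int
+    active_rules: int
+    d3fend_mappings: int
+    days_since_validation: int
+    platforms_covered: int
+
+
+def technique_score(t, weights=DEFAULT_WEIGHTS):
+    """Return the composite 0-100 score for one technique."""
+    has_tests = t.validated_tests > 0
+    ratios = {
+        # Assumption: no validated tests means no credit for tests or freshness.
+        "tests": t.detected_tests / t.validated_tests if has_tests else 0.0,
+        "detection_rules": min(t.active_rules / 3, 1.0),
+        "d3fend": min(t.d3fend_mappings / 2, 1.0),
+        "freshness": max(0.0, 1.0 - t.days_since_validation / 180) if has_tests else 0.0,
+        "platform_diversity": min(t.platforms_covered / 3, 1.0),
+    }
+    return sum(ratios[name] * weights[name] for name in ratios)
+
+
+example = TechniqueInputs(
+    detected_tests=2, validated_tests=3, active_rules=2,
+    d3fend_mappings=1, days_since_validation=60, platforms_covered=2,
+)
+print(round(technique_score(example), 1))  # 64.2, matching the worked example
+```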
+
+---
+
+## Configuring Weights
+
+Weights are configurable via environment variables or the admin API. They must sum to 100.
+
+### Environment Variables
+
+```env
+SCORING_WEIGHT_TESTS=40
+SCORING_WEIGHT_DETECTION_RULES=20
+SCORING_WEIGHT_D3FEND=15
+SCORING_WEIGHT_FRESHNESS=15
+SCORING_WEIGHT_PLATFORM_DIVERSITY=10
+```
+
+### API Configuration
+
+```bash
+# Get current weights
+GET /api/v1/scores/config
+
+# Update weights (admin only)
+PATCH /api/v1/scores/config
+{
+  "tests": 50,
+  "detection_rules": 20,
+  "d3fend": 10,
+  "freshness": 10,
+  "platform_diversity": 10
+}
+```
+
+Note: Changes made through the API apply at runtime only and do not persist across restarts. For permanent changes, update the `.env` file or the environment variables.
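+
+As a rough illustration of the "must sum to 100" rule, the following sketch reads the weights from the environment and rejects an invalid set. The `load_weights` helper and its error message are hypothetical. Only the variable names and default values come from this document.
+
+```python
+import os
+
+# Environment variable names and defaults as documented above.
+ENV_KEYS = {
+    "tests": ("SCORING_WEIGHT_TESTS", 40.0),
+    "detection_rules": ("SCORING_WEIGHT_DETECTION_RULES", 20.0),
+    "d3fend": ("SCORING_WEIGHT_D3FEND", 15.0),
+    "freshness": ("SCORING_WEIGHT_FRESHNESS", 15.0),
+    "platform_diversity": ("SCORING_WEIGHT_PLATFORM_DIVERSITY", 10.0),
+}
+
+
+def load_weights(env=None):
+    """Read weights from a mapping (os.environ by default) and validate the sum."""
+    env = os.environ if env is None else env
+    weights = {
+        name: float(env.get(var, default))
+        for name, (var, default) in ENV_KEYS.items()
+    }
+    total = sum(weights.values())
+    if abs(total - 100.0) > 1e-9:
+        raise ValueError(f"scoring weights must sum to 100, got {total}")
+    return weights
+
+
+# Overriding one weight without rebalancing the others is rejected.
+try:
+    load_weights({"SCORING_WEIGHT_TESTS": "50"})
+except ValueError as exc:
+    print(exc)
+```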
+
+---
+
+## Tactic Score
+
+The tactic score is the **average** of all technique scores within that tactic:
+
+```
+tactic_score = mean(technique_scores for techniques in tactic)
+```
+
+Alongside the score, the following fields are provided:
+- `techniques_total` — number of techniques in the tactic
+- `techniques_evaluated` — techniques with score > 0
+- `techniques_by_status` — count by status (validated, partial, not_covered, not_evaluated)
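+
+A minimal sketch of the aggregation, assuming technique scores are available as a mapping from technique ID to score. The `tactic_summary` name and the choice to return 0.0 for an empty tactic are assumptions. `techniques_by_status` is omitted because the status rules are not defined here.
+
+```python
+from statistics import mean
+
+
+def tactic_summary(technique_scores):
+    """Aggregate per-technique scores (0-100) into tactic-level fields."""
+    scores = list(technique_scores.values())
+    return {
+        # Assumption: a tactic with no techniques scores 0.0.
+        "tactic_score": mean(scores) if scores else 0.0,
+        "techniques_total": len(scores),
+        "techniques_evaluated": sum(1 for s in scores if s > 0),
+    }
+
+
+print(tactic_summary({"T1059": 64.2, "T1053": 0.0, "T1204": 35.8}))
+```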
+
+### API
+
+```bash
+GET /api/v1/scores/tactic/execution
+GET /api/v1/scores/tactic/persistence
+```
+
+---
+
+## Threat Actor Coverage Score
+
+Measures how well the organization is covered against a specific threat actor:
+
+```
+actor_score = mean(technique_scores for techniques used by actor)
+```
+
+Alongside the score, the following fields are provided:
+- `techniques_total` — techniques attributed to the actor
+- `techniques_covered` — techniques with score > 0
+- `coverage_percentage` — percentage of techniques covered
+- `uncovered_techniques` — list of technique IDs with score = 0
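+
+The actor-level fields can be derived from the same kind of mapping. The sketch below is illustrative only. The `actor_coverage` name is hypothetical, and computing `coverage_percentage` as covered techniques over total techniques is an assumption based on the field descriptions.
+
+```python
+def actor_coverage(technique_scores):
+    """Summarize coverage for the techniques attributed to one threat actor."""
+    total = len(technique_scores)
+    uncovered = sorted(tid for tid, score in technique_scores.items() if score == 0)
+    covered = total - len(uncovered)
+    return {
+        # Assumption: an actor with no attributed techniques scores 0.0.
+        "actor_score": sum(technique_scores.values()) / total if total else 0.0,
+        "techniques_total": total,
+        "techniques_covered": covered,
+        "coverage_percentage": 100.0 * covered / total if total else 0.0,
+        "uncovered_techniques": uncovered,
+    }
+
+
+print(actor_coverage({"T1566": 80.0, "T1059": 40.0, "T1003": 0.0, "T1021": 0.0}))
+```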
+
+### API
+
+```bash
+GET /api/v1/scores/threat-actor/{actor_id}
+```
+
+---
+
+## Organization Score
+
+The top-level organizational security score is a weighted average of four sub-scores:
+
+| Sub-score | Weight | Description |
+|-----------|--------|-------------|
+| Total Coverage | 40% | Average technique score across all evaluated techniques |
+| Critical Coverage | 25% | Average score for techniques with high/critical severity templates |
+| Detection Maturity | 20% | `(triggered_rules / total_active_rules) × 100` |
+| Response Readiness | 15% | `(remediation_completed / remediation_total) × 100` |
+
+```
+org_score = total_coverage × 0.4
+          + critical_coverage × 0.25
+          + detection_maturity × 0.2
+          + response_readiness × 0.15
+```
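+
+The weighted average can be sketched as follows. The function signature is hypothetical, and treating a sub-score with a zero denominator (no active rules, no remediation items) as 0 is an assumption.
+
+```python
+def percentage(part, whole):
+    """Return part/whole as a 0-100 value; an empty denominator counts as 0."""
+    return 100.0 * part / whole if whole else 0.0
+
+
+def organization_score(total_coverage, critical_coverage,
+                       triggered_rules, total_active_rules,
+                       remediation_completed, remediation_total):
+    """Combine the four sub-scores using the documented weights."""
+    detection_maturity = percentage(triggered_rules, total_active_rules)
+    response_readiness = percentage(remediation_completed, remediation_total)
+    return (total_coverage * 0.4
+            + critical_coverage * 0.25
+            + detection_maturity * 0.2
+            + response_readiness * 0.15)
+
+
+# 60 x 0.4 + 50 x 0.25 + 60 x 0.2 + 75 x 0.15
+print(round(organization_score(60.0, 50.0, 12, 20, 3, 4), 2))
+```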
+
+### Caching
+
+The organization score is cached in memory for 5 minutes. The cache is invalidated automatically when either of the following occurs:
+- A test is validated (state → `validated`)
+- Scoring weights are updated via the API
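+
+One way to picture this behavior is a single-value cache with a time-to-live and an explicit invalidation hook. This is a sketch under assumptions, not the Aegis implementation. The class name, the injectable clock, and the `compute` callback are all hypothetical.
+
+```python
+import time
+
+
+class OrgScoreCache:
+    """Single-value in-memory cache with a TTL and manual invalidation."""
+
+    def __init__(self, compute, ttl_seconds=300.0, clock=time.monotonic):
+        self._compute = compute      # callable that recomputes the score
+        self._ttl = ttl_seconds      # 5 minutes by default
+        self._clock = clock          # injectable for testing
+        self._value = None
+        self._expires_at = None      # None means "nothing cached"
+
+    def get(self):
+        now = self._clock()
+        if self._expires_at is None or now >= self._expires_at:
+            self._value = self._compute()
+            self._expires_at = now + self._ttl
+        return self._value
+
+    def invalidate(self):
+        """Call when a test reaches `validated` or the weights change."""
+        self._expires_at = None
+
+
+cache = OrgScoreCache(compute=lambda: 59.75)
+print(cache.get())
+```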
+
+### API
+
+```bash
+GET /api/v1/scores/organization
+```
+
+---
+
+## Operational Metrics
+
+In addition to coverage scores, Aegis tracks operational KPIs:
+
+### Mean Time to Detect (MTTD)
+
+Time from test execution start (`start_execution` audit entry) to red team submission (`submit_red`).
+
+```
+MTTD = mean(submit_red.timestamp - start_execution.timestamp) for all tests
+```
+
+### Mean Time to Respond (MTTR)
+
+Time from blue team evaluation (`blue_validated_at`) to remediation completion (`update_remediation` audit entry).
+
+```
+MTTR = mean(update_remediation.timestamp - blue_validated_at) for remediated tests
+```
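+
+Both MTTD and MTTR reduce to averaging the gap between two timestamps per test. A minimal sketch, assuming the timestamp pairs have already been extracted from the audit log and test records (the `mean_duration` helper is hypothetical):
+
+```python
+from datetime import datetime, timedelta
+
+
+def mean_duration(intervals):
+    """Average (end - start) over (start, end) pairs; None if there are none."""
+    durations = [end - start for start, end in intervals]
+    if not durations:
+        return None
+    return sum(durations, timedelta()) / len(durations)
+
+
+# For MTTD, each pair would be (start_execution, submit_red) for one test.
+pairs = [
+    (datetime(2026, 2, 1, 9, 0), datetime(2026, 2, 1, 9, 30)),
+    (datetime(2026, 2, 2, 14, 0), datetime(2026, 2, 2, 15, 30)),
+]
+print(mean_duration(pairs))
+```
+
+Returning `None` for an empty input keeps "no data yet" distinct from a genuine zero-length duration.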
+
+### Detection Efficacy
+
+```
+efficacy = (detected_tests / total_validated_tests) × 100
+```
+
+### Alert Fidelity
+
+Ratio of true positive detections to total detection rule evaluations.
+
+### Coverage Velocity
+
+Rate at which new techniques are being covered over time (techniques covered per week).
+
+### Validation Throughput
+
+Number of tests moving through the pipeline per time period.
+
+### Rejection Rate
+
+Percentage of tests rejected during dual validation.
+
+### API
+
+```bash
+# All operational metrics
+GET /api/v1/metrics/operational
+
+# Weekly trend data
+GET /api/v1/metrics/operational/trend?period=90d
+
+# Breakdown by team
+GET /api/v1/metrics/operational/by-team
+```
+
+---
+
+## Score History
+
+Weekly score snapshots for trend analysis:
+
+```bash
+GET /api/v1/scores/history?period=90d
+# Returns weekly data points with: date, overall_score, total_coverage,
+# critical_coverage, detection_maturity, response_readiness
+```
+
+Supported `period` values: `30d`, `90d`, `1y`
+
+---
+
+## Coverage Snapshots
+
+Point-in-time captures of the complete coverage state for historical comparison:
+
+```bash
+# Create a snapshot
+POST /api/v1/snapshots
+{ "name": "Q1 2026 Baseline" }
+
+# Compare two snapshots
+GET /api/v1/snapshots/compare?a={snapshot_id_a}&b={snapshot_id_b}
+# Returns: score_delta, improved techniques, worsened techniques, unchanged count
+```
+
+The scheduler creates an automatic snapshot every Sunday at 00:00 and prunes older snapshots so that only the most recent 52 (one year of weekly snapshots) are kept.
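+
+The retention rule can be sketched as "sort by creation time, keep the newest 52, delete the rest". The function below is illustrative. Its name and the `(snapshot_id, created_at)` tuple shape are assumptions, and whether manually created snapshots are exempt from cleanup is not specified here.
+
+```python
+from datetime import datetime, timedelta
+
+
+def snapshots_to_delete(snapshots, keep=52):
+    """Return IDs of snapshots older than the `keep` most recent ones.
+
+    `snapshots` is an iterable of (snapshot_id, created_at) tuples.
+    """
+    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
+    return [snapshot_id for snapshot_id, _ in newest_first[keep:]]
+
+
+# 54 weekly snapshots: the two oldest fall outside the retention window.
+first_sunday = datetime(2025, 1, 5)
+weekly = [(i, first_sunday + timedelta(weeks=i)) for i in range(54)]
+print(snapshots_to_delete(weekly))
+```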