feat(phase-33): final polish V3 - navigation, performance, and documentation (T-238 to T-240)

2026-02-10 09:21:35 +01:00
parent 35983de67e
commit 14f8485f06
14 changed files with 1446 additions and 320 deletions

docs/SCORING.md Normal file

@@ -0,0 +1,285 @@
# Aegis — Scoring System
Aegis uses a granular 0–100 scoring system to measure security coverage at multiple levels: individual techniques, tactics, threat actors, and the overall organization.
---
## Technique Score (0–100)
Each ATT&CK technique receives a composite score based on five weighted components:
| Component | Default Weight | Description |
|-----------|---------------|-------------|
| Tests Validated | 40% | Ratio of detected tests to total validated tests |
| Detection Rules | 20% | Number of active detection rules linked to the technique |
| D3FEND Coverage | 15% | Number of D3FEND defensive techniques mapped |
| Freshness | 15% | How recent the latest validated test is |
| Platform Diversity | 10% | Coverage across different platforms (Windows, Linux, macOS) |
### Tests Validated Component
```
score = (detected_tests / total_validated_tests) × weight
```
- Only tests in `validated` state are counted
- `detected` means `detection_result = "detected"`
- Example: 2 detected out of 3 validated → `2/3 × 40 = 26.7`
### Detection Rules Component
```
score = min(active_rules / 3, 1.0) × weight
```
- Counts active detection rules linked to the technique's `mitre_id`
- 3+ rules gives full marks (capped at 1.0)
- Example: 2 active rules → `2/3 × 20 = 13.3`
### D3FEND Coverage Component
```
score = min(d3fend_mappings / 2, 1.0) × weight
```
- Counts D3FEND defensive technique mappings
- 2+ mappings gives full marks
- Example: 1 mapping → `1/2 × 15 = 7.5`
### Freshness Component
```
days = (now - newest_validated_test.red_validated_at).days
score = max(0, 1.0 - days / 180) × weight
```
- 0 days old = full freshness score
- 180+ days old = 0 (completely stale)
- Linear decay between 0 and 180 days
- Example: test is 60 days old → `(1 - 60/180) × 15 = 10.0`
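The decay rule above can be sketched in a few lines of Python. This is only an illustration of the formula, not the Aegis implementation; the function name `freshness_score` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_WINDOW_DAYS = 180  # after this many days the component is 0


def freshness_score(red_validated_at: datetime, now: datetime, weight: float = 15.0) -> float:
    """Linear decay from the full weight at 0 days to 0 at 180+ days."""
    days = (now - red_validated_at).days
    return max(0.0, 1.0 - days / FRESHNESS_WINDOW_DAYS) * weight


now = datetime(2026, 2, 10, tzinfo=timezone.utc)
print(round(freshness_score(now - timedelta(days=60), now), 1))   # 10.0
print(freshness_score(now - timedelta(days=200), now))            # 0.0
```

Passing `now` explicitly keeps the function deterministic, which makes the decay easy to unit-test.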
### Platform Diversity Component
```
platforms_covered = unique platforms across validated tests
score = min(platforms_covered / 3, 1.0) × weight
```
- Counts unique platforms (windows, linux, macos) from validated tests
- 3+ platforms gives full marks
- Example: windows + linux → `2/3 × 10 = 6.7`
### Example Calculation
A technique with:
- 2/3 tests detected, 2 detection rules, 1 D3FEND mapping, 60 days old, 2 platforms
```
Tests: (2/3) × 40 = 26.7
Detection: (2/3) × 20 = 13.3
D3FEND: (1/2) × 15 = 7.5
Freshness: (1 - 60/180) × 15 = 10.0
Platform: (2/3) × 10 = 6.7
─────
Total: 64.2
```
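To make the arithmetic reproducible, the sketch below combines the five component formulas and recomputes the example. The function and parameter names are hypothetical, and treating a technique without validated tests as scoring 0 for the test and freshness components is an assumption, since the document does not specify that case.

```python
DEFAULT_WEIGHTS = {
    "tests": 40,
    "detection_rules": 20,
    "d3fend": 15,
    "freshness": 15,
    "platform_diversity": 10,
}


def technique_components(detected, validated, active_rules, d3fend_mappings,
                         days_since_newest_test, platforms, weights=DEFAULT_WEIGHTS):
    """Return the five weighted components of a technique score."""
    # Assumption: with no validated tests, the test ratio and freshness are 0.
    test_ratio = detected / validated if validated else 0.0
    if days_since_newest_test is None:
        freshness = 0.0
    else:
        freshness = max(0.0, 1.0 - days_since_newest_test / 180)
    return {
        "tests": test_ratio * weights["tests"],
        "detection_rules": min(active_rules / 3, 1.0) * weights["detection_rules"],
        "d3fend": min(d3fend_mappings / 2, 1.0) * weights["d3fend"],
        "freshness": freshness * weights["freshness"],
        "platform_diversity": min(len(set(platforms)) / 3, 1.0) * weights["platform_diversity"],
    }


components = technique_components(
    detected=2, validated=3, active_rules=2, d3fend_mappings=1,
    days_since_newest_test=60, platforms=["windows", "linux"],
)
print(round(sum(components.values()), 1))  # 64.2
```

Returning the components separately rather than only the total makes it easy to show which factor is holding a technique's score down.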
---
## Configuring Weights
Weights are configurable via environment variables or the admin API. They must sum to 100.
### Environment Variables
```env
SCORING_WEIGHT_TESTS=40
SCORING_WEIGHT_DETECTION_RULES=20
SCORING_WEIGHT_D3FEND=15
SCORING_WEIGHT_FRESHNESS=15
SCORING_WEIGHT_PLATFORM_DIVERSITY=10
```
### API Configuration
```bash
# Get current weights
GET /api/v1/scores/config
# Update weights (admin only)
PATCH /api/v1/scores/config
{
  "tests": 50,
  "detection_rules": 20,
  "d3fend": 10,
  "freshness": 10,
  "platform_diversity": 10
}
```
Note: Runtime changes do not persist across restarts. Update the `.env` file or environment variables for permanent changes.
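A minimal sketch of how the weights might be read and validated follows. It assumes the environment variable names listed above and the stated rule that weights sum to 100. The `load_weights` helper is hypothetical and is not taken from the Aegis codebase.

```python
import os

# short key -> (environment variable, default weight)
WEIGHT_ENV = {
    "tests": ("SCORING_WEIGHT_TESTS", 40),
    "detection_rules": ("SCORING_WEIGHT_DETECTION_RULES", 20),
    "d3fend": ("SCORING_WEIGHT_D3FEND", 15),
    "freshness": ("SCORING_WEIGHT_FRESHNESS", 15),
    "platform_diversity": ("SCORING_WEIGHT_PLATFORM_DIVERSITY", 10),
}


def load_weights(env=None):
    """Read weights from a mapping (os.environ by default) and check the sum."""
    env = os.environ if env is None else env
    weights = {key: float(env.get(var, default)) for key, (var, default) in WEIGHT_ENV.items()}
    total = sum(weights.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"scoring weights must sum to 100, got {total}")
    return weights


# Example with an explicit mapping instead of the real process environment.
print(load_weights({"SCORING_WEIGHT_TESTS": "50",
                    "SCORING_WEIGHT_D3FEND": "10",
                    "SCORING_WEIGHT_FRESHNESS": "10"}))
```

Rejecting an invalid set at load time keeps every technique score on the same 0–100 scale.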
---
## Tactic Score
The tactic score is the **average** of all technique scores within that tactic:
```
tactic_score = mean(technique_scores for techniques in tactic)
```
Also provides:
- `techniques_total` — number of techniques in the tactic
- `techniques_evaluated` — techniques with score > 0
- `techniques_by_status` — count by status (validated, partial, not_covered, not_evaluated)
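The aggregation above can be illustrated as follows. The helper name and the sample scores are hypothetical, and `techniques_by_status` is left out because the status thresholds are not defined in this document.

```python
from statistics import mean


def tactic_summary(technique_scores):
    """technique_scores: dict mapping technique ID -> score (0-100)."""
    scores = list(technique_scores.values())
    return {
        # Assumption: a tactic with no techniques scores 0.
        "tactic_score": mean(scores) if scores else 0.0,
        "techniques_total": len(scores),
        "techniques_evaluated": sum(1 for s in scores if s > 0),
    }


# Illustrative scores for three Execution techniques.
print(tactic_summary({"T1059": 60.0, "T1053": 0.0, "T1204": 90.0}))
```

Because the mean includes techniques that score 0, unevaluated techniques pull the tactic score down instead of being ignored.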
### API
```bash
GET /api/v1/scores/tactic/execution
GET /api/v1/scores/tactic/persistence
```
---
## Threat Actor Coverage Score
This score measures how well the organization's coverage holds up against the techniques used by a specific threat actor:
```
actor_score = mean(technique_scores for techniques used by actor)
```
Also provides:
- `techniques_total` — techniques attributed to the actor
- `techniques_covered` — techniques with score > 0
- `coverage_percentage` — percentage of techniques covered
- `uncovered_techniques` — list of technique IDs with score = 0
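As a sketch, the fields above could be derived from a technique-score lookup like this. The function name is hypothetical, and counting techniques that have no score at all as 0 is an assumption.

```python
def actor_coverage(actor_techniques, technique_scores):
    """actor_techniques: iterable of technique IDs; technique_scores: ID -> score."""
    # Assumption: a technique missing from technique_scores counts as 0.
    scores = {tid: technique_scores.get(tid, 0.0) for tid in actor_techniques}
    total = len(scores)
    covered = [tid for tid, score in scores.items() if score > 0]
    return {
        "actor_score": sum(scores.values()) / total if total else 0.0,
        "techniques_total": total,
        "techniques_covered": len(covered),
        "coverage_percentage": 100.0 * len(covered) / total if total else 0.0,
        "uncovered_techniques": [tid for tid, score in scores.items() if score == 0],
    }


result = actor_coverage(
    ["T1059", "T1566", "T1003", "T1021"],
    {"T1059": 80.0, "T1566": 40.0},
)
print(result["actor_score"], result["coverage_percentage"], result["uncovered_techniques"])
```

Note that `actor_score` and `coverage_percentage` answer different questions: the first reflects how strong the coverage is, the second only whether any coverage exists.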
### API
```bash
GET /api/v1/scores/threat-actor/{actor_id}
```
---
## Organization Score
The top-level organizational security score is a weighted average of four sub-scores:
| Sub-score | Weight | Description |
|-----------|--------|-------------|
| Total Coverage | 40% | Average technique score across all evaluated techniques |
| Critical Coverage | 25% | Average score for techniques with high/critical severity templates |
| Detection Maturity | 20% | `(triggered_rules / total_active_rules) × 100` |
| Response Readiness | 15% | `(remediation_completed / remediation_total) × 100` |
```
org_score = total_coverage × 0.4
+ critical_coverage × 0.25
+ detection_maturity × 0.2
+ response_readiness × 0.15
```
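The weighted sum can be checked with a short sketch. The function name and input values are illustrative, and returning 0 for a sub-score whose denominator is 0 is an assumption not stated in this document.

```python
ORG_WEIGHTS = {
    "total_coverage": 0.40,
    "critical_coverage": 0.25,
    "detection_maturity": 0.20,
    "response_readiness": 0.15,
}


def percentage(numerator, denominator):
    # Assumption: an empty denominator yields 0 rather than an error.
    return 100.0 * numerator / denominator if denominator else 0.0


def organization_score(total_coverage, critical_coverage,
                       triggered_rules, total_active_rules,
                       remediation_completed, remediation_total):
    sub_scores = {
        "total_coverage": total_coverage,
        "critical_coverage": critical_coverage,
        "detection_maturity": percentage(triggered_rules, total_active_rules),
        "response_readiness": percentage(remediation_completed, remediation_total),
    }
    return sum(sub_scores[name] * weight for name, weight in ORG_WEIGHTS.items())


# 60*0.4 + 50*0.25 + 75*0.2 + 80*0.15
print(round(organization_score(60.0, 50.0, 30, 40, 8, 10), 2))  # 63.5
```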
### Caching
The organization score is cached in memory for 5 minutes. The cache is invalidated automatically when:
- A test is validated (state → `validated`)
- Scoring weights are updated via the API
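One way to picture this behaviour is a small time-to-live cache with an explicit invalidation hook. The class below is a hypothetical sketch, not the Aegis cache, and it is not thread-safe. The injectable `clock` exists only to make expiry testable.

```python
import time


class OrgScoreCache:
    """Caches one computed value for ttl_seconds (300 s = 5 minutes)."""

    def __init__(self, compute, ttl_seconds=300, clock=time.monotonic):
        self._compute = compute
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means "nothing cached"

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._compute()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self):
        """Call when a test reaches `validated` or weights change via the API."""
        self._expires_at = None


cache = OrgScoreCache(compute=lambda: 63.5)
print(cache.get())   # computed on first access
cache.invalidate()   # next get() recomputes
print(cache.get())
```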
### API
```bash
GET /api/v1/scores/organization
```
---
## Operational Metrics
In addition to coverage scores, Aegis tracks operational KPIs:
### Mean Time to Detect (MTTD)
Time from test execution start (`start_execution` audit entry) to red team submission (`submit_red`).
```
MTTD = mean(submit_red.timestamp - start_execution.timestamp) for all tests
```
### Mean Time to Respond (MTTR)
Time from blue team evaluation (`blue_validated_at`) to remediation completion (`update_remediation` audit entry).
```
MTTR = mean(update_remediation.timestamp - blue_validated_at) for remediated tests
```
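Both MTTD and MTTR reduce to the mean of a set of timestamp differences. The helper below is a hypothetical sketch that takes `(start, end)` pairs. How those pairs are pulled from the audit log is not shown here and would depend on the actual schema.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_duration(pairs):
    """pairs: iterable of (start, end) datetimes. Returns a timedelta, or None if empty."""
    seconds = [(end - start).total_seconds() for start, end in pairs]
    return timedelta(seconds=mean(seconds)) if seconds else None


# MTTD example: (start_execution, submit_red) timestamps for two tests.
t0 = datetime(2026, 2, 1, 9, 0)
mttd = mean_duration([
    (t0, t0 + timedelta(minutes=30)),
    (t0, t0 + timedelta(minutes=90)),
])
print(mttd)  # 1:00:00
```

For MTTR the same helper would be fed `(blue_validated_at, update_remediation.timestamp)` pairs for remediated tests only.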
### Detection Efficacy
```
efficacy = (detected_tests / total_validated_tests) × 100
```
### Alert Fidelity
Ratio of true positive detections to total detection rule evaluations.
### Coverage Velocity
Rate at which new techniques are being covered over time (techniques covered per week).
### Validation Throughput
Number of tests moving through the pipeline per time period.
### Rejection Rate
Percentage of tests rejected during dual validation.
### API
```bash
# All operational metrics
GET /api/v1/metrics/operational
# Weekly trend data
GET /api/v1/metrics/operational/trend?period=90d
# Breakdown by team
GET /api/v1/metrics/operational/by-team
```
---
## Score History
Weekly score snapshots for trend analysis:
```bash
GET /api/v1/scores/history?period=90d
# Returns weekly data points with: date, overall_score, total_coverage,
# critical_coverage, detection_maturity, response_readiness
```
Periods: `30d`, `90d`, `1y`
---
## Coverage Snapshots
Point-in-time captures of the complete coverage state for historical comparison:
```bash
# Create a snapshot
POST /api/v1/snapshots
{ "name": "Q1 2026 Baseline" }
# Compare two snapshots
GET /api/v1/snapshots/compare?a={snapshot_id_a}&b={snapshot_id_b}
# Returns: score_delta, improved techniques, worsened techniques, unchanged count
```
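The comparison result can be sketched as a diff over per-technique scores. The snapshot shape used here (`overall_score` plus a `technique_scores` mapping) and the output key names are assumptions for illustration; the real payload may differ. Techniques present in only one snapshot are treated as scoring 0 in the other.

```python
def compare_snapshots(a, b):
    """Compare snapshot a (older) with snapshot b (newer)."""
    scores_a, scores_b = a["technique_scores"], b["technique_scores"]
    improved, worsened, unchanged = [], [], 0
    for tid in sorted(scores_a.keys() | scores_b.keys()):
        before, after = scores_a.get(tid, 0.0), scores_b.get(tid, 0.0)
        if after > before:
            improved.append(tid)
        elif after < before:
            worsened.append(tid)
        else:
            unchanged += 1
    return {
        "score_delta": b["overall_score"] - a["overall_score"],
        "improved": improved,
        "worsened": worsened,
        "unchanged": unchanged,
    }


baseline = {"overall_score": 50.0,
            "technique_scores": {"T1059": 40.0, "T1053": 70.0, "T1003": 10.0}}
current = {"overall_score": 58.5,
           "technique_scores": {"T1059": 64.2, "T1053": 55.0, "T1003": 10.0, "T1021": 0.0}}
print(compare_snapshots(baseline, current))
```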
The scheduler creates an automatic snapshot every Sunday at 00:00 and then cleans up old snapshots so that only the last 52 (one year of weekly snapshots) are kept.
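The retention rule amounts to keeping the newest 52 snapshots and deleting the rest. The sketch below assumes snapshots are available as `(snapshot_id, created_at)` tuples. Whether manually created snapshots are exempt from cleanup is not stated here, so the sketch does not distinguish them.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, keep=52):
    """snapshots: list of (snapshot_id, created_at). Returns IDs beyond the newest `keep`."""
    newest_first = sorted(snapshots, key=lambda snap: snap[1], reverse=True)
    return [snapshot_id for snapshot_id, _ in newest_first[keep:]]


# 60 weekly snapshots: the 8 oldest (IDs 0-7) fall outside the retention window.
start = datetime(2025, 1, 5)  # a Sunday
weekly = [(i, start + timedelta(weeks=i)) for i in range(60)]
print(sorted(snapshots_to_delete(weekly)))  # [0, 1, 2, 3, 4, 5, 6, 7]
```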