# Aegis — Scoring System
Aegis uses a granular 0–100 scoring system to measure security coverage at multiple levels: individual techniques, tactics, threat actors, and the overall organization.
---
## Technique Score (0–100)
Each ATT&CK technique receives a composite score based on five weighted components:
| Component | Default Weight | Description |
|-----------|---------------|-------------|
| Tests Validated | 40% | Ratio of detected tests to total validated tests |
| Detection Rules | 20% | Number of active detection rules linked to the technique |
| D3FEND Coverage | 15% | Number of D3FEND defensive techniques mapped |
| Freshness | 15% | How recent the latest validated test is |
| Platform Diversity | 10% | Coverage across different platforms (Windows, Linux, macOS) |
### Tests Validated Component
```
score = (detected_tests / total_validated_tests) × weight
```
- Only tests in `validated` state are counted
- `detected` means `detection_result = "detected"`
- Example: 2 detected out of 3 validated → `2/3 × 40 = 26.7`
### Detection Rules Component
```
score = min(active_rules / 3, 1.0) × weight
```
- Counts active detection rules linked to the technique's `mitre_id`
- 3+ rules gives full marks (capped at 1.0)
- Example: 2 active rules → `2/3 × 20 = 13.3`
### D3FEND Coverage Component
```
score = min(d3fend_mappings / 2, 1.0) × weight
```
- Counts D3FEND defensive technique mappings
- 2+ mappings gives full marks
- Example: 1 mapping → `1/2 × 15 = 7.5`
### Freshness Component
```
days = (now - newest_validated_test.red_validated_at).days
score = max(0, 1.0 - days / 180) × weight
```
- 0 days old = full freshness score
- 180+ days old = 0 (completely stale)
- Linear decay between 0 and 180 days
- Example: test is 60 days old → `(1 - 60/180) × 15 = 10.0`
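The linear decay above can be sketched in a few lines of Python. This is a minimal illustration, not the Aegis implementation: the function name `freshness_score` and its parameters are hypothetical, and clamping future timestamps to the full score is an assumption the text does not state.
```python
from datetime import datetime, timedelta, timezone


def freshness_score(validated_at: datetime, now: datetime,
                    weight: float = 15.0, horizon_days: int = 180) -> float:
    """Linear decay: full weight at 0 days, 0 at horizon_days or older."""
    days = (now - validated_at).days
    # Assumption: a timestamp in the future is clamped to full freshness.
    fraction = max(0.0, min(1.0, 1.0 - days / horizon_days))
    return fraction * weight


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(round(freshness_score(now - timedelta(days=60), now), 1))   # 10.0
print(freshness_score(now - timedelta(days=200), now))            # 0.0
```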
### Platform Diversity Component
```
platforms_covered = unique platforms across validated tests
score = min(platforms_covered / 3, 1.0) × weight
```
- Counts unique platforms (windows, linux, macos) from validated tests
- 3+ platforms gives full marks
- Example: windows + linux → `2/3 × 10 = 6.7`
### Example Calculation
A technique with:
- 2/3 tests detected, 2 detection rules, 1 D3FEND mapping, 60 days old, 2 platforms
```
Tests:      (2/3) × 40          = 26.7
Detection:  (2/3) × 20          = 13.3
D3FEND:     (1/2) × 15          =  7.5
Freshness:  (1 - 60/180) × 15   = 10.0
Platform:   (2/3) × 10          =  6.7
                                  ─────
Total:                            64.2
```
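The same calculation can be reproduced in Python by combining the five component formulas. The sketch below is illustrative only: `TechniqueInputs`, `technique_score`, and the weight keys are hypothetical names, and scoring the tests component as 0 when there are no validated tests is an assumption.
```python
from dataclasses import dataclass

DEFAULT_WEIGHTS = {
    "tests": 40,
    "detection_rules": 20,
    "d3fend": 15,
    "freshness": 15,
    "platform_diversity": 10,
}


@dataclass
class TechniqueInputs:
    detected_tests: int
    validated_tests: int
    active_rules: int
    d3fend_mappings: int
    days_since_validation: int
    platforms_covered: int


def technique_score(t: TechniqueInputs, weights: dict = DEFAULT_WEIGHTS) -> dict:
    """Return each weighted component plus the total (0-100)."""
    # Assumption: no validated tests means a tests ratio of 0.
    tests_ratio = t.detected_tests / t.validated_tests if t.validated_tests else 0.0
    parts = {
        "tests": tests_ratio * weights["tests"],
        "detection_rules": min(t.active_rules / 3, 1.0) * weights["detection_rules"],
        "d3fend": min(t.d3fend_mappings / 2, 1.0) * weights["d3fend"],
        "freshness": max(0.0, 1.0 - t.days_since_validation / 180) * weights["freshness"],
        "platform_diversity": min(t.platforms_covered / 3, 1.0) * weights["platform_diversity"],
    }
    total = sum(parts.values())
    parts["total"] = total
    return parts


example = TechniqueInputs(detected_tests=2, validated_tests=3, active_rules=2,
                          d3fend_mappings=1, days_since_validation=60,
                          platforms_covered=2)
print(round(technique_score(example)["total"], 1))  # 64.2
```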
---
## Configuring Weights
Weights are configurable via environment variables or the admin API. They must sum to 100.
### Environment Variables
```env
SCORING_WEIGHT_TESTS=40
SCORING_WEIGHT_DETECTION_RULES=20
SCORING_WEIGHT_D3FEND=15
SCORING_WEIGHT_FRESHNESS=15
SCORING_WEIGHT_PLATFORM_DIVERSITY=10
```
### API Configuration
```bash
# Get current weights
GET /api/v1/scores/config

# Update weights (admin only)
PATCH /api/v1/scores/config
{
  "tests": 50,
  "detection_rules": 20,
  "d3fend": 10,
  "freshness": 10,
  "platform_diversity": 10
}
```
Note: Weight changes made through the API at runtime do not persist across restarts. For permanent changes, update the `.env` file or the environment variables.
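The "must sum to 100" rule can be checked when weights are loaded. The following is a sketch under assumptions, not Aegis's actual loader: `load_weights` is a hypothetical helper, and falling back to the documented defaults for unset variables is assumed behavior.
```python
import os
from typing import Mapping

# Documented defaults, keyed by environment variable name.
DEFAULTS = {
    "SCORING_WEIGHT_TESTS": 40,
    "SCORING_WEIGHT_DETECTION_RULES": 20,
    "SCORING_WEIGHT_D3FEND": 15,
    "SCORING_WEIGHT_FRESHNESS": 15,
    "SCORING_WEIGHT_PLATFORM_DIVERSITY": 10,
}


def load_weights(env: Mapping[str, str] = os.environ) -> dict:
    """Read weights from env (assumed fallback: defaults) and validate the sum."""
    weights = {name: float(env.get(name, default)) for name, default in DEFAULTS.items()}
    total = sum(weights.values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"scoring weights must sum to 100, got {total}")
    return weights


# Mirrors the PATCH example: tests raised to 50, D3FEND and freshness lowered to 10.
custom = load_weights({
    "SCORING_WEIGHT_TESTS": "50",
    "SCORING_WEIGHT_D3FEND": "10",
    "SCORING_WEIGHT_FRESHNESS": "10",
})
print(custom["SCORING_WEIGHT_TESTS"])  # 50.0
```
Raising only one weight without lowering another (for example, setting just `SCORING_WEIGHT_TESTS=50`) makes this sketch raise `ValueError`.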
---
## Tactic Score
The tactic score is the **average** of all technique scores within that tactic:
```
tactic_score = mean(technique_scores for techniques in tactic)
```
The response also includes:
- `techniques_total` — number of techniques in the tactic
- `techniques_evaluated` — techniques with score > 0
- `techniques_by_status` — count by status (validated, partial, not_covered, not_evaluated)
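As a rough illustration of the aggregation, the sketch below averages technique scores and counts evaluated techniques. The function name `tactic_summary` and the input shape (technique ID mapped to score) are hypothetical, and it reads "all technique scores" literally, so techniques scoring 0 pull the mean down.
```python
from statistics import mean


def tactic_summary(technique_scores: dict) -> dict:
    """Aggregate per-technique scores (0-100) for one tactic."""
    scores = list(technique_scores.values())
    return {
        # Assumption: a tactic with no techniques scores 0.
        "tactic_score": mean(scores) if scores else 0.0,
        "techniques_total": len(scores),
        "techniques_evaluated": sum(1 for s in scores if s > 0),
    }


summary = tactic_summary({"T1059": 64.2, "T1053": 0.0, "T1204": 30.0})
print(round(summary["tactic_score"], 1))   # 31.4
print(summary["techniques_evaluated"])     # 2
```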
### API
```bash
GET /api/v1/scores/tactic/execution
GET /api/v1/scores/tactic/persistence
```
---
## Threat Actor Coverage Score
Measures how well the organization is covered against a specific threat actor:
```
actor_score = mean(technique_scores for techniques used by actor)
```
The response also includes:
- `techniques_total` — techniques attributed to the actor
- `techniques_covered` — techniques with score > 0
- `coverage_percentage` — percentage of techniques covered
- `uncovered_techniques` — list of technique IDs with score = 0
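The fields above can be derived from a list of the actor's technique IDs and a score lookup. This is a hedged sketch: `actor_coverage` and its input shapes are hypothetical, and treating a technique with no recorded score as 0 is an assumption.
```python
def actor_coverage(actor_techniques: list, scores: dict) -> dict:
    """Summarize coverage against one threat actor's technique set."""
    # Assumption: techniques without a recorded score count as 0 (uncovered).
    per_technique = {tid: scores.get(tid, 0.0) for tid in actor_techniques}
    total = len(per_technique)
    covered = [tid for tid, s in per_technique.items() if s > 0]
    uncovered = [tid for tid, s in per_technique.items() if s == 0]
    return {
        "actor_score": sum(per_technique.values()) / total if total else 0.0,
        "techniques_total": total,
        "techniques_covered": len(covered),
        "coverage_percentage": 100.0 * len(covered) / total if total else 0.0,
        "uncovered_techniques": uncovered,
    }


result = actor_coverage(
    ["T1059", "T1566", "T1003", "T1021"],
    {"T1059": 64.2, "T1566": 80.0, "T1003": 0.0},
)
print(result["coverage_percentage"])    # 50.0
print(result["uncovered_techniques"])   # ['T1003', 'T1021']
```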
### API
```bash
GET /api/v1/scores/threat-actor/{actor_id}
```
---
## Organization Score
The top-level organizational security score is a weighted average of four sub-scores:
| Sub-score | Weight | Description |
|-----------|--------|-------------|
| Total Coverage | 40% | Average technique score across all evaluated techniques |
| Critical Coverage | 25% | Average score for techniques with high/critical severity templates |
| Detection Maturity | 20% | `(triggered_rules / total_active_rules) × 100` |
| Response Readiness | 15% | `(remediation_completed / remediation_total) × 100` |
```
org_score = total_coverage     × 0.40
          + critical_coverage  × 0.25
          + detection_maturity × 0.20
          + response_readiness × 0.15
```
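A worked example of the weighted average, written as a small Python sketch. The names `organization_score` and `ratio_pct` are hypothetical, the input values are made up for illustration, and returning 0 for a ratio with a zero denominator is an assumption.
```python
ORG_WEIGHTS = {
    "total_coverage": 0.40,
    "critical_coverage": 0.25,
    "detection_maturity": 0.20,
    "response_readiness": 0.15,
}


def ratio_pct(numerator: int, denominator: int) -> float:
    """Percentage ratio; assumed to be 0 when the denominator is 0."""
    return 100.0 * numerator / denominator if denominator else 0.0


def organization_score(total_coverage: float, critical_coverage: float,
                       triggered_rules: int, total_active_rules: int,
                       remediation_completed: int, remediation_total: int) -> float:
    sub_scores = {
        "total_coverage": total_coverage,
        "critical_coverage": critical_coverage,
        "detection_maturity": ratio_pct(triggered_rules, total_active_rules),
        "response_readiness": ratio_pct(remediation_completed, remediation_total),
    }
    return sum(sub_scores[name] * weight for name, weight in ORG_WEIGHTS.items())


# 60 × 0.40 + 50 × 0.25 + 75 × 0.20 + 75 × 0.15 = 24 + 12.5 + 15 + 11.25
score = organization_score(60.0, 50.0, 30, 40, 9, 12)
print(round(score, 2))  # 62.75
```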
### Caching
The organization score is cached in-memory for 5 minutes. The cache is automatically invalidated when:
- A test is validated (state → `validated`)
- Scoring weights are updated via the API
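The caching behavior described above (a 5-minute TTL plus explicit invalidation) might look roughly like the sketch below. `TTLCache` is a hypothetical class, not the Aegis cache; it holds a single value, is not thread-safe, and takes an injectable clock so expiry can be tested.
```python
import time
from typing import Callable

_MISSING = object()  # sentinel: nothing cached yet


class TTLCache:
    """Caches one computed value for ttl_seconds; invalidate() forces a recompute."""

    def __init__(self, compute: Callable[[], float], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._compute = compute
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = _MISSING
        self._stored_at = 0.0

    def get(self) -> float:
        now = self._clock()
        expired = now - self._stored_at >= self._ttl
        if self._value is _MISSING or expired:
            self._value = self._compute()
            self._stored_at = now
        return self._value

    def invalidate(self) -> None:
        # Would be called when a test reaches `validated` or weights change.
        self._value = _MISSING


cache = TTLCache(lambda: 62.75)
print(cache.get())   # computed on first access
cache.invalidate()
print(cache.get())   # recomputed after invalidation
```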
### API
```bash
GET /api/v1/scores/organization
```
---
## Operational Metrics
In addition to coverage scores, Aegis tracks operational KPIs:
### Mean Time to Detect (MTTD)
Time from test execution start (`start_execution` audit entry) to red team submission (`submit_red`).
```
MTTD = mean(submit_red.timestamp - start_execution.timestamp) for all tests
```
### Mean Time to Respond (MTTR)
Time from blue team evaluation (`blue_validated_at`) to remediation completion (`update_remediation` audit entry).
```
MTTR = mean(update_remediation.timestamp - blue_validated_at) for remediated tests
```
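Both MTTD and MTTR are means over (start, end) timestamp pairs, so one helper covers them. The sketch below assumes the audit timestamps have already been paired per test; `mean_duration` is a hypothetical name, and returning `None` for an empty input is an assumption.
```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def mean_duration(intervals: Iterable[Tuple[datetime, datetime]]) -> Optional[timedelta]:
    """Average of (end - start) over all pairs, or None if there are none."""
    durations = [end - start for start, end in intervals]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


t0 = datetime(2026, 1, 5, 9, 0)
# Each pair: (start_execution timestamp, submit_red timestamp) for one test.
mttd = mean_duration([
    (t0, t0 + timedelta(minutes=10)),
    (t0, t0 + timedelta(minutes=20)),
])
print(mttd)  # 0:15:00
```
For MTTR, the pairs would instead be (`blue_validated_at`, `update_remediation` timestamp) for remediated tests.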
### Detection Efficacy
```
efficacy = (detected_tests / total_validated_tests) × 100
```
### Alert Fidelity
Ratio of true positive detections to total detection rule evaluations.
### Coverage Velocity
Rate at which new techniques are being covered over time (techniques covered per week).
### Validation Throughput
Number of tests moving through the pipeline per time period.
### Rejection Rate
Percentage of tests rejected during dual validation.
### API
```bash
# All operational metrics
GET /api/v1/metrics/operational
# Weekly trend data
GET /api/v1/metrics/operational/trend?period=90d
# Breakdown by team
GET /api/v1/metrics/operational/by-team
```
---
## Score History
Weekly score snapshots for trend analysis:
```bash
GET /api/v1/scores/history?period=90d
# Returns weekly data points with: date, overall_score, total_coverage,
# critical_coverage, detection_maturity, response_readiness
```
Periods: `30d`, `90d`, `1y`
---
## Coverage Snapshots
Point-in-time captures of the complete coverage state for historical comparison:
```bash
# Create a snapshot
POST /api/v1/snapshots
{ "name": "Q1 2026 Baseline" }
# Compare two snapshots
GET /api/v1/snapshots/compare?a={snapshot_id_a}&b={snapshot_id_b}
# Returns: score_delta, improved techniques, worsened techniques, unchanged count
```
Automatic weekly snapshots are created every Sunday at 00:00 by the scheduler, with old snapshots cleaned up to keep the last 52 (one year).
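To make the comparison output concrete, here is a minimal sketch of how two snapshots could be diffed. The snapshot shape (an `overall_score` plus a `techniques` mapping of ID to score), the function name `compare_snapshots`, and treating a technique missing from one snapshot as scoring 0 are all assumptions for illustration.
```python
def compare_snapshots(a: dict, b: dict) -> dict:
    """Diff snapshot b against baseline a."""
    improved, worsened, unchanged = [], [], 0
    a_scores, b_scores = a["techniques"], b["techniques"]
    for tid in sorted(set(a_scores) | set(b_scores)):
        # Assumption: a technique absent from a snapshot scores 0 there.
        before = a_scores.get(tid, 0.0)
        after = b_scores.get(tid, 0.0)
        if after > before:
            improved.append(tid)
        elif after < before:
            worsened.append(tid)
        else:
            unchanged += 1
    return {
        "score_delta": b["overall_score"] - a["overall_score"],
        "improved": improved,
        "worsened": worsened,
        "unchanged": unchanged,
    }


baseline = {"overall_score": 55.0,
            "techniques": {"T1059": 64.2, "T1566": 80.0, "T1003": 0.0}}
current = {"overall_score": 61.5,
           "techniques": {"T1059": 70.0, "T1566": 75.0, "T1003": 0.0, "T1021": 20.0}}
diff = compare_snapshots(baseline, current)
print(diff["improved"])   # ['T1021', 'T1059']
print(diff["worsened"])   # ['T1566']
```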