Aegis — Scoring System

Aegis uses a granular 0-100 scoring system to measure security coverage at multiple levels: individual techniques, tactics, threat actors, and the overall organization.


Technique Score (0-100)

Each ATT&CK technique receives a composite score based on five weighted components:

| Component | Default Weight | Description |
| --- | --- | --- |
| Tests Validated | 40% | Ratio of detected tests to total validated tests |
| Detection Rules | 20% | Number of active detection rules linked to the technique |
| D3FEND Coverage | 15% | Number of D3FEND defensive techniques mapped |
| Freshness | 15% | How recent the latest validated test is |
| Platform Diversity | 10% | Coverage across different platforms (Windows, Linux, macOS) |

Tests Validated Component

score = (detected_tests / total_validated_tests) × weight
  • Only tests in the validated state are counted
  • detected means detection_result = "detected"
  • Example: 2 detected out of 3 validated → 2/3 × 40 = 26.7

Detection Rules Component

score = min(active_rules / 3, 1.0) × weight
  • Counts active detection rules linked to the technique's mitre_id
  • 3+ rules give full marks (ratio capped at 1.0)
  • Example: 2 active rules → 2/3 × 20 = 13.3

D3FEND Coverage Component

score = min(d3fend_mappings / 2, 1.0) × weight
  • Counts D3FEND defensive technique mappings
  • 2+ mappings give full marks (ratio capped at 1.0)
  • Example: 1 mapping → 1/2 × 15 = 7.5

Freshness Component

days = (now - newest_validated_test.red_validated_at).days
score = max(0, 1.0 - days / 180) × weight
  • 0 days old = full freshness score
  • 180+ days old = 0 (completely stale)
  • Linear decay between 0 and 180 days
  • Example: test is 60 days old → (1 - 60/180) × 15 = 10.0
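The linear decay above can be sketched in a few lines of Python. This is an illustrative sketch rather than the Aegis implementation: the function name `freshness_score` and the constant `STALE_AFTER_DAYS` are hypothetical, and the default weight of 15 is taken from the table of default weights.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER_DAYS = 180  # decay window from the formula above


def freshness_score(red_validated_at: datetime, now: datetime, weight: float = 15.0) -> float:
    """Full weight at 0 days, linear decay, 0 at 180 days or older."""
    days = (now - red_validated_at).days
    return max(0.0, 1.0 - days / STALE_AFTER_DAYS) * weight


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(round(freshness_score(now - timedelta(days=60), now), 1))  # 10.0
print(freshness_score(now - timedelta(days=200), now))           # 0.0
```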

Platform Diversity Component

platforms_covered = unique platforms across validated tests
score = min(platforms_covered / 3, 1.0) × weight
  • Counts unique platforms (windows, linux, macos) from validated tests
  • 3+ platforms give full marks (ratio capped at 1.0)
  • Example: windows + linux → 2/3 × 10 = 6.7
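As a rough illustration, the platform count can be derived from a set of unique platform names. The dictionary keys `platform` and `state`, and the `draft` state used in the sample data, are assumptions made for this sketch and may not match the actual test model.

```python
def platform_diversity_score(tests: list[dict], weight: float = 10.0) -> float:
    """Score unique platforms across validated tests; 3 or more gives full marks."""
    platforms = {t["platform"].lower() for t in tests if t["state"] == "validated"}
    return min(len(platforms) / 3, 1.0) * weight


tests = [
    {"platform": "windows", "state": "validated"},
    {"platform": "linux", "state": "validated"},
    {"platform": "linux", "state": "validated"},  # duplicate platform, counted once
    {"platform": "macos", "state": "draft"},      # not validated, ignored
]
print(round(platform_diversity_score(tests), 1))  # 6.7
```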

Example Calculation

A technique with 2 of 3 validated tests detected, 2 active detection rules, 1 D3FEND mapping, a newest validated test that is 60 days old, and 2 platforms covered scores as follows:

Tests:     (2/3) × 40        = 26.7
Detection: (2/3) × 20        = 13.3
D3FEND:    (1/2) × 15        =  7.5
Freshness: (1 - 60/180) × 15 = 10.0
Platform:  (2/3) × 10        =  6.7
                               ─────
Total:                         64.2
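The same calculation can be reproduced in Python. The sketch below only mirrors the formulas in this document; the `TechniqueInputs` class and the function names are hypothetical, and returning 0 for the tests and freshness components when a technique has no validated tests is an assumption, since that case is not specified here.

```python
from dataclasses import dataclass

DEFAULT_WEIGHTS = {
    "tests": 40, "detection_rules": 20, "d3fend": 15,
    "freshness": 15, "platform_diversity": 10,
}


@dataclass
class TechniqueInputs:
    detected_tests: int
    total_validated_tests: int
    active_rules: int
    d3fend_mappings: int
    days_since_validation: int
    platforms_covered: int


def capped(value: float, full_marks_at: float) -> float:
    """Ratio capped at 1.0, as in min(x / n, 1.0)."""
    return min(value / full_marks_at, 1.0)


def technique_score(t: TechniqueInputs, weights: dict = DEFAULT_WEIGHTS) -> dict:
    has_tests = t.total_validated_tests > 0  # assumption: no validated tests means 0
    tests_ratio = t.detected_tests / t.total_validated_tests if has_tests else 0.0
    fresh_ratio = max(0.0, 1.0 - t.days_since_validation / 180) if has_tests else 0.0
    components = {
        "tests": tests_ratio * weights["tests"],
        "detection_rules": capped(t.active_rules, 3) * weights["detection_rules"],
        "d3fend": capped(t.d3fend_mappings, 2) * weights["d3fend"],
        "freshness": fresh_ratio * weights["freshness"],
        "platform_diversity": capped(t.platforms_covered, 3) * weights["platform_diversity"],
    }
    return {**components, "total": sum(components.values())}


result = technique_score(TechniqueInputs(2, 3, 2, 1, 60, 2))
print(round(result["total"], 1))  # 64.2
```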

Configuring Weights

Weights are configurable via environment variables or the admin API. They must sum to 100.

Environment Variables

SCORING_WEIGHT_TESTS=40
SCORING_WEIGHT_DETECTION_RULES=20
SCORING_WEIGHT_D3FEND=15
SCORING_WEIGHT_FRESHNESS=15
SCORING_WEIGHT_PLATFORM_DIVERSITY=10
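A loader for these variables might look like the sketch below. The variable names and defaults come from the list above; the function `load_weights` and the choice to raise `ValueError` when the weights do not sum to 100 are assumptions, as this document does not say how Aegis reacts to an invalid sum.

```python
import os
from typing import Mapping

DEFAULTS = {
    "SCORING_WEIGHT_TESTS": 40,
    "SCORING_WEIGHT_DETECTION_RULES": 20,
    "SCORING_WEIGHT_D3FEND": 15,
    "SCORING_WEIGHT_FRESHNESS": 15,
    "SCORING_WEIGHT_PLATFORM_DIVERSITY": 10,
}


def load_weights(env: Mapping[str, str] = os.environ) -> dict[str, float]:
    """Read weights from the environment, falling back to the documented defaults."""
    weights = {name: float(env.get(name, default)) for name, default in DEFAULTS.items()}
    total = sum(weights.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"scoring weights must sum to 100, got {total}")
    return weights


# Passing an explicit mapping keeps the example independent of the real environment.
print(load_weights({"SCORING_WEIGHT_TESTS": "50", "SCORING_WEIGHT_D3FEND": "5"}))
```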

API Configuration

# Get current weights
GET /api/v1/scores/config

# Update weights (admin only)
PATCH /api/v1/scores/config
{
  "tests": 50,
  "detection_rules": 20,
  "d3fend": 10,
  "freshness": 10,
  "platform_diversity": 10
}

Note: Runtime changes do not persist across restarts. Update the .env file or environment variables for permanent changes.
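For illustration, the update request can be built with Python's standard library. The host name, the placeholder token, and the use of a Bearer `Authorization` header are assumptions; this document only states that the endpoint is admin only. The sketch builds the request without sending it.

```python
import json
import urllib.request


def build_weight_update(base_url: str, token: str, weights: dict[str, int]) -> urllib.request.Request:
    """Build (but do not send) a PATCH request for the scoring weights."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/scores/config",
        data=json.dumps(weights).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


req = build_weight_update(
    "https://aegis.example.internal",  # hypothetical host
    "<admin-token>",                   # placeholder, not a real credential
    {"tests": 50, "detection_rules": 20, "d3fend": 10, "freshness": 10, "platform_diversity": 10},
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```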


Tactic Score

The tactic score is the average of all technique scores within that tactic:

tactic_score = mean(technique_scores for techniques in tactic)

The response also provides:

  • techniques_total — number of techniques in the tactic
  • techniques_evaluated — techniques with score > 0
  • techniques_by_status — count by status (validated, partial, not_covered, not_evaluated)
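A minimal sketch of this aggregation, assuming each technique is available as a dictionary with hypothetical `id`, `score`, and `status` keys. Zero-scored techniques are included in the mean, following the "all technique scores" wording above.

```python
from collections import Counter
from statistics import mean


def tactic_summary(techniques: list[dict]) -> dict:
    """Aggregate technique scores into the tactic-level fields listed above."""
    scores = [t["score"] for t in techniques]
    return {
        "tactic_score": mean(scores) if scores else 0.0,
        "techniques_total": len(techniques),
        "techniques_evaluated": sum(1 for s in scores if s > 0),
        "techniques_by_status": dict(Counter(t["status"] for t in techniques)),
    }


execution = [
    {"id": "T1059", "score": 64.2, "status": "partial"},
    {"id": "T1053", "score": 0.0, "status": "not_evaluated"},
    {"id": "T1204", "score": 90.0, "status": "validated"},
]
summary = tactic_summary(execution)
print(round(summary["tactic_score"], 1), summary["techniques_evaluated"])  # 51.4 2
```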

API

GET /api/v1/scores/tactic/execution
GET /api/v1/scores/tactic/persistence

Threat Actor Coverage Score

Measures how well the organization is covered against a specific threat actor:

actor_score = mean(technique_scores for techniques used by actor)

The response also provides:

  • techniques_total — techniques attributed to the actor
  • techniques_covered — techniques with score > 0
  • coverage_percentage — percentage of techniques covered
  • uncovered_techniques — list of technique IDs with score = 0
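The fields above could be derived as in the following sketch. The function name and input shapes are hypothetical, and treating a technique with no recorded score as 0 is an assumption.

```python
def actor_coverage(actor_techniques: list[str], technique_scores: dict[str, float]) -> dict:
    """Summarize coverage against the techniques attributed to one threat actor."""
    # Assumption: a technique without a recorded score counts as 0.
    scores = {tid: technique_scores.get(tid, 0.0) for tid in actor_techniques}
    total = len(scores)
    covered = [tid for tid, s in scores.items() if s > 0]
    return {
        "actor_score": sum(scores.values()) / total if total else 0.0,
        "techniques_total": total,
        "techniques_covered": len(covered),
        "coverage_percentage": 100 * len(covered) / total if total else 0.0,
        "uncovered_techniques": [tid for tid, s in scores.items() if s == 0],
    }


report = actor_coverage(
    ["T1059", "T1566", "T1003"],
    {"T1059": 64.2, "T1566": 80.0},
)
print(round(report["actor_score"], 1), report["uncovered_techniques"])  # 48.1 ['T1003']
```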

API

GET /api/v1/scores/threat-actor/{actor_id}

Organization Score

The top-level organizational security score is a weighted average of four sub-scores:

| Sub-score | Weight | Description |
| --- | --- | --- |
| Total Coverage | 40% | Average technique score across all evaluated techniques |
| Critical Coverage | 25% | Average score for techniques with high/critical severity templates |
| Detection Maturity | 20% | (triggered_rules / total_active_rules) × 100 |
| Response Readiness | 15% | (remediation_completed / remediation_total) × 100 |

org_score = total_coverage × 0.4
          + critical_coverage × 0.25
          + detection_maturity × 0.2
          + response_readiness × 0.15
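As a worked example, the weighted average can be computed as below. The input numbers are invented for illustration, and returning 0 when a denominator is zero is an assumption not covered by this document.

```python
ORG_WEIGHTS = {
    "total_coverage": 0.40,
    "critical_coverage": 0.25,
    "detection_maturity": 0.20,
    "response_readiness": 0.15,
}


def percent(numerator: int, denominator: int) -> float:
    """Ratio as a 0-100 value; 0 when there is nothing to measure (assumption)."""
    return 100 * numerator / denominator if denominator else 0.0


def org_score(sub_scores: dict[str, float]) -> float:
    return sum(sub_scores[name] * weight for name, weight in ORG_WEIGHTS.items())


sub_scores = {
    "total_coverage": 60.0,
    "critical_coverage": 70.0,
    "detection_maturity": percent(12, 20),  # 12 of 20 active rules triggered
    "response_readiness": percent(3, 4),    # 3 of 4 remediations completed
}
# 60 × 0.4 + 70 × 0.25 + 60 × 0.2 + 75 × 0.15 = 24 + 17.5 + 12 + 11.25
print(round(org_score(sub_scores), 2))  # 64.75
```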

Caching

The organization score is cached in-memory for 5 minutes. The cache is automatically invalidated when:

  • A test is validated (state → validated)
  • Scoring weights are updated via the API
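A time-based cache with explicit invalidation can be sketched as follows. The class `OrgScoreCache` is hypothetical and not the Aegis implementation; the injectable clock exists only so the expiry can be demonstrated without waiting five minutes.

```python
import time
from typing import Callable, Optional


class OrgScoreCache:
    """Cache one computed value for a fixed time-to-live (300 s = 5 minutes)."""

    def __init__(self, compute: Callable[[], float], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._compute = compute
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[float] = None
        self._stored_at = 0.0

    def get(self) -> float:
        now = self._clock()
        if self._value is None or now - self._stored_at >= self._ttl:
            self._value = self._compute()
            self._stored_at = now
        return self._value

    def invalidate(self) -> None:
        """Call when a test reaches the validated state or weights are updated."""
        self._value = None


calls = []
fake_now = [0.0]


def expensive_org_score() -> float:
    calls.append(1)
    return 64.75


cache = OrgScoreCache(expensive_org_score, clock=lambda: fake_now[0])
cache.get()
cache.get()          # served from cache
fake_now[0] = 301.0
cache.get()          # TTL expired, recomputed
cache.invalidate()
cache.get()          # invalidated, recomputed
print(len(calls))    # 3
```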

API

GET /api/v1/scores/organization

Operational Metrics

In addition to coverage scores, Aegis tracks operational KPIs:

Mean Time to Detect (MTTD)

Time from test execution start (start_execution audit entry) to red team submission (submit_red).

MTTD = mean(submit_red.timestamp - start_execution.timestamp) for all tests

Mean Time to Respond (MTTR)

Time from blue team evaluation (blue_validated_at) to remediation completion (update_remediation audit entry).

MTTR = mean(update_remediation.timestamp - blue_validated_at) for remediated tests
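Both metrics are a mean over start/end timestamp pairs, which can be sketched with one helper. How the pairs are extracted from audit entries is not shown here; the helper name `mean_duration` and the sample timestamps are illustrative only.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


def mean_duration(pairs: list[tuple[datetime, datetime]]) -> Optional[timedelta]:
    """Mean of (end - start) over all pairs; None when there is no data."""
    if not pairs:
        return None
    seconds = [(end - start).total_seconds() for start, end in pairs]
    return timedelta(seconds=mean(seconds))


# (start_execution.timestamp, submit_red.timestamp) for two tests
mttd_pairs = [
    (datetime(2026, 1, 5, 10, 0), datetime(2026, 1, 5, 10, 30)),
    (datetime(2026, 1, 6, 11, 0), datetime(2026, 1, 6, 12, 30)),
]
# (blue_validated_at, update_remediation.timestamp) for one remediated test
mttr_pairs = [
    (datetime(2026, 1, 5, 12, 0), datetime(2026, 1, 7, 12, 0)),
]
print(mean_duration(mttd_pairs))  # 1:00:00
print(mean_duration(mttr_pairs))  # 2 days, 0:00:00
```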

Detection Efficacy

efficacy = (detected_tests / total_validated_tests) × 100

Alert Fidelity

Ratio of true positive detections to total detection rule evaluations.

Coverage Velocity

Rate at which new techniques are being covered over time (techniques covered per week).

Validation Throughput

Number of tests moving through the pipeline per time period.

Rejection Rate

Percentage of tests rejected during dual validation.

API

# All operational metrics
GET /api/v1/metrics/operational

# Weekly trend data
GET /api/v1/metrics/operational/trend?period=90d

# Breakdown by team
GET /api/v1/metrics/operational/by-team

Score History

Weekly score snapshots for trend analysis:

GET /api/v1/scores/history?period=90d
# Returns weekly data points with: date, overall_score, total_coverage,
# critical_coverage, detection_maturity, response_readiness

Periods: 30d, 90d, 1y


Coverage Snapshots

Point-in-time captures of the complete coverage state for historical comparison:

# Create a snapshot
POST /api/v1/snapshots
{ "name": "Q1 2026 Baseline" }

# Compare two snapshots
GET /api/v1/snapshots/compare?a={snapshot_id_a}&b={snapshot_id_b}
# Returns: score_delta, improved techniques, worsened techniques, unchanged count
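The comparison might be computed along these lines. The snapshot shape used here (an `overall_score` plus a `techniques` mapping of ID to score) and the output key names are assumptions for illustration; the actual response format may differ.

```python
def compare_snapshots(a: dict, b: dict) -> dict:
    """Compare snapshot `a` (older) with snapshot `b` (newer)."""
    before, after = a["techniques"], b["techniques"]
    improved, worsened, unchanged = [], [], 0
    for tid in sorted(set(before) | set(after)):
        old, new = before.get(tid, 0.0), after.get(tid, 0.0)
        if new > old:
            improved.append(tid)
        elif new < old:
            worsened.append(tid)
        else:
            unchanged += 1
    return {
        "score_delta": round(b["overall_score"] - a["overall_score"], 1),
        "improved": improved,
        "worsened": worsened,
        "unchanged": unchanged,
    }


q1 = {"overall_score": 58.0, "techniques": {"T1059": 64.2, "T1566": 80.0, "T1003": 0.0}}
q2 = {"overall_score": 63.5, "techniques": {"T1059": 70.0, "T1566": 75.0, "T1003": 0.0}}
diff = compare_snapshots(q1, q2)
print(diff)
```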

The scheduler creates an automatic snapshot every Sunday at 00:00 and deletes older snapshots so that only the most recent 52 (one year of weekly snapshots) are kept.
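The retention rule amounts to keeping the newest 52 entries. A minimal sketch, assuming each snapshot carries a hypothetical `created_at` field; whether manually created snapshots are also subject to cleanup is not stated in this document.

```python
from datetime import date, timedelta


def prune_snapshots(snapshots: list[dict], keep: int = 52) -> tuple[list[dict], list[dict]]:
    """Return (kept, dropped), keeping the `keep` most recent snapshots."""
    ordered = sorted(snapshots, key=lambda s: s["created_at"], reverse=True)
    return ordered[:keep], ordered[keep:]


first_sunday = date(2025, 1, 5)
weekly = [
    {"name": f"weekly-{i}", "created_at": first_sunday + timedelta(weeks=i)}
    for i in range(55)
]
kept, dropped = prune_snapshots(weekly)
print(len(kept), [s["name"] for s in dropped])
# 52 ['weekly-2', 'weekly-1', 'weekly-0']
```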