
Kitos 14f8485f06 feat(phase-33): final polish V3 - navigation, performance, and documentation (T-238 to T-240)

2026-02-10 09:21:35 +01:00


Aegis — Scoring System

Aegis uses a granular 0–100 scoring system to measure security coverage at multiple levels: individual techniques, tactics, threat actors, and the overall organization.

Technique Score (0–100)

Each ATT&CK technique receives a composite score based on five weighted components:

Component	Default Weight	Description
Tests Validated	40%	Ratio of detected tests to total validated tests
Detection Rules	20%	Number of active detection rules linked to the technique
D3FEND Coverage	15%	Number of D3FEND defensive techniques mapped
Freshness	15%	How recent the latest validated test is
Platform Diversity	10%	Coverage across different platforms (Windows, Linux, macOS)

Tests Validated Component

score = (detected_tests / total_validated_tests) × weight

Only tests in the validated state are counted
A test counts as detected when detection_result = "detected"
Example: 2 detected out of 3 validated → 2/3 × 40 = 26.7

Detection Rules Component

score = min(active_rules / 3, 1.0) × weight

Counts active detection rules linked to the technique's mitre_id
3 or more rules give full marks (the ratio is capped at 1.0)
Example: 2 active rules → 2/3 × 20 = 13.3
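The capped ratio can be sketched in Python as follows. The function and parameter names are hypothetical; only the formula and the threshold of 3 rules come from the description above.

```python
def detection_rules_score(active_rules: int, weight: float = 20.0, full_marks_at: int = 3) -> float:
    """Linear credit per active rule, capped once full_marks_at rules exist."""
    ratio = min(active_rules / full_marks_at, 1.0)
    return ratio * weight


# Score for 0 to 4 active rules: the fourth rule adds nothing.
for n in range(5):
    print(n, round(detection_rules_score(n), 1))
```

The cap means that adding rules beyond the third does not raise this component, so the score rewards having some detection depth rather than sheer rule count.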

D3FEND Coverage Component

score = min(d3fend_mappings / 2, 1.0) × weight

Counts D3FEND defensive technique mappings
2 or more mappings give full marks
Example: 1 mapping → 1/2 × 15 = 7.5

Freshness Component

days = (now - newest_validated_test.red_validated_at).days
score = max(0, 1.0 - days / 180) × weight

0 days old = full freshness score
180+ days old = 0 (completely stale)
Linear decay between 0 and 180 days
Example: test is 60 days old → (1 - 60/180) × 15 = 10.0
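The linear decay can be sketched with the standard datetime module. The function signature is an assumption; the formula mirrors the one above, including the use of whole days.

```python
from datetime import datetime, timedelta, timezone


def freshness_score(red_validated_at: datetime, now: datetime,
                    weight: float = 15.0, horizon_days: int = 180) -> float:
    """Full weight at 0 days, falling linearly to 0 at horizon_days."""
    days = (now - red_validated_at).days  # whole days, as in the formula above
    return max(0.0, 1.0 - days / horizon_days) * weight


now = datetime(2026, 2, 10, tzinfo=timezone.utc)
for age in (0, 60, 90, 180, 365):
    print(age, round(freshness_score(now - timedelta(days=age), now), 1))
```

With the default weight, a test loses 15 / 180 ≈ 0.083 points per day, so retesting a technique every 90 days keeps this component at or above 7.5.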

Platform Diversity Component

platforms_covered = unique platforms across validated tests
score = min(platforms_covered / 3, 1.0) × weight

Counts unique platforms (windows, linux, macos) from validated tests
3 or more platforms give full marks
Example: windows + linux → 2/3 × 10 = 6.7
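A possible Python sketch of this component uses a set to deduplicate platforms. The function name is hypothetical, and lowercasing the platform strings is an assumption added so that "Windows" and "windows" count once.

```python
def platform_diversity_score(test_platforms, weight: float = 10.0, full_marks_at: int = 3) -> float:
    """Credit for each distinct platform seen across validated tests, capped at full_marks_at."""
    covered = {p.lower() for p in test_platforms}  # assumption: case-insensitive names
    return min(len(covered) / full_marks_at, 1.0) * weight


# Three validated tests, but only two distinct platforms.
print(round(platform_diversity_score(["windows", "linux", "windows"]), 1))  # 6.7
```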

Example Calculation

A technique with:

2 of 3 validated tests detected, 2 active detection rules, 1 D3FEND mapping, newest validated test 60 days old, 2 platforms

Tests:     (2/3) × 40        = 26.7
Detection: (2/3) × 20        = 13.3
D3FEND:    (1/2) × 15        =  7.5
Freshness: (1 - 60/180) × 15 = 10.0
Platform:  (2/3) × 10        =  6.7
                               ─────
Total:                         64.2
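The same calculation can be reproduced with a short Python sketch that combines the five components. The TechniqueInputs class, its field names, and the zero score for a technique without validated tests are illustrative assumptions; the ratios and default weights are the ones documented above.

```python
from dataclasses import dataclass

DEFAULT_WEIGHTS = {"tests": 40, "detection_rules": 20, "d3fend": 15,
                   "freshness": 15, "platform_diversity": 10}


@dataclass
class TechniqueInputs:  # hypothetical container for the raw counts
    detected_tests: int
    validated_tests: int
    active_rules: int
    d3fend_mappings: int
    days_since_validation: int
    platforms_covered: int


def technique_score(t: TechniqueInputs, weights=DEFAULT_WEIGHTS) -> dict:
    """Return the weighted contribution of each component."""
    ratios = {
        # Assumption: no validated tests yields a ratio of 0.
        "tests": t.detected_tests / t.validated_tests if t.validated_tests else 0.0,
        "detection_rules": min(t.active_rules / 3, 1.0),
        "d3fend": min(t.d3fend_mappings / 2, 1.0),
        "freshness": max(0.0, 1.0 - t.days_since_validation / 180),
        "platform_diversity": min(t.platforms_covered / 3, 1.0),
    }
    return {name: ratio * weights[name] for name, ratio in ratios.items()}


parts = technique_score(TechniqueInputs(2, 3, 2, 1, 60, 2))
for name, value in parts.items():
    print(f"{name:<20}{value:5.1f}")
print(round(sum(parts.values()), 1))  # 64.2
```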

Configuring Weights

Weights are configurable via environment variables or the admin API. They must sum to 100.

Environment Variables

SCORING_WEIGHT_TESTS=40
SCORING_WEIGHT_DETECTION_RULES=20
SCORING_WEIGHT_D3FEND=15
SCORING_WEIGHT_FRESHNESS=15
SCORING_WEIGHT_PLATFORM_DIVERSITY=10
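The sketch below shows one way such variables could be read and checked against the sum-to-100 rule. It is not the Aegis loader: the function name, the fallback to defaults, and raising ValueError on an invalid sum are all assumptions.

```python
import os

DEFAULTS = {
    "SCORING_WEIGHT_TESTS": 40,
    "SCORING_WEIGHT_DETECTION_RULES": 20,
    "SCORING_WEIGHT_D3FEND": 15,
    "SCORING_WEIGHT_FRESHNESS": 15,
    "SCORING_WEIGHT_PLATFORM_DIVERSITY": 10,
}


def load_weights(env=os.environ) -> dict:
    """Read the five weights from env, using the documented defaults for missing names."""
    weights = {name: float(env.get(name, default)) for name, default in DEFAULTS.items()}
    total = sum(weights.values())
    if abs(total - 100) > 1e-9:
        # Assumption: an invalid sum is rejected rather than silently rescaled.
        raise ValueError(f"scoring weights must sum to 100, got {total:g}")
    return weights


# Shift 10 points from D3FEND and freshness into tests; the total stays at 100.
print(load_weights({"SCORING_WEIGHT_TESTS": "50",
                    "SCORING_WEIGHT_D3FEND": "10",
                    "SCORING_WEIGHT_FRESHNESS": "10"}))
```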

API Configuration

# Get current weights
GET /api/v1/scores/config

# Update weights (admin only)
PATCH /api/v1/scores/config
{
  "tests": 50,
  "detection_rules": 20,
  "d3fend": 10,
  "freshness": 10,
  "platform_diversity": 10
}
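For scripted updates, the PATCH request above could be prepared with Python's standard library along these lines. The host name and the bearer-token header are assumptions, since the source only says the endpoint is admin only; the sketch builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://aegis.example.com"  # hypothetical host


def build_weights_update(weights: dict, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a PATCH to the scoring config endpoint."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/scores/config",
        data=json.dumps(weights).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumption: bearer auth
        },
    )


req = build_weights_update(
    {"tests": 50, "detection_rules": 20, "d3fend": 10,
     "freshness": 10, "platform_diversity": 10},
    token="<admin-token>",
)
print(req.get_method(), req.full_url)
# Against a real deployment, send it with urllib.request.urlopen(req).
```

Checking the sum on the client side catches an invalid payload before it reaches the server.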

Note: Runtime changes do not persist across restarts. Update the .env file or environment variables for permanent changes.

Tactic Score

The tactic score is the average of all technique scores within that tactic:

tactic_score = mean(technique_scores for techniques in tactic)

The tactic score is reported together with:

techniques_total — number of techniques in the tactic
techniques_evaluated — techniques with score > 0
techniques_by_status — count by status (validated, partial, not_covered, not_evaluated)
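A minimal sketch of the aggregation, assuming technique scores are available as a mapping from technique ID to score. The function name and the sample scores are made up for illustration, and techniques_by_status is left out because the status rules are not defined in this document.

```python
from statistics import mean


def tactic_summary(technique_scores: dict) -> dict:
    """Average every technique in the tactic, including those scoring 0."""
    scores = list(technique_scores.values())
    return {
        "tactic_score": mean(scores) if scores else 0.0,
        "techniques_total": len(scores),
        "techniques_evaluated": sum(1 for s in scores if s > 0),
    }


# Sample values only: three Execution techniques, one of them unscored.
summary = tactic_summary({"T1059": 64.2, "T1053": 80.0, "T1204": 0.0})
print(round(summary["tactic_score"], 1), summary["techniques_total"], summary["techniques_evaluated"])
```

Because techniques with a score of 0 stay in the average, a tactic with many untested techniques scores low even when the tested ones score well.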

API

GET /api/v1/scores/tactic/execution
GET /api/v1/scores/tactic/persistence

Threat Actor Coverage Score

Measures how well the organization is covered against a specific threat actor:

actor_score = mean(technique_scores for techniques used by actor)

The actor score is reported together with:

techniques_total — techniques attributed to the actor
techniques_covered — techniques with score > 0
coverage_percentage — percentage of techniques covered
uncovered_techniques — list of technique IDs with score = 0
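The fields above can be derived from the actor's technique list and a score lookup, as in this sketch. The function name, the sample IDs and scores, and treating a technique with no recorded score as 0 are assumptions.

```python
def actor_coverage(actor_techniques, technique_scores: dict) -> dict:
    """Summarize coverage for the techniques attributed to one threat actor."""
    # Assumption: a technique without a recorded score counts as 0.
    scores = {tid: technique_scores.get(tid, 0.0) for tid in actor_techniques}
    total = len(scores)
    uncovered = sorted(tid for tid, s in scores.items() if s == 0)
    covered = total - len(uncovered)
    return {
        "actor_score": sum(scores.values()) / total if total else 0.0,
        "techniques_total": total,
        "techniques_covered": covered,
        "coverage_percentage": 100 * covered / total if total else 0.0,
        "uncovered_techniques": uncovered,
    }


report = actor_coverage(
    ["T1059", "T1566", "T1003", "T1021"],
    {"T1059": 60.0, "T1566": 40.0},
)
print(report)
```

Note that actor_score and coverage_percentage answer different questions: the first averages score depth, while the second only counts how many techniques have any coverage at all.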

API

GET /api/v1/scores/threat-actor/{actor_id}

Organization Score

The top-level organizational security score is a weighted average of four sub-scores:

Sub-score	Weight	Description
Total Coverage	40%	Average technique score across all evaluated techniques
Critical Coverage	25%	Average score for techniques with high/critical severity templates
Detection Maturity	20%	(triggered_rules / total_active_rules) × 100
Response Readiness	15%	(remediation_completed / remediation_total) × 100

org_score = total_coverage × 0.4
          + critical_coverage × 0.25
          + detection_maturity × 0.2
          + response_readiness × 0.15
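The weighted average can be checked with a few lines of Python. The input numbers below are invented for illustration, and returning 0 for a sub-score with an empty denominator is an assumption.

```python
ORG_WEIGHTS = {
    "total_coverage": 0.40,
    "critical_coverage": 0.25,
    "detection_maturity": 0.20,
    "response_readiness": 0.15,
}


def percentage(part: int, whole: int) -> float:
    """Ratio expressed on a 0 to 100 scale; 0 when there is nothing to measure (assumption)."""
    return part / whole * 100 if whole else 0.0


def organization_score(sub_scores: dict) -> float:
    """Weighted sum of the four sub-scores, each already on a 0 to 100 scale."""
    return sum(sub_scores[name] * weight for name, weight in ORG_WEIGHTS.items())


sub_scores = {
    "total_coverage": 64.2,
    "critical_coverage": 50.0,
    "detection_maturity": percentage(12, 20),  # 12 of 20 active rules triggered
    "response_readiness": percentage(3, 4),    # 3 of 4 remediations completed
}
print(round(organization_score(sub_scores), 1))  # 61.4
```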

Caching

The organization score is cached in memory for 5 minutes. The cache is invalidated automatically when:

A test is validated (state → validated)
Scoring weights are updated via the API
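The behavior described here, a time-to-live cache with explicit invalidation, can be sketched as below. This is a generic illustration, not the Aegis implementation; the class name, the injectable clock, and the stand-in compute function are assumptions.

```python
import time


class TTLCache:
    """Caches one computed value for ttl_seconds; invalidate() forces a recompute."""

    def __init__(self, compute, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._compute = compute
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means nothing is cached

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._compute()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self):
        self._expires_at = None


calls = []


def compute_org_score():  # stand-in for the real, expensive calculation
    calls.append(1)
    return 61.4


cache = TTLCache(compute_org_score)
cache.get()
cache.get()         # served from the cache, no recompute
cache.invalidate()  # e.g. a test reached the validated state, or weights changed
cache.get()
print(len(calls))   # 2
```

Passing the clock as a parameter makes the expiry logic testable without waiting five minutes.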

API

GET /api/v1/scores/organization

Operational Metrics

In addition to coverage scores, Aegis tracks operational KPIs:

Mean Time to Detect (MTTD)

Time from test execution start (start_execution audit entry) to red team submission (submit_red audit entry).

MTTD = mean(submit_red.timestamp - start_execution.timestamp) for all tests

Mean Time to Respond (MTTR)

Time from blue team evaluation (blue_validated_at) to remediation completion (update_remediation audit entry).

MTTR = mean(update_remediation.timestamp - blue_validated_at) for remediated tests
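Both MTTD and MTTR are means over pairs of timestamps, so a single helper can serve for either one. The sketch below assumes the pairs have already been extracted from the audit log and test records; the helper name and the sample times are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_duration(pairs):
    """Mean of (end - start) over (start, end) datetime pairs; None if there are no pairs."""
    seconds = [(end - start).total_seconds() for start, end in pairs]
    return timedelta(seconds=mean(seconds)) if seconds else None


t0 = datetime(2026, 2, 1, 9, 0)

# MTTD: (start_execution.timestamp, submit_red.timestamp) per test
mttd = mean_duration([(t0, t0 + timedelta(hours=2)), (t0, t0 + timedelta(hours=4))])

# MTTR: (blue_validated_at, update_remediation.timestamp) per remediated test
mttr = mean_duration([(t0, t0 + timedelta(days=1)), (t0, t0 + timedelta(days=3))])

print(mttd, mttr)
```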

Detection Efficacy

efficacy = (detected_tests / total_validated_tests) × 100

Alert Fidelity

Ratio of true positive detections to total detection rule evaluations.

Coverage Velocity

Rate at which new techniques are being covered over time (techniques covered per week).

Validation Throughput

Number of tests moving through the pipeline per time period.

Rejection Rate

Percentage of tests rejected during dual validation.

API

# All operational metrics
GET /api/v1/metrics/operational

# Weekly trend data
GET /api/v1/metrics/operational/trend?period=90d

# Breakdown by team
GET /api/v1/metrics/operational/by-team

Score History

Weekly score snapshots for trend analysis:

GET /api/v1/scores/history?period=90d
# Returns weekly data points with: date, overall_score, total_coverage,
# critical_coverage, detection_maturity, response_readiness

Periods: 30d, 90d, 1y

Coverage Snapshots

Point-in-time captures of the complete coverage state for historical comparison:

# Create a snapshot
POST /api/v1/snapshots
{ "name": "Q1 2026 Baseline" }

# Compare two snapshots
GET /api/v1/snapshots/compare?a={snapshot_id_a}&b={snapshot_id_b}
# Returns: score_delta, improved techniques, worsened techniques, unchanged count

The scheduler creates an automatic snapshot every Sunday at 00:00 and then removes old snapshots, keeping only the last 52 (one year).
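The retention rule can be illustrated with a short sketch that picks which snapshots fall outside the 52-snapshot window. The function name is hypothetical, and whether manually created snapshots are exempt from cleanup is not stated in the source, so the sketch only considers a flat list of dates.

```python
from datetime import date, timedelta


def snapshots_to_delete(snapshot_dates, keep: int = 52):
    """Return the dates beyond the newest `keep` snapshots, newest first."""
    newest_first = sorted(snapshot_dates, reverse=True)
    return newest_first[keep:]


# 60 consecutive Sunday snapshots ending on 2026-02-08.
sundays = [date(2026, 2, 8) - timedelta(weeks=i) for i in range(60)]
expired = snapshots_to_delete(sundays)
print(len(expired), expired[0])
```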
