kitos edited this page 2026-05-22 12:33:06 +00:00

Operational Alerts

The operational alerts system monitors the state of your security coverage and notifies the team when a monitored condition crosses a defined threshold.


Alert Rules

Alert rules define what to check and when to fire. Each rule has a type, a severity, type-specific configuration thresholds, a cooldown period, and notification preferences.

Rule Types

| Rule Type | What it checks |
|---|---|
| coverage_drop | Overall coverage score drops below a threshold |
| stale_test | A test has been in red_executing or blue_evaluating for too long |
| unvalidated_test | Tests stuck in in_review beyond a threshold duration |
| high_risk_uncovered | High-severity techniques have no validated tests |
| detection_gap | Technique has validated attack tests but no detection rule |

Rule Fields

{
  "name": "Coverage below 70%",
  "description": "Alert when overall coverage drops below 70%",
  "rule_type": "coverage_drop",
  "severity": "high",
  "config": {
    "threshold": 70.0,
    "tactic_id": null
  },
  "is_enabled": true,
  "cooldown_hours": 24,
  "notify_in_app": true,
  "notify_webhook": true,
  "webhook_id": "webhook-uuid-or-null"
}
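
As a rough illustration, a rule like the one above could be sanity-checked on the client side before it is submitted. This is only a sketch: `validate_rule` is a hypothetical helper, not part of Aegis, and the allowed values are taken solely from the rule type and severity lists on this page.

```python
# Allowed values copied from the Rule Types and Severity Levels lists on this page.
RULE_TYPES = {"coverage_drop", "stale_test", "unvalidated_test",
              "high_risk_uncovered", "detection_gap"}
SEVERITIES = {"info", "low", "medium", "high", "critical"}


def validate_rule(rule: dict) -> list:
    """Return a list of problems found in a rule dict (empty if none).

    Hypothetical client-side check; the server may enforce more or less.
    """
    problems = []
    if rule.get("rule_type") not in RULE_TYPES:
        problems.append(f"unknown rule_type: {rule.get('rule_type')!r}")
    if rule.get("severity") not in SEVERITIES:
        problems.append(f"unknown severity: {rule.get('severity')!r}")
    if not isinstance(rule.get("config"), dict):
        problems.append("config must be an object")
    # Webhook delivery needs both notify_webhook and a webhook_id.
    if rule.get("notify_webhook") and not rule.get("webhook_id"):
        problems.append("notify_webhook is true but webhook_id is not set")
    return problems


example_rule = {
    "name": "Coverage below 70%",
    "rule_type": "coverage_drop",
    "severity": "high",
    "config": {"threshold": 70.0, "tactic_id": None},
    "is_enabled": True,
    "cooldown_hours": 24,
    "notify_in_app": True,
    "notify_webhook": True,
    "webhook_id": "webhook-uuid-or-null",
}
print(validate_rule(example_rule))
```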

Severity Levels

| Severity | Use case |
|---|---|
| info | Informational; no action needed immediately |
| low | Worth noting but not urgent |
| medium | Should be addressed in next sprint |
| high | Requires prompt attention |
| critical | Immediate action required |

Rule Configuration Examples

Coverage drop:

{"threshold": 75.0}

Fires when the organization's overall coverage score drops below 75%.

Stale test:

{"stale_days": 7}

Fires for any test that has been in the red_executing or blue_evaluating state for more than 7 days.

High risk uncovered:

{"min_severity": "high", "max_uncovered": 5}

Fires when more than 5 high-severity techniques have no validated test.

Detection gap:

{"require_detection_rule": true}

Fires for every validated attack test that has no linked detection rule.
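
The threshold comparisons described above can be sketched as small predicate functions. These are illustrative only: the function names and their inputs (`current_score`, `entered_state_at`, `uncovered_count`) are assumptions, and the actual Aegis evaluation code may differ. Only the config keys come from the examples on this page.

```python
from datetime import datetime, timedelta, timezone


def coverage_drop_fires(current_score: float, config: dict) -> bool:
    """True when the score is strictly below the configured threshold."""
    return current_score < config["threshold"]


def stale_test_fires(entered_state_at: datetime, now: datetime, config: dict) -> bool:
    """True when a test has sat in its state for more than stale_days."""
    return now - entered_state_at > timedelta(days=config["stale_days"])


def high_risk_uncovered_fires(uncovered_count: int, config: dict) -> bool:
    """True when more than max_uncovered techniques lack a validated test."""
    return uncovered_count > config["max_uncovered"]


now = datetime(2024, 3, 15, 10, 0, tzinfo=timezone.utc)
print(coverage_drop_fires(67.3, {"threshold": 75.0}))
print(stale_test_fires(now - timedelta(days=8), now, {"stale_days": 7}))
print(high_risk_uncovered_fires(5, {"min_severity": "high", "max_uncovered": 5}))
```

Note that all three comparisons are strict, matching the wording "below" and "more than": a score of exactly 75.0 or exactly 5 uncovered techniques does not fire in this sketch.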


Alert Instances

When a rule's condition is met and the rule is not in cooldown, an alert instance is created.

Instance Lifecycle

open ──────────────> acknowledged ──────────────> resolved
  │                                                   │
  └────────────────> dismissed                        │
                         │                            │
                         └── suppressed until         └── final state
                             cooldown resets               (immutable)
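
One way to read the diagram is as a small transition table. The sketch below encodes only the arrows drawn above; whether Aegis permits other transitions (for example, resolving an alert that was never acknowledged) is not stated on this page, so treat the table as an assumption.

```python
# Transitions taken literally from the lifecycle diagram (assumed exhaustive).
ALLOWED_TRANSITIONS = {
    "open": {"acknowledged", "dismissed"},
    "acknowledged": {"resolved"},
    "resolved": set(),   # final state
    "dismissed": set(),  # suppressed until the cooldown resets
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the diagram shows an arrow from current to target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


print(can_transition("open", "acknowledged"))
print(can_transition("resolved", "open"))
```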

Instance Fields

{
  "id": "uuid",
  "rule_id": "uuid",
  "rule_name": "Coverage below 70%",
  "rule_type": "coverage_drop",
  "severity": "high",
  "status": "open",
  "details": {"current_score": 67.3, "threshold": 70.0},
  "fired_at": "2024-03-15T10:00:00Z",
  "acknowledged_at": null,
  "acknowledged_by": null,
  "resolved_at": null,
  "dismissed_at": null
}

Alert Lifecycle Actions

Acknowledge

Marks the alert as seen and under investigation. Acknowledging does NOT suppress re-firing.

POST /api/v1/alerts/{id}/acknowledge
{"notes": "Investigating coverage drop — two campaigns just completed"}

Required role: red_lead, blue_lead, admin

Resolve

Marks the underlying issue as fixed. Prevents re-evaluation from creating a duplicate alert; a new alert is created only once the cooldown has expired and the condition is met again.

POST /api/v1/alerts/{id}/resolve
{"resolution_notes": "Coverage restored to 78% after campaign validation"}

Required role: red_lead, blue_lead, admin

Dismiss

Suppresses the alert for the rule's cooldown period.

POST /api/v1/alerts/{id}/dismiss
{"reason": "Planned maintenance window — coverage drop expected"}

Required role: red_lead, blue_lead, admin
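
The three lifecycle calls share the same shape, so a client could build them with one helper. The following is a minimal sketch using only the standard library. The base URL is a placeholder, `build_alert_action` is a hypothetical name, and authentication is left to the caller through `extra_headers` because this page does not describe the auth scheme.

```python
import json
import urllib.request

# Body field expected by each action, as shown in the examples above.
BODY_FIELD = {
    "acknowledge": "notes",
    "resolve": "resolution_notes",
    "dismiss": "reason",
}


def build_alert_action(base_url, alert_id, action, text, extra_headers=None):
    """Build (but do not send) the POST request for an alert lifecycle action."""
    if action not in BODY_FIELD:
        raise ValueError(f"unsupported action: {action!r}")
    headers = {"Content-Type": "application/json"}
    headers.update(extra_headers or {})  # e.g. caller-supplied auth header
    return urllib.request.Request(
        f"{base_url}/api/v1/alerts/{alert_id}/{action}",
        data=json.dumps({BODY_FIELD[action]: text}).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_alert_action("https://aegis.example.com", "some-alert-id",
                         "dismiss", "Planned maintenance window")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without a running server.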


Alert Evaluation

Automatic (hourly)

Aegis runs alert evaluation every hour via APScheduler:

  • Checks all is_enabled=true rules
  • For each rule, evaluates the condition against current data
  • Creates an instance if condition is met AND rule is not in cooldown
  • Sends in-app notifications and/or webhook calls per rule configuration
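
The steps above can be sketched as a single evaluation pass over in-memory data. This is not the Aegis implementation: `evaluate_rules`, the `condition_met` callback, the `last_fired` mapping keyed by rule name, and the `notify` list on the returned instance are all hypothetical, and the cooldown is assumed to be measured from the time the rule last fired.

```python
from datetime import datetime, timedelta, timezone


def evaluate_rules(rules, condition_met, last_fired, now):
    """Run one evaluation pass and return the newly created instances.

    rules:         list of rule dicts (fields as in "Rule Fields")
    condition_met: callable(rule) -> bool, supplied by the caller
    last_fired:    dict mapping rule name -> datetime of last firing (mutated)
    """
    created = []
    for rule in rules:
        if not rule["is_enabled"]:
            continue  # only enabled rules are checked
        if not condition_met(rule):
            continue  # condition not met against current data
        fired = last_fired.get(rule["name"])
        cooldown = timedelta(hours=rule["cooldown_hours"])
        if fired is not None and now < fired + cooldown:
            continue  # rule is still in cooldown
        channels = []
        if rule.get("notify_in_app"):
            channels.append("in_app")
        if rule.get("notify_webhook") and rule.get("webhook_id"):
            channels.append("webhook")
        created.append({
            "rule_name": rule["name"],
            "status": "open",
            "fired_at": now.isoformat(),
            "notify": channels,  # hypothetical field, for illustration only
        })
        last_fired[rule["name"]] = now
    return created


rules = [{"name": "Coverage below 70%", "is_enabled": True, "cooldown_hours": 24,
          "notify_in_app": True, "notify_webhook": True, "webhook_id": None}]
state = {}
start = datetime(2024, 3, 15, 10, 0, tzinfo=timezone.utc)
print(len(evaluate_rules(rules, lambda r: True, state, start)))
```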

Manual trigger

POST /api/v1/alerts/evaluate

Required role: red_lead, blue_lead, admin

Useful when you've made changes and want to check immediately without waiting for the hourly job.


In-App Notifications

When a rule has notify_in_app: true, an in-app notification is sent to all users with the role red_lead, blue_lead, or admin.

View notifications:

GET /api/v1/notifications

Mark as read:

PATCH /api/v1/notifications/{id}
{"is_read": true}

Webhook Notifications

When notify_webhook: true and a webhook_id is set, Aegis POSTs to the configured webhook URL when the alert fires.

Webhook payload:

{
  "event": "alert.fired",
  "alert_id": "uuid",
  "rule_name": "Coverage below 70%",
  "severity": "high",
  "details": {"current_score": 67.3, "threshold": 70.0},
  "fired_at": "2024-03-15T10:00:00Z"
}
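
On the receiving side, a consumer might validate and parse this payload before acting on it. The sketch below is an assumption about how a receiver could be written, not something Aegis ships. `parse_alert_webhook` is a hypothetical name, and the required keys are simply those shown in the sample payload.

```python
import json
from datetime import datetime

# Keys present in the sample payload above.
REQUIRED_KEYS = {"event", "alert_id", "rule_name", "severity", "details", "fired_at"}


def parse_alert_webhook(raw_body: bytes) -> dict:
    """Parse a webhook body and convert fired_at to a timezone-aware datetime."""
    payload = json.loads(raw_body)
    missing = REQUIRED_KEYS - set(payload)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if payload["event"] != "alert.fired":
        raise ValueError(f"unexpected event: {payload['event']!r}")
    # datetime.fromisoformat accepts a trailing "Z" only on Python 3.11+,
    # so replace it with an explicit UTC offset first.
    payload["fired_at"] = datetime.fromisoformat(
        payload["fired_at"].replace("Z", "+00:00")
    )
    return payload


sample = json.dumps({
    "event": "alert.fired",
    "alert_id": "uuid",
    "rule_name": "Coverage below 70%",
    "severity": "high",
    "details": {"current_score": 67.3, "threshold": 70.0},
    "fired_at": "2024-03-15T10:00:00Z",
}).encode("utf-8")
alert = parse_alert_webhook(sample)
print(alert["rule_name"], alert["fired_at"].isoformat())
```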

Summary

GET /api/v1/alerts/summary

Returns:

{
  "total": 12,
  "by_status": {"open": 5, "acknowledged": 3, "resolved": 3, "dismissed": 1},
  "by_severity": {"critical": 1, "high": 4, "medium": 5, "low": 2, "info": 0},
  "by_type": {
    "coverage_drop": 2,
    "stale_test": 4,
    "unvalidated_test": 3,
    "high_risk_uncovered": 2,
    "detection_gap": 1
  }
}
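
For readers who want to reproduce these counts from a list of alert instances, the aggregation could look roughly like the following. This is a sketch of the idea rather than the endpoint's actual code. `summarize` is a hypothetical helper, and zero-filling every status and severity is an assumption based on the `"info": 0` entry in the sample response.

```python
from collections import Counter

STATUSES = ("open", "acknowledged", "resolved", "dismissed")
SEVERITIES = ("critical", "high", "medium", "low", "info")


def summarize(instances):
    """Count alert instances by status, severity, and rule type."""
    status_counts = Counter(i["status"] for i in instances)
    severity_counts = Counter(i["severity"] for i in instances)
    type_counts = Counter(i["rule_type"] for i in instances)
    return {
        "total": len(instances),
        # Counter returns 0 for missing keys, so every known value is listed.
        "by_status": {s: status_counts[s] for s in STATUSES},
        "by_severity": {s: severity_counts[s] for s in SEVERITIES},
        "by_type": dict(type_counts),
    }


instances = [
    {"status": "open", "severity": "high", "rule_type": "coverage_drop"},
    {"status": "open", "severity": "medium", "rule_type": "stale_test"},
    {"status": "resolved", "severity": "high", "rule_type": "stale_test"},
]
print(summarize(instances))
```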