diff --git a/Operational-Alerts.-.md b/Operational-Alerts.-.md
index e69de29..9d1c00f 100644
--- a/Operational-Alerts.-.md
+++ b/Operational-Alerts.-.md
@@ -0,0 +1,227 @@
+# Operational Alerts
+
+The operational alerts system monitors the state of your security coverage and
+notifies the team when conditions fall below defined thresholds.
+
+---
+
+## Alert Rules
+
+Alert rules define **what to check** and **when to fire**. Each rule has a type,
+severity, configuration thresholds, and notification preferences.
+
+### Rule Types
+
+| Rule Type | What it checks |
+|-----------|---------------|
+| `coverage_drop` | Overall coverage score drops below a threshold |
+| `stale_test` | A test has been in `red_executing` or `blue_evaluating` for too long |
+| `unvalidated_test` | Tests stuck in `in_review` beyond a threshold duration |
+| `high_risk_uncovered` | High-severity techniques have no validated tests |
+| `detection_gap` | Technique has validated attack tests but no detection rule |
+
+### Rule Fields
+
+```json
+{
+  "name": "Coverage below 70%",
+  "description": "Alert when overall coverage drops below 70%",
+  "rule_type": "coverage_drop",
+  "severity": "high",
+  "config": {
+    "threshold": 70.0,
+    "tactic_id": null
+  },
+  "is_enabled": true,
+  "cooldown_hours": 24,
+  "notify_in_app": true,
+  "notify_webhook": true,
+  "webhook_id": "webhook-uuid-or-null"
+}
+```
+
+### Severity Levels
+
+| Severity | Use case |
+|----------|---------|
+| `info` | Informational; no action needed immediately |
+| `low` | Worth noting but not urgent |
+| `medium` | Should be addressed in next sprint |
+| `high` | Requires prompt attention |
+| `critical` | Immediate action required |
+
+### Rule Configuration Examples
+
+**Coverage drop:**
+```json
+{"threshold": 75.0}
+```
+Fires when organization score drops below 75%.
+
+**Stale test:**
+```json
+{"stale_days": 7}
+```
+Fires for any test in executing/evaluating state for more than 7 days.
+
+**High risk uncovered:**
+```json
+{"min_severity": "high", "max_uncovered": 5}
+```
+Fires when more than 5 high-severity techniques have no validated test.
+
+**Detection gap:**
+```json
+{"require_detection_rule": true}
+```
+Fires for every validated attack test that has no linked detection rule.
+
+---
+
+## Alert Instances
+
+When a rule's condition is met and the rule is not in cooldown, an alert instance is created.
+
+### Instance Lifecycle
+
+```
+open ──────────────> acknowledged ──────────────> resolved
+  │                                                 │
+  └────────────────> dismissed                      │
+                         │                          │
+                         └── suppressed until       └── final state
+                             cooldown resets            (immutable)
+```
+
+### Instance Fields
+
+```json
+{
+  "id": "uuid",
+  "rule_id": "uuid",
+  "rule_name": "Coverage below 70%",
+  "rule_type": "coverage_drop",
+  "severity": "high",
+  "status": "open",
+  "details": {"current_score": 67.3, "threshold": 70.0},
+  "fired_at": "2024-03-15T10:00:00Z",
+  "acknowledged_at": null,
+  "acknowledged_by": null,
+  "resolved_at": null,
+  "dismissed_at": null
+}
+```
+
+---
+
+## Alert Lifecycle Actions
+
+### Acknowledge
+
+Marks the alert as seen and being investigated. Does NOT suppress re-firing.
+```http
+POST /api/v1/alerts/{id}/acknowledge
+{"notes": "Investigating coverage drop — two campaigns just completed"}
+```
+Required role: red_lead, blue_lead, admin
+
+### Resolve
+
+Marks the underlying issue as fixed. Prevents re-evaluation from creating a
+duplicate alert (until cooldown expires and condition is met again).
+```http
+POST /api/v1/alerts/{id}/resolve
+{"resolution_notes": "Coverage restored to 78% after campaign validation"}
+```
+Required role: red_lead, blue_lead, admin
+
+### Dismiss
+
+Suppresses the alert for the rule's cooldown period.
+```http
+POST /api/v1/alerts/{id}/dismiss
+{"reason": "Planned maintenance window — coverage drop expected"}
+```
+Required role: red_lead, blue_lead, admin
+
+---
+
+## Alert Evaluation
+
+### Automatic (hourly)
+
+Aegis runs alert evaluation every hour via APScheduler:
+- Checks all `is_enabled=true` rules
+- For each rule, evaluates the condition against current data
+- Creates an instance if condition is met AND rule is not in cooldown
+- Sends in-app notifications and/or webhook calls per rule configuration
+
+### Manual trigger
+
+```http
+POST /api/v1/alerts/evaluate
+```
+Required role: red_lead, blue_lead, admin
+
+Useful when you've made changes and want to check immediately without waiting for the hourly job.
+
+---
+
+## In-App Notifications
+
+When `notify_in_app: true` on a rule, an in-app notification is sent to all users
+with role red_lead, blue_lead, or admin.
+
+View notifications:
+```http
+GET /api/v1/notifications
+```
+
+Mark as read:
+```http
+PATCH /api/v1/notifications/{id}
+{"is_read": true}
+```
+
+---
+
+## Webhook Notifications
+
+When `notify_webhook: true` and a `webhook_id` is set, Aegis POSTs to the configured
+webhook URL when the alert fires.
+
+Webhook payload:
+```json
+{
+  "event": "alert.fired",
+  "alert_id": "uuid",
+  "rule_name": "Coverage below 70%",
+  "severity": "high",
+  "details": {"current_score": 67.3, "threshold": 70.0},
+  "fired_at": "2024-03-15T10:00:00Z"
+}
+```
+
+---
+
+## Summary
+
+```http
+GET /api/v1/alerts/summary
+```
+
+Returns:
+```json
+{
+  "total": 12,
+  "by_status": {"open": 5, "acknowledged": 3, "resolved": 3, "dismissed": 1},
+  "by_severity": {"critical": 1, "high": 4, "medium": 5, "low": 2, "info": 0},
+  "by_type": {
+    "coverage_drop": 2,
+    "stale_test": 4,
+    "unvalidated_test": 3,
+    "high_risk_uncovered": 2,
+    "detection_gap": 1
+  }
+}
+```
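
The shape of the summary response added by this diff can be illustrated with a small Python sketch that derives the same structure from a list of alert instances. This is not Aegis's actual implementation: `summarize` and `count_by` are hypothetical helpers, and the sketch assumes the counts cover every instance passed in, with zero-filled keys as in the `"info": 0` entry of the example. The status, severity, and rule type values are the ones listed in the page.

```python
from collections import Counter

# Value sets taken from the tables and examples in the page above.
STATUSES = ("open", "acknowledged", "resolved", "dismissed")
SEVERITIES = ("critical", "high", "medium", "low", "info")
RULE_TYPES = (
    "coverage_drop",
    "stale_test",
    "unvalidated_test",
    "high_risk_uncovered",
    "detection_gap",
)


def count_by(instances, field, known_values):
    """Count instances per value of `field`, zero-filling every known value.

    Values outside `known_values` are ignored in this sketch.
    """
    counts = Counter(inst[field] for inst in instances)
    return {value: counts.get(value, 0) for value in known_values}


def summarize(instances):
    """Hypothetical helper: build a summary-shaped dict from alert instances."""
    return {
        "total": len(instances),
        "by_status": count_by(instances, "status", STATUSES),
        "by_severity": count_by(instances, "severity", SEVERITIES),
        "by_type": count_by(instances, "rule_type", RULE_TYPES),
    }


# Three sample instances using field names from the Instance Fields example.
sample = [
    {"status": "open", "severity": "high", "rule_type": "coverage_drop"},
    {"status": "open", "severity": "medium", "rule_type": "stale_test"},
    {"status": "resolved", "severity": "high", "rule_type": "stale_test"},
]
summary = summarize(sample)
print(summary["by_status"])
```

Zero-filling keeps the response shape stable for dashboards, at the cost of silently dropping any value that is not in the known sets.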