Add wiki page: Operational-Alerts

2026-05-22 12:33:06 +00:00
parent 817679c70b
commit 73be05601e

@@ -0,0 +1,227 @@
# Operational Alerts
The operational alerts system monitors the state of your security coverage and
notifies the team when a rule's condition is met, for example when coverage drops
below a threshold or a test stays in one state for too long.
---
## Alert Rules
Alert rules define **what to check** and **when to fire**. Each rule has a type,
severity, configuration thresholds, and notification preferences.
### Rule Types
| Rule Type | What it checks |
|-----------|---------------|
| `coverage_drop` | Overall coverage score drops below a threshold |
| `stale_test` | A test has been in `red_executing` or `blue_evaluating` for too long |
| `unvalidated_test` | Tests stuck in `in_review` beyond a threshold duration |
| `high_risk_uncovered` | High-severity techniques have no validated tests |
| `detection_gap` | Technique has validated attack tests but no detection rule |
### Rule Fields
```json
{
  "name": "Coverage below 70%",
  "description": "Alert when overall coverage drops below 70%",
  "rule_type": "coverage_drop",
  "severity": "high",
  "config": {
    "threshold": 70.0,
    "tactic_id": null
  },
  "is_enabled": true,
  "cooldown_hours": 24,
  "notify_in_app": true,
  "notify_webhook": true,
  "webhook_id": "webhook-uuid-or-null"
}
```
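As an illustration only, the rule fields above could be modeled and sanity-checked on the client side before being submitted. The `AlertRule` class, its defaults, and the specific checks below are assumptions made for this sketch and are not taken from the Aegis codebase; only the field names, rule types, and severity values come from this page.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Rule types and severities as listed on this page.
RULE_TYPES = {
    "coverage_drop",
    "stale_test",
    "unvalidated_test",
    "high_risk_uncovered",
    "detection_gap",
}
SEVERITIES = ("info", "low", "medium", "high", "critical")


@dataclass
class AlertRule:
    """Hypothetical client-side model of an alert rule (defaults are assumed)."""

    name: str
    rule_type: str
    severity: str
    config: dict[str, Any] = field(default_factory=dict)
    description: str = ""
    is_enabled: bool = True
    cooldown_hours: int = 24
    notify_in_app: bool = True
    notify_webhook: bool = False
    webhook_id: Optional[str] = None

    def validation_errors(self) -> list[str]:
        """Return a list of problems; an empty list means the rule looks sane."""
        errors = []
        if self.rule_type not in RULE_TYPES:
            errors.append(f"unknown rule_type: {self.rule_type}")
        if self.severity not in SEVERITIES:
            errors.append(f"unknown severity: {self.severity}")
        if self.cooldown_hours < 0:
            errors.append("cooldown_hours must not be negative")
        # Assumed check: webhook delivery needs a webhook_id to be set.
        if self.notify_webhook and not self.webhook_id:
            errors.append("notify_webhook is true but webhook_id is not set")
        return errors
```

A rule built from the example above, `AlertRule(name="Coverage below 70%", rule_type="coverage_drop", severity="high", config={"threshold": 70.0})`, yields no validation errors under these assumed checks.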
### Severity Levels
| Severity | Use case |
|----------|---------|
| `info` | Informational; no action needed immediately |
| `low` | Worth noting but not urgent |
| `medium` | Should be addressed in next sprint |
| `high` | Requires prompt attention |
| `critical` | Immediate action required |
### Rule Configuration Examples
**Coverage drop:**
```json
{"threshold": 75.0}
```
Fires when the organization's overall coverage score drops below 75%.
**Stale test:**
```json
{"stale_days": 7}
```
Fires for any test in executing/evaluating state for more than 7 days.
**High risk uncovered:**
```json
{"min_severity": "high", "max_uncovered": 5}
```
Fires when more than 5 high-severity techniques have no validated test.
**Detection gap:**
```json
{"require_detection_rule": true}
```
Fires for every validated attack test that has no linked detection rule.
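The conditions described above can be sketched as small predicate functions. This is a minimal sketch, not the actual evaluator: the function names and input shapes are hypothetical, and it assumes that techniques are rated on the same `info` to `critical` scale as alerts and that `min_severity` means "this severity or higher".

```python
# Severity scale from this page, lowest to highest.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def coverage_drop_fires(current_score: float, config: dict) -> bool:
    """Fire when the score is strictly below the configured threshold."""
    return current_score < config["threshold"]


def stale_test_fires(days_in_state: float, config: dict) -> bool:
    """Fire when a test has been in its state for more than stale_days."""
    return days_in_state > config["stale_days"]


def high_risk_uncovered_fires(uncovered_severities: list[str], config: dict) -> bool:
    """Fire when more than max_uncovered techniques at or above min_severity
    have no validated test. Each list entry is the severity of one uncovered
    technique (assumed input shape)."""
    floor = SEVERITY_ORDER.index(config["min_severity"])
    count = sum(
        1 for sev in uncovered_severities if SEVERITY_ORDER.index(sev) >= floor
    )
    return count > config["max_uncovered"]
```

With `{"threshold": 75.0}`, a score of exactly 75.0 does not fire in this sketch, because the text says "drops below". Whether the real evaluator treats the boundary the same way is not stated on this page.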
---
## Alert Instances
When a rule's condition is met and the rule is not in cooldown, an alert instance is created.
### Instance Lifecycle
```
open ──────────────> acknowledged ──────────────> resolved
 │                                                   │
 └────────────────> dismissed                        │
                        │                            │
                        └── suppressed until         └── final state
                            cooldown resets              (immutable)
```
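One way to picture the lifecycle is as a small transition table. The sketch below encodes only the arrows drawn in the diagram (open to acknowledged, acknowledged to resolved, open to dismissed). Whether other transitions, such as open directly to resolved, are permitted is not stated here, so treat the table and the `transition` helper as assumptions.

```python
# Transitions read off the lifecycle diagram; anything not listed is rejected.
ALLOWED_TRANSITIONS = {
    "open": {"acknowledged", "dismissed"},
    "acknowledged": {"resolved"},
    "resolved": set(),   # final state (immutable)
    "dismissed": set(),  # instance stays dismissed; the rule is suppressed
}


def transition(current: str, target: str) -> str:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move alert from {current!r} to {target!r}")
    return target
```

For example, `transition("open", "acknowledged")` returns `"acknowledged"`, while `transition("resolved", "open")` raises `ValueError`.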
### Instance Fields
```json
{
  "id": "uuid",
  "rule_id": "uuid",
  "rule_name": "Coverage below 70%",
  "rule_type": "coverage_drop",
  "severity": "high",
  "status": "open",
  "details": {"current_score": 67.3, "threshold": 70.0},
  "fired_at": "2024-03-15T10:00:00Z",
  "acknowledged_at": null,
  "acknowledged_by": null,
  "resolved_at": null,
  "dismissed_at": null
}
```
---
## Alert Lifecycle Actions
### Acknowledge
Marks the alert as seen and being investigated. Does NOT suppress re-firing.
```http
POST /api/v1/alerts/{id}/acknowledge
{"notes": "Investigating coverage drop; two campaigns just completed"}
```
Required role: red_lead, blue_lead, admin
### Resolve
Marks the underlying issue as fixed. Prevents re-evaluation from creating a
duplicate alert (until cooldown expires and condition is met again).
```http
POST /api/v1/alerts/{id}/resolve
{"resolution_notes": "Coverage restored to 78% after campaign validation"}
```
Required role: red_lead, blue_lead, admin
### Dismiss
Suppresses the alert for the rule's cooldown period.
```http
POST /api/v1/alerts/{id}/dismiss
{"reason": "Planned maintenance window; coverage drop expected"}
```
Required role: red_lead, blue_lead, admin
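The three lifecycle calls differ only in the path suffix and the name of the body field, so a client can build them with one helper. The sketch below uses only the standard library and constructs the request without sending it. The base URL, the bearer-token header, and the `build_alert_action` helper are assumptions for illustration; the page does not describe how the API is authenticated.

```python
import json
from urllib.request import Request

# Body field used by each lifecycle action, as shown in the examples above.
ACTION_BODY_FIELD = {
    "acknowledge": "notes",
    "resolve": "resolution_notes",
    "dismiss": "reason",
}


def build_alert_action(
    base_url: str, alert_id: str, action: str, text: str, token: str
) -> Request:
    """Build (but do not send) the POST request for a lifecycle action."""
    if action not in ACTION_BODY_FIELD:
        raise ValueError(f"unknown action: {action}")
    url = f"{base_url.rstrip('/')}/api/v1/alerts/{alert_id}/{action}"
    body = json.dumps({ACTION_BODY_FIELD[action]: text}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumed auth scheme; replace with whatever your deployment uses.
        "Authorization": f"Bearer {token}",
    }
    return Request(url, data=body, headers=headers, method="POST")
```

The returned object can be passed to `urllib.request.urlopen` to perform the call. The caller needs one of the roles listed above for the request to be accepted.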
---
## Alert Evaluation
### Automatic (hourly)
Aegis runs alert evaluation every hour via APScheduler:
- Checks all `is_enabled=true` rules
- For each rule, evaluates the condition against current data
- Creates an instance if condition is met AND rule is not in cooldown
- Sends in-app notifications and/or webhook calls per rule configuration
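The steps above can be sketched as a single evaluation pass. This is a simplified illustration, not the scheduler job itself: rules are plain dicts, the `id` key and the `last_fired` mapping are hypothetical, and cooldown is assumed to be measured from the time the rule last fired. Notification delivery and the APScheduler wiring are left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def in_cooldown(
    last_fired_at: Optional[datetime], cooldown_hours: int, now: datetime
) -> bool:
    """A rule is in cooldown if it fired less than cooldown_hours ago."""
    if last_fired_at is None:
        return False
    return now - last_fired_at < timedelta(hours=cooldown_hours)


def evaluate_rules(
    rules: list[dict],
    condition_met: Callable[[dict], bool],
    last_fired: dict[str, datetime],
    now: datetime,
) -> list[str]:
    """Return the ids of rules that fire in this pass and record their fire time."""
    fired = []
    for rule in rules:
        if not rule["is_enabled"]:
            continue  # only enabled rules are checked
        if not condition_met(rule):
            continue  # condition evaluated against current data
        if in_cooldown(last_fired.get(rule["id"]), rule["cooldown_hours"], now):
            continue  # condition met, but the rule is still suppressed
        last_fired[rule["id"]] = now
        fired.append(rule["id"])
    return fired
```

In a real deployment each fired id would lead to an alert instance being created and notifications being sent according to the rule's `notify_in_app` and `notify_webhook` settings.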
### Manual trigger
```http
POST /api/v1/alerts/evaluate
```
Required role: red_lead, blue_lead, admin
Useful when you've made changes and want to check immediately without waiting for the hourly job.
---
## In-App Notifications
When `notify_in_app: true` on a rule, an in-app notification is sent to all users
with role red_lead, blue_lead, or admin.
View notifications:
```http
GET /api/v1/notifications
```
Mark as read:
```http
PATCH /api/v1/notifications/{id}
{"is_read": true}
```
---
## Webhook Notifications
When `notify_webhook: true` and a `webhook_id` is set, Aegis POSTs to the configured
webhook URL when the alert fires.
Webhook payload:
```json
{
  "event": "alert.fired",
  "alert_id": "uuid",
  "rule_name": "Coverage below 70%",
  "severity": "high",
  "details": {"current_score": 67.3, "threshold": 70.0},
  "fired_at": "2024-03-15T10:00:00Z"
}
```
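On the receiving side, a handler might validate the payload and turn it into a one-line message for a chat channel or log. The sketch below is one possible consumer, assuming the payload has exactly the keys shown above; `format_alert_message` is a hypothetical helper and not part of Aegis.

```python
# Keys present in the example payload above.
REQUIRED_KEYS = ("event", "alert_id", "rule_name", "severity", "details", "fired_at")


def format_alert_message(payload: dict) -> str:
    """Validate a webhook payload and render it as a single line of text."""
    missing = [key for key in REQUIRED_KEYS if key not in payload]
    if missing:
        raise ValueError(f"payload is missing keys: {', '.join(missing)}")
    if payload["event"] != "alert.fired":
        raise ValueError(f"unexpected event: {payload['event']}")
    # Render details as key=value pairs in the order they were sent.
    details = ", ".join(f"{k}={v}" for k, v in payload["details"].items())
    return (
        f"[{payload['severity'].upper()}] {payload['rule_name']} "
        f"fired at {payload['fired_at']} ({details})"
    )
```

For the example payload this produces `[HIGH] Coverage below 70% fired at 2024-03-15T10:00:00Z (current_score=67.3, threshold=70.0)`.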
---
## Summary
```http
GET /api/v1/alerts/summary
```
Returns:
```json
{
  "total": 12,
  "by_status": {"open": 5, "acknowledged": 3, "resolved": 3, "dismissed": 1},
  "by_severity": {"critical": 1, "high": 4, "medium": 5, "low": 2, "info": 0},
  "by_type": {
    "coverage_drop": 2,
    "stale_test": 4,
    "unvalidated_test": 3,
    "high_risk_uncovered": 2,
    "detection_gap": 1
  }
}
```
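To make the shape of this response concrete, the same aggregation can be reproduced from a list of alert instances. This is a sketch of how such counts could be derived, not the endpoint's implementation. Known values are pre-seeded with zero because the example response includes `"info": 0`; that the endpoint always does so is an assumption.

```python
STATUSES = ("open", "acknowledged", "resolved", "dismissed")
SEVERITIES = ("critical", "high", "medium", "low", "info")
RULE_TYPES = (
    "coverage_drop",
    "stale_test",
    "unvalidated_test",
    "high_risk_uncovered",
    "detection_gap",
)


def _count(instances: list[dict], key: str, known: tuple) -> dict:
    """Count instances by instances[i][key], starting every known value at 0."""
    counts = {value: 0 for value in known}
    for inst in instances:
        counts[inst[key]] = counts.get(inst[key], 0) + 1
    return counts


def summarize(instances: list[dict]) -> dict:
    """Build a summary dict with the same top-level keys as the response above."""
    return {
        "total": len(instances),
        "by_status": _count(instances, "status", STATUSES),
        "by_severity": _count(instances, "severity", SEVERITIES),
        "by_type": _count(instances, "rule_type", RULE_TYPES),
    }
```

Each instance passed in needs at least the `status`, `severity`, and `rule_type` fields from the instance schema.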