Add wiki page: Operational-Alerts
# Operational Alerts

The operational alerts system monitors the state of your security coverage and
notifies the team when a monitored condition crosses its configured threshold.

---

## Alert Rules

Alert rules define **what to check** and **when to fire**. Each rule has a type,
a severity, configuration thresholds, and notification preferences.

### Rule Types

| Rule Type | What it checks |
|-----------|---------------|
| `coverage_drop` | Overall coverage score drops below a threshold |
| `stale_test` | A test has stayed in `red_executing` or `blue_evaluating` longer than the configured number of days |
| `unvalidated_test` | Tests stuck in `in_review` beyond a threshold duration |
| `high_risk_uncovered` | High-severity techniques have no validated tests |
| `detection_gap` | A technique has validated attack tests but no detection rule |

### Rule Fields

```json
{
  "name": "Coverage below 70%",
  "description": "Alert when overall coverage drops below 70%",
  "rule_type": "coverage_drop",
  "severity": "high",
  "config": {
    "threshold": 70.0,
    "tactic_id": null
  },
  "is_enabled": true,
  "cooldown_hours": 24,
  "notify_in_app": true,
  "notify_webhook": true,
  "webhook_id": "webhook-uuid-or-null"
}
```
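A client that builds rule payloads may want to check them before sending. The sketch below mirrors the fields shown above in a dataclass. The class name `AlertRule`, the `problems` method, the default values, and the individual checks are assumptions made for illustration, not part of the Aegis API; the allowed rule types and severities are copied from the Rule Types and Severity Levels tables on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Allowed values copied from the Rule Types and Severity Levels tables.
RULE_TYPES = {"coverage_drop", "stale_test", "unvalidated_test",
              "high_risk_uncovered", "detection_gap"}
SEVERITIES = {"info", "low", "medium", "high", "critical"}


@dataclass
class AlertRule:
    """Hypothetical client-side mirror of the rule payload (not an Aegis class)."""
    name: str
    rule_type: str
    severity: str
    config: dict[str, Any] = field(default_factory=dict)
    description: str = ""
    # The defaults below are assumptions chosen for this sketch.
    is_enabled: bool = True
    cooldown_hours: int = 24
    notify_in_app: bool = True
    notify_webhook: bool = False
    webhook_id: Optional[str] = None

    def problems(self) -> list[str]:
        """Return human-readable validation problems (empty list if none)."""
        found = []
        if self.rule_type not in RULE_TYPES:
            found.append(f"unknown rule_type: {self.rule_type}")
        if self.severity not in SEVERITIES:
            found.append(f"unknown severity: {self.severity}")
        if self.cooldown_hours < 0:
            found.append("cooldown_hours must not be negative")
        if self.notify_webhook and not self.webhook_id:
            found.append("notify_webhook is true but webhook_id is not set")
        return found
```

Calling `problems()` on a rule built from the example payload should return an empty list, while a rule with `notify_webhook=True` and no `webhook_id` is reported.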

### Severity Levels

| Severity | Use case |
|----------|---------|
| `info` | Informational; no immediate action needed |
| `low` | Worth noting but not urgent |
| `medium` | Should be addressed in the next sprint |
| `high` | Requires prompt attention |
| `critical` | Immediate action required |

### Rule Configuration Examples

**Coverage drop:**
```json
{"threshold": 75.0}
```
Fires when the organization's coverage score drops below 75%.

**Stale test:**
```json
{"stale_days": 7}
```
Fires for any test that has been in an executing or evaluating state for more than 7 days.

**High risk uncovered:**
```json
{"min_severity": "high", "max_uncovered": 5}
```
Fires when more than 5 high-severity techniques have no validated test.

**Detection gap:**
```json
{"require_detection_rule": true}
```
Fires for every validated attack test that has no linked detection rule.
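The four configurations above can be read as simple predicates over current data. The sketch below is one possible reading, not the Aegis implementation: the function names and the test fields `id`, `validated`, and `detection_rule_id` are hypothetical, and the strict comparisons follow the wording "drops below" and "more than".

```python
def coverage_drop_fires(current_score: float, config: dict) -> bool:
    """True when the score is strictly below the configured threshold."""
    return current_score < config["threshold"]


def stale_test_fires(days_in_state: float, config: dict) -> bool:
    """True when a test has been executing/evaluating for more than stale_days."""
    return days_in_state > config["stale_days"]


def high_risk_uncovered_fires(uncovered_count: int, config: dict) -> bool:
    """True when more than max_uncovered techniques lack a validated test."""
    return uncovered_count > config["max_uncovered"]


def detection_gap_tests(tests: list[dict], config: dict) -> list[str]:
    """IDs of validated tests with no linked detection rule.

    The keys "id", "validated" and "detection_rule_id" are assumed names.
    """
    if not config.get("require_detection_rule", False):
        return []
    return [t["id"] for t in tests
            if t["validated"] and not t["detection_rule_id"]]
```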

---

## Alert Instances

When a rule's condition is met and the rule is not in cooldown, an alert instance is created.

### Instance Lifecycle

```
open ──────────────> acknowledged ──────────────> resolved
  │                                                 │
  └────────────────> dismissed                      │
                         │                          │
                         └── suppressed until       └── final state
                             cooldown resets            (immutable)
```
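The diagram can be expressed as a small transition table. This is a sketch read off the arrows in the diagram only: `ALLOWED_TRANSITIONS` and `transition` are hypothetical names, and the table assumes that no transitions exist beyond the three arrows shown (for example, it does not allow `open` to go straight to `resolved`, which the diagram neither shows nor rules out).

```python
# Transitions read off the lifecycle diagram; anything not drawn is rejected.
ALLOWED_TRANSITIONS = {
    "open": {"acknowledged", "dismissed"},
    "acknowledged": {"resolved"},
    "resolved": set(),   # final state (immutable)
    "dismissed": set(),  # suppressed until the cooldown resets
}


def transition(current: str, target: str) -> str:
    """Return the new status, or raise ValueError for a move not in the table."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move alert from {current!r} to {target!r}")
    return target
```

Keeping the table as data makes it easy to adjust if the real service permits additional moves.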

### Instance Fields

```json
{
  "id": "uuid",
  "rule_id": "uuid",
  "rule_name": "Coverage below 70%",
  "rule_type": "coverage_drop",
  "severity": "high",
  "status": "open",
  "details": {"current_score": 67.3, "threshold": 70.0},
  "fired_at": "2024-03-15T10:00:00Z",
  "acknowledged_at": null,
  "acknowledged_by": null,
  "resolved_at": null,
  "dismissed_at": null
}
```

---

## Alert Lifecycle Actions

### Acknowledge

Marks the alert as seen and under investigation. Acknowledging does NOT suppress re-firing.
```http
POST /api/v1/alerts/{id}/acknowledge
Content-Type: application/json

{"notes": "Investigating coverage drop: two campaigns just completed"}
```
Required role: `red_lead`, `blue_lead`, or `admin`

### Resolve

Marks the underlying issue as fixed. Re-evaluation will not create a duplicate
alert until the cooldown expires and the condition is met again.
```http
POST /api/v1/alerts/{id}/resolve
Content-Type: application/json

{"resolution_notes": "Coverage restored to 78% after campaign validation"}
```
Required role: `red_lead`, `blue_lead`, or `admin`

### Dismiss

Suppresses the alert for the rule's cooldown period.
```http
POST /api/v1/alerts/{id}/dismiss
Content-Type: application/json

{"reason": "Planned maintenance window: coverage drop expected"}
```
Required role: `red_lead`, `blue_lead`, or `admin`
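The three actions share one URL shape and differ only in the name of the body field, so a client can build them with a single helper. The sketch below uses only the Python standard library and constructs the request without sending it (pass the result to `urllib.request.urlopen` to send). The host in `BASE_URL`, the helper name `build_action_request`, and the bearer-token `Authorization` header are assumptions; this page does not describe how requests are authenticated.

```python
import json
import urllib.request

BASE_URL = "https://aegis.example.com"  # hypothetical host

# Body field expected by each action, taken from the examples above.
ACTION_FIELDS = {
    "acknowledge": "notes",
    "resolve": "resolution_notes",
    "dismiss": "reason",
}


def build_action_request(alert_id: str, action: str, text: str,
                         token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for one lifecycle action."""
    if action not in ACTION_FIELDS:
        raise ValueError(f"unknown action: {action}")
    body = json.dumps({ACTION_FIELDS[action]: text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/alerts/{alert_id}/{action}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; replace with whatever your deployment uses.
            "Authorization": f"Bearer {token}",
        },
    )
```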

---

## Alert Evaluation

### Automatic (hourly)

Aegis runs alert evaluation every hour via APScheduler. Each run:
- Checks all `is_enabled=true` rules
- For each rule, evaluates the condition against current data
- Creates an instance if the condition is met AND the rule is not in cooldown
- Sends in-app notifications and/or webhook calls per rule configuration
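The steps above can be sketched as a single pass over the rules. This is an illustration under stated assumptions, not the Aegis code: rules are plain dicts, the `last_fired_at` key and the `condition_met` callback are hypothetical, and the cooldown is assumed to be measured from the time the rule last fired. Scheduling is left out; an hourly job would simply call `evaluate_rules`.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def in_cooldown(last_fired: Optional[datetime], cooldown_hours: int,
                now: datetime) -> bool:
    """True while fewer than cooldown_hours have passed since the last fire."""
    if last_fired is None:
        return False
    return now - last_fired < timedelta(hours=cooldown_hours)


def evaluate_rules(rules: list[dict],
                   condition_met: Callable[[dict], bool],
                   now: datetime) -> list[dict]:
    """Return the alert instances one evaluation pass would create."""
    fired = []
    for rule in rules:
        if not rule["is_enabled"]:
            continue  # only enabled rules are checked
        if not condition_met(rule):
            continue  # condition evaluated against current data
        if in_cooldown(rule.get("last_fired_at"), rule["cooldown_hours"], now):
            continue  # condition met, but the rule is still cooling down
        channels = []
        if rule["notify_in_app"]:
            channels.append("in_app")
        if rule["notify_webhook"] and rule.get("webhook_id"):
            channels.append("webhook")
        fired.append({"rule_name": rule["name"], "status": "open",
                      "fired_at": now, "channels": channels})
    return fired
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the pass deterministic and easy to test.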

### Manual trigger

```http
POST /api/v1/alerts/evaluate
```
Required role: `red_lead`, `blue_lead`, or `admin`

Useful when you have made changes and want to check immediately without waiting for the hourly job.

---

## In-App Notifications

When `notify_in_app: true` is set on a rule, an in-app notification is sent to all users
with the role `red_lead`, `blue_lead`, or `admin`.

View notifications:
```http
GET /api/v1/notifications
```

Mark as read:
```http
PATCH /api/v1/notifications/{id}
Content-Type: application/json

{"is_read": true}
```

---

## Webhook Notifications

When `notify_webhook: true` and a `webhook_id` is set, Aegis POSTs to the configured
webhook URL when the alert fires.

Webhook payload:
```json
{
  "event": "alert.fired",
  "alert_id": "uuid",
  "rule_name": "Coverage below 70%",
  "severity": "high",
  "details": {"current_score": 67.3, "threshold": 70.0},
  "fired_at": "2024-03-15T10:00:00Z"
}
```
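On the receiving side, a handler can validate the payload and decide how to route it. The sketch below parses the raw request body with the standard library only. The function name `handle_webhook`, the returned dict shape, and the choice to page on `high` and `critical` are assumptions made for illustration; only the payload keys come from the example above.

```python
import json
from datetime import datetime

# Keys present in the example payload above.
REQUIRED_KEYS = {"event", "alert_id", "rule_name", "severity",
                 "details", "fired_at"}
PAGE_SEVERITIES = {"high", "critical"}  # hypothetical routing choice


def handle_webhook(raw_body: bytes) -> dict:
    """Parse an alert.fired payload and return a small routing decision."""
    payload = json.loads(raw_body)
    missing = REQUIRED_KEYS - set(payload)
    if missing:
        raise ValueError(f"payload is missing keys: {sorted(missing)}")
    if payload["event"] != "alert.fired":
        raise ValueError(f"unexpected event: {payload['event']}")
    # Replace the trailing "Z" so older Python versions can parse the timestamp.
    fired_at = datetime.fromisoformat(payload["fired_at"].replace("Z", "+00:00"))
    return {
        "alert_id": payload["alert_id"],
        "summary": f"[{payload['severity']}] {payload['rule_name']}",
        "fired_at": fired_at,
        "page_on_call": payload["severity"] in PAGE_SEVERITIES,
    }
```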

---

## Summary

```http
GET /api/v1/alerts/summary
```

Returns:
```json
{
  "total": 12,
  "by_status": {"open": 5, "acknowledged": 3, "resolved": 3, "dismissed": 1},
  "by_severity": {"critical": 1, "high": 4, "medium": 5, "low": 2, "info": 0},
  "by_type": {
    "coverage_drop": 2,
    "stale_test": 4,
    "unvalidated_test": 3,
    "high_risk_uncovered": 2,
    "detection_gap": 1
  }
}
```
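In the example response each breakdown adds up to `total` (12), which suggests every alert is counted once per breakdown. The sketch below shows how such a summary could be derived from a list of instances and checked for that property. Both function names are hypothetical, and unlike the example response, `summarize` omits keys whose count is zero.

```python
from collections import Counter


def summarize(instances: list[dict]) -> dict:
    """Count instances by status, severity and rule type."""
    return {
        "total": len(instances),
        "by_status": dict(Counter(i["status"] for i in instances)),
        "by_severity": dict(Counter(i["severity"] for i in instances)),
        "by_type": dict(Counter(i["rule_type"] for i in instances)),
    }


def is_consistent(summary: dict) -> bool:
    """True when every breakdown adds up to the reported total."""
    return all(
        sum(summary[key].values()) == summary["total"]
        for key in ("by_status", "by_severity", "by_type")
    )
```

A dashboard could run `is_consistent` on each response as a cheap sanity check before rendering counts.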