Add wiki page: Operational-Alerts
@@ -0,0 +1,227 @@
# Operational Alerts

The operational alerts system monitors the state of your security coverage and
notifies the team when conditions fall below defined thresholds.

---
## Alert Rules

Alert rules define **what to check** and **when to fire**. Each rule has a type,
severity, configuration thresholds, and notification preferences.

### Rule Types

| Rule Type | What it checks |
|-----------|---------------|
| `coverage_drop` | Overall coverage score drops below a threshold |
| `stale_test` | A test has been in `red_executing` or `blue_evaluating` for too long |
| `unvalidated_test` | Tests stuck in `in_review` beyond a threshold duration |
| `high_risk_uncovered` | High-severity techniques have no validated tests |
| `detection_gap` | Technique has validated attack tests but no detection rule |

### Rule Fields

```json
{
  "name": "Coverage below 70%",
  "description": "Alert when overall coverage drops below 70%",
  "rule_type": "coverage_drop",
  "severity": "high",
  "config": {
    "threshold": 70.0,
    "tactic_id": null
  },
  "is_enabled": true,
  "cooldown_hours": 24,
  "notify_in_app": true,
  "notify_webhook": true,
  "webhook_id": "webhook-uuid-or-null"
}
```
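
For client code that builds rule payloads, the fields above can be mirrored in a small typed container. The following is a minimal sketch, not the Aegis model: the class name `AlertRule`, the default values, and the validation behaviour are assumptions made for illustration. Only the field names, rule types, and severity values come from this page.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Values taken from the Rule Types and Severity Levels tables on this page.
RULE_TYPES = {
    "coverage_drop",
    "stale_test",
    "unvalidated_test",
    "high_risk_uncovered",
    "detection_gap",
}
SEVERITIES = {"info", "low", "medium", "high", "critical"}


@dataclass
class AlertRule:
    """Hypothetical client-side mirror of the rule JSON shown above."""

    name: str
    rule_type: str
    severity: str
    config: Dict[str, Any] = field(default_factory=dict)
    description: str = ""
    # The defaults below are assumptions, not documented server defaults.
    is_enabled: bool = True
    cooldown_hours: int = 24
    notify_in_app: bool = True
    notify_webhook: bool = False
    webhook_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values that are not listed in the tables on this page.
        if self.rule_type not in RULE_TYPES:
            raise ValueError(f"unknown rule_type: {self.rule_type!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
```

Because the attribute names match the JSON keys, a decoded payload can be passed straight in with `AlertRule(**payload)`.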

### Severity Levels

| Severity | Use case |
|----------|---------|
| `info` | Informational; no action needed immediately |
| `low` | Worth noting but not urgent |
| `medium` | Should be addressed in next sprint |
| `high` | Requires prompt attention |
| `critical` | Immediate action required |
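
The table reads as an ascending scale, which matters for settings such as `min_severity`. The sketch below assumes that ordering (`info` lowest, `critical` highest); the page does not state it explicitly, and the `Severity` enum and `at_least` helper are hypothetical names.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels, ordered as they appear in the table (assumed ascending)."""

    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

    @classmethod
    def parse(cls, value: str) -> "Severity":
        # Map the lowercase API string (e.g. "high") to the enum member.
        return cls[value.upper()]


def at_least(value: str, minimum: str) -> bool:
    """Return True when `value` is at or above `minimum` on the assumed scale."""
    return Severity.parse(value) >= Severity.parse(minimum)
```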

### Rule Configuration Examples

**Coverage drop:**
```json
{"threshold": 75.0}
```
Fires when the organization-wide coverage score drops below 75%.

**Stale test:**
```json
{"stale_days": 7}
```
Fires for any test that has been in the `red_executing` or `blue_evaluating` state for more than 7 days.

**High risk uncovered:**
```json
{"min_severity": "high", "max_uncovered": 5}
```
Fires when more than 5 high-severity techniques have no validated test.

**Detection gap:**
```json
{"require_detection_rule": true}
```
Fires for every validated attack test that has no linked detection rule.
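
The threshold comparisons described above can be written out as plain predicates. This is a sketch of the stated semantics only ("below", "more than"); the function names are hypothetical, and how Aegis actually computes the score, the days in state, or the uncovered count is not covered here.

```python
from typing import Any, Mapping


def coverage_drop_fires(config: Mapping[str, Any], current_score: float) -> bool:
    """Fire when the score is strictly below the configured threshold."""
    return current_score < float(config["threshold"])


def stale_test_fires(config: Mapping[str, Any], days_in_state: float) -> bool:
    """Fire when a test has spent more than `stale_days` in its current state."""
    return days_in_state > float(config["stale_days"])


def high_risk_uncovered_fires(config: Mapping[str, Any], uncovered_count: int) -> bool:
    """Fire when more than `max_uncovered` qualifying techniques lack a validated test."""
    return uncovered_count > int(config["max_uncovered"])
```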

---

## Alert Instances

When a rule's condition is met and the rule is not in cooldown, an alert instance is created.

### Instance Lifecycle

```
open ──────────────> acknowledged ──────────────> resolved
  │                                                  │
  └────────────────> dismissed                       │
                        │                            │
                        └── suppressed until         └── final state
                            cooldown resets              (immutable)
```
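
The diagram can be encoded as a transition table. The sketch below includes only the arrows that are drawn; whether the API also accepts other moves (for example `open` directly to `resolved`) is not stated on this page, and `ALLOWED_TRANSITIONS` and `transition` are hypothetical names.

```python
from typing import Dict, Set

# Only the arrows shown in the lifecycle diagram are listed here.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "open": {"acknowledged", "dismissed"},
    "acknowledged": {"resolved"},
    "resolved": set(),   # final state (immutable)
    "dismissed": set(),  # suppressed until the cooldown resets
}


def transition(current: str, target: str) -> str:
    """Return `target` if the diagram allows the move, otherwise raise ValueError."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move alert from {current!r} to {target!r}")
    return target
```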

### Instance Fields

```json
{
  "id": "uuid",
  "rule_id": "uuid",
  "rule_name": "Coverage below 70%",
  "rule_type": "coverage_drop",
  "severity": "high",
  "status": "open",
  "details": {"current_score": 67.3, "threshold": 70.0},
  "fired_at": "2024-03-15T10:00:00Z",
  "acknowledged_at": null,
  "acknowledged_by": null,
  "resolved_at": null,
  "dismissed_at": null
}
```

---

## Alert Lifecycle Actions

### Acknowledge

Marks the alert as seen and under investigation. Acknowledging does NOT suppress re-firing.

```http
POST /api/v1/alerts/{id}/acknowledge

{"notes": "Investigating coverage drop: two campaigns just completed"}
```

Required role: `red_lead`, `blue_lead`, or `admin`

### Resolve

Marks the underlying issue as fixed. Re-evaluation will not create a duplicate
alert until the cooldown has expired and the condition is met again.

```http
POST /api/v1/alerts/{id}/resolve

{"resolution_notes": "Coverage restored to 78% after campaign validation"}
```

Required role: `red_lead`, `blue_lead`, or `admin`

### Dismiss

Suppresses the alert for the rule's cooldown period.

```http
POST /api/v1/alerts/{id}/dismiss

{"reason": "Planned maintenance window: coverage drop expected"}
```

Required role: `red_lead`, `blue_lead`, or `admin`
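
The three lifecycle endpoints share one shape, so a client can build them with a single helper. The sketch below uses only the standard library and constructs the request without sending it. The host `aegis.example.com`, the bearer-token header, and the helper name are assumptions; only the paths and body keys come from this page.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "https://aegis.example.com"  # hypothetical host
ACTIONS = {"acknowledge", "resolve", "dismiss"}


def build_action_request(
    alert_id: str, action: str, body: Dict[str, Any], token: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request for one lifecycle action."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/alerts/{alert_id}/{action}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API accepts a bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to perform the call.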

---

## Alert Evaluation

### Automatic (hourly)

Aegis runs alert evaluation every hour via APScheduler:
- Checks all `is_enabled=true` rules
- For each rule, evaluates the condition against current data
- Creates an instance if condition is met AND rule is not in cooldown
- Sends in-app notifications and/or webhook calls per rule configuration
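
The four steps above can be sketched as a single evaluation pass. This is not the Aegis scheduler job. It assumes the cooldown is measured from the rule's last firing time, keys rules by `name` for brevity, and takes the condition check as a caller-supplied function; `evaluate_rules` and the `channels` key are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Dict, List


def evaluate_rules(
    rules: List[Dict[str, Any]],
    condition_met: Callable[[Dict[str, Any]], bool],
    last_fired: Dict[str, datetime],
    now: datetime,
) -> List[Dict[str, Any]]:
    """Run one pass over the rules and return the instances that were created."""
    created = []
    for rule in rules:
        # Step 1: only enabled rules are checked.
        if not rule["is_enabled"]:
            continue
        # Step 2: evaluate the rule's condition against current data.
        if not condition_met(rule):
            continue
        # Step 3: skip rules that fired within their cooldown window (assumed semantics).
        previous = last_fired.get(rule["name"])
        if previous is not None and now - previous < timedelta(hours=rule["cooldown_hours"]):
            continue
        last_fired[rule["name"]] = now
        # Step 4: record which notification channels the rule asks for.
        channels = []
        if rule.get("notify_in_app"):
            channels.append("in_app")
        if rule.get("notify_webhook") and rule.get("webhook_id"):
            channels.append("webhook")
        created.append(
            {"rule_name": rule["name"], "status": "open", "fired_at": now, "channels": channels}
        )
    return created
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the cooldown behaviour easy to test.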

### Manual trigger

```http
POST /api/v1/alerts/evaluate
```

Required role: `red_lead`, `blue_lead`, or `admin`

Useful when you've made changes and want to re-check immediately instead of waiting for the hourly job.

---

## In-App Notifications

When a rule has `notify_in_app: true`, an in-app notification is sent to all users
with the `red_lead`, `blue_lead`, or `admin` role.

View notifications:
```http
GET /api/v1/notifications
```

Mark as read:
```http
PATCH /api/v1/notifications/{id}

{"is_read": true}
```

---

## Webhook Notifications

When a rule has `notify_webhook: true` and a `webhook_id` set, Aegis sends a POST
request to the configured webhook URL each time the alert fires.

Webhook payload:
```json
{
  "event": "alert.fired",
  "alert_id": "uuid",
  "rule_name": "Coverage below 70%",
  "severity": "high",
  "details": {"current_score": 67.3, "threshold": 70.0},
  "fired_at": "2024-03-15T10:00:00Z"
}
```
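
On the receiving side, the payload can be validated and decoded before it is acted on. The sketch below assumes the body arrives as raw JSON bytes and that every key shown above is present; `AlertFired` and `parse_alert_fired` are hypothetical names, and signature or authentication checks are out of scope.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict


@dataclass(frozen=True)
class AlertFired:
    """Decoded form of the `alert.fired` payload shown above."""

    alert_id: str
    rule_name: str
    severity: str
    details: Dict[str, Any]
    fired_at: datetime


def parse_alert_fired(body: bytes) -> AlertFired:
    """Decode a webhook body, rejecting events other than `alert.fired`."""
    payload = json.loads(body)
    if payload.get("event") != "alert.fired":
        raise ValueError(f"unexpected event: {payload.get('event')!r}")
    # Swap the trailing "Z" for an explicit offset so fromisoformat
    # also accepts the timestamp on Python versions before 3.11.
    fired_at = datetime.fromisoformat(payload["fired_at"].replace("Z", "+00:00"))
    return AlertFired(
        alert_id=payload["alert_id"],
        rule_name=payload["rule_name"],
        severity=payload["severity"],
        details=payload["details"],
        fired_at=fired_at,
    )
```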

---

## Summary

```http
GET /api/v1/alerts/summary
```

Returns:
```json
{
  "total": 12,
  "by_status": {"open": 5, "acknowledged": 3, "resolved": 3, "dismissed": 1},
  "by_severity": {"critical": 1, "high": 4, "medium": 5, "low": 2, "info": 0},
  "by_type": {
    "coverage_drop": 2,
    "stale_test": 4,
    "unvalidated_test": 3,
    "high_risk_uncovered": 2,
    "detection_gap": 1
  }
}
```
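
The same breakdown can be reproduced client-side from a list of alert instances, which is handy for checking a dashboard against the endpoint. This is a sketch with a hypothetical `summarize` helper. Unlike the response above, it omits keys whose count is zero (such as `"info": 0`).

```python
from collections import Counter
from typing import Any, Dict, List


def summarize(instances: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count alert instances by status, severity, and rule type."""
    return {
        "total": len(instances),
        "by_status": dict(Counter(i["status"] for i in instances)),
        "by_severity": dict(Counter(i["severity"] for i in instances)),
        "by_type": dict(Counter(i["rule_type"] for i in instances)),
    }
```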