# Aegis — Architecture
## High-Level Overview
```
┌────────────────────┐       ┌─────────────────────┐
│   React Frontend   │──────▶│   FastAPI Backend   │
│  (Vite / TS / TW)  │ REST  │    (Python 3.11)    │
└────────────────────┘       └──────┬──────┬───────┘
                                    │      │
                          ┌─────────┘      └─────────┐
                          ▼                          ▼
                 ┌─────────────────┐        ┌──────────────────┐
                 │   PostgreSQL    │        │      MinIO       │
                 │  (Data Store)   │        │ (Object Storage) │
                 └─────────────────┘        └──────────────────┘
```
- **Frontend** — React 19 + TypeScript + Tailwind CSS v4 + TanStack Query
- **Backend** — FastAPI with SQLAlchemy ORM + Alembic migrations
- **Database** — PostgreSQL 15 with UUID primary keys and JSONB columns
- **Object Storage** — MinIO (S3-compatible) for evidence files
- **Scheduler** — APScheduler (in-process) for background jobs
---
## Database Schema
### Core Tables
| Table | Description |
|-------|-------------|
| `users` | User accounts with role-based access (admin, red_tech, blue_tech, red_lead, blue_lead, viewer) |
| `techniques` | MITRE ATT&CK techniques with coverage status, tactic, platforms (JSONB) |
| `tests` | Security tests with full Red/Blue workflow fields, dual validation, remediation, and retest chain |
| `test_templates` | Predefined test catalog from Atomic Red Team, Sigma, CALDERA, LOLBAS, custom |
| `evidences` | Evidence files separated by team (red/blue) with SHA256 integrity verification |
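
The SHA256 integrity check on `evidences` could look roughly like the sketch below. This is an illustration under stated assumptions, not the project's code: the helper names `sha256_of_stream` and `verify_evidence` are hypothetical, and it assumes the digest is stored as a hex string next to the file's metadata.

```python
import hashlib
import hmac
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Hash a file-like object in chunks so large evidence files are not loaded whole."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_evidence(stream: BinaryIO, expected_hex: str) -> bool:
    """Return True when the stream's digest matches the stored one (hypothetical helper)."""
    actual = sha256_of_stream(stream)
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(actual, expected_hex.lower())


# Usage: hash at upload time, verify again at download time.
stored = sha256_of_stream(io.BytesIO(b"example evidence bytes"))
still_intact = verify_evidence(io.BytesIO(b"example evidence bytes"), stored)
```

Hashing in chunks keeps memory use flat regardless of file size, which matters when the object is streamed from S3-compatible storage.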
### Detection & Defense
| Table | Description |
|-------|-------------|
| `detection_rules` | Imported detection rules (Sigma, Elastic, custom) linked to ATT&CK techniques |
| `test_detection_results` | Per-test detection rule evaluation results (triggered / not triggered) |
| `test_template_detection_rules` | Template ↔ detection rule associations |
| `defensive_techniques` | MITRE D3FEND defensive techniques |
| `defensive_technique_mappings` | ATT&CK technique ↔ D3FEND defensive technique mappings |
### Campaigns & Scheduling
| Table | Description |
|-------|-------------|
| `campaigns` | Test campaign groupings with scheduling (recurring, weekly/monthly/quarterly) |
| `campaign_tests` | Ordered test assignments within campaigns with dependency support |
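
Dependency support in `campaign_tests` implies the dependency graph must stay acyclic. A minimal sketch of such a check follows, assuming dependencies are available as a mapping from each campaign test to the set of tests it depends on. The function name `would_create_cycle` and the data shape are hypothetical and do not reflect the actual schema.

```python
from typing import Dict, Hashable, Set


def would_create_cycle(
    deps: Dict[Hashable, Set[Hashable]],
    test_id: Hashable,
    depends_on_id: Hashable,
) -> bool:
    """Return True if adding 'test_id depends on depends_on_id' would close a loop.

    deps maps each campaign test to the set of tests it already depends on.
    """
    # A cycle appears exactly when test_id is already reachable from
    # depends_on_id by following the existing dependency edges.
    stack = [depends_on_id]
    seen = set()
    while stack:
        current = stack.pop()
        if current == test_id:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(deps.get(current, ()))
    return False


# Usage: "b" depends on "a", and "c" depends on "b".
existing = {"b": {"a"}, "c": {"b"}}
rejects_loop = would_create_cycle(existing, "a", "c")   # a -> c -> b -> a
allows_new = would_create_cycle(existing, "d", "c")     # no path back to d
```

Running the check before inserting the new edge means an invalid dependency never has to be rolled back.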
### Intelligence & Actors
| Table | Description |
|-------|-------------|
| `threat_actors` | MITRE CTI intrusion sets with aliases, country, motivation, JSONB targets |
| `threat_actor_techniques` | Threat actor ↔ ATT&CK technique mappings |
| `intel_items` | Threat intelligence items from RSS feeds |
### Compliance
| Table | Description |
|-------|-------------|
| `compliance_frameworks` | Compliance frameworks (e.g., NIST 800-53) |
| `compliance_controls` | Individual controls within a framework |
| `compliance_control_mappings` | Control ↔ ATT&CK technique mappings |
### Operational
| Table | Description |
|-------|-------------|
| `coverage_snapshots` | Point-in-time coverage status captures with aggregate metrics |
| `snapshot_technique_states` | Normalized per-technique state within a snapshot |
| `audit_logs` | System-wide audit trail with JSONB details |
| `notifications` | In-app notifications with read status |
| `data_sources` | External data source configuration and sync status |
### Key Relationships
```
Technique ──1:N── Test ──1:N── Evidence
    │               │
    │               ├── TestDetectionResult ──N:1── DetectionRule
    │               └── CampaignTest ──N:1── Campaign
    ├── ThreatActorTechnique ──N:1── ThreatActor
    ├── DefensiveTechniqueMapping ──N:1── DefensiveTechnique
    ├── ComplianceControlMapping ──N:1── ComplianceControl ──N:1── ComplianceFramework
    └── SnapshotTechniqueState ──N:1── CoverageSnapshot

Test ──retest_of──▶ Test (self-referential retest chain)
Campaign ──parent_campaign_id──▶ Campaign (recurring execution history)
```
---
## Backend Architecture
### Layered Structure
```
routers/     ← HTTP endpoints (input validation, auth, response shaping)
services/    ← Business logic (state machines, calculations, imports)
models/      ← SQLAlchemy ORM models
database.py  ← Engine + session management (lazy initialization)
```
### Services
| Service | Responsibility |
|---------|---------------|
| `test_workflow_service` | Test state machine (draft → validated/rejected) with dual validation |
| `scoring_service` | 0–100 scoring for techniques, tactics, actors, organization |
| `score_cache` | In-memory TTL cache (5 min) for expensive score/metric calculations |
| `operational_metrics_service` | MTTD, MTTR, detection efficacy, alert fidelity, coverage velocity |
| `snapshot_service` | Coverage snapshot creation, temporal comparison, cleanup |
| `campaign_service` | Campaign CRUD, progress tracking, circular dependency prevention |
| `campaign_scheduler_service` | Recurring campaign execution (clone + schedule next run) |
| `status_service` | Technique status recalculation from test results |
| `notification_service` | In-app notification CRUD and state-change alerts |
| `audit_service` | Immutable audit trail logging |
| `mitre_sync_service` | MITRE ATT&CK sync via TAXII 2.0 / GitHub fallback |
| `atomic_import_service` | Atomic Red Team template import from GitHub |
| `sigma_import_service` | SigmaHQ detection rule import |
| `elastic_import_service` | Elastic detection rule import (TOML) |
| `caldera_import_service` | CALDERA ability import |
| `lolbas_import_service` | LOLBAS/GTFOBins template import |
| `d3fend_import_service` | MITRE D3FEND defensive technique import |
| `threat_actor_import_service` | MITRE CTI threat actor import (STIX) |
| `compliance_import_service` | NIST 800-53 ↔ ATT&CK mapping import |
| `intel_service` | RSS-based threat intelligence scanning |
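
An in-memory TTL cache like the one `score_cache` provides can be sketched in a few lines. The class below is a hypothetical illustration, not the service's actual interface. It assumes a single process, does no locking, and takes an injectable clock so expiry can be tested without sleeping.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds (hypothetical)."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,  # 5 minutes, as in the table above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value, or call compute() and cache its result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Optional[Hashable] = None) -> None:
        """Drop one key, or every entry when no key is given."""
        if key is None:
            self._entries.clear()
        else:
            self._entries.pop(key, None)
```

A cache of this kind is usually invalidated whenever a test changes state, so dashboards do not show stale scores for the full five minutes.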
### Scheduled Jobs (APScheduler)
| Job | Schedule | Description |
|-----|----------|-------------|
| MITRE Sync | Every 24h | Sync ATT&CK techniques from TAXII/GitHub |
| Intel Scan | Every 7 days | Scan RSS feeds for threat intelligence |
| Notification Cleanup | Every 24h | Remove old read notifications |
| Weekly Snapshot | Sundays 00:00 | Create coverage snapshot + cleanup old ones |
| Recurring Campaigns | Every 24h | Check and execute due recurring campaigns |
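
The daily Recurring Campaigns job has to decide which campaigns are due and when each one runs next. The sketch below shows one way to compute that for the weekly/monthly/quarterly options using only the standard library. The function names and the frequency strings are assumptions made for illustration, and the month-end clamping rule (Jan 31 plus one month gives the last day of February) is a design choice rather than documented behavior.

```python
import calendar
from datetime import datetime, timedelta


def add_months(moment: datetime, months: int) -> datetime:
    """Shift by whole months, clamping the day to the target month's length."""
    index = moment.month - 1 + months
    year = moment.year + index // 12
    month = index % 12 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def next_run(last_run: datetime, frequency: str) -> datetime:
    """Compute the next execution time for a recurring campaign (hypothetical)."""
    if frequency == "weekly":
        return last_run + timedelta(weeks=1)
    if frequency == "monthly":
        return add_months(last_run, 1)
    if frequency == "quarterly":
        return add_months(last_run, 3)
    raise ValueError(f"unknown frequency: {frequency!r}")


def is_due(next_run_at: datetime, now: datetime) -> bool:
    """A campaign is due once its scheduled time has been reached."""
    return next_run_at <= now
```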
---
## Test Lifecycle (State Machine)
```
┌───────┐    ┌───────────────┐    ┌─────────────────┐    ┌───────────┐
│ DRAFT │───▶│ RED_EXECUTING │───▶│ BLUE_EVALUATING │───▶│ IN_REVIEW │
└───────┘    └───────────────┘    └─────────────────┘    └─────┬─────┘
                                           ┌───────────────────┤
                                           ▼                   ▼
                                      ┌──────────┐       ┌───────────┐
                                      │ REJECTED │       │ VALIDATED │
                                      └────┬─────┘       └─────┬─────┘
                                           │                   │
                                           └──▶ Back to DRAFT  ├──▶ Remediation
                                                               └──▶ Auto Re-test
```
**Dual Validation in IN_REVIEW:**
- Red Lead votes approve/reject
- Blue Lead votes approve/reject
- Both approve → VALIDATED
- Either rejects → REJECTED
- One votes, other pending → stays IN_REVIEW
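
The voting rules above reduce to a small pure function. In this sketch the `Vote` enum and `resolve_review` name are hypothetical, and `None` stands for a lead who has not voted yet. The list does not say what happens when one lead rejects while the other is still pending; the sketch assumes a single rejection is enough to move the test to REJECTED.

```python
from enum import Enum
from typing import Optional


class Vote(Enum):
    APPROVE = "approve"
    REJECT = "reject"


def resolve_review(red_lead: Optional[Vote], blue_lead: Optional[Vote]) -> str:
    """Map the two lead votes to the resulting test state; None means 'not voted yet'."""
    # Assumption: any rejection wins immediately, even if the other vote is pending.
    if Vote.REJECT in (red_lead, blue_lead):
        return "REJECTED"
    if red_lead is Vote.APPROVE and blue_lead is Vote.APPROVE:
        return "VALIDATED"
    # Zero or one approval so far: keep waiting for the remaining vote.
    return "IN_REVIEW"
```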
**Auto Re-testing:** When remediation is completed on a validated test, the system automatically creates a follow-up retest (up to `MAX_RETEST_COUNT` = 3).
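
The retest cap can be illustrated as follows. Only `retest_of` and `MAX_RETEST_COUNT` = 3 come from this document; the `SecurityTest` dataclass, its `retest_count` field, and `create_retest` are hypothetical names, and the sketch assumes the count tracks a test's position in the retest chain.

```python
from dataclasses import dataclass
from typing import Optional

MAX_RETEST_COUNT = 3  # value stated in the text above


@dataclass
class SecurityTest:
    """Simplified stand-in for a row in the tests table (hypothetical)."""
    name: str
    retest_of: Optional["SecurityTest"] = None
    retest_count: int = 0


def create_retest(parent: SecurityTest) -> Optional[SecurityTest]:
    """Return a follow-up retest linked to parent, or None once the cap is reached."""
    if parent.retest_count >= MAX_RETEST_COUNT:
        return None
    return SecurityTest(
        name=parent.name,
        retest_of=parent,
        retest_count=parent.retest_count + 1,
    )


# Usage: follow the chain until the cap stops it.
chain = [SecurityTest(name="example-test")]
while (follow_up := create_retest(chain[-1])) is not None:
    chain.append(follow_up)
# chain now holds the original test plus MAX_RETEST_COUNT retests.
```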
---
## Frontend Architecture
### Key Technologies
- **React 19** with TypeScript
- **Vite 7** for bundling
- **Tailwind CSS v4** for styling
- **TanStack Query** for server state management
- **TanStack Virtual** for table virtualization
- **React Router v7** for routing
- **Recharts** for charts and visualizations
- **Lucide React** for icons
### Page Lazy Loading
All pages except `LoginPage` and `DashboardPage` are lazy-loaded via `React.lazy()` with `<Suspense>` fallbacks, which keeps the initial bundle small.
### Role-Based Navigation
The sidebar dynamically filters navigation items based on the current user's role:
| Section | Visible to |
|---------|-----------|
| Dashboard | All roles |
| Executive Dashboard | admin, red_lead, blue_lead |
| ATT&CK Matrix | All roles |
| Tests (sub-menu) | All roles |
| Campaigns | All roles |
| Threat Actors | All roles |
| Compliance | All roles |
| Comparison | admin, red_lead, blue_lead |
| Reports | All roles |
| System (admin section) | admin only |
### Performance Optimizations
- **React.memo** on `HeatmapCell` (renders 3000+ times in full matrix)
- **useMemo** / **useCallback** for expensive calculations in memoized components
- **useDebounce** hook for search inputs (300ms delay)
- **TanStack Virtual** for large table virtualization (test templates, detection rules, audit logs)
- **Lazy loading** for all non-critical page bundles