# Aegis — Architecture
## High-Level Overview
```
┌────────────────────┐       ┌─────────────────────┐
│   React Frontend   │──────▶│   FastAPI Backend   │
│  (Vite / TS / TW)  │ REST  │    (Python 3.11)    │
└────────────────────┘       └──────┬──────┬───────┘
                                    │      │
                          ┌─────────┘      └─────────┐
                          ▼                          ▼
                 ┌─────────────────┐        ┌──────────────────┐
                 │   PostgreSQL    │        │      MinIO       │
                 │  (Data Store)   │        │ (Object Storage) │
                 └─────────────────┘        └──────────────────┘
```
- **Frontend** — React 19 + TypeScript + Tailwind CSS v4 + TanStack Query
- **Backend** — FastAPI with SQLAlchemy ORM + Alembic migrations
- **Database** — PostgreSQL 15 with UUID primary keys and JSONB columns
- **Object Storage** — MinIO (S3-compatible) for evidence files
- **Scheduler** — APScheduler (in-process) for background jobs
---
## Database Schema
### Core Tables
| Table | Description |
|-------|-------------|
| `users` | User accounts with role-based access (admin, red_tech, blue_tech, red_lead, blue_lead, viewer) |
| `techniques` | MITRE ATT&CK techniques with coverage status, tactic, platforms (JSONB) |
| `tests` | Security tests with full Red/Blue workflow fields, dual validation, remediation, and retest chain |
| `test_templates` | Predefined test catalog from Atomic Red Team, Sigma, CALDERA, LOLBAS, custom |
| `evidences` | Evidence files separated by team (red/blue) with SHA256 integrity verification |
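
The SHA256 integrity check on evidence files could look roughly like the sketch below. It is an illustration rather than the project's actual code: the function names `sha256_of_stream` and `verify_evidence` are hypothetical, and it assumes the digest is stored as a hex string next to the evidence record.

```python
import hashlib
import hmac
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Return the hex SHA-256 digest of a binary stream, reading it in chunks."""
    digest = hashlib.sha256()
    # Chunked reads keep memory use flat even for large evidence files.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_evidence(stream: BinaryIO, expected_sha256: str) -> bool:
    """Recompute the digest and compare it with the stored value (hypothetical helper)."""
    actual = sha256_of_stream(stream)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_sha256.lower())


if __name__ == "__main__":
    payload = b"screenshot bytes"
    stored = hashlib.sha256(payload).hexdigest()
    print(verify_evidence(io.BytesIO(payload), stored))
```

Reading in chunks rather than calling `read()` once is a deliberate choice here, since evidence files such as packet captures can be large.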
### Detection & Defense
| Table | Description |
|-------|-------------|
| `detection_rules` | Imported detection rules (Sigma, Elastic, custom) linked to ATT&CK techniques |
| `test_detection_results` | Per-test detection rule evaluation results (triggered / not triggered) |
| `test_template_detection_rules` | Template ↔ detection rule associations |
| `defensive_techniques` | MITRE D3FEND defensive techniques |
| `defensive_technique_mappings` | ATT&CK technique ↔ D3FEND defensive technique mappings |
### Campaigns & Scheduling
| Table | Description |
|-------|-------------|
| `campaigns` | Test campaign groupings with scheduling (recurring, weekly/monthly/quarterly) |
| `campaign_tests` | Ordered test assignments within campaigns with dependency support |
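
Dependency support in `campaign_tests` implies that a new dependency must not close a loop. The sketch below shows one way to check this before saving. The function name `would_create_cycle` and the dictionary-of-sets representation are assumptions made for illustration, not the project's actual schema or API.

```python
from typing import Dict, List, Set


def would_create_cycle(
    depends_on: Dict[str, Set[str]], test_id: str, new_dependency: str
) -> bool:
    """Return True if making test_id depend on new_dependency closes a cycle.

    depends_on maps a campaign test ID to the IDs it must wait for.
    """
    # A cycle appears exactly when new_dependency already depends on test_id
    # (directly or transitively), or when a test is made to depend on itself.
    stack: List[str] = [new_dependency]
    seen: Set[str] = set()
    while stack:
        current = stack.pop()
        if current == test_id:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(depends_on.get(current, ()))
    return False


if __name__ == "__main__":
    # "b" waits for "a", and "c" waits for "b".
    graph = {"b": {"a"}, "c": {"b"}}
    print(would_create_cycle(graph, "a", "c"))  # "a" waiting for "c" would loop
```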
### Intelligence & Actors
| Table | Description |
|-------|-------------|
| `threat_actors` | MITRE CTI intrusion sets with aliases, country, motivation, JSONB targets |
| `threat_actor_techniques` | Threat actor ↔ ATT&CK technique mappings |
| `intel_items` | Threat intelligence items from RSS feeds |
### Compliance
| Table | Description |
|-------|-------------|
| `compliance_frameworks` | Compliance frameworks (e.g., NIST 800-53) |
| `compliance_controls` | Individual controls within a framework |
| `compliance_control_mappings` | Control ↔ ATT&CK technique mappings |
### Operational
| Table | Description |
|-------|-------------|
| `coverage_snapshots` | Point-in-time coverage status captures with aggregate metrics |
| `snapshot_technique_states` | Normalized per-technique state within a snapshot |
| `audit_logs` | System-wide audit trail with JSONB details |
| `notifications` | In-app notifications with read status |
| `data_sources` | External data source configuration and sync status |
### Key Relationships
```
Technique ──1:N── Test ──1:N── Evidence
    │               │
    │               ├── TestDetectionResult ──N:1── DetectionRule
    │               └── CampaignTest ──N:1── Campaign
    ├── ThreatActorTechnique ──N:1── ThreatActor
    ├── DefensiveTechniqueMapping ──N:1── DefensiveTechnique
    ├── ComplianceControlMapping ──N:1── ComplianceControl ──N:1── ComplianceFramework
    └── SnapshotTechniqueState ──N:1── CoverageSnapshot
Test ──retest_of──▶ Test (self-referential retest chain)
Campaign ──parent_campaign_id──▶ Campaign (recurring execution history)
```
---
## Backend Architecture
### Layered Structure
```
routers/ ← HTTP endpoints (input validation, auth, response shaping)
services/ ← Business logic (state machines, calculations, imports)
models/ ← SQLAlchemy ORM models
database.py ← Engine + session management (lazy initialization)
```
### Services
#### Business Logic Services
| Service | Responsibility |
|---------|---------------|
| `test_workflow_service` | Test state machine (draft → validated/rejected) with dual validation |
| `test_crud_service` | Test CRUD, query logic, permission validation |
| `scoring_service` | 0–100 scoring for techniques, tactics, actors, organization |
| `scoring_config_service` | DB-persisted scoring weights with validation |
| `score_cache` | In-memory TTL cache (5 min) for expensive score/metric calculations |
| `operational_metrics_service` | MTTD, MTTR, detection efficacy, alert fidelity, coverage velocity |
| `metrics_query_service` | Dashboard aggregation queries |
| `snapshot_service` | Coverage snapshot creation, temporal comparison, cleanup |
| `campaign_crud_service` | Campaign CRUD, lifecycle, scheduling |
| `campaign_service` | Campaign progress tracking, circular dependency prevention |
| `campaign_scheduler_service` | Recurring campaign execution (clone + schedule next run) |
| `status_service` | Technique status recalculation from test results |
| `coverage_report_service` | Coverage report generation and CSV export |
| `compliance_service` | Compliance framework analysis and gap detection |
| `detection_rule_service` | Detection rule queries, auto-association, evaluation |
| `threat_actor_service` | Threat actor queries, coverage, gap analysis |
| `evidence_service` | Evidence permission validation and queries |
| `heatmap_service` | ATT&CK Navigator layer generation |
| `user_service` | User CRUD, role validation, password hashing |
| `audit_query_service` | Paginated audit log queries and distinct lookups |
| `audit_service` | Immutable audit trail logging (write-only) |
| `data_source_service` | Data source CRUD, sync dispatch, statistics |
| `notification_service` | In-app notification CRUD and state-change alerts |
| `intel_service` | RSS-based threat intelligence scanning |
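
To make the `score_cache` entry concrete, here is a minimal sketch of an in-memory cache with a 5-minute TTL. The class name `TTLCache`, the `get_or_compute` method, and the injectable clock are assumptions for illustration. The real service may differ, and this version is not thread-safe.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,  # 5 minutes, as stated for score_cache
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry timestamp, cached value)
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value, recomputing it once the entry has expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Optional[Hashable] = None) -> None:
        """Drop one entry, or every entry when no key is given."""
        if key is None:
            self._entries.clear()
        else:
            self._entries.pop(key, None)


if __name__ == "__main__":
    cache = TTLCache()
    print(cache.get_or_compute("org_score", lambda: 72))
```

Passing the clock in as a parameter keeps expiry testable without sleeping. Calling `invalidate` after a write that changes scores is one possible way to avoid serving stale values.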
#### Import Services (all satisfy `ImportService` protocol)
| Service | Responsibility |
|---------|---------------|
| `mitre_sync_service` | MITRE ATT&CK sync via TAXII 2.0 / GitHub fallback |
| `atomic_import_service` | Atomic Red Team template import from GitHub |
| `sigma_import_service` | SigmaHQ detection rule import |
| `elastic_import_service` | Elastic detection rule import (TOML) |
| `caldera_import_service` | CALDERA ability import |
| `lolbas_import_service` | LOLBAS/GTFOBins template import |
| `d3fend_import_service` | MITRE D3FEND defensive technique import |
| `threat_actor_import_service` | MITRE CTI threat actor import (STIX) |
| `compliance_import_service` | NIST 800-53 ↔ ATT&CK mapping import |
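
A shared protocol lets the sync dispatcher treat every importer the same way. The sketch below shows the general pattern with `typing.Protocol`. Only the names `ImportService` and `IMPORT_REGISTRY` come from this document. The `source_name` attribute, the `run_import` method, the `ImportResult` type, and the `register` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, runtime_checkable


@dataclass
class ImportResult:
    """Hypothetical summary returned by an import run."""
    created: int = 0
    updated: int = 0


@runtime_checkable
class ImportService(Protocol):
    """Structural contract: any object with these members qualifies."""
    source_name: str

    def run_import(self) -> ImportResult: ...


# Maps a source name to the service that imports it.
IMPORT_REGISTRY: Dict[str, ImportService] = {}


def register(service: ImportService) -> ImportService:
    """Add a service to the registry after a structural check."""
    if not isinstance(service, ImportService):
        raise TypeError(f"{type(service).__name__} does not satisfy ImportService")
    IMPORT_REGISTRY[service.source_name] = service
    return service


class DemoImportService:
    """Stand-in importer; note that it does not inherit from ImportService."""
    source_name = "demo"

    def run_import(self) -> ImportResult:
        return ImportResult(created=2)


if __name__ == "__main__":
    register(DemoImportService())
    print(IMPORT_REGISTRY["demo"].run_import())
```

Because the check is structural, a new importer only needs the right members to be registered. No base class is required.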
### Domain Layer
```
domain/
├── entities/ # Rich domain entities with business logic
│ ├── technique.py # TechniqueEntity with status recalculation
│ ├── campaign.py # CampaignEntity with lifecycle state machine
│ └── compliance.py # ComplianceFrameworkEntity with coverage calculation
├── value_objects/ # Immutable value types
│ ├── mitre_id.py # MITRE ATT&CK ID validation
│ └── scoring_weights.py # Scoring weights (sum=100, non-negative)
├── ports/ # Interfaces (Protocol contracts)
│ ├── repositories/ # TechniqueRepository, TestRepository
│ └── import_service.py # ImportService protocol + IMPORT_REGISTRY
├── errors.py # Domain exceptions (EntityNotFoundError, etc.)
├── enums.py # TestState, TechniqueStatus, TestResult
├── test_entity.py # TestEntity with state machine + domain events
└── unit_of_work.py # UnitOfWork context manager
```
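
The two value objects can be pictured as frozen dataclasses that validate themselves on construction. This is a sketch under stated assumptions: the class names `MitreId` and `ScoringWeights` mirror the file names above, but the three weight fields are hypothetical, and the ID pattern covers only ATT&CK technique IDs such as `T1059` and sub-technique IDs such as `T1059.001`.

```python
import re
from dataclasses import dataclass, fields

# "T" plus four digits, optionally followed by a three-digit sub-technique suffix.
_TECHNIQUE_ID = re.compile(r"T\d{4}(\.\d{3})?")


@dataclass(frozen=True)
class MitreId:
    value: str

    def __post_init__(self) -> None:
        if not _TECHNIQUE_ID.fullmatch(self.value):
            raise ValueError(f"invalid ATT&CK technique ID: {self.value!r}")

    @property
    def is_subtechnique(self) -> bool:
        return "." in self.value


@dataclass(frozen=True)
class ScoringWeights:
    # Field names are hypothetical; only the two invariants come from the docs.
    test_coverage: int
    detection_quality: int
    recency: int

    def __post_init__(self) -> None:
        values = [getattr(self, f.name) for f in fields(self)]
        if any(v < 0 for v in values):
            raise ValueError("weights must be non-negative")
        if sum(values) != 100:
            raise ValueError(f"weights must sum to 100, got {sum(values)}")


if __name__ == "__main__":
    print(MitreId("T1059.001").is_subtechnique)
    print(ScoringWeights(test_coverage=50, detection_quality=30, recency=20))
```

Freezing the dataclasses means an instance that was valid at construction cannot later be mutated into an invalid state.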
### Scheduled Jobs (APScheduler)
| Job | Schedule | Description |
|-----|----------|-------------|
| MITRE Sync | Every 24h | Sync ATT&CK techniques from TAXII/GitHub |
| Intel Scan | Every 7 days | Scan RSS feeds for threat intelligence |
| Notification Cleanup | Every 24h | Remove old read notifications |
| Weekly Snapshot | Sundays 00:00 | Create coverage snapshot + cleanup old ones |
| Recurring Campaigns | Every 24h | Check and execute due recurring campaigns |
---
## Test Lifecycle (State Machine)
```
┌──────┐    ┌──────────────┐    ┌─────────────────┐    ┌───────────┐
│ DRAFT│───▶│RED_EXECUTING │───▶│ BLUE_EVALUATING │───▶│ IN_REVIEW │
└──────┘    └──────────────┘    └─────────────────┘    └─────┬─────┘
                                         ┌───────────────────┤
                                         ▼                   ▼
                                    ┌──────────┐        ┌──────────┐
                                    │ REJECTED │        │VALIDATED │
                                    └────┬─────┘        └──────────┘
                                         │                   │
                                         └──▶ Back to DRAFT  ├──▶ Remediation
                                                             └──▶ Auto Re-test
```
**Dual Validation in IN_REVIEW:**
- Red Lead votes approve/reject
- Blue Lead votes approve/reject
- Both approve → VALIDATED
- Either rejects → REJECTED
- One votes, other pending → stays IN_REVIEW
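
The voting rules above reduce to a small pure function. The sketch below is illustrative only: `Vote`, `ReviewOutcome`, and `resolve_review` are hypothetical names, and it assumes that a single rejection is decisive even while the other lead's vote is still pending, which the rules above do not state explicitly.

```python
from enum import Enum
from typing import Optional


class Vote(Enum):
    APPROVE = "approve"
    REJECT = "reject"


class ReviewOutcome(Enum):
    IN_REVIEW = "in_review"
    VALIDATED = "validated"
    REJECTED = "rejected"


def resolve_review(
    red_vote: Optional[Vote], blue_vote: Optional[Vote]
) -> ReviewOutcome:
    """Combine the Red Lead and Blue Lead votes; None means no vote yet."""
    # Assumption: one rejection ends the review immediately.
    if Vote.REJECT in (red_vote, blue_vote):
        return ReviewOutcome.REJECTED
    if red_vote is Vote.APPROVE and blue_vote is Vote.APPROVE:
        return ReviewOutcome.VALIDATED
    # At least one lead has not voted yet.
    return ReviewOutcome.IN_REVIEW


if __name__ == "__main__":
    print(resolve_review(Vote.APPROVE, None))
```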
**Auto Re-testing:** When remediation is completed on a validated test, the system automatically creates a follow-up retest (up to `MAX_RETEST_COUNT` = 3).
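
The retest cap can be sketched as follows. Only `MAX_RETEST_COUNT` and the `retest_of` link come from this document. The `SecurityTest` dataclass, its `retest_count` field, and `create_retest` are hypothetical, and the sketch assumes the limit applies to the length of the retest chain rather than to each test individually.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID, uuid4

MAX_RETEST_COUNT = 3


@dataclass
class SecurityTest:
    id: UUID
    retest_of: Optional[UUID] = None  # points at the test this one re-runs
    retest_count: int = 0             # position in the retest chain (assumed field)


def create_retest(original: SecurityTest) -> Optional[SecurityTest]:
    """Return a follow-up test, or None once the chain has reached the cap."""
    if original.retest_count >= MAX_RETEST_COUNT:
        return None
    return SecurityTest(
        id=uuid4(),
        retest_of=original.id,
        retest_count=original.retest_count + 1,
    )


if __name__ == "__main__":
    chain = [SecurityTest(id=uuid4())]
    while True:
        follow_up = create_retest(chain[-1])
        if follow_up is None:
            break
        chain.append(follow_up)
    print(len(chain))  # original test plus its retests
```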
---
## Frontend Architecture
### Key Technologies
- **React 19** with TypeScript
- **Vite 7** for bundling
- **Tailwind CSS v4** for styling
- **TanStack Query** for server state management
- **TanStack Virtual** for table virtualization
- **React Router v7** for routing
- **Recharts** for charts and visualizations
- **Lucide React** for icons
### Page Lazy Loading
All pages except `LoginPage` and `DashboardPage` are lazy-loaded via `React.lazy()` with `<Suspense>` fallbacks, which keeps the initial bundle small.
### Role-Based Navigation
The sidebar dynamically filters navigation items based on the current user's role:
| Section | Visible to |
|---------|-----------|
| Dashboard | All roles |
| Executive Dashboard | admin, red_lead, blue_lead |
| ATT&CK Matrix | All roles |
| Tests (sub-menu) | All roles |
| Campaigns | All roles |
| Threat Actors | All roles |
| Compliance | All roles |
| Comparison | admin, red_lead, blue_lead |
| Reports | All roles |
| System (admin section) | admin only |
### Performance Optimizations
- **React.memo** on `HeatmapCell` (rendered 3000+ times in the full matrix)
- **useMemo** / **useCallback** for expensive calculations in memoized components
- **useDebounce** hook for search inputs (300ms delay)
- **TanStack Virtual** for large table virtualization (test templates, detection rules, audit logs)
- **Lazy loading** for all non-critical page bundles