Aegis — Architecture
High-Level Overview
┌────────────────────┐       ┌─────────────────────┐
│   React Frontend   │──────▶│   FastAPI Backend   │
│  (Vite / TS / TW)  │ REST  │    (Python 3.11)    │
└────────────────────┘       └──────┬──────┬───────┘
                                    │      │
                          ┌─────────┘      └─────────┐
                          ▼                          ▼
                 ┌─────────────────┐        ┌──────────────────┐
                 │   PostgreSQL    │        │      MinIO       │
                 │  (Data Store)   │        │ (Object Storage) │
                 └─────────────────┘        └──────────────────┘
- Frontend — React 19 + TypeScript + Tailwind CSS v4 + TanStack Query
- Backend — FastAPI with SQLAlchemy ORM + Alembic migrations
- Database — PostgreSQL 15 with UUID primary keys and JSONB columns
- Object Storage — MinIO (S3-compatible) for evidence files
- Scheduler — APScheduler (in-process) for background jobs
Database Schema
Core Tables
| Table | Description |
|---|---|
| users | User accounts with role-based access (admin, red_tech, blue_tech, red_lead, blue_lead, viewer) |
| techniques | MITRE ATT&CK techniques with coverage status, tactic, platforms (JSONB) |
| tests | Security tests with full Red/Blue workflow fields, dual validation, remediation, and retest chain |
| test_templates | Predefined test catalog from Atomic Red Team, Sigma, CALDERA, LOLBAS, custom |
| evidences | Evidence files separated by team (red/blue) with SHA256 integrity verification |
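The SHA256 integrity check on evidence files could look roughly like the sketch below. It is a minimal illustration, not the project's code: the function names `sha256_of_stream` and `verify_evidence` are hypothetical, and the stored digest is assumed to be a hex string.

```python
import hashlib
import hmac
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Hash a file-like object in chunks so large files are never fully in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_evidence(stream: BinaryIO, expected_sha256: str) -> bool:
    """Compare the recomputed digest with the stored one (hypothetical helper)."""
    actual = sha256_of_stream(stream)
    # compare_digest avoids leaking where the two strings first differ
    return hmac.compare_digest(actual, expected_sha256.lower())


# Usage: record the digest at upload time, re-check it at download time.
stored_digest = sha256_of_stream(io.BytesIO(b"pcap bytes"))
is_intact = verify_evidence(io.BytesIO(b"pcap bytes"), stored_digest)
is_tampered = not verify_evidence(io.BytesIO(b"pcap bytes!"), stored_digest)
```

Hashing in chunks matters here because evidence objects stored in MinIO can be large.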
Detection & Defense
| Table | Description |
|---|---|
| detection_rules | Imported detection rules (Sigma, Elastic, custom) linked to ATT&CK techniques |
| test_detection_results | Per-test detection rule evaluation results (triggered / not triggered) |
| test_template_detection_rules | Template ↔ detection rule associations |
| defensive_techniques | MITRE D3FEND defensive techniques |
| defensive_technique_mappings | ATT&CK technique ↔ D3FEND defensive technique mappings |
Campaigns & Scheduling
| Table | Description |
|---|---|
| campaigns | Test campaign groupings with scheduling (recurring, weekly/monthly/quarterly) |
| campaign_tests | Ordered test assignments within campaigns with dependency support |
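Because campaign tests can depend on one another, a new dependency must not close a loop. The sketch below shows one way such a check could work. The function name and the dictionary shape (test ID mapped to the set of test IDs it depends on) are assumptions for illustration, not the actual `campaign_service` API.

```python
def would_create_cycle(
    dependencies: dict[str, set[str]], test_id: str, depends_on: str
) -> bool:
    """Return True if adding 'test_id depends on depends_on' would close a cycle.

    `dependencies` maps each test to the tests it already depends on.
    """
    if test_id == depends_on:
        return True  # a test cannot depend on itself
    # Follow existing dependency edges from `depends_on`;
    # reaching `test_id` means the new edge would complete a loop.
    stack = [depends_on]
    seen: set[str] = set()
    while stack:
        current = stack.pop()
        if current == test_id:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(dependencies.get(current, ()))
    return False


# Usage: "b" depends on "a", and "c" depends on "b".
existing = {"b": {"a"}, "c": {"b"}}
closes_loop = would_create_cycle(existing, "a", "c")   # a -> c -> b -> a
safe_to_add = not would_create_cycle(existing, "d", "c")
```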
Intelligence & Actors
| Table | Description |
|---|---|
| threat_actors | MITRE CTI intrusion sets with aliases, country, motivation, JSONB targets |
| threat_actor_techniques | Threat actor ↔ ATT&CK technique mappings |
| intel_items | Threat intelligence items from RSS feeds |
Compliance
| Table | Description |
|---|---|
| compliance_frameworks | Compliance frameworks (e.g., NIST 800-53) |
| compliance_controls | Individual controls within a framework |
| compliance_control_mappings | Control ↔ ATT&CK technique mappings |
Operational
| Table | Description |
|---|---|
| coverage_snapshots | Point-in-time coverage status captures with aggregate metrics |
| snapshot_technique_states | Normalized per-technique state within a snapshot |
| audit_logs | System-wide audit trail with JSONB details |
| notifications | In-app notifications with read status |
| data_sources | External data source configuration and sync status |
Key Relationships
Technique ──1:N── Test ──1:N── Evidence
    │              │
    │              ├── TestDetectionResult ──N:1── DetectionRule
    │              └── CampaignTest ──N:1── Campaign
    │
    ├── ThreatActorTechnique ──N:1── ThreatActor
    ├── DefensiveTechniqueMapping ──N:1── DefensiveTechnique
    ├── ComplianceControlMapping ──N:1── ComplianceControl ──N:1── ComplianceFramework
    └── SnapshotTechniqueState ──N:1── CoverageSnapshot
Test ──retest_of──▶ Test (self-referential retest chain)
Campaign ──parent_campaign_id──▶ Campaign (recurring execution history)
Backend Architecture
Layered Structure
routers/ ← Thin HTTP adapters (auth, params, response shaping — zero inline ORM)
↓
services/ ← Framework-agnostic business logic (46 service modules, ~250 functions)
↓
domain/ ← Pure business rules (entities, value objects, ports, errors — zero framework imports)
↓
infrastructure/ ← Repository implementations (SQLAlchemy), Redis, mappers
↓
models/ ← SQLAlchemy ORM models (persistence mapping only)
↓
database.py ← Engine + session management (lazy initialization)
Dependency rule: routers → services → domain ← infrastructure. Dependencies always point inward toward domain.
Transaction management: Services never call db.commit(). Routers manage transactions via UnitOfWork. Import services and background jobs are documented exceptions (self-contained batch operations).
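The transaction rule can be illustrated with a small sketch. Everything here is a hypothetical stand-in: `FakeSession` replaces a SQLAlchemy session, and the `UnitOfWork` class, service function, and endpoint function only mirror the pattern described above, not the project's actual signatures.

```python
class FakeSession:
    """Stand-in for a database session; it only records what happens."""

    def __init__(self) -> None:
        self.pending: list[dict] = []
        self.committed: list[dict] = []

    def add(self, obj: dict) -> None:
        self.pending.append(obj)

    def commit(self) -> None:
        self.committed.extend(self.pending)
        self.pending.clear()

    def rollback(self) -> None:
        self.pending.clear()


class UnitOfWork:
    """Commit when the block succeeds, roll back when it raises."""

    def __init__(self, session: FakeSession) -> None:
        self.session = session

    def __enter__(self) -> "UnitOfWork":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            self.session.commit()
        else:
            self.session.rollback()
        return False  # never swallow the exception


def create_tests(session: FakeSession, names: list[str]) -> None:
    """Service layer: stages changes, never commits."""
    for name in names:
        if not name:
            raise ValueError("test name must not be empty")
        session.add({"name": name})


def create_tests_endpoint(session: FakeSession, names: list[str]) -> None:
    """Router layer: owns the transaction boundary."""
    with UnitOfWork(session):
        create_tests(session, names)
```

Keeping the commit in the router means a failure halfway through a service call leaves nothing half-written.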
Services
Business Logic Services
| Service | Responsibility |
|---|---|
| test_workflow_service | Test state machine (draft → validated/rejected) with dual validation |
| test_crud_service | Test CRUD, query logic, permission validation |
| scoring_service | 0–100 scoring for techniques, tactics, actors, organization |
| scoring_config_service | DB-persisted scoring weights with validation |
| score_cache | In-memory TTL cache (5 min) for expensive score/metric calculations |
| operational_metrics_service | MTTD, MTTR, detection efficacy, alert fidelity, coverage velocity |
| metrics_query_service | Dashboard aggregation queries |
| advanced_metrics_service | Coverage by tactic, never-tested, avg validation time, detection trends |
| analytics_service | BI-ready flat datasets (coverage, tests, trends, operators) |
| snapshot_service | Coverage snapshot CRUD, temporal comparison, cleanup |
| campaign_crud_service | Campaign CRUD, lifecycle, scheduling |
| campaign_service | Campaign progress tracking, circular dependency prevention |
| campaign_scheduler_service | Recurring campaign execution (clone + schedule next run) |
| status_service | Technique status recalculation from test results |
| coverage_report_service | Coverage report generation and CSV export |
| compliance_service | Compliance framework analysis and gap detection |
| detection_rule_service | Detection rule queries, auto-association, evaluation |
| threat_actor_service | Threat actor queries, coverage, gap analysis |
| evidence_service | Evidence permission validation and queries |
| heatmap_service | ATT&CK Navigator layer generation |
| test_template_service | Test template CRUD, stats, bulk-activate, filtered queries |
| auth_service | Credential validation, password management |
| user_service | User CRUD, role validation, password hashing |
| audit_query_service | Paginated audit log queries and distinct lookups |
| audit_service | Immutable audit trail logging (write-only) |
| data_source_service | Data source CRUD, sync dispatch, statistics |
| notification_service | In-app notification CRUD, state-change alerts, role-based dispatch |
| technique_query_service | Technique detail queries with test/D3FEND aggregation |
| d3fend_query_service | D3FEND defensive technique listing and tactic queries |
| osint_enrichment_service | OSINT item queries, enrichment, summary statistics |
| worklog_service | Worklog CRUD, integrity verification |
| intel_service | RSS-based threat intelligence scanning |
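As an illustration of the `score_cache` idea (an in-memory cache with a 5-minute TTL), here is a minimal sketch. The class name `TTLCache` and its methods are assumptions; only the 5-minute lifetime comes from the table above. The clock is injectable so expiry can be tested without sleeping.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,  # 5 minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry timestamp, cached value)
        self._entries: dict[str, tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh
        value = compute()  # missing or expired: recompute and store
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Optional[str] = None) -> None:
        """Drop one key, or everything when no key is given."""
        if key is None:
            self._entries.clear()
        else:
            self._entries.pop(key, None)
```

An explicit `invalidate` is useful when a write (for example, a newly validated test) should make cached scores stale immediately instead of waiting for the TTL.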
Import Services (all satisfy ImportService protocol)
| Service | Responsibility |
|---|---|
| mitre_sync_service | MITRE ATT&CK sync via TAXII 2.0 / GitHub fallback |
| atomic_import_service | Atomic Red Team template import from GitHub |
| sigma_import_service | SigmaHQ detection rule import |
| elastic_import_service | Elastic detection rule import (TOML) |
| caldera_import_service | CALDERA ability import |
| lolbas_import_service | LOLBAS/GTFOBins template import |
| d3fend_import_service | MITRE D3FEND defensive technique import |
| threat_actor_import_service | MITRE CTI threat actor import (STIX) |
| compliance_import_service | NIST 800-53 ↔ ATT&CK mapping import |
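A shared protocol plus a registry lets callers trigger any import by name. The sketch below shows the general shape under stated assumptions: the member names (`source_name`, `run`), the `ImportResult` type, and the `register` / `run_import` helpers are hypothetical, and only the names `ImportService` and `IMPORT_REGISTRY` come from this document.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ImportResult:
    """Hypothetical summary returned by every import run."""
    created: int = 0
    updated: int = 0


class ImportService(Protocol):
    """Structural contract: any object with these members qualifies."""
    source_name: str

    def run(self) -> ImportResult: ...


IMPORT_REGISTRY: dict[str, ImportService] = {}


def register(service: ImportService) -> None:
    IMPORT_REGISTRY[service.source_name] = service


def run_import(name: str) -> ImportResult:
    try:
        service = IMPORT_REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown import source: {name}") from None
    return service.run()


class InMemorySigmaImport:
    """Toy implementation; a real one would fetch and parse rule files."""
    source_name = "sigma"

    def __init__(self, rules: list[str]) -> None:
        self._rules = rules

    def run(self) -> ImportResult:
        return ImportResult(created=len(self._rules))


register(InMemorySigmaImport(["rule-a", "rule-b"]))
result = run_import("sigma")
```

Because `Protocol` is structural, `InMemorySigmaImport` does not need to inherit from `ImportService`, which keeps the import services free of domain base classes.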
Domain Layer
domain/
├── entities/ # Rich domain entities with business logic
│ ├── technique.py # TechniqueEntity with status recalculation
│ ├── campaign.py # CampaignEntity with lifecycle state machine
│ ├── compliance.py # ComplianceFrameworkEntity with coverage calculation
│ └── threat_actor.py # ThreatActorEntity with coverage analysis
├── value_objects/ # Immutable value types
│ ├── mitre_id.py # MITRE ATT&CK ID validation
│ └── scoring_weights.py # Scoring weights (sum=100, non-negative)
├── ports/ # Interfaces (Protocol contracts)
│ ├── repositories/ # TechniqueRepository, TestRepository
│ └── import_service.py # ImportService protocol + IMPORT_REGISTRY
├── errors.py # Domain exceptions (EntityNotFoundError, etc.)
├── enums.py # TestState, TechniqueStatus, TestResult
├── test_entity.py # TestEntity with state machine + domain events
└── unit_of_work.py # UnitOfWork context manager
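The two value objects listed above could be sketched as frozen dataclasses that validate on construction. This is an assumption-laden illustration: the class shapes and the weight component names are hypothetical. The ID pattern follows the public ATT&CK technique format (`T` plus four digits, with an optional three-digit sub-technique suffix), and the sum-to-100, non-negative rule comes from the tree comment.

```python
import re
from dataclasses import dataclass

_TECHNIQUE_ID = re.compile(r"T\d{4}(\.\d{3})?")


@dataclass(frozen=True)
class MitreId:
    """ATT&CK technique ID such as T1059 or T1059.001."""
    value: str

    def __post_init__(self) -> None:
        if not _TECHNIQUE_ID.fullmatch(self.value):
            raise ValueError(f"invalid ATT&CK technique ID: {self.value!r}")

    @property
    def is_subtechnique(self) -> bool:
        return "." in self.value


@dataclass(frozen=True)
class ScoringWeights:
    """Immutable weights that must be non-negative and sum to 100."""
    weights: tuple[tuple[str, int], ...]

    def __post_init__(self) -> None:
        values = [weight for _, weight in self.weights]
        if any(weight < 0 for weight in values):
            raise ValueError("weights must be non-negative")
        if sum(values) != 100:
            raise ValueError(f"weights must sum to 100, got {sum(values)}")

    @classmethod
    def from_dict(cls, mapping: dict[str, int]) -> "ScoringWeights":
        # Sorted tuple keeps the object hashable and order-independent
        return cls(tuple(sorted(mapping.items())))

    def as_dict(self) -> dict[str, int]:
        return dict(self.weights)


# Component names below are made up for the example.
weights = ScoringWeights.from_dict({"detection": 60, "prevention": 40})
technique = MitreId("T1059.001")
```

Validating in `__post_init__` means an invalid value object can never exist, so services that receive one do not need to re-check it.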
Scheduled Jobs (APScheduler)
| Job | Schedule | Description |
|---|---|---|
| MITRE Sync | Every 24h | Sync ATT&CK techniques from TAXII/GitHub |
| Intel Scan | Every 7 days | Scan RSS feeds for threat intelligence |
| Notification Cleanup | Every 24h | Remove old read notifications |
| Weekly Snapshot | Sundays 00:00 | Create coverage snapshot + cleanup old ones |
| Recurring Campaigns | Every 24h | Check and execute due recurring campaigns |
Test Lifecycle (State Machine)
┌───────┐    ┌───────────────┐    ┌─────────────────┐    ┌───────────┐
│ DRAFT │───▶│ RED_EXECUTING │───▶│ BLUE_EVALUATING │───▶│ IN_REVIEW │
└───────┘    └───────────────┘    └─────────────────┘    └─────┬─────┘
                                                               │
                                           ┌───────────────────┤
                                           ▼                   ▼
                                      ┌──────────┐       ┌───────────┐
                                      │ REJECTED │       │ VALIDATED │
                                      └────┬─────┘       └─────┬─────┘
                                           │                   │
                                           └──▶ Back to DRAFT  ├──▶ Remediation
                                                               └──▶ Auto Re-test
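The diagram above can be expressed as a transition table. The sketch below is a hypothetical stand-in for the real state machine: `LifecycleState` mirrors the states in the diagram (the project's own enum is `TestState`), and the string values, `ALLOWED_TRANSITIONS`, and `transition` are assumptions. Remediation and auto re-test are treated as follow-up actions on a validated test, not as states.

```python
from enum import Enum


class LifecycleState(str, Enum):
    DRAFT = "draft"
    RED_EXECUTING = "red_executing"
    BLUE_EVALUATING = "blue_evaluating"
    IN_REVIEW = "in_review"
    VALIDATED = "validated"
    REJECTED = "rejected"


S = LifecycleState

# Each state maps to the states it may move to next
ALLOWED_TRANSITIONS: dict[LifecycleState, frozenset[LifecycleState]] = {
    S.DRAFT: frozenset({S.RED_EXECUTING}),
    S.RED_EXECUTING: frozenset({S.BLUE_EVALUATING}),
    S.BLUE_EVALUATING: frozenset({S.IN_REVIEW}),
    S.IN_REVIEW: frozenset({S.VALIDATED, S.REJECTED}),
    S.REJECTED: frozenset({S.DRAFT}),  # rejected tests go back to draft
    S.VALIDATED: frozenset(),          # terminal for this test record
}


class InvalidTransitionError(Exception):
    pass


def transition(current: LifecycleState, target: LifecycleState) -> LifecycleState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransitionError(f"{current.value} -> {target.value} is not allowed")
    return target


state = transition(S.DRAFT, S.RED_EXECUTING)
```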
Dual Validation in IN_REVIEW:
- Red Lead votes approve/reject
- Blue Lead votes approve/reject
- Both approve → VALIDATED
- Either rejects → REJECTED
- One votes, other pending → stays IN_REVIEW
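These voting rules reduce to a small pure function. The sketch below is illustrative only: `Vote` and `resolve_review` are hypothetical names, and it assumes a single reject ends the review at once, even if the other lead has not voted yet.

```python
from enum import Enum
from typing import Optional


class Vote(Enum):
    APPROVE = "approve"
    REJECT = "reject"


def resolve_review(red_lead: Optional[Vote], blue_lead: Optional[Vote]) -> str:
    """Map the two lead votes (None = not voted yet) to the resulting state."""
    # Assumption: one reject is enough, regardless of the other vote
    if Vote.REJECT in (red_lead, blue_lead):
        return "REJECTED"
    if red_lead is Vote.APPROVE and blue_lead is Vote.APPROVE:
        return "VALIDATED"
    # No reject, and at least one vote is still missing
    return "IN_REVIEW"


outcome = resolve_review(Vote.APPROVE, None)  # waits for the Blue Lead
```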
Auto Re-testing: When remediation is completed on a validated test, the system automatically creates a follow-up retest (up to MAX_RETEST_COUNT = 3).
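One way to enforce the retest limit is to count how many retest links lie between a test and the original. The sketch below assumes "up to 3" means at most three retests per chain. The `SecurityTest` class and `create_retest_if_allowed` function are hypothetical; only `MAX_RETEST_COUNT = 3` and the `retest_of` link come from this document.

```python
from dataclasses import dataclass
from typing import Optional

MAX_RETEST_COUNT = 3


@dataclass
class SecurityTest:
    id: int
    retest_of: Optional["SecurityTest"] = None

    def retest_depth(self) -> int:
        """Number of retest links between this test and the original."""
        depth = 0
        node = self.retest_of
        while node is not None:
            depth += 1
            node = node.retest_of
        return depth


def create_retest_if_allowed(test: SecurityTest, next_id: int) -> Optional[SecurityTest]:
    """Create a follow-up retest unless the chain already hit the limit."""
    if test.retest_depth() >= MAX_RETEST_COUNT:
        return None
    return SecurityTest(id=next_id, retest_of=test)


# Usage: keep retesting until the limit stops the chain.
original = SecurityTest(id=1)
chain = [original]
while (follow_up := create_retest_if_allowed(chain[-1], next_id=len(chain) + 1)) is not None:
    chain.append(follow_up)
retests_created = len(chain) - 1
```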
Frontend Architecture
Key Technologies
- React 19 with TypeScript
- Vite 7 for bundling
- Tailwind CSS v4 for styling
- TanStack Query for server state management
- TanStack Virtual for table virtualization
- React Router v7 for routing
- Recharts for charts and visualizations
- Lucide React for icons
Page Lazy Loading
All pages except LoginPage and DashboardPage are lazy-loaded via React.lazy() with <Suspense> fallbacks, which keeps the initial bundle small.
Role-Based Navigation
The sidebar dynamically filters navigation items based on the current user's role:
| Section | Visible to |
|---|---|
| Dashboard | All roles |
| Executive Dashboard | admin, red_lead, blue_lead |
| ATT&CK Matrix | All roles |
| Tests (sub-menu) | All roles |
| Campaigns | All roles |
| Threat Actors | All roles |
| Compliance | All roles |
| Comparison | admin, red_lead, blue_lead |
| Reports | All roles |
| System (admin section) | admin only |
Performance Optimizations
- React.memo on HeatmapCell (renders 3000+ times in full matrix)
- useMemo / useCallback for expensive calculations in memoized components
- useDebounce hook for search inputs (300ms delay)
- TanStack Virtual for large table virtualization (test templates, detection rules, audit logs)
- Lazy loading for all non-critical page bundles