Aegis/docs/ARCHITECTURE.md
2026-02-20 16:16:22 +01:00

Aegis — Architecture

High-Level Overview

┌────────────────────┐       ┌─────────────────────┐
│   React Frontend   │──────▶│   FastAPI Backend    │
│  (Vite / TS / TW)  │ REST  │  (Python 3.11)      │
└────────────────────┘       └──────┬──────┬────────┘
                                    │      │
                          ┌─────────┘      └─────────┐
                          ▼                          ▼
                 ┌─────────────────┐       ┌─────────────────┐
                 │   PostgreSQL    │       │     MinIO        │
                 │  (Data Store)   │       │ (Object Storage) │
                 └─────────────────┘       └─────────────────┘
  • Frontend — React 19 + TypeScript + Tailwind CSS v4 + TanStack Query
  • Backend — FastAPI with SQLAlchemy ORM + Alembic migrations
  • Database — PostgreSQL 15 with UUID primary keys and JSONB columns
  • Object Storage — MinIO (S3-compatible) for evidence files
  • Scheduler — APScheduler (in-process) for background jobs

Database Schema

Core Tables

| Table | Description |
|---|---|
| users | User accounts with role-based access (admin, red_tech, blue_tech, red_lead, blue_lead, viewer) |
| techniques | MITRE ATT&CK techniques with coverage status, tactic, platforms (JSONB) |
| tests | Security tests with full Red/Blue workflow fields, dual validation, remediation, and retest chain |
| test_templates | Predefined test catalog from Atomic Red Team, Sigma, CALDERA, LOLBAS, custom |
| evidences | Evidence files separated by team (red/blue) with SHA256 integrity verification |
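
The SHA256 integrity check on evidence files can be sketched as follows. This is a minimal illustration using only the standard library; the function names `sha256_of_stream` and `verify_evidence` are hypothetical and not taken from the Aegis codebase.

```python
import hashlib
import hmac
import io
from typing import BinaryIO

# Read in chunks so large evidence files are never loaded into memory at once.
CHUNK_SIZE = 64 * 1024


def sha256_of_stream(stream: BinaryIO) -> str:
    """Return the hex SHA256 digest of everything readable from the stream."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_evidence(stream: BinaryIO, expected_sha256: str) -> bool:
    """Compare the stream's digest with the stored one (hypothetical helper)."""
    actual = sha256_of_stream(stream)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_sha256.lower())


if __name__ == "__main__":
    stored = sha256_of_stream(io.BytesIO(b"pcap bytes"))
    print(verify_evidence(io.BytesIO(b"pcap bytes"), stored))
```

In this sketch the digest would be computed once at upload time and stored next to the object key, then recomputed whenever the file is downloaded from object storage.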

Detection & Defense

| Table | Description |
|---|---|
| detection_rules | Imported detection rules (Sigma, Elastic, custom) linked to ATT&CK techniques |
| test_detection_results | Per-test detection rule evaluation results (triggered / not triggered) |
| test_template_detection_rules | Template ↔ detection rule associations |
| defensive_techniques | MITRE D3FEND defensive techniques |
| defensive_technique_mappings | ATT&CK technique ↔ D3FEND defensive technique mappings |

Campaigns & Scheduling

| Table | Description |
|---|---|
| campaigns | Test campaign groupings with scheduling (recurring, weekly/monthly/quarterly) |
| campaign_tests | Ordered test assignments within campaigns with dependency support |
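
Dependency support between campaign tests only works if the dependency graph stays acyclic. One way to guard against cycles is sketched below, assuming dependencies are available as a mapping from a test ID to the IDs it depends on. The function name and the data shape are illustrative assumptions, not the actual Aegis implementation.

```python
from typing import Dict, Set


def would_create_cycle(
    depends_on: Dict[str, Set[str]], test_id: str, new_dependency: str
) -> bool:
    """Return True if making test_id depend on new_dependency closes a cycle.

    A cycle appears exactly when test_id is already reachable from
    new_dependency by following existing dependency edges.
    """
    stack = [new_dependency]
    seen: Set[str] = set()
    while stack:
        current = stack.pop()
        if current == test_id:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(depends_on.get(current, ()))
    return False


if __name__ == "__main__":
    # b depends on a, c depends on b (hypothetical campaign test IDs).
    graph = {"b": {"a"}, "c": {"b"}}
    print(would_create_cycle(graph, "a", "c"))
    print(would_create_cycle(graph, "c", "a"))
```

The check runs before the new edge is stored, so a rejected dependency never reaches the database.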

Intelligence & Actors

| Table | Description |
|---|---|
| threat_actors | MITRE CTI intrusion sets with aliases, country, motivation, JSONB targets |
| threat_actor_techniques | Threat actor ↔ ATT&CK technique mappings |
| intel_items | Threat intelligence items from RSS feeds |

Compliance

| Table | Description |
|---|---|
| compliance_frameworks | Compliance frameworks (e.g., NIST 800-53) |
| compliance_controls | Individual controls within a framework |
| compliance_control_mappings | Control ↔ ATT&CK technique mappings |

Operational

| Table | Description |
|---|---|
| coverage_snapshots | Point-in-time coverage status captures with aggregate metrics |
| snapshot_technique_states | Normalized per-technique state within a snapshot |
| audit_logs | System-wide audit trail with JSONB details |
| notifications | In-app notifications with read status |
| data_sources | External data source configuration and sync status |

Key Relationships

Technique ──1:N── Test ──1:N── Evidence
    │                │
    │                ├── TestDetectionResult ──N:1── DetectionRule
    │                └── CampaignTest ──N:1── Campaign
    │
    ├── ThreatActorTechnique ──N:1── ThreatActor
    ├── DefensiveTechniqueMapping ──N:1── DefensiveTechnique
    ├── ComplianceControlMapping ──N:1── ComplianceControl ──N:1── ComplianceFramework
    └── SnapshotTechniqueState ──N:1── CoverageSnapshot

Test ──retest_of──▶ Test  (self-referential retest chain)
Campaign ──parent_campaign_id──▶ Campaign  (recurring execution history)

Backend Architecture

Layered Structure

routers/          ← Thin HTTP adapters (auth, params, response shaping — zero inline ORM)
  ↓
services/         ← Framework-agnostic business logic (46 service modules, ~250 functions)
  ↓
domain/           ← Pure business rules (entities, value objects, ports, errors — zero framework imports)
  ↓
infrastructure/   ← Repository implementations (SQLAlchemy), Redis, mappers
  ↓
models/           ← SQLAlchemy ORM models (persistence mapping only)
  ↓
database.py       ← Engine + session management (lazy initialization)

Dependency rule: routers → services → domain ← infrastructure. Dependencies always point inward toward domain.

Transaction management: Services never call db.commit(). Routers manage transactions via UnitOfWork. Import services and background jobs are documented exceptions (self-contained batch operations).
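
The split between services that only stage work and routers that own the transaction boundary can be sketched as below. This is a simplified illustration under stated assumptions: `FakeSession` stands in for a SQLAlchemy session, and `create_test` and `create_test_endpoint` are hypothetical names. The real `UnitOfWork` in `domain/unit_of_work.py` may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeSession:
    """Stand-in for a database session; records calls instead of doing I/O."""

    pending: List[str] = field(default_factory=list)
    committed: List[str] = field(default_factory=list)

    def add(self, item: str) -> None:
        self.pending.append(item)

    def commit(self) -> None:
        self.committed.extend(self.pending)
        self.pending.clear()

    def rollback(self) -> None:
        self.pending.clear()


class UnitOfWork:
    """Commit on a clean exit, roll back if the block raised."""

    def __init__(self, session: FakeSession) -> None:
        self.session = session

    def __enter__(self) -> "UnitOfWork":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is None:
            self.session.commit()
        else:
            self.session.rollback()
        return False  # never swallow the exception


def create_test(session: FakeSession, name: str) -> str:
    """Service layer: stages work but never commits."""
    if not name:
        raise ValueError("name is required")
    session.add(name)
    return name


def create_test_endpoint(session: FakeSession, name: str) -> str:
    """Router layer: owns the transaction boundary."""
    with UnitOfWork(session) as uow:
        return create_test(uow.session, name)
```

Keeping the commit in one place means a router can call several services and still get all-or-nothing behavior for the whole request.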

Services

Business Logic Services

| Service | Responsibility |
|---|---|
| test_workflow_service | Test state machine (draft → validated/rejected) with dual validation |
| test_crud_service | Test CRUD, query logic, permission validation |
| scoring_service | 0-100 scoring for techniques, tactics, actors, organization |
| scoring_config_service | DB-persisted scoring weights with validation |
| score_cache | In-memory TTL cache (5 min) for expensive score/metric calculations |
| operational_metrics_service | MTTD, MTTR, detection efficacy, alert fidelity, coverage velocity |
| metrics_query_service | Dashboard aggregation queries |
| advanced_metrics_service | Coverage by tactic, never-tested, avg validation time, detection trends |
| analytics_service | BI-ready flat datasets (coverage, tests, trends, operators) |
| snapshot_service | Coverage snapshot CRUD, temporal comparison, cleanup |
| campaign_crud_service | Campaign CRUD, lifecycle, scheduling |
| campaign_service | Campaign progress tracking, circular dependency prevention |
| campaign_scheduler_service | Recurring campaign execution (clone + schedule next run) |
| status_service | Technique status recalculation from test results |
| coverage_report_service | Coverage report generation and CSV export |
| compliance_service | Compliance framework analysis and gap detection |
| detection_rule_service | Detection rule queries, auto-association, evaluation |
| threat_actor_service | Threat actor queries, coverage, gap analysis |
| evidence_service | Evidence permission validation and queries |
| heatmap_service | ATT&CK Navigator layer generation |
| test_template_service | Test template CRUD, stats, bulk-activate, filtered queries |
| auth_service | Credential validation, password management |
| user_service | User CRUD, role validation, password hashing |
| audit_query_service | Paginated audit log queries and distinct lookups |
| audit_service | Immutable audit trail logging (write-only) |
| data_source_service | Data source CRUD, sync dispatch, statistics |
| notification_service | In-app notification CRUD, state-change alerts, role-based dispatch |
| technique_query_service | Technique detail queries with test/D3FEND aggregation |
| d3fend_query_service | D3FEND defensive technique listing and tactic queries |
| osint_enrichment_service | OSINT item queries, enrichment, summary statistics |
| worklog_service | Worklog CRUD, integrity verification |
| intel_service | RSS-based threat intelligence scanning |
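
An in-memory TTL cache of the kind `score_cache` is described as can be sketched in a few lines. The class and method names below are assumptions for illustration. Only the 5-minute TTL comes from the table above.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Cache values for a fixed time; recompute after they expire."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,  # 5 minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests do not need to sleep
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Optional[Hashable] = None) -> None:
        """Drop one key, or everything when no key is given."""
        if key is None:
            self._entries.clear()
        else:
            self._entries.pop(key, None)
```

Because the cache lives in process memory, each backend worker would hold its own copy, so a write path that changes scores would need to call `invalidate` in the same process or accept up to five minutes of staleness.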

Import Services (all satisfy ImportService protocol)

| Service | Responsibility |
|---|---|
| mitre_sync_service | MITRE ATT&CK sync via TAXII 2.0 / GitHub fallback |
| atomic_import_service | Atomic Red Team template import from GitHub |
| sigma_import_service | SigmaHQ detection rule import |
| elastic_import_service | Elastic detection rule import (TOML) |
| caldera_import_service | CALDERA ability import |
| lolbas_import_service | LOLBAS/GTFOBins template import |
| d3fend_import_service | MITRE D3FEND defensive technique import |
| threat_actor_import_service | MITRE CTI threat actor import (STIX) |
| compliance_import_service | NIST 800-53 ↔ ATT&CK mapping import |
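
A shared protocol plus a registry lets callers dispatch any importer by name. The sketch below shows the general pattern only: the `run()` method, the `ImportResult` type, and the `register` and `run_import` helpers are assumptions, since this document names `ImportService` and `IMPORT_REGISTRY` but does not specify their shape.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, runtime_checkable


@dataclass(frozen=True)
class ImportResult:
    """Hypothetical summary returned by every importer."""

    created: int
    updated: int


@runtime_checkable
class ImportService(Protocol):
    """Structural contract: anything with a matching run() qualifies."""

    def run(self) -> ImportResult: ...


IMPORT_REGISTRY: Dict[str, ImportService] = {}


def register(name: str, service: ImportService) -> None:
    # isinstance() on a runtime_checkable Protocol checks that run exists.
    if not isinstance(service, ImportService):
        raise TypeError(f"{name!r} does not satisfy ImportService")
    IMPORT_REGISTRY[name] = service


def run_import(name: str) -> ImportResult:
    return IMPORT_REGISTRY[name].run()


class FakeSigmaImport:
    """Toy importer; a real one would fetch and parse rules."""

    def run(self) -> ImportResult:
        return ImportResult(created=2, updated=0)


if __name__ == "__main__":
    register("sigma", FakeSigmaImport())
    print(run_import("sigma"))
```

Importers never inherit from a base class here; they only need the right method, which keeps each import module free of framework imports.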

Domain Layer

domain/
├── entities/              # Rich domain entities with business logic
│   ├── technique.py       # TechniqueEntity with status recalculation
│   ├── campaign.py        # CampaignEntity with lifecycle state machine
│   ├── compliance.py      # ComplianceFrameworkEntity with coverage calculation
│   └── threat_actor.py    # ThreatActorEntity with coverage analysis
├── value_objects/          # Immutable value types
│   ├── mitre_id.py        # MITRE ATT&CK ID validation
│   └── scoring_weights.py # Scoring weights (sum=100, non-negative)
├── ports/                  # Interfaces (Protocol contracts)
│   ├── repositories/      # TechniqueRepository, TestRepository
│   └── import_service.py  # ImportService protocol + IMPORT_REGISTRY
├── errors.py              # Domain exceptions (EntityNotFoundError, etc.)
├── enums.py               # TestState, TechniqueStatus, TestResult
├── test_entity.py         # TestEntity with state machine + domain events
└── unit_of_work.py        # UnitOfWork context manager
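
The two value objects listed in the tree can be illustrated with frozen dataclasses that validate on construction. This is a sketch under assumptions: the ID check covers only technique and sub-technique IDs such as T1059 and T1059.001, and the `ScoringWeights` field names are invented for the example. Only the rules "sum=100, non-negative" come from the tree above.

```python
import re
from dataclasses import dataclass

# Technique IDs look like T1059; sub-techniques add a 3-digit suffix (T1059.001).
_TECHNIQUE_ID = re.compile(r"T\d{4}(\.\d{3})?")


@dataclass(frozen=True)
class MitreId:
    value: str

    def __post_init__(self) -> None:
        if not _TECHNIQUE_ID.fullmatch(self.value):
            raise ValueError(f"invalid ATT&CK technique ID: {self.value!r}")

    @property
    def is_subtechnique(self) -> bool:
        return "." in self.value


@dataclass(frozen=True)
class ScoringWeights:
    # Field names are hypothetical; the real weights may be named differently.
    detection: int
    prevention: int
    recency: int

    def __post_init__(self) -> None:
        weights = (self.detection, self.prevention, self.recency)
        if any(w < 0 for w in weights):
            raise ValueError("weights must be non-negative")
        if sum(weights) != 100:
            raise ValueError(f"weights must sum to 100, got {sum(weights)}")


if __name__ == "__main__":
    print(MitreId("T1059.001").is_subtechnique)
    print(ScoringWeights(detection=50, prevention=30, recency=20))
```

Because both types are frozen and validated in `__post_init__`, an invalid instance cannot exist, so downstream code does not need to re-check them.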

Scheduled Jobs (APScheduler)

| Job | Schedule | Description |
|---|---|---|
| MITRE Sync | Every 24h | Sync ATT&CK techniques from TAXII/GitHub |
| Intel Scan | Every 7 days | Scan RSS feeds for threat intelligence |
| Notification Cleanup | Every 24h | Remove old read notifications |
| Weekly Snapshot | Sundays 00:00 | Create coverage snapshot + cleanup old ones |
| Recurring Campaigns | Every 24h | Check and execute due recurring campaigns |

Test Lifecycle (State Machine)

┌──────┐    ┌──────────────┐    ┌─────────────────┐    ┌───────────┐
│ DRAFT│───▶│RED_EXECUTING │───▶│ BLUE_EVALUATING  │───▶│ IN_REVIEW │
└──────┘    └──────────────┘    └─────────────────┘    └─────┬─────┘
                                                             │
                                         ┌───────────────────┤
                                         ▼                   ▼
                                   ┌──────────┐       ┌──────────┐
                                   │ REJECTED │       │VALIDATED │
                                   └────┬─────┘       └──────────┘
                                        │                    │
                                        └──▶ Back to DRAFT   ├──▶ Remediation
                                                             └──▶ Auto Re-test
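
The transitions in the diagram can be encoded as a lookup table, which makes illegal moves easy to reject. The sketch below is one possible encoding, not the actual `TestEntity` code: the enum values and the `InvalidTransitionError` name are assumptions, and Remediation and Auto Re-test are treated as follow-up actions on a validated test rather than as states.

```python
from enum import Enum
from typing import Dict, FrozenSet


class TestState(Enum):
    DRAFT = "draft"
    RED_EXECUTING = "red_executing"
    BLUE_EVALUATING = "blue_evaluating"
    IN_REVIEW = "in_review"
    VALIDATED = "validated"
    REJECTED = "rejected"


class InvalidTransitionError(Exception):
    """Hypothetical domain error for a move the diagram does not allow."""


ALLOWED_TRANSITIONS: Dict[TestState, FrozenSet[TestState]] = {
    TestState.DRAFT: frozenset({TestState.RED_EXECUTING}),
    TestState.RED_EXECUTING: frozenset({TestState.BLUE_EVALUATING}),
    TestState.BLUE_EVALUATING: frozenset({TestState.IN_REVIEW}),
    TestState.IN_REVIEW: frozenset({TestState.VALIDATED, TestState.REJECTED}),
    TestState.REJECTED: frozenset({TestState.DRAFT}),  # back to draft
    TestState.VALIDATED: frozenset(),  # terminal for this test record
}


def transition(current: TestState, target: TestState) -> TestState:
    """Return the new state, or raise if the diagram has no such edge."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransitionError(f"{current.name} -> {target.name}")
    return target


if __name__ == "__main__":
    state = TestState.DRAFT
    for nxt in (TestState.RED_EXECUTING, TestState.BLUE_EVALUATING, TestState.IN_REVIEW):
        state = transition(state, nxt)
    print(state.name)
```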

Dual Validation in IN_REVIEW:

  • Red Lead votes approve/reject
  • Blue Lead votes approve/reject
  • Both approve → VALIDATED
  • Either rejects → REJECTED
  • One votes, other pending → stays IN_REVIEW
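
The voting rules above reduce to a small pure function. In this sketch the names `Vote` and `resolve_review` are hypothetical, and one point is an assumption: a single reject vote is treated as final even while the other lead is still pending, which is one reading of "Either rejects".

```python
from enum import Enum
from typing import Optional


class Vote(Enum):
    APPROVE = "approve"
    REJECT = "reject"


def resolve_review(red_lead: Optional[Vote], blue_lead: Optional[Vote]) -> str:
    """Map the two lead votes (None means pending) to the resulting state."""
    # Assumption: one reject is enough, even if the other vote is pending.
    if Vote.REJECT in (red_lead, blue_lead):
        return "rejected"
    if red_lead is Vote.APPROVE and blue_lead is Vote.APPROVE:
        return "validated"
    return "in_review"


if __name__ == "__main__":
    print(resolve_review(Vote.APPROVE, None))
    print(resolve_review(Vote.APPROVE, Vote.APPROVE))
```

Keeping the rule free of database access makes it straightforward to unit test every vote combination.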

Auto Re-testing: When remediation is completed on a validated test, the system automatically creates a follow-up retest (up to MAX_RETEST_COUNT = 3).
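
The retest cap can be enforced by walking the `retest_of` chain back to the original test. The following is a minimal sketch with a hypothetical `SecurityTest` record. It assumes the limit counts retests in the chain, so an original test can be followed by at most three retests.

```python
from dataclasses import dataclass
from typing import Optional

MAX_RETEST_COUNT = 3


@dataclass
class SecurityTest:
    """Hypothetical stand-in for a row in the tests table."""

    id: int
    retest_of: Optional["SecurityTest"] = None


def retest_depth(test: SecurityTest) -> int:
    """Number of retest links between this test and the original."""
    depth = 0
    while test.retest_of is not None:
        depth += 1
        test = test.retest_of
    return depth


def create_retest(test: SecurityTest, next_id: int) -> Optional[SecurityTest]:
    """Create a follow-up retest, or return None once the cap is reached."""
    if retest_depth(test) >= MAX_RETEST_COUNT:
        return None
    return SecurityTest(id=next_id, retest_of=test)


if __name__ == "__main__":
    current = SecurityTest(id=1)
    for next_id in range(2, 7):
        follow_up = create_retest(current, next_id)
        if follow_up is None:
            print("retest cap reached at test", current.id)
            break
        current = follow_up
```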


Frontend Architecture

Key Technologies

  • React 19 with TypeScript
  • Vite 7 for bundling
  • Tailwind CSS v4 for styling
  • TanStack Query for server state management
  • TanStack Virtual for table virtualization
  • React Router v7 for routing
  • Recharts for charts and visualizations
  • Lucide React for icons

Page Lazy Loading

All pages except LoginPage and DashboardPage are lazy-loaded via React.lazy() with <Suspense> fallbacks, which keeps the initial bundle small.

Role-Based Navigation

The sidebar dynamically filters navigation items based on the current user's role:

| Section | Visible to |
|---|---|
| Dashboard | All roles |
| Executive Dashboard | admin, red_lead, blue_lead |
| ATT&CK Matrix | All roles |
| Tests (sub-menu) | All roles |
| Campaigns | All roles |
| Threat Actors | All roles |
| Compliance | All roles |
| Comparison | admin, red_lead, blue_lead |
| Reports | All roles |
| System (admin section) | admin only |

Performance Optimizations

  • React.memo on HeatmapCell (renders 3000+ times in full matrix)
  • useMemo / useCallback for expensive calculations in memoized components
  • useDebounce hook for search inputs (300ms delay)
  • TanStack Virtual for large table virtualization (test templates, detection rules, audit logs)
  • Lazy loading for all non-critical page bundles