Aegis — Technology Justification

Document type: Architecture Board Submission
Author: Platform Architecture Team
Date: February 11, 2026
Classification: Internal
Status: Approved


1. Purpose

This document provides a formal justification for the technology selections made in the Aegis platform. Each technology choice is evaluated against the project's operational requirements, organizational constraints, security posture, and long-term sustainability. This document is intended for review by the Architecture Board and serves as the authoritative reference for technology governance.


2. Project Context

Aegis is an internal security operations platform that manages MITRE ATT&CK technique coverage through structured Red Team / Blue Team validation workflows. The platform integrates with 9 external threat intelligence and detection rule sources, enforces role-based access for 6 distinct user roles, and provides coverage analytics including heatmaps, scoring, compliance mapping, and executive reporting.

Operational Requirements

| Requirement | Detail |
| --- | --- |
| Deployment model | On-premise, single-server, air-gap compatible |
| User base | 10-100 concurrent security analysts and leads |
| Data model | 18+ relational entities with many-to-many relationships and semi-structured metadata |
| External integrations | 9 data sources (MITRE TAXII 2.0, GitHub REST, D3FEND REST, Sigma YAML, Elastic TOML, CALDERA YAML, LOLBAS YAML, GTFOBins Markdown, STIX 2.0 JSON) |
| File storage | Binary evidence files (screenshots, logs, PCAPs) ranging from KB to hundreds of MB |
| Scheduled operations | 5 periodic background jobs (24h-7d cycles) |
| Security | RBAC, JWT authentication, audit logging, evidence chain of custody |

Organizational Constraints

| Constraint | Detail |
| --- | --- |
| Team expertise | Primary competency in Python and TypeScript |
| Target operators | Security engineers, not DevOps specialists |
| Infrastructure | Docker available; Kubernetes not guaranteed |
| Network | Outbound HTTPS required for data source sync; inbound limited to platform UI |
| Budget | Open-source preference; no commercial license dependencies for core platform |

3. Backend Framework: FastAPI

Selection: FastAPI 0.x (latest stable) with Uvicorn ASGI server

Justification

FastAPI was selected as the backend framework based on four primary evaluation criteria: API development velocity, ecosystem compatibility, runtime performance, and developer experience.

API Development Velocity. Aegis exposes 80+ REST endpoints across 21 domain modules. FastAPI's automatic OpenAPI specification generation from Python type annotations eliminates the need for separate API documentation tooling. Pydantic integration provides request and response validation at the framework level, reducing boilerplate code for schema enforcement. The Depends() dependency injection system enables composable middleware chains for authentication, authorization, and database session management without requiring a third-party DI container.
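The composition pattern behind such dependency chains can be sketched in plain Python, without importing FastAPI. This is an illustration only: the token store, role names, and function names are hypothetical and not taken from the Aegis codebase. In a FastAPI application, callables of this shape would typically be passed to Depends().

```python
from typing import Callable

# Hypothetical token store standing in for JWT verification.
TOKENS = {
    "token-a": {"username": "alice", "role": "red_team_lead"},
    "token-b": {"username": "bob", "role": "viewer"},
}


class AuthError(Exception):
    """Raised when authentication or authorization fails."""


def get_current_user(token: str) -> dict:
    """Authentication step: resolve a token to a user record."""
    try:
        return TOKENS[token]
    except KeyError:
        raise AuthError("invalid token") from None


def require_roles(*allowed: str) -> Callable[[str], dict]:
    """Authorization step: build a checker layered on get_current_user."""

    def checker(token: str) -> dict:
        user = get_current_user(token)
        if user["role"] not in allowed:
            raise AuthError(f"role {user['role']!r} is not permitted")
        return user

    return checker


# Each endpoint can declare its own role requirement by reusing the factory.
can_manage_campaigns = require_roles("red_team_lead", "admin")
```

Because the role check wraps the authentication step, an endpoint only has to name the roles it accepts; the lookup of the current user is reused rather than repeated.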

Ecosystem Compatibility. The platform's 9 external data source integrations depend on Python-specific libraries with no mature equivalents in other ecosystems:

| Library | Purpose | Ecosystem |
| --- | --- | --- |
| taxii2-client | STIX/TAXII 2.0 protocol | Python only |
| pySigma | Sigma rule parsing and transformation | Python only |
| PyYAML | YAML parsing (Atomic Red Team, CALDERA, LOLBAS) | Python preferred |
| toml | TOML parsing (Elastic detection rules) | Python preferred |
| boto3 | S3-compatible storage API (MinIO) | Python preferred |
| defusedxml | Secure XML processing | Python preferred |

Selecting a non-Python backend would require reimplementing or wrapping these libraries, introducing significant engineering risk.

Runtime Performance. FastAPI's ASGI foundation supports asynchronous request handling. The current implementation uses synchronous route handlers (due to SQLAlchemy's synchronous session model), which FastAPI runs in a worker thread pool, so the framework does not impose a performance ceiling for the target user base (10-100 concurrent users). Independent benchmarks consistently place FastAPI among the highest-performing Python web frameworks.

Developer Experience. Interactive Swagger UI (/docs) and ReDoc (/redoc) are available in non-production environments, accelerating API exploration and frontend integration. These documentation endpoints are automatically disabled in production to reduce attack surface.
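One way to implement this toggle is to derive the FastAPI constructor arguments from the deployment environment. The sketch below is an assumption about how it could be wired, not the Aegis implementation: the function name and the AEGIS_ENV variable are hypothetical, while docs_url, redoc_url, and openapi_url are FastAPI constructor parameters for which None disables the corresponding route.

```python
from typing import Optional


def docs_settings(environment: str) -> dict[str, Optional[str]]:
    """Return FastAPI keyword arguments for the documentation routes."""
    if environment.lower() == "production":
        # None disables the route, so no schema or docs UI is served.
        return {"docs_url": None, "redoc_url": None, "openapi_url": None}
    return {
        "docs_url": "/docs",
        "redoc_url": "/redoc",
        "openapi_url": "/openapi.json",
    }


# Hypothetical wiring in the application factory:
#   app = FastAPI(**docs_settings(os.getenv("AEGIS_ENV", "development")))
```

Disabling openapi_url as well keeps the raw schema from being served even when the interactive pages are off.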

Alternatives Evaluated

| Framework | Evaluation Summary | Disposition |
| --- | --- | --- |
| Django + Django REST Framework | Mature and feature-rich, but introduces heavier ORM opinions, an unnecessary admin panel, and slower cold-start times. Django's ORM lacks SQLAlchemy's flexibility for JSONB column handling and complex join patterns required by the scoring engine. | Rejected |
| Flask + Flask-RESTful | Lightweight but lacks built-in request validation, automatic OpenAPI generation, and dependency injection. Would require additional libraries (marshmallow, flask-apispec) to achieve parity with FastAPI's built-in capabilities. | Rejected |
| Go (Gin / Echo) | Superior raw throughput, but the team's primary expertise is Python. The 9 data source integrations depend on Python libraries with no Go equivalents. The development velocity loss would outweigh performance gains for a 10-100 user internal platform. | Rejected |
| NestJS (Node.js / TypeScript) | Would unify frontend and backend language, but splits runtime expertise. No mature Node.js equivalents for STIX/TAXII and Sigma rule parsing. The Python data science and security tooling ecosystem is substantially deeper. | Rejected |

4. Primary Database: PostgreSQL 15

Selection: PostgreSQL 15 (Alpine) with SQLAlchemy ORM and Alembic migrations

Justification

PostgreSQL was selected as the primary relational data store based on three requirements: relational integrity for a complex domain model, semi-structured data support for external source metadata, and operational maturity for on-premise deployment.

Relational Integrity. The Aegis data model comprises 18+ entities with deep relational dependencies: techniques relate to tests, tests belong to campaigns, campaigns map to threat actors, threat actors link to techniques, compliance controls map to techniques, and detection rules associate with both techniques and test templates. This graph of many-to-many relationships demands foreign key enforcement, transactional consistency, and efficient join operations — core strengths of a relational database.

Semi-Structured Data (JSONB). Several entities carry metadata with variable structure imported from external sources (STIX 2.0, Sigma YAML, Elastic TOML). PostgreSQL's native JSONB column type stores this data in a decomposed binary format that supports containment queries and GIN indexing, eliminating the need for a separate document store. Current JSONB usage is contained to 12 columns across 6 tables:

| Entity | JSONB Fields | Content |
| --- | --- | --- |
| Technique | platforms | OS platform array from ATT&CK |
| Threat Actor | aliases, target_sectors, target_regions, references | STIX 2.0 metadata |
| Detection Rule | platforms, log_sources | Rule targeting metadata |
| Data Source | last_sync_stats, config | Import statistics and source-specific configuration |
| Campaign | tags | User-defined classification |
| Audit Log | details | Action-specific metadata (variable per action type) |

Operational Maturity. PostgreSQL 15 provides built-in health checking (pg_isready), mature backup tooling (pg_dump/pg_restore), extensive monitoring capabilities, and a 25+ year track record of production reliability. The Alpine-based Docker image is approximately 80MB, suitable for on-premise deployments with limited resources.

Schema Management. Alembic provides version-controlled database migrations (18 versions to date), enabling reproducible schema evolution and rollback capability.

Alternatives Evaluated

| Database | Evaluation Summary | Disposition |
| --- | --- | --- |
| MongoDB | The core domain is deeply relational. Modeling technique-test-campaign-actor relationships in MongoDB would require denormalization or manual reference integrity, trading the JSONB advantage for relational integrity loss. | Rejected |
| MySQL 8 (JSON) | PostgreSQL's JSONB supports GIN-indexed containment queries directly, whereas indexing MySQL's JSON type requires generated columns or multi-valued indexes. PostgreSQL also provides a native UUID type (vs. BINARY(16) in MySQL), which aligns with the platform's UUID-based primary keys. | Rejected |
| PostgreSQL + MongoDB (dual) | The operational complexity of maintaining two database systems is unjustified for 12 JSONB columns. A dual-database architecture would also complicate transactional consistency across relational and document data. | Rejected |

5. Object Storage: MinIO

Selection: MinIO (S3-compatible) accessed via boto3 (AWS S3 SDK for Python)

Justification

MinIO was selected as the evidence storage system based on three requirements: S3 API compatibility for portability, on-premise deployment capability, and separation of binary data from the relational database.

S3 API Compatibility. MinIO implements the Amazon S3 API specification, accessed via the industry-standard boto3 client library. Should the deployment model change from on-premise to cloud, this provides a migration path to AWS S3, Google Cloud Storage (via S3-compatible mode), or any other S3-compatible storage service that needs configuration changes (endpoint and credentials) rather than code changes. The storage interface (upload_file, get_presigned_url, ensure_bucket_exists) is a thin abstraction layer that is storage-backend agnostic.
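A backend-agnostic interface of this kind could be expressed as a typing.Protocol. The sketch below is illustrative: only the three method names come from this document, while the signatures, the bucket name, the in-memory stand-in, and the store_evidence helper are assumptions. A production backend would implement the same methods on top of boto3.

```python
from typing import Protocol


class EvidenceStorage(Protocol):
    """Interface the application codes against (signatures are assumed)."""

    def ensure_bucket_exists(self, bucket: str) -> None: ...

    def upload_file(self, bucket: str, key: str, data: bytes) -> None: ...

    def get_presigned_url(self, bucket: str, key: str, expires_in: int = 900) -> str: ...


class InMemoryStorage:
    """Stand-in backend for illustration and tests; not a real object store."""

    def __init__(self) -> None:
        self._buckets: dict[str, dict[str, bytes]] = {}

    def ensure_bucket_exists(self, bucket: str) -> None:
        self._buckets.setdefault(bucket, {})

    def upload_file(self, bucket: str, key: str, data: bytes) -> None:
        self._buckets[bucket][key] = data

    def get_presigned_url(self, bucket: str, key: str, expires_in: int = 900) -> str:
        if key not in self._buckets[bucket]:
            raise KeyError(key)
        # A real backend would return a signed, time-limited HTTPS URL.
        return f"memory://{bucket}/{key}?expires_in={expires_in}"


def store_evidence(storage: EvidenceStorage, key: str, data: bytes) -> str:
    """Upload one evidence file and return a download URL for it."""
    storage.ensure_bucket_exists("evidence")
    storage.upload_file("evidence", key, data)
    return storage.get_presigned_url("evidence", key)
```

Because store_evidence depends only on the protocol, swapping MinIO for another S3-compatible service would be a matter of constructing a different backend object.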

On-Premise Deployment. The platform is designed for deployment within organizational security environments where external cloud storage services may not be permitted due to data classification or regulatory requirements. MinIO runs as a single Docker container with persistent volume storage, requiring no external dependencies or network egress for storage operations.

Binary Data Separation. Evidence files (screenshots, packet captures, log extracts) range from kilobytes to hundreds of megabytes. Storing binary data in PostgreSQL (BYTEA columns) would degrade database backup performance, increase storage costs, and complicate streaming downloads. MinIO's presigned URL mechanism offloads download bandwidth from the application server — the browser fetches evidence files directly from MinIO without proxying through the backend.

Administrative Visibility. MinIO Console (port 9001) provides a web-based management interface for administrators to inspect, audit, and manage stored evidence files without requiring command-line access.

Alternatives Evaluated

| Storage | Evaluation Summary | Disposition |
| --- | --- | --- |
| PostgreSQL BYTEA | Stores binary files in the relational database. Bloats backups, degrades query performance on large tables, and requires the backend to proxy all file downloads. Not designed as a file store. | Rejected |
| Local filesystem | Not portable across container restarts without host volume mounts. No presigned URL support (backend must proxy all downloads). No replication, versioning, or management interface. | Rejected |
| AWS S3 | Requires a cloud account, internet connectivity for storage operations, and AWS credential management. Incompatible with air-gap or restricted-network deployment requirements. | Rejected (as primary; migration path preserved) |
| SeaweedFS | Smaller community and less mature S3-compatible API layer. boto3 compatibility is not fully guaranteed. Insufficient adoption for long-term support confidence. | Rejected |

6. Frontend: React 19 + TypeScript 5.9

Selection: React 19, TypeScript 5.9, Vite 7.3, Tailwind CSS 4, TanStack React Query 5

Justification

The frontend technology selection was driven by four criteria: component ecosystem maturity, type safety for a complex domain, build tooling performance, and developer productivity.

Component Ecosystem Maturity. Aegis presents a complex user interface comprising 21 pages, 30+ components, and specialized visualizations including ATT&CK Navigator-compatible heatmaps, campaign timelines, compliance gauges, and multi-role workflow views. React's component model and its ecosystem (Recharts for data visualization, Lucide for iconography, TanStack Virtual for list virtualization) provide production-ready solutions for each of these requirements.

Type Safety. TypeScript's static type system enforces correctness across the API communication layer (22 domain-specific API modules), shared type definitions (types/models.ts), and component props. With strict: true in tsconfig.json, the compiler catches null reference errors, incorrect property access, and type mismatches at build time rather than runtime. This is particularly valuable for the complex test workflow state machine, where state-dependent UI behavior must correctly reflect 6 possible test states and 6 user roles.

Build Tooling. Vite provides sub-second hot module replacement during development and optimized production builds via Rollup. The multi-stage Docker build produces a minimal Nginx image (~25MB) serving pre-compiled static assets, eliminating the need for a Node.js runtime in production.

Server State Management. TanStack React Query manages all server-side state (caching, refetching, mutation invalidation), eliminating the need for a client-side state management library (Redux, Zustand, MobX) for data fetching concerns. Authentication state is managed via React Context, and UI feedback via a Toast context — both lightweight patterns that avoid unnecessary library dependencies.

Styling. Tailwind CSS 4 provides utility-first styling with zero-runtime CSS generation. The design system is consistent across all 21 pages without maintaining a separate CSS architecture or component library.

Alternatives Evaluated

| Framework | Evaluation Summary | Disposition |
| --- | --- | --- |
| Angular | Comprehensive framework with built-in DI, routing, and HTTP client. However, the heavier abstraction layer and steeper learning curve are unnecessary for a team with React experience. Angular's opinionated module system adds boilerplate for a project of this scale. | Rejected |
| Vue 3 + TypeScript | Viable alternative with good TypeScript support and a smaller learning curve. However, the React ecosystem offers deeper library coverage for specialized components (ATT&CK heatmaps, data grids, chart libraries). The team's existing React proficiency favors continuity. | Rejected |
| Svelte / SvelteKit | Excellent developer experience and smaller bundle sizes, but a significantly smaller ecosystem for complex data visualization. Library availability for heatmaps, virtual scrolling, and charting is limited compared to React. | Rejected |
| HTMX + server-rendered templates | Would reduce frontend complexity but cannot support the interactive heatmap, drag-and-drop campaign management, real-time notification updates, and complex multi-step workflow forms required by the platform. | Rejected |

7. Containerization and Deployment: Docker Compose

Selection: Docker with Docker Compose (V2), multi-stage Dockerfiles

Justification

Docker Compose was selected as the deployment orchestration tool based on three requirements: single-command deployment for non-DevOps operators, consistent development-to-production environments, and minimal infrastructure prerequisites.

Operator Accessibility. The platform is deployed by security engineers who may not have Kubernetes expertise or access to container orchestration infrastructure. Docker Compose provides single-command deployment (docker compose up -d --build) with an interactive installation script (install.sh) that generates secrets, prompts for configuration, and produces a .env file. This reduces deployment complexity to a level appropriate for the target operator profile.
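The secret-generation step of such an installer can be illustrated in Python with the standard secrets module. This is a sketch under stated assumptions, not a transcription of install.sh: the variable names (JWT_SECRET_KEY, POSTGRES_PASSWORD, MINIO_ROOT_PASSWORD, ADMIN_EMAIL) and the token lengths are hypothetical.

```python
import secrets


def generate_env(admin_email: str) -> str:
    """Build the contents of a .env file with freshly generated secrets."""
    values = {
        # token_urlsafe emits only URL-safe characters, so no quoting is needed.
        "JWT_SECRET_KEY": secrets.token_urlsafe(48),
        "POSTGRES_PASSWORD": secrets.token_urlsafe(24),
        "MINIO_ROOT_PASSWORD": secrets.token_urlsafe(24),
        # Operator-supplied value (an interactive installer would prompt for it).
        "ADMIN_EMAIL": admin_email,
    }
    return "".join(f"{name}={value}\n" for name, value in values.items())


if __name__ == "__main__":
    print(generate_env("admin@example.org"), end="")
```

Generating secrets at install time avoids shipping default credentials that operators might forget to rotate.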

Environment Consistency. Two compose files maintain parity between development and production:

| Aspect | Development | Production |
| --- | --- | --- |
| Frontend | Vite dev server, hot reload | Nginx serving static build |
| Backend | Source volume-mounted, auto-reload | Multi-stage build, non-root user |
| Ports | All services exposed | Only frontend exposed |
| Secrets | Auto-generated ephemeral | Required via environment |

Infrastructure Footprint. The entire platform (4 services) runs on a single server with Docker as the only prerequisite. Named volumes provide data persistence across container rebuilds. Health checks and dependency ordering ensure correct startup sequencing.
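Dependency ordering of this kind amounts to a topological sort over the service graph. The sketch below assumes four service names and a plausible dependency graph (backend waits for the database and object store, frontend waits for the backend); the actual compose files may declare different names or edges.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each service maps to the services it waits for.
DEPENDS_ON = {
    "postgres": set(),
    "minio": set(),
    "backend": {"postgres", "minio"},
    "frontend": {"backend"},
}


def startup_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which every service starts after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(" -> ".join(startup_order(DEPENDS_ON)))
```

Health checks add a readiness condition on top of this ordering: a dependent service waits until its dependency reports healthy, not merely until its container has been created.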

Security Hardening. The backend Dockerfile follows container security best practices: non-root user (appuser, UID 1001), minimal base image (python:3.11-slim), and no unnecessary system packages beyond build dependencies.

Alternatives Evaluated

| Platform | Evaluation Summary | Disposition |
| --- | --- | --- |
| Kubernetes | Provides horizontal scaling, rolling deployments, and self-healing. However, it requires a cluster, kubectl expertise, Helm charts, ingress controllers, and persistent volume claims. This operational overhead is disproportionate for a 4-service application targeting single-server deployment. | Rejected (viable future evolution for multi-server) |
| Docker Swarm | Adds orchestration with lower complexity than Kubernetes but provides minimal benefit over Compose for fewer than 5 services. Docker Swarm's development trajectory has stalled relative to Compose V2. | Rejected |
| Bare metal / systemd | Loses containerization benefits: isolation, reproducibility, and dependency management. Would require manual installation of Python, Node.js, PostgreSQL, and MinIO on each target system, increasing deployment failure risk. | Rejected |

8. CI/CD and Artifact Management: GitHub Enterprise + Artifactory

Selection: GitHub Enterprise for source control and CI/CD; JFrog Artifactory for artifact storage

Status: Planned — not yet implemented

Justification

GitHub Enterprise and Artifactory are designated as the CI/CD and artifact management platforms for Aegis based on organizational standardization, security requirements, and the artifact lifecycle.

Organizational Standardization. GitHub Enterprise is the organization's standard source control and CI/CD platform. Adopting it for Aegis ensures consistency with existing developer workflows, access control policies, and audit mechanisms. Security teams reviewing the Aegis codebase will use familiar tooling and processes.

CI/CD Pipeline (Planned). The following GitHub Actions workflow stages are planned:

| Stage | Tools | Trigger |
| --- | --- | --- |
| Lint | ruff (Python), ESLint (TypeScript) | Push to any branch |
| Type check | mypy (Python), tsc --noEmit (TypeScript) | Push to any branch |
| Unit tests | pytest (backend), vitest (frontend) | Push to any branch |
| Integration tests | pytest with PostgreSQL service container | Pull request to main |
| Docker build | Multi-stage Dockerfile verification | Pull request to main |
| Image publish | Docker build + push to Artifactory | Merge to main |
| Deploy | Docker Compose pull + restart | Manual trigger or tag |
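The stage-to-trigger mapping above can be captured as data, which makes it easy to check which stages a given event should run. The trigger keys below are hypothetical labels chosen for this sketch; they are not GitHub Actions event names.

```python
# (stage, trigger) pairs taken from the planned pipeline table.
PIPELINE = [
    ("Lint", "push"),
    ("Type check", "push"),
    ("Unit tests", "push"),
    ("Integration tests", "pull_request_to_main"),
    ("Docker build", "pull_request_to_main"),
    ("Image publish", "merge_to_main"),
    ("Deploy", "manual_or_tag"),
]


def stages_for(trigger: str) -> list[str]:
    """Return the stages planned for a trigger, in pipeline order."""
    return [stage for stage, stage_trigger in PIPELINE if stage_trigger == trigger]


if __name__ == "__main__":
    for trigger in ("push", "pull_request_to_main", "merge_to_main", "manual_or_tag"):
        print(f"{trigger}: {', '.join(stages_for(trigger))}")
```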

Artifact Repository. Artifactory serves as the Docker image registry for versioned backend and frontend images. This provides:

  • Versioned releases: Each merge to main produces a tagged image (aegis-backend:1.2.3, aegis-frontend:1.2.3).
  • Rollback capability: Previous image versions remain available for rapid rollback.
  • Vulnerability scanning: Artifactory's Xray integration enables automated CVE scanning of Docker image layers.
  • Access control: Image pull/push permissions align with organizational RBAC policies.

Air-Gap Deployment Support. For restricted-network deployments, Docker images can be exported from Artifactory as tarballs (docker save), transferred via secure media, and loaded into the target environment (docker load) without requiring network connectivity to the registry.
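The export and import steps can be sketched as a pair of command builders. The image names follow the tagging example in this section; the tarball naming scheme and directory arguments are assumptions, and images pulled from a registry would normally carry a registry prefix that is omitted here. The functions only construct the argument lists and do not run Docker.

```python
from pathlib import PurePosixPath

IMAGES = ("aegis-backend", "aegis-frontend")


def export_commands(version: str, out_dir: str) -> list[list[str]]:
    """Build `docker save` commands for the connected (source) environment."""
    commands = []
    for name in IMAGES:
        tarball = PurePosixPath(out_dir) / f"{name}-{version}.tar"
        commands.append(["docker", "save", "-o", str(tarball), f"{name}:{version}"])
    return commands


def import_commands(version: str, in_dir: str) -> list[list[str]]:
    """Build `docker load` commands for the air-gapped (target) environment."""
    return [
        ["docker", "load", "-i", str(PurePosixPath(in_dir) / f"{name}-{version}.tar")]
        for name in IMAGES
    ]


if __name__ == "__main__":
    for command in export_commands("1.2.3", "/media/transfer"):
        print(" ".join(command))
```

Each list can be passed to subprocess.run on the respective host; keeping the commands as data also makes the transfer manifest easy to review before media leaves the connected network.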

Implementation Timeline

| Phase | Scope | Estimated Effort |
| --- | --- | --- |
| Phase 1 | Basic CI: lint + type check + unit tests | 1-2 days |
| Phase 2 | Integration tests with PostgreSQL service container | 2-3 days |
| Phase 3 | Docker image build + Artifactory publish | 1-2 days |
| Phase 4 | Automated deployment trigger | 2-3 days |

9. Technology Stack Summary

| Layer | Technology | Version | License | Purpose |
| --- | --- | --- | --- | --- |
| Backend | Python | 3.11 | PSF | Runtime |
| | FastAPI | latest | MIT | Web framework |
| | Uvicorn | latest | BSD | ASGI server |
| | SQLAlchemy | latest | MIT | ORM |
| | Alembic | latest | MIT | Migrations |
| | Pydantic | v2 | MIT | Validation |
| | APScheduler | latest | MIT | Background jobs |
| | boto3 | latest | Apache 2.0 | S3 storage client |
| Frontend | React | 19.2 | MIT | UI framework |
| | TypeScript | 5.9 | Apache 2.0 | Type safety |
| | Vite | 7.3 | MIT | Build tooling |
| | Tailwind CSS | 4.1 | MIT | Styling |
| | TanStack Query | 5.90 | MIT | Server state |
| | Recharts | 2.15 | MIT | Visualization |
| Database | PostgreSQL | 15 | PostgreSQL | Relational store |
| Storage | MinIO | latest | AGPL-3.0 | Object storage |
| Infrastructure | Docker | latest | Apache 2.0 | Containerization |
| | Docker Compose | V2 | Apache 2.0 | Orchestration |
| | Nginx | Alpine | BSD | Reverse proxy |
| CI/CD | GitHub Enterprise | | Commercial | Source control + CI |
| Artifacts | Artifactory | | Commercial | Image registry |

License Compliance Note

All core platform dependencies use permissive open-source licenses (MIT, BSD, Apache 2.0, PSF, PostgreSQL License). The only copyleft dependency is MinIO (AGPL-3.0), which is used as a standalone service (not linked into application code) and therefore does not impose AGPL obligations on the Aegis codebase. GitHub Enterprise and Artifactory are covered under existing organizational commercial licenses.
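A check of this kind can be automated against the stack summary. The sketch below copies the open-source entries and their licenses from the table in Section 9 (the two commercially licensed tools are left out); the grouping into permissive and copyleft sets and the function names are assumptions made for illustration.

```python
PERMISSIVE = {"MIT", "BSD", "Apache 2.0", "PSF", "PostgreSQL"}
COPYLEFT = {"AGPL-3.0"}

# Open-source components and licenses as listed in the stack summary.
STACK = {
    "Python": "PSF", "FastAPI": "MIT", "Uvicorn": "BSD", "SQLAlchemy": "MIT",
    "Alembic": "MIT", "Pydantic": "MIT", "APScheduler": "MIT",
    "boto3": "Apache 2.0", "React": "MIT", "TypeScript": "Apache 2.0",
    "Vite": "MIT", "Tailwind CSS": "MIT", "TanStack Query": "MIT",
    "Recharts": "MIT", "PostgreSQL": "PostgreSQL", "MinIO": "AGPL-3.0",
    "Docker": "Apache 2.0", "Docker Compose": "Apache 2.0", "Nginx": "BSD",
}


def copyleft_components(stack: dict[str, str]) -> list[str]:
    """List components whose license is in the copyleft set."""
    return sorted(name for name, lic in stack.items() if lic in COPYLEFT)


def unclassified_components(stack: dict[str, str]) -> list[str]:
    """List components whose license is in neither set and needs review."""
    known = PERMISSIVE | COPYLEFT
    return sorted(name for name, lic in stack.items() if lic not in known)


if __name__ == "__main__":
    print("Copyleft:", copyleft_components(STACK))
    print("Needs review:", unclassified_components(STACK))
```

Run in CI against an up-to-date dependency list, a check like this would flag a newly introduced copyleft or unknown license before it reaches a release.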


10. Approval

| Role | Name | Date | Signature |
| --- | --- | --- | --- |
| Platform Architect | | | |
| Security Architect | | | |
| Infrastructure Lead | | | |
| Development Lead | | | |
| Architecture Board Chair | | | |