diff --git a/# Aegis — Plan de Implementación Consolidado v3.0 b/# Aegis — Plan de Implementación Consolidado v3.0 new file mode 100644 index 0000000..3e81916 --- /dev/null +++ b/# Aegis — Plan de Implementación Consolidado v3.0 @@ -0,0 +1,4440 @@ +# Aegis — Plan de Implementación Consolidado v3.0 + +**Documento de Referencia para Ralph y Claude Code** +**Fecha:** 17 de febrero de 2026 +**Alcance:** Deuda técnica + Features nuevas + Transformación a Detection Assurance Platform +**Estimación total:** ~42-52 semanas + +--- + +## Visión + +Aegis evoluciona en tres etapas: + +1. **Fundamentos (Fase 0):** Resolver deuda técnica bloqueante +2. **Features Operativas (Fases 1-7):** Integrations, reporting, compliance, inteligencia +3. **Detection Assurance Platform (Fases 8-14):** Transformar Aegis de tracker MITRE a sistema de garantía continua de detección — donde cada detección tiene ciclo de vida, ownership, salud medible, y el sistema orquesta proactivamente la revalidación + +--- + +## Índice de Fases + +| Fase | Nombre | Duración | Prioridad/Dependencias | +|------|--------|----------|----------------------| +| 0 | Fundamentos Técnicos (deuda técnica bloqueante) | 2 semanas | CRÍTICA — sin dependencias | +| 1 | Integración Jira + Tempo | 3 semanas | Fase 0 (Redis) | +| 2 | Motor de Reporting Profesional | 3-4 semanas | Fase 0 (excepciones de dominio) | +| 3 | Compliance & Hardening de Seguridad | 2-3 semanas | Fase 0 (audit mejorado) | +| 4 | Inteligencia Automática (OSINT + Stale Detection) | 2-3 semanas | Fase 0 (índices BD) | +| 5 | Gestión Operativa Avanzada | 2-3 semanas | Fase 0 (scoring en BD) | +| 6 | Analytics para BI + Webhooks | 2 semanas | Fase 2 | +| 7 | Notificaciones Multi-Canal | 1-2 semanas | Fase 3 | +| 8 | Detection Lifecycle Management (DLM) | 3-4 semanas | Fase 0 (Redis, índices, excepciones) | +| 9 | Ownership & Operativa Diaria | 2-3 semanas | Fase 8 | +| 10 | Attack Paths & Purple Team Avanzado | 3-4 semanas | Fases 8, 9 | +| 11 | Knowledge Management | 2-3 semanas | Fase 8 | +| 12 | Risk Intelligence & Recomendaciones | 2-3 semanas | Fases 4, 8, 9 | +| 13 | Alertas Inteligentes & Integraciones Avanzadas | 2-3 semanas | Fases 6, 8, 9, 12 | +| 14 | SSO/SAML & API Keys (Enterprise Readiness) | 2 semanas | Fase 0 | + +> **Nota sobre paralelismo:** Las Fases 1-7 y la Fase 8 pueden ejecutarse en paralelo tras completar Fase 0, ya que no tienen dependencias cruzadas directas. Sin embargo, se recomienda completar al menos Fases 0-5 antes de iniciar Fase 8 para tener una base operativa sólida. + +--- + +## Mapa de Dependencias +``` +Fase 0 (Fundamentos) ───────────────────────────────────────────────── + │ │ + ├──► Fase 1 (Jira + Tempo) ─── necesita Redis │ + │ │ + ├──► Fase 2 (Reporting) ────── necesita excepciones de dominio │ + │ │ │ + │ └──► Fase 6 (Analytics/Webhooks) │ + │ │ │ + │ └──► Fase 13 (Alertas) ◄── usa webhooks │ + │ ▲ como canal │ + │ │ │ + ├──► Fase 3 (Compliance) ───── necesita audit mejorado │ + │ │ │ + │ └──► Fase 7 (Notificaciones) │ + │ │ + ├──► Fase 4 (Intel Auto) ───── necesita índices BD │ + │ │ │ + │ └──► Fase 12 (Risk) ◄── usa OsintItem │ + │ ▲ │ + │ │ │ + ├──► Fase 5 (Operativa) ────── necesita scoring en BD │ + │ │ + ├──► Fase 8 (Detection Lifecycle) ───────────────────────────── │ + │ │ │ + │ ├──► Fase 9 (Ownership) ── necesita decay + confidence │ + │ │ │ │ + │ │ ├──► Fase 10 (Attack Paths) │ + │ │ │ necesita ownership + revalidation │ + │ │ │ │ + │ │ └──► Fase 12 (Risk & Recomendaciones) │ + │ │ necesita confidence + ownership │ + │ │ │ + │ ├──► Fase 11 (Knowledge Management) ── independiente │ + │ │ │ + │ └──► Fase 13 (Alertas Inteligentes) │ + │ necesita todas las fuentes anteriores │ + │ │ + └──► Fase 14 (Enterprise/SSO) ── independiente │ +``` + +> **Nota:** Fase 4 (Stale Detection) implementa una versión simple de detección de obsolescencia. Fase 8 (Decay Engine) la reemplaza con un motor completo y configurable. Fase 4 sirve como stepping stone y puede coexistir — el decay engine de Fase 8 es la versión definitiva. + +--- + +## FASE 0 — Fundamentos Técnicos + +**Por qué primero:** Las nuevas features (Jira, Tempo, reports, DLM, etc.) necesitan Redis, excepciones de dominio limpias, e índices en BD para no colapsar el sistema. Esta fase resuelve la deuda técnica que bloquea todo lo demás. + +**Duración estimada:** 2 semanas +**Dependencias:** Ninguna + +--- + +### Tarea 0.1: Redis como servicio de infraestructura + +**Qué:** Añadir un contenedor Redis al stack Docker Compose para token blacklist, cache de scores y futuras colas. + +**Implementación:** + +1. **docker-compose.yml y docker-compose.prod.yml** — Agregar servicio Redis: +```yaml +redis: + image: redis:7-alpine + command: redis-server --appendonly yes --maxmemory 256mb --maxmemory-policy allkeys-lru + ports: + - "6379:6379" # solo en dev + volumes: + - aegis_redis_data:/data + healthcheck: + test: ["CMD", "redis-cli", "ping"] + interval: 5s + timeout: 3s + retries: 5 + restart: always +``` + +2. **requirements.txt** — Añadir `redis>=5.0.0` + +3. **app/config.py** — Nuevas variables: +```python +REDIS_URL: str = "redis://redis:6379/0" +REDIS_TOKEN_BLACKLIST_DB: int = 1 +REDIS_CACHE_DB: int = 2 +``` + +4. **app/infrastructure/redis_client.py** — Cliente singleton: +```python +import redis +from app.config import settings + +_redis_client: redis.Redis | None = None + +def get_redis() -> redis.Redis: + global _redis_client + if _redis_client is None: + _redis_client = redis.from_url( + settings.REDIS_URL, + decode_responses=True + ) + return _redis_client +``` + +5. **Backend Depends en docker-compose** — Añadir `redis: condition: service_healthy` + +**Verificación:** +- `docker compose up redis` arranca y responde a `redis-cli ping` con `PONG` +- Test unitario: conectar, set/get una clave, verificar TTL funciona +- Health check Docker pasa en `docker compose ps` + +--- + +### Tarea 0.2: Mover Token Blacklist a Redis (SEC-001) + +**Qué:** Reemplazar el `set()` en memoria de auth.py por Redis con TTL automático. + +**Implementación:** + +1. **app/auth.py** — Reemplazar: +```python +# ANTES (eliminar): +_blacklisted_tokens: set[str] = set() + +# DESPUÉS: +from app.infrastructure.redis_client import get_redis + +def blacklist_token(token: str, expires_in_seconds: int): + """Añade token al blacklist con TTL igual a su tiempo restante de expiración.""" + r = get_redis() + r.setex(f"blacklist:{token}", expires_in_seconds, "1") + +def is_token_blacklisted(token: str) -> bool: + r = get_redis() + return r.exists(f"blacklist:{token}") > 0 +``` + +2. **Endpoint de logout** — Calcular TTL restante del JWT antes de blacklistear: +```python +from jose import jwt +import time + +payload = jwt.decode(token, settings.SECRET_KEY, algorithms=[settings.ALGORITHM]) +exp = payload.get("exp", 0) +ttl = max(int(exp - time.time()), 0) +blacklist_token(token, ttl) +``` + +3. **Middleware de verificación** — En `get_current_user`, añadir check: +```python +if is_token_blacklisted(token): + raise HTTPException(status_code=401, detail="Token has been revoked") +``` + +**Verificación:** +- Login → obtener token → Logout → usar mismo token → debe dar 401 +- Reiniciar backend container → token sigue blacklisteado en Redis +- Verificar en Redis CLI: `keys blacklist:*` muestra tokens activos +- TTL correcto: `ttl blacklist:` debe ser menor al token expiration + +--- + +### Tarea 0.3: Índices de Base de Datos (SR-006) + +**Qué:** Crear migración Alembic con los índices compuestos faltantes. + +**Implementación:** + +1. Crear nueva migración Alembic: +```bash +alembic revision --autogenerate -m "add_composite_indexes" +``` + +2. En el archivo de migración generado, agregar los índices manualmente: +```python +def upgrade(): + # Tests - usado por scoring, heatmap, metrics, reports + op.create_index('ix_tests_technique_id_state', 'tests', ['technique_id', 'state']) + op.create_index('ix_tests_state_red_validated_at', 'tests', ['state', 'red_validated_at']) + op.create_index('ix_tests_remediation_status', 'tests', ['remediation_status']) + op.create_index('ix_tests_campaign_id', 'tests', ['campaign_id']) + + # Techniques + op.create_index('ix_techniques_tactic', 'techniques', ['tactic']) + op.create_index('ix_techniques_status_global', 'techniques', ['status_global']) + + # Audit logs - para MTTD/MTTR + op.create_index('ix_audit_logs_entity_type_entity_id', 'audit_logs', ['entity_type', 'entity_id', 'action']) + op.create_index('ix_audit_logs_timestamp', 'audit_logs', ['timestamp']) + op.create_index('ix_audit_logs_user_id', 'audit_logs', ['user_id']) + + # Detection rules + op.create_index('ix_detection_rules_mitre_technique_id', 'detection_rules', ['mitre_technique_id']) + + # Notifications + op.create_index('ix_notifications_user_id_is_read', 'notifications', ['user_id', 'is_read']) + +def downgrade(): + op.drop_index('ix_tests_technique_id_state') + op.drop_index('ix_tests_state_red_validated_at') + # ... drop todos +``` + +**Verificación:** +- `alembic upgrade head` ejecuta sin error +- `\di` en psql muestra todos los nuevos índices +- Query `EXPLAIN ANALYZE SELECT * FROM tests WHERE technique_id = '...' AND state = 'validated'` usa el índice (Index Scan, no Seq Scan) + +--- + +### Tarea 0.4: Excepciones de Dominio + Error Handler Middleware (TD-003) + +**Qué:** Crear excepciones de dominio y un middleware que las mapee a HTTP, para eliminar HTTPException de los servicios. + +**Implementación:** + +1. **app/domain/exceptions.py** (nuevo archivo): +```python +class DomainException(Exception): + """Base para todas las excepciones de dominio.""" + def __init__(self, message: str, code: str = "DOMAIN_ERROR"): + self.message = message + self.code = code + super().__init__(message) + +class EntityNotFoundError(DomainException): + def __init__(self, entity: str, identifier: str): + super().__init__(f"{entity} not found: {identifier}", "NOT_FOUND") + self.entity = entity + self.identifier = identifier + +class DuplicateEntityError(DomainException): + def __init__(self, entity: str, field: str, value: str): + super().__init__(f"{entity} with {field}='{value}' already exists", "DUPLICATE") + +class InvalidTransitionError(DomainException): + def __init__(self, current_state: str, target_state: str): + super().__init__( + f"Cannot transition from '{current_state}' to '{target_state}'", + "INVALID_TRANSITION" + ) + +class InvalidOperationError(DomainException): + def __init__(self, message: str): + super().__init__(message, "INVALID_OPERATION") + +class AuthorizationError(DomainException): + def __init__(self, message: str = "Insufficient permissions"): + super().__init__(message, "FORBIDDEN") +``` + +2. **app/middleware/error_handler.py** (nuevo): +```python +from fastapi import Request +from fastapi.responses import JSONResponse +from app.domain.exceptions import ( + EntityNotFoundError, DuplicateEntityError, + InvalidTransitionError, InvalidOperationError, + AuthorizationError, DomainException +) + +EXCEPTION_STATUS_MAP = { + EntityNotFoundError: 404, + DuplicateEntityError: 409, + InvalidTransitionError: 400, + InvalidOperationError: 400, + AuthorizationError: 403, +} + +async def domain_exception_handler(request: Request, exc: DomainException): + status = EXCEPTION_STATUS_MAP.get(type(exc), 400) + return JSONResponse( + status_code=status, + content={"detail": exc.message, "code": exc.code} + ) +``` + +3. **app/main.py** — Registrar handler: +```python +from app.domain.exceptions import DomainException +from app.middleware.error_handler import domain_exception_handler + +app.add_exception_handler(DomainException, domain_exception_handler) +``` + +4. **test_workflow_service.py** — Reemplazar HTTPException por excepciones de dominio: +```python +# ANTES: +from fastapi import HTTPException +raise HTTPException(status_code=400, detail={...}) + +# DESPUÉS: +from app.domain.exceptions import InvalidTransitionError +raise InvalidTransitionError(current_state=test.state, target_state="red_executing") +``` + +5. Repetir para `campaign_service.py` (el otro servicio que importa HTTPException). + +**Verificación:** +- `grep -r "from fastapi import HTTPException" app/services/` devuelve 0 resultados +- Tests existentes de workflow siguen pasando (errores 400 se mantienen) +- Nuevo test: llamar endpoint con transición inválida → respuesta JSON con `{code: "INVALID_TRANSITION"}` + +--- + +### Tarea 0.5: Arreglar Excepciones Silenciadas (TD-007) + +**Qué:** Reemplazar `except Exception: pass` por logging en workflow service. + +**Implementación:** + +En `test_workflow_service.py`, localizar los 4 bloques `except Exception: pass` y reemplazar: +```python +# ANTES: +try: + notify_test_state_change(...) +except Exception: + pass + +# DESPUÉS: +import logging +logger = logging.getLogger(__name__) + +try: + notify_test_state_change(...) +except Exception as e: + logger.warning(f"Notification failed for test {test.id}: {e}", exc_info=True) +``` + +**Verificación:** +- `grep -r "except Exception: pass" app/` devuelve 0 resultados +- Simular fallo de notificación → ver warning en logs del backend +- Workflow sigue funcionando aunque notificación falle + +--- + +### Tarea 0.6: CI/CD Básico con GitHub Actions (TD-009) + +**Qué:** Pipeline mínimo: lint + tests. + +**Implementación:** + +**.github/workflows/ci.yml:** +```yaml +name: Aegis CI +on: + push: + branches: [main, develop] + pull_request: + branches: [main] + +jobs: + lint-and-test: + runs-on: ubuntu-latest + services: + postgres: + image: postgres:15-alpine + env: + POSTGRES_DB: testdb + POSTGRES_USER: test + POSTGRES_PASSWORD: test + ports: ["5432:5432"] + options: --health-cmd pg_isready --health-interval 10s + redis: + image: redis:7-alpine + ports: ["6379:6379"] + options: --health-cmd "redis-cli ping" --health-interval 10s + + steps: + - uses: actions/checkout@v4 + - uses: actions/setup-python@v5 + with: + python-version: '3.11' + - name: Install deps + run: | + cd backend + pip install -r requirements.txt + pip install ruff pytest + - name: Lint + run: cd backend && ruff check app/ + - name: Test + env: + DATABASE_URL: postgresql://test:test@localhost:5432/testdb + REDIS_URL: redis://localhost:6379/0 + SECRET_KEY: test-secret-key-for-ci + run: cd backend && pytest tests/ -v --tb=short +``` + +**Verificación:** +- Push a branch → GitHub Actions ejecuta sin error +- Badge verde en README +- Lint falla si introduces código mal formateado + +--- + +## FASE 1 — Integración Jira + Tempo + +**Duración estimada:** 3 semanas +**Dependencias:** Fase 0 (Redis) + +--- + +### Tarea 1.1: Modelo de Datos — Tabla de Asociaciones Jira + +**Qué:** Crear modelo y migración para vincular entidades Aegis con tickets Jira. + +**Implementación:** + +1. **app/models/jira_link.py** (nuevo): +```python +import uuid +from datetime import datetime +from sqlalchemy import Column, String, DateTime, ForeignKey, Enum as SQLEnum, Text +from sqlalchemy.dialects.postgresql import UUID, JSONB +from sqlalchemy.orm import relationship +from app.database import Base +import enum + +class JiraLinkEntityType(str, enum.Enum): + test = "test" + technique = "technique" + campaign = "campaign" + evidence = "evidence" + +class JiraSyncDirection(str, enum.Enum): + aegis_to_jira = "aegis_to_jira" + jira_to_aegis = "jira_to_aegis" + bidirectional = "bidirectional" + +class JiraLink(Base): + __tablename__ = "jira_links" + + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + entity_type = Column(SQLEnum(JiraLinkEntityType), nullable=False) + entity_id = Column(UUID(as_uuid=True), nullable=False, index=True) + jira_issue_key = Column(String(50), nullable=False, index=True) # e.g. "SEC-1234" + jira_issue_id = Column(String(50)) # Jira internal numeric ID + jira_project_key = Column(String(20)) + jira_status = Column(String(100)) + jira_priority = Column(String(50)) + jira_assignee = Column(String(255)) + jira_story_points = Column(String(10)) + sync_direction = Column(SQLEnum(JiraSyncDirection), default=JiraSyncDirection.bidirectional) + last_synced_at = Column(DateTime) + sync_metadata = Column(JSONB, default={}) + created_by = Column(UUID(as_uuid=True), ForeignKey("users.id")) + created_at = Column(DateTime, default=datetime.utcnow) + updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow) +``` + +2. **Alembic migration:** +```bash +alembic revision --autogenerate -m "add_jira_links_table" +alembic upgrade head +``` + +3. **app/schemas/jira_schema.py** (nuevo): +```python +from pydantic import BaseModel, Field +from typing import Optional +from uuid import UUID +from datetime import datetime +from app.models.jira_link import JiraLinkEntityType, JiraSyncDirection + +class JiraLinkCreate(BaseModel): + entity_type: JiraLinkEntityType + entity_id: UUID + jira_issue_key: str = Field(..., pattern=r'^[A-Z]+-\d+$') + sync_direction: JiraSyncDirection = JiraSyncDirection.bidirectional + +class JiraLinkOut(BaseModel): + id: UUID + entity_type: JiraLinkEntityType + entity_id: UUID + jira_issue_key: str + jira_status: Optional[str] + jira_priority: Optional[str] + jira_assignee: Optional[str] + last_synced_at: Optional[datetime] + created_at: datetime + + class Config: + from_attributes = True + +class JiraIssueSearch(BaseModel): + query: str + +class JiraIssueResult(BaseModel): + issue_key: str + summary: str + status: str + assignee: Optional[str] + priority: Optional[str] +``` + +**Verificación:** +- Migración ejecuta sin error +- Tabla `jira_links` existe con todas las columnas e índices +- Schema valida que `jira_issue_key` tenga formato `PROJ-123` + +--- + +### Tarea 1.2: Servicio de Integración Jira + +**Qué:** Servicio que encapsula toda la comunicación con Jira REST API usando `atlassian-python-api`. + +**Implementación:** + +1. **requirements.txt** — Añadir: `atlassian-python-api>=4.0.0` + +2. **app/config.py** — Nuevas settings: +```python +# Jira Integration +JIRA_ENABLED: bool = False +JIRA_URL: str = "" +JIRA_USERNAME: str = "" +JIRA_API_TOKEN: str = "" +JIRA_IS_CLOUD: bool = True +JIRA_DEFAULT_PROJECT: str = "" +JIRA_ISSUE_TYPE_TEST: str = "Task" +JIRA_ISSUE_TYPE_CAMPAIGN: str = "Epic" +``` + +3. **app/services/jira_service.py** (nuevo): +```python +import logging +from typing import Optional +from atlassian import Jira +from app.config import settings +from sqlalchemy.orm import Session +from app.models.jira_link import JiraLink, JiraLinkEntityType +from datetime import datetime + +logger = logging.getLogger(__name__) + +_jira_client: Optional[Jira] = None + +def get_jira_client() -> Jira: + global _jira_client + if not settings.JIRA_ENABLED: + raise InvalidOperationError("Jira integration is not enabled") + if _jira_client is None: + _jira_client = Jira( + url=settings.JIRA_URL, + username=settings.JIRA_USERNAME, + password=settings.JIRA_API_TOKEN, + cloud=settings.JIRA_IS_CLOUD, + ) + return _jira_client + +def search_jira_issues(query: str, max_results: int = 10) -> list[dict]: + """Busca issues en Jira por JQL o texto.""" + jira = get_jira_client() + jql = query if " " not in query or "=" in query else f'summary ~ "{query}"' + results = jira.jql(jql, limit=max_results) + return [ + { + "issue_key": issue["key"], + "summary": issue["fields"]["summary"], + "status": issue["fields"]["status"]["name"], + "assignee": (issue["fields"].get("assignee") or {}).get("displayName"), + "priority": (issue["fields"].get("priority") or {}).get("name"), + } + for issue in results.get("issues", []) + ] + +def create_jira_issue( + project_key: str, + summary: str, + description: str, + issue_type: str = "Task", + labels: list[str] = None, + custom_fields: dict = None, +) -> dict: + """Crea un issue en Jira y retorna key + id.""" + jira = get_jira_client() + fields = { + "project": {"key": project_key}, + "summary": summary, + "description": description, + "issuetype": {"name": issue_type}, + } + if labels: + fields["labels"] = labels + if custom_fields: + fields.update(custom_fields) + + result = jira.issue_create(fields=fields) + return {"issue_key": result["key"], "issue_id": result["id"]} + +def sync_aegis_to_jira(db: Session, link: JiraLink, entity_data: dict): + """Sincroniza datos de Aegis hacia Jira (añade comentario con resultados).""" + jira = get_jira_client() + comment_body = _build_sync_comment(entity_data) + jira.issue_add_comment(link.jira_issue_key, comment_body) + link.last_synced_at = datetime.utcnow() + db.commit() + +def sync_jira_to_aegis(db: Session, link: JiraLink): + """Sincroniza estado de Jira hacia Aegis.""" + jira = get_jira_client() + issue = jira.issue(link.jira_issue_key) + link.jira_status = issue["fields"]["status"]["name"] + link.jira_priority = (issue["fields"].get("priority") or {}).get("name") + link.jira_assignee = (issue["fields"].get("assignee") or {}).get("displayName") + link.jira_story_points = issue["fields"].get("customfield_10016") + link.last_synced_at = datetime.utcnow() + db.commit() + +def _build_sync_comment(data: dict) -> str: + """Genera un comentario formateado para Jira.""" + lines = ["h3. Aegis Sync Update", ""] + for key, value in data.items(): + lines.append(f"*{key}:* {value}") + lines.append(f"\n_Synced at {datetime.utcnow().isoformat()}_") + return "\n".join(lines) +``` + +**Verificación:** +- Con `JIRA_ENABLED=False`, llamar a `get_jira_client()` lanza `InvalidOperationError` +- Con credenciales válidas de test: `search_jira_issues("project = TEST")` retorna issues +- `create_jira_issue(...)` crea ticket y retorna key válida +- Mock test: verificar que `sync_aegis_to_jira` llama a `issue_add_comment` + +--- + +### Tarea 1.3: Router de Jira Links + +**Qué:** Endpoints para crear/listar/sincronizar asociaciones Jira. + +**Implementación:** + +**app/routers/jira.py** (nuevo): +```python +from fastapi import APIRouter, Depends, Query +from sqlalchemy.orm import Session +from uuid import UUID +from typing import Optional +from app.database import get_db +from app.dependencies.auth import get_current_user, require_role +from app.models.jira_link import JiraLink, JiraLinkEntityType +from app.schemas.jira_schema import ( + JiraLinkCreate, JiraLinkOut, JiraIssueSearch, JiraIssueResult +) +from app.services import jira_service, audit_service + +router = APIRouter(prefix="/jira", tags=["jira"]) + +@router.get("/search", response_model=list[JiraIssueResult]) +def search_issues( + q: str = Query(..., min_length=2), + max_results: int = Query(10, le=50), + user=Depends(get_current_user), +): + """Buscar issues en Jira por JQL o texto.""" + return jira_service.search_jira_issues(q, max_results) + +@router.post("/links", response_model=JiraLinkOut, status_code=201) +def create_link( + body: JiraLinkCreate, + db: Session = Depends(get_db), + user=Depends(get_current_user), +): + """Asociar una entidad Aegis con un ticket Jira.""" + link = JiraLink( + entity_type=body.entity_type, + entity_id=body.entity_id, + jira_issue_key=body.jira_issue_key, + sync_direction=body.sync_direction, + created_by=user.id, + ) + jira_service.sync_jira_to_aegis(db, link) + db.add(link) + db.commit() + db.refresh(link) + audit_service.log_action(db, user.id, "JIRA_LINK_CREATED", "jira_link", str(link.id), + details={"entity_type": body.entity_type, "issue_key": body.jira_issue_key}) + return link + +@router.get("/links", response_model=list[JiraLinkOut]) +def list_links( + entity_type: Optional[JiraLinkEntityType] = None, + entity_id: Optional[UUID] = None, + db: Session = Depends(get_db), + user=Depends(get_current_user), +): + """Listar asociaciones Jira, filtrando opcionalmente por entidad.""" + query = db.query(JiraLink) + if entity_type: + query = query.filter(JiraLink.entity_type == entity_type) + if entity_id: + query = query.filter(JiraLink.entity_id == entity_id) + return query.order_by(JiraLink.created_at.desc()).all() + +@router.post("/links/{link_id}/sync") +def sync_link( + link_id: UUID, + db: Session = Depends(get_db), + user=Depends(require_role("admin")), +): + """Forzar sincronización bidireccional de un link.""" + link = db.query(JiraLink).filter(JiraLink.id == link_id).first() + if not link: + raise EntityNotFoundError("JiraLink", str(link_id)) + jira_service.sync_jira_to_aegis(db, link) + return {"message": "Sync completed", "jira_status": link.jira_status} + +@router.post("/create-issue") +def create_issue_from_entity( + entity_type: JiraLinkEntityType, + entity_id: UUID, + db: Session = Depends(get_db), + user=Depends(get_current_user), +): + """Auto-crear un ticket Jira desde una entidad Aegis.""" + summary, description = _build_issue_data(db, entity_type, entity_id) + result = jira_service.create_jira_issue( + project_key=settings.JIRA_DEFAULT_PROJECT, + summary=summary, + description=description, + labels=["aegis", entity_type.value], + ) + link = JiraLink( + entity_type=entity_type, + entity_id=entity_id, + jira_issue_key=result["issue_key"], + jira_issue_id=result["issue_id"], + jira_project_key=settings.JIRA_DEFAULT_PROJECT, + created_by=user.id, + ) + db.add(link) + db.commit() + return {"issue_key": result["issue_key"], "link_id": str(link.id)} +``` + +Registrar en main.py: `app.include_router(jira_router, prefix="/api/v1")` + +**Verificación:** +- `POST /api/v1/jira/links` con issue_key inválido (sin formato) → 422 +- `POST /api/v1/jira/links` con issue_key válido → 201 + datos de Jira populados +- `GET /api/v1/jira/links?entity_type=test&entity_id=...` → lista filtrada +- Audit log registra la acción `JIRA_LINK_CREATED` + +--- + +### Tarea 1.4: Servicio de Integración Tempo + +**Qué:** Servicio para registrar worklogs automáticos en Tempo desde tests completados. + +**Implementación:** + +1. **requirements.txt** — Añadir: `tempo-api-python-client>=0.8.0` + +2. **app/config.py** — Nuevas settings: +```python +TEMPO_ENABLED: bool = False +TEMPO_API_TOKEN: str = "" +TEMPO_API_VERSION: int = 4 +TEMPO_DEFAULT_WORK_TYPE: str = "Red Team" +``` + +3. **app/services/tempo_service.py** (nuevo): +```python +import logging +from typing import Optional +from tempoapiclient import client_v4 as tempo_client +from app.config import settings +from sqlalchemy.orm import Session +from app.models.jira_link import JiraLink + +logger = logging.getLogger(__name__) + +def get_tempo_client(): + if not settings.TEMPO_ENABLED: + raise InvalidOperationError("Tempo integration is not enabled") + return tempo_client.Tempo(auth_token=settings.TEMPO_API_TOKEN) + +def log_worklog( + jira_issue_id: int, + author_account_id: str, + date: str, + time_spent_seconds: int, + description: str, + work_type: str = None, +) -> dict: + """Registra un worklog en Tempo.""" + tempo = get_tempo_client() + worklog = tempo.create_worklog( + accountId=author_account_id, + issueId=jira_issue_id, + dateFrom=date, + timeSpentSeconds=time_spent_seconds, + description=description, + ) + return worklog + +def auto_log_test_worklog(db: Session, test, user, activity_type: str): + """Si el test tiene un Jira link, registrar tiempo automáticamente en Tempo.""" + if not settings.TEMPO_ENABLED: + return + + link = db.query(JiraLink).filter( + JiraLink.entity_id == test.id, + JiraLink.entity_type == "test" + ).first() + + if not link or not link.jira_issue_id: + logger.debug(f"No Jira link for test {test.id}, skipping Tempo worklog") + return + + duration = _calculate_duration(test, activity_type) + if duration <= 0: + return + + try: + log_worklog( + jira_issue_id=int(link.jira_issue_id), + author_account_id=user.jira_account_id or "", + date=test.updated_at.strftime("%Y-%m-%d"), + time_spent_seconds=duration, + description=f"[Aegis] {activity_type}: {test.name}", + ) + logger.info(f"Tempo worklog created for test {test.id}, {duration}s") + except Exception as e: + logger.warning(f"Tempo worklog failed for test {test.id}: {e}") +``` + +**Verificación:** +- Con `TEMPO_ENABLED=False`, la función retorna silenciosamente +- Mock test: verificar que `create_worklog` se llama con parámetros correctos +- Integración: crear worklog en Tempo sandbox y verificar que aparece + +--- + +### Tarea 1.5: Worklog Interno Auditado + +**Qué:** Tabla de registro de tiempo interno e inmutable, previo a envío a Tempo. + +**Implementación:** + +1. **app/models/worklog.py** (nuevo): +```python +import uuid +from datetime import datetime +from sqlalchemy import Column, String, Integer, DateTime, ForeignKey, Text +from sqlalchemy.dialects.postgresql import UUID, JSONB +from app.database import Base + +class Worklog(Base): + __tablename__ = "worklogs" + + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + entity_type = Column(String(50), nullable=False) + entity_id = Column(UUID(as_uuid=True), nullable=False, index=True) + user_id = Column(UUID(as_uuid=True), ForeignKey("users.id"), nullable=False) + activity_type = Column(String(100), nullable=False) + started_at = Column(DateTime, nullable=False) + ended_at = Column(DateTime) + duration_seconds = Column(Integer, nullable=False) + description = Column(Text) + tempo_synced = Column(DateTime) + tempo_worklog_id = Column(String(100)) + integrity_hash = Column(String(64)) + created_at = Column(DateTime, default=datetime.utcnow) + metadata = Column(JSONB, default={}) +``` + +2. **app/services/worklog_service.py** — funciones CRUD + cálculo de hash de integridad: +```python +import hashlib +from datetime import datetime +from sqlalchemy.orm import Session +from app.models.worklog import Worklog + +def create_worklog(db: Session, **kwargs) -> Worklog: + wl = Worklog(**kwargs) + wl.integrity_hash = _compute_hash(wl) + db.add(wl) + db.commit() + db.refresh(wl) + return wl + +def _compute_hash(wl: Worklog) -> str: + data = f"{wl.entity_type}:{wl.entity_id}:{wl.user_id}:{wl.activity_type}:{wl.started_at}:{wl.duration_seconds}" + return hashlib.sha256(data.encode()).hexdigest() + +def verify_worklog_integrity(wl: Worklog) -> bool: + return wl.integrity_hash == _compute_hash(wl) +``` + +3. **app/routers/worklogs.py** — Endpoints para consultar y crear worklogs manuales. + +**Verificación:** +- Crear worklog → `integrity_hash` se genera automáticamente +- Modificar un campo en DB directo → `verify_worklog_integrity()` retorna `False` +- Listar worklogs por entidad funciona con filtros + +--- + +### Tarea 1.6: Frontend — Componente JiraLink + Tempo + +**Qué:** Componente React para vincular entidades con Jira y ver worklogs. + +**Implementación:** +- `src/api/jira.ts` — funciones API (search, createLink, listLinks, syncLink) +- `src/components/JiraLinkPanel.tsx` — Panel con buscar, vincular y ver estado del ticket +- `src/components/WorklogTimeline.tsx` — Timeline de worklogs con duración y tipo +- Integrar JiraLinkPanel en páginas de Test Detail, Campaign Detail y Technique Detail +- `src/types/models.ts` — Añadir tipos JiraLink, Worklog, etc. + +**Verificación:** +- En Test Detail, el panel de Jira aparece y permite buscar issues +- Vincular un issue muestra su estado actualizado +- Timeline de worklogs muestra registro cronológico + +--- + +### Tarea 1.7: Job de Sincronización Jira Automática + +**Qué:** Job APScheduler que sincroniza el estado de Jira links cada hora. + +**Implementación:** + +En `jobs/jira_sync_job.py`: +```python +from app.database import SessionLocal +from app.models.jira_link import JiraLink +from app.services import jira_service +from app.config import settings +import logging + +logger = logging.getLogger(__name__) + +def sync_all_jira_links(): + if not settings.JIRA_ENABLED: + return + db = SessionLocal() + try: + links = db.query(JiraLink).all() + synced = 0 + for link in links: + try: + jira_service.sync_jira_to_aegis(db, link) + synced += 1 + except Exception as e: + logger.warning(f"Jira sync failed for link {link.id}: {e}") + logger.info(f"Jira sync completed: {synced}/{len(links)} links updated") + finally: + db.close() +``` + +Registrar en scheduler: `scheduler.add_job(sync_all_jira_links, "interval", hours=1, replace_existing=True)` + +**Verificación:** +- Job aparece en logs al arrancar el backend +- Tras 1 hora (o trigger manual), los links muestran `last_synced_at` actualizado +- Links con issues borrados en Jira se logean como warning sin crashear + +--- + +## FASE 2 — Motor de Reporting Profesional + +**Duración estimada:** 3-4 semanas +**Dependencias:** Fase 0 (excepciones de dominio) + +--- + +### Tarea 2.1: Motor de Plantillas — Backend + +**Qué:** Sistema de plantillas Jinja2 que genera HTML, que luego se convierte a PDF con WeasyPrint y a DOCX con docxtpl. + +**Implementación:** + +1. **requirements.txt** — Añadir: +``` +weasyprint>=62.0 +docxtpl>=0.18.0 +Jinja2>=3.1.0 # (ya instalado por FastAPI) +``` + +2. **app/config.py:** +```python +REPORT_TEMPLATES_DIR: str = "app/templates/reports" +REPORT_OUTPUT_DIR: str = "/tmp/aegis_reports" +COMPANY_NAME: str = "Organization" +COMPANY_LOGO_PATH: str = "app/templates/reports/assets/logo.png" +``` + +3. **app/services/report_engine.py** (nuevo): +```python +import os +import uuid +from datetime import datetime +from jinja2 import Environment, FileSystemLoader +from weasyprint import HTML, CSS +from docxtpl import DocxTemplate +from app.config import settings + +class ReportEngine: + def __init__(self): + self.jinja_env = Environment( + loader=FileSystemLoader(settings.REPORT_TEMPLATES_DIR), + autoescape=True + ) + os.makedirs(settings.REPORT_OUTPUT_DIR, exist_ok=True) + + def render_html(self, template_name: str, context: dict) -> str: + template = self.jinja_env.get_template(f"{template_name}.html") + context["company_name"] = settings.COMPANY_NAME + context["generated_at"] = datetime.utcnow().isoformat() + return template.render(context) + + def generate_pdf(self, template_name: str, context: dict) -> str: + html_content = self.render_html(template_name, context) + css_path = os.path.join(settings.REPORT_TEMPLATES_DIR, "styles", "report.css") + output_path = os.path.join( + settings.REPORT_OUTPUT_DIR, + f"{template_name}_{uuid.uuid4().hex[:8]}.pdf" + ) + css = CSS(filename=css_path) if os.path.exists(css_path) else None + stylesheets = [css] if css else [] + HTML(string=html_content, base_url=settings.REPORT_TEMPLATES_DIR).write_pdf( + output_path, stylesheets=stylesheets + ) + return output_path + + def generate_docx(self, template_name: str, context: dict) -> str: + template_path = os.path.join(settings.REPORT_TEMPLATES_DIR, f"{template_name}.docx") + output_path = os.path.join( + settings.REPORT_OUTPUT_DIR, + f"{template_name}_{uuid.uuid4().hex[:8]}.docx" + ) + doc = DocxTemplate(template_path) + context["company_name"] = settings.COMPANY_NAME + context["generated_at"] = datetime.utcnow().strftime("%B %d, %Y") + doc.render(context) + doc.save(output_path) + return output_path + + def generate_html(self, template_name: str, context: dict) -> str: + html_content = self.render_html(template_name, context) + output_path = os.path.join( + settings.REPORT_OUTPUT_DIR, + f"{template_name}_{uuid.uuid4().hex[:8]}.html" + ) + with open(output_path, "w") as f: + f.write(html_content) + return output_path + +report_engine = ReportEngine() +``` + +4. **Estructura de plantillas:** +``` +app/templates/reports/ +├── styles/ +│ └── report.css +├── assets/ +│ └── logo.png +├── purple_campaign.html +├── purple_campaign.docx +├── coverage_report.html +├── technique_detail.html +├── quarterly_summary.html +└── executive_summary.html +``` + +**Verificación:** +- `report_engine.generate_pdf("coverage_report", sample_context)` genera PDF legible +- `report_engine.generate_docx("purple_campaign", sample_context)` genera DOCX válido +- CSS se aplica correctamente (colores corporativos, headers, page breaks) + +--- + +### Tarea 2.2: Plantilla de Informe Purple Team + +**Qué:** La plantilla HTML Jinja2 para informes de campaña Purple Team. + +**Implementación:** + +**app/templates/reports/purple_campaign.html:** +```html + + + + + + + +
+ +

Purple Team Assessment Report

+

{{ campaign.name }}

+

{{ generated_at }}

+

{{ classification | default('INTERNAL') }}

+
+ +
+

Table of Contents

+
+ +
+

1. Executive Summary

+

Campaign {{ campaign.name }} tested + {{ tests | length }} techniques across {{ tactics | length }} tactics. + Overall coverage score: {{ org_score }}%.

+
+
+ {{ tests_validated }} + Validated +
+
+ {{ tests_detected }} + Detected +
+
+ {{ tests_not_detected }} + Not Detected +
+
+
+ +
+

2. Scope & Methodology

+

{{ campaign.description }}

+

Period: {{ campaign.start_date }} – {{ campaign.end_date }}

+

Threat actors modeled: {% for actor in threat_actors %}{{ actor.name }}{% if not loop.last %}, {% endif %}{% endfor %}

+
+ +
+

3. Techniques Tested

+ + + + + + + + + {% for test in tests %} + + + + + + + + {% endfor %} + +
MITRE IDNameTacticResultDetection
{{ test.technique_mitre_id }}{{ test.name }}{{ test.tactic }}{{ test.state }}{{ test.detection_result }}
+
+ +
+

4. Critical Findings

+ {% for finding in critical_findings %} +
+

{{ finding.technique_id }}: {{ finding.name }}

+

{{ finding.description }}

+

Recommendation: {{ finding.recommendation }}

+
+ {% endfor %} +
+ +
+

5. Coverage Evolution

+ {% if previous_campaign %} +

Compared to previous campaign ({{ previous_campaign.name }}): + Coverage changed from {{ previous_score }}% to {{ org_score }}%.

+ {% endif %} +
+ + + + +``` + +**Verificación:** +- Generar PDF con datos de campaña real → todas las secciones se renderizan +- Tabla de técnicas muestra colores según resultado (rojo/verde/amarillo) +- Page breaks funcionan entre secciones + +--- + +### Tarea 2.3: Servicio de Generación de Reportes + +**Qué:** Servicio que recopila datos del dominio y llama al ReportEngine. + +**Implementación:** + +**app/services/report_generation_service.py** (nuevo): +```python +from sqlalchemy.orm import Session +from app.services.report_engine import report_engine +from app.services import scoring_service +from app.models import Campaign, CampaignTest, Test, Technique, ThreatActor + +def generate_purple_campaign_report( + db: Session, campaign_id: str, output_format: str = "pdf" +) -> str: + """Genera el informe completo de una campaña Purple Team.""" + campaign = db.query(Campaign).filter(Campaign.id == campaign_id).first() + if not campaign: + raise EntityNotFoundError("Campaign", campaign_id) + + campaign_tests = ( + db.query(Test) + .join(CampaignTest) + .filter(CampaignTest.campaign_id == campaign_id) + .all() + ) + + tests_data = [] + for test in campaign_tests: + technique = db.query(Technique).filter(Technique.id == test.technique_id).first() + tests_data.append({ + "technique_mitre_id": technique.mitre_id if technique else "N/A", + "name": test.name, + "tactic": technique.tactic if technique else "N/A", + "state": test.state.value if test.state else "draft", + "detection_result": test.detection_result.value if test.detection_result else "pending", + }) + + validated = [t for t in campaign_tests if t.state and t.state.value == "validated"] + detected = [t for t in validated if t.detection_result and t.detection_result.value == "detected"] + not_detected = [t for t in validated if t.detection_result and t.detection_result.value == "not_detected"] + + critical_findings = [ + { + "technique_id": t.get("technique_mitre_id"), + "name": t.get("name"), + "severity": "critical", + "description": f"Technique not detected during campaign execution.", + "recommendation": "Implement detection rule or review existing SIEM rules.", + } + for t in tests_data if t["detection_result"] == "not_detected" + ] + + context = { + "campaign": campaign, + "tests": tests_data, + "tests_validated": len(validated), + "tests_detected": len(detected), + "tests_not_detected": len(not_detected), + "critical_findings": critical_findings, + "org_score": scoring_service.calculate_organization_score(db).get("overall", 0), + "tactics": list(set(t.get("tactic") for t in tests_data)), + "threat_actors": [], # TODO: poblar + } + + if output_format == "pdf": + return report_engine.generate_pdf("purple_campaign", context) + elif output_format == "docx": + return report_engine.generate_docx("purple_campaign", context) + else: + return report_engine.generate_html("purple_campaign", context) +``` + +--- + +### Tarea 2.4: Router de Reportes Profesionales + +**Implementación:** + +**app/routers/professional_reports.py** (nuevo): +```python +from fastapi import APIRouter, Depends, Query +from fastapi.responses import FileResponse +from sqlalchemy.orm import Session +from uuid import UUID +from app.database import get_db +from app.dependencies.auth import get_current_user, require_role +from app.services import report_generation_service + +router = APIRouter(prefix="/reports/generate", tags=["professional-reports"]) + +@router.get("/purple-campaign/{campaign_id}") +def generate_purple_report( + campaign_id: UUID, + format: str = Query("pdf", regex="^(pdf|docx|html)$"), + db: Session = Depends(get_db), + user=Depends(require_role("red_lead", "blue_lead", "admin")), +): + filepath = report_generation_service.generate_purple_campaign_report( + db, str(campaign_id), output_format=format + ) + media_types = { + "pdf": "application/pdf", + "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document", + "html": "text/html" + } + return FileResponse(filepath, media_type=media_types[format], + filename=f"purple_report.{format}") + +@router.get("/coverage-summary") +def generate_coverage_report( + format: str = Query("pdf", regex="^(pdf|docx|html)$"), + db: Session = Depends(get_db), + user=Depends(get_current_user), +): + filepath = report_generation_service.generate_coverage_report(db, output_format=format) + media_types = { + "pdf": "application/pdf", + "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document", + "html": "text/html" + } + return FileResponse(filepath, media_type=media_types[format], + filename=f"coverage_report.{format}") + +# Similares para: quarterly_summary, technique_detail, executive_summary +``` + +**Verificación:** +- `GET /api/v1/reports/generate/purple-campaign/{id}?format=pdf` → descarga PDF +- `GET /api/v1/reports/generate/purple-campaign/{id}?format=docx` → descarga DOCX +- PDF tiene portada, tabla de contenidos, datos de tests, hallazgos +- DOCX se abre correctamente en Word/LibreOffice + +--- + +### Tarea 2.5: Endpoints de Analytics para PowerBI + +**Qué:** Endpoints JSON limpios, sin lógica de presentación, optimizados para consumo BI. + +**Implementación:** + +**app/routers/analytics.py** (nuevo): +```python +router = APIRouter(prefix="/analytics", tags=["analytics"]) + +@router.get("/coverage") +def analytics_coverage(db: Session = Depends(get_db), user=Depends(get_current_user)): + """Cobertura por táctica, plataforma y estado — formato plano para BI.""" + techniques = db.query(Technique).all() + return [ + { + "mitre_id": t.mitre_id, + "name": t.name, + "tactic": t.tactic, + "status": t.status_global.value if t.status_global else "not_evaluated", + "test_count": len(t.tests) if t.tests else 0, + "review_required": t.review_required, + "last_review_date": t.last_review_date.isoformat() if t.last_review_date else None, + } + for t in techniques + ] + +@router.get("/tests") +def analytics_tests( + date_from: str = Query(None), date_to: str = Query(None), + db: Session = Depends(get_db), user=Depends(get_current_user) +): + """Todos los tests con timestamps — formato plano para BI.""" + query = db.query(Test) + if date_from: + query = query.filter(Test.created_at >= date_from) + if date_to: + query = query.filter(Test.created_at <= date_to) + tests = query.all() + return [ + { + "id": str(t.id), + "technique_id": str(t.technique_id), + "name": t.name, + "state": t.state.value if t.state else None, + "detection_result": t.detection_result.value if t.detection_result else None, + "created_at": t.created_at.isoformat() if t.created_at else None, + "platform": t.platform, + "tool_used": t.tool_used, + } + for t in tests + ] + +@router.get("/trends") +def analytics_trends(db: Session = Depends(get_db), user=Depends(get_current_user)): + """Snapshots históricos de cobertura para visualización de tendencias.""" + snapshots = db.query(CoverageSnapshot).order_by(CoverageSnapshot.created_at).all() + return [ + { + "date": s.created_at.isoformat(), + "total_techniques": s.total_techniques, + "validated_count": s.validated_count, + "coverage_percentage": s.coverage_percentage, + "org_score": s.org_score, + } + for s in snapshots + ] + +@router.get("/operators") +def analytics_operators(db: Session = Depends(get_db), user=Depends(require_role("admin"))): + """Métricas por operador — para gestión de carga.""" + from sqlalchemy import func + results = ( + db.query( + User.username, + User.role, + func.count(Test.id).label("test_count"), + ) + .outerjoin(Test, Test.created_by == User.id) + .group_by(User.id) + .all() + ) + return [{"username": r[0], "role": r[1], "test_count": r[2]} for r in results] +``` + +**Verificación:** +- Cada endpoint retorna JSON plano (no anidado más de 1 nivel) +- Filtros de fecha funcionan correctamente +- Datos pueden importarse directamente en PowerBI desde URL +- Sin paginación (datasets completos para BI) + +--- + +### Tarea 2.6: Métricas Avanzadas + +**Qué:** Añadir endpoints de métricas avanzadas: % cobertura por táctica, técnicas nunca probadas, tiempo medio de validación, heatmap histórico. + +**Implementación:** + +Extender `app/routers/metrics.py` o crear `app/routers/advanced_metrics.py`: +```python +@router.get("/coverage-by-tactic") +def coverage_by_tactic(db: Session = Depends(get_db), user=Depends(get_current_user)): + """% de cobertura desglosado por táctica MITRE.""" + from sqlalchemy import func, case + results = ( + db.query( + Technique.tactic, + func.count(Technique.id).label("total"), + func.sum(case((Technique.status_global == "validated", 1), else_=0)).label("validated"), + ) + .group_by(Technique.tactic) + .all() + ) + return [ + { + "tactic": r[0], + "total": r[1], + "validated": r[2], + "coverage_pct": round((r[2] / r[1]) * 100, 1) if r[1] > 0 else 0, + } + for r in results + ] + +@router.get("/never-tested") +def never_tested_techniques(db: Session = Depends(get_db), user=Depends(get_current_user)): + """Técnicas que nunca han sido probadas.""" + techniques = ( + db.query(Technique) + .filter(~Technique.id.in_(db.query(Test.technique_id).distinct())) + .order_by(Technique.mitre_id) + .all() + ) + return [{"mitre_id": t.mitre_id, "name": t.name, "tactic": t.tactic} for t in techniques] + +@router.get("/avg-validation-time") +def avg_validation_time(db: Session = Depends(get_db), user=Depends(get_current_user)): + """Tiempo medio desde creación del test hasta validación.""" + # Implementar con audit log queries optimizadas (batch) + pass +``` + +**Verificación:** +- `/coverage-by-tactic` retorna una fila por táctica con porcentaje +- `/never-tested` lista técnicas sin ningún test asociado +- Datos son consistentes con lo que muestra el heatmap existente + +--- + +## FASE 3 — Compliance & Hardening de Seguridad + +**Duración estimada:** 2-3 semanas +**Dependencias:** Fase 0 (audit mejorado) + +--- + +### Tarea 3.1: Audit Trail Mejorado (IP, Hash de Integridad) + +**Qué:** Ampliar el audit log con IP del request, hash de integridad y campos adicionales. + +**Implementación:** + +1. **Migración Alembic** — Añadir columnas a `audit_logs`: +```python +def upgrade(): + op.add_column('audit_logs', Column('ip_address', String(45))) + op.add_column('audit_logs', Column('user_agent', String(500))) + op.add_column('audit_logs', Column('integrity_hash', String(64))) + op.add_column('audit_logs', Column('session_id', String(100))) +``` + +2. **Middleware de request context** — Capturar IP: +```python +# app/middleware/request_context.py +from starlette.middleware.base import BaseHTTPMiddleware +from contextvars import ContextVar + +request_ip: ContextVar[str] = ContextVar("request_ip", default="") +request_user_agent: ContextVar[str] = ContextVar("request_user_agent", default="") + +class RequestContextMiddleware(BaseHTTPMiddleware): + async def dispatch(self, request, call_next): + ip = request.client.host if request.client else "unknown" + forwarded = request.headers.get("X-Forwarded-For") + if forwarded: + ip = forwarded.split(",")[0].strip() + request_ip.set(ip) + request_user_agent.set(request.headers.get("User-Agent", "")) + response = await call_next(request) + return response +``` + +3. **audit_service.py** — Incorporar IP y hash: +```python +import hashlib +from app.middleware.request_context import request_ip, request_user_agent + +def log_action(db, user_id, action, entity_type, entity_id, details=None): + ip = request_ip.get("") + ua = request_user_agent.get("") + entry = AuditLog( + user_id=user_id, + action=action, + entity_type=entity_type, + entity_id=entity_id, + details=details or {}, + ip_address=ip, + user_agent=ua, + ) + data = f"{user_id}:{action}:{entity_type}:{entity_id}:{entry.timestamp.isoformat()}" + entry.integrity_hash = hashlib.sha256(data.encode()).hexdigest() + db.add(entry) + # NO commit aquí — deja que el caller lo haga +``` + +**Verificación:** +- Crear un test → audit log tiene IP y user agent del request +- `integrity_hash` no es null en nuevos registros +- Modificar un audit log en BD directo → recalcular hash no coincide + +--- + +### Tarea 3.2: Login Attempt Auditing (SEC-009) + +**Qué:** Registrar intentos de login exitosos y fallidos. + +**Implementación:** + +En el router `auth.py`, después de cada intento de login: +```python +@router.post("/login") +def login(body: LoginRequest, request: Request, db: Session = Depends(get_db)): + user = db.query(User).filter(User.username == body.username).first() + + # SIEMPRE ejecutar bcrypt para evitar timing attack (SEC-005) + dummy_hash = "$2b$12$LJ3m4ys3Lg7E3cDpBH0pVe8y2V3hZ2n1KX3X5X5X5X5X5X5X5X" + target_hash = user.hashed_password if user else dummy_hash + password_valid = verify_password(body.password, target_hash) + + ip = request.client.host if request.client else "unknown" + + if not user or not password_valid: + audit_service.log_action( + db, user.id if user else None, "LOGIN_FAILED", + "auth", None, + details={"username": body.username, "ip": ip, "reason": "invalid_credentials"} + ) + db.commit() + raise HTTPException(400, detail="Incorrect credentials") + + audit_service.log_action( + db, user.id, "LOGIN_SUCCESS", "auth", str(user.id), + details={"ip": ip} + ) + token = create_access_token({"sub": user.username}) + # ... resto del login +``` + +**Verificación:** +- Login fallido → registro en `audit_logs` con `action="LOGIN_FAILED"` +- Login exitoso → registro con `action="LOGIN_SUCCESS"` +- Ambos incluyen IP +- Timing de respuesta similar para usuario existente vs no existente + +--- + +### Tarea 3.3: Validación de Password y Username (SEC-004, SEC-007) + +**Implementación:** + +En `app/schemas/user_schema.py`: +```python +from pydantic import BaseModel, field_validator +import re + +RESERVED_USERNAMES = {"admin", "system", "root", "aegis", "api", "null", "undefined"} + +class UserCreate(BaseModel): + username: str + password: str + email: str + role: str + + @field_validator("username") + @classmethod + def validate_username(cls, v): + if len(v) < 3 or len(v) > 50: + raise ValueError("Username must be 3-50 characters") + if not re.match(r'^[a-zA-Z0-9_-]+$', v): + raise ValueError("Username may only contain letters, numbers, hyphens, underscores") + if v.lower() in RESERVED_USERNAMES: + raise ValueError(f"Username '{v}' is reserved") + return v + + @field_validator("password") + @classmethod + def validate_password(cls, v): + if len(v) < 10: + raise ValueError("Password must be at least 10 characters") + if not re.search(r'[A-Z]', v): + raise ValueError("Password must contain at least one uppercase letter") + if not re.search(r'[a-z]', v): + raise ValueError("Password must contain at least one lowercase letter") + if not re.search(r'[0-9]', v): + raise ValueError("Password must contain at least one digit") + return v +``` + +**Verificación:** +- `POST /users` con password "123" → 422 con mensaje claro +- Username `"../admin"` → 422 +- Username `"system"` → 422 (reservado) +- Password `"ValidPass123"` → acepta + +--- + +### Tarea 3.4: Rate Limiting Extendido (SEC-003) + +**Implementación:** + +En `main.py` y routers relevantes: +```python +from slowapi import Limiter +from slowapi.util import get_remote_address + +limiter = Limiter(key_func=get_remote_address) + +# En routers de sync/import: +@router.post("/system/sync-mitre") +@limiter.limit("2/hour") +def sync_mitre(request: Request, ...): + ... + +@router.post("/system/import-atomic-tests") +@limiter.limit("2/hour") +def import_atomic(request: Request, ...): + ... + +# En routers de escritura: +@router.post("/tests") +@limiter.limit("30/minute") +def create_test(request: Request, ...): + ... + +# En routers de upload: +@router.post("/tests/{id}/evidence") +@limiter.limit("10/minute") +def upload_evidence(request: Request, ...): + ... + +# En reports (costosos): +@router.get("/reports/generate/{type}") +@limiter.limit("5/minute") +def generate_report(request: Request, ...): + ... +``` + +**Verificación:** +- Más de 2 syncs/hora → 429 Too Many Requests +- Más de 30 tests/min → 429 +- Headers de response incluyen `X-RateLimit-*` + +--- + +### Tarea 3.5: Clasificación de Datos y Retención + +**Qué:** Añadir campo de clasificación a entidades y políticas de retención. + +**Implementación:** + +1. **Migración:** +```python +class DataClassification(str, enum.Enum): + public = "public" + internal = "internal" + sensitive = "sensitive" + restricted = "restricted" + +# Añadir a tablas: tests, evidence, campaigns +op.add_column('tests', Column('data_classification', String(20), default='internal')) +op.add_column('evidence', Column('data_classification', String(20), default='internal')) +op.add_column('campaigns', Column('data_classification', String(20), default='internal')) +``` + +2. **Job de retención** — Limpieza automática según política: +```python +def apply_retention_policies(): + """Job diario que aplica políticas de retención.""" + db = SessionLocal() + try: + cutoff = datetime.utcnow() - timedelta(days=730) + deleted = db.query(AuditLog).filter(AuditLog.timestamp < cutoff).delete() + logger.info(f"Retention: deleted {deleted} audit logs older than 2 years") + db.commit() + finally: + db.close() +``` + +**Verificación:** +- Tests nuevos tienen `data_classification = 'internal'` por defecto +- Admin puede cambiar clasificación de un test +- Job de retención elimina logs antiguos sin error + +--- + +## FASE 4 — Inteligencia Automática + +**Duración estimada:** 2-3 semanas +**Dependencias:** Fase 0 (índices BD) + +> **Nota:** La detección de stale coverage de esta fase (Tarea 4.2) es una versión simplificada que será reemplazada por el Decay Engine completo de Fase 8. Sirve como stepping stone funcional. + +--- + +### Tarea 4.1: Enriquecimiento OSINT por Técnica + +**Qué:** Para cada técnica, buscar automáticamente blogs, PoCs y CVEs relacionados. + +**Implementación:** + +1. **Modelo — OsintItem:** +```python +class OsintItem(Base): + __tablename__ = "osint_items" + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + technique_id = Column(UUID(as_uuid=True), ForeignKey("techniques.id"), index=True) + source_type = Column(String(50)) # "cve", "blog", "poc", "advisory" + source_url = Column(Text) + title = Column(String(500)) + description = Column(Text) + severity = Column(String(20)) + discovered_at = Column(DateTime, default=datetime.utcnow) + reviewed = Column(Boolean, default=False) + metadata = Column(JSONB, default={}) +``` + +2. **app/services/osint_enrichment_service.py:** +```python +import requests +import logging +from sqlalchemy.orm import Session +from app.models.osint_item import OsintItem +from app.models.technique import Technique + +logger = logging.getLogger(__name__) +NVD_API_BASE = "https://services.nvd.nist.gov/rest/json/cves/2.0" + +def enrich_technique_with_cves(db: Session, technique: Technique): + """Buscar CVEs relacionados via NVD API usando CAPEC→CWE→CVE mapping.""" + try: + params = { + "keywordSearch": technique.name, + "resultsPerPage": 10, + } + resp = requests.get(NVD_API_BASE, params=params, timeout=30) + if resp.status_code != 200: + logger.warning(f"NVD API error for {technique.mitre_id}: {resp.status_code}") + return 0 + + data = resp.json() + count = 0 + for vuln in data.get("vulnerabilities", []): + cve = vuln.get("cve", {}) + cve_id = cve.get("id") + exists = db.query(OsintItem).filter( + OsintItem.technique_id == technique.id, + OsintItem.source_url.contains(cve_id) + ).first() + if exists: + continue + + descriptions = cve.get("descriptions", []) + desc = next((d["value"] for d in descriptions if d["lang"] == "en"), "") + metrics = cve.get("metrics", {}) + cvss = metrics.get("cvssMetricV31", [{}])[0] if metrics.get("cvssMetricV31") else {} + severity = cvss.get("cvssData", {}).get("baseSeverity", "UNKNOWN") if cvss else "UNKNOWN" + + item = OsintItem( + technique_id=technique.id, + source_type="cve", + source_url=f"https://nvd.nist.gov/vuln/detail/{cve_id}", + title=cve_id, + description=desc[:500], + severity=severity, + metadata={"cvss_score": cvss.get("cvssData", {}).get("baseScore") if cvss else None}, + ) + db.add(item) + count += 1 + + if count > 0: + technique.review_required = True + db.commit() + logger.info(f"Added {count} CVEs for {technique.mitre_id}") + return count + except Exception as e: + logger.error(f"OSINT enrichment failed for {technique.mitre_id}: {e}") + return 0 + +def enrich_all_techniques(db: Session): + """Enriquecer todas las técnicas (rate-limited por NVD: 5 req/30s sin API key).""" + import time + techniques = db.query(Technique).all() + total = 0 + for i, tech in enumerate(techniques): + total += enrich_technique_with_cves(db, tech) + if i % 5 == 4: + time.sleep(30) + return total +``` + +3. **Job semanal:** +```python +scheduler.add_job( + lambda: enrich_all_techniques(SessionLocal()), + "interval", days=7, id="osint_enrichment", + replace_existing=True +) +``` + +**Verificación:** +- Ejecutar `enrich_technique_with_cves(db, technique)` para T1059 → obtiene CVEs +- No se crean duplicados si se ejecuta dos veces +- Técnicas con nuevos CVEs se marcan `review_required = True` +- Rate limiting respeta NVD API limits (5 req/30s) + +--- + +### Tarea 4.2: Detección de Stale Coverage (versión simple) + +**Qué:** Marcar técnicas cuya última prueba fue hace más de N meses. + +> **Nota:** Esta es la versión simple. Fase 8 (Decay Engine) la reemplaza con un motor completo y configurable con políticas, factores múltiples y confidence scores. + +**Implementación:** +```python +# app/services/stale_detection_service.py + +from datetime import datetime, timedelta +from sqlalchemy.orm import Session +from sqlalchemy import func +from app.models import Technique, Test +import logging + +logger = logging.getLogger(__name__) + +STALE_THRESHOLD_DAYS = 365 + +def detect_stale_coverage(db: Session) -> int: + """Marcar técnicas con cobertura stale.""" + cutoff = datetime.utcnow() - timedelta(days=STALE_THRESHOLD_DAYS) + + latest_test = ( + db.query( + Test.technique_id, + func.max(Test.updated_at).label("last_tested") + ) + .filter(Test.state == "validated") + .group_by(Test.technique_id) + .subquery() + ) + + stale_techniques = ( + db.query(Technique) + .outerjoin(latest_test, Technique.id == latest_test.c.technique_id) + .filter( + (latest_test.c.last_tested < cutoff) | + (latest_test.c.last_tested.is_(None)) + ) + .filter(Technique.status_global != "not_evaluated") + .all() + ) + + count = 0 + for tech in stale_techniques: + if not tech.review_required: + tech.review_required = True + count += 1 + logger.info(f"Marked {tech.mitre_id} as stale coverage") + + if count > 0: + db.commit() + return count +``` + +Registrar como job diario. + +**Verificación:** +- Técnica con último test hace 13 meses → se marca `review_required = True` +- Técnica con test reciente → no se marca +- Técnica nunca probada pero con status `not_evaluated` → no se marca + +--- + +## FASE 5 — Gestión Operativa Avanzada + +**Duración estimada:** 2-3 semanas +**Dependencias:** Fase 0 (scoring en BD) + +--- + +### Tarea 5.1: Scoring Compuesto Maduro + +**Qué:** Reformar el scoring para que sea compuesto: % técnicas probadas × peso por criticidad × recencia × severidad. + +**Implementación:** + +Actualizar `scoring_service.py` para incluir factor de recencia (decay): +```python +def _recency_factor(last_tested: datetime) -> float: + """Factor de decaimiento: 1.0 si reciente, disminuye con el tiempo.""" + if not last_tested: + return 0.0 + days_ago = (datetime.utcnow() - last_tested).days + if days_ago <= 90: + return 1.0 + elif days_ago <= 180: + return 0.8 + elif days_ago <= 365: + return 0.5 + else: + return 0.2 +``` + +Y persistir pesos en BD en vez de settings: +```python +class ScoringConfig(Base): + __tablename__ = "scoring_config" + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + weight_tests = Column(Float, default=40.0) + weight_detection = Column(Float, default=25.0) + weight_d3fend = Column(Float, default=15.0) + weight_recency = Column(Float, default=10.0) + weight_severity = Column(Float, default=10.0) + updated_by = Column(UUID(as_uuid=True), ForeignKey("users.id")) + updated_at = Column(DateTime, default=datetime.utcnow) +``` + +**Verificación:** +- Score de técnica probada hace 1 año es menor que una probada ayer (mismo resultado) +- Cambiar pesos en BD se refleja inmediatamente en scores +- Pesos persisten tras reinicio del servidor +- Score compuesto total suma 100 siempre + +--- + +### Tarea 5.2: Histórico y Evolución de Cobertura + +**Qué:** Expandir snapshots para comparar entre meses, equipos y tácticas. + +**Implementación:** + +Añadir campos al modelo CoverageSnapshot: +```python +op.add_column('coverage_snapshots', Column('by_tactic', JSONB, default={})) +op.add_column('coverage_snapshots', Column('by_status', JSONB, default={})) +op.add_column('coverage_snapshots', Column('stale_count', Integer, default=0)) +op.add_column('coverage_snapshots', Column('never_tested_count', Integer, default=0)) +``` + +Y endpoint para consumir tendencias: +```python +@router.get("/evolution") +def coverage_evolution(months: int = 12, db: Session = Depends(get_db), user=Depends(get_current_user)): + cutoff = datetime.utcnow() - timedelta(days=months * 30) + snapshots = db.query(CoverageSnapshot).filter( + CoverageSnapshot.created_at >= cutoff + ).order_by(CoverageSnapshot.created_at).all() + return [ + { + "date": s.created_at.isoformat(), + "org_score": s.org_score, + "coverage_pct": s.coverage_percentage, + "by_tactic": s.by_tactic, + "stale_count": s.stale_count, + } + for s in snapshots + ] +``` + +**Verificación:** +- Snapshot semanal ahora incluye desglose por táctica +- `/evolution?months=6` retorna snapshots de los últimos 6 meses +- Gráfico de tendencias en frontend muestra línea temporal + +--- + +## FASE 6 — Analytics para BI + Webhooks (Feature Adicional) + +**Duración estimada:** 2 semanas +**Dependencias:** Fase 2 + +--- + +### Tarea 6.1: Sistema de Webhooks + +**Qué:** Permitir que Aegis envíe notificaciones HTTP a sistemas externos cuando ocurren eventos. + +**Implementación:** + +1. **Modelo — WebhookConfig:** +```python +class WebhookConfig(Base): + __tablename__ = "webhook_configs" + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + name = Column(String(200), nullable=False) + url = Column(Text, nullable=False) + secret = Column(String(256)) # Para HMAC signature + events = Column(JSONB, default=[]) # ["test.validated", "campaign.completed", ...] + is_active = Column(Boolean, default=True) + created_by = Column(UUID(as_uuid=True), ForeignKey("users.id")) + last_triggered_at = Column(DateTime) + failure_count = Column(Integer, default=0) +``` + +2. **Servicio de dispatch:** +```python +import requests +import hashlib +import hmac +import json + +def dispatch_webhook(event_type: str, payload: dict): + db = SessionLocal() + try: + configs = db.query(WebhookConfig).filter( + WebhookConfig.is_active == True, + WebhookConfig.events.contains([event_type]) + ).all() + for config in configs: + try: + body = json.dumps({ + "event": event_type, + "data": payload, + "timestamp": datetime.utcnow().isoformat() + }) + headers = {"Content-Type": "application/json"} + if config.secret: + sig = hmac.new(config.secret.encode(), body.encode(), hashlib.sha256).hexdigest() + headers["X-Aegis-Signature"] = sig + resp = requests.post(config.url, data=body, headers=headers, timeout=10) + config.last_triggered_at = datetime.utcnow() + if resp.status_code >= 400: + config.failure_count += 1 + except Exception as e: + config.failure_count += 1 + logger.warning(f"Webhook {config.name} failed: {e}") + db.commit() + finally: + db.close() +``` + +3. Llamar `dispatch_webhook` en puntos clave: post-validación de test, completar campaña, sync MITRE, etc. + +**Verificación:** +- Crear webhook para `test.validated` → validar un test → webhook recibe POST +- Signature HMAC es verificable en el receptor +- Webhook con URL inválida incrementa `failure_count` sin crashear +- Desactivar webhook → no se dispara + +--- + +## FASE 7 — Notificaciones Multi-Canal (Feature Adicional) + +**Duración estimada:** 1-2 semanas +**Dependencias:** Fase 3 + +--- + +### Tarea 7.1: Notificaciones por Email + +**Qué:** Añadir capacidad de enviar notificaciones por email además de in-app. + +**Implementación:** + +1. **Config:** +```python +SMTP_ENABLED: bool = False +SMTP_HOST: str = "" +SMTP_PORT: int = 587 +SMTP_USERNAME: str = "" +SMTP_PASSWORD: str = "" +SMTP_FROM_EMAIL: str = "aegis@company.com" +SMTP_USE_TLS: bool = True +``` + +2. **app/services/email_service.py:** +```python +import smtplib +from email.mime.text import MIMEText +from email.mime.multipart import MIMEMultipart +from app.config import settings + +def send_email(to: str, subject: str, html_body: str): + if not settings.SMTP_ENABLED: + return + msg = MIMEMultipart("alternative") + msg["Subject"] = f"[Aegis] {subject}" + msg["From"] = settings.SMTP_FROM_EMAIL + msg["To"] = to + msg.attach(MIMEText(html_body, "html")) + + with smtplib.SMTP(settings.SMTP_HOST, settings.SMTP_PORT) as server: + if settings.SMTP_USE_TLS: + server.starttls() + if settings.SMTP_USERNAME: + server.login(settings.SMTP_USERNAME, settings.SMTP_PASSWORD) + server.send_message(msg) +``` + +3. Integrar en `notification_service.py` — para eventos críticos (test validado, campaña completada, nueva técnica MITRE). + +**Verificación:** +- Con SMTP configurado, validar test → email al lead correspondiente +- Email incluye link directo al test en la plataforma +- Con SMTP deshabilitado, no falla — solo skip + +--- + +### Tarea 7.2: Preferencias de Notificación por Usuario + +**Migración:** +```python +op.add_column('users', Column('notification_preferences', JSONB, default={ + "email_on_test_validated": True, + "email_on_campaign_completed": True, + "email_on_new_mitre_techniques": False, + "in_app_all": True, +})) +op.add_column('users', Column('jira_account_id', String(100))) # para Tempo +``` + +--- + +## FASE 8 — Detection Lifecycle Management (DLM) + +**Duración estimada:** 3-4 semanas +**Dependencias:** Fase 0 (Redis, índices, excepciones de dominio) + +> **Por qué aquí:** Todo el resto de las fases avanzadas (9-14) depende de que las detecciones tengan ciclo de vida. Sin decay, confidence score y revalidación, el resto opera sobre datos estáticos que pierden valor con el tiempo. +> +> **Nota sobre paralelismo:** Esta fase puede ejecutarse en paralelo con Fases 1-7, ya que su única dependencia real es Fase 0. Sin embargo, se recomienda tener al menos Fases 1-5 funcionando para que haya datos reales sobre los que el DLM opere. + +--- + +### Tarea 8.1: Modelo de Datos — Infraestructura de Detección + +**Qué:** Crear las tablas que soportan el ciclo de vida de detecciones: metadatos de versión, asociaciones SIEM/EDR, y estado de salud. + +**Implementación:** + +**app/models/detection_lifecycle.py** (nuevo): +```python +import uuid +import enum +from datetime import datetime +from sqlalchemy import ( + Column, String, Integer, Float, Boolean, DateTime, + ForeignKey, Text, Enum as SQLEnum +) +from sqlalchemy.dialects.postgresql import UUID, JSONB +from sqlalchemy.orm import relationship +from app.database import Base + + +class DetectionConfidence(str, enum.Enum): + fresh = "fresh" # Validado recientemente, todo OK + aging = "aging" # Acercándose a caducidad + stale = "stale" # Caducado, necesita revalidación + broken = "broken" # Se detectó cambio que invalida + unknown = "unknown" # Sin datos suficientes + + +class DetectionHealthStatus(str, enum.Enum): + healthy = "healthy" # Regla activa, disparando normalmente + silent = "silent" # Regla no ha disparado en período esperado + noisy = "noisy" # Regla disparando excesivamente (false positives) + orphan = "orphan" # Regla sin owner asignado + deprecated = "deprecated" # Regla marcada como obsoleta + untested = "untested" # Regla nunca validada + + +class InvalidationReason(str, enum.Enum): + time_decay = "time_decay" + mitre_update = "mitre_update" + log_source_change = "log_source_change" + siem_update = "siem_update" + edr_update = "edr_update" + infrastructure_change = "infrastructure_change" + parser_change = "parser_change" + manual = "manual" + rule_modified = "rule_modified" + + +class DetectionAsset(Base): + """Representa un activo de detección concreto: una regla SIEM, + query SPL, regla YARA, regla EDR, etc.""" + __tablename__ = "detection_assets" + + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + name = Column(String(500), nullable=False) + description = Column(Text) + + # Tipo y plataforma + asset_type = Column(String(50), nullable=False) + platform = Column(String(100)) + + # Contenido de la regla + rule_content = Column(Text) + rule_language = Column(String(50)) + rule_repository_url = Column(Text) + rule_file_path = Column(String(500)) + + # Versionado + rule_version = Column(String(50)) + rule_hash = Column(String(64)) + last_rule_change_at = Column(DateTime) + + # Log source tracking + log_source_name = Column(String(200)) + log_source_version = Column(String(50)) + log_source_config = Column(JSONB, default={}) + + # Infraestructura asociada + infrastructure_hash = Column(String(64)) + infrastructure_details = Column(JSONB, default={}) + + # Estado de salud + health_status = Column( + SQLEnum(DetectionHealthStatus), + default=DetectionHealthStatus.untested + ) + last_alert_at = Column(DateTime) + alert_count_30d = Column(Integer, default=0) + false_positive_rate = Column(Float) + expected_alert_frequency = Column(String(50)) + + # Ownership (se completa en Fase 9) + owner_id = Column(UUID(as_uuid=True), ForeignKey("users.id")) + backup_owner_id = Column(UUID(as_uuid=True), ForeignKey("users.id")) + team = Column(String(100)) + + # Metadata + is_active = Column(Boolean, default=True) + tags = Column(JSONB, default=[]) + metadata = Column(JSONB, default={}) + created_by = Column(UUID(as_uuid=True), ForeignKey("users.id")) + created_at = Column(DateTime, default=datetime.utcnow) + updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow) + + # Relationships + technique_mappings = relationship( + "DetectionTechniqueMapping", back_populates="detection_asset" + ) + validations = relationship( + "DetectionValidation", back_populates="detection_asset" + ) + + +class DetectionTechniqueMapping(Base): + """Mapeo N:M entre activos de detección y técnicas MITRE.""" + __tablename__ = "detection_technique_mappings" + + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + detection_asset_id = Column( + UUID(as_uuid=True), + ForeignKey("detection_assets.id", ondelete="CASCADE"), + nullable=False + ) + technique_id = Column( + UUID(as_uuid=True), + ForeignKey("techniques.id", ondelete="CASCADE"), + nullable=False + ) + coverage_type = Column(String(50), default="detect") + confidence_level = Column(String(20), default="medium") + notes = Column(Text) + created_at = Column(DateTime, default=datetime.utcnow) + + detection_asset = relationship( + "DetectionAsset", back_populates="technique_mappings" + ) + + +class DetectionValidation(Base): + """Registro inmutable de cada validación de una detección. + Es el 'sello de calidad' con fecha de caducidad.""" + __tablename__ = "detection_validations" + + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + detection_asset_id = Column( + UUID(as_uuid=True), + ForeignKey("detection_assets.id", ondelete="CASCADE"), + nullable=False + ) + technique_id = Column(UUID(as_uuid=True), ForeignKey("techniques.id"), nullable=True) + test_id = Column(UUID(as_uuid=True), ForeignKey("tests.id"), nullable=True) + + # Resultado + validated_at = Column(DateTime, default=datetime.utcnow) + expires_at = Column(DateTime, nullable=False) + is_valid = Column(Boolean, default=True) + validation_result = Column(String(50)) + validation_method = Column(String(100)) + + # Snapshot del estado en el momento de validación + rule_hash_at_validation = Column(String(64)) + log_source_version_at_validation = Column(String(50)) + infrastructure_hash_at_validation = Column(String(64)) + environment_snapshot = Column(JSONB, default={}) + + # Invalidación + invalidated_at = Column(DateTime) + invalidation_reason = Column(SQLEnum(InvalidationReason)) + invalidation_details = Column(Text) + invalidated_by = Column(UUID(as_uuid=True), ForeignKey("users.id")) + + # Quién validó + validated_by = Column(UUID(as_uuid=True), ForeignKey("users.id"), nullable=False) + + # Integridad + integrity_hash = Column(String(64)) + notes = Column(Text) + evidence_ids = Column(JSONB, default=[]) + + detection_asset = relationship("DetectionAsset", back_populates="validations") + + +class TechniqueConfidenceScore(Base): + """Score calculado de confianza por técnica. + Se recalcula periódicamente por el decay engine.""" + __tablename__ = "technique_confidence_scores" + + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + technique_id = Column( + UUID(as_uuid=True), + ForeignKey("techniques.id", ondelete="CASCADE"), + nullable=False, unique=True + ) + + confidence_level = Column( + SQLEnum(DetectionConfidence), default=DetectionConfidence.unknown + ) + confidence_score = Column(Float, default=0.0) + detection_count = Column(Integer, default=0) + valid_detection_count = Column(Integer, default=0) + + last_validated_at = Column(DateTime) + next_validation_due = Column(DateTime) + last_recalculated_at = Column(DateTime, default=datetime.utcnow) + + recency_factor = Column(Float, default=0.0) + coverage_factor = Column(Float, default=0.0) + health_factor = Column(Float, default=0.0) + diversity_factor = Column(Float, default=0.0) + + score_breakdown = Column(JSONB, default={}) + risk_factors = Column(JSONB, default=[]) + updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow) + + +class InfrastructureChangeLog(Base): + """Registro de cambios en infraestructura que pueden invalidar detecciones.""" + __tablename__ = "infrastructure_change_logs" + + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + change_type = Column(String(100), nullable=False) + description = Column(Text, nullable=False) + affected_platforms = Column(JSONB, default=[]) + affected_log_sources = Column(JSONB, default=[]) + change_date = Column(DateTime, default=datetime.utcnow) + reported_by = Column(UUID(as_uuid=True), ForeignKey("users.id"), nullable=False) + auto_invalidate = Column(Boolean, default=True) + invalidated_count = Column(Integer, default=0) + metadata = Column(JSONB, default={}) + created_at = Column(DateTime, default=datetime.utcnow) +``` + +**Migración Alembic:** +```bash +alembic revision --autogenerate -m "add_detection_lifecycle_tables" +alembic upgrade head +``` + +**Índices adicionales** (agregar manualmente en la migración): +```python +def upgrade(): + # ... tablas auto-generadas ... + + op.create_index('ix_detection_assets_platform', 'detection_assets', ['platform']) + op.create_index('ix_detection_assets_health_status', 'detection_assets', ['health_status']) + op.create_index('ix_detection_assets_owner_id', 'detection_assets', ['owner_id']) + op.create_index('ix_detection_technique_mappings_technique_id', 'detection_technique_mappings', ['technique_id']) + op.create_index('ix_detection_technique_mappings_asset_id', 'detection_technique_mappings', ['detection_asset_id']) + op.create_index('ix_detection_validations_asset_id_valid', 'detection_validations', ['detection_asset_id', 'is_valid']) + op.create_index('ix_detection_validations_expires_at', 'detection_validations', ['expires_at']) + op.create_index('ix_technique_confidence_scores_technique_id', 'technique_confidence_scores', ['technique_id']) + op.create_index('ix_technique_confidence_scores_confidence_level', 'technique_confidence_scores', ['confidence_level']) + op.create_index('ix_infrastructure_change_logs_change_date', 'infrastructure_change_logs', ['change_date']) +``` + +**app/schemas/detection_lifecycle_schema.py** (nuevo): +```python +from pydantic import BaseModel, Field +from typing import Optional +from uuid import UUID +from datetime import datetime +from app.models.detection_lifecycle import ( + DetectionConfidence, DetectionHealthStatus, InvalidationReason +) + + +class DetectionAssetCreate(BaseModel): + name: str = Field(..., min_length=3, max_length=500) + description: Optional[str] = None + asset_type: str = Field( + ..., + pattern=r'^(siem_rule|edr_rule|sigma_rule|yara_rule|spl_query|kql_query|custom_script)$' + ) + platform: Optional[str] = None + rule_content: Optional[str] = None + rule_language: Optional[str] = None + rule_repository_url: Optional[str] = None + rule_file_path: Optional[str] = None + rule_version: Optional[str] = None + log_source_name: Optional[str] = None + log_source_version: Optional[str] = None + log_source_config: Optional[dict] = {} + infrastructure_details: Optional[dict] = {} + expected_alert_frequency: Optional[str] = None + tags: Optional[list[str]] = [] + technique_ids: Optional[list[UUID]] = [] + + +class DetectionAssetUpdate(BaseModel): + name: Optional[str] = None + description: Optional[str] = None + rule_content: Optional[str] = None + rule_version: Optional[str] = None + log_source_version: Optional[str] = None + infrastructure_details: Optional[dict] = None + expected_alert_frequency: Optional[str] = None + health_status: Optional[DetectionHealthStatus] = None + last_alert_at: Optional[datetime] = None + alert_count_30d: Optional[int] = None + false_positive_rate: Optional[float] = None + owner_id: Optional[UUID] = None + backup_owner_id: Optional[UUID] = None + team: Optional[str] = None + tags: Optional[list[str]] = None + is_active: Optional[bool] = None + + +class DetectionAssetOut(BaseModel): + id: UUID + name: str + description: Optional[str] + asset_type: str + platform: Optional[str] + rule_language: Optional[str] + rule_version: Optional[str] + rule_hash: Optional[str] + health_status: DetectionHealthStatus + last_alert_at: Optional[datetime] + alert_count_30d: int + false_positive_rate: Optional[float] + expected_alert_frequency: Optional[str] + owner_id: Optional[UUID] + team: Optional[str] + is_active: bool + tags: list + created_at: datetime + updated_at: datetime + + class Config: + from_attributes = True + + +class DetectionValidationCreate(BaseModel): + detection_asset_id: UUID + technique_id: Optional[UUID] = None + test_id: Optional[UUID] = None + validation_result: str = Field( + ..., pattern=r'^(detected|not_detected|partial|error)$' + ) + validation_method: str + notes: Optional[str] = None + evidence_ids: Optional[list[UUID]] = [] + validity_days: int = Field(default=180, ge=30, le=730) + + +class DetectionValidationOut(BaseModel): + id: UUID + detection_asset_id: UUID + technique_id: Optional[UUID] + validated_at: datetime + expires_at: datetime + is_valid: bool + validation_result: str + validation_method: str + invalidated_at: Optional[datetime] + invalidation_reason: Optional[InvalidationReason] + validated_by: UUID + notes: Optional[str] + + class Config: + from_attributes = True + + +class TechniqueConfidenceOut(BaseModel): + technique_id: UUID + confidence_level: DetectionConfidence + confidence_score: float + detection_count: int + valid_detection_count: int + last_validated_at: Optional[datetime] + next_validation_due: Optional[datetime] + recency_factor: float + coverage_factor: float + health_factor: float + diversity_factor: float + risk_factors: list + + class Config: + from_attributes = True + + +class InfrastructureChangeCreate(BaseModel): + change_type: str + description: str = Field(..., min_length=10) + affected_platforms: list[str] = [] + affected_log_sources: list[str] = [] + change_date: Optional[datetime] = None + auto_invalidate: bool = True + + +class InfrastructureChangeOut(BaseModel): + id: UUID + change_type: str + description: str + affected_platforms: list + affected_log_sources: list + change_date: datetime + auto_invalidate: bool + invalidated_count: int + reported_by: UUID + created_at: datetime + + class Config: + from_attributes = True +``` + +**Verificación:** +- `alembic upgrade head` ejecuta sin error +- `\dt` en psql muestra las 5 nuevas tablas +- `\di` muestra todos los índices creados +- Schemas validan correctamente: `DetectionAssetCreate(name="x", asset_type="invalid")` → error +- Los JSONB defaults funcionan: crear un DetectionAsset sin tags → `tags = []` + +--- + +### Tarea 8.2: Servicio de Detection Assets — CRUD + Versionado + +**Qué:** Servicio que encapsula la gestión de activos de detección con auto-hash de contenido y detección de cambios. + +**Implementación:** + +**app/services/detection_asset_service.py** (nuevo): +```python +import hashlib +import logging +from datetime import datetime +from typing import Optional +from uuid import UUID + +from sqlalchemy.orm import Session, joinedload +from sqlalchemy import func + +from app.models.detection_lifecycle import ( + DetectionAsset, DetectionTechniqueMapping, + DetectionValidation, DetectionHealthStatus +) +from app.models.technique import Technique +from app.domain.exceptions import ( + EntityNotFoundError, DuplicateEntityError, InvalidOperationError +) +from app.services import audit_service + +logger = logging.getLogger(__name__) + + +def _compute_rule_hash(content: str) -> str: + """SHA256 del contenido de la regla, normalizado.""" + normalized = content.strip().replace('\r\n', '\n') + return hashlib.sha256(normalized.encode()).hexdigest() + + +def create_detection_asset( + db: Session, data: dict, user_id: UUID +) -> DetectionAsset: + """Crea un nuevo activo de detección con hash automático.""" + technique_ids = data.pop("technique_ids", []) + asset = DetectionAsset(**data, created_by=user_id) + + if asset.rule_content: + asset.rule_hash = _compute_rule_hash(asset.rule_content) + asset.last_rule_change_at = datetime.utcnow() + + if asset.infrastructure_details: + infra_str = str(sorted(asset.infrastructure_details.items())) + asset.infrastructure_hash = hashlib.sha256(infra_str.encode()).hexdigest() + + db.add(asset) + db.flush() + + for tech_id in technique_ids: + technique = db.query(Technique).filter(Technique.id == tech_id).first() + if technique: + mapping = DetectionTechniqueMapping( + detection_asset_id=asset.id, + technique_id=tech_id, + ) + db.add(mapping) + + db.commit() + db.refresh(asset) + + audit_service.log_action( + db, user_id, "DETECTION_ASSET_CREATED", + "detection_asset", str(asset.id), + details={ + "name": asset.name, + "type": asset.asset_type, + "platform": asset.platform, + "technique_count": len(technique_ids), + } + ) + return asset + + +def update_detection_asset( + db: Session, asset_id: UUID, data: dict, user_id: UUID +) -> DetectionAsset: + """Actualiza un activo, detectando cambios en contenido de regla.""" + asset = db.query(DetectionAsset).filter(DetectionAsset.id == asset_id).first() + if not asset: + raise EntityNotFoundError("DetectionAsset", str(asset_id)) + + changes = {} + rule_changed = False + + for key, value in data.items(): + if value is not None and hasattr(asset, key): + old_value = getattr(asset, key) + if old_value != value: + changes[key] = {"old": str(old_value), "new": str(value)} + setattr(asset, key, value) + + if "rule_content" in data and data["rule_content"]: + new_hash = _compute_rule_hash(data["rule_content"]) + if new_hash != asset.rule_hash: + rule_changed = True + asset.rule_hash = new_hash + asset.last_rule_change_at = datetime.utcnow() + changes["rule_hash"] = {"old": asset.rule_hash, "new": new_hash} + + if "infrastructure_details" in data and data["infrastructure_details"]: + infra_str = str(sorted(data["infrastructure_details"].items())) + new_hash = hashlib.sha256(infra_str.encode()).hexdigest() + if new_hash != asset.infrastructure_hash: + asset.infrastructure_hash = new_hash + changes["infrastructure_hash_changed"] = True + + db.commit() + db.refresh(asset) + + if changes: + audit_service.log_action( + db, user_id, "DETECTION_ASSET_UPDATED", + "detection_asset", str(asset.id), + details={"changes": changes, "rule_changed": rule_changed} + ) + + if rule_changed: + _invalidate_validations_for_asset(db, asset.id, user_id, "rule_modified") + + return asset + + +def _invalidate_validations_for_asset( + db: Session, asset_id: UUID, user_id: UUID, reason: str +): + """Invalida todas las validaciones vigentes de un asset.""" + validations = db.query(DetectionValidation).filter( + DetectionValidation.detection_asset_id == asset_id, + DetectionValidation.is_valid == True, + ).all() + + count = 0 + for v in validations: + v.is_valid = False + v.invalidated_at = datetime.utcnow() + v.invalidation_reason = reason + v.invalidated_by = user_id + count += 1 + + if count > 0: + db.commit() + logger.info(f"Invalidated {count} validations for asset {asset_id} due to {reason}") + return count + + +def get_asset_with_details(db: Session, asset_id: UUID) -> DetectionAsset: + """Obtiene un asset con sus mapeos y validaciones.""" + asset = ( + db.query(DetectionAsset) + .options( + joinedload(DetectionAsset.technique_mappings), + joinedload(DetectionAsset.validations), + ) + .filter(DetectionAsset.id == asset_id) + .first() + ) + if not asset: + raise EntityNotFoundError("DetectionAsset", str(asset_id)) + return asset + + +def list_assets( + db: Session, + platform: Optional[str] = None, + asset_type: Optional[str] = None, + health_status: Optional[str] = None, + technique_id: Optional[UUID] = None, + is_active: Optional[bool] = True, +) -> list[DetectionAsset]: + """Lista assets con filtros opcionales.""" + query = db.query(DetectionAsset) + if platform: + query = query.filter(DetectionAsset.platform == platform) + if asset_type: + query = query.filter(DetectionAsset.asset_type == asset_type) + if health_status: + query = query.filter(DetectionAsset.health_status == health_status) + if is_active is not None: + query = query.filter(DetectionAsset.is_active == is_active) + if technique_id: + query = query.join(DetectionTechniqueMapping).filter( + DetectionTechniqueMapping.technique_id == technique_id + ) + return query.order_by(DetectionAsset.name).all() + + +def get_technique_detection_summary(db: Session, technique_id: UUID) -> dict: + """Resumen de detecciones para una técnica específica.""" + mappings = ( + db.query(DetectionTechniqueMapping) + .options(joinedload(DetectionTechniqueMapping.detection_asset)) + .filter(DetectionTechniqueMapping.technique_id == technique_id) + .all() + ) + + assets = [m.detection_asset for m in mappings if m.detection_asset] + active_assets = [a for a in assets if a.is_active] + + valid_count = 0 + for asset in active_assets: + has_valid = db.query(DetectionValidation).filter( + DetectionValidation.detection_asset_id == asset.id, + DetectionValidation.is_valid == True, + DetectionValidation.expires_at > datetime.utcnow(), + ).first() + if has_valid: + valid_count += 1 + + health_distribution = {} + for asset in active_assets: + status = asset.health_status.value if asset.health_status else "unknown" + health_distribution[status] = health_distribution.get(status, 0) + 1 + + platforms = list(set(a.platform for a in active_assets if a.platform)) + + return { + "technique_id": str(technique_id), + "total_assets": len(active_assets), + "validated_assets": valid_count, + "health_distribution": health_distribution, + "platforms": platforms, + "coverage_types": list(set(m.coverage_type for m in mappings if m.coverage_type)), + } +``` + +**Verificación:** +- Crear un DetectionAsset con `rule_content` → `rule_hash` se genera automáticamente +- Actualizar `rule_content` con contenido diferente → hash cambia + `last_rule_change_at` se actualiza + validaciones vigentes se invalidan +- Actualizar con mismo contenido → hash no cambia, validaciones intactas +- `list_assets(platform="Splunk")` filtra correctamente +- `get_technique_detection_summary` retorna conteos correctos +- Audit log registra creación y actualización con detalles de cambios + +--- + +### Tarea 8.3: Decay Engine — Motor de Obsolescencia + +**Qué:** El cerebro del sistema de lifecycle. Evalúa periódicamente todas las detecciones y degrada su estado según políticas configurables. + +> **Nota:** Este motor reemplaza funcionalmente la detección simple de Fase 4 (Tarea 4.2). La Tarea 4.2 puede seguir activa como fallback simple, pero el Decay Engine es el sistema definitivo. + +**Implementación:** + +**app/models/decay_policy.py** (nuevo): +```python +import uuid +from datetime import datetime +from sqlalchemy import Column, String, Integer, Float, Boolean, DateTime +from sqlalchemy.dialects.postgresql import UUID, JSONB +from app.database import Base + + +class DecayPolicy(Base): + """Política de caducidad configurable por plataforma/tipo.""" + __tablename__ = "decay_policies" + + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + name = Column(String(200), nullable=False) + description = Column(String(500)) + + # Alcance + applies_to_platform = Column(String(100)) # null = todas + applies_to_asset_type = Column(String(50)) # null = todos + applies_to_tactic = Column(String(100)) # null = todas + + # Umbrales de tiempo (días) + fresh_days = Column(Integer, default=90) # 0-90 = Fresh + aging_days = Column(Integer, default=180) # 91-180 = Aging + stale_days = Column(Integer, default=365) # 181-365 = Stale + # >365 = Broken (implícito) + + # Validación por defecto + default_validity_days = Column(Integer, default=180) + + # Health: umbrales + silent_threshold_days = Column(Integer, default=30) + noisy_threshold_daily = Column(Integer, default=100) + + # Pesos de factores + recency_weight = Column(Float, default=0.3) + coverage_weight = Column(Float, default=0.3) + health_weight = Column(Float, default=0.25) + diversity_weight = Column(Float, default=0.15) + + is_default = Column(Boolean, default=False) + is_active = Column(Boolean, default=True) + created_at = Column(DateTime, default=datetime.utcnow) + updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow) +``` + +**app/services/decay_engine_service.py** (nuevo): +```python +import logging +from datetime import datetime, timedelta +from typing import Optional +from uuid import UUID + +from sqlalchemy.orm import Session +from sqlalchemy import func, and_ + +from app.models.detection_lifecycle import ( + DetectionAsset, DetectionValidation, + DetectionTechniqueMapping, TechniqueConfidenceScore, + DetectionConfidence, DetectionHealthStatus, + InfrastructureChangeLog +) +from app.models.decay_policy import DecayPolicy +from app.models.technique import Technique + +logger = logging.getLogger(__name__) + + +def get_applicable_policy( + db: Session, + platform: Optional[str] = None, + asset_type: Optional[str] = None, + tactic: Optional[str] = None, +) -> DecayPolicy: + """Obtiene la política más específica aplicable.""" + query = db.query(DecayPolicy).filter(DecayPolicy.is_active == True) + + if platform: + specific = query.filter(DecayPolicy.applies_to_platform == platform).first() + if specific: + return specific + + if asset_type: + specific = query.filter(DecayPolicy.applies_to_asset_type == asset_type).first() + if specific: + return specific + + default_policy = query.filter(DecayPolicy.is_default == True).first() + if default_policy: + return default_policy + + return DecayPolicy() + + +def calculate_confidence_for_technique( + db: Session, technique_id: UUID +) -> TechniqueConfidenceScore: + """Calcula el Confidence Score completo para una técnica.""" + technique = db.query(Technique).filter(Technique.id == technique_id).first() + if not technique: + return None + + policy = get_applicable_policy(db, tactic=technique.tactic) + + mappings = ( + db.query(DetectionTechniqueMapping) + .filter(DetectionTechniqueMapping.technique_id == technique_id) + .all() + ) + asset_ids = [m.detection_asset_id for m in mappings] + + if not asset_ids: + return _create_or_update_score( + db, technique_id, + confidence_level=DetectionConfidence.unknown, + confidence_score=0.0, + factors={"recency": 0.0, "coverage": 0.0, "health": 0.0, "diversity": 0.0}, + risk_factors=["no_detection_assets"], + detection_count=0, valid_count=0, + ) + + assets = ( + db.query(DetectionAsset) + .filter(DetectionAsset.id.in_(asset_ids), DetectionAsset.is_active == True) + .all() + ) + + now = datetime.utcnow() + + # 1. RECENCY FACTOR + valid_validations = ( + db.query(DetectionValidation) + .filter( + DetectionValidation.detection_asset_id.in_(asset_ids), + DetectionValidation.is_valid == True, + DetectionValidation.expires_at > now, + ) + .all() + ) + + recency_factor = 0.0 + last_validated = None + if valid_validations: + most_recent = max(v.validated_at for v in valid_validations) + last_validated = most_recent + days_since = (now - most_recent).days + + if days_since <= policy.fresh_days: + recency_factor = 1.0 + elif days_since <= policy.aging_days: + range_days = policy.aging_days - policy.fresh_days + elapsed = days_since - policy.fresh_days + recency_factor = 1.0 - (elapsed / range_days) * 0.4 + elif days_since <= policy.stale_days: + range_days = policy.stale_days - policy.aging_days + elapsed = days_since - policy.aging_days + recency_factor = 0.6 - (elapsed / range_days) * 0.4 + else: + recency_factor = max(0.1, 0.2 - ((days_since - policy.stale_days) / 365) * 0.1) + + # 2. COVERAGE FACTOR + active_count = len(assets) + valid_count = len(set(v.detection_asset_id for v in valid_validations)) + + if active_count == 0: + coverage_factor = 0.0 + elif valid_count >= 3: + coverage_factor = 1.0 + elif valid_count >= 2: + coverage_factor = 0.8 + elif valid_count >= 1: + coverage_factor = 0.5 + else: + coverage_factor = 0.1 + + # 3. HEALTH FACTOR + health_scores = { + DetectionHealthStatus.healthy: 1.0, + DetectionHealthStatus.silent: 0.4, + DetectionHealthStatus.noisy: 0.6, + DetectionHealthStatus.orphan: 0.3, + DetectionHealthStatus.deprecated: 0.0, + DetectionHealthStatus.untested: 0.2, + } + if assets: + health_values = [health_scores.get(a.health_status, 0.2) for a in assets] + health_factor = sum(health_values) / len(health_values) + else: + health_factor = 0.0 + + # 4. DIVERSITY FACTOR + platforms = set(a.platform for a in assets if a.platform) + asset_types = set(a.asset_type for a in assets) + diversity_factor = min(1.0, (len(platforms) * 0.3 + len(asset_types) * 0.2)) + + # SCORE COMPUESTO + confidence_score = ( + recency_factor * policy.recency_weight + + coverage_factor * policy.coverage_weight + + health_factor * policy.health_weight + + diversity_factor * policy.diversity_weight + ) * 100 + + # CONFIDENCE LEVEL + if confidence_score >= 75: + confidence_level = DetectionConfidence.fresh + elif confidence_score >= 50: + confidence_level = DetectionConfidence.aging + elif confidence_score >= 25: + confidence_level = DetectionConfidence.stale + elif confidence_score > 0: + confidence_level = DetectionConfidence.broken + else: + confidence_level = DetectionConfidence.unknown + + # RISK FACTORS + risk_factors = [] + if len(platforms) <= 1: + risk_factors.append("single_platform") + if valid_count == 0: + risk_factors.append("no_valid_detections") + if any(a.health_status == DetectionHealthStatus.silent for a in assets): + risk_factors.append("silent_rules_present") + if any(a.health_status == DetectionHealthStatus.orphan for a in assets): + risk_factors.append("orphan_rules_present") + if recency_factor < 0.5: + risk_factors.append("stale_validation") + if len(assets) < 2: + risk_factors.append("low_detection_diversity") + + next_due = None + if valid_validations: + earliest_expiry = min(v.expires_at for v in valid_validations) + next_due = earliest_expiry + + return _create_or_update_score( + db, technique_id, + confidence_level=confidence_level, + confidence_score=round(confidence_score, 1), + factors={ + "recency": round(recency_factor, 3), + "coverage": round(coverage_factor, 3), + "health": round(health_factor, 3), + "diversity": round(diversity_factor, 3), + }, + risk_factors=risk_factors, + detection_count=active_count, + valid_count=valid_count, + last_validated=last_validated, + next_due=next_due, + ) + + +def _create_or_update_score(db: Session, technique_id: UUID, **kwargs) -> TechniqueConfidenceScore: + """Crea o actualiza el score de confianza.""" + score = db.query(TechniqueConfidenceScore).filter( + TechniqueConfidenceScore.technique_id == technique_id + ).first() + + if not score: + score = TechniqueConfidenceScore(technique_id=technique_id) + db.add(score) + + score.confidence_level = kwargs["confidence_level"] + score.confidence_score = kwargs["confidence_score"] + score.detection_count = kwargs["detection_count"] + score.valid_detection_count = kwargs["valid_count"] + score.recency_factor = kwargs["factors"]["recency"] + score.coverage_factor = kwargs["factors"]["coverage"] + score.health_factor = kwargs["factors"]["health"] + score.diversity_factor = kwargs["factors"]["diversity"] + score.risk_factors = kwargs["risk_factors"] + score.score_breakdown = kwargs["factors"] + score.last_validated_at = kwargs.get("last_validated") + score.next_validation_due = kwargs.get("next_due") + score.last_recalculated_at = datetime.utcnow() + + db.commit() + db.refresh(score) + return score + + +def run_decay_engine(db: Session) -> dict: + """Ejecuta el motor de decay sobre todas las técnicas. Job diario.""" + techniques = db.query(Technique).all() + results = { + "total_techniques": len(techniques), + "fresh": 0, "aging": 0, "stale": 0, + "broken": 0, "unknown": 0, + "validations_expired": 0, + } + + now = datetime.utcnow() + + # 1. Expirar validaciones vencidas + expired = ( + db.query(DetectionValidation) + .filter(DetectionValidation.is_valid == True, DetectionValidation.expires_at <= now) + .all() + ) + for v in expired: + v.is_valid = False + v.invalidated_at = now + v.invalidation_reason = "time_decay" + results["validations_expired"] = len(expired) + + if expired: + db.commit() + + # 2. Recalcular confidence scores + for technique in techniques: + score = calculate_confidence_for_technique(db, technique.id) + if score: + level = score.confidence_level.value + results[level] = results.get(level, 0) + 1 + + logger.info(f"Decay engine completed: {results}") + return results + + +def process_infrastructure_change(db: Session, change_id: UUID) -> int: + """Procesa un cambio de infraestructura e invalida detecciones afectadas.""" + change = db.query(InfrastructureChangeLog).filter( + InfrastructureChangeLog.id == change_id + ).first() + if not change or not change.auto_invalidate: + return 0 + + query = db.query(DetectionAsset).filter(DetectionAsset.is_active == True) + + if change.affected_platforms: + query = query.filter(DetectionAsset.platform.in_(change.affected_platforms)) + + affected_assets = query.all() + total_invalidated = 0 + + for asset in affected_assets: + if change.affected_log_sources: + asset_log_source = asset.log_source_name or "" + if not any(ls in asset_log_source for ls in change.affected_log_sources): + continue + + count = _invalidate_validations_for_asset( + db, asset.id, change.reported_by, "infrastructure_change" + ) + total_invalidated += count + + change.invalidated_count = total_invalidated + db.commit() + + logger.info(f"Infrastructure change {change_id}: invalidated {total_invalidated} validations") + return total_invalidated + + +def _invalidate_validations_for_asset( + db: Session, asset_id: UUID, user_id: UUID, reason: str +) -> int: + """Invalida validaciones vigentes de un asset.""" + validations = db.query(DetectionValidation).filter( + DetectionValidation.detection_asset_id == asset_id, + DetectionValidation.is_valid == True, + ).all() + + for v in validations: + v.is_valid = False + v.invalidated_at = datetime.utcnow() + v.invalidation_reason = reason + v.invalidated_by = user_id + + return len(validations) +``` + +**Registrar job en scheduler:** +```python +scheduler.add_job( + decay_engine_job, + "cron", + hour=2, minute=0, + id="decay_engine", + replace_existing=True, +) + +def decay_engine_job(): + db = SessionLocal() + try: + from app.services.decay_engine_service import run_decay_engine + results = run_decay_engine(db) + logger.info(f"Decay engine job results: {results}") + except Exception as e: + logger.error(f"Decay engine job failed: {e}", exc_info=True) + finally: + db.close() +``` + +**Verificación:** +- Crear asset + validación con `expires_at = ayer` → ejecutar `run_decay_engine()` → validación marcada `is_valid=False` con `invalidation_reason="time_decay"` +- Técnica con validación vigente → `confidence_level = "fresh"` +- Técnica con validación de hace 100 días → `confidence_level = "aging"` +- Técnica sin validaciones → `confidence_level = "unknown"` +- Registrar infrastructure change con `affected_platforms=["Splunk"]` → invalida validaciones de assets Splunk +- Log muestra resumen completo del decay engine + +--- + +### Tarea 8.4: Router de Detection Lifecycle + +**Qué:** Endpoints API para gestionar todo el ciclo de vida de detecciones. + +**Implementación:** + +**app/routers/detection_lifecycle.py** (nuevo): +```python +from fastapi import APIRouter, Depends, Query +from sqlalchemy.orm import Session +from uuid import UUID +from typing import Optional +from datetime import datetime, timedelta + +from app.database import get_db +from app.dependencies.auth import get_current_user, require_role +from app.schemas.detection_lifecycle_schema import ( + DetectionAssetCreate, DetectionAssetUpdate, DetectionAssetOut, + DetectionValidationCreate, DetectionValidationOut, + TechniqueConfidenceOut, + InfrastructureChangeCreate, InfrastructureChangeOut, +) +from app.services import detection_asset_service, decay_engine_service +from app.services import audit_service +from app.models.detection_lifecycle import ( + DetectionAsset, DetectionValidation, + DetectionTechniqueMapping, TechniqueConfidenceScore, + InfrastructureChangeLog, DetectionHealthStatus, +) +from app.models.decay_policy import DecayPolicy +from app.domain.exceptions import EntityNotFoundError +import hashlib + +router = APIRouter(prefix="/detection-lifecycle", tags=["detection-lifecycle"]) + + +# — Detection Assets — + +@router.post("/assets", response_model=DetectionAssetOut, status_code=201) +def create_asset( + body: DetectionAssetCreate, + db: Session = Depends(get_db), + user=Depends(get_current_user), +): + return detection_asset_service.create_detection_asset(db, body.model_dump(), user.id) + + +@router.get("/assets", response_model=list[DetectionAssetOut]) +def list_assets( + platform: Optional[str] = None, + asset_type: Optional[str] = None, + health_status: Optional[str] = None, + technique_id: Optional[UUID] = None, + is_active: Optional[bool] = True, + db: Session = Depends(get_db), + user=Depends(get_current_user), +): + return detection_asset_service.list_assets( + db, platform=platform, asset_type=asset_type, + health_status=health_status, technique_id=technique_id, is_active=is_active, + ) + + +@router.get("/assets/{asset_id}") +def get_asset( + asset_id: UUID, db: Session = Depends(get_db), user=Depends(get_current_user), +): + return detection_asset_service.get_asset_with_details(db, asset_id) + + +@router.patch("/assets/{asset_id}", response_model=DetectionAssetOut) +def update_asset( + asset_id: UUID, body: DetectionAssetUpdate, + db: Session = Depends(get_db), user=Depends(get_current_user), +): + return detection_asset_service.update_detection_asset( + db, asset_id, body.model_dump(exclude_unset=True), user.id + ) + + +# — Technique Mappings — + +@router.post("/assets/{asset_id}/techniques/{technique_id}") +def map_technique( + asset_id: UUID, technique_id: UUID, + coverage_type: str = Query("detect"), + confidence_level: str = Query("medium"), + db: Session = Depends(get_db), user=Depends(get_current_user), +): + mapping = DetectionTechniqueMapping( + detection_asset_id=asset_id, technique_id=technique_id, + coverage_type=coverage_type, confidence_level=confidence_level, + ) + db.add(mapping) + db.commit() + return {"message": "Technique mapped", "mapping_id": str(mapping.id)} + + +@router.get("/techniques/{technique_id}/detections") +def get_technique_detections( + technique_id: UUID, db: Session = Depends(get_db), user=Depends(get_current_user), +): + return detection_asset_service.get_technique_detection_summary(db, technique_id) + + +# — Validations — + +@router.post("/validations", response_model=DetectionValidationOut, status_code=201) +def create_validation( + body: DetectionValidationCreate, + db: Session = Depends(get_db), user=Depends(get_current_user), +): + asset = db.query(DetectionAsset).filter( + DetectionAsset.id == body.detection_asset_id + ).first() + if not asset: + raise EntityNotFoundError("DetectionAsset", str(body.detection_asset_id)) + + validation = DetectionValidation( + detection_asset_id=body.detection_asset_id, + technique_id=body.technique_id, + test_id=body.test_id, + validation_result=body.validation_result, + validation_method=body.validation_method, + notes=body.notes, + evidence_ids=body.evidence_ids or [], + validated_by=user.id, + expires_at=datetime.utcnow() + timedelta(days=body.validity_days), + rule_hash_at_validation=asset.rule_hash, + log_source_version_at_validation=asset.log_source_version, + infrastructure_hash_at_validation=asset.infrastructure_hash, + ) + + data = f"{validation.detection_asset_id}:{validation.validated_by}:{validation.validation_result}:{validation.validated_at}" + validation.integrity_hash = hashlib.sha256(data.encode()).hexdigest() + + db.add(validation) + db.commit() + db.refresh(validation) + + if body.technique_id: + decay_engine_service.calculate_confidence_for_technique(db, body.technique_id) + + audit_service.log_action( + db, user.id, "DETECTION_VALIDATED", + "detection_validation", str(validation.id), + details={ + "asset_id": str(body.detection_asset_id), + "result": body.validation_result, + "validity_days": body.validity_days, + } + ) + return validation + + +@router.get("/validations", response_model=list[DetectionValidationOut]) +def list_validations( + asset_id: Optional[UUID] = None, + technique_id: Optional[UUID] = None, + is_valid: Optional[bool] = None, + db: Session = Depends(get_db), user=Depends(get_current_user), +): + query = db.query(DetectionValidation) + if asset_id: + query = query.filter(DetectionValidation.detection_asset_id == asset_id) + if technique_id: + query = query.filter(DetectionValidation.technique_id == technique_id) + if is_valid is not None: + query = query.filter(DetectionValidation.is_valid == is_valid) + return query.order_by(DetectionValidation.validated_at.desc()).all() + + +@router.post("/validations/{validation_id}/invalidate") +def invalidate_validation( + validation_id: UUID, + reason: str = Query(...), + details: Optional[str] = None, + db: Session = Depends(get_db), + user=Depends(require_role("admin", "blue_lead")), +): + validation = db.query(DetectionValidation).filter( + DetectionValidation.id == validation_id + ).first() + if not validation: + raise EntityNotFoundError("DetectionValidation", str(validation_id)) + + validation.is_valid = False + validation.invalidated_at = datetime.utcnow() + validation.invalidation_reason = reason + validation.invalidation_details = details + validation.invalidated_by = user.id + db.commit() + return {"message": "Validation invalidated"} + + +# — Confidence Scores — + +@router.get("/confidence", response_model=list[TechniqueConfidenceOut]) +def list_confidence_scores( + confidence_level: Optional[str] = None, + min_score: Optional[float] = None, + max_score: Optional[float] = None, + db: Session = Depends(get_db), user=Depends(get_current_user), +): + query = db.query(TechniqueConfidenceScore) + if confidence_level: + query = query.filter(TechniqueConfidenceScore.confidence_level == confidence_level) + if min_score is not None: + query = query.filter(TechniqueConfidenceScore.confidence_score >= min_score) + if max_score is not None: + query = query.filter(TechniqueConfidenceScore.confidence_score <= max_score) + return query.order_by(TechniqueConfidenceScore.confidence_score.asc()).all() + + +@router.get("/confidence/{technique_id}", response_model=TechniqueConfidenceOut) +def get_technique_confidence( + technique_id: UUID, recalculate: bool = Query(False), + db: Session = Depends(get_db), user=Depends(get_current_user), +): + if recalculate: + return decay_engine_service.calculate_confidence_for_technique(db, technique_id) + score = db.query(TechniqueConfidenceScore).filter( + TechniqueConfidenceScore.technique_id == technique_id + ).first() + if not score: + return decay_engine_service.calculate_confidence_for_technique(db, technique_id) + return score + + +# — Infrastructure Changes — + +@router.post("/infrastructure-changes", response_model=InfrastructureChangeOut, status_code=201) +def report_infrastructure_change( + body: InfrastructureChangeCreate, + db: Session = Depends(get_db), + user=Depends(require_role("admin", "blue_lead")), +): + change = InfrastructureChangeLog( + change_type=body.change_type, + description=body.description, + affected_platforms=body.affected_platforms, + affected_log_sources=body.affected_log_sources, + change_date=body.change_date or datetime.utcnow(), + auto_invalidate=body.auto_invalidate, + reported_by=user.id, + ) + db.add(change) + db.commit() + db.refresh(change) + + if change.auto_invalidate: + decay_engine_service.process_infrastructure_change(db, change.id) + db.refresh(change) + + audit_service.log_action( + db, user.id, "INFRASTRUCTURE_CHANGE_REPORTED", + "infrastructure_change", str(change.id), + details={"type": body.change_type, "invalidated_count": change.invalidated_count} + ) + return change + + +@router.get("/infrastructure-changes", response_model=list[InfrastructureChangeOut]) +def list_infrastructure_changes( + days: int = Query(90, ge=1, le=730), + db: Session = Depends(get_db), user=Depends(get_current_user), +): + cutoff = datetime.utcnow() - timedelta(days=days) + return ( + db.query(InfrastructureChangeLog) + .filter(InfrastructureChangeLog.change_date >= cutoff) + .order_by(InfrastructureChangeLog.change_date.desc()) + .all() + ) + + +# — Decay Engine Control — + +@router.post("/decay-engine/run") +def trigger_decay_engine( + db: Session = Depends(get_db), user=Depends(require_role("admin")), +): + results = decay_engine_service.run_decay_engine(db) + return {"message": "Decay engine completed", "results": results} + + +# — Dashboard Summary — + +@router.get("/dashboard") +def lifecycle_dashboard( + db: Session = Depends(get_db), user=Depends(get_current_user), +): + """Resumen ejecutivo del estado de detecciones.""" + from sqlalchemy import func + + health_dist = dict( + db.query(DetectionAsset.health_status, func.count(DetectionAsset.id)) + .filter(DetectionAsset.is_active == True) + .group_by(DetectionAsset.health_status) + .all() + ) + + confidence_dist = dict( + db.query(TechniqueConfidenceScore.confidence_level, func.count(TechniqueConfidenceScore.id)) + .group_by(TechniqueConfidenceScore.confidence_level) + .all() + ) + + expiring_soon = ( + db.query(func.count(DetectionValidation.id)) + .filter( + DetectionValidation.is_valid == True, + DetectionValidation.expires_at <= (datetime.utcnow() + timedelta(days=7)), + ) + .scalar() + ) + + total_assets = db.query(func.count(DetectionAsset.id)).filter( + DetectionAsset.is_active == True + ).scalar() + total_valid = db.query(func.count(DetectionValidation.id)).filter( + DetectionValidation.is_valid == True + ).scalar() + + recent_changes = db.query(func.count(InfrastructureChangeLog.id)).filter( + InfrastructureChangeLog.change_date >= (datetime.utcnow() - timedelta(days=30)) + ).scalar() + + return { + "total_detection_assets": total_assets, + "total_valid_validations": total_valid, + "health_distribution": { + k.value if hasattr(k, 'value') else str(k): v for k, v in health_dist.items() + }, + "confidence_distribution": { + k.value if hasattr(k, 'value') else str(k): v for k, v in confidence_dist.items() + }, + "validations_expiring_7d": expiring_soon, + "infrastructure_changes_30d": recent_changes, + } +``` + +Registrar en main.py: +```python +from app.routers.detection_lifecycle import router as detection_lifecycle_router +app.include_router(detection_lifecycle_router, prefix="/api/v1") +``` + +**Verificación:** +- `POST /api/v1/detection-lifecycle/assets` con datos válidos → 201, asset creado con hash +- `POST /api/v1/detection-lifecycle/validations` → validación con `expires_at` correcto +- `GET /api/v1/detection-lifecycle/confidence/{technique_id}?recalculate=true` → score calculado +- `POST /api/v1/detection-lifecycle/infrastructure-changes` con `auto_invalidate=true` → validaciones afectadas invalidadas +- `GET /api/v1/detection-lifecycle/dashboard` → resumen completo con distribuciones +- `POST /api/v1/detection-lifecycle/decay-engine/run` → resultados del decay completo + +--- + +### Tarea 8.5: Seed de Decay Policy por Defecto + +**Qué:** Crear una política por defecto durante el seed de la aplicación. + +**Implementación:** + +En `app/seed.py` o el script de seed existente: +```python +from app.models.decay_policy import DecayPolicy + +def seed_decay_policies(db: Session): + """Crear política de decay por defecto si no existe.""" + existing = db.query(DecayPolicy).filter(DecayPolicy.is_default == True).first() + if existing: + return + + default_policy = DecayPolicy( + name="Default Decay Policy", + description="Standard detection decay: Fresh up to 90 days, Aging 91-180 days, Stale 181-365 days, Broken after 365 days.", + fresh_days=90, + aging_days=180, + stale_days=365, + default_validity_days=180, + silent_threshold_days=30, + noisy_threshold_daily=100, + recency_weight=0.30, + coverage_weight=0.30, + health_weight=0.25, + diversity_weight=0.15, + is_default=True, + is_active=True, + ) + db.add(default_policy) + + critical_policy = DecayPolicy( + name="Critical Techniques Policy", + description="Stricter policy for high-impact tactics: Fresh 60 days, Aging 90 days, Stale 180 days.", + applies_to_tactic="initial-access", + fresh_days=60, + aging_days=90, + stale_days=180, + default_validity_days=90, + silent_threshold_days=14, + noisy_threshold_daily=50, + recency_weight=0.35, + coverage_weight=0.30, + health_weight=0.25, + diversity_weight=0.10, + is_default=False, + is_active=True, + ) + db.add(critical_policy) + db.commit() +``` + +**Verificación:** +- Ejecutar seed → dos políticas creadas en `decay_policies` +- Ejecutar seed de nuevo → no duplica +- La política de initial-access tiene umbrales más estrictos + +--- + +## FASE 9 — Ownership & Operativa Diaria + +**Duración estimada:** 2-3 semanas +**Dependencias:** Fase 8 + +--- + +### Tarea 9.1: Sistema de Ownership + +**Qué:** Cada técnica y regla de detección tiene un owner responsable, backup owner, y equipo. + +**Implementación:** + +1. **Migración Alembic** — Añadir columnas a `techniques`: +```python +def upgrade(): + op.add_column('techniques', Column('owner_id', UUID(as_uuid=True), ForeignKey("users.id"), nullable=True)) + op.add_column('techniques', Column('backup_owner_id', UUID(as_uuid=True), ForeignKey("users.id"), nullable=True)) + op.add_column('techniques', Column('team', String(100), nullable=True)) + op.add_column('techniques', Column('ownership_assigned_at', DateTime, nullable=True)) + op.create_index('ix_techniques_owner_id', 'techniques', ['owner_id']) +``` + +2. **app/services/ownership_service.py** (nuevo): +```python +import logging +from datetime import datetime +from uuid import UUID +from sqlalchemy.orm import Session +from sqlalchemy import func +from app.models.technique import Technique +from app.models.user import User + +logger = logging.getLogger(__name__) + + +def assign_owner(db: Session, technique_id: UUID, owner_id: UUID, + backup_owner_id: UUID = None, team: str = None): + technique = db.query(Technique).filter(Technique.id == technique_id).first() + if not technique: + from app.domain.exceptions import EntityNotFoundError + raise EntityNotFoundError("Technique", str(technique_id)) + + technique.owner_id = owner_id + technique.backup_owner_id = backup_owner_id + technique.team = team + technique.ownership_assigned_at = datetime.utcnow() + db.commit() + return technique + + +def bulk_assign_by_tactic(db: Session, tactic: str, owner_id: UUID, team: str = None): + """Asignar owner a todas las técnicas de una táctica.""" + techniques = db.query(Technique).filter(Technique.tactic == tactic).all() + count = 0 + for t in techniques: + if not t.owner_id: + t.owner_id = owner_id + t.team = team + t.ownership_assigned_at = datetime.utcnow() + count += 1 + db.commit() + return count + + +def get_orphan_techniques(db: Session) -> list: + """Técnicas sin owner asignado.""" + return db.query(Technique).filter(Technique.owner_id.is_(None)).order_by(Technique.mitre_id).all() + + +def get_workload_by_user(db: Session) -> list[dict]: + """Carga de trabajo por usuario.""" + results = ( + db.query(User.username, User.role, func.count(Technique.id).label("technique_count")) + .outerjoin(Technique, Technique.owner_id == User.id) + .group_by(User.id) + .all() + ) + return [{"username": r[0], "role": r[1], "technique_count": r[2]} for r in results] +``` + +3. **app/routers/ownership.py** (nuevo) — Endpoints para asignar owners, ver huérfanos, ver carga. + +**Verificación:** +- Asignar owner a técnica → `owner_id` y `ownership_assigned_at` se actualizan +- `get_orphan_techniques()` retorna técnicas sin owner +- Bulk assign por táctica asigna solo las que no tienen owner +- Workload muestra conteo correcto por usuario + +--- + +### Tarea 9.2: Cola de Revalidación + +**Qué:** Sistema que genera automáticamente una cola priorizada de técnicas/detecciones que necesitan revalidación. + +**Implementación:** + +**app/models/revalidation_queue.py** (nuevo): +```python +import uuid +import enum +from datetime import datetime +from sqlalchemy import ( + Column, String, Integer, DateTime, ForeignKey, + Text, Enum as SQLEnum, Boolean +) +from sqlalchemy.dialects.postgresql import UUID, JSONB +from app.database import Base + + +class RevalidationPriority(str, enum.Enum): + critical = "critical" + high = "high" + medium = "medium" + low = "low" + + +class RevalidationStatus(str, enum.Enum): + pending = "pending" + assigned = "assigned" + in_progress = "in_progress" + completed = "completed" + skipped = "skipped" + + +class RevalidationItem(Base): + __tablename__ = "revalidation_queue" + + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + technique_id = Column(UUID(as_uuid=True), ForeignKey("techniques.id"), nullable=False) + detection_asset_id = Column(UUID(as_uuid=True), ForeignKey("detection_assets.id"), nullable=True) + + reason = Column(String(200), nullable=False) + priority = Column(SQLEnum(RevalidationPriority), default=RevalidationPriority.medium) + status = Column(SQLEnum(RevalidationStatus), default=RevalidationStatus.pending) + + assigned_to = Column(UUID(as_uuid=True), ForeignKey("users.id")) + assigned_at = Column(DateTime) + due_date = Column(DateTime) + completed_at = Column(DateTime) + completed_by = Column(UUID(as_uuid=True), ForeignKey("users.id")) + result_validation_id = Column(UUID(as_uuid=True), ForeignKey("detection_validations.id")) + + notes = Column(Text) + metadata = Column(JSONB, default={}) + created_at = Column(DateTime, default=datetime.utcnow) +``` + +**app/services/revalidation_service.py** — Genera cola automáticamente basándose en: validaciones expiradas, cambios de infra, coverage stale, updates MITRE. Prioriza por criticidad de la táctica y risk score. + +**Verificación:** +- Ejecutar generación de cola → items creados con prioridad correcta +- Asignar item a usuario → status cambia a "assigned" +- Completar item con nueva validación → status "completed" + `result_validation_id` poblado +- Cola respeta priorización (critical > high > medium > low) + +--- + +### Tarea 9.3: Dashboard del Analista — API + +**Qué:** Endpoint que retorna la vista personalizada del día para cada analista. + +**Implementación:** + +**app/routers/analyst_dashboard.py** (nuevo): +```python +@router.get("/my-dashboard") +def get_my_dashboard( + db: Session = Depends(get_db), user=Depends(get_current_user), +): + """Vista personalizada del día para el analista logueado.""" + now = datetime.utcnow() + + # 1. Mis revalidaciones pendientes + my_revalidations = ( + db.query(RevalidationItem) + .filter( + RevalidationItem.assigned_to == user.id, + RevalidationItem.status.in_(["pending", "assigned", "in_progress"]), + ) + .order_by(RevalidationItem.priority.asc(), RevalidationItem.due_date.asc()) + .limit(10) + .all() + ) + + # 2. Mis técnicas con validaciones que caducan esta semana + my_expiring = ( + db.query(DetectionValidation) + .join(DetectionTechniqueMapping, + DetectionValidation.detection_asset_id == DetectionTechniqueMapping.detection_asset_id) + .join(Technique, DetectionTechniqueMapping.technique_id == Technique.id) + .filter( + Technique.owner_id == user.id, + DetectionValidation.is_valid == True, + DetectionValidation.expires_at <= (now + timedelta(days=7)), + ) + .all() + ) + + # 3. Mis tests pendientes de workflow + my_tests = ( + db.query(Test) + .filter( + Test.created_by == user.id, + Test.state.in_(["draft", "red_executing", "blue_evaluating"]) + ) + .order_by(Test.updated_at.desc()) + .limit(10) + .all() + ) + + # 4. Notificaciones no leídas + unread_notifications = ( + db.query(func.count(Notification.id)) + .filter(Notification.user_id == user.id, Notification.is_read == False) + .scalar() + ) + + # 5. Cambios de infra recientes + recent_changes = ( + db.query(InfrastructureChangeLog) + .filter(InfrastructureChangeLog.change_date >= (now - timedelta(days=7))) + .order_by(InfrastructureChangeLog.change_date.desc()) + .limit(5) + .all() + ) + + return { + "user": {"username": user.username, "role": user.role}, + "date": now.isoformat(), + "revalidations_pending": len(my_revalidations), + "revalidations": [_serialize_revalidation(r) for r in my_revalidations], + "expiring_validations": len(my_expiring), + "active_tests": len(my_tests), + "unread_notifications": unread_notifications, + "recent_infrastructure_changes": len(recent_changes), + } +``` + +**Verificación:** +- `GET /api/v1/analyst/my-dashboard` retorna datos personalizados del usuario logueado +- Revalidaciones aparecen ordenadas por prioridad +- Validaciones expirando muestran solo las del owner logueado +- Dashboard funciona para todos los roles + +--- + +## FASE 10 — Attack Paths & Purple Team Avanzado + +**Duración estimada:** 3-4 semanas +**Dependencias:** Fases 8, 9 + +--- + +### Tarea 10.1: Modelo de Attack Paths + +**Qué:** Permitir definir escenarios de ataque encadenados (Initial Access → Execution → Persistence → Lateral Movement → Exfiltration) como entidades de primer nivel. + +**Implementación:** + +**app/models/attack_path.py** (nuevo): +```python +import uuid +import enum +from datetime import datetime +from sqlalchemy import ( + Column, String, Integer, Float, DateTime, + ForeignKey, Text, Boolean, Enum as SQLEnum +) +from sqlalchemy.dialects.postgresql import UUID, JSONB +from sqlalchemy.orm import relationship +from app.database import Base + + +class AttackPathStatus(str, enum.Enum): + draft = "draft" + ready = "ready" + executing = "executing" + completed = "completed" + archived = "archived" + + +class AttackPath(Base): + """Escenario de ataque encadenado multi-fase.""" + __tablename__ = "attack_paths" + + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + name = Column(String(300), nullable=False) + description = Column(Text) + threat_scenario = Column(Text) + + status = Column(SQLEnum(AttackPathStatus), default=AttackPathStatus.draft) + + threat_actor_id = Column(UUID(as_uuid=True), ForeignKey("threat_actors.id")) + campaign_id = Column(UUID(as_uuid=True), ForeignKey("campaigns.id")) + + # Resultado + detection_rate = Column(Float) + mean_time_to_detect = Column(Float) + furthest_step_reached = Column(Integer, default=0) + + complexity = Column(String(20), default="medium") + target_environment = Column(String(200)) + tags = Column(JSONB, default=[]) + metadata = Column(JSONB, default={}) + + created_by = Column(UUID(as_uuid=True), ForeignKey("users.id"), nullable=False) + started_at = Column(DateTime) + completed_at = Column(DateTime) + created_at = Column(DateTime, default=datetime.utcnow) + updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow) + + steps = relationship("AttackPathStep", back_populates="attack_path", + order_by="AttackPathStep.step_order") + + +class AttackPathStep(Base): + """Un paso individual dentro de un attack path.""" + __tablename__ = "attack_path_steps" + + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + attack_path_id = Column(UUID(as_uuid=True), + ForeignKey("attack_paths.id", ondelete="CASCADE"), nullable=False) + + step_order = Column(Integer, nullable=False) + name = Column(String(300), nullable=False) + description = Column(Text) + tactic = Column(String(100)) + + technique_id = Column(UUID(as_uuid=True), ForeignKey("techniques.id")) + test_id = Column(UUID(as_uuid=True), ForeignKey("tests.id")) + + procedure = Column(Text) + tools = Column(JSONB, default=[]) + expected_detection = Column(Text) + + # Resultado + executed_at = Column(DateTime) + detected = Column(Boolean) + detection_time_seconds = Column(Integer) + detection_details = Column(Text) + red_notes = Column(Text) + blue_notes = Column(Text) + + events = Column(JSONB, default=[]) + metadata = Column(JSONB, default={}) + + attack_path = relationship("AttackPath", back_populates="steps") +``` + +Services y routers: CRUD completo + endpoint de ejecución colaborativa + auto-generación de informe del ejercicio. + +**Verificación:** +- Crear attack path con 5 pasos → pasos ordenados correctamente +- Ejecutar paso a paso → `detection_rate` se recalcula automáticamente +- `furthest_step_reached` refleja hasta dónde llegó sin detección +- Completar path → genera timeline completa en `events` + +--- + +### Tarea 10.2: Timeline Colaborativa + +**Qué:** Registro en tiempo real de acciones Red/Blue durante un ejercicio, con timestamps para medir tiempos de detección y respuesta. + +Este modelo ya existe parcialmente con los `events` JSONB en `AttackPathStep`. El servicio encapsula la lógica de agregar eventos y calcular métricas temporales. + +--- + +## FASE 11 — Knowledge Management + +**Duración estimada:** 2-3 semanas +**Dependencias:** Fase 8 + +--- + +### Tarea 11.1: Modelo de Playbooks + +**Qué:** Cada técnica puede tener playbooks asociados: cómo atacar, cómo detectar, cómo investigar, cómo responder. + +**Implementación:** + +**app/models/playbook.py** (nuevo): +```python +import uuid +import enum +from datetime import datetime +from sqlalchemy import ( + Column, String, Integer, DateTime, ForeignKey, + Text, Boolean, Enum as SQLEnum +) +from sqlalchemy.dialects.postgresql import UUID, JSONB +from app.database import Base + + +class PlaybookType(str, enum.Enum): + attack = "attack" + detect = "detect" + investigate = "investigate" + respond = "respond" + hunt = "hunt" + + +class Playbook(Base): + __tablename__ = "playbooks" + + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + technique_id = Column(UUID(as_uuid=True), ForeignKey("techniques.id"), + nullable=False, index=True) + playbook_type = Column(SQLEnum(PlaybookType), nullable=False) + + title = Column(String(300), nullable=False) + content_markdown = Column(Text, nullable=False) + + version = Column(Integer, default=1) + is_published = Column(Boolean, default=False) + last_reviewed_at = Column(DateTime) + last_reviewed_by = Column(UUID(as_uuid=True), ForeignKey("users.id")) + + difficulty = Column(String(20)) + estimated_time_minutes = Column(Integer) + tools_required = Column(JSONB, default=[]) + prerequisites = Column(JSONB, default=[]) + references = Column(JSONB, default=[]) + tags = Column(JSONB, default=[]) + + created_by = Column(UUID(as_uuid=True), ForeignKey("users.id"), nullable=False) + created_at = Column(DateTime, default=datetime.utcnow) + updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow) + + +class LessonLearned(Base): + """Registro inmutable de lecciones aprendidas.""" + __tablename__ = "lessons_learned" + + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + technique_id = Column(UUID(as_uuid=True), ForeignKey("techniques.id"), index=True) + test_id = Column(UUID(as_uuid=True), ForeignKey("tests.id")) + attack_path_id = Column(UUID(as_uuid=True), ForeignKey("attack_paths.id")) + campaign_id = Column(UUID(as_uuid=True), ForeignKey("campaigns.id")) + + title = Column(String(300), nullable=False) + what_happened = Column(Text, nullable=False) + what_failed = Column(Text) + root_cause = Column(Text) + how_fixed = Column(Text) + recommendations = Column(Text) + impact = Column(String(20)) + + tags = Column(JSONB, default=[]) + created_by = Column(UUID(as_uuid=True), ForeignKey("users.id"), nullable=False) + created_at = Column(DateTime, default=datetime.utcnow) +``` + +Routers y servicios: CRUD con soporte Markdown, búsqueda fulltext, versionado. + +**Verificación:** +- Crear playbook tipo "detect" para T1059 → asociado correctamente +- Editar contenido → `version` incrementa, contenido anterior preservado en audit log +- Buscar playbooks por técnica retorna todos los tipos +- Lessons learned vinculadas a test/campaign correctamente + +--- + +## FASE 12 — Risk Intelligence & Recomendaciones + +**Duración estimada:** 2-3 semanas +**Dependencias:** Fases 4 (OsintItem), 8, 9 + +--- + +### Tarea 12.1: Risk Score Compuesto por Técnica + +**Qué:** Score de riesgo operativo real que combina: explotabilidad, frecuencia en amenazas reales, tiempo sin validar, cobertura de detección, y severidad. + +**Implementación:** + +**app/services/risk_scoring_service.py** (nuevo): +```python +def calculate_technique_risk(db: Session, technique_id: UUID) -> dict: + """Calcula un risk score multidimensional por técnica.""" + technique = db.query(Technique).filter(Technique.id == technique_id).first() + if not technique: + return None + + # 1. EXPLOITABILITY (0-100) + templates = db.query(func.count(TestTemplate.id)).filter( + TestTemplate.technique_id == technique_id + ).scalar() or 0 + osint_count = db.query(func.count(OsintItem.id)).filter( + OsintItem.technique_id == technique_id + ).scalar() or 0 + exploitability = min(100, templates * 15 + osint_count * 10) + + # 2. THREAT_FREQUENCY (0-100) + actor_count = db.query(func.count(ThreatActorTechnique.id)).filter( + ThreatActorTechnique.technique_id == technique_id + ).scalar() or 0 + threat_frequency = min(100, actor_count * 20) + + # 3. DETECTION_GAP (0-100): inverso del confidence score + confidence = db.query(TechniqueConfidenceScore).filter( + TechniqueConfidenceScore.technique_id == technique_id + ).first() + confidence_score = confidence.confidence_score if confidence else 0 + detection_gap = 100 - confidence_score + + # 4. STALENESS (0-100) + last_validated = confidence.last_validated_at if confidence else None + if not last_validated: + staleness = 100 + else: + days = (datetime.utcnow() - last_validated).days + staleness = min(100, days / 3.65) + + # 5. SEVERITY (0-100): basado en táctica + tactic_severity = { + "initial-access": 90, "execution": 85, + "persistence": 80, "privilege-escalation": 85, + "defense-evasion": 90, "credential-access": 85, + "discovery": 50, "lateral-movement": 80, + "collection": 60, "command-and-control": 75, + "exfiltration": 90, "impact": 95, + "resource-development": 40, "reconnaissance": 35, + } + severity = tactic_severity.get(technique.tactic, 50) + + # RISK SCORE COMPUESTO + risk_score = ( + exploitability * 0.15 + + threat_frequency * 0.25 + + detection_gap * 0.25 + + staleness * 0.15 + + severity * 0.20 + ) + + return { + "technique_id": str(technique_id), + "mitre_id": technique.mitre_id, + "risk_score": round(risk_score, 1), + "risk_level": ( + "critical" if risk_score >= 80 else + "high" if risk_score >= 60 else + "medium" if risk_score >= 40 else "low" + ), + "factors": { + "exploitability": round(exploitability, 1), + "threat_frequency": round(threat_frequency, 1), + "detection_gap": round(detection_gap, 1), + "staleness": round(staleness, 1), + "severity": severity, + }, + } +``` + +--- + +### Tarea 12.2: Motor de Recomendaciones + +**Qué:** Sistema que sugiere automáticamente acciones prioritarias. + +**Categorías de recomendaciones:** +- Técnicas de alto riesgo sin cobertura +- Técnicas nunca probadas usadas por actores relevantes +- Detecciones silenciosas que necesitan verificación +- Reglas huérfanas sin owner +- Gaps de cobertura por táctica +- Técnicas con muchos CVEs recientes sin validación + +El servicio consulta múltiples fuentes de datos y genera una lista priorizada de recomendaciones con acciones sugeridas. + +--- + +## FASE 13 — Alertas Inteligentes Operativas + +**Duración estimada:** 2-3 semanas +**Dependencias:** Fases 6 (webhooks como canal), 8, 9, 12 + +--- + +### Tarea 13.1: Sistema de Alertas Operativas + +**Qué:** Alertas basadas en reglas configurables que detectan condiciones operativas y notifican proactivamente. + +**Modelo:** +```python +class OperationalAlertRule(Base): + __tablename__ = "operational_alert_rules" + + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + name = Column(String(300), nullable=False) + description = Column(Text) + rule_type = Column(String(100), nullable=False) + condition = Column(JSONB, nullable=False) + severity = Column(String(20), default="medium") + is_active = Column(Boolean, default=True) + notification_channels = Column(JSONB, default=["in_app"]) + # ["in_app", "email", "webhook"] ← usa Fase 6 webhooks y Fase 7 email + last_triggered_at = Column(DateTime) + trigger_count = Column(Integer, default=0) + created_at = Column(DateTime, default=datetime.utcnow) +``` + +**Job evaluador** — Ejecuta cada hora, evalúa todas las reglas activas, genera alertas cuando se cumplen condiciones. + +**Reglas pre-configuradas (seed):** + +| Regla | Condición | Severidad | +|-------|-----------|-----------| +| 3+ técnicas críticas llevan 90 días sin validarse | `stale_critical_techniques: threshold_days=90, min=3` | critical | +| EDR actualizado — revalidaciones pendientes | `edr_update_pending_revalidation` | high | +| Nueva técnica MITRE sin cobertura | `new_mitre_techniques_uncovered` | medium | +| Regresión de cobertura detectada | `coverage_regression: drop_pct=5` | high | +| Ola de validaciones expirando esta semana | `validation_expiry_wave: days=7, min=10` | medium | + +--- + +## FASE 14 — Enterprise Readiness (SSO + API Keys) + +**Duración estimada:** 2 semanas +**Dependencias:** Fase 0 + +--- + +### Tarea 14.1: API Key Management + +**Qué:** Permitir crear API keys para integraciones automatizadas (PowerBI, SOAR, scripts) sin usar JWT de usuario. + +**Modelo:** +```python +class ApiKey(Base): + __tablename__ = "api_keys" + + id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4) + name = Column(String(200), nullable=False) + key_hash = Column(String(64), nullable=False, unique=True) # SHA256 + key_prefix = Column(String(8)) # Primeros 8 chars para identificación visual + user_id = Column(UUID(as_uuid=True), ForeignKey("users.id"), nullable=False) + scopes = Column(JSONB, default=["read"]) + is_active = Column(Boolean, default=True) + last_used_at = Column(DateTime) + expires_at = Column(DateTime) + created_at = Column(DateTime, default=datetime.utcnow) +``` + +**Flujo:** Al crear → generar key aleatoria → mostrar UNA vez al usuario → guardar solo hash. Autenticar via header `X-API-Key`. + +--- + +### Tarea 14.2: SSO/SAML Integration + +**Qué:** Soporte para Single Sign-On vía SAML 2.0 usando `python3-saml`. + +Configuración en `app/config.py` con `SAML_ENABLED`, `SAML_IDP_METADATA_URL`, etc. Endpoint `/auth/saml/login` y `/auth/saml/callback`. + +--- + +## Features Adicionales Recomendadas + +Estas no estaban en los requisitos originales pero beneficiarían la plataforma: + +| # | Feature | Razón | Fase sugerida | +|---|---------|-------|---------------| +| A1 | Dashboard personalizable por rol | CISO ve métricas ejecutivas, Red Tech ve sus tests pendientes | Fase 5 | +| A2 | Import/Export de ATT&CK Navigator layers | Los equipos ya usan Navigator | Fase 2 | +| A3 | Approval workflow para cambios de scoring weights | Evitar que un admin cambie pesos sin supervisión | Fase 3 | +| A4 | Tags y campos custom | Cada organización tiene taxonomías propias | Fase 5 | +| A5 | Bulk operations | Validar/rechazar múltiples tests a la vez, asignar campaña masiva | Fases 5 y 9 | +| A6 | Markdown support en descripciones | Los técnicos quieren formatear procedimientos | Fase 11 | +| A7 | Detection Rule Git Sync | Sincronizar reglas desde repo Git corporativo | Fase 8 | +| A8 | Coverage Heatmap con Confidence overlay | Heatmap muestra estado + confidence como segunda capa | Fase 8 | +| A9 | Automatic Detection Gap → Ticket pipeline | RT rompe algo → auto-crear item en cola → asignar a Blue | Fase 10 | +| A10 | Export/Import ATT&CK Navigator con Confidence | Exportar layer que incluye confidence level | Fase 8 | +| A11 | Comparative Attack Path Results | Comparar mismo path ejecutado en diferentes fechas | Fase 10 | +| A12 | SLA Tracking para Detection Gaps | Medir tiempo desde gap hasta implementación de regla | Fase 13 | + +--- + +## Checklist de Nuevas Dependencias Python +``` +# Fase 0 +redis>=5.0.0 +ruff # dev +pytest # dev + +# Fase 1 +atlassian-python-api>=4.0.0 +tempo-api-python-client>=0.8.0 + +# Fase 2 +weasyprint>=62.0 +docxtpl>=0.18.0 + +# Fase 7 +# smtplib es stdlib — no necesita instalar + +# Fase 11 +markdown>=3.6 +Pygments>=2.18 + +# Fase 14 +python3-saml>=1.16.0 +``` + +--- + +## Checklist de Verificación Global por Fase + +| Fase | Tests Mínimos Requeridos | +|------|-------------------------| +| 0 | Redis ping, token blacklist persistencia, índices en EXPLAIN, excepciones de dominio mapean a HTTP, CI pasa | +| 1 | Jira link CRUD, sync bidireccional, worklog integridad hash, Tempo mock | +| 2 | PDF genera con datos reales, DOCX abre en Word, analytics endpoints retornan JSON plano | +| 3 | Audit log con IP/hash, login attempts registrados, password validation, rate limiting 429 | +| 4 | CVEs importados sin duplicados, stale detection marca técnicas, rate limit NVD respetado | +| 5 | Scoring con recencia funciona, pesos persisten en BD, snapshots con desglose por táctica | +| 6 | Webhook recibe POST con signature HMAC válida, failure_count incrementa | +| 7 | Email enviado con SMTP, skip silencioso sin SMTP, preferencias por usuario | +| 8 | Assets CRUD, hash auto-generation, decay engine run, confidence calculation, infrastructure change invalidation | +| 9 | Ownership assignment, revalidation queue generation, analyst dashboard data correctness | +| 10 | Attack path creation, step execution, timeline recording, detection rate calculation | +| 11 | Playbook CRUD, Markdown rendering, lessons learned linking | +| 12 | Risk score calculation, recommendation generation, ranking correctness | +| 13 | Alert rule evaluation, trigger conditions, notification dispatch via múltiples canales | +| 14 | API key auth flow, SAML login flow | + +--- + +## Instrucciones para Claude Code + +Para cada tarea, Claude Code debe seguir este flujo: + +1. Leer el contexto de la tarea y los archivos existentes relevantes +2. Crear los archivos nuevos indicados +3. Modificar los archivos existentes según las instrucciones +4. Crear migración Alembic si hay cambios de BD +5. Ejecutar `alembic upgrade head` para verificar migración +6. Ejecutar `pytest tests/` para verificar que tests existentes no se rompen +7. Crear tests nuevos para la funcionalidad añadida +8. Verificar según los criterios de verificación de la tarea +9. Commit con mensaje descriptivo: `feat(detection-lifecycle): add decay engine service [FASE-8.3]` + +**Importante:** Cada tarea debe funcionar independientemente antes de pasar a la siguiente. No avanzar si la verificación no pasa. \ No newline at end of file diff --git a/backend/app/routers/detection_rules.py b/backend/app/routers/detection_rules.py index d84c41d..c847db1 100644 --- a/backend/app/routers/detection_rules.py +++ b/backend/app/routers/detection_rules.py @@ -1,31 +1,32 @@ """Detection rules endpoints — listing, filtering, and template association. +Thin HTTP adapter: delegates all query and business logic to detection_rule_service. + Provides endpoints for browsing detection rules, querying rules by technique, and managing the template ↔ detection rule associations. """ -import logging import uuid from typing import Optional -from datetime import datetime -from fastapi import APIRouter, Depends, HTTPException, Query +from fastapi import APIRouter, Depends, Query from pydantic import BaseModel -from sqlalchemy import func from sqlalchemy.orm import Session from app.database import get_db from app.dependencies.auth import get_current_user, require_role, require_any_role from app.models.user import User -from app.models.detection_rule import DetectionRule -from app.models.test_template import TestTemplate -from app.models.test_template_detection_rule import TestTemplateDetectionRule -from app.models.test_detection_result import TestDetectionResult +from app.services.detection_rule_service import ( + list_rules, + get_rules_for_template, + auto_associate_rules, + get_rules_for_test, + evaluate_rule, +) -# --------------------------------------------------------------------------- -# Pydantic schemas for request validation -# --------------------------------------------------------------------------- +# ── Pydantic schemas for request validation ──────────────────────────── + class DetectionRuleEvaluate(BaseModel): """Payload for evaluating a detection rule against a test.""" @@ -34,14 +35,12 @@ class DetectionRuleEvaluate(BaseModel): triggered: Optional[bool] = None notes: Optional[str] = None -logger = logging.getLogger(__name__) router = APIRouter(prefix="/detection-rules", tags=["detection-rules"]) -# --------------------------------------------------------------------------- -# GET /detection-rules — List with filters -# --------------------------------------------------------------------------- +# ── GET /detection-rules — List with filters ─────────────────────────── + @router.get("") def list_detection_rules( @@ -55,54 +54,19 @@ def list_detection_rules( current_user: User = Depends(get_current_user), ): """List detection rules with optional filters and pagination.""" - query = db.query(DetectionRule).filter(DetectionRule.is_active == True) # noqa: E712 - - if technique: - query = query.filter(DetectionRule.mitre_technique_id == technique) - - if source: - query = query.filter(DetectionRule.source == source) - - if severity: - query = query.filter(DetectionRule.severity == severity) - - if search: - from app.utils import escape_like - pattern = f"%{escape_like(search)}%" - query = query.filter( - DetectionRule.title.ilike(pattern) - | DetectionRule.description.ilike(pattern) - ) - - total = query.count() - items = query.order_by(DetectionRule.mitre_technique_id, DetectionRule.title).offset(offset).limit(limit).all() - - return { - "total": total, - "offset": offset, - "limit": limit, - "items": [ - { - "id": str(r.id), - "mitre_technique_id": r.mitre_technique_id, - "title": r.title, - "description": r.description, - "source": r.source, - "source_url": r.source_url, - "rule_format": r.rule_format, - "severity": r.severity, - "platforms": r.platforms or [], - "log_sources": r.log_sources, - "is_active": r.is_active, - } - for r in items - ], - } + return list_rules( + db, + technique=technique, + source=source, + severity=severity, + search=search, + offset=offset, + limit=limit, + ) -# --------------------------------------------------------------------------- -# GET /test-templates/{id}/detection-rules — Rules for a template -# --------------------------------------------------------------------------- +# ── GET /detection-rules/for-template/{template_id} ──────────────────── + @router.get("/for-template/{template_id}") def get_detection_rules_for_template( @@ -111,46 +75,11 @@ def get_detection_rules_for_template( current_user: User = Depends(get_current_user), ): """Get detection rules associated with a test template.""" - template = db.query(TestTemplate).filter(TestTemplate.id == template_id).first() - if not template: - raise HTTPException(status_code=404, detail="Test template not found") - - associations = ( - db.query(TestTemplateDetectionRule) - .filter(TestTemplateDetectionRule.test_template_id == template_id) - .all() - ) - - rules = [] - for assoc in associations: - r = assoc.detection_rule - rules.append({ - "id": str(r.id), - "mitre_technique_id": r.mitre_technique_id, - "title": r.title, - "description": r.description, - "source": r.source, - "source_url": r.source_url, - "rule_content": r.rule_content, - "rule_format": r.rule_format, - "severity": r.severity, - "platforms": r.platforms or [], - "log_sources": r.log_sources, - "is_primary": assoc.is_primary, - }) - - return { - "template_id": str(template.id), - "template_name": template.name, - "mitre_technique_id": template.mitre_technique_id, - "rules": rules, - "total": len(rules), - } + return get_rules_for_template(db, template_id) -# --------------------------------------------------------------------------- -# POST /detection-rules/auto-associate — Auto-link templates ↔ rules -# --------------------------------------------------------------------------- +# ── POST /detection-rules/auto-associate ──────────────────────────────── + @router.post("/auto-associate") def auto_associate_detection_rules( @@ -163,60 +92,11 @@ def auto_associate_detection_rules( technique and create associations. Rules with severity >= high are marked as primary. """ - templates = db.query(TestTemplate).filter(TestTemplate.is_active == True).all() # noqa: E712 - rules = db.query(DetectionRule).filter(DetectionRule.is_active == True).all() # noqa: E712 - - # Index rules by technique - rules_by_technique: dict[str, list] = {} - for rule in rules: - tid = rule.mitre_technique_id - if tid not in rules_by_technique: - rules_by_technique[tid] = [] - rules_by_technique[tid].append(rule) - - created = 0 - skipped = 0 - high_severities = {"high", "critical"} - - for template in templates: - matching_rules = rules_by_technique.get(template.mitre_technique_id, []) - for rule in matching_rules: - # Check if association already exists - existing = ( - db.query(TestTemplateDetectionRule) - .filter( - TestTemplateDetectionRule.test_template_id == template.id, - TestTemplateDetectionRule.detection_rule_id == rule.id, - ) - .first() - ) - if existing: - skipped += 1 - continue - - is_primary = (rule.severity or "").lower() in high_severities - - assoc = TestTemplateDetectionRule( - test_template_id=template.id, - detection_rule_id=rule.id, - is_primary=is_primary, - ) - db.add(assoc) - created += 1 - - db.commit() - - total = db.query(TestTemplateDetectionRule).count() - return { - "created": created, - "skipped": skipped, - "total_associations": total, - } + return auto_associate_rules(db) -# --------------------------------------------------------------------------- -# GET /detection-rules/for-test/{test_id} — Rules + results for a test -# --------------------------------------------------------------------------- +# ── GET /detection-rules/for-test/{test_id} ────────────────────────────── + @router.get("/for-test/{test_id}") def get_detection_rules_for_test( @@ -229,83 +109,11 @@ def get_detection_rules_for_test( Finds rules by matching the test's technique_id to detection rules, and returns any existing evaluation results. """ - from app.models.test import Test - from app.models.technique import Technique - - test = db.query(Test).filter(Test.id == test_id).first() - if not test: - raise HTTPException(status_code=404, detail="Test not found") - - technique = db.query(Technique).filter(Technique.id == test.technique_id).first() - if not technique: - raise HTTPException(status_code=404, detail="Technique not found") - - # Get detection rules for this technique - rules = ( - db.query(DetectionRule) - .filter( - DetectionRule.mitre_technique_id == technique.mitre_id, - DetectionRule.is_active == True, # noqa: E712 - ) - .order_by(DetectionRule.severity.desc(), DetectionRule.title) - .all() - ) - - # Get existing results for this test - existing_results = ( - db.query(TestDetectionResult) - .filter(TestDetectionResult.test_id == test_id) - .all() - ) - results_map = {str(r.detection_rule_id): r for r in existing_results} - - items = [] - triggered_count = 0 - evaluated_count = 0 - - for rule in rules: - result = results_map.get(str(rule.id)) - triggered = result.triggered if result else None - notes = result.notes if result else None - evaluated_at = result.evaluated_at.isoformat() if result and result.evaluated_at else None - - if triggered is not None: - evaluated_count += 1 - if triggered: - triggered_count += 1 - - items.append({ - "id": str(rule.id), - "mitre_technique_id": rule.mitre_technique_id, - "title": rule.title, - "description": rule.description, - "source": rule.source, - "source_url": rule.source_url, - "rule_content": rule.rule_content, - "rule_format": rule.rule_format, - "severity": rule.severity, - "platforms": rule.platforms or [], - "log_sources": rule.log_sources, - "triggered": triggered, - "notes": notes, - "evaluated_at": evaluated_at, - "result_id": str(result.id) if result else None, - }) - - return { - "test_id": str(test.id), - "mitre_technique_id": technique.mitre_id, - "rules": items, - "total": len(items), - "evaluated": evaluated_count, - "triggered": triggered_count, - "detection_rate": round(triggered_count / evaluated_count * 100, 1) if evaluated_count > 0 else 0, - } + return get_rules_for_test(db, test_id) -# --------------------------------------------------------------------------- -# POST /detection-rules/evaluate — Save detection result for a rule -# --------------------------------------------------------------------------- +# ── POST /detection-rules/evaluate ────────────────────────────────────── + @router.post("/evaluate") def evaluate_detection_rule( @@ -314,60 +122,11 @@ def evaluate_detection_rule( current_user: User = Depends(require_any_role("blue_tech", "blue_lead")), ): """Save or update the evaluation result for a detection rule on a test.""" - test_id = payload.test_id - detection_rule_id = payload.detection_rule_id - triggered = payload.triggered - notes = payload.notes - - # Check test exists - from app.models.test import Test - test = db.query(Test).filter(Test.id == test_id).first() - if not test: - raise HTTPException(status_code=404, detail="Test not found") - - # Check rule exists - rule = db.query(DetectionRule).filter(DetectionRule.id == detection_rule_id).first() - if not rule: - raise HTTPException(status_code=404, detail="Detection rule not found") - - # Upsert result - existing = ( - db.query(TestDetectionResult) - .filter( - TestDetectionResult.test_id == test_id, - TestDetectionResult.detection_rule_id == detection_rule_id, - ) - .first() + return evaluate_rule( + db, + test_id=payload.test_id, + detection_rule_id=payload.detection_rule_id, + triggered=payload.triggered, + notes=payload.notes, + evaluator_id=current_user.id, ) - - if existing: - existing.triggered = triggered - existing.notes = notes - existing.evaluated_by = current_user.id - existing.evaluated_at = datetime.utcnow() - db.commit() - db.refresh(existing) - return { - "id": str(existing.id), - "triggered": existing.triggered, - "notes": existing.notes, - "evaluated_at": existing.evaluated_at.isoformat() if existing.evaluated_at else None, - } - else: - result = TestDetectionResult( - test_id=test_id, - detection_rule_id=detection_rule_id, - triggered=triggered, - notes=notes, - evaluated_by=current_user.id, - evaluated_at=datetime.utcnow(), - ) - db.add(result) - db.commit() - db.refresh(result) - return { - "id": str(result.id), - "triggered": result.triggered, - "notes": result.notes, - "evaluated_at": result.evaluated_at.isoformat() if result.evaluated_at else None, - } diff --git a/backend/app/services/detection_rule_service.py b/backend/app/services/detection_rule_service.py new file mode 100644 index 0000000..18c4d80 --- /dev/null +++ b/backend/app/services/detection_rule_service.py @@ -0,0 +1,319 @@ +"""Detection rule data service. + +Extracts query and business logic from the detection_rules router so +that the router remains a thin HTTP adapter. + +This module is framework-agnostic: no FastAPI imports. +""" + +from __future__ import annotations + +from datetime import datetime +from typing import Any + +from sqlalchemy.orm import Session + +from app.domain.errors import EntityNotFoundError +from app.models.detection_rule import DetectionRule +from app.models.test import Test +from app.models.test_template import TestTemplate +from app.models.test_template_detection_rule import TestTemplateDetectionRule +from app.models.test_detection_result import TestDetectionResult +from app.models.technique import Technique +from app.utils import escape_like + + +# ── Public service functions ────────────────────────────────────────── + + +def list_rules( + db: Session, + *, + technique: str | None = None, + source: str | None = None, + severity: str | None = None, + search: str | None = None, + offset: int = 0, + limit: int = 50, +) -> dict[str, Any]: + """List detection rules with optional filters and pagination.""" + query = db.query(DetectionRule).filter(DetectionRule.is_active == True) + + if technique: + query = query.filter(DetectionRule.mitre_technique_id == technique) + if source: + query = query.filter(DetectionRule.source == source) + if severity: + query = query.filter(DetectionRule.severity == severity) + if search: + pattern = f"%{escape_like(search)}%" + query = query.filter( + DetectionRule.title.ilike(pattern) + | DetectionRule.description.ilike(pattern) + ) + + total = query.count() + items = ( + query.order_by(DetectionRule.mitre_technique_id, DetectionRule.title) + .offset(offset) + .limit(limit) + .all() + ) + + return { + "total": total, + "offset": offset, + "limit": limit, + "items": [ + { + "id": str(r.id), + "mitre_technique_id": r.mitre_technique_id, + "title": r.title, + "description": r.description, + "source": r.source, + "source_url": r.source_url, + "rule_format": r.rule_format, + "severity": r.severity, + "platforms": r.platforms or [], + "log_sources": r.log_sources, + "is_active": r.is_active, + } + for r in items + ], + } + + +def get_rules_for_template(db: Session, template_id: str) -> dict[str, Any]: + """Get detection rules associated with a test template. + + Raises EntityNotFoundError if the template does not exist. + """ + template = db.query(TestTemplate).filter(TestTemplate.id == template_id).first() + if not template: + raise EntityNotFoundError("Test template", template_id) + + associations = ( + db.query(TestTemplateDetectionRule) + .filter(TestTemplateDetectionRule.test_template_id == template_id) + .all() + ) + + rules = [] + for assoc in associations: + r = assoc.detection_rule + rules.append({ + "id": str(r.id), + "mitre_technique_id": r.mitre_technique_id, + "title": r.title, + "description": r.description, + "source": r.source, + "source_url": r.source_url, + "rule_content": r.rule_content, + "rule_format": r.rule_format, + "severity": r.severity, + "platforms": r.platforms or [], + "log_sources": r.log_sources, + "is_primary": assoc.is_primary, + }) + + return { + "template_id": str(template.id), + "template_name": template.name, + "mitre_technique_id": template.mitre_technique_id, + "rules": rules, + "total": len(rules), + } + + +def auto_associate_rules(db: Session) -> dict[str, Any]: + """Auto-associate test templates with detection rules by MITRE technique ID. + + For each active template, finds all active detection rules for the same + technique and creates associations. Rules with severity high/critical + are marked as primary. Performs commit internally. + """ + templates = db.query(TestTemplate).filter(TestTemplate.is_active == True).all() + rules = db.query(DetectionRule).filter(DetectionRule.is_active == True).all() + + rules_by_technique: dict[str, list] = {} + for rule in rules: + tid = rule.mitre_technique_id + if tid not in rules_by_technique: + rules_by_technique[tid] = [] + rules_by_technique[tid].append(rule) + + created = 0 + skipped = 0 + high_severities = {"high", "critical"} + + for template in templates: + matching_rules = rules_by_technique.get(template.mitre_technique_id, []) + for rule in matching_rules: + existing = ( + db.query(TestTemplateDetectionRule) + .filter( + TestTemplateDetectionRule.test_template_id == template.id, + TestTemplateDetectionRule.detection_rule_id == rule.id, + ) + .first() + ) + if existing: + skipped += 1 + continue + + is_primary = (rule.severity or "").lower() in high_severities + + assoc = TestTemplateDetectionRule( + test_template_id=template.id, + detection_rule_id=rule.id, + is_primary=is_primary, + ) + db.add(assoc) + created += 1 + + db.commit() + + total = db.query(TestTemplateDetectionRule).count() + return { + "created": created, + "skipped": skipped, + "total_associations": total, + } + + +def get_rules_for_test(db: Session, test_id: str) -> dict[str, Any]: + """Get detection rules relevant to a test, along with their evaluation results. + + Finds rules by matching the test's technique to detection rules. + Raises EntityNotFoundError if the test or its technique does not exist. + """ + test = db.query(Test).filter(Test.id == test_id).first() + if not test: + raise EntityNotFoundError("Test", str(test_id)) + + technique = db.query(Technique).filter(Technique.id == test.technique_id).first() + if not technique: + raise EntityNotFoundError("Technique", str(test.technique_id)) + + rules = ( + db.query(DetectionRule) + .filter( + DetectionRule.mitre_technique_id == technique.mitre_id, + DetectionRule.is_active == True, + ) + .order_by(DetectionRule.severity.desc(), DetectionRule.title) + .all() + ) + + existing_results = ( + db.query(TestDetectionResult) + .filter(TestDetectionResult.test_id == test_id) + .all() + ) + results_map = {str(r.detection_rule_id): r for r in existing_results} + + items = [] + triggered_count = 0 + evaluated_count = 0 + + for rule in rules: + result = results_map.get(str(rule.id)) + triggered = result.triggered if result else None + notes = result.notes if result else None + evaluated_at = result.evaluated_at.isoformat() if result and result.evaluated_at else None + + if triggered is not None: + evaluated_count += 1 + if triggered: + triggered_count += 1 + + items.append({ + "id": str(rule.id), + "mitre_technique_id": rule.mitre_technique_id, + "title": rule.title, + "description": rule.description, + "source": rule.source, + "source_url": rule.source_url, + "rule_content": rule.rule_content, + "rule_format": rule.rule_format, + "severity": rule.severity, + "platforms": rule.platforms or [], + "log_sources": rule.log_sources, + "triggered": triggered, + "notes": notes, + "evaluated_at": evaluated_at, + "result_id": str(result.id) if result else None, + }) + + return { + "test_id": str(test.id), + "mitre_technique_id": technique.mitre_id, + "rules": items, + "total": len(items), + "evaluated": evaluated_count, + "triggered": triggered_count, + "detection_rate": round(triggered_count / evaluated_count * 100, 1) if evaluated_count > 0 else 0, + } + + +def evaluate_rule( + db: Session, + *, + test_id: Any, + detection_rule_id: Any, + triggered: bool | None, + notes: str | None, + evaluator_id: Any, +) -> dict[str, Any]: + """Save or update the evaluation result for a detection rule on a test. + + Raises EntityNotFoundError if the test or detection rule does not exist. + """ + test = db.query(Test).filter(Test.id == test_id).first() + if not test: + raise EntityNotFoundError("Test", str(test_id)) + + rule = db.query(DetectionRule).filter(DetectionRule.id == detection_rule_id).first() + if not rule: + raise EntityNotFoundError("Detection rule", str(detection_rule_id)) + + existing = ( + db.query(TestDetectionResult) + .filter( + TestDetectionResult.test_id == test_id, + TestDetectionResult.detection_rule_id == detection_rule_id, + ) + .first() + ) + + if existing: + existing.triggered = triggered + existing.notes = notes + existing.evaluated_by = evaluator_id + existing.evaluated_at = datetime.utcnow() + db.commit() + db.refresh(existing) + return { + "id": str(existing.id), + "triggered": existing.triggered, + "notes": existing.notes, + "evaluated_at": existing.evaluated_at.isoformat() if existing.evaluated_at else None, + } + else: + result = TestDetectionResult( + test_id=test_id, + detection_rule_id=detection_rule_id, + triggered=triggered, + notes=notes, + evaluated_by=evaluator_id, + evaluated_at=datetime.utcnow(), + ) + db.add(result) + db.commit() + db.refresh(result) + return { + "id": str(result.id), + "triggered": result.triggered, + "notes": result.notes, + "evaluated_at": result.evaluated_at.isoformat() if result.evaluated_at else None, + } diff --git a/docs/ADR.md b/docs/ADR.md new file mode 100644 index 0000000..9307de9 --- /dev/null +++ b/docs/ADR.md @@ -0,0 +1,394 @@ +# Aegis — Architecture Decision Records (ADR) + +> **Date:** February 11, 2026 +> **Status:** All decisions are **Accepted** and currently in effect. + +--- + +## Index + +| ADR | Title | Status | +|-----|-------|--------| +| [ADR-001](#adr-001-fastapi-as-backend-framework) | FastAPI as Backend Framework | Accepted | +| [ADR-002](#adr-002-postgresql-with-jsonb-as-primary-database) | PostgreSQL with JSONB as Primary Database | Accepted | +| [ADR-003](#adr-003-minio-for-evidence-storage) | MinIO for Evidence Storage | Accepted | +| [ADR-004](#adr-004-docker-compose-for-deployment) | Docker Compose for Deployment | Accepted | +| [ADR-005](#adr-005-modular-monolith-over-microservices) | Modular Monolith over Microservices | Accepted | +| [ADR-006](#adr-006-apscheduler-in-process-over-external-job-system) | APScheduler In-Process over External Job System | Accepted | + +--- + +## ADR-001: FastAPI as Backend Framework + +**Date:** Project inception +**Status:** Accepted + +### Context + +Aegis is an internal security platform for managing MITRE ATT&CK coverage through Red/Blue team validation workflows. The backend must: + +- Expose a REST API consumed by a React SPA (21 pages, 80+ endpoints). +- Handle CRUD operations for 18+ domain entities with complex filtering and joins. +- Support file uploads (evidence) and streaming downloads (CSV/JSON exports). +- Integrate with external APIs (MITRE TAXII 2.0, GitHub REST, D3FEND REST). +- Enforce RBAC authorization across 6 roles. +- Be developed and maintained by a small team requiring fast iteration. +- Run in a containerized environment with Python as the team's primary language. + +### Decision + +We chose **FastAPI** as the backend framework, served by **Uvicorn** (ASGI). + +Key factors: +- **Automatic OpenAPI/Swagger** generation from type hints reduces documentation burden for 80+ endpoints. +- **Pydantic integration** provides request/response validation with zero boilerplate, critical for a schema-heavy domain (test workflows, scoring payloads, compliance data). +- **`Depends()` system** provides clean dependency injection for auth, DB sessions, and role checks without a third-party DI container. +- **Async-capable** but allows synchronous route handlers, which matters because SQLAlchemy (sync) is the ORM and all external data imports are CPU/IO-bound synchronous operations. +- **Performance** is sufficient for an internal tool (< 100 concurrent users) without needing Go/Rust-level throughput. +- **Python ecosystem** gives direct access to `taxii2-client`, `pySigma`, `boto3`, `PyYAML`, and `toml` — all required for the 8 external data source integrations. + +### Consequences + +**Positive:** +- Swagger UI available in development (`/docs`) for rapid API exploration and testing. +- Pydantic schemas act as living documentation for the API contract. +- `Depends()` chain for `get_db` → `get_current_user` → `require_role()` is concise and composable. +- `python-jose` + `passlib` integrate naturally for JWT/bcrypt auth. +- SlowAPI integrates directly with FastAPI for rate limiting. + +**Negative:** +- The `Depends()` system encourages passing `db: Session` directly into route handlers, which has led to routers containing raw SQLAlchemy queries instead of delegating to a service/repository layer (see ADR analysis — 11 of 21 routers query the DB directly). +- Synchronous route handlers block the event loop when performing long operations (MITRE sync ZIP downloads can take 30+ seconds), mitigated by Nginx proxy timeout of 300s. +- No built-in background task system beyond `BackgroundTasks` (which is request-scoped), requiring APScheduler for scheduled jobs (see ADR-006). + +**Risks:** +- FastAPI's ease of putting logic in route handlers has contributed to "fat controllers" — this is a developer discipline issue, not a framework limitation. + +### Alternatives Considered + +| Alternative | Reason Rejected | +|------------|-----------------| +| **Django + DRF** | Heavier ORM opinions, admin panel unnecessary, slower startup. Django's ORM lacks SQLAlchemy's flexibility with JSONB and complex joins. | +| **Flask + Flask-RESTful** | No built-in validation, no auto-generated OpenAPI, manual Swagger setup. Would require marshmallow or similar for schema validation. | +| **Go (Gin/Echo)** | Team's primary expertise is Python. The 8 data source integrations rely heavily on Python libraries (pySigma, taxii2-client, PyYAML). | +| **NestJS (Node.js)** | Would split the team across two runtimes. Python libraries for STIX/TAXII and Sigma rule parsing have no mature Node.js equivalents. | + +--- + +## ADR-002: PostgreSQL with JSONB as Primary Database + +**Date:** Project inception +**Status:** Accepted + +### Context + +Aegis manages a complex relational domain: techniques have tests, tests belong to campaigns, threat actors map to techniques, compliance controls map to techniques, detection rules map to techniques and tests. This is a deeply relational model with 18+ tables and many-to-many relationships. + +However, several entities also carry semi-structured data that varies by source: +- **Audit logs** — `details` field contains arbitrary action metadata (different structure per action type). +- **Threat actors** — `aliases`, `target_sectors`, `target_regions`, `references` are variable-length arrays/objects from STIX 2.0 bundles. +- **Detection rules** — `platforms` (array), `log_sources` (object with varying keys like `product`, `service`, `category`). +- **Data sources** — `last_sync_stats` (object with import-specific counters), `config` (source-specific configuration). +- **Techniques** — `platforms` (array of OS names from ATT&CK). +- **Campaigns** — `tags` (user-defined array). + +This data is imported from external sources with varying schemas (STIX JSON, Sigma YAML, Elastic TOML) and must be stored without rigid column definitions. + +### Decision + +We chose **PostgreSQL 15** as the primary database, using its native **JSONB** column type for semi-structured fields alongside traditional relational columns for the core domain. + +The schema is managed by **Alembic** (18 migration versions) with **SQLAlchemy** ORM using `sqlalchemy.dialects.postgresql.JSONB`. + +### Consequences + +**Positive:** +- Relational integrity enforced with foreign keys for the core domain (test → technique, campaign → test, evidence → test, etc.). +- JSONB columns store variable-structure data without schema migrations when external sources change their format. +- JSONB supports GIN indexing for efficient containment queries (`@>` operator) on arrays like `platforms` and `target_sectors`. +- Single database to operate — no need for a separate document store. +- PostgreSQL's mature ecosystem: `pg_dump` for backups, `pg_isready` for health checks, extensive monitoring tooling. +- SQLAlchemy's `JSONB` type allows Python dict/list access with full query support. + +**Negative:** +- JSONB fields bypass ORM-level validation — the schema for `details`, `config`, `references` etc. is only enforced by application code (Pydantic schemas on input), not by the database. +- Complex queries mixing relational joins with JSONB containment can be harder to optimize and debug. +- No GIN indexes are currently defined in migrations for JSONB columns, meaning array containment queries may perform full scans on large datasets. +- JSONB fields in audit logs make structured querying across action types difficult (e.g., "find all audit entries where details.old_state = 'draft'"). + +**Risks:** +- As JSONB usage grows, the boundary between "should be a column" and "should be JSONB" can blur. Currently well-contained to arrays and metadata fields. + +### Alternatives Considered + +| Alternative | Reason Rejected | +|------------|-----------------| +| **PostgreSQL without JSONB** | Would require separate junction tables for every array field (technique_platforms, actor_aliases, actor_sectors, etc.), adding 10+ tables for data that is always read as a whole array. | +| **MongoDB** | The core domain is deeply relational (techniques ↔ tests ↔ campaigns ↔ threat actors). Modeling this in MongoDB would require denormalization, embedded documents, or manual reference integrity — trading JSONB flexibility for relational integrity loss. | +| **PostgreSQL + MongoDB (dual)** | Operational complexity of two database systems is unjustified for the current JSONB usage (~12 columns across 6 tables). | +| **MySQL 8 with JSON** | PostgreSQL's JSONB is binary-indexed and faster for containment queries. MySQL's JSON type is text-based with function-based indexing. PostgreSQL also has superior support for UUID primary keys (native type vs BINARY(16)). | + +--- + +## ADR-003: MinIO for Evidence Storage + +**Date:** Project inception +**Status:** Accepted + +### Context + +The Red/Blue team validation workflow requires both teams to upload evidence files (screenshots, log files, PCAPs, documents) to support their test findings. Requirements: + +- Files range from small screenshots (KB) to large PCAPs (hundreds of MB). +- Files must be associated with specific tests and teams (red/blue). +- Files must be downloadable by authorized users via the browser. +- Storage must be independent from the application database (no BLOBs in PostgreSQL). +- The platform is deployed on-premise via Docker Compose — cloud-native S3 is not available. +- The upload/download API must be simple and well-supported in Python. + +### Decision + +We chose **MinIO** as an S3-compatible object storage system, accessed via **boto3** (AWS S3 SDK for Python). + +Implementation details: +- A single `evidence` bucket is auto-created on backend startup (`ensure_bucket_exists()`). +- Files are uploaded with `put_object()` using a generated UUID-based key. +- Downloads use presigned URLs (`generate_presigned_url()`) with 1-hour expiration. +- The MinIO client is a module-level singleton in `storage.py`. +- Evidence metadata (filename, MIME type, size, team, test association) is stored in PostgreSQL; only the binary content lives in MinIO. + +### Consequences + +**Positive:** +- S3-compatible API means zero code changes if migrating to AWS S3, GCS, or any S3-compatible service. +- boto3 is the most mature and well-documented S3 client library in Python. +- Presigned URLs offload download bandwidth from the backend — the browser fetches directly from MinIO. +- Binary data stays out of PostgreSQL, keeping the database lean and backups fast. +- MinIO runs as a single Docker container with a persistent volume — simple to deploy and back up. +- MinIO Console (port 9001) provides a web UI for administrators to inspect stored files. + +**Negative:** +- Presigned URLs currently point to `minio:9000` (Docker internal hostname), which is not accessible from the browser in production without additional Nginx configuration or a public MinIO endpoint. +- No file virus scanning or content validation before storage. +- No lifecycle policies configured (no automatic deletion of old evidence). +- The module-level singleton client means the MinIO connection configuration cannot be changed at runtime (acceptable for the current deployment model). + +**Risks:** +- If MinIO container is lost and the volume is not backed up, all evidence files are permanently lost. Evidence metadata in PostgreSQL would reference non-existent files. + +### Alternatives Considered + +| Alternative | Reason Rejected | +|------------|-----------------| +| **PostgreSQL BYTEA/BLOB** | Storing binary files in the database bloats backups, degrades query performance, and makes streaming large files complex. PostgreSQL is not designed as a file store. | +| **Local filesystem** | Not portable across container restarts without host volume mounts. No presigned URL support, requiring the backend to proxy all downloads. No built-in replication or management UI. | +| **AWS S3** | Requires cloud account and internet connectivity. The platform is designed for on-premise deployment where external cloud services may not be permitted. | +| **SeaweedFS** | Less mature ecosystem, smaller community. The S3-compatible layer is less complete than MinIO's. boto3 compatibility is not guaranteed. | + +--- + +## ADR-004: Docker Compose for Deployment + +**Date:** Project inception +**Status:** Accepted + +### Context + +Aegis is a multi-component platform deployed on-premise within organizations' security environments: + +- 4 services: Frontend (Nginx), Backend (Uvicorn), PostgreSQL, MinIO. +- Target environments range from a single server to small clusters. +- Security teams typically have Docker available but may not have Kubernetes. +- The platform must be installable by a security engineer (not necessarily a DevOps specialist). +- Both development and production environments should use the same orchestration approach for consistency. + +### Decision + +We chose **Docker Compose** as the deployment and orchestration tool, with two compose files: + +- `docker-compose.yml` — Development: source volumes mounted, dev servers, exposed ports. +- `docker-compose.prod.yml` — Production: multi-stage builds, Nginx serving static assets, only frontend port exposed, `SECRET_KEY` required. + +Supporting infrastructure: +- `scripts/install.sh` — Interactive production installer that generates secrets, prompts for configuration, writes `.env`, and runs `docker compose up -d --build`. +- `scripts/init.sh` — Development setup that waits for services, runs migrations, and seeds data. +- All services connected via a `aegis-network` bridge network. +- Named volumes for PostgreSQL and MinIO data persistence. +- Health checks on PostgreSQL (`pg_isready`) and backend (`/health`). +- Service dependency ordering: backend waits for `postgres: service_healthy` and `minio: service_started`. + +### Consequences + +**Positive:** +- Single-command deployment: `docker compose -f docker-compose.prod.yml up -d --build`. +- The `install.sh` wizard makes production setup accessible to non-DevOps personnel. +- Consistent environments between development and production (same containers, same network topology). +- Named volumes survive container rebuilds — data persists across upgrades. +- No external dependencies beyond Docker and Docker Compose. +- Multi-stage Dockerfile for frontend produces a minimal Nginx image (~25MB) from a full Node.js build stage. +- Non-root user (`appuser`, UID 1001) in backend Dockerfile follows container security best practices. + +**Negative:** +- No built-in horizontal scaling — running multiple backend instances requires manual Nginx upstream configuration and a shared token blacklist (currently in-memory). +- No rolling deployments — `docker compose up -d --build` causes brief downtime during image rebuilds. +- No built-in secrets management — secrets are in `.env` files on the host filesystem. +- No container orchestration beyond restart policies (`restart: always`). +- No centralized logging — each container logs to its own stdout/stderr. + +**Risks:** +- Single point of failure: if the host machine goes down, all services go down. +- No automated backup strategy — `pg_dump` is documented but not automated. + +### Alternatives Considered + +| Alternative | Reason Rejected | +|------------|-----------------| +| **Kubernetes (k8s)** | Significantly higher operational complexity. Requires a cluster, kubectl expertise, Helm charts or manifests, ingress controllers, PVCs. Overkill for a single-server deployment targeting security teams. | +| **Docker Swarm** | Adds orchestration complexity with minimal benefit over Compose for < 5 services. The project does not need multi-node scheduling or service mesh. Swarm's future is uncertain compared to Compose V2. | +| **Bare metal / systemd** | Loses containerization benefits (isolation, reproducibility, dependency management). Would require manual installation of Python, Node.js, PostgreSQL, MinIO on each target system. | +| **Ansible + Docker** | Adds a configuration management layer that is unnecessary for a 4-service application. Could be valuable in the future for multi-server deployments but is premature now. | + +--- + +## ADR-005: Modular Monolith over Microservices + +**Date:** Project inception +**Status:** Accepted + +### Context + +Aegis has distinct functional domains that could theoretically be separate services: +- **Test Workflow** — Red/Blue validation state machine, evidence management. +- **Coverage Analytics** — Scoring engine, heatmaps, metrics, reports. +- **Data Import** — 8 external source integrations (MITRE, Sigma, Elastic, CALDERA, etc.). +- **Campaign Management** — Campaign lifecycle, scheduling, threat actor generation. +- **Compliance** — Framework mappings, gap analysis, control tracking. +- **User/Auth** — Authentication, RBAC, audit logging. + +However: +- These domains share the same database and have tight data dependencies (e.g., scoring reads tests, techniques, detection rules, and D3FEND mappings in a single calculation). +- The development team is small. +- The deployment target is single-server Docker Compose. +- Latency between services would complicate the scoring engine (which aggregates across 5+ tables). + +### Decision + +We chose a **modular monolith** architecture: a single deployable backend process organized into internal modules (routers, services, models) rather than separate microservices. + +Module boundaries: +- **Routers** (21 files) — HTTP endpoint definitions grouped by domain. +- **Services** (20 files) — Business logic grouped by capability (workflow, scoring, notifications, imports). +- **Models** (18 files) — ORM entities grouped by domain concept. +- **Schemas** (10 files) — Pydantic DTOs grouped by domain concept. + +All modules share a single database, a single process, and a single deployment artifact. + +### Consequences + +**Positive:** +- No network overhead between domains — scoring can join 5+ tables in a single SQL query. +- Single deployment artifact simplifies CI/CD, monitoring, and debugging. +- Shared database means ACID transactions across domains (e.g., creating a test + logging the audit entry + sending a notification in one commit). +- No service discovery, API gateways, circuit breakers, or distributed tracing needed. +- Faster development iteration — change any module, rebuild one container. + +**Negative:** +- All domains scale together — cannot scale the data import workers independently from the API. +- A bug in one module (e.g., a memory leak in scoring) can crash the entire application. +- Module boundaries are not enforced at the language level — routers currently import services and models freely across domains (e.g., `heatmap.py` imports 6 models from different domains). +- The monolith has grown to 21 routers and 20 services without explicit boundary enforcement, leading to "fat controllers" and cross-cutting concerns. + +**Risks:** +- Without explicit module boundaries (enforced by code structure or linting rules), the modular monolith can degrade into a traditional monolith where everything depends on everything. +- The Clean Architecture refactor proposed in `ARCHITECTURAL_ANALYSIS.md` would restore module boundaries via the domain/application/infrastructure/presentation layers. + +### Alternatives Considered + +| Alternative | Reason Rejected | +|------------|-----------------| +| **Microservices** | The 8 data source integrations would each become a service, requiring inter-service communication for writing to the shared technique/rule tables. Scoring would need to call 3-4 services to gather data, adding latency and failure modes. Operational overhead (8+ containers, service mesh, distributed tracing) unjustified for a small team and single-server deployment. | +| **Microservices with shared DB** | Anti-pattern. Multiple services sharing a database lose the main benefit of microservices (independent deployment and schema evolution) while keeping the operational complexity. | +| **Modular monolith with enforced boundaries** | This is the recommended evolution (see ADR analysis). The current implementation has module structure but no boundary enforcement. Adding domain-layer interfaces (Protocol/ABC), a repository pattern, and import linting rules would achieve this without a microservices migration. | + +--- + +## ADR-006: APScheduler In-Process over External Job System + +**Date:** Project inception +**Status:** Accepted + +### Context + +Aegis requires periodic background tasks: + +| Task | Frequency | Duration | Description | +|------|-----------|----------|-------------| +| MITRE ATT&CK sync | Every 24 hours | 30-120 seconds | Download STIX/TAXII feed, upsert ~700 techniques | +| Intel scan | Every 7 days | 10-60 seconds | Scan threat intelligence sources | +| Notification cleanup | Every 24 hours | < 5 seconds | Delete read notifications older than 90 days | +| Coverage snapshot | Weekly (Sunday 00:00) | 5-30 seconds | Capture point-in-time coverage state across all techniques | +| Recurring campaigns | Every 24 hours | < 10 seconds | Check and spawn due recurring test campaigns | + +Requirements: +- Jobs must access the same database as the API. +- Jobs must not block API request handling. +- No additional infrastructure should be required beyond what Docker Compose already provides. +- Job failure should not crash the API server. +- Jobs do not need distributed execution (single-server deployment). + +### Decision + +We chose **APScheduler** (`BackgroundScheduler`) running as an in-process thread within the FastAPI application. + +Implementation details: +- The scheduler is started during FastAPI's `lifespan` startup event and shut down on application exit. +- Each job function creates its own `SessionLocal()` instance, independent from request-scoped sessions. +- All jobs use try/except/finally to ensure sessions are closed even on failure. +- Jobs are registered with `replace_existing=True` to handle server restarts cleanly. +- The scheduler is a module-level singleton in `jobs/mitre_sync_job.py`. + +### Consequences + +**Positive:** +- Zero additional infrastructure — no message broker, no worker containers, no job database. +- Jobs share the same Python process, so they can import services directly (`sync_mitre`, `scan_intel`, `create_snapshot`, etc.) without serialization or RPC. +- Simple debugging — job logs appear in the same stdout as API logs. +- Session isolation per job prevents interference with request-scoped transactions. +- `replace_existing=True` prevents duplicate job registrations on hot reload. + +**Negative:** +- **No persistence:** If the server crashes mid-job, the job state is lost. There is no retry mechanism — the job simply runs again at the next scheduled interval. +- **No distributed execution:** Cannot run jobs on a separate worker node. If the API is under heavy load, jobs compete for the same CPU and memory. +- **No dead letter queue:** Failed jobs are logged but not queued for retry. A failed MITRE sync silently waits 24 hours before trying again. +- **No job history:** There is no record of when jobs last ran, how long they took, or whether they succeeded — only log lines. +- **Single-instance constraint:** If multiple backend instances are running (horizontal scaling), each instance runs its own scheduler, causing duplicate job execution (double MITRE sync, double snapshots, etc.). +- **No manual trigger via scheduler:** Admin-triggered syncs go through the API endpoints (`/api/v1/system/*`), bypassing the scheduler entirely. There are effectively two paths to the same operations. + +**Risks:** +- The single-instance constraint is the most significant risk. If Aegis scales horizontally, APScheduler must be replaced or augmented with a distributed lock (e.g., PostgreSQL advisory locks or Redis-based locking). + +### Alternatives Considered + +| Alternative | Reason Rejected | +|------------|-----------------| +| **Celery + Redis/RabbitMQ** | Requires an additional broker container (Redis or RabbitMQ), a separate worker process, and Celery configuration. Significant operational overhead for 5 periodic tasks that each run for < 2 minutes. Would be justified if job volume grows or horizontal scaling is needed. | +| **Dramatiq + Redis** | Similar to Celery but lighter. Still requires a Redis container and a separate worker process. Same operational overhead concern. | +| **Cron jobs (host-level)** | Would require the host to have cron configured and scripts that call API endpoints or run Python commands inside the container. Breaks the "single Docker Compose" deployment model. Not portable. | +| **PostgreSQL `pg_cron`** | Runs inside the database, limited to SQL operations. Cannot execute Python logic (downloading ZIPs, parsing YAML, upserting with business rules). Would require stored procedures or external triggers. | +| **Kubernetes CronJobs** | Requires Kubernetes. Not applicable to the Docker Compose deployment model (see ADR-004). | +| **APScheduler with JobStore (PostgreSQL)** | APScheduler supports persistent job stores that would solve the single-instance problem via database locking. This is a viable evolution path — same library, minimal code change, adds distributed-safe execution. **Recommended as the first upgrade when horizontal scaling is needed.** | + +--- + +## ADR Evolution Path + +The following table summarizes when each decision should be revisited: + +| ADR | Revisit When | Likely Evolution | +|-----|-------------|-----------------| +| ADR-001 (FastAPI) | Stable — no change needed | Add structured logging, OpenTelemetry tracing | +| ADR-002 (PostgreSQL + JSONB) | JSONB query performance degrades | Add GIN indexes on JSONB columns, evaluate moving high-query fields to dedicated columns | +| ADR-003 (MinIO) | Cloud deployment required | Swap boto3 endpoint to AWS S3 / GCS (zero code change) | +| ADR-004 (Docker Compose) | Multi-server deployment needed | Migrate to Kubernetes with Helm charts, or add Ansible playbooks | +| ADR-005 (Modular Monolith) | Team grows > 5 developers, or domains need independent scaling | Enforce boundaries first (Clean Architecture refactor), then extract high-traffic domains as services if needed | +| ADR-006 (APScheduler) | Horizontal scaling required, or jobs need retry/history | Add APScheduler PostgreSQL JobStore first; migrate to Celery if job complexity grows significantly | diff --git a/docs/C4_CONTAINER_DIAGRAM.md b/docs/C4_CONTAINER_DIAGRAM.md new file mode 100644 index 0000000..ab760fa --- /dev/null +++ b/docs/C4_CONTAINER_DIAGRAM.md @@ -0,0 +1,228 @@ +# Aegis — C4 Container Diagram (Level 2) + +> **Author:** Architecture review +> **Date:** February 11, 2026 +> **Notation:** C4 Model — Level 2 (Container Diagram) + +--- + +## Diagram + +```mermaid +C4Container + title Aegis — Container Diagram (C4 Level 2) + + %% ─── Actors ───────────────────────────────────────────────────── + + Person(security_team, "Security Team", "Red/Blue Technicians, Red/Blue Leads, Viewers — interact with the platform via browser") + Person(admin, "Administrator", "Manages users, triggers data syncs, configures scoring weights, reviews audit logs") + + %% ─── System Boundary: Aegis Platform ──────────────────────────── + + Container_Boundary(aegis, "Aegis Platform") { + + Container(frontend, "Frontend SPA", "React 19, TypeScript, Vite, Tailwind CSS, Nginx", "Single-page application served by Nginx in production. Provides dashboards, ATT&CK heatmaps, test workflows, campaign management, compliance views, and report exports. Proxies /api/ requests to backend.") + + Container(backend, "Backend API", "Python 3.11, FastAPI, Uvicorn, SQLAlchemy", "REST API serving 21 router modules under /api/v1. Handles authentication (JWT + HttpOnly cookies), RBAC authorization (6 roles), Red/Blue test workflows, scoring engine, heatmap generation, report building, and CRUD for all domain entities. Rate-limited with SlowAPI.") + + Container(scheduler, "Background Scheduler", "APScheduler (in-process)", "Runs inside the backend process as a BackgroundScheduler thread. Executes 5 periodic jobs: MITRE ATT&CK sync (24h), intel scan (7d), notification cleanup (24h), weekly coverage snapshot (Sundays 00:00), recurring campaigns check (24h). Each job manages its own DB session.") + + ContainerDb(postgres, "PostgreSQL 15", "PostgreSQL, Alpine", "Primary relational data store. Holds techniques, tests, users, campaigns, threat actors, detection rules, compliance mappings, audit logs, notifications, coverage snapshots, and scoring configuration. Schema managed by Alembic migrations (18 versions).") + + ContainerDb(minio, "MinIO", "MinIO (S3-compatible), Alpine", "Object storage for Red/Blue team evidence files (screenshots, logs, PCAPs, documents). Stores files in the 'evidence' bucket. Backend generates presigned URLs for secure direct downloads.") + } + + %% ─── External Systems ─────────────────────────────────────────── + + System_Ext(mitre_taxii, "MITRE ATT&CK TAXII Server", "STIX/TAXII 2.0 feed providing Enterprise ATT&CK techniques and tactics catalog") + System_Ext(mitre_cti, "MITRE CTI GitHub", "STIX 2.0 bundles: ATT&CK techniques (fallback), threat actors (intrusion-sets), actor-technique relationships") + System_Ext(d3fend, "MITRE D3FEND API", "REST API providing defensive techniques and ATT&CK-to-D3FEND countermeasure mappings") + System_Ext(atomic, "Atomic Red Team", "GitHub repository with 1500+ atomic test YAML files mapped to ATT&CK techniques") + System_Ext(sigma, "SigmaHQ", "GitHub repository with Sigma detection rules in YAML, tagged with ATT&CK technique IDs") + System_Ext(elastic, "Elastic Detection Rules", "GitHub repository with Elastic SIEM rules in TOML format with MITRE threat mappings") + System_Ext(caldera, "MITRE CALDERA", "GitHub repository with CALDERA abilities in YAML, organized by tactic") + System_Ext(lolbas, "LOLBAS / GTFOBins", "GitHub repositories for Living Off The Land binaries (Windows) and GTFOBins (Linux)") + + %% ─── Planned Systems ──────────────────────────────────────────── + + System_Ext(github_actions, "GitHub Actions (Planned)", "Future CI/CD: lint, type check, pytest, Docker build, deploy") + System_Ext(artifactory, "Artifactory (Planned)", "Future artifact repository for Docker images and versioned build artifacts") + + %% ─── Relationships: Users → Containers ────────────────────────── + + Rel(security_team, frontend, "Uses", "HTTPS / Browser") + Rel(admin, frontend, "Uses", "HTTPS / Browser") + + %% ─── Relationships: Frontend → Backend ────────────────────────── + + Rel(frontend, backend, "Proxies API requests to", "HTTP (Nginx reverse proxy to backend:8000/api/)") + + %% ─── Relationships: Backend → Data Stores ─────────────────────── + + Rel(backend, postgres, "Reads/writes domain data", "TCP/5432, SQLAlchemy ORM") + Rel(backend, minio, "Uploads/downloads evidence files", "HTTP/9000, boto3 S3 API") + + %% ─── Relationships: Scheduler → Data Stores ───────────────────── + + Rel(scheduler, postgres, "Reads/writes via own sessions", "TCP/5432, SQLAlchemy") + + %% ─── Relationships: Backend/Scheduler → External Sources ──────── + + Rel(scheduler, mitre_taxii, "Syncs techniques every 24h", "TAXII 2.0 / HTTPS") + Rel(backend, mitre_cti, "Imports threat actors + fallback sync", "HTTPS, ZIP download") + Rel(backend, d3fend, "Imports D3FEND techniques and mappings", "REST API / HTTPS") + Rel(backend, atomic, "Imports atomic test templates", "HTTPS, ZIP ~40MB") + Rel(backend, sigma, "Imports Sigma detection rules", "HTTPS, ZIP download") + Rel(backend, elastic, "Imports Elastic detection rules", "HTTPS, ZIP download") + Rel(backend, caldera, "Imports CALDERA abilities", "HTTPS, ZIP download") + Rel(backend, lolbas, "Imports LOLBAS and GTFOBins", "HTTPS, ZIP download") + + %% ─── Relationships: Planned ───────────────────────────────────── + + Rel(github_actions, backend, "Builds, tests, deploys (planned)", "HTTPS") + Rel(github_actions, frontend, "Builds, deploys (planned)", "HTTPS") + Rel(github_actions, artifactory, "Pushes Docker images (planned)", "HTTPS") + + UpdateLayoutConfig($c4ShapeInRow="3", $c4BoundaryInRow="1") +``` + +--- + +## Container Responsibilities + +### Frontend SPA + +| Attribute | Detail | +|-----------|--------| +| **Technology** | React 19, TypeScript 5.9, Vite 7.3, Tailwind CSS 4, React Router 7 | +| **Runtime (Dev)** | Node 20 + Vite dev server on port 5173 | +| **Runtime (Prod)** | Nginx Alpine serving static build artifacts on port 80 | +| **State Management** | AuthContext (React Context) + TanStack React Query for server state | +| **API Communication** | Axios client with `withCredentials: true` (HttpOnly JWT cookie) | +| **Security** | CSP headers, X-Frame-Options: DENY, X-Content-Type-Options: nosniff, gzip compression | +| **Responsibilities** | Render UI (21 pages, 30+ components), route protection by role, lazy loading, API proxy via Nginx `/api/` → `backend:8000/api/` | + +### Backend API + +| Attribute | Detail | +|-----------|--------| +| **Technology** | Python 3.11, FastAPI, Uvicorn, SQLAlchemy, Alembic, Pydantic | +| **Runtime** | Uvicorn ASGI server on port 8000 (behind Nginx proxy) | +| **API Surface** | 21 routers, 80+ endpoints under `/api/v1` | +| **Auth** | JWT (HS256) in HttpOnly cookie, bcrypt passwords, in-memory token blacklist | +| **RBAC** | 6 roles: admin, red_tech, blue_tech, red_lead, blue_lead, viewer | +| **Rate Limiting** | SlowAPI (5 req/min on login) | +| **Error Handling** | Global handlers for ValidationError → 400, SQLAlchemyError → 500, Exception → 500 | +| **Responsibilities** | All business logic, test workflow state machine, scoring engine, heatmap generation, report building, CRUD, data import orchestration, audit logging | + +### Background Scheduler + +| Attribute | Detail | +|-----------|--------| +| **Technology** | APScheduler `BackgroundScheduler` (runs in-process within backend) | +| **Lifecycle** | Starts on FastAPI lifespan startup, shuts down on app shutdown | +| **Session Model** | Each job creates and closes its own `SessionLocal()` instance | +| **Registered Jobs** | See table below | + +| Job | Trigger | Frequency | Action | +|-----|---------|-----------|--------| +| `mitre_sync` | Interval | Every 24 hours | Syncs ATT&CK techniques via TAXII 2.0 (fallback: GitHub ZIP) | +| `intel_scan` | Interval | Every 7 days | Scans threat intelligence sources for new indicators | +| `notification_cleanup` | Interval | Every 24 hours | Deletes read notifications older than 90 days | +| `weekly_snapshot` | Cron | Sundays at 00:00 | Creates coverage snapshot, cleans up old ones (keeps last 52) | +| `recurring_campaigns` | Interval | Every 24 hours | Checks and spawns due recurring test campaigns | + +### PostgreSQL 15 + +| Attribute | Detail | +|-----------|--------| +| **Image** | `postgres:15-alpine` | +| **Database** | `attackdb` | +| **Schema Management** | Alembic with 18 migration versions | +| **Connection** | `postgresql://user:pass@postgres:5432/attackdb` via SQLAlchemy | +| **Volumes** | Named volume `aegis_postgres_data_prod` for persistence | +| **Health Check** | `pg_isready` every 5 seconds | +| **Data Stored** | Techniques (ATT&CK), tests (Red/Blue workflow), users, campaigns, threat actors, detection rules (Sigma/Elastic), D3FEND mappings, compliance frameworks, audit logs, notifications, coverage snapshots, scoring config, intel items, data sources, evidence metadata | + +### MinIO (S3-compatible) + +| Attribute | Detail | +|-----------|--------| +| **Image** | `minio/minio:latest` | +| **Ports** | 9000 (S3 API), 9001 (admin console) | +| **Bucket** | `evidence` (auto-created on backend startup) | +| **Access** | Via boto3 S3 API from backend | +| **Volumes** | Named volume `aegis_minio_data_prod` for persistence | +| **Responsibilities** | Store Red/Blue team evidence files (screenshots, logs, PCAPs). Backend generates time-limited presigned URLs for secure browser downloads. | + +### GitHub Actions (Planned) + +| Attribute | Detail | +|-----------|--------| +| **Status** | Not yet implemented — no `.github/workflows/` directory exists | +| **Planned Scope** | Lint (ruff/flake8), type check (mypy), unit/integration tests (pytest), Docker image build, deploy to staging/production | +| **Integration** | Would trigger on push/PR to main branch | +| **Artifact Flow** | Build Docker images → push to Artifactory → deploy via compose | + +### Artifactory (Planned) + +| Attribute | Detail | +|-----------|--------| +| **Status** | Not yet implemented — no integration code exists | +| **Planned Scope** | Docker image registry for versioned backend/frontend images | +| **Integration** | Receive images from GitHub Actions CI pipeline, serve to production deploy | + +--- + +## Network Topology + +``` + Internet + │ + │ HTTPS (:80 / :443) + ▼ + ┌─────────────────┐ + │ Frontend │ + │ Nginx + React │ + │ :80 │ + └────────┬────────┘ + │ + ┌────────────┼────────────────────────────────┐ + │ │ aegis-network (bridge) │ + │ │ /api/ proxy │ + │ ▼ │ + │ ┌─────────────────┐ │ + │ │ Backend API │◄── Scheduler │ + │ │ FastAPI/Uvicorn │ (in-process thread) │ + │ │ :8000 │ │ + │ └───┬─────────┬──┘ │ + │ │ │ │ + │ │ │ │ + │ ▼ ▼ │ + │ ┌─────────┐ ┌───────┐ │ + │ │PostgreSQL│ │ MinIO │ │ + │ │ :5432 │ │ :9000 │ │ + │ └─────────┘ └───────┘ │ + │ │ + └──────────────────────────────────────────────┘ + │ + │ HTTPS (outbound only) + ▼ + External Data Sources + (MITRE, SigmaHQ, Elastic, etc.) +``` + +--- + +## Data Flow Summary + +| Flow | Path | Protocol | Notes | +|------|------|----------|-------| +| User → UI | Browser → Nginx | HTTPS | Static SPA assets, gzip compressed, 1-year cache for static files | +| UI → API | Nginx → Uvicorn | HTTP (internal) | Reverse proxy with 300s timeout for long sync operations | +| API → DB | Uvicorn → PostgreSQL | TCP/5432 | SQLAlchemy ORM, request-scoped sessions via `get_db()` | +| API → Storage | Uvicorn → MinIO | HTTP/9000 | boto3 S3 API, presigned URLs for downloads | +| Scheduler → DB | APScheduler thread → PostgreSQL | TCP/5432 | Independent sessions per job, created/closed in try/finally | +| Scheduler → External | APScheduler thread → MITRE TAXII | HTTPS | Scheduled sync every 24h, fallback to GitHub ZIP | +| Admin → External | API on-demand → GitHub repos | HTTPS | ZIP download triggered by admin via `/api/v1/system/*` endpoints | +| Health Check | Docker → Backend `/health` | HTTP (internal) | Restricted to private IPs via Nginx `allow/deny` directives | diff --git a/docs/C4_CONTEXT_DIAGRAM.md b/docs/C4_CONTEXT_DIAGRAM.md new file mode 100644 index 0000000..9091257 --- /dev/null +++ b/docs/C4_CONTEXT_DIAGRAM.md @@ -0,0 +1,140 @@ +# Aegis — C4 Context Diagram (Level 1) + +> **Author:** Architecture review +> **Date:** February 11, 2026 +> **Notation:** C4 Model — Level 1 (System Context) + +--- + +## Diagram + +```mermaid +C4Context + title Aegis — System Context Diagram (C4 Level 1) + + %% ─── Actors (People) ──────────────────────────────────────────── + + Person(red_tech, "Red Team Technician", "Executes offensive tests, submits evidence, creates tests from templates") + Person(blue_tech, "Blue Team Technician", "Evaluates detection results, submits blue evidence, documents findings") + Person(red_lead, "Red Team Lead", "Validates red team results, manages campaigns, reviews test outcomes") + Person(blue_lead, "Blue Team Lead", "Validates blue team results, manages remediation, reviews detection gaps") + Person(admin, "Administrator", "Manages users, triggers data syncs, configures scoring, oversees platform") + Person(viewer, "Viewer", "Read-only access to dashboards, reports, heatmaps, and compliance status") + + %% ─── Core System ──────────────────────────────────────────────── + + System(aegis, "Aegis Platform", "MITRE ATT&CK coverage management platform. Orchestrates Red/Blue team validation workflows, tracks technique coverage, generates heatmaps, compliance reports, and organizational scoring.") + + %% ─── Internal Infrastructure (Owned / Deployed) ───────────────── + + SystemDb(postgres, "PostgreSQL 15", "Primary data store. Stores techniques, tests, users, campaigns, threat actors, compliance mappings, audit logs, scoring config, and snapshots.") + SystemDb(minio, "MinIO (S3-compatible)", "Object storage for Red/Blue team evidence files (screenshots, logs, PCAPs). Serves presigned download URLs.") + + %% ─── External Data Sources (Consumed) ─────────────────────────── + + System_Ext(mitre_taxii, "MITRE ATT&CK TAXII Server", "STIX/TAXII 2.0 feed providing Enterprise ATT&CK techniques and tactics. Primary source for technique catalog sync.") + System_Ext(mitre_cti, "MITRE CTI GitHub Repository", "STIX 2.0 bundles for ATT&CK techniques (fallback), intrusion-sets (threat actors), and actor-technique relationships.") + System_Ext(d3fend, "MITRE D3FEND API", "Public REST API providing defensive techniques and ATT&CK-to-D3FEND mappings for countermeasure coverage.") + System_Ext(atomic, "Atomic Red Team (GitHub)", "Repository of atomic tests mapped to ATT&CK techniques. Downloaded as ZIP, parsed from YAML atomics.") + System_Ext(sigma, "SigmaHQ (GitHub)", "Repository of Sigma detection rules in YAML format. Parsed for ATT&CK tags and imported as detection rules.") + System_Ext(elastic, "Elastic Detection Rules (GitHub)", "Repository of Elastic SIEM rules in TOML format. Parsed for MITRE threat mappings and imported as detection rules.") + System_Ext(caldera, "MITRE CALDERA (GitHub)", "Repository of CALDERA abilities. YAML files parsed from data/abilities/ and imported as test templates.") + System_Ext(lolbas, "LOLBAS Project (GitHub)", "Living Off The Land Binaries and Scripts. YAML-based catalog imported as test templates mapped to ATT&CK techniques.") + System_Ext(gtfobins, "GTFOBins (GitHub)", "Unix binaries exploitation reference. Markdown with YAML front-matter parsed and mapped to ATT&CK techniques.") + + %% ─── Planned Systems (Not Yet Integrated) ────────────────────── + + System_Ext(github_ent, "GitHub Enterprise (Planned)", "Future CI/CD pipeline integration for automated linting, type checking, test execution, and deployment workflows.") + System_Ext(artifactory, "Artifactory (Planned)", "Future artifact repository for storing Docker images, build artifacts, and versioned releases.") + + %% ─── Relationships: Users → Aegis ─────────────────────────────── + + Rel(red_tech, aegis, "Creates and executes tests, uploads red evidence, uses test catalog", "HTTPS") + Rel(blue_tech, aegis, "Evaluates detections, uploads blue evidence, reviews detection rules", "HTTPS") + Rel(red_lead, aegis, "Validates red results, manages campaigns, reviews threat actor coverage", "HTTPS") + Rel(blue_lead, aegis, "Validates blue results, tracks remediation, reviews compliance", "HTTPS") + Rel(admin, aegis, "Manages users, triggers syncs, configures scoring weights, views audit logs", "HTTPS") + Rel(viewer, aegis, "Views dashboards, heatmaps, reports, and compliance status", "HTTPS") + + %% ─── Relationships: Aegis → Infrastructure ────────────────────── + + Rel(aegis, postgres, "Reads/writes all domain data", "TCP/5432, SQLAlchemy") + Rel(aegis, minio, "Uploads/downloads evidence files, generates presigned URLs", "HTTP/9000, boto3 S3 API") + + %% ─── Relationships: Aegis → External Sources ──────────────────── + + Rel(aegis, mitre_taxii, "Syncs ATT&CK techniques every 24h", "TAXII 2.0 / HTTPS") + Rel(aegis, mitre_cti, "Fallback technique sync + threat actor import", "HTTPS, ZIP download") + Rel(aegis, d3fend, "Imports defensive techniques and ATT&CK mappings", "REST API / HTTPS") + Rel(aegis, atomic, "Imports Atomic Red Team test templates", "HTTPS, ZIP download") + Rel(aegis, sigma, "Imports Sigma detection rules with ATT&CK tags", "HTTPS, ZIP download") + Rel(aegis, elastic, "Imports Elastic SIEM detection rules", "HTTPS, ZIP download") + Rel(aegis, caldera, "Imports CALDERA abilities as test templates", "HTTPS, ZIP download") + Rel(aegis, lolbas, "Imports LOLBAS binaries as test templates", "HTTPS, ZIP download") + Rel(aegis, gtfobins, "Imports GTFOBins as test templates", "HTTPS, ZIP download") + + %% ─── Relationships: Aegis → Planned ───────────────────────────── + + Rel(aegis, github_ent, "CI/CD pipelines (planned)", "HTTPS") + Rel(aegis, artifactory, "Artifact storage (planned)", "HTTPS") + + UpdateLayoutConfig($c4ShapeInRow="3", $c4BoundaryInRow="1") +``` + +--- + +## Diagram Notes + +### Actor Roles + +| Role | Access Level | Primary Actions | +|------|-------------|-----------------| +| **Red Team Technician** | Standard | Create tests, execute attacks, upload red evidence, use test catalog | +| **Blue Team Technician** | Standard | Evaluate detections, upload blue evidence, review detection rules | +| **Red Team Lead** | Elevated | Validate red results, manage campaigns, review threat actor coverage | +| **Blue Team Lead** | Elevated | Validate blue results, track remediation, review compliance | +| **Administrator** | Full | User management, trigger data syncs, scoring config, audit logs | +| **Viewer** | Read-only | View dashboards, heatmaps, reports, compliance status | + +### External Data Source Details + +| Source | Protocol | Frequency | Data Imported | +|--------|----------|-----------|---------------| +| MITRE ATT&CK TAXII | STIX/TAXII 2.0 | Every 24 hours (scheduled) | Enterprise techniques and tactics | +| MITRE CTI GitHub | HTTPS (ZIP) | Fallback + on-demand | Techniques, threat actors (intrusion-sets), actor-technique relationships | +| MITRE D3FEND | REST API | On-demand (admin trigger) | Defensive techniques, ATT&CK-to-D3FEND mappings | +| Atomic Red Team | HTTPS (ZIP ~40MB) | On-demand (admin trigger) | Test templates from `atomics/T*/T*.yaml` | +| SigmaHQ | HTTPS (ZIP) | On-demand (admin trigger) | Sigma detection rules with ATT&CK tags | +| Elastic Detection Rules | HTTPS (ZIP) | On-demand (admin trigger) | Elastic SIEM rules in TOML with MITRE mappings | +| MITRE CALDERA | HTTPS (ZIP) | On-demand (admin trigger) | Abilities from `data/abilities/{tactic}/*.yml` | +| LOLBAS Project | HTTPS (ZIP) | On-demand (admin trigger) | Living Off The Land binaries/scripts | +| GTFOBins | HTTPS (ZIP) | On-demand (admin trigger) | Unix binary exploitation references | + +### Planned Integrations (Not Yet Implemented) + +| System | Purpose | Status | +|--------|---------|--------| +| **GitHub Enterprise** | CI/CD pipelines for automated lint, type check, tests, and deployment | Planned — no `.github/workflows` exist yet | +| **Artifactory** | Docker image and build artifact repository | Planned — no integration code exists yet | + +### Infrastructure Boundary + +``` +┌─────────────────────────────────────────────┐ +│ Docker Compose Network │ +│ │ +│ ┌──────────┐ ┌──────────┐ ┌───────────┐ │ +│ │ Frontend │ │ Backend │ │ PostgreSQL│ │ +│ │ (Nginx) │ │ (Uvicorn)│ │ 15 │ │ +│ │ :80 │ │ :8000 │ │ :5432 │ │ +│ └──────────┘ └──────────┘ └───────────┘ │ +│ ┌───────────┐ │ +│ │ MinIO │ │ +│ │ :9000/9001│ │ +│ └───────────┘ │ +└─────────────────────────────────────────────┘ + ▲ │ + │ HTTPS │ HTTPS (outbound) + │ ▼ + Users External Sources +``` diff --git a/docs/TECHNOLOGY_JUSTIFICATION.md b/docs/TECHNOLOGY_JUSTIFICATION.md new file mode 100644 index 0000000..ac0c63d --- /dev/null +++ b/docs/TECHNOLOGY_JUSTIFICATION.md @@ -0,0 +1,291 @@ +# Aegis — Technology Justification + +> **Document type:** Architecture Board Submission +> **Author:** Platform Architecture Team +> **Date:** February 11, 2026 +> **Classification:** Internal +> **Status:** Approved + +--- + +## 1. Purpose + +This document provides a formal justification for the technology selections made in the Aegis platform. Each technology choice is evaluated against the project's operational requirements, organizational constraints, security posture, and long-term sustainability. This document is intended for review by the Architecture Board and serves as the authoritative reference for technology governance. + +--- + +## 2. Project Context + +Aegis is an internal security operations platform that manages MITRE ATT&CK technique coverage through structured Red Team / Blue Team validation workflows. The platform integrates with 9 external threat intelligence and detection rule sources, enforces role-based access for 6 distinct user roles, and provides coverage analytics including heatmaps, scoring, compliance mapping, and executive reporting. + +### Operational Requirements + +| Requirement | Detail | +|------------|--------| +| **Deployment model** | On-premise, single-server, air-gap compatible | +| **User base** | 10–100 concurrent security analysts and leads | +| **Data model** | 18+ relational entities with many-to-many relationships and semi-structured metadata | +| **External integrations** | 9 data sources (MITRE TAXII 2.0, GitHub REST, D3FEND REST, Sigma YAML, Elastic TOML, CALDERA YAML, LOLBAS YAML, GTFOBins Markdown, STIX 2.0 JSON) | +| **File storage** | Binary evidence files (screenshots, logs, PCAPs) ranging from KB to hundreds of MB | +| **Scheduled operations** | 5 periodic background jobs (24h–7d cycles) | +| **Security** | RBAC, JWT authentication, audit logging, evidence chain of custody | + +### Organizational Constraints + +| Constraint | Detail | +|-----------|--------| +| **Team expertise** | Primary competency in Python and TypeScript | +| **Target operators** | Security engineers, not DevOps specialists | +| **Infrastructure** | Docker available; Kubernetes not guaranteed | +| **Network** | Outbound HTTPS required for data source sync; inbound limited to platform UI | +| **Budget** | Open-source preference; no commercial license dependencies for core platform | + +--- + +## 3. Backend Framework: FastAPI + +### Selection: FastAPI 0.x (latest stable) with Uvicorn ASGI server + +### Justification + +FastAPI was selected as the backend framework based on four primary evaluation criteria: API development velocity, ecosystem compatibility, runtime performance, and developer experience. + +**API Development Velocity.** Aegis exposes 80+ REST endpoints across 21 domain modules. FastAPI's automatic OpenAPI specification generation from Python type annotations eliminates the need for separate API documentation tooling. Pydantic integration provides request and response validation at the framework level, reducing boilerplate code for schema enforcement. The `Depends()` dependency injection system enables composable middleware chains for authentication, authorization, and database session management without requiring a third-party DI container. + +**Ecosystem Compatibility.** The platform's 9 external data source integrations depend on Python-specific libraries with no mature equivalents in other ecosystems: + +| Library | Purpose | Ecosystem | +|---------|---------|-----------| +| `taxii2-client` | STIX/TAXII 2.0 protocol | Python only | +| `pySigma` | Sigma rule parsing and transformation | Python only | +| `PyYAML` | YAML parsing (Atomic Red Team, CALDERA, LOLBAS) | Python preferred | +| `toml` | TOML parsing (Elastic detection rules) | Python preferred | +| `boto3` | S3-compatible storage API (MinIO) | Python preferred | +| `defusedxml` | Secure XML processing | Python preferred | + +Selecting a non-Python backend would require reimplementing or wrapping these libraries, introducing significant engineering risk. + +**Runtime Performance.** FastAPI's ASGI foundation provides asynchronous request handling capability. While the current implementation uses synchronous route handlers (due to SQLAlchemy's synchronous session model), the framework does not impose a performance ceiling for the target user base (10–100 concurrent users). Benchmark data from independent testing consistently places FastAPI among the highest-performing Python web frameworks. + +**Developer Experience.** Interactive Swagger UI (`/docs`) and ReDoc (`/redoc`) are available in non-production environments, accelerating API exploration and frontend integration. These documentation endpoints are automatically disabled in production to reduce attack surface. + +### Alternatives Evaluated + +| Framework | Evaluation Summary | Disposition | +|-----------|-------------------|-------------| +| Django + Django REST Framework | Mature and feature-rich, but introduces heavier ORM opinions, an unnecessary admin panel, and slower cold-start times. Django's ORM lacks SQLAlchemy's flexibility for JSONB column handling and complex join patterns required by the scoring engine. | Rejected | +| Flask + Flask-RESTful | Lightweight but lacks built-in request validation, automatic OpenAPI generation, and dependency injection. Would require additional libraries (marshmallow, flask-apispec) to achieve parity with FastAPI's built-in capabilities. | Rejected | +| Go (Gin / Echo) | Superior raw throughput, but the team's primary expertise is Python. The 9 data source integrations depend on Python libraries with no Go equivalents. The development velocity loss would outweigh performance gains for a 10–100 user internal platform. | Rejected | +| NestJS (Node.js / TypeScript) | Would unify frontend and backend language, but splits runtime expertise. No mature Node.js equivalents for STIX/TAXII and Sigma rule parsing. The Python data science and security tooling ecosystem is substantially deeper. | Rejected | + +--- + +## 4. Primary Database: PostgreSQL 15 + +### Selection: PostgreSQL 15 (Alpine) with SQLAlchemy ORM and Alembic migrations + +### Justification + +PostgreSQL was selected as the primary relational data store based on three requirements: relational integrity for a complex domain model, semi-structured data support for external source metadata, and operational maturity for on-premise deployment. + +**Relational Integrity.** The Aegis data model comprises 18+ entities with deep relational dependencies: techniques relate to tests, tests belong to campaigns, campaigns map to threat actors, threat actors link to techniques, compliance controls map to techniques, and detection rules associate with both techniques and test templates. This graph of many-to-many relationships demands foreign key enforcement, transactional consistency, and efficient join operations — core strengths of a relational database. + +**Semi-Structured Data (JSONB).** Several entities carry metadata with variable structure imported from external sources (STIX 2.0, Sigma YAML, Elastic TOML). PostgreSQL's native JSONB column type stores this data in a binary-indexed format that supports containment queries and GIN indexing, eliminating the need for a separate document store. Current JSONB usage is contained to 12 columns across 6 tables: + +| Entity | JSONB Fields | Content | +|--------|-------------|---------| +| Technique | `platforms` | OS platform array from ATT&CK | +| Threat Actor | `aliases`, `target_sectors`, `target_regions`, `references` | STIX 2.0 metadata | +| Detection Rule | `platforms`, `log_sources` | Rule targeting metadata | +| Data Source | `last_sync_stats`, `config` | Import statistics and source-specific configuration | +| Campaign | `tags` | User-defined classification | +| Audit Log | `details` | Action-specific metadata (variable per action type) | + +**Operational Maturity.** PostgreSQL 15 provides built-in health checking (`pg_isready`), mature backup tooling (`pg_dump`/`pg_restore`), extensive monitoring capabilities, and a 25+ year track record of production reliability. The Alpine-based Docker image is approximately 80MB, suitable for on-premise deployments with limited resources. + +**Schema Management.** Alembic provides version-controlled database migrations (18 versions to date), enabling reproducible schema evolution and rollback capability. + +### Alternatives Evaluated + +| Database | Evaluation Summary | Disposition | +|----------|-------------------|-------------| +| MongoDB | The core domain is deeply relational. Modeling technique-test-campaign-actor relationships in MongoDB would require denormalization or manual reference integrity, trading the JSONB advantage for relational integrity loss. | Rejected | +| MySQL 8 (JSON) | PostgreSQL's JSONB is binary-indexed and faster for containment queries than MySQL's text-based JSON type. PostgreSQL also provides native UUID support (vs. BINARY(16) in MySQL), which aligns with the platform's UUID-based primary keys. | Rejected | +| PostgreSQL + MongoDB (dual) | The operational complexity of maintaining two database systems is unjustified for 12 JSONB columns. A dual-database architecture would also complicate transactional consistency across relational and document data. | Rejected | + +--- + +## 5. Object Storage: MinIO + +### Selection: MinIO (S3-compatible) accessed via boto3 (AWS S3 SDK for Python) + +### Justification + +MinIO was selected as the evidence storage system based on three requirements: S3 API compatibility for portability, on-premise deployment capability, and separation of binary data from the relational database. + +**S3 API Compatibility.** MinIO implements the Amazon S3 API specification, accessed via the industry-standard `boto3` client library. This provides a zero-code-change migration path to AWS S3, Google Cloud Storage (via S3-compatible mode), or any other S3-compatible storage service should the deployment model change from on-premise to cloud. The storage interface (`upload_file`, `get_presigned_url`, `ensure_bucket_exists`) is a thin abstraction layer that is storage-backend agnostic. + +**On-Premise Deployment.** The platform is designed for deployment within organizational security environments where external cloud storage services may not be permitted due to data classification or regulatory requirements. MinIO runs as a single Docker container with persistent volume storage, requiring no external dependencies or network egress for storage operations. + +**Binary Data Separation.** Evidence files (screenshots, packet captures, log extracts) range from kilobytes to hundreds of megabytes. Storing binary data in PostgreSQL (BYTEA columns) would degrade database backup performance, increase storage costs, and complicate streaming downloads. MinIO's presigned URL mechanism offloads download bandwidth from the application server — the browser fetches evidence files directly from MinIO without proxying through the backend. + +**Administrative Visibility.** MinIO Console (port 9001) provides a web-based management interface for administrators to inspect, audit, and manage stored evidence files without requiring command-line access. + +### Alternatives Evaluated + +| Storage | Evaluation Summary | Disposition | +|---------|-------------------|-------------| +| PostgreSQL BYTEA | Stores binary files in the relational database. Bloats backups, degrades query performance on large tables, and requires the backend to proxy all file downloads. Not designed as a file store. | Rejected | +| Local filesystem | Not portable across container restarts without host volume mounts. No presigned URL support (backend must proxy all downloads). No replication, versioning, or management interface. | Rejected | +| AWS S3 | Requires a cloud account, internet connectivity for storage operations, and AWS credential management. Incompatible with air-gap or restricted-network deployment requirements. | Rejected (as primary; migration path preserved) | +| SeaweedFS | Smaller community and less mature S3-compatible API layer. boto3 compatibility is not fully guaranteed. Insufficient adoption for long-term support confidence. | Rejected | + +--- + +## 6. Frontend: React 19 + TypeScript 5.9 + +### Selection: React 19, TypeScript 5.9, Vite 7.3, Tailwind CSS 4, TanStack React Query 5 + +### Justification + +The frontend technology selection was driven by four criteria: component ecosystem maturity, type safety for a complex domain, build tooling performance, and developer productivity. + +**Component Ecosystem Maturity.** Aegis presents a complex user interface comprising 21 pages, 30+ components, and specialized visualizations including ATT&CK Navigator-compatible heatmaps, campaign timelines, compliance gauges, and multi-role workflow views. React's component model and its ecosystem (Recharts for data visualization, Lucide for iconography, TanStack Virtual for list virtualization) provide production-ready solutions for each of these requirements. + +**Type Safety.** TypeScript's static type system enforces correctness across the API communication layer (22 domain-specific API modules), shared type definitions (`types/models.ts`), and component props. With `strict: true` in `tsconfig.json`, the compiler catches null reference errors, incorrect property access, and type mismatches at build time rather than runtime. This is particularly valuable for the complex test workflow state machine, where state-dependent UI behavior must correctly reflect 6 possible test states and 6 user roles. + +**Build Tooling.** Vite provides sub-second hot module replacement during development and optimized production builds via Rollup. The multi-stage Docker build produces a minimal Nginx image (~25MB) serving pre-compiled static assets, eliminating the need for a Node.js runtime in production. + +**Server State Management.** TanStack React Query manages all server-side state (caching, refetching, mutation invalidation), eliminating the need for a client-side state management library (Redux, Zustand, MobX) for data fetching concerns. Authentication state is managed via React Context, and UI feedback via a Toast context — both lightweight patterns that avoid unnecessary library dependencies. + +**Styling.** Tailwind CSS 4 provides utility-first styling with zero-runtime CSS generation. The design system is consistent across all 21 pages without maintaining a separate CSS architecture or component library. + +### Alternatives Evaluated + +| Framework | Evaluation Summary | Disposition | +|-----------|-------------------|-------------| +| Angular | Comprehensive framework with built-in DI, routing, and HTTP client. However, the heavier abstraction layer and steeper learning curve are unnecessary for a team with React experience. Angular's opinionated module system adds boilerplate for a project of this scale. | Rejected | +| Vue 3 + TypeScript | Viable alternative with good TypeScript support and a smaller learning curve. However, the React ecosystem offers deeper library coverage for specialized components (ATT&CK heatmaps, data grids, chart libraries). The team's existing React proficiency favors continuity. | Rejected | +| Svelte / SvelteKit | Excellent developer experience and smaller bundle sizes, but a significantly smaller ecosystem for complex data visualization. Library availability for heatmaps, virtual scrolling, and charting is limited compared to React. | Rejected | +| HTMX + server-rendered templates | Would reduce frontend complexity but cannot support the interactive heatmap, drag-and-drop campaign management, real-time notification updates, and complex multi-step workflow forms required by the platform. | Rejected | + +--- + +## 7. Containerization and Deployment: Docker Compose + +### Selection: Docker with Docker Compose (V2), multi-stage Dockerfiles + +### Justification + +Docker Compose was selected as the deployment orchestration tool based on three requirements: single-command deployment for non-DevOps operators, consistent development-to-production environments, and minimal infrastructure prerequisites. + +**Operator Accessibility.** The platform is deployed by security engineers who may not have Kubernetes expertise or access to container orchestration infrastructure. Docker Compose provides single-command deployment (`docker compose up -d --build`) with an interactive installation script (`install.sh`) that generates secrets, prompts for configuration, and produces a `.env` file. This reduces deployment complexity to a level appropriate for the target operator profile. + +**Environment Consistency.** Two compose files maintain parity between development and production: + +| Aspect | Development | Production | +|--------|-------------|------------| +| Frontend | Vite dev server, hot reload | Nginx serving static build | +| Backend | Source volume-mounted, auto-reload | Multi-stage build, non-root user | +| Ports | All services exposed | Only frontend exposed | +| Secrets | Auto-generated ephemeral | Required via environment | + +**Infrastructure Footprint.** The entire platform (4 services) runs on a single server with Docker as the only prerequisite. Named volumes provide data persistence across container rebuilds. Health checks and dependency ordering ensure correct startup sequencing. + +**Security Hardening.** The backend Dockerfile follows container security best practices: non-root user (`appuser`, UID 1001), minimal base image (`python:3.11-slim`), and no unnecessary system packages beyond build dependencies. + +### Alternatives Evaluated + +| Platform | Evaluation Summary | Disposition | +|----------|-------------------|-------------| +| Kubernetes | Provides horizontal scaling, rolling deployments, and self-healing. However, it requires a cluster, kubectl expertise, Helm charts, ingress controllers, and persistent volume claims. This operational overhead is disproportionate for a 4-service application targeting single-server deployment. | Rejected (viable future evolution for multi-server) | +| Docker Swarm | Adds orchestration with lower complexity than Kubernetes but provides minimal benefit over Compose for < 5 services. Docker Swarm's development trajectory has stalled relative to Compose V2. | Rejected | +| Bare metal / systemd | Loses containerization benefits: isolation, reproducibility, and dependency management. Would require manual installation of Python, Node.js, PostgreSQL, and MinIO on each target system, increasing deployment failure risk. | Rejected | + +--- + +## 8. CI/CD and Artifact Management: GitHub Enterprise + Artifactory + +### Selection: GitHub Enterprise for source control and CI/CD; JFrog Artifactory for artifact storage + +### Status: Planned — not yet implemented + +### Justification + +GitHub Enterprise and Artifactory are designated as the CI/CD and artifact management platforms for Aegis based on organizational standardization, security requirements, and the artifact lifecycle. + +**Organizational Standardization.** GitHub Enterprise is the organization's standard source control and CI/CD platform. Adopting it for Aegis ensures consistency with existing developer workflows, access control policies, and audit mechanisms. Security teams reviewing the Aegis codebase will use familiar tooling and processes. + +**CI/CD Pipeline (Planned).** The following GitHub Actions workflow stages are planned: + +| Stage | Tools | Trigger | +|-------|-------|---------| +| **Lint** | ruff (Python), ESLint (TypeScript) | Push to any branch | +| **Type check** | mypy (Python), tsc --noEmit (TypeScript) | Push to any branch | +| **Unit tests** | pytest (backend), vitest (frontend) | Push to any branch | +| **Integration tests** | pytest with PostgreSQL service container | Pull request to main | +| **Docker build** | Multi-stage Dockerfile verification | Pull request to main | +| **Image publish** | Docker build + push to Artifactory | Merge to main | +| **Deploy** | Docker Compose pull + restart | Manual trigger or tag | + +**Artifact Repository.** Artifactory serves as the Docker image registry for versioned backend and frontend images. This provides: + +- **Versioned releases:** Each merge to main produces a tagged image (`aegis-backend:1.2.3`, `aegis-frontend:1.2.3`). +- **Rollback capability:** Previous image versions remain available for rapid rollback. +- **Vulnerability scanning:** Artifactory's Xray integration enables automated CVE scanning of Docker image layers. +- **Access control:** Image pull/push permissions align with organizational RBAC policies. + +**Air-Gap Deployment Support.** For restricted-network deployments, Docker images can be exported from Artifactory as tarballs (`docker save`), transferred via secure media, and loaded into the target environment (`docker load`) without requiring network connectivity to the registry. + +### Implementation Timeline + +| Phase | Scope | Estimated Effort | +|-------|-------|-----------------| +| Phase 1 | Basic CI: lint + type check + unit tests | 1–2 days | +| Phase 2 | Integration tests with PostgreSQL service container | 2–3 days | +| Phase 3 | Docker image build + Artifactory publish | 1–2 days | +| Phase 4 | Automated deployment trigger | 2–3 days | + +--- + +## 9. Technology Stack Summary + +| Layer | Technology | Version | License | Purpose | +|-------|-----------|---------|---------|---------| +| **Backend** | Python | 3.11 | PSF | Runtime | +| | FastAPI | latest | MIT | Web framework | +| | Uvicorn | latest | BSD | ASGI server | +| | SQLAlchemy | latest | MIT | ORM | +| | Alembic | latest | MIT | Migrations | +| | Pydantic | v2 | MIT | Validation | +| | APScheduler | latest | MIT | Background jobs | +| | boto3 | latest | Apache 2.0 | S3 storage client | +| **Frontend** | React | 19.2 | MIT | UI framework | +| | TypeScript | 5.9 | Apache 2.0 | Type safety | +| | Vite | 7.3 | MIT | Build tooling | +| | Tailwind CSS | 4.1 | MIT | Styling | +| | TanStack Query | 5.90 | MIT | Server state | +| | Recharts | 2.15 | MIT | Visualization | +| **Database** | PostgreSQL | 15 | PostgreSQL | Relational store | +| **Storage** | MinIO | latest | AGPL-3.0 | Object storage | +| **Infrastructure** | Docker | latest | Apache 2.0 | Containerization | +| | Docker Compose | V2 | Apache 2.0 | Orchestration | +| | Nginx | Alpine | BSD | Reverse proxy | +| **CI/CD** | GitHub Enterprise | — | Commercial | Source control + CI | +| **Artifacts** | Artifactory | — | Commercial | Image registry | + +### License Compliance Note + +All core platform dependencies use permissive open-source licenses (MIT, BSD, Apache 2.0, PSF, PostgreSQL License). The only copyleft dependency is MinIO (AGPL-3.0), which is used as a standalone service (not linked into application code) and therefore does not impose AGPL obligations on the Aegis codebase. GitHub Enterprise and Artifactory are covered under existing organizational commercial licenses. + +--- + +## 10. Approval + +| Role | Name | Date | Signature | +|------|------|------|-----------| +| Platform Architect | | | | +| Security Architect | | | | +| Infrastructure Lead | | | | +| Development Lead | | | | +| Architecture Board Chair | | | |