Root cause: Cloudflare CDN was caching empty/error API responses from
/api/v1/metrics/* endpoints during the backend startup window (502 errors).
Subsequent requests were served from Cloudflare edge cache, never reaching
nginx or the backend, so the dashboard always showed empty metrics data.
NoCacheAPIMiddleware adds Cache-Control: no-store + Pragma: no-cache to
all /api/ responses so Cloudflare and browsers never cache them.
Backend:
- intel_service: remove 50-technique limit (scan all techniques), improve
pattern matching with word boundaries (\bT1059\b), raise min name length
to 8 chars to reduce false positives, skip entries with empty titles
- technique_query_service: add intel_items to get_technique_detail() so
the technique page now shows recent threat intel articles (last 20)
- New GET /intel/items endpoint with optional technique_id filter
Frontend:
- New api/intel.ts with listIntelItems()
- ReviewQueuePage: complete redesign
* Expandable rows — click a technique to see its intel articles inline
* IntelPanel component fetches articles per technique on expand
* 'Create Template from Intel' button opens pre-filled modal:
name (from article title), source_url (article link), technique_id
User reads the article and fills the attack procedure
* Updated explanation text: lists all 3 reasons a technique can be flagged
(MITRE update / intel scan / new template or detection rule)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PostureSnapshot model, Alembic migration (b039exec), schemas, service
aggregating all phases (coverage/risk/operations/knowledge/MTTD), and
router at /api/v1/dashboard with executive view, KPIs, coverage-by-tactic,
posture-history, posture-snapshot, and activity-feed endpoints.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Playbooks: versioned Markdown runbooks per technique × type (attack/detect/investigate/respond/hunt)
- PlaybookVersion: immutable snapshots on every update; restore to any previous version
- LessonLearned: post-mortem records linked to tests/campaigns/attack-paths or manual
- Alembic migration b037know (raw SQL, idempotent, no PostgreSQL enums)
- Router /api/v1/knowledge: 14 endpoints for playbooks + lessons + stats
- Pydantic validators for playbook_type, severity, entity_type (422 on invalid)
- Knowledge stats endpoint: totals + breakdown by severity and playbook type
- Soft-delete on both resources; include_inactive filter for admin recovery
- QA script: 70+ tests across CRUD, versioning, filtering, auth, soft-delete, regression
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Phase 6.1: WebhookConfig model, CRUD router (/api/v1/webhooks, admin-only),
dispatch_webhook() with HMAC signing; integrated into test validation,
campaign completion, and MITRE sync job
- Phase 7.1: SMTP email service with send_test_validated_email,
send_campaign_completed_email, send_new_mitre_techniques_email;
notify_role_with_email() added to notification_service
- Phase 7.2: notification_preferences and jira_account_id on User model;
PATCH /users/me/preferences endpoint; Alembic migrations b031phase6 and b032phase7
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Create domain/errors.py as canonical error hierarchy: DomainError, InvalidStateTransition, PermissionViolation, BusinessRuleViolation, EntityNotFoundError, DuplicateEntityError
- InvalidOperationError now inherits from BusinessRuleViolation for semantic consistency
- Convert domain/exceptions.py to backward-compatible re-export shim with legacy aliases (DomainException, InvalidTransitionError, AuthorizationError)
- Update error_handler.py to import from domain/errors.py and map all new error types
- Update main.py to register DomainError (new base) as the exception handler root
Tarea 4.1 — OSINT Enrichment:
- Add OsintItem model with source_type, severity, CVSS metadata, review flag
- Add Alembic migration b022 with osint_items table and optimized indexes
- Add osint_enrichment_service with NVD API integration, deduplication, rate limiting
- Add OSINT router: GET /osint/items, /osint/summary, /osint/technique/{id}
- Add POST /osint/items/{id}/review to mark items as reviewed
- Add POST /osint/enrich/{technique_id} for manual single-technique enrichment
- Techniques with new CVEs are automatically flagged review_required=True
- Register weekly enrichment job in APScheduler
- Add NVD_API_KEY config setting for optional increased rate limits
Tarea 4.2 — Stale Coverage Detection:
- Add stale_detection_service that flags techniques with no validated test
in the last N days, or never-validated but with a coverage status
- Configurable threshold via STALE_THRESHOLD_DAYS setting (default 365)
- Register daily stale detection job in APScheduler
- Only flags techniques not already marked review_required
Pause/Resume timer:
- Add paused_at, red_paused_seconds, blue_paused_seconds fields to Test model
- Add pause_timer/resume_timer workflow functions with accumulated pause tracking
- Auto-resume on phase submit; subtract paused time from worklog duration
- Add POST /tests/{id}/pause-timer and resume-timer endpoints
- Update LiveTimer component with pause/resume button and paused visual state
- Wire pause/resume mutations through TestDetailPage and TestDetailHeader
Professional Reporting Engine - Fase 2:
- Add ReportEngine service with Jinja2 HTML rendering, WeasyPrint PDF, and docxtpl DOCX
- Add corporate CSS stylesheet with cover page, data tables, stats grid, findings
- Create purple_campaign, coverage_report, and executive_summary HTML templates
- Add report_generation_service collecting domain data for each report type
- Add professional_reports router: GET /reports/generate/purple-campaign/{id}, coverage-summary, executive-summary
- Add analytics router with flat JSON endpoints for PowerBI: /coverage, /tests, /trends, /operators
- Add advanced_metrics router: /coverage-by-tactic, /never-tested, /avg-validation-time, /detection-rate-trend
- Add weasyprint and docxtpl to requirements.txt
- Add REPORT_TEMPLATES_DIR, REPORT_OUTPUT_DIR, COMPANY_NAME, COMPANY_LOGO_PATH to config
Full Jira/Tempo pipeline: link Aegis entities to Jira issues, auto-sync
status hourly, log time internally with integrity hashing, and optionally
push worklogs to Tempo.
- 1.1 JiraLink model + Worklog model: Alembic migration b020 with indexes,
enums (jiralinkentitytype, jirasyncdirection), and integrity_hash column
- 1.2 Jira service: atlassian-python-api wrapper with lazy singleton client,
search/create/sync operations, feature-flagged via JIRA_ENABLED
- 1.3 Jira router: CRUD endpoints for /jira/links, /jira/search,
/jira/create-issue with audit logging and entity-to-issue auto-creation
- 1.4 Tempo service: worklog push via tempo-api-python-client, auto-log from
test completions when TEMPO_ENABLED, graceful fallback on failure
- 1.5 Worklog service + router: immutable internal time records with SHA-256
integrity hash, CRUD at /worklogs, /worklogs/{id}/verify endpoint
- 1.6 Frontend: JiraLinkPanel component (search, link, sync, unlink) and
WorklogTimeline component (timeline view, manual log form) integrated into
TestDetailPage sidebar, CampaignDetailPage grid, TechniqueDetailPage
- 1.7 Jira sync job: APScheduler hourly job syncs all links from Jira,
registered in background scheduler alongside existing jobs
Foundational changes required before any new feature work can begin.
- 0.1 Redis infrastructure: add redis:7-alpine to docker-compose dev and prod,
REDIS_URL config, singleton client in app/infrastructure/redis_client.py
- 0.2 Token blacklist on Redis SEC-001: replace in-memory dict with Redis SETEX
keyed by jti, auto-expiring TTL derived from token exp
- 0.3 Database indexes SR-006: Alembic migration b019 with 5 composite indexes
for scoring, MTTD/MTTR, remediation, and notification queries
- 0.4 Domain exceptions TD-003: app/domain/exceptions.py with typed errors,
error_handler middleware mapping them to HTTP, services decoupled from FastAPI
- 0.5 Fix silenced exceptions TD-007: replace 4 bare except-pass blocks in
test_workflow_service with logger.warning with exc_info
- 0.6 CI pipeline TD-009: GitHub Actions workflow with Postgres and Redis
service containers, ruff lint, pytest; ruff.toml for baseline config
Critical (1-3):
- Replace hardcoded admin credentials with secure auto-generation (seed.py)
- Enforce SECRET_KEY configuration, fail in production if missing (config.py)
- Add Zip Slip and Zip Bomb protection to all ZIP import services
High/Medium (4-9):
- Add 50MB file size limit and extension whitelist to evidence uploads
- Configure CORS origins via environment variable instead of hardcoded
- Migrate JWT storage from localStorage to HttpOnly cookies (frontend+backend)
- Add rate limiting (5/min) on login endpoint via slowapi
- Replace generic dict payloads with Pydantic schemas (mass assignment)
Medium (10-17):
- Check is_active on login to prevent disabled users from authenticating
- Sanitize exception messages in API responses (system, data_sources)
- Escape LIKE wildcards in all ilike search filters across 8 routers
- Run Docker container as non-root user (appuser)
- Make MINIO_SECURE configurable via environment variable
- Add password complexity policy (12+ chars, upper/lower/digit/special)
- Implement JWT token revocation via in-memory blacklist + reduce TTL to 15min
- Replace xml.etree with defusedxml to prevent Billion Laughs attacks
Low (18-20):
- Add security headers to Nginx (CSP, X-Frame-Options, HSTS-ready, etc.)
- Disable Swagger UI/ReDoc/OpenAPI in production
- Restrict /health endpoint to internal networks via Nginx ACL
Also: rewrite install.sh as interactive wizard for guided deployment,
fix test-from-template validation error (technique_id UUID vs MITRE ID)
T-109: Rewrite tests router with full Red/Blue workflow endpoints - list with filters, create from template, Red/Blue team updates with state guards, start-execution, submit-red, submit-blue, validate-red, validate-blue, reopen, and timeline. All using workflow service from Phase 11.
T-110: Rewrite evidence router with Red/Blue separation - upload with team field, list with team filter, delete with state-based permissions. Red Team edits in draft/red_executing, Blue Team in blue_evaluating, admin bypasses all.
T-111: Create test_templates router with full CRUD - paginated list with source/platform/severity/search filters, by-technique lookup, admin-only create/update, and soft delete. Registered in main.py.
T-112: Add POST /system/import-atomic-tests endpoint to system router - admin-only trigger for Atomic Red Team import with error handling and statistics response.
Includes validation tests for all four tasks (35 checks total).
- Add GET /metrics/summary endpoint with global coverage counts and percentage
- Add GET /metrics/by-tactic endpoint with per-tactic coverage breakdown
- Handle multi-tactic techniques (comma-separated) counting in each tactic
- Add CoverageSummary and TacticCoverage Pydantic schemas
- Update README with metrics endpoints and project structure
- Add MITRE sync service via TAXII 2.0 with GitHub fallback
- Upsert attack-pattern objects into techniques table (691 techniques)
- Detect name/description changes and flag review_required on re-sync
- Add APScheduler background job running every 24h
- Add POST /system/sync-mitre endpoint (admin only)
- Add GET /system/scheduler-status endpoint (admin only)
- Configure logging for scheduler and sync visibility
- Update README with new endpoints and project structure
- Add Pydantic schemas for Technique, Test and Evidence
- Add CRUD endpoints for Techniques (list with filters, detail, create, update, review)
- Add CRUD endpoints for Tests (create, detail, update, validate, reject)
- Add evidence upload with SHA-256 integrity and presigned download URLs
- Add MinIO/S3 storage client with bucket auto-creation on startup
- Add status_service to recalculate technique coverage from test results
- Add require_any_role RBAC dependency for multi-role authorization
- Update README with API endpoints reference and project structure
This commit establishes the foundational infrastructure for the Aegis
MITRE ATT&CK Coverage Platform.
T-001: Initialize project and Docker Compose
- Set up Docker Compose with PostgreSQL 15, MinIO, and FastAPI backend
- Create basic FastAPI application with health endpoint
- Configure persistent volumes for data storage
T-002: Configuration and database connection
- Add centralized configuration using pydantic-settings
- Implement SQLAlchemy database connection with session management
- Configure MinIO and JWT settings
T-003: Initialize Alembic for migrations
- Set up Alembic with PostgreSQL connection from settings
- Create initial empty migration
- Configure autogenerate support for future models
Also includes:
- Professional README with setup instructions
- Comprehensive .gitignore for Python/Node/Docker
- Project task plan (AegisTestPlan.md)