Commit Graph

126 Commits

Author SHA1 Message Date
kitos
6f4901b611 security: fix 6 vulnerabilities identified in SDLC audit
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
- fix(auth): enforce API key scopes in require_role/require_any_role;
  attach _api_key_scopes to user on API key auth; add require_scope()
  dependency — scopes were stored but never enforced (CWE-285)

- fix(sso): read SECURE_COOKIES env var for SSO cookie instead of
  hardcoded secure=False — SAML sessions now respect HTTPS config (CWE-614)

- fix(webhooks): SSRF prevention — validate webhook URLs against private
  and reserved CIDRs at creation/update time (CWE-918)

- fix(knowledge): restrict playbook/lesson create, update and restore
  to admin/red_lead/blue_lead roles — was open to any authenticated user (CWE-284)

- fix(alerts): restrict alert acknowledge/resolve/dismiss to admin/lead
  roles — any user could silence security alerts (CWE-284)

- security: delete get_admin_creds.py, check_auth.py, deploy.py scripts
  containing hardcoded root SSH credentials and production DB access;
  add scripts/.gitignore to prevent reintroduction (CWE-798)
2026-05-22 09:46:29 +02:00
kitos
fc16675cf2 fix(alerts): import User model in operational_alert_service to fix NameError in _dispatch_inapp_notifications
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
2026-05-21 17:11:35 +02:00
kitos
97349a1d13 feat(alerts): close Phase 13 gaps — hourly job + webhook + in-app notifications
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
- Add dispatch_webhook_targeted() to webhook_service for rule-specific delivery
- evaluate_all_rules() now dispatches in-app notifications (admins/leads) and
  webhooks after each alert fires (targeted + global alert.fired broadcast)
- APScheduler: _run_alert_evaluation() job registered hourly alongside existing jobs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 15:57:41 +02:00
kitos
d4b147da7c feat(alerts): Phase 13 — Operational Alert Engine
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
AlertRule + AlertInstance models (b041alerts migration), 8 pre-seeded system
rules (high_risk x2, stale_technique, coverage_regression, low_coverage,
expiry_wave, new_technique, orphan_spike), evaluation engine with per-rule
cooldown, full alert lifecycle (acknowledge/resolve/dismiss), custom rule CRUD,
and summary endpoint. Rules seeded at app startup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 15:25:55 +02:00
kitos
d81fc04b8f feat(enterprise): Phase 14 — API Key Management + SSO/SAML 2.0
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
- ApiKey model (SHA-256 hash, prefix, scopes, expiry) + Alembic migration (b040ent)
- SsoConfig model for SAML 2.0 IdP settings (attribute mapping, auto-provision)
- API key auth integrated into get_current_user (aegis_ prefix detection)
- Routers: /api/v1/api-keys (full CRUD + revoke) and /api/v1/sso (metadata, login, callback, config)
- python3-saml added to requirements; Dockerfile adds libxmlsec1-dev for SAML XML signing
- QA script: 52 assertions covering key lifecycle, API key auth, SSO config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 16:43:57 +02:00
kitos
ab591d30c4 feat(dashboard): Phase 13 — Executive Dashboard
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
PostureSnapshot model, Alembic migration (b039exec), schemas, service
aggregating all phases (coverage/risk/operations/knowledge/MTTD), and
router at /api/v1/dashboard with executive view, KPIs, coverage-by-tactic,
posture-history, posture-snapshot, and activity-feed endpoints.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 16:20:21 +02:00
kitos
41a0c536bb fix(risk): fix remaining t.technique_id → t.mitre_id in get_recommendations
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
2026-05-20 16:11:48 +02:00
kitos
7fae4783a2 fix(risk): Technique uses status_global and mitre_id (not status/technique_id)
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
2026-05-20 15:59:26 +02:00
kitos
084ea4c0b2 fix(risk): correct TechniqueConfidenceScore fields, TechniqueStatus values, Test.result usage
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
2026-05-20 15:58:03 +02:00
kitos
362a17aa1b feat(risk): Phase 12 — Risk Intelligence [FASE-12]
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
- TechniqueRiskProfile model: per-technique risk scoring (0-100)
- 4-factor weighted scoring: detection_gap(35%) + threat_actors(30%) + osint(20%) + test_failures(15%)
- Risk levels: critical(≥75) / high(≥50) / medium(≥25) / low(≥10) / info
- Detailed scoring_breakdown (JSONB) + actionable recommendations per technique
- Router /api/v1/risk: compute-all, compute-one, list, matrix, summary, recommendations, top
- Alembic migration b038risk (raw SQL, idempotent)
- QA script: 60+ tests across all endpoints

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 15:31:38 +02:00
kitos
4fba4152d9 fix(knowledge): use EntityNotFoundError/DuplicateEntityError instead of DomainError(status_code=)
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
2026-05-20 15:21:36 +02:00
kitos
4f5370db89 feat(knowledge): Phase 11 — Knowledge Management (Playbooks + Lessons Learned) [FASE-11]
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
- Playbooks: versioned Markdown runbooks per technique × type (attack/detect/investigate/respond/hunt)
- PlaybookVersion: immutable snapshots on every update; restore to any previous version
- LessonLearned: post-mortem records linked to tests/campaigns/attack-paths or manual
- Alembic migration b037know (raw SQL, idempotent, no PostgreSQL enums)
- Router /api/v1/knowledge: 14 endpoints for playbooks + lessons + stats
- Pydantic validators for playbook_type, severity, entity_type (422 on invalid)
- Knowledge stats endpoint: totals + breakdown by severity and playbook type
- Soft-delete on both resources; include_inactive filter for admin recovery
- QA script: 70+ tests across CRUD, versioning, filtering, auth, soft-delete, regression

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 13:39:05 +02:00
kitos
080ce56de7 feat(attack-paths): Phase 10 — Attack Paths & Advanced Purple Team [FASE-10]
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
Models (5 tables):
  - AttackPath: named reusable attack scenario with template flag
  - AttackPathStep: ordered kill-chain step (technique + test link)
  - AttackPathExecution: a run with Red/Blue leads, timing, stored metrics
  - AttackPathStepResult: per-step detected/not_detected/skipped result
  - TimelineEntry: timestamped Red/Blue/system actions for MTTD/MTTR

Migration b036atk: raw SQL to avoid SQLAlchemy DDL hook issues

Service (attack_path_service.py):
  - Full CRUD for paths + steps (add, update, delete, reorder)
  - Execution lifecycle: create → start → execute steps → complete/abort
  - Pre-creates pending step results on execution creation
  - Auto-adds system timeline entries on key state transitions
  - complete_execution() computes: detection_rate, mttd_seconds,
    furthest_undetected_step, detected/not_detected/skipped counts
  - get_kill_chain_metrics(): per-step breakdown + phase summary

Router /api/v1/attack-paths (20 endpoints):
  POST/GET/PATCH/DELETE attack paths
  GET/POST/PATCH/DELETE steps + reorder
  POST/GET executions per path
  GET/POST/start/complete/abort executions
  POST/GET step results
  POST/GET timeline entries
  GET kill-chain metrics

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-20 13:11:01 +02:00
kitos
a8b4518485 feat(ownership): Phase 9 — Ownership & Daily Operations [FASE-9]
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
Backend:
- TechniqueOwnership model: per-technique owner, backup owner, team
- RevalidationQueueItem model: prioritised analyst work queue
  (critical/high/medium/low, reasons: validation_expired/infra_change/
   osint_alert/mitre_update/rule_modified/low_confidence/manual)
- Migration b035ownerq: creates technique_ownerships and
  revalidation_queue_items tables with full indexes

Services:
- ownership_service: set/get technique ownership, bulk assign by tactic
  or platform, orphan reports for techniques and assets
- revalidation_queue_service: smart queue generation (scans expired
  validations, low-confidence techniques, recent infra changes),
  list/create/update queue items, analyst dashboard

Router /api/v1/ownership:
  GET/PUT /ownership/techniques/{id}   — technique ownership
  PATCH   /ownership/assets/{id}       — asset ownership
  GET     /ownership/orphans/techniques — orphan report
  GET     /ownership/orphans/assets     — orphan report
  POST    /ownership/bulk-assign        — bulk by tactic/platform
  GET/POST /ownership/queue             — revalidation queue CRUD
  PATCH   /ownership/queue/{id}         — update item status/assignee
  POST    /ownership/queue/generate     — scan & generate items
  GET     /ownership/analyst-dashboard  — personalised daily view

Scheduler: queue_generation job daily at 02:30 (after decay engine)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 16:48:47 +02:00
kitos
89a951c2a2 fix(decay-engine): strip tzinfo from validated_at before datetime arithmetic
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
The previous fix changed _now() to return naive UTC, but the code still
called .replace(tzinfo=utc) on most_recent (from DB) before subtracting.
This caused "can't subtract offset-naive and offset-aware datetimes".
Now we strip tzinfo if present, keeping everything naive UTC consistently.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 16:35:02 +02:00
kitos
9a020f97ef fix(detection-lifecycle): fix timezone naive/aware mismatch and duplicate technique mapping
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
- Replace datetime.now(timezone.utc) with datetime.utcnow() in _now() across
  all three Phase 8 files to match DB DateTime column type (naive UTC)
- Guard POST /assets/{id}/techniques/{tid} against duplicate mappings:
  if mapping already exists, update coverage_type/confidence_level instead
  of inserting a duplicate row

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 16:29:04 +02:00
kitos
1fe150963c feat(dlm): Phase 8 — Detection Lifecycle Management [FASE-8]
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
Tasks 8.1-8.5:

Models (8.1):
- DetectionAsset: SIEM/EDR/Sigma rule assets with auto-hash
- DetectionTechniqueMapping: N:M asset ↔ technique coverage
- DetectionValidation: immutable validation records with expiry
- TechniqueConfidenceScore: computed multi-factor confidence
- InfrastructureChangeLog: infra changes that invalidate detections
- DecayPolicy: configurable freshness thresholds per platform/tactic

Services (8.2, 8.3):
- detection_asset_service: CRUD + SHA-256 rule hashing + auto-
  invalidation on rule/infra changes
- decay_engine_service: daily decay engine — expires stale validations,
  recalculates confidence (recency/coverage/health/diversity factors),
  processes infrastructure change propagation

Router (8.4): 15 endpoints under /api/v1/detection-lifecycle:
  assets CRUD, technique mappings, validations, confidence scores,
  infrastructure changes, decay trigger, executive dashboard

Scheduler (8.3): decay engine runs daily at 02:00
Seed (8.5): default policy (90/180/365d) + strict initial-access policy
Migration: b034dlm (6 tables, 11 indexes)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 15:45:16 +02:00
kitos
0e1b8e2b39 feat(settings): Settings page with email, webhooks, notifications, profile [FASE-8]
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
- SystemConfig model + migration b033 for runtime key-value config
- GET/PATCH /system/email-config + POST /system/email-test (admin only)
- email_service reads SMTP config from DB (overrides .env)
- Webhooks now accessible to red_lead/blue_lead + admin
- GET /users/me already existed; /users/me/preferences already working
- SettingsPage with 4 role-aware tabs:
  * Profile & Jira: jira_account_id, user info
  * Notifications: role-specific email/in-app toggles (12 prefs)
  * Webhooks: full CRUD + test ping (leads + admin)
  * Email/SMTP: enable toggle, server config, test email (admin only)
- Added /settings route (all authenticated users)
- Settings link added to Sidebar

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 15:10:31 +02:00
kitos
c1e06d4c0a feat(phases): implement webhooks (6.1), email (7.1), user preferences (7.2)
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
- Phase 6.1: WebhookConfig model, CRUD router (/api/v1/webhooks, admin-only),
  dispatch_webhook() with HMAC signing; integrated into test validation,
  campaign completion, and MITRE sync job
- Phase 7.1: SMTP email service with send_test_validated_email,
  send_campaign_completed_email, send_new_mitre_techniques_email;
  notify_role_with_email() added to notification_service
- Phase 7.2: notification_preferences and jira_account_id on User model;
  PATCH /users/me/preferences endpoint; Alembic migrations b031phase6 and b032phase7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 13:40:45 +02:00
kitos
63da22b77e fix(qa): 5 bug fixes — audit dates, CSP, template modal, MITRE sync timeout, data source auto-sync
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
- audit_service: set timestamp=datetime.now(utc) explicitly so DB never stores NULL
- AuditLogPage: formatDate handles null/undefined timestamps (was showing Jan 1 1970)
- nginx.conf: add CSP script-src hash for inline script (sha256-31OgE8E9...)
- system.py: MITRE sync now runs in BackgroundTasks — returns immediately, no more 120s timeout
- mitre_sync_job.py: add _run_data_sources_sync job (every 6h) that checks sync_frequency
  and auto-syncs overdue enabled data sources
- SystemPage: MITRE sync result shows "started" vs "complete" message
- test-templates.ts: add updateTemplate() API function
- SystemPage: template name cell is now clickable — opens TemplateDetailModal with
  full edit form (name, description, procedure, detection, platform, severity, tool)
  and Save / Activate / Deactivate / Close buttons

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-19 12:05:35 +02:00
05b221a22d feat(scoring): composite recency decay and severity weights persisted in DB [FASE-5.1] 2026-05-18 15:07:12 +02:00
2ee59d4e18 test(intel): verify OSINT enrichment and stale coverage detection [FASE-4] 2026-05-18 14:50:31 +02:00
c0aff4cbeb feat(audit): enhanced audit trail with IP, user-agent and integrity hash [FASE-3.1] 2026-05-18 14:16:18 +02:00
a8a24b5429 fix(metrics): correct never-tested technique query [FASE-2.6]
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
Use distinct technique_id list filtering so untested techniques are returned reliably on SQLite and Postgres.
2026-05-18 14:00:48 +02:00
ed2c34ef28 feat(reports): extend report generation service [FASE-2.3]
Add quarterly summary and technique detail builders with UUID-safe lookups and unit tests for purple campaign context.
2026-05-18 14:00:42 +02:00
c28a47c43b test(reports): add ReportEngine unit tests [FASE-2.1]
Stub WeasyPrint for CI-friendly PDF generation and verify HTML render, PDF path, and HTML file output.
2026-05-18 14:00:37 +02:00
03d7d1cc80 feat(tempo): harden worklog sync and add tests [FASE-1.4]
Add tempo-api-python-client dependency, TEMPO_API_VERSION setting, enum-safe Jira link lookup, work type on create_worklog, and mocked auto_log tests.
2026-05-18 13:36:26 +02:00
79a4772ab5 feat: make heatmap layers extensible via LayerRegistry (OCP) 2026-02-20 16:07:36 +01:00
a9255e15ce refactor: remove db.commit() from audit_service.log_action, all callers use UoW 2026-02-20 15:33:23 +01:00
14d995b40c refactor: remove db.commit() from business services, callers use UnitOfWork (Tier 3) 2026-02-20 14:42:20 +01:00
339d669498 feat: move all remaining inline logic from routers to services (Tier 2) 2026-02-20 14:34:24 +01:00
9e22fde746 feat: extract advanced_metrics, analytics, test_templates, and auth to services (Tier 1 complete) 2026-02-20 14:28:52 +01:00
d77075272e feat: add ImportService protocol and registry for OCP-compliant import extensibility (LP-7) 2026-02-20 13:31:18 +01:00
c0c6cda11d feat: add Campaign/Compliance domain entities and extract users/audit/data_sources to services (LP-2 through LP-6) 2026-02-20 13:28:14 +01:00
f4c74230ec refactor(campaigns): extract CRUD/business logic to campaign_crud_service, use domain exceptions 2026-02-19 19:04:32 +01:00
50b70704ae refactor(evidence): extract permission validation and queries to evidence_service, use domain exceptions 2026-02-19 19:02:36 +01:00
20738d11b3 refactor(tests): extract CRUD/query logic to test_crud_service, router delegates to service with domain exceptions 2026-02-19 18:35:09 +01:00
4e3787d091 refactor(scoring): persist weights in DB table, replace mutable Settings with scoring_config_service 2026-02-19 17:46:02 +01:00
93fde55389 refactor(threat-actors): extract query/business logic to threat_actor_service, fix N+1 with grouped subqueries
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
2026-02-19 17:40:00 +01:00
560fc0c9f0 refactor(detection-rules): extract query/business logic to detection_rule_service, router is thin HTTP adapter 2026-02-19 17:39:31 +01:00
d305db8794 refactor(compliance): extract business logic to compliance_service, use domain exceptions instead of HTTPException 2026-02-19 17:06:32 +01:00
25fddad17c refactor(metrics): extract query logic to metrics_query_service, thin down router to HTTP adapter 2026-02-19 17:06:07 +01:00
8d5c5fa80e refactor(reports): extract query and aggregation logic to coverage_report_service, fix N+1 test-count pattern 2026-02-19 15:56:42 +01:00
42a9f4dcd4 refactor(status): consolidate status_service to delegate to TechniqueEntity.recalculate_status() eliminating duplicated business logic 2026-02-19 15:23:01 +01:00
e651ef8a8c refactor(heatmap): extract business logic to dedicated service
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
Move layer dispatch, entity-not-found checks, and validation from router to heatmap_service. Router now only validates requests, calls service, and formats responses (no HTTPException, no business logic). Service raises EntityNotFoundError/BusinessRuleViolation instead of returning None. Add build_navigator_export() for centralized dispatch. 29 new tests (253 total, 0 failures).
2026-02-18 16:09:51 +01:00
1338d52cd0 fix(workflow): enforce domain state machine in dual validation path
validate_as_red/blue_lead now delegate to TestEntity. check_dual_validation routes through entity instead of assigning test.state directly. Side effects dispatched via domain events. Entity raises InvalidOperationError for backward compat. Removed 4 dead V1 xfail tests, fixed 2 real test issues. 224 passed, 0 xfailed.
2026-02-18 15:49:59 +01:00
576705d61d refactor(workflow): delegate start_execution to TestEntity
Replace manual state+field mutation with entity.start_execution() and apply_to(), keeping audit logging and notifications at the service layer.
2026-02-18 15:29:36 +01:00
633c8e46ad refactor(workflow): delegate transition_state to TestEntity
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
transition_state() now hydrates a TestEntity from the ORM model and delegates state validation to entity.transition_to(). The entity is authoritative for which transitions are valid; VALID_TRANSITIONS and can_transition() are kept for backward compatibility.

Also adds public transition_to() method to TestEntity as the stable API surface for callers that need a single validated transition without lifecycle side-effects.
2026-02-18 13:54:01 +01:00
6147abc87a refactor(heatmap): extract business logic to dedicated service
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
- Create heatmap_service.py with all layer-building logic (coverage, threat-actor, detection-rules, campaign)

- Service is framework-agnostic: no FastAPI imports, no HTTPException, no db.commit()

- Fix N+1 in coverage and threat-actor layers: bulk-fetch test_counts and rule_counts with GROUP BY

- Router reduced from 528 to 140 lines: validates request, calls service, returns response
2026-02-18 13:14:41 +01:00
bfce1a8a0e refactor(core): introduce Unit of Work and remove commits from services
Some checks failed
Aegis CI / lint-and-test (push) Has been cancelled
- Add UnitOfWork context manager in domain/unit_of_work.py with commit/rollback/flush API and auto-rollback on exception

- Remove all db.commit() from test_workflow_service (8 calls), notification_service (4 calls), status_service (1 call)

- Services now only stage changes via db.add/db.flush; caller owns the transaction boundary

- Update routers/tests.py: wrap 9 workflow endpoints in UnitOfWork context managers

- Update routers/notifications.py: wrap mark_as_read and mark_all_as_read in UnitOfWork
2026-02-18 12:51:55 +01:00