# Aegis — Target Architecture: Clean Modular Monolith

> **Author:** Architecture review
> **Date:** February 11, 2026 (updated February 18, 2026)
> **Status:** In Progress — foundational layers implemented
> **Depends on:** ARCHITECTURAL_ANALYSIS.md, DEPENDENCY_ANALYSIS.md, TECH_DEBT_AND_RISKS.md
>
> **Implementation Progress (Feb 18, 2026):**
> - ✅ Domain exceptions hierarchy (`domain/errors.py`, `domain/exceptions.py`)
> - ✅ Error handler middleware (`middleware/error_handler.py`)
> - ✅ TestEntity with full state machine (`domain/test_entity.py`)
> - ✅ TechniqueEntity with status recalculation (`domain/entities/technique.py`)
> - ✅ Value objects: MitreId, ScoringWeights (`domain/value_objects/`)
> - ✅ Repository ports/protocols (`domain/ports/repositories/`)
> - ✅ SQLAlchemy repository implementations (`infrastructure/persistence/repositories/`)
> - ✅ ORM-Entity mappers (`infrastructure/persistence/mappers/`)
> - ✅ FastAPI dependency wiring (`dependencies/repositories.py`)
> - ✅ Unit of Work (`domain/unit_of_work.py`)
> - ✅ Redis-backed token blacklist (`infrastructure/redis_client.py`)
> - ✅ CI pipeline (`.github/workflows/ci.yml`)
> - ✅ 326 tests passing (domain unit tests + integration tests + API tests)
> - ✅ Architecture rules file (`.cursor/rules/aegis-architecture.md`)
>
> **Remaining:** Application layer use cases, Campaign/Compliance domain entities, router migration to repositories, scoring config persistence, structured logging.

---

## Table of Contents

1. [Target Architecture Overview](#1-target-architecture-overview)
2. [Layer Definitions and Responsibilities](#2-layer-definitions-and-responsibilities)
3. [Module Boundaries](#3-module-boundaries)
4. [Dependency Rules](#4-dependency-rules)
5. [Top 5 Modules to Refactor First](#5-top-5-modules-to-refactor-first)
6. [Repository Pattern for Technique](#6-repository-pattern-for-technique)

---

## 1. 
Target Architecture Overview ### Design Philosophy The target architecture applies Clean Architecture principles to a modular monolith. This is not a microservices migration — it is an internal reorganization of the existing codebase to enforce separation of concerns, dependency inversion, and testability while maintaining a single deployable unit. ### Target Directory Structure ``` backend/ └── app/ ├── main.py # FastAPI app bootstrap (minimal) ├── config.py # Pydantic Settings (read-only) │ ├── domain/ # ★ DOMAIN LAYER │ ├── __init__.py │ │ │ ├── enums.py # TechniqueStatus, TestState, TeamSide, TestResult │ │ # (moved from models/enums.py — these are domain concepts) │ │ │ ├── exceptions.py # Domain exception hierarchy │ │ # EntityNotFoundError │ │ # DuplicateEntityError │ │ # InvalidTransitionError │ │ # InvalidOperationError │ │ # AuthorizationError │ │ │ ├── events.py # Domain event definitions (data classes) │ │ # TestStateChanged, TechniqueStatusRecalculated, │ │ # CampaignCompleted, EvidenceUploaded │ │ │ ├── entities/ # Rich domain entities with behavior │ │ ├── __init__.py │ │ ├── technique.py # TechniqueEntity: recalculate_status(), mark_reviewed() │ │ ├── test.py # TestEntity: can_transition(), start_execution(), │ │ │ # submit_red(), submit_blue(), validate(), reopen() │ │ ├── campaign.py # CampaignEntity: add_test(), remove_test(), activate(), │ │ │ # complete(), has_circular_dependency() │ │ ├── user.py # UserEntity: has_role(), can_access() │ │ ├── detection_rule.py # DetectionRuleEntity │ │ ├── threat_actor.py # ThreatActorEntity │ │ └── evidence.py # EvidenceEntity: validate_upload_permission() │ │ │ ├── value_objects/ # Immutable, equality-by-value │ │ ├── __init__.py │ │ ├── mitre_id.py # MitreId: validated format (T1059, T1059.001) │ │ ├── score.py # TechniqueScore, TacticScore, OrgScore (with breakdown) │ │ └── scoring_weights.py # ScoringWeights: validated weight set (sum == 100) │ │ │ └── ports/ # ★ INTERFACES — the contracts │ ├── 
__init__.py │ ├── repositories/ # Data access contracts (one per aggregate root) │ │ ├── __init__.py │ │ ├── technique_repository.py # TechniqueRepository protocol │ │ ├── test_repository.py # TestRepository protocol │ │ ├── campaign_repository.py # CampaignRepository protocol │ │ ├── user_repository.py # UserRepository protocol │ │ ├── detection_rule_repository.py │ │ ├── threat_actor_repository.py │ │ ├── evidence_repository.py │ │ ├── audit_repository.py │ │ ├── notification_repository.py │ │ └── snapshot_repository.py │ │ │ └── services/ # External capability contracts │ ├── __init__.py │ ├── storage_port.py # StoragePort: upload_file(), get_download_url() │ ├── event_publisher_port.py # EventPublisherPort: publish(DomainEvent) │ └── token_blacklist_port.py # TokenBlacklistPort: revoke(), is_revoked() │ ├── application/ # ★ APPLICATION LAYER │ ├── __init__.py │ │ │ ├── interfaces/ # Application-level contracts │ │ ├── __init__.py │ │ └── unit_of_work.py # UnitOfWork protocol: commit(), rollback(), __enter__/__exit__ │ │ │ ├── dto/ # Input/output data structures for use cases │ │ ├── __init__.py # Pure data classes — no ORM, no Pydantic │ │ ├── technique_dto.py # TechniqueListFilters, TechniqueResult, TechniqueDetail │ │ ├── test_dto.py # CreateTestInput, TestResult, TestTimeline │ │ ├── scoring_dto.py # ScoreRequest, ScoreResult, ScoreHistoryResult │ │ ├── heatmap_dto.py # HeatmapFilters, HeatmapLayer, NavigatorExport │ │ ├── report_dto.py # CoverageReportResult, CsvExportResult │ │ └── campaign_dto.py # CreateCampaignInput, CampaignProgress │ │ │ └── use_cases/ # Orchestrators — one class per operation │ ├── __init__.py │ │ │ ├── techniques/ │ │ ├── list_techniques.py # ListTechniquesUseCase │ │ ├── get_technique.py # GetTechniqueUseCase │ │ ├── create_technique.py # CreateTechniqueUseCase │ │ ├── update_technique.py # UpdateTechniqueUseCase │ │ └── review_technique.py # ReviewTechniqueUseCase │ │ │ ├── tests/ │ │ ├── create_test.py # CreateTestUseCase │ │ ├── 
create_from_template.py # CreateFromTemplateUseCase │ │ ├── start_execution.py # StartExecutionUseCase │ │ ├── submit_red.py # SubmitRedUseCase │ │ ├── submit_blue.py # SubmitBlueUseCase │ │ ├── validate_test.py # ValidateTestUseCase │ │ ├── reopen_test.py # ReopenTestUseCase │ │ └── get_retest_chain.py # GetRetestChainUseCase │ │ │ ├── scoring/ │ │ ├── calculate_technique_score.py │ │ ├── calculate_tactic_score.py │ │ ├── calculate_org_score.py │ │ └── update_scoring_weights.py │ │ │ ├── heatmap/ │ │ ├── generate_coverage_layer.py │ │ ├── generate_actor_layer.py │ │ ├── generate_detection_layer.py │ │ └── export_navigator.py │ │ │ ├── reports/ │ │ ├── generate_coverage_report.py │ │ ├── generate_test_results_report.py │ │ ├── generate_remediation_report.py │ │ └── export_coverage_csv.py │ │ │ └── campaigns/ │ ├── create_campaign.py │ ├── manage_campaign_tests.py │ ├── activate_campaign.py │ ├── generate_from_threat_actor.py │ └── schedule_recurring.py │ ├── infrastructure/ # ★ INFRASTRUCTURE LAYER │ ├── __init__.py │ │ │ ├── persistence/ │ │ ├── __init__.py │ │ ├── database.py # Engine, SessionLocal, get_db() — unchanged │ │ │ │ │ ├── orm/ # SQLAlchemy models (table mapping ONLY) │ │ │ ├── __init__.py # Re-export all models for Alembic │ │ │ ├── base.py # declarative_base() │ │ │ ├── technique_model.py # Current models/technique.py — unchanged │ │ │ ├── test_model.py # Current models/test.py — unchanged │ │ │ ├── campaign_model.py │ │ │ ├── user_model.py │ │ │ └── ... # All 18 current models, untouched │ │ │ │ │ ├── repositories/ # Concrete repository implementations │ │ │ ├── __init__.py │ │ │ ├── sa_technique_repository.py │ │ │ ├── sa_test_repository.py │ │ │ ├── sa_campaign_repository.py │ │ │ └── ... 
# One per domain port │ │ │ │ │ ├── unit_of_work.py # SQLAlchemy UoW (wraps Session commit/rollback) │ │ │ │ │ └── mappers/ # ORM Model ↔ Domain Entity converters │ │ ├── __init__.py │ │ ├── technique_mapper.py # to_entity(model) → TechniqueEntity │ │ │ # to_model(entity) → TechniqueORM │ │ ├── test_mapper.py │ │ └── ... │ │ │ ├── storage/ │ │ └── minio_storage.py # Implements StoragePort (current storage.py logic) │ │ │ ├── auth/ │ │ ├── jwt_service.py # Token creation and verification │ │ └── redis_token_blacklist.py # Implements TokenBlacklistPort │ │ │ ├── external/ # External data source adapters │ │ ├── mitre_taxii_adapter.py # Current mitre_sync_service.py │ │ ├── atomic_red_team_adapter.py # Current atomic_import_service.py │ │ ├── sigma_adapter.py │ │ ├── elastic_adapter.py │ │ ├── caldera_adapter.py │ │ ├── d3fend_adapter.py │ │ ├── lolbas_adapter.py │ │ └── threat_actor_adapter.py │ │ │ ├── events/ │ │ └── sync_event_publisher.py # Implements EventPublisherPort (in-process dispatch) │ │ │ ├── cache/ │ │ └── redis_score_cache.py # Replaces current in-memory score_cache.py │ │ │ └── jobs/ │ └── scheduler.py # APScheduler setup (current mitre_sync_job.py) │ └── presentation/ # ★ PRESENTATION LAYER ├── __init__.py │ ├── api/ │ └── v1/ # Thin routers — HTTP mapping only │ ├── __init__.py │ ├── techniques.py # Injects use case via Depends(), maps exceptions │ ├── tests.py │ ├── campaigns.py │ ├── heatmap.py │ ├── reports.py │ ├── scores.py │ ├── metrics.py │ └── ... # All 21 current routers, thinned │ ├── schemas/ # Pydantic models (request/response shapes) │ ├── __init__.py # Current schemas/ — unchanged │ ├── technique_schema.py │ ├── test_schema.py │ └── ... │ ├── dependencies/ # FastAPI Depends() wiring │ ├── __init__.py │ ├── auth.py # Current dependencies/auth.py │ ├── repositories.py # get_technique_repo(), get_test_repo(), ... │ └── use_cases.py # get_create_technique_use_case(), ... 
│ ├── middleware/ │ ├── error_handler.py # Maps domain exceptions → HTTP responses │ └── rate_limiter.py │ └── mappers/ # Pydantic schema ↔ application DTO converters ├── __init__.py ├── technique_mapper.py # TechniqueCreate → CreateTechniqueInput │ # TechniqueResult → TechniqueOut └── ... ``` --- ## 2. Layer Definitions and Responsibilities ### Domain Layer — The Core ``` Depends on: NOTHING (zero imports from outside domain/) ``` | Component | Responsibility | What It Must NOT Do | |-----------|---------------|---------------------| | **Entities** | Encapsulate business rules, invariants, and state transitions. A `TestEntity` knows which transitions are valid. A `TechniqueEntity` can recalculate its own status from a list of test results. | Import SQLAlchemy, FastAPI, Pydantic, or any framework. Access the database. Make HTTP calls. | | **Value Objects** | Represent domain concepts with value equality. `MitreId("T1059.001")` validates format on construction. `ScoringWeights` ensures the 5 weights sum to 100. | Be mutable. Have identity (no primary key). | | **Enums** | Define domain vocabularies: `TechniqueStatus`, `TestState`, `TeamSide`, `TestResult`. | Change based on infrastructure (these are the same enums currently in `models/enums.py`). | | **Exceptions** | Domain-specific error conditions. `InvalidTransitionError(current=draft, target=validated)`. | Reference HTTP status codes. Know about FastAPI. | | **Events** | Facts about things that happened. `TestStateChanged(test_id, old_state, new_state, user_id, timestamp)`. | Carry behavior. Know how they will be handled. | | **Ports** | Interfaces (Protocol) defining what the domain needs from the outside world. `TechniqueRepository`, `StoragePort`, `EventPublisherPort`. | Contain implementations. Reference concrete classes. 
|

### Application Layer — The Orchestrators

```
Depends on: domain/ only
```

| Component | Responsibility | What It Must NOT Do |
|-----------|---------------|---------------------|
| **Use Cases** | Orchestrate a single business operation by calling domain entities and ports. `CreateTechniqueUseCase` validates uniqueness via `TechniqueRepository`, constructs a `TechniqueEntity`, saves it, and publishes an event. | Know about HTTP, Pydantic, SQLAlchemy, or FastAPI. Contain business rules (those belong in entities). Contain queries (those belong in repositories). |
| **DTOs** | Plain data containers for use case input/output. No validation logic, no ORM awareness. | Inherit from Pydantic `BaseModel`. Reference ORM models. |
| **Unit of Work** | Interface for transaction boundaries. Use cases call `uow.commit()` or `uow.rollback()`. | Know about SQLAlchemy sessions. |

### Infrastructure Layer — The Implementations

```
Depends on: domain/ (implements ports), application/ (implements UoW)
```

| Component | Responsibility | What It Must NOT Do |
|-----------|---------------|---------------------|
| **ORM Models** | Map Python classes to database tables. Unchanged from current `models/`. | Contain business logic. Be passed outside the infrastructure layer (use mappers to convert to domain entities). |
| **Repositories** | Implement port interfaces using SQLAlchemy. `SATechniqueRepository.find_by_mitre_id()` translates to `db.query(Technique).filter(...)`. | Be called by anything outside the application layer. Contain business decisions. |
| **Mappers** | Convert between ORM models and domain entities. `TechniqueMapper.to_entity(orm_model) → TechniqueEntity`. | Contain business logic. Be a 1:1 field copy (they handle relationship loading and value object construction). |
| **External Adapters** | Implement data source integrations. Download ZIPs, parse YAML/TOML/STIX, return domain-compatible data. | Be called from routers directly. Know about HTTP responses. |
| **Storage, Cache, Auth** | Implement service ports. `MinioStorage` implements `StoragePort`. `RedisTokenBlacklist` implements `TokenBlacklistPort`. | Leak implementation details (Redis keys, S3 bucket names) outside the infrastructure layer. |

### Presentation Layer — The HTTP Boundary

```
Depends on: application/ (calls use cases), domain/ (reads exceptions)
```

| Component | Responsibility | What It Must NOT Do |
|-----------|---------------|---------------------|
| **Routers** | Map HTTP requests to use case calls. Parse path/query/body parameters, call the use case, return the response. 10-20 lines per endpoint maximum. | Contain business logic. Execute database queries. Build complex data structures. |
| **Schemas** | Pydantic models for HTTP request/response validation. Unchanged from current `schemas/`. | Be used inside use cases or domain entities. |
| **Dependencies** | Wire use cases via FastAPI `Depends()`. Construct repositories, inject into use cases, return. | Contain logic beyond wiring. |
| **Error Handler** | Map domain exceptions to HTTP responses. `EntityNotFoundError → 404`, `InvalidTransitionError → 400`, `AuthorizationError → 403`. | Know about business rules. |
| **Mappers** | Convert between Pydantic schemas and application DTOs. | Contain business logic. |

---

## 3. Module Boundaries

The monolith is organized into domain modules. Each module owns its entities, repositories, and use cases. Cross-module communication goes through application-layer use cases or domain events — never through direct repository access.
``` ┌─────────────────────────────────────────────────────────────────┐ │ Domain Modules │ │ │ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌─────────────┐ │ │ │ Technique │ │ Test │ │ Campaign │ │ Scoring │ │ │ │ │ │ │ │ │ │ │ │ │ │ entity │ │ entity │ │ entity │ │ value objs │ │ │ │ repo port │ │ repo port │ │ repo port │ │ use cases │ │ │ │ use cases │ │ use cases │ │ use cases │ │ (reads from │ │ │ │ │ │ │ │ │ │ other repos)│ │ │ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └──────┬──────┘ │ │ │ │ │ │ │ │ ┌─────┴──────────────┴──────────────┴───────────────┴──────┐ │ │ │ Shared Domain: enums, exceptions, events │ │ │ └───────────────────────────────────────────────────────────┘ │ │ │ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌─────────────┐ │ │ │ Heatmap │ │ Reports │ │Compliance │ │ Threat Intel│ │ │ │ │ │ │ │ │ │ │ │ │ │ use cases │ │ use cases │ │ use cases │ │ adapters │ │ │ │ (reads │ │ (reads │ │ (reads │ │ use cases │ │ │ │ repos) │ │ repos) │ │ repos) │ │ │ │ │ └───────────┘ └───────────┘ └───────────┘ └─────────────┘ │ └─────────────────────────────────────────────────────────────────┘ ``` **Cross-module rule:** A use case in the Scoring module may read from `TechniqueRepository` and `TestRepository` (both defined as ports in the domain layer). It must NOT import the SQLAlchemy model directly. --- ## 4. 
Dependency Rules

```
┌──────────────────┐
│   Presentation   │  Knows: FastAPI, Pydantic, HTTP
│   (routers,      │  Depends on: Application, Domain
│    schemas)      │
└────────┬─────────┘
         │ calls use cases
┌────────▼─────────┐
│   Application    │  Knows: Domain entities, ports, DTOs
│   (use cases)    │  Depends on: Domain ONLY
└────────┬─────────┘
         │ uses entities + ports
┌────────▼─────────┐
│     Domain       │  Knows: NOTHING external
│   (entities,     │  Depends on: NOTHING
│    ports, enums) │  (this is the core)
└────────▲─────────┘
         │ implements ports
┌────────┴─────────┐
│  Infrastructure  │  Knows: SQLAlchemy, boto3, Redis, requests
│  (repositories,  │  Depends on: Domain (ports), Application (UoW)
│   adapters)      │
└──────────────────┘
```

### Import Rules (Enforceable by Linting)

| From \ To | domain/ | application/ | infrastructure/ | presentation/ |
|-----------|---------|-------------|----------------|--------------|
| **domain/** | Self only | FORBIDDEN | FORBIDDEN | FORBIDDEN |
| **application/** | ALLOWED | Self only | FORBIDDEN | FORBIDDEN |
| **infrastructure/** | ALLOWED (ports) | ALLOWED (UoW) | Self only | FORBIDDEN |
| **presentation/** | ALLOWED (exceptions) | ALLOWED (use cases, DTOs) | ALLOWED (wiring only, in dependencies/) | Self only |

---

## 5. 
Top 5 Modules to Refactor First

### Selection Criteria

Each module is scored on three axes from the DEPENDENCY_ANALYSIS.md findings:

| Axis | Weight | Measurement |
|------|--------|-------------|
| **Complexity** | 35% | Lines of code, number of DB operations, number of models imported, number of concerns mixed |
| **Technical Risk** | 35% | N+1 queries, security issues, silent exception swallowing, framework coupling, scalability bottleneck |
| **Business Impact** | 30% | Centrality to the domain (how many other modules depend on it), user-facing frequency, correctness criticality |

---

### #1: Test Workflow Module

**Refactor scope:** `routers/tests.py` (664 lines, 30 db ops) + `services/test_workflow_service.py` (456 lines, 13 db ops) + `services/status_service.py` (47 lines)

| Axis | Score | Evidence |
|------|-------|----------|
| Complexity | **10/10** | 664-line router with 15+ endpoints. Mixes CRUD, template instantiation, timeline queries, and workflow delegation. The workflow service itself is 456 lines with a state machine, notifications, and audit logging. |
| Technical Risk | **10/10** | `test_workflow_service` imports `fastapi.HTTPException`, the most severe framework coupling in the codebase. 4 `except Exception: pass` blocks silently swallow notification failures. No way to unit test the state machine without a database session. |
| Business Impact | **10/10** | The Red/Blue validation workflow IS the core product. Every user role interacts with tests daily. A state transition bug could invalidate an entire assessment. 5 other modules depend on test data (scoring, heatmap, reports, metrics, campaigns). |

**Why first:** This module contains the single most important business logic in Aegis (the test state machine), yet it has the most severe coupling problems (HTTPException in domain logic, swallowed exceptions). Extracting a `TestEntity` with the state machine as a domain object unlocks pure unit testing of the most critical business rules.
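As a minimal sketch of what a framework-free state machine could look like, the entity below enforces transitions using only the standard library. The state names and the transition table are assumptions made for illustration (the document itself only names `draft` and `validated`); the real rules live in the existing workflow service.

```python
from dataclasses import dataclass
from enum import Enum


class TestState(str, Enum):
    # Hypothetical states: only "draft" and "validated" appear in this document.
    DRAFT = "draft"
    IN_EXECUTION = "in_execution"
    RED_SUBMITTED = "red_submitted"
    BLUE_SUBMITTED = "blue_submitted"
    VALIDATED = "validated"


class InvalidTransitionError(Exception):
    """Domain error: carries the states involved, no HTTP status code."""

    def __init__(self, current: TestState, target: TestState) -> None:
        super().__init__(f"cannot move from {current.value} to {target.value}")
        self.current = current
        self.target = target


# Assumed transition table: each state maps to the states it may move to.
_ALLOWED: dict[TestState, set[TestState]] = {
    TestState.DRAFT: {TestState.IN_EXECUTION},
    TestState.IN_EXECUTION: {TestState.RED_SUBMITTED},
    TestState.RED_SUBMITTED: {TestState.BLUE_SUBMITTED},
    TestState.BLUE_SUBMITTED: {TestState.VALIDATED},
    TestState.VALIDATED: {TestState.DRAFT},  # reopen
}


@dataclass
class TestEntity:
    state: TestState = TestState.DRAFT

    def can_transition(self, target: TestState) -> bool:
        return target in _ALLOWED[self.state]

    def _move(self, target: TestState) -> None:
        if not self.can_transition(target):
            raise InvalidTransitionError(self.state, target)
        self.state = target

    def start_execution(self) -> None:
        self._move(TestState.IN_EXECUTION)

    def submit_red(self) -> None:
        self._move(TestState.RED_SUBMITTED)

    def submit_blue(self) -> None:
        self._move(TestState.BLUE_SUBMITTED)

    def validate(self) -> None:
        self._move(TestState.VALIDATED)

    def reopen(self) -> None:
        self._move(TestState.DRAFT)
```

Because nothing here touches a session or a request, a unit test can walk a test through its whole lifecycle in memory and assert that illegal jumps raise `InvalidTransitionError`.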
**What to extract:**

- `TestEntity` with `can_transition()`, `start_execution()`, `submit_red()`, `submit_blue()`, `validate()`, `reopen()` → `domain/entities/test.py`
- `InvalidTransitionError`, `EntityNotFoundError` → `domain/exceptions.py`
- `TestRepository` protocol → `domain/ports/repositories/test_repository.py`
- One use case per state transition → `application/use_cases/tests/`
- Remove all `HTTPException` from services
- Replace `except Exception: pass` with event-based notification dispatch

---

### #2: Scoring Module

**Refactor scope:** `services/scoring_service.py` (468 lines, 17 db ops) + `services/score_cache.py` + `routers/scores.py` (2 db ops) + `services/operational_metrics_service.py` (21 db ops)

| Axis | Score | Evidence |
|------|-------|----------|
| Complexity | **9/10** | Multi-dimensional scoring algorithm reading from 7 different models. 5 configurable weights. Tactic, actor, and org scores compound technique scores. Operational metrics add MTTD/MTTR calculations with audit log queries. |
| Technical Risk | **9/10** | **SR-001 from risk registry:** Org score generates ~3,500 DB queries (N+1 pattern). Settings mutated at runtime (thread-unsafe). In-memory cache does not scale across workers. Operational metrics N+1 on audit logs adds ~1,000 more queries. |
| Business Impact | **9/10** | Scores drive executive dashboards, compliance reports, and snapshot history. Incorrect scores misrepresent organizational security posture. Because scoring weights are mutable at runtime but never persisted, any weight change is lost on restart. |

**Why second:** Scoring is the second most critical domain concept and the most severe scalability bottleneck. Refactoring it introduces the repository pattern for batch queries and moves scoring weights to a persistent, immutable configuration.
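One way to picture the "immutable configuration" is a frozen dataclass that validates itself on construction. This is only a sketch: the five field names are hypothetical (the document states only that there are 5 weights summing to 100), and persistence is left out.

```python
from dataclasses import astuple, dataclass, replace


@dataclass(frozen=True)
class ScoringWeights:
    # Hypothetical field names; only "5 weights that sum to 100" comes from the document.
    test_coverage: int
    detection_coverage: int
    validation_recency: int
    rule_quality: int
    actor_relevance: int

    def __post_init__(self) -> None:
        values = astuple(self)
        if any(v < 0 for v in values):
            raise ValueError("weights must be non-negative")
        if sum(values) != 100:
            raise ValueError(f"weights must sum to 100, got {sum(values)}")


current = ScoringWeights(30, 25, 20, 15, 10)

# "Updating" builds a new validated instance instead of mutating shared settings.
updated = replace(current, test_coverage=40, actor_relevance=0)
```

Since `replace()` goes through `__init__`, the invariant is rechecked on every change, and the original instance stays untouched for any thread still reading it.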
**What to extract:**

- `TechniqueScore`, `TacticScore`, `OrgScore` value objects → `domain/value_objects/score.py`
- `ScoringWeights` value object with validation → `domain/value_objects/scoring_weights.py`
- Scoring algorithm as pure functions operating on domain objects → `application/use_cases/scoring/`
- Batch query methods in repositories → `TechniqueRepository.find_all_with_test_counts()`
- Redis-backed cache → `infrastructure/cache/`
- Persist weights in DB → `ScoringConfigRepository`

---

### #3: Heatmap Module

**Refactor scope:** `routers/heatmap.py` (528 lines, 13 db ops, 0 service delegation)

| Axis | Score | Evidence |
|------|-------|----------|
| Complexity | **9/10** | 528 lines in a single router file. Imports 10 models from 6 different domains. Mixes HTTP handling, complex multi-table queries, color mapping algorithms, ATT&CK Navigator JSON serialization, and streaming export — all in one file with zero delegation. |
| Technical Risk | **8/10** | **SR-003 from risk registry:** 1,400+ queries per request (2 per technique × 700). No caching. Full table scan. Every heatmap page load hammers the database. Most-visited view in the platform. |
| Business Impact | **8/10** | The ATT&CK heatmap is the primary visualization — it is the first thing executives see. Navigator export is used for external reporting and audit evidence. Incorrect heatmap data directly impacts security decision-making. |

**Why third:** This is the purest "fat controller" in the codebase — 528 lines of business logic, queries, and serialization with zero abstraction. It is also the most-visited page and the second-worst scalability bottleneck. Extracting it demonstrates the pattern for all other fat routers.

**What to extract:**

- Layer generation logic → `application/use_cases/heatmap/generate_coverage_layer.py` etc.
- Navigator export format → `application/use_cases/heatmap/export_navigator.py`
- Color mapping → `domain/value_objects/` or utility in application layer
- Batch metadata queries → `TechniqueRepository.find_all_with_coverage_metadata()`
- Router reduced from 528 lines to ~80 (5 endpoints × ~15 lines each)

---

### #4: Campaign Module

**Refactor scope:** `routers/campaigns.py` (36 db ops) + `services/campaign_service.py` (10 db ops, imports HTTPException) + `services/campaign_scheduler_service.py` (8 db ops)

| Axis | Score | Evidence |
|------|-------|----------|
| Complexity | **8/10** | Router has 36 db operations — the highest count of any router. Campaign lifecycle spans creation, test management, activation, completion, scheduling, and threat actor generation. Three files with partially overlapping responsibilities. |
| Technical Risk | **7/10** | `campaign_service.py` imports `HTTPException` (framework coupling). Scheduler creates campaigns in background jobs with its own session. Circular dependency detection logic is complex and untested (no campaign router tests exist). |
| Business Impact | **8/10** | Campaigns organize test execution for entire threat actor profiles. A bug in campaign scheduling or circular dependency detection could spawn infinite campaigns or skip critical test coverage. Campaigns drive the operational workflow for Red/Blue leads. |

**Why fourth:** The campaign module has the most scattered responsibilities (36 db ops in router + service + scheduler) and the second instance of HTTPException in a service. It is a natural candidate after tests, scoring, and heatmap because it depends on both test and technique entities, testing the cross-module communication pattern.
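To illustrate why moving cycle detection into a domain object makes it testable, here is a rough sketch of the check as a pure function. How Aegis actually models campaign dependencies is not stated in this document; the mapping from an ID to the IDs it depends on is an assumption, as is the function signature.

```python
from collections.abc import Iterable, Mapping


def has_circular_dependency(dependencies: Mapping[str, Iterable[str]]) -> bool:
    """Return True if the (assumed) dependency graph contains a cycle.

    Iterative depth-first search with three colors: unvisited, on the
    current path, and fully explored. Reaching a node that is still on
    the current path means a cycle exists.
    """
    unvisited, on_path, done = 0, 1, 2
    color: dict[str, int] = {}

    for start in dependencies:
        if color.get(start, unvisited) != unvisited:
            continue
        color[start] = on_path
        stack = [(start, iter(dependencies.get(start, ())))]
        while stack:
            node, children = stack[-1]
            for child in children:
                state = color.get(child, unvisited)
                if state == on_path:
                    return True
                if state == unvisited:
                    color[child] = on_path
                    stack.append((child, iter(dependencies.get(child, ()))))
                    break
            else:
                # Every child explored without finding a cycle through this node.
                color[node] = done
                stack.pop()
    return False
```

A `CampaignEntity.has_circular_dependency()` method could delegate to a function like this, which can then be covered by plain unit tests with no router or session involved.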
**What to extract:**

- `CampaignEntity` with `add_test()`, `activate()`, `complete()`, `has_circular_dependency()` → `domain/entities/campaign.py`
- `CampaignRepository` protocol → `domain/ports/repositories/`
- Use cases for lifecycle operations → `application/use_cases/campaigns/`
- Remove `HTTPException` from `campaign_service.py`
- Campaign scheduling as infrastructure concern → `infrastructure/jobs/`

---

### #5: Reports & Metrics Module

**Refactor scope:** `routers/reports.py` (273 lines, 6 db ops) + `routers/metrics.py` (316 lines, 12 db ops) + `routers/compliance.py` (~350 lines, 13 db ops)

| Axis | Score | Evidence |
|------|-------|----------|
| Complexity | **8/10** | Three routers totaling ~940 lines with zero service delegation. Complex aggregation queries, CSV generation, in-memory data transformation, and compliance gap analysis — all inline in route handlers. |
| Technical Risk | **7/10** | **SR-004 from risk registry:** Reports load unbounded result sets (all techniques, all tests). N+1 per-technique test counts in reports. In-memory aggregation instead of SQL GROUP BY. No streaming for CSV export. Compliance calls `calculate_technique_score()` per technique per control — multiplicative N+1. |
| Business Impact | **7/10** | Reports and metrics are consumed by leads and executives for decision-making. Compliance reports map to regulatory requirements (NIST 800-53, CIS Controls). Incorrect metrics erode trust in the platform. |

**Why fifth:** These three routers share the same anti-pattern (fat controller with inline queries and aggregations) and the same fix (extract to application-layer use cases with repository-backed batch queries). Refactoring them as a group establishes the pattern for the remaining 8 routers that still have direct DB access.
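The shape of that fix can be sketched as a use case that depends only on a narrow port and receives pre-aggregated counts. Everything below is illustrative: the port name, the return type of `get_coverage_summary()`, and the fields of `CoverageReportResult` are assumptions, and the in-memory class stands in for a SQL-backed repository.

```python
from dataclasses import dataclass
from typing import Protocol


class CoverageSummaryPort(Protocol):
    """Hypothetical port: one aggregated read instead of per-technique queries."""

    def get_coverage_summary(self) -> dict[str, int]:
        ...


@dataclass(frozen=True)
class CoverageReportResult:
    # Assumed DTO fields; plain data, no Pydantic or ORM types.
    total: int
    by_status: dict[str, int]
    validated_pct: float


class GenerateCoverageReportUseCase:
    def __init__(self, techniques: CoverageSummaryPort) -> None:
        self._techniques = techniques

    def execute(self) -> CoverageReportResult:
        counts = self._techniques.get_coverage_summary()
        total = sum(counts.values())
        validated = counts.get("validated", 0)
        pct = round(100 * validated / total, 1) if total else 0.0
        return CoverageReportResult(total=total, by_status=dict(counts), validated_pct=pct)


class InMemoryTechniques:
    """Test double that satisfies the port structurally."""

    def __init__(self, counts: dict[str, int]) -> None:
        self._counts = counts

    def get_coverage_summary(self) -> dict[str, int]:
        return dict(self._counts)


report = GenerateCoverageReportUseCase(
    InMemoryTechniques({"validated": 3, "not_tested": 1})
).execute()
```

A router would then only map `report` onto a response schema, while the SQLAlchemy repository could answer `get_coverage_summary()` with a single `GROUP BY` query.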
**What to extract:**

- Report generation → `application/use_cases/reports/`
- Metrics calculation → `application/use_cases/metrics/` (or merge with scoring)
- Compliance gap analysis → `application/use_cases/compliance/`
- SQL-level aggregation in repositories → `TechniqueRepository.get_coverage_summary()`
- CSV streaming as infrastructure concern → `infrastructure/export/csv_writer.py`

---

### Refactor Priority Summary

```
Module              Complexity  Risk  Impact  Weighted  Order
─────────────────────────────────────────────────────────────
Test Workflow           10       10     10      10.0     #1
Scoring                  9        9      9       9.0     #2
Heatmap                  9        8      8       8.4     #3
Campaigns                8        7      8       7.7     #4
Reports & Metrics        8        7      7       7.4     #5
```

---

## 6. Repository Pattern for Technique

This section designs a concrete repository pattern for `Technique` that can be introduced **without breaking existing code**. The strategy is additive: new code uses the repository, old code continues working until incrementally migrated.

### 6.1. Domain Port — The Interface

```python
# domain/ports/repositories/technique_repository.py

from __future__ import annotations

import uuid
from typing import Protocol, runtime_checkable

from app.domain.entities.technique import TechniqueEntity
from app.domain.enums import TechniqueStatus

# NOTE: TechniqueWithCounts, returned by find_all_with_test_counts(), is shown
# next to the SQLAlchemy implementation in 6.2. For this annotation to resolve
# without violating the import rules, it would need to be defined in domain/.


@runtime_checkable
class TechniqueRepository(Protocol):
    """Port defining how the application accesses technique data.

    This is a domain contract — implementations live in infrastructure/.
    The domain layer NEVER imports the implementation.
    """

    # ── Single-entity access ─────────────────────────────────────

    def find_by_id(self, technique_id: uuid.UUID) -> TechniqueEntity | None:
        """Return a technique by primary key, or None."""
        ...

    def find_by_mitre_id(self, mitre_id: str) -> TechniqueEntity | None:
        """Return a technique by its MITRE ATT&CK identifier (e.g. 'T1059.001')."""
        ...

    def find_by_mitre_id_with_tests(self, mitre_id: str) -> TechniqueEntity | None:
        """Return a technique with its tests eagerly loaded."""
        ...

    # ── List access ──────────────────────────────────────────────

    def list_all(
        self,
        *,
        tactic: str | None = None,
        status: TechniqueStatus | None = None,
        review_required: bool | None = None,
    ) -> list[TechniqueEntity]:
        """Return techniques matching the given filters, ordered by mitre_id."""
        ...

    def list_by_tactic(self, tactic: str) -> list[TechniqueEntity]:
        """Return all techniques for a given tactic."""
        ...

    def list_by_ids(self, ids: list[uuid.UUID]) -> list[TechniqueEntity]:
        """Return techniques matching a list of primary keys."""
        ...

    # ── Batch queries (for scoring/heatmap performance) ──────────

    def count_by_status(self) -> dict[TechniqueStatus, int]:
        """Return technique counts grouped by status_global.

        Single SQL query — replaces the per-technique counting pattern.
        """
        ...

    def find_all_with_test_counts(self) -> list[TechniqueWithCounts]:
        """Return all techniques with pre-aggregated test counts and detection rule counts.

        Single query with subqueries — eliminates the N+1 pattern in heatmap and scoring.
        """
        ...

    # ── Mutations ────────────────────────────────────────────────

    def save(self, technique: TechniqueEntity) -> TechniqueEntity:
        """Persist a new or updated technique. Returns the saved entity."""
        ...

    def exists_by_mitre_id(self, mitre_id: str) -> bool:
        """Check existence without loading the full entity."""
        ...
```

**Key design decisions:**

- Uses `typing.Protocol` (structural subtyping) rather than `ABC`, so the implementation does not need to inherit explicitly. This is idiomatic Python and works with `isinstance()` checks via `@runtime_checkable`; note that such checks only verify that the methods exist, not that their signatures match.
- Methods return domain entities (`TechniqueEntity`), never ORM models.
- Batch methods (`count_by_status`, `find_all_with_test_counts`) are designed to eliminate the N+1 patterns identified in SR-001 and SR-003.
- No `Session` parameter: the session is an implementation detail of the SQLAlchemy repository.

### 6.2. 
Infrastructure Implementation — SQLAlchemy

```python
# infrastructure/persistence/repositories/sa_technique_repository.py

import uuid
from typing import NamedTuple

from sqlalchemy import func
from sqlalchemy.orm import Session, joinedload

from app.domain.enums import TechniqueStatus
from app.domain.entities.technique import TechniqueEntity
from app.domain.ports.repositories.technique_repository import TechniqueRepository
from app.infrastructure.persistence.orm.technique_model import Technique
from app.infrastructure.persistence.orm.test_model import Test
from app.infrastructure.persistence.orm.detection_rule_model import DetectionRule
from app.infrastructure.persistence.mappers.technique_mapper import TechniqueMapper


class TechniqueWithCounts(NamedTuple):
    """Pre-aggregated technique data for heatmap/scoring."""

    entity: TechniqueEntity
    test_count: int
    validated_test_count: int
    detection_rule_count: int


class SATechniqueRepository:
    """SQLAlchemy implementation of TechniqueRepository.

    Receives a Session from the Unit of Work — does NOT create its own.
    Does NOT call commit() — that is the Unit of Work's responsibility.
    """

    def __init__(self, session: Session) -> None:
        self._session = session

    # ── Single-entity access ─────────────────────────────────────

    def find_by_id(self, technique_id: uuid.UUID) -> TechniqueEntity | None:
        model = self._session.query(Technique).filter(
            Technique.id == technique_id
        ).first()
        return TechniqueMapper.to_entity(model) if model else None

    def find_by_mitre_id(self, mitre_id: str) -> TechniqueEntity | None:
        model = self._session.query(Technique).filter(
            Technique.mitre_id == mitre_id
        ).first()
        return TechniqueMapper.to_entity(model) if model else None

    def find_by_mitre_id_with_tests(self, mitre_id: str) -> TechniqueEntity | None:
        model = (
            self._session.query(Technique)
            .options(joinedload(Technique.tests))
            .filter(Technique.mitre_id == mitre_id)
            .first()
        )
        return TechniqueMapper.to_entity_with_tests(model) if model else None

    # ── List access ──────────────────────────────────────────────

    def list_all(
        self,
        *,
        tactic: str | None = None,
        status: TechniqueStatus | None = None,
        review_required: bool | None = None,
    ) -> list[TechniqueEntity]:
        query = self._session.query(Technique)
        if tactic is not None:
            query = query.filter(Technique.tactic == tactic)
        if status is not None:
            query = query.filter(Technique.status_global == status)
        if review_required is not None:
            query = query.filter(Technique.review_required == review_required)
        models = query.order_by(Technique.mitre_id).all()
        return [TechniqueMapper.to_entity(m) for m in models]

    def list_by_tactic(self, tactic: str) -> list[TechniqueEntity]:
        models = (
            self._session.query(Technique)
            .filter(Technique.tactic == tactic)
            .order_by(Technique.mitre_id)
            .all()
        )
        return [TechniqueMapper.to_entity(m) for m in models]

    def list_by_ids(self, ids: list[uuid.UUID]) -> list[TechniqueEntity]:
        models = (
            self._session.query(Technique)
            .filter(Technique.id.in_(ids))
            .all()
        )
        return [TechniqueMapper.to_entity(m) for m in models]

    # ── Batch queries ────────────────────────────────────────────

    def count_by_status(self) -> dict[TechniqueStatus, int]:
        rows = (
            self._session.query(
                Technique.status_global,
                func.count(Technique.id),
            )
            .group_by(Technique.status_global)
            .all()
        )
        result = {s: 0 for s in TechniqueStatus}
        for status_val, count in rows:
            result[status_val] = count
        return result

    def find_all_with_test_counts(self) -> list[TechniqueWithCounts]:
        """Single query that replaces the N+1 pattern.

        Instead of: for each technique → query tests → query rules
        This does: one query with subqueries for counts.
        """
        test_count_sq = (
            self._session.query(
                Test.technique_id,
                func.count(Test.id).label("test_count"),
                func.count(Test.id).filter(Test.state == "validated").label("validated_count"),
            )
            .group_by(Test.technique_id)
            .subquery()
        )
        rule_count_sq = (
            self._session.query(
                DetectionRule.mitre_technique_id,
                func.count(DetectionRule.id).label("rule_count"),
            )
            .group_by(DetectionRule.mitre_technique_id)
            .subquery()
        )
        rows = (
            self._session.query(
                Technique,
                func.coalesce(test_count_sq.c.test_count, 0),
                func.coalesce(test_count_sq.c.validated_count, 0),
                func.coalesce(rule_count_sq.c.rule_count, 0),
            )
            .outerjoin(test_count_sq, Technique.id == test_count_sq.c.technique_id)
            .outerjoin(rule_count_sq, Technique.mitre_id == rule_count_sq.c.mitre_technique_id)
            .order_by(Technique.mitre_id)
            .all()
        )
        return [
            TechniqueWithCounts(
                entity=TechniqueMapper.to_entity(tech),
                test_count=tc,
                validated_test_count=vtc,
                detection_rule_count=rc,
            )
            for tech, tc, vtc, rc in rows
        ]

    # ── Mutations ────────────────────────────────────────────────

    def save(self, technique: TechniqueEntity) -> TechniqueEntity:
        model = TechniqueMapper.to_model(technique)
        merged = self._session.merge(model)
        self._session.flush()  # flush to get generated values, but do NOT commit
        return TechniqueMapper.to_entity(merged)

    def exists_by_mitre_id(self, mitre_id: str) -> bool:
        return (
            self._session.query(Technique.id)
            .filter(Technique.mitre_id == mitre_id)
            .first()
        ) is not None
```

**Key design decisions:**

- **No `commit()`**: The repository
flushes but never commits. Transaction control belongs to the Unit of Work, which the use case manages.
- **Returns domain entities**: The mapper converts ORM models to domain entities at the repository boundary. No ORM model ever crosses into the application or domain layers.
- **Batch method**: `find_all_with_test_counts()` replaces the N+1 pattern with subqueries — reducing 1,400+ queries to 1 for the heatmap.

### 6.3. Injection into a Use Case

```python
# presentation/dependencies/repositories.py

from fastapi import Depends
from sqlalchemy.orm import Session

from app.domain.ports.repositories.technique_repository import TechniqueRepository
from app.infrastructure.persistence.database import get_db
from app.infrastructure.persistence.repositories.sa_technique_repository import (
    SATechniqueRepository,
)


def get_technique_repository(
    db: Session = Depends(get_db),
) -> TechniqueRepository:
    """FastAPI dependency that provides a TechniqueRepository.

    Wiring lives ONLY in the presentation layer — the use case never knows
    it's getting a SQLAlchemy implementation.
    """
    return SATechniqueRepository(db)
```

```python
# presentation/dependencies/use_cases.py

from fastapi import Depends

from app.application.use_cases.techniques.create_technique import CreateTechniqueUseCase
from app.domain.ports.repositories.technique_repository import TechniqueRepository
from app.presentation.dependencies.repositories import get_technique_repository


def get_create_technique_use_case(
    technique_repo: TechniqueRepository = Depends(get_technique_repository),
) -> CreateTechniqueUseCase:
    return CreateTechniqueUseCase(technique_repo=technique_repo)
```

```python
# application/use_cases/techniques/create_technique.py

import uuid

from app.domain.entities.technique import TechniqueEntity
from app.domain.exceptions import DuplicateEntityError
from app.domain.ports.repositories.technique_repository import TechniqueRepository
from app.application.dto.technique_dto import CreateTechniqueInput, TechniqueResult


class CreateTechniqueUseCase:
    """Application use case: create a new MITRE ATT&CK technique.

    This class knows NOTHING about:
    - FastAPI, HTTP, Pydantic
    - SQLAlchemy, PostgreSQL
    - How the repository is implemented
    """

    def __init__(self, technique_repo: TechniqueRepository) -> None:
        self._repo = technique_repo

    def execute(self, input: CreateTechniqueInput, user_id: uuid.UUID) -> TechniqueResult:
        # Business rule: mitre_id must be unique
        if self._repo.exists_by_mitre_id(input.mitre_id):
            raise DuplicateEntityError("Technique", "mitre_id", input.mitre_id)

        # Create domain entity
        technique = TechniqueEntity.create(
            mitre_id=input.mitre_id,
            name=input.name,
            description=input.description,
            tactic=input.tactic,
            platforms=input.platforms,
        )

        # Persist through repository
        saved = self._repo.save(technique)

        # Return application DTO
        return TechniqueResult.from_entity(saved)
```

```python
# presentation/api/v1/techniques.py (refactored — thin router)

from fastapi import APIRouter, Depends, status

# CreateTechniqueInput is built in the handler below, so it must be imported here.
from app.application.dto.technique_dto import CreateTechniqueInput
from app.application.use_cases.techniques.create_technique import CreateTechniqueUseCase
from app.domain.exceptions import DuplicateEntityError, EntityNotFoundError
from app.presentation.dependencies.auth import get_current_user, require_role
from app.presentation.dependencies.use_cases import get_create_technique_use_case
from app.presentation.schemas.technique_schema import TechniqueCreate, TechniqueOut

router = APIRouter(prefix="/techniques", tags=["techniques"])


@router.post("", response_model=TechniqueOut, status_code=status.HTTP_201_CREATED)
def create_technique(
    payload: TechniqueCreate,
    use_case: CreateTechniqueUseCase = Depends(get_create_technique_use_case),
    current_user = Depends(require_role("admin")),
):
    """Create a new technique.

    This router:
    - Receives the HTTP request (Pydantic validates it)
    - Calls the use case
    - The error handler middleware maps domain exceptions to HTTP responses
    - Returns the result

    Total: 5 lines of actual logic.
    """
    result = use_case.execute(
        input=CreateTechniqueInput(
            mitre_id=payload.mitre_id,
            name=payload.name,
            description=payload.description,
            tactic=payload.tactic,
            platforms=payload.platforms,
        ),
        user_id=current_user.id,
    )
    return result
```

### 6.4. Coexistence Strategy — No Big Bang

The repository can be introduced **alongside existing code** without breaking anything:

```
Phase 1: Create the repository interface and SQLAlchemy implementation.
         Both old (direct db.query) and new (repository) code coexist.
         New endpoints use the repository. Old endpoints are unchanged.

Phase 2: Migrate routers one endpoint at a time.
         Replace db.query(Technique).filter(...) with repo.find_by_mitre_id().
         Each migration is a small, reviewable PR.

Phase 3: When all consumers are migrated, the ORM model is no longer
         imported outside infrastructure/. Enforce via linting rule.
```

At no point does existing functionality break. Both patterns access the same database, the same tables, the same session. The repository is an additive abstraction — it wraps what already exists.
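Until a dedicated linter configuration exists, the Phase 3 rule could be approximated with a small standard-library check. The sketch below is an illustration under stated assumptions, not the project's actual rule: the names `find_orm_imports` and `check_tree` are hypothetical, and the forbidden prefix is inferred from the ORM import paths shown in section 6.2.

```python
import ast
from pathlib import Path

# Assumption: ORM models live under this package (taken from the imports in 6.2).
FORBIDDEN_PREFIX = "app.infrastructure.persistence.orm"


def _is_forbidden(module: str) -> bool:
    """True if `module` is the ORM package itself or one of its submodules."""
    return module == FORBIDDEN_PREFIX or module.startswith(FORBIDDEN_PREFIX + ".")


def find_orm_imports(source: str) -> list[tuple[int, str]]:
    """Return (line_number, module) for each absolute import of an ORM module."""
    violations: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Also catch "from app.infrastructure.persistence import orm".
            candidates = [node.module] + [
                f"{node.module}.{alias.name}" for alias in node.names
            ]
        else:
            continue  # not an import, or a relative import (not handled here)
        hits = [c for c in candidates if _is_forbidden(c)]
        if hits:
            violations.append((node.lineno, hits[0]))
    return sorted(violations)


def check_tree(app_root: Path) -> list[str]:
    """Scan every .py file under `app_root`, skipping the infrastructure/ package."""
    errors: list[str] = []
    for path in sorted(app_root.rglob("*.py")):
        relative = path.relative_to(app_root)
        if relative.parts[0] == "infrastructure":
            continue  # ORM imports are allowed inside infrastructure/
        source = path.read_text(encoding="utf-8")
        for lineno, module in find_orm_imports(source):
            errors.append(f"{relative.as_posix()}:{lineno}: imports {module}")
    return errors
```

In CI this could run as an ordinary test asserting that `check_tree(Path("backend/app"))` returns an empty list. During Phase 2 the returned list doubles as a backlog of endpoints still to migrate, so the check can be introduced early in report-only mode and made blocking once the list is empty.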