Aegis — Backend Internal Dependency Analysis

Author: Architecture review
Date: February 11, 2026 (updated February 18, 2026)
Scope: All 21 routers and 20 services in backend/app/

Note: This analysis describes the original state. Since then, a Clean Architecture refactor has begun. See ARCHITECTURAL_ANALYSIS.md for current status. Key changes: domain exceptions replace HTTPException in services, repository ports and implementations exist for Test and Technique, domain entities with business logic exist for Test and Technique, Unit of Work pattern is available, CI pipeline is active.


Table of Contents

  1. Do Routers Import SQLAlchemy Models Directly?
  2. Do Services Access the Database Directly?
  3. Do Services Contain Business Logic or Just CRUD?
  4. Is Business Logic Separated from Persistence?
  5. Is Infrastructure Decoupled from Logic?
  6. What Architecture Is Actually Implemented?

1. Do Routers Import SQLAlchemy Models Directly?

Yes. Every single router imports at least one SQLAlchemy model. 19 of 21 routers execute raw database operations inline.

Complete Router-to-Model Import Map

| Router | Models Imported Directly | DB Operations in Router |
| --- | --- | --- |
| audit.py | AuditLog, User | 3 |
| auth.py | User | 1 |
| campaigns.py | User, Campaign, CampaignTest, Test, Technique, ThreatActor | 36 |
| compliance.py | User, ComplianceFramework, ComplianceControl, ComplianceControlMapping, Technique, TestTemplate, ThreatActorTechnique | 13 |
| d3fend.py | User, Technique, DefensiveTechnique, DefensiveTechniqueMapping | 3 |
| data_sources.py | User, DataSource | 14 |
| detection_rules.py | User, DetectionRule, TestTemplate, TestTemplateDetectionRule, TestDetectionResult | 21 |
| evidence.py | Evidence, Test, User, enums | 11 |
| heatmap.py | User, Technique, Test, ThreatActor, ThreatActorTechnique, DetectionRule, Campaign, CampaignTest, DefensiveTechniqueMapping, enums | 13 |
| metrics.py | Technique, Test, User, enums | 12 |
| notifications.py | Notification, User | 2 |
| operational_metrics.py | User | 0 (delegates) |
| reports.py | Technique, Test, User, enums | 6 |
| scores.py | User, Technique, ThreatActor | 2 |
| snapshots.py | User, CoverageSnapshot, SnapshotTechniqueState | 6 |
| system.py | User | 0 (delegates) |
| techniques.py | Technique, User, enums | 12 |
| test_templates.py | TestTemplate, User | 20 |
| tests.py | AuditLog, Technique, Test, TestTemplate, User, enums | 30 |
| threat_actors.py | User, ThreatActor, ThreatActorTechnique, Technique, Test, TestTemplate, enums | 11 |
| users.py | User | 9 |

Key Numbers

  • 21 / 21 routers import at least one SQLAlchemy model.
  • 19 / 21 routers execute db.query(), db.add(), db.commit(), or db.delete() directly (only operational_metrics.py and system.py fully delegate).
  • Total DB operations across all routers: 225 (db.query, db.add, db.commit, db.delete, db.refresh calls).
  • All 21 routers import Session from SQLAlchemy.
  • 7 routers import func (aggregations).
  • 7 routers import joinedload (eager loading).
  • 2 routers import or_ (compound filters).

What This Means

Routers are tightly coupled to the ORM. They know:

  • Table structure (column names, relationships)
  • Query syntax (filter, join, group_by, order_by)
  • Transaction management (commit, refresh, add)
  • Eager loading strategy (joinedload, selectinload)

There is no abstraction layer between routers and the database. The Technique model is imported by 10 routers, so renaming one of its columns would require modifying at least 8 of them.


2. Do Services Access the Database Directly?

Yes. All 19 services that handle data (all except score_cache.py) receive a SQLAlchemy Session as a parameter and execute queries directly.

Complete Service-to-Database Map

| Service | Models Used | DB Operations | Receives Session | Imports app.database |
| --- | --- | --- | --- | --- |
| atomic_import_service | TestTemplate | 3 | Yes | No |
| audit_service | AuditLog | 2 | Yes | No |
| caldera_import_service | TestTemplate, DataSource | 5 | Yes | No |
| campaign_scheduler_service | Campaign, CampaignTest, Test, User | 8 | Yes | No |
| campaign_service | Campaign, CampaignTest, Test, TestTemplate, Technique, ThreatActor, ThreatActorTechnique, User | 10 | Yes | No |
| compliance_import_service | ComplianceFramework, ComplianceControl, ComplianceControlMapping, Technique | 22 | Yes | No |
| d3fend_import_service | Technique, DefensiveTechnique, DefensiveTechniqueMapping | 13 | Yes | No |
| elastic_import_service | DetectionRule, DataSource | 5 | Yes | No |
| intel_service | IntelItem, Technique | 4 | Yes | No |
| lolbas_import_service | TestTemplate, DataSource | 7 | Yes | No |
| mitre_sync_service | Technique, enums | 3 | Yes | No |
| notification_service | Notification, User | 12 | Yes | No |
| operational_metrics_service | Test, Technique, TestDetectionResult, AuditLog, enums | 21 | Yes | No |
| score_cache | (none) | 0 | No | No |
| scoring_service | Technique, Test, DetectionRule, TestDetectionResult, DefensiveTechniqueMapping, ThreatActor, ThreatActorTechnique | 17 | Yes | No |
| sigma_import_service | DetectionRule, DataSource | 5 | Yes | No |
| snapshot_service | Technique, CoverageSnapshot, SnapshotTechniqueState | 13 | Yes | No |
| status_service | Technique, enums | 1 | Yes | No |
| test_workflow_service | Test, User, enums | 13 | Yes | No |
| threat_actor_import_service | ThreatActor, ThreatActorTechnique, Technique, DataSource | 8 | Yes | No |

Key Numbers

  • Total DB operations across all services: 172 (db.query, db.add, db.commit, etc.).
  • 19 / 20 services receive Session as a function parameter.
  • 0 / 20 services import app.database directly — sessions are always injected by callers (routers or background jobs).
  • All 19 data-handling services import SQLAlchemy symbols (Session, func, case, etc.).

Positive Pattern: Session Injection

Services do follow one good practice: none of them create their own database sessions. Sessions are always passed in as arguments:

# All services use this pattern:
def calculate_technique_score(technique: Technique, db: Session) -> dict:
    all_tests = db.query(Test).filter(Test.technique_id == technique.id).all()

This makes the services testable, because a caller can pass a mock or a test session. However, the services still know the full ORM API: they construct queries, call commit(), and manage eager loading.
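As a rough illustration of what session injection buys in a unit test, the sketch below passes a `unittest.mock.MagicMock` where a real `Session` would go. The function `count_tests` and the class `FakeTestRow` are hypothetical stand-ins, not code from Aegis.

```python
from unittest.mock import MagicMock


class FakeTestRow:
    """Hypothetical stand-in for the ORM Test model."""


def count_tests(technique_id: int, db) -> int:
    """Service-style function: the session is injected, never created here."""
    return len(db.query(FakeTestRow).filter_by(technique_id=technique_id).all())


# A mock replaces the real Session; it has to mirror the ORM call chain.
fake_db = MagicMock()
fake_db.query.return_value.filter_by.return_value.all.return_value = [
    FakeTestRow(),
    FakeTestRow(),
]

test_count = count_tests(42, fake_db)
```

Note that the mock setup has to reproduce the `query().filter_by().all()` chain exactly. That is the cost of services knowing the ORM API: the test is coupled to how the query is built, not only to what data comes back.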


3. Do Services Contain Business Logic or Just CRUD?

Mixed. Services fall into three distinct categories.

Category A: Rich Business Logic (5 services)

These services contain genuine domain logic — rules, calculations, state machines, and business decisions:

| Service | Logic Type | Complexity |
| --- | --- | --- |
| test_workflow_service | State machine with valid transition map, role-based guards, multi-step validation, retest chain management | High: 456 lines, 10+ public functions, embeds the test lifecycle rules |
| scoring_service | Multi-dimensional scoring algorithm with configurable weights, breakdown calculations, decay functions | High: 468 lines, complex math combining 5 weighted factors |
| campaign_service | Circular dependency detection, campaign progress calculation, auto-generation from threat actors | Medium: business rules for campaign management |
| campaign_scheduler_service | Recurring campaign scheduling, next-run calculation, campaign cloning | Medium: temporal business logic |
| operational_metrics_service | MTTD/MTTR calculation, detection efficacy, trend analysis with time windows | Medium: analytical business logic |
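To make "state machine with valid transition map" concrete, here is a minimal sketch of that technique. It is not the Aegis implementation: the enum `LifecycleState` stands in for the real `TestState`, and every state except `validated`, as well as the transition rules themselves, is assumed for illustration.

```python
from enum import Enum


class LifecycleState(str, Enum):
    # Only "validated" appears in this analysis; the other states are assumed.
    draft = "draft"
    in_review = "in_review"
    validated = "validated"
    rejected = "rejected"


# Allowed transitions keyed by current state (hypothetical rules).
VALID_TRANSITIONS: dict[LifecycleState, set[LifecycleState]] = {
    LifecycleState.draft: {LifecycleState.in_review},
    LifecycleState.in_review: {LifecycleState.validated, LifecycleState.rejected},
    LifecycleState.rejected: {LifecycleState.draft},
    LifecycleState.validated: set(),
}


class InvalidTransitionError(Exception):
    """Raised when a requested state change is not in the transition map."""


def transition(current: LifecycleState, target: LifecycleState) -> LifecycleState:
    """Return the target state if the move is allowed, otherwise raise."""
    if target not in VALID_TRANSITIONS[current]:
        raise InvalidTransitionError(f"{current.value} -> {target.value} is not allowed")
    return target
```

Keeping the map as plain data means the lifecycle rules can be tested without a database or an HTTP request.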

Category B: External Data Import (8 services)

These services handle fetching, parsing, and upserting data from external sources. They are more "integration logic" than "business logic":

| Service | External Source | Logic |
| --- | --- | --- |
| mitre_sync_service | MITRE TAXII + GitHub | STIX 2.0 parsing, technique upsert |
| atomic_import_service | GitHub (ZIP) | YAML parsing, template creation |
| sigma_import_service | GitHub (ZIP) | YAML + ATT&CK tag extraction |
| elastic_import_service | GitHub (ZIP) | TOML parsing, rule creation |
| caldera_import_service | GitHub (ZIP) | YAML parsing, ability import |
| d3fend_import_service | D3FEND REST API | JSON parsing, mapping creation |
| lolbas_import_service | GitHub (ZIP) | YAML/Markdown parsing |
| threat_actor_import_service | GitHub (ZIP) | STIX 2.0 bundle parsing |

Category C: Thin CRUD Wrappers (7 services)

These services are essentially database operations with minimal logic:

| Service | What It Does | Lines of Logic |
| --- | --- | --- |
| audit_service | log_action(): creates an AuditLog row | ~10 lines |
| notification_service | CRUD for notifications + notify_test_state_change() | ~30 lines of logic, rest is DB access |
| status_service | recalculate_technique_status(): counts tests by state, sets status | ~20 lines |
| snapshot_service | Creates snapshots by looping over techniques and calling scoring_service | Orchestration + DB writes |
| score_cache | In-memory dict with TTL | ~30 lines, pure caching |
| compliance_import_service | Parses NIST/CIS data and creates DB rows | Parsing + bulk insert |
| intel_service | Fetches RSS/feeds and creates IntelItem rows | Fetch + parse + insert |
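For readers unfamiliar with the "in-memory dict with TTL" idea behind score_cache, the following is a minimal sketch of that technique. The class name `TTLCache`, its methods, and the injectable `clock` parameter are assumptions made for illustration and may differ from the real module.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Dict-backed cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests do not need to sleep
        self._entries: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        """Store value together with its expiry timestamp."""
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None if missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # evict lazily on read
            return None
        return value
```

Because it holds no Session and no models, a component like this is the one service that needs no database to test.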

The Missing Logic

Significant business logic that should be in services but lives in routers instead:

| Logic | Current Location | Should Be |
| --- | --- | --- |
| ATT&CK Navigator layer generation | heatmap.py router (528 lines) | heatmap_service or use case |
| Coverage report building | reports.py router (273 lines) | report_service or use case |
| Coverage metrics aggregation | metrics.py router (316 lines) | metrics_service |
| Detection rule CRUD + auto-association | detection_rules.py router (21 DB ops) | detection_rule_service |
| Technique CRUD + review workflow | techniques.py router (12 DB ops) | technique_service |
| Campaign full lifecycle | campaigns.py router (36 DB ops) | Partially in campaign_service, but router does most CRUD |

4. Is Business Logic Separated from Persistence?

No. There is no separation boundary between business logic and persistence anywhere in the codebase.

The Dependency Graph

┌─────────────────────────────────────────────────────────────────────┐
│ PRESENTATION LAYER (Routers)                                        │
│                                                                     │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐        │
│  │techniques│   │ heatmap  │   │ reports  │   │ campaigns│  ...    │
│  │ 12 db.q  │   │ 13 db.q  │   │ 6 db.q   │   │ 36 db.q  │        │
│  └────┬─────┘   └────┬─────┘   └────┬─────┘   └────┬─────┘        │
│       │ direct        │ direct        │ direct       │ direct       │
│       │               │               │              │ + service    │
├───────┼───────────────┼───────────────┼──────────────┼──────────────┤
│ SERVICE LAYER (Partial)                                              │
│                                                                     │
│  ┌──────────────┐   ┌───────────┐   ┌──────────────────────────┐   │
│  │test_workflow  │   │  scoring  │   │     8 import services    │   │
│  │ 13 db.q      │   │ 17 db.q   │   │    3-22 db.q each       │   │
│  │ HTTPException │   │ settings  │   │    + HTTP requests       │   │
│  └──────┬───────┘   └─────┬─────┘   └────────────┬─────────────┘   │
│         │                  │                       │                 │
├─────────┼──────────────────┼───────────────────────┼─────────────────┤
│ PERSISTENCE LAYER (SQLAlchemy — no abstraction)                      │
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────────┐ │
│  │  db.query(Model).filter(...).all()  ← called from EVERYWHERE   │ │
│  │  db.add(instance)                                               │ │
│  │  db.commit()                                                    │ │
│  │  db.refresh(instance)                                           │ │
│  └─────────────────────────────────────────────────────────────────┘ │
│                                                                     │
│  Total: 225 db operations in routers + 172 in services = 397 total  │
│  Spread across: 19 routers + 19 services = 38 files                 │
└─────────────────────────────────────────────────────────────────────┘

Why There Is No Separation

  1. No Repository Pattern. There are no repository classes or functions that encapsulate database access. Every file that needs data constructs its own query.

  2. No Domain Entity Layer. The SQLAlchemy models serve dual duty as both persistence mapping AND domain objects. There is no separate domain entity with business methods — the same Test class that defines the database table is passed around as the business object.

  3. No Abstraction Boundary. There is no interface (Protocol/ABC) anywhere in the codebase that separates "what data I need" from "how to get it from the database."

  4. Services Commit Transactions. Some services call db.commit() internally, while their calling routers may also call db.commit(). There is no Unit of Work pattern governing transaction boundaries.
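The missing boundary described in points 1 to 3 could look roughly like the sketch below: a `Protocol` acting as the repository port, a plain domain object, and an in-memory adapter for tests. All names here (`TechniqueRecord`, `TechniqueRepository`, `InMemoryTechniqueRepository`, `rename_technique`) are hypothetical and are not the repository ports that the later refactor introduced.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class TechniqueRecord:
    """Plain domain object, independent of any ORM (fields are assumed)."""
    id: int
    name: str


class TechniqueRepository(Protocol):
    """Port: states what data is needed, not how it is fetched."""

    def get(self, technique_id: int) -> Optional[TechniqueRecord]: ...

    def add(self, technique: TechniqueRecord) -> None: ...


class InMemoryTechniqueRepository:
    """Adapter for tests; a SQLAlchemy-backed adapter would expose the same methods."""

    def __init__(self) -> None:
        self._rows: dict[int, TechniqueRecord] = {}

    def get(self, technique_id: int) -> Optional[TechniqueRecord]:
        return self._rows.get(technique_id)

    def add(self, technique: TechniqueRecord) -> None:
        self._rows[technique.id] = technique


def rename_technique(repo: TechniqueRepository, technique_id: int, new_name: str) -> TechniqueRecord:
    """Business operation that depends only on the port."""
    technique = repo.get(technique_id)
    if technique is None:
        raise LookupError(f"technique {technique_id} not found")
    technique.name = new_name
    return technique
```

With this shape, callers never build queries themselves, so a schema change is absorbed by the adapter instead of by every router and service.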

Concrete Example: Scoring a Technique

The scoring_service.calculate_technique_score() function interleaves business logic and persistence line by line:

# Business logic (what to calculate) and persistence (how to get data)
# are interleaved — inseparable:

all_tests = db.query(Test).filter(Test.technique_id == technique.id).all()  # ← persistence
validated_tests = [t for t in all_tests if t.state == TestState.validated]    # ← logic
detected_tests = [t for t in validated_tests if t.detection_result == TestResult.detected]  # ← logic
test_ratio = len(detected_tests) / len(validated_tests)                      # ← logic
test_score = round(test_ratio * w_tests, 1)                                  # ← logic

rule_count = db.query(func.count(DetectionRule.id))...scalar() or 0          # ← persistence
rule_score = min(rule_count / 3.0, 1.0) * w_detection                        # ← logic

To test the scoring algorithm in isolation (without a database), every query would first have to be moved behind a repository that can be mocked or replaced with an in-memory fake.
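One possible shape for that separation is sketched below: the arithmetic from the excerpt is moved into a pure function that takes counts instead of a Session. The function name `score_from_counts`, the zero-division guard, and the weight values used in the usage lines are assumptions. Only two of the five weighted factors are shown, because only those two appear in the excerpt.

```python
def score_from_counts(
    validated: int,
    detected: int,
    rule_count: int,
    w_tests: float,
    w_detection: float,
) -> dict:
    """Pure scoring step: no Session, no models, only numbers in and out."""
    # Guard against techniques with no validated tests (assumed behaviour).
    test_ratio = detected / validated if validated else 0.0
    test_score = round(test_ratio * w_tests, 1)
    # Three or more rules earn the full detection weight, as in the excerpt.
    rule_score = min(rule_count / 3.0, 1.0) * w_detection
    return {"test_score": test_score, "rule_score": rule_score}


# Usage with illustrative weights; the caller would load the counts via a repository.
example = score_from_counts(validated=4, detected=3, rule_count=6, w_tests=40.0, w_detection=20.0)
```

The persistence code then shrinks to fetching three integers, and the algorithm can be covered by fast unit tests.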


5. Is Infrastructure Decoupled from Logic?

No. Infrastructure concerns are embedded directly in both routers and services.

Infrastructure Dependency Map

| Infrastructure | Where It Bleeds Into Logic | Impact |
| --- | --- | --- |
| SQLAlchemy ORM | 19 routers (225 ops) + 19 services (172 ops) = 38 files, 397 operations | Cannot switch ORM or use raw SQL without rewriting 38 files |
| FastAPI HTTPException | test_workflow_service.py, campaign_service.py (2 services) | Business logic raises HTTP-specific exceptions, so it cannot be reused from CLI, workers, or pure tests |
| MinIO (boto3) | storage.py (well isolated) → called from evidence.py router | Storage itself is clean, but the router handles presigned URL generation |
| APScheduler | mitre_sync_job.py → creates SessionLocal() directly → calls services | Jobs bypass the DI system and create their own sessions |
| app.config.settings | scoring_service.py (reads weights), test_workflow_service.py (reads MAX_RETEST_COUNT), auth.py router (reads SECRET_KEY), scores.py router (mutates weights) | Global mutable singleton accessed from multiple layers |
| External HTTP (requests/httpx) | 8 import services make outbound HTTP calls | Tightly coupled: import logic cannot be tested without network access or mocking requests |
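The HTTPException row can be addressed with domain exceptions that are translated to HTTP only at the edge. The sketch below shows the general idea. The class names, the status mapping, the state strings, and the `approve_test` function are all hypothetical, not the exceptions the refactor actually added.

```python
class DomainError(Exception):
    """Base class for business-rule failures; knows nothing about HTTP."""


class NotFoundError(DomainError):
    """The requested entity does not exist."""


class InvalidStateError(DomainError):
    """The operation is not allowed in the entity's current state."""


# Translation table that would live in the web layer only (assumed mapping).
STATUS_BY_ERROR: dict[type, int] = {NotFoundError: 404, InvalidStateError: 409}


def to_http_status(error: DomainError) -> int:
    """Map a domain error to a status code; unknown domain errors become 400."""
    for error_type, status in STATUS_BY_ERROR.items():
        if isinstance(error, error_type):
            return status
    return 400


def approve_test(state: str) -> str:
    """Service-level rule, reusable from a CLI or a worker (states are assumed)."""
    if state != "in_review":
        raise InvalidStateError(f"cannot approve a test in state {state!r}")
    return "validated"
```

In a FastAPI application the translation would typically be registered once as an exception handler, so services never import anything from the web framework.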

What Is Well Isolated

| Component | Isolation Quality |
| --- | --- |
| storage.py (MinIO) | Good: thin wrapper with 3 functions (ensure_bucket_exists, upload_file, get_presigned_url). Only accessed from 1 router. |
| auth.py (JWT/bcrypt) | Good: self-contained module for token creation, verification, and password hashing. |
| dependencies/auth.py | Good: composable FastAPI Depends() chain for auth and RBAC. |
| config.py (Settings) | Partial: Pydantic Settings with env loading is clean, but the object is mutable and accessed as a global singleton. |

What Is Poorly Isolated

| Component | Problem |
| --- | --- |
| Database session lifecycle | get_db() is a generator injected via Depends() in routers, but services receive raw Session objects. Background jobs create sessions with SessionLocal() directly, bypassing the DI system entirely. |
| External API calls | Import services directly call requests.get() / httpx.get(). No port/adapter pattern: the HTTP client is an implementation detail embedded in business logic. |
| Scoring configuration | settings.SCORING_WEIGHT_* is read from a mutable global object. The scores.py router mutates it at runtime. No database-backed configuration. |
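For the external API calls row, a port can be as small as a callable type. The sketch below assumes a hypothetical import function that receives its HTTP fetcher as an argument. `Fetcher`, `import_mappings`, the JSON shape, and the URL are illustrative only.

```python
import json
from typing import Callable

# Port: anything that turns a URL into raw bytes (requests, httpx, or a fake).
Fetcher = Callable[[str], bytes]


def import_mappings(url: str, fetch: Fetcher) -> list[dict]:
    """Parse a JSON list and keep only items that carry a technique_id."""
    raw = fetch(url)
    items = json.loads(raw)
    return [item for item in items if "technique_id" in item]


def fake_fetch(url: str) -> bytes:
    """Test adapter: returns canned data without touching the network."""
    return json.dumps([
        {"technique_id": "T1059", "name": "example mapping"},
        {"name": "item without a technique id"},
    ]).encode("utf-8")


# Hypothetical URL; the fake ignores it.
mappings = import_mappings("https://example.invalid/mappings.json", fake_fetch)
```

A production adapter would wrap the real HTTP client behind the same signature, which keeps the parsing logic testable offline.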

6. What Architecture Is Actually Implemented?

Classification: Inconsistent Layered Architecture with Partial Service Extraction

The codebase does not follow any named architectural pattern consistently. It is a hybrid of two approaches that were never unified:

Pattern 1: Transaction Script (~60% of codebase)

Most routers follow the Transaction Script pattern: each endpoint is a self-contained script that receives a request, queries the database, applies logic, mutates data, and returns a response, all in one function:

HTTP Request → Router Function → [query DB → apply logic → write DB → return response]

Routers using this pattern: techniques, evidence, users, audit, reports, heatmap, metrics, detection_rules, threat_actors, data_sources, compliance, test_templates, d3fend, snapshots (partially)

Pattern 2: Service Layer (~40% of codebase)

Some routers delegate complex operations to services:

HTTP Request → Router Function → Service Function → [query DB → apply logic → write DB]
                                                  → return to router → return response

Routers using this pattern: tests (workflow), scores (scoring), notifications, operational_metrics, system (imports), campaigns (partially), snapshots (partially)

The Actual Dependency Direction

             ┌──────────────────────────────────────────┐
             │            EVERYTHING DEPENDS ON          │
             │                                          │
             │   SQLAlchemy Models (18 concrete classes) │
             │   SQLAlchemy Session (passed everywhere)  │
             │                                          │
             └──────────┬───────────────┬───────────────┘
                        │               │
              ┌─────────▼──────┐  ┌─────▼──────────┐
              │    Routers     │  │    Services     │
              │  (21 files)    │  │  (20 files)     │
              │  225 db ops    │  │  172 db ops     │
              │  import models │  │  import models  │
              │  import Session│  │  receive Session│
              └────────┬───────┘  └────────┬────────┘
                       │                    │
                       │   cross-reference  │
                       │◄──────────────────►│
                       │  13 routers import │
                       │  services          │
                       │  10 services import│
                       │  other services    │
                       └────────────────────┘

The dependency direction is: everything points DOWN to SQLAlchemy. There is no inversion. The models are the center of gravity, not the domain logic.

Comparison with Named Architectures

| Architecture | Aegis Implementation | Verdict |
| --- | --- | --- |
| Clean Architecture | No domain layer, no use cases, no ports/adapters, no dependency inversion | Not implemented |
| Hexagonal Architecture | No ports, no adapters, infrastructure is not pluggable | Not implemented |
| Layered Architecture | Layers exist (routers → services → models) but boundaries are not enforced: routers bypass the service layer freely | Partially implemented, inconsistently |
| Domain-Driven Design | Anemic models, no aggregates, no value objects, no domain events, no bounded contexts | Not implemented |
| Transaction Script | Most endpoints follow this pattern | De facto pattern for ~60% of code |
| Active Record | SQLAlchemy models don't have business methods (they're not Active Record either) | Not implemented |

Summary Classification

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  Architecture:  Inconsistent Layered Monolith                   │
│                                                                 │
│  Dominant pattern:  Transaction Script (routers as scripts)     │
│  Secondary pattern: Service Layer (for complex workflows)       │
│                                                                 │
│  Boundary enforcement:  None                                    │
│  Dependency direction:  All code → SQLAlchemy (downward)        │
│  Abstraction layers:    Zero (no interfaces, no repositories)   │
│                                                                 │
│  Files with direct DB access:  38 out of 41 (93%)              │
│  Total scattered DB operations: 397                             │
│                                                                 │
│  Well-designed components:                                      │
│    - test_workflow_service (state machine)                       │
│    - scoring_service (algorithm — coupled to DB)                 │
│    - storage.py (clean MinIO wrapper)                           │
│    - dependencies/auth.py (composable auth chain)               │
│                                                                 │
│  Poorly-designed components:                                    │
│    - heatmap.py router (528 lines, 13 DB ops, zero delegation)  │
│    - campaigns.py router (36 DB ops, partial delegation)        │
│    - detection_rules.py router (21 DB ops, zero delegation)     │
│    - test_templates.py router (20 DB ops, zero delegation)      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘