Aegis — Backend Internal Dependency Analysis

Author: Architecture review
Date: February 11, 2026 (updated February 18, 2026)
Scope: All 21 routers and 20 services in backend/app/

Note: This analysis describes the original state. Since then, a Clean Architecture refactor has begun. See ARCHITECTURAL_ANALYSIS.md for current status. Key changes: domain exceptions replace HTTPException in services, repository ports and implementations exist for Test and Technique, domain entities with business logic exist for Test and Technique, Unit of Work pattern is available, CI pipeline is active.

1. Do Routers Import SQLAlchemy Models Directly?
2. Do Services Access the Database Directly?
3. Do Services Contain Business Logic or Just CRUD?
4. Is Business Logic Separated from Persistence?
5. Is Infrastructure Decoupled from Logic?
6. What Architecture Is Actually Implemented?

1. Do Routers Import SQLAlchemy Models Directly?

Yes. Every single router imports at least one SQLAlchemy model. 19 of 21 routers execute raw database operations inline.

Complete Router-to-Model Import Map

Router	Models Imported Directly	DB Operations in Router
`audit.py`	AuditLog, User	3
`auth.py`	User	1
`campaigns.py`	User, Campaign, CampaignTest, Test, Technique, ThreatActor	36
`compliance.py`	User, ComplianceFramework, ComplianceControl, ComplianceControlMapping, Technique, TestTemplate, ThreatActorTechnique	13
`d3fend.py`	User, Technique, DefensiveTechnique, DefensiveTechniqueMapping	3
`data_sources.py`	User, DataSource	14
`detection_rules.py`	User, DetectionRule, TestTemplate, TestTemplateDetectionRule, TestDetectionResult	21
`evidence.py`	Evidence, Test, User, enums	11
`heatmap.py`	User, Technique, Test, ThreatActor, ThreatActorTechnique, DetectionRule, Campaign, CampaignTest, DefensiveTechniqueMapping, enums	13
`metrics.py`	Technique, Test, User, enums	12
`notifications.py`	Notification, User	2
`operational_metrics.py`	User	0 (delegates)
`reports.py`	Technique, Test, User, enums	6
`scores.py`	User, Technique, ThreatActor	2
`snapshots.py`	User, CoverageSnapshot, SnapshotTechniqueState	6
`system.py`	User	0 (delegates)
`techniques.py`	Technique, User, enums	12
`test_templates.py`	TestTemplate, User	20
`tests.py`	AuditLog, Technique, Test, TestTemplate, User, enums	30
`threat_actors.py`	User, ThreatActor, ThreatActorTechnique, Technique, Test, TestTemplate, enums	11
`users.py`	User	9

Key Numbers

21 / 21 routers import at least one SQLAlchemy model.
19 / 21 routers execute db.query(), db.add(), db.commit(), or db.delete() directly (only operational_metrics.py and system.py fully delegate).
Total DB operations across all routers: 225 (db.query, db.add, db.commit, db.delete, db.refresh calls).
All 21 routers import Session from SQLAlchemy.
7 routers import func (aggregations).
7 routers import joinedload (eager loading).
2 routers import or_ (compound filters).
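
The analysis does not state how these counts were gathered. As a rough sketch of one way to reproduce them, the snippet below counts the five call types listed above in a piece of source text. It assumes the session variable is always named `db`; `count_db_operations` and the sample function are hypothetical, not part of Aegis.

```python
import re
from collections import Counter

# Matches db.query(, db.add(, db.commit(, db.delete( and db.refresh(.
# Assumption: the session variable is always called `db`.
DB_CALL = re.compile(r"\bdb\.(query|add|commit|delete|refresh)\s*\(")


def count_db_operations(source: str) -> Counter:
    """Count direct Session calls in one file's source text."""
    return Counter(match.group(1) for match in DB_CALL.finditer(source))


# Hypothetical router body, used only as input text.
sample = '''
def create_item(payload, db):
    existing = db.query(Item).filter(Item.name == payload.name).first()
    item = Item(name=payload.name)
    db.add(item)
    db.commit()
    db.refresh(item)
    return item
'''

counts = count_db_operations(sample)
total = sum(counts.values())
```

A text-based count like this also matches calls inside comments and strings, so it should be read as an approximation rather than an exact figure.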

What This Means

Routers are tightly coupled to the ORM. They know:

Table structure (column names, relationships)
Query syntax (filter, join, group_by, order_by)
Transaction management (commit, refresh, add)
Eager loading strategy (joinedload, selectinload)

There is no abstraction layer between routers and the database. Changing a column name on the Technique model would require modifying at least 8 routers.

2. Do Services Access the Database Directly?

Yes. All 19 services that handle data (all except score_cache.py) receive a SQLAlchemy Session as a parameter and execute queries directly.

Complete Service-to-Database Map

Service	Models Used	DB Operations	Receives `Session`	Imports `app.database`
`atomic_import_service`	TestTemplate	3	Yes	No
`audit_service`	AuditLog	2	Yes	No
`caldera_import_service`	TestTemplate, DataSource	5	Yes	No
`campaign_scheduler_service`	Campaign, CampaignTest, Test, User	8	Yes	No
`campaign_service`	Campaign, CampaignTest, Test, TestTemplate, Technique, ThreatActor, ThreatActorTechnique, User	10	Yes	No
`compliance_import_service`	ComplianceFramework, ComplianceControl, ComplianceControlMapping, Technique	22	Yes	No
`d3fend_import_service`	Technique, DefensiveTechnique, DefensiveTechniqueMapping	13	Yes	No
`elastic_import_service`	DetectionRule, DataSource	5	Yes	No
`intel_service`	IntelItem, Technique	4	Yes	No
`lolbas_import_service`	TestTemplate, DataSource	7	Yes	No
`mitre_sync_service`	Technique, enums	3	Yes	No
`notification_service`	Notification, User	12	Yes	No
`operational_metrics_service`	Test, Technique, TestDetectionResult, AuditLog, enums	21	Yes	No
`score_cache`	—	0	No	No
`scoring_service`	Technique, Test, DetectionRule, TestDetectionResult, DefensiveTechniqueMapping, ThreatActor, ThreatActorTechnique	17	Yes	No
`sigma_import_service`	DetectionRule, DataSource	5	Yes	No
`snapshot_service`	Technique, CoverageSnapshot, SnapshotTechniqueState	13	Yes	No
`status_service`	Technique, enums	1	Yes	No
`test_workflow_service`	Test, User, enums	13	Yes	No
`threat_actor_import_service`	ThreatActor, ThreatActorTechnique, Technique, DataSource	8	Yes	No

Key Numbers

Total DB operations across all services: 172 (db.query, db.add, db.commit, etc.).
19 / 20 services receive Session as a function parameter.
0 / 20 services import app.database directly — sessions are always injected by callers (routers or background jobs).
All 19 data-handling services import SQLAlchemy symbols (Session, func, case, etc.).

Positive Pattern: Session Injection

Services do follow one good practice: none of them create their own database sessions. Sessions are always passed in as arguments:

# All services use this pattern:
def calculate_technique_score(technique: Technique, db: Session) -> dict:
    all_tests = db.query(Test).filter(Test.technique_id == technique.id).all()

This makes the services testable: a mock or a test session can be passed in. However, the services still know the full ORM API. They construct queries, call commit(), and manage eager loading.
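
A minimal sketch of what that testability looks like in practice, and what it costs. `FakeQuery`, `FakeSession`, `FakeTestModel`, and `count_tests_for` are hypothetical stand-ins written for this illustration; none of them exist in Aegis.

```python
class FakeQuery:
    """Hypothetical stand-in for a SQLAlchemy query: returns canned rows."""

    def __init__(self, rows):
        self._rows = rows

    def filter(self, *criteria):
        # The stub cannot evaluate ORM expressions, so filters are ignored.
        return self

    def all(self):
        return list(self._rows)


class FakeSession:
    """Hypothetical stand-in for Session, keyed by model class."""

    def __init__(self, rows_by_model):
        self._rows_by_model = rows_by_model

    def query(self, model):
        return FakeQuery(self._rows_by_model.get(model, []))


class FakeTestModel:
    """Placeholder for the ORM model used by the service."""

    technique_id = None


def count_tests_for(technique_id, db) -> int:
    # Same shape as the service code above: the session is injected.
    rows = db.query(FakeTestModel).filter(
        FakeTestModel.technique_id == technique_id
    ).all()
    return len(rows)


db = FakeSession({FakeTestModel: ["t1", "t2"]})
result = count_tests_for(42, db)
```

The stub works, but it has to imitate the `query().filter().all()` chain and silently ignores the filter. That is the practical limit of session injection on its own: the caller is swappable, yet the service is still written against the ORM's API.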

3. Do Services Contain Business Logic or Just CRUD?

Mixed. Services fall into three distinct categories.

Category A: Rich Business Logic (5 services)

These services contain genuine domain logic — rules, calculations, state machines, and business decisions:

Service	Logic Type	Complexity
`test_workflow_service`	State machine with valid transition map, role-based guards, multi-step validation, retest chain management	High — 456 lines, 10+ public functions, embeds the test lifecycle rules
`scoring_service`	Multi-dimensional scoring algorithm with configurable weights, breakdown calculations, decay functions	High — 468 lines, complex math combining 5 weighted factors
`campaign_service`	Circular dependency detection, campaign progress calculation, auto-generation from threat actors	Medium — business rules for campaign management
`campaign_scheduler_service`	Recurring campaign scheduling, next-run calculation, campaign cloning	Medium — temporal business logic
`operational_metrics_service`	MTTD/MTTR calculation, detection efficacy, trend analysis with time windows	Medium — analytical business logic
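
The "valid transition map" with "role-based guards" described for `test_workflow_service` can be sketched as a dictionary lookup. This is an illustration only: apart from `validated`, the state names, the role names, and the transitions themselves are assumptions, since the real lifecycle is not reproduced in this document.

```python
from enum import Enum


class WorkflowState(str, Enum):
    # Only `validated` appears in this analysis; the rest are placeholders.
    draft = "draft"
    executed = "executed"
    validated = "validated"
    rejected = "rejected"


class InvalidTransition(Exception):
    """Raised when a state change is not allowed."""


# Hypothetical map: current state -> {target state: roles allowed to trigger it}
TRANSITIONS = {
    WorkflowState.draft: {WorkflowState.executed: {"red_team"}},
    WorkflowState.executed: {
        WorkflowState.validated: {"blue_team"},
        WorkflowState.rejected: {"blue_team"},
    },
    WorkflowState.rejected: {WorkflowState.draft: {"red_team"}},
    WorkflowState.validated: {},
}


def transition(current: WorkflowState, target: WorkflowState, role: str) -> WorkflowState:
    """Return the new state, or raise if the move or the role is not allowed."""
    allowed_roles = TRANSITIONS[current].get(target)
    if allowed_roles is None:
        raise InvalidTransition(f"{current.value} -> {target.value} is not a valid transition")
    if role not in allowed_roles:
        raise InvalidTransition(f"role {role!r} may not move {current.value} -> {target.value}")
    return target
```

Keeping the rules in one data structure means the lifecycle can be tested without a database or an HTTP request.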

Category B: External Data Import (8 services)

These services handle fetching, parsing, and upserting data from external sources. They are more "integration logic" than "business logic":

Service	External Source	Logic
`mitre_sync_service`	MITRE TAXII + GitHub	STIX 2.0 parsing, technique upsert
`atomic_import_service`	GitHub (ZIP)	YAML parsing, template creation
`sigma_import_service`	GitHub (ZIP)	YAML + ATT&CK tag extraction
`elastic_import_service`	GitHub (ZIP)	TOML parsing, rule creation
`caldera_import_service`	GitHub (ZIP)	YAML parsing, ability import
`d3fend_import_service`	D3FEND REST API	JSON parsing, mapping creation
`lolbas_import_service`	GitHub (ZIP)	YAML/Markdown parsing
`threat_actor_import_service`	GitHub (ZIP)	STIX 2.0 bundle parsing

Category C: Thin CRUD Wrappers (7 services)

These services are essentially database operations, or small utilities such as the in-memory cache, with minimal logic:

Service	What It Does	Lines of Logic
`audit_service`	`log_action()` — creates an AuditLog row	~10 lines
`notification_service`	CRUD for notifications + `notify_test_state_change()`	~30 lines of logic, rest is DB access
`status_service`	`recalculate_technique_status()` — counts tests by state, sets status	~20 lines
`snapshot_service`	Creates snapshots by looping over techniques and calling scoring_service	Orchestration + DB writes
`score_cache`	In-memory dict with TTL	~30 lines, pure caching
`compliance_import_service`	Parses NIST/CIS data and creates DB rows	Parsing + bulk insert
`intel_service`	Fetches RSS/feeds and creates IntelItem rows	Fetch + parse + insert
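
For reference, an "in-memory dict with TTL" of the kind attributed to `score_cache` can be sketched in a few lines. `TTLCache` and its method names are assumptions made for this example; the actual `score_cache` interface is not shown in this document.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after a fixed time."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests do not need to sleep
        self._entries = {}   # key -> (expires_at, value)

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped lazily on read.
            del self._entries[key]
            return None
        return value
```

Because it holds no Session and makes no queries, a component like this is the one service in the list that can be unit-tested with no infrastructure at all.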

The Missing Logic

Significant business logic that should be in services but lives in routers instead:

Logic	Current Location	Should Be
ATT&CK Navigator layer generation	`heatmap.py` router (528 lines)	`heatmap_service` or use case
Coverage report building	`reports.py` router (273 lines)	`report_service` or use case
Coverage metrics aggregation	`metrics.py` router (316 lines)	`metrics_service`
Detection rule CRUD + auto-association	`detection_rules.py` router (21 DB ops)	`detection_rule_service`
Technique CRUD + review workflow	`techniques.py` router (12 DB ops)	`technique_service`
Campaign full lifecycle	`campaigns.py` router (36 DB ops)	Partially in `campaign_service`, but router does most CRUD

4. Is Business Logic Separated from Persistence?

No. There is no separation boundary between business logic and persistence anywhere in the codebase.

The Dependency Graph

┌─────────────────────────────────────────────────────────────────────┐
│ PRESENTATION LAYER (Routers)                                        │
│                                                                     │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐        │
│  │techniques│   │ heatmap  │   │ reports  │   │ campaigns│  ...    │
│  │ 12 db.q  │   │ 13 db.q  │   │ 6 db.q   │   │ 36 db.q  │        │
│  └────┬─────┘   └────┬─────┘   └────┬─────┘   └────┬─────┘        │
│       │ direct        │ direct        │ direct       │ direct       │
│       │               │               │              │ + service    │
├───────┼───────────────┼───────────────┼──────────────┼──────────────┤
│ SERVICE LAYER (Partial)                                              │
│                                                                     │
│  ┌──────────────┐   ┌───────────┐   ┌──────────────────────────┐   │
│  │test_workflow  │   │  scoring  │   │     8 import services    │   │
│  │ 13 db.q      │   │ 17 db.q   │   │    3-22 db.q each       │   │
│  │ HTTPException │   │ settings  │   │    + HTTP requests       │   │
│  └──────┬───────┘   └─────┬─────┘   └────────────┬─────────────┘   │
│         │                  │                       │                 │
├─────────┼──────────────────┼───────────────────────┼─────────────────┤
│ PERSISTENCE LAYER (SQLAlchemy — no abstraction)                      │
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────────┐ │
│  │  db.query(Model).filter(...).all()  ← called from EVERYWHERE   │ │
│  │  db.add(instance)                                               │ │
│  │  db.commit()                                                    │ │
│  │  db.refresh(instance)                                           │ │
│  └─────────────────────────────────────────────────────────────────┘ │
│                                                                     │
│  Total: 225 db operations in routers + 172 in services = 397 total  │
│  Spread across: 19 routers + 19 services = 38 files                 │
└─────────────────────────────────────────────────────────────────────┘

Why There Is No Separation

No Repository Pattern. There are no repository classes or functions that encapsulate database access. Every file that needs data constructs its own query.
No Domain Entity Layer. The SQLAlchemy models serve dual duty as both persistence mapping AND domain objects. There is no separate domain entity with business methods — the same Test class that defines the database table is passed around as the business object.
No Abstraction Boundary. There is no interface (Protocol/ABC) anywhere in the codebase that separates "what data I need" from "how to get it from the database."
Services Commit Transactions. Some services call db.commit() internally, while their calling routers may also call db.commit(). There is no Unit of Work pattern governing transaction boundaries.
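
To make the missing boundary concrete, here is a minimal sketch of a repository port and an in-memory adapter. It is not the repository code referenced in the note at the top of this document; `TechniqueEntity`, its fields, `TechniqueRepository`, and `rename_technique` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class TechniqueEntity:
    """Hypothetical domain entity: plain data, no ORM base class."""

    id: int
    mitre_id: str
    name: str


class TechniqueRepository(Protocol):
    """Port: states what data the caller needs, not how it is fetched."""

    def get_by_mitre_id(self, mitre_id: str) -> Optional[TechniqueEntity]: ...

    def save(self, technique: TechniqueEntity) -> None: ...


class InMemoryTechniqueRepository:
    """Adapter for tests; a SQLAlchemy-backed adapter would offer the same methods."""

    def __init__(self):
        self._by_mitre_id = {}

    def get_by_mitre_id(self, mitre_id: str) -> Optional[TechniqueEntity]:
        return self._by_mitre_id.get(mitre_id)

    def save(self, technique: TechniqueEntity) -> None:
        self._by_mitre_id[technique.mitre_id] = technique


def rename_technique(repo: TechniqueRepository, mitre_id: str, new_name: str) -> TechniqueEntity:
    """Business operation written against the port only."""
    technique = repo.get_by_mitre_id(mitre_id)
    if technique is None:
        raise LookupError(f"unknown technique {mitre_id}")
    technique.name = new_name
    repo.save(technique)
    return technique
```

With this shape, a column rename touches only the database-backed adapter, and callers such as `rename_technique` never see a Session.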

Concrete Example: Scoring a Technique

The scoring_service.calculate_technique_score() function interleaves business logic and persistence line by line:

# Business logic (what to calculate) and persistence (how to get data)
# are interleaved — inseparable:

all_tests = db.query(Test).filter(Test.technique_id == technique.id).all()  # ← persistence
validated_tests = [t for t in all_tests if t.state == TestState.validated]    # ← logic
detected_tests = [t for t in validated_tests if t.detection_result == TestResult.detected]  # ← logic
test_ratio = len(detected_tests) / len(validated_tests)                      # ← logic
test_score = round(test_ratio * w_tests, 1)                                  # ← logic

rule_count = db.query(func.count(DetectionRule.id))...scalar() or 0          # ← persistence
rule_score = min(rule_count / 3.0, 1.0) * w_detection                        # ← logic

To test the scoring algorithm in isolation (without a database), you would need to refactor every query into a repository that can be mocked.
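
As a sketch of where such a refactor could lead, the two factors shown in the excerpt can be written as a pure function that takes counts instead of a Session. The weight values, the `ScoringWeights` class, the function name, and the zero-division guard are assumptions of this example; the real algorithm combines five factors that are not listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringWeights:
    # Placeholder values; the real SCORING_WEIGHT_* settings are not given here.
    tests: float = 40.0
    detection: float = 20.0


def score_tests_and_rules(validated: int, detected: int, rule_count: int,
                          weights: ScoringWeights) -> dict:
    """Pure version of the two factors in the excerpt: no Session, no queries."""
    # Guard added in this sketch: no validated tests means a ratio of 0.
    test_ratio = detected / validated if validated else 0.0
    test_score = round(test_ratio * weights.tests, 1)
    # Three or more rules earn the full detection weight, as in the excerpt.
    rule_score = min(rule_count / 3.0, 1.0) * weights.detection
    return {"test_score": test_score, "rule_score": rule_score}


result = score_tests_and_rules(validated=4, detected=3, rule_count=1,
                               weights=ScoringWeights())
```

The caller (a repository-backed service) would be responsible for producing the three counts, and the weights arrive as an immutable argument rather than being read from a global settings object.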

5. Is Infrastructure Decoupled from Logic?

No. Infrastructure concerns are embedded directly in both routers and services.

Infrastructure Dependency Map

Infrastructure	Where It Bleeds Into Logic	Impact
SQLAlchemy ORM	19 routers (225 ops) + 19 services (172 ops) = 38 files, 397 operations	Cannot switch ORM or use raw SQL without rewriting 38 files
FastAPI HTTPException	`test_workflow_service.py`, `campaign_service.py` (2 services)	Business logic raises HTTP-specific exceptions, so it cannot be reused from a CLI, background workers, or pure unit tests
MinIO (boto3)	`storage.py` (well isolated) → called from `evidence.py` router	Storage itself is clean, but the router handles presigned URL generation
APScheduler	`mitre_sync_job.py` → creates `SessionLocal()` directly → calls services	Jobs bypass the DI system and create their own sessions
`app.config.settings`	`scoring_service.py` (reads weights), `test_workflow_service.py` (reads MAX_RETEST_COUNT), `auth.py` router (reads SECRET_KEY), `scores.py` router (mutates weights)	Global mutable singleton accessed from multiple layers
External HTTP (requests/httpx)	8 import services make outbound HTTP calls	Tightly coupled — cannot test import logic without network access or mocking `requests`
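
The HTTPException row can be illustrated with a small sketch in which the service raises a domain exception and only the HTTP edge knows about status codes. The class names, the status codes, and `request_retest` are hypothetical; only the existence of a MAX_RETEST_COUNT setting comes from this analysis.

```python
class DomainError(Exception):
    """Base class for errors raised by business logic."""


class InvalidStateTransition(DomainError):
    pass


class RetestLimitReached(DomainError):
    pass


# Translation lives at the HTTP edge only. Status codes are illustrative.
HTTP_STATUS_BY_ERROR = {
    InvalidStateTransition: 409,
    RetestLimitReached: 422,
}


def to_http_status(error: DomainError) -> int:
    """Map a domain error to a status code; unknown errors fall back to 400."""
    for error_type, status in HTTP_STATUS_BY_ERROR.items():
        if isinstance(error, error_type):
            return status
    return 400


def request_retest(retest_count: int, max_retests: int) -> int:
    """Service-level rule with no FastAPI import."""
    if retest_count >= max_retests:
        raise RetestLimitReached(f"retest limit of {max_retests} reached")
    return retest_count + 1
```

In a FastAPI application the mapping would typically be registered once through an exception handler, so a CLI or a background job can call `request_retest` and handle `DomainError` in its own way.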

What Is Well Isolated

Component	Isolation Quality
`storage.py` (MinIO)	Good — thin wrapper with 3 functions (`ensure_bucket_exists`, `upload_file`, `get_presigned_url`). Only accessed from 1 router.
`auth.py` (JWT/bcrypt)	Good — self-contained module for token creation, verification, and password hashing.
`dependencies/auth.py`	Good — composable FastAPI `Depends()` chain for auth and RBAC.
`config.py` (Settings)	Partial — Pydantic Settings with env loading is clean, but the object is mutable and accessed as a global singleton.

What Is Poorly Isolated

Component	Problem
Database session lifecycle	`get_db()` is a generator injected via `Depends()` in routers, but services receive raw `Session` objects. Background jobs create sessions with `SessionLocal()` directly, bypassing the DI system entirely.
External API calls	Import services directly call `requests.get()` / `httpx.get()`. No port/adapter pattern — the HTTP client is an implementation detail embedded in business logic.
Scoring configuration	`settings.SCORING_WEIGHT_*` is read from a mutable global object. The `scores.py` router mutates it at runtime. No database-backed configuration.
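
A minimal sketch of the port/adapter idea for outbound HTTP, assuming the import logic only needs "give me the bytes at this URL". The `Fetcher` alias, `load_rule_titles`, and the JSON shape are invented for this example and do not reflect any real feed format used by the import services.

```python
import json
from typing import Callable, List

# Port: anything that takes a URL and returns the response body as bytes.
Fetcher = Callable[[str], bytes]


def load_rule_titles(fetch: Fetcher, url: str) -> List[str]:
    """Import logic: depends on the port, not on requests or httpx."""
    payload = json.loads(fetch(url))
    return [rule["title"] for rule in payload["rules"]]


def fake_fetch(url: str) -> bytes:
    """Test adapter: returns a canned payload without touching the network."""
    return json.dumps({"rules": [{"title": "Suspicious PowerShell"}]}).encode("utf-8")


titles = load_rule_titles(fake_fetch, "https://example.invalid/rules.json")
```

A production adapter could be as small as a function wrapping `requests.get(url, timeout=30).content`, which keeps the HTTP client out of the parsing code and removes the need to patch `requests` in tests.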

6. What Architecture Is Actually Implemented?

Classification: Inconsistent Layered Architecture with Partial Service Extraction

The codebase does not follow any named architectural pattern consistently. It is a hybrid of two approaches that were never unified:

Pattern 1: Transaction Script (60% of codebase)

Most routers follow the Transaction Script pattern — each endpoint is a self-contained script that receives a request, queries the database, applies logic, mutates data, and returns a response. All in one function:

HTTP Request → Router Function → [query DB → apply logic → write DB → return response]

Routers using this pattern: techniques, evidence, users, audit, reports, heatmap, metrics, detection_rules, threat_actors, data_sources, compliance, test_templates, d3fend, snapshots (partially)

Pattern 2: Service Layer (40% of codebase)

Some routers delegate complex operations to services:

HTTP Request → Router Function → Service Function → [query DB → apply logic → write DB]
                                                  → return to router → return response

Routers using this pattern: tests (workflow), scores (scoring), notifications, operational_metrics, system (imports), campaigns (partially), snapshots (partially)
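
The difference between the two patterns can be shown with plain functions. This is a schematic sketch, not Aegis code: a dict stands in for the database, return values stand in for HTTP responses, and every name is hypothetical.

```python
# In-memory stand-in for the database; a real router would hold a Session.
NOTIFICATIONS = {1: {"id": 1, "read": False}}


# Pattern 1: Transaction Script. The endpoint function does everything itself.
def mark_read_transaction_script(notification_id: int) -> dict:
    row = NOTIFICATIONS.get(notification_id)   # query
    if row is None:
        return {"status": 404}
    row["read"] = True                         # logic and write
    return {"status": 200, "body": row}        # response


# Pattern 2: Service Layer. The service owns the rule and knows nothing of HTTP.
def mark_read_service(store: dict, notification_id: int) -> dict:
    row = store.get(notification_id)
    if row is None:
        raise LookupError(notification_id)
    row["read"] = True
    return row


# The endpoint only translates between HTTP and the service.
def mark_read_endpoint(notification_id: int) -> dict:
    try:
        body = mark_read_service(NOTIFICATIONS, notification_id)
    except LookupError:
        return {"status": 404}
    return {"status": 200, "body": body}
```

Both versions behave the same from the outside; the second can additionally be called from a scheduler or a test without constructing a request.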

The Actual Dependency Direction

             ┌──────────────────────────────────────────┐
             │            EVERYTHING DEPENDS ON          │
             │                                          │
             │   SQLAlchemy Models (18 concrete classes) │
             │   SQLAlchemy Session (passed everywhere)  │
             │                                          │
             └──────────┬───────────────┬───────────────┘
                        │               │
              ┌─────────▼──────┐  ┌─────▼──────────┐
              │    Routers     │  │    Services     │
              │  (21 files)    │  │  (20 files)     │
              │  225 db ops    │  │  172 db ops     │
              │  import models │  │  import models  │
              │  import Session│  │  receive Session│
              └────────┬───────┘  └────────┬────────┘
                       │                    │
                       │   cross-reference  │
                       │◄──────────────────►│
                       │  13 routers import │
                       │  services          │
                       │  10 services import│
                       │  other services    │
                       └────────────────────┘

The dependency direction is: everything points DOWN to SQLAlchemy. There is no inversion. The models are the center of gravity, not the domain logic.

Comparison with Named Architectures

Architecture	Aegis Implementation	Verdict
Clean Architecture	No domain layer, no use cases, no ports/adapters, no dependency inversion	Not implemented
Hexagonal Architecture	No ports, no adapters, infrastructure is not pluggable	Not implemented
Layered Architecture	Layers exist (routers → services → models) but boundaries are not enforced — routers bypass the service layer freely	Partially implemented, inconsistently
Domain-Driven Design	Anemic models, no aggregates, no value objects, no domain events, no bounded contexts	Not implemented
Transaction Script	Most endpoints follow this pattern	De facto pattern for ~60% of code
Active Record	SQLAlchemy models don't have business methods (they're not Active Record either)	Not implemented

Summary Classification

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│  Architecture:  Inconsistent Layered Monolith                   │
│                                                                 │
│  Dominant pattern:  Transaction Script (routers as scripts)     │
│  Secondary pattern: Service Layer (for complex workflows)       │
│                                                                 │
│  Boundary enforcement:  None                                    │
│  Dependency direction:  All code → SQLAlchemy (downward)        │
│  Abstraction layers:    Zero (no interfaces, no repositories)   │
│                                                                 │
│  Files with direct DB access:  38 out of 41 (93%)              │
│  Total scattered DB operations: 397                             │
│                                                                 │
│  Well-designed components:                                      │
│    - test_workflow_service (state machine)                       │
│    - scoring_service (algorithm — coupled to DB)                 │
│    - storage.py (clean MinIO wrapper)                           │
│    - dependencies/auth.py (composable auth chain)               │
│                                                                 │
│  Poorly-designed components:                                    │
│    - heatmap.py router (528 lines, 13 DB ops, zero delegation)  │
│    - campaigns.py router (36 DB ops, partial delegation)        │
│    - detection_rules.py router (21 DB ops, zero delegation)     │
│    - test_templates.py router (20 DB ops, zero delegation)      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
