SQLAlchemy Performance Analysis — backend/app/services

Analysis Date: 2025-02-18 (updated 2026-02-18)
Scope: All Python files under backend/app/services/
Focus: N+1 queries, missing eager loading, redundant queries, queries in loops

Update (Feb 18, 2026): The most critical N+1 issues have been resolved:

scoring_service.py — bulk_technique_scores() now uses 5 aggregated subqueries instead of per-technique loops (~3,500 queries reduced to ~5).

heatmap_service.py — Extracted to a dedicated service with batch-fetching (test_counts, rule_counts in 2 SQL subqueries instead of per-technique N+1).

SATechniqueRepository.find_all_with_test_counts() — Single query with subqueries providing pre-aggregated counts for all techniques.

Missing database indexes added via Alembic migrations (b024, b026) covering tests, techniques, audit_logs, and detection_rules tables.

Executive Summary

Severity	Count	Files Affected
Critical (N+1)	12	8 files
High (Missing eager loading)	4	4 files
Medium (Redundant queries)	3	3 files

1. operational_metrics_service.py

1.1 `calculate_mttd` — N+1 query problem

Lines: 44–79
Problem type: N+1 — 2 queries per test inside loop

tests = db.query(Test).filter(Test.state == TestState.validated).all()
for test in tests:
    red_start = db.query(AuditLog.timestamp).filter(...).first()   # Query per test
    blue_start = db.query(AuditLog.timestamp).filter(...).first()   # Query per test

Extra queries: 2 × N (N = number of validated tests)
Fix: Use a single query with func.max and case to get both timestamps per test, or batch-fetch all audit log entries for the test IDs in one query.
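
The max-and-case idea can be sketched at the SQL level. This is a minimal illustration using the standard-library sqlite3 module instead of SQLAlchemy so that it runs standalone; the table layout and the action names "red_start" and "blue_start" are assumptions, since the real filters are elided above.

```python
import sqlite3

# Hypothetical, simplified audit_logs table; the real AuditLog model and
# the action names used here are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_logs (entity_id TEXT, action TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO audit_logs VALUES (?, ?, ?)",
    [
        ("1", "red_start", "2026-01-01T10:00:00"),
        ("1", "blue_start", "2026-01-01T12:00:00"),
        ("2", "red_start", "2026-01-02T09:00:00"),
    ],
)


def phase_timestamps(conn, test_ids):
    """Return {entity_id: (red_start, blue_start)} from one grouped query."""
    if not test_ids:
        return {}
    placeholders = ",".join("?" for _ in test_ids)
    rows = conn.execute(
        f"""
        SELECT entity_id,
               MAX(CASE WHEN action = 'red_start' THEN timestamp END),
               MAX(CASE WHEN action = 'blue_start' THEN timestamp END)
        FROM audit_logs
        WHERE entity_id IN ({placeholders})
        GROUP BY entity_id
        """,
        [str(test_id) for test_id in test_ids],
    ).fetchall()
    # A phase with no matching audit row comes back as None.
    return {entity_id: (red, blue) for entity_id, red, blue in rows}
```

The loop over tests then reads both timestamps from the returned dict, so the query count no longer grows with N.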

1.2 `calculate_mttr` — N+1 query problem

Lines: 86–123
Problem type: N+1 — 1 query per test inside loop

tests = db.query(Test).filter(...).all()
for test in tests:
    remediation_complete = db.query(AuditLog.timestamp).filter(
        AuditLog.entity_id == str(test.id), ...
    ).first()

Extra queries: N (N = tests with completed remediation)
Fix: Batch-fetch audit log entries for all test IDs in one query, then build a lookup dict.
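
A minimal sketch of the batch-fetch-then-lookup pattern, again with sqlite3 standing in for SQLAlchemy. The action name "remediation_complete" and the choice to keep the earliest timestamp per test are assumptions; the original filter is elided.

```python
import sqlite3

# Hypothetical sample data; column and action names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_logs (entity_id TEXT, action TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO audit_logs VALUES (?, ?, ?)",
    [
        ("10", "remediation_complete", "2026-01-05T08:00:00"),
        ("10", "remediation_complete", "2026-01-03T08:00:00"),
        ("11", "remediation_complete", "2026-01-04T08:00:00"),
        ("12", "comment_added", "2026-01-04T09:00:00"),
    ],
)


def remediation_lookup(conn, test_ids):
    """Fetch matching audit rows for all tests at once and index them by test."""
    if not test_ids:
        return {}
    placeholders = ",".join("?" for _ in test_ids)
    rows = conn.execute(
        f"""
        SELECT entity_id, timestamp
        FROM audit_logs
        WHERE action = 'remediation_complete'
          AND entity_id IN ({placeholders})
        ORDER BY timestamp
        """,
        [str(test_id) for test_id in test_ids],
    )
    lookup = {}
    for entity_id, timestamp in rows:
        # Rows arrive oldest first, so the first one seen per test is kept.
        lookup.setdefault(entity_id, timestamp)
    return lookup
```

Tests without a matching audit row are simply absent from the dict, which mirrors `.first()` returning None in the per-test version.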

1.3 `get_operational_trend` — N+1 query problem

Lines: 354–392
Problem type: N+1 — 1 query per week inside loop

while current < now:
    validated_up_to = db.query(Test).filter(
        Test.state == TestState.validated,
        Test.red_validated_at <= week_end,
    ).all()
    # ... process ...
    current = week_end

Extra queries: ~13 (for 90-day period) or ~52 (for 1-year period)
Fix: Single query with date_trunc and group_by to get counts per week, accumulated into a running total because each iteration counts tests validated up to week_end; or fetch all validated tests once and filter in Python.
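
The fetch-once-and-filter-in-Python option could look roughly like this. It is a sketch that assumes the elided processing step only needs the cumulative count per week; the function name and arguments are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def cumulative_validated_per_week(validated_at, start, end):
    """Count tests validated on or before each week_end between start and end.

    validated_at holds the red_validated_at values of all validated tests,
    loaded with a single query before this function is called.
    """
    stamps = sorted(validated_at)
    points = []
    current = start
    while current < end:
        week_end = current + timedelta(days=7)
        # bisect_right gives the number of timestamps <= week_end.
        points.append((week_end, bisect_right(stamps, week_end)))
        current = week_end
    return points


sample = [datetime(2026, 1, 2), datetime(2026, 1, 9), datetime(2026, 1, 10)]
trend = cumulative_validated_per_week(sample, datetime(2026, 1, 1), datetime(2026, 1, 15))
```

Sorting once and bisecting per week keeps the work at one query plus O(N log N) in Python, instead of one full-table query per week.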

1.4 `calculate_rejection_rate` — Redundant queries

Lines: 286–328
Problem type: Redundant — 6 separate count queries that could be combined

validated_count = db.query(func.count(Test.id)).filter(...).scalar()
rejected_count = db.query(func.count(Test.id)).filter(...).scalar()
red_rejected = db.query(func.count(Test.id)).filter(...).scalar()
red_total = db.query(func.count(Test.id)).filter(...).scalar()
blue_rejected = db.query(func.count(Test.id)).filter(...).scalar()
blue_total = db.query(func.count(Test.id)).filter(...).scalar()

Extra queries: 5 (could be 1–2 with conditional aggregation)
Fix: Single query with func.count and case for each condition.
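
Conditional aggregation folds several filtered counts into one SELECT. The sketch below uses sqlite3 rather than SQLAlchemy, and the `state` values are assumed for illustration because the real filter conditions are elided.

```python
import sqlite3

# Hypothetical, simplified tests table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tests (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO tests (state) VALUES (?)",
    [("validated",), ("validated",), ("rejected",), ("draft",)],
)


def state_counts(conn):
    """Return several counts from a single pass over the table."""
    validated, rejected, total = conn.execute(
        """
        SELECT COUNT(CASE WHEN state = 'validated' THEN 1 END),
               COUNT(CASE WHEN state = 'rejected' THEN 1 END),
               COUNT(*)
        FROM tests
        """
    ).fetchone()
    return {"validated": validated, "rejected": rejected, "total": total}
```

COUNT skips NULLs, so each CASE expression counts only the rows matching its condition; further conditions become additional columns in the same query.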

2. scoring_service.py

2.1 `calculate_technique_score` — Multiple queries per call

Lines: 26–204
Problem type: 5+ separate queries per technique (Tests, DetectionRule count, TestDetectionResult count, DefensiveTechniqueMapping count, most recent Test)

Each call to calculate_technique_score executes:

1 query for all_tests
1 query for total_rules
1 query for triggered_rules (if total_rules > 0)
1 query for total_countermeasures
1 query for most_recent_test

Extra queries per technique: ~5

2.2 `calculate_tactic_score` — N+1 via helper

Lines: 209–234
Problem type: Queries in loop — calls calculate_technique_score for each technique

techniques = db.query(Technique).filter(...).all()
for tech in techniques:
    result = calculate_technique_score(tech, db)  # 5+ queries each

Extra queries: 5 × N (N = techniques in tactic, often 10–50)

2.3 `calculate_actor_coverage_score` — N+1 via helper

Lines: 241–293
Problem type: Queries in loop — calls calculate_technique_score for each technique

for tech in techniques:
    result = calculate_technique_score(tech, db)

Extra queries: 5 × N (N = techniques used by actor)

2.4 `calculate_organization_score` — Severe N+1

Lines: 300–309
Problem type: Queries in loop — calls calculate_technique_score for every technique

all_techniques = db.query(Technique).all()
for tech in all_techniques:
    result = calculate_technique_score(tech, db)

Extra queries: 5 × N where N = total techniques (~700–800) → ~3,500–4,000 queries

2.5 `calculate_organization_score` — Second N+1 loop

Lines: 352–355
Problem type: Queries in loop — second pass over critical techniques

for tech in critical_techniques:
    result = calculate_technique_score(tech, db)

Extra queries: 5 × M (M = critical techniques, ~50–200)
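
The loops in 2.2 to 2.5 multiply the per-technique cost from 2.1 by the number of techniques. A minimal sketch of the pre-aggregation idea follows, using sqlite3 and two hypothetical tables; it is not the actual bulk_technique_scores() implementation and covers only two of the inputs.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tests (id INTEGER PRIMARY KEY, technique_id INTEGER);
    CREATE TABLE detection_rules (id INTEGER PRIMARY KEY, technique_id INTEGER);
    INSERT INTO tests (technique_id) VALUES (1), (1), (2);
    INSERT INTO detection_rules (technique_id) VALUES (1), (3);
    """
)

ALLOWED_TABLES = {"tests", "detection_rules"}


def grouped_counts(conn, table):
    """Return {technique_id: row_count} for one table with one GROUP BY query."""
    # Table names cannot be bound as parameters, so only known names are accepted.
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table}")
    query = f"SELECT technique_id, COUNT(*) FROM {table} GROUP BY technique_id"
    return dict(conn.execute(query).fetchall())


def bulk_inputs(conn, technique_ids):
    """Collect scoring inputs for many techniques using two queries in total."""
    test_counts = grouped_counts(conn, "tests")
    rule_counts = grouped_counts(conn, "detection_rules")
    return {
        tid: {"tests": test_counts.get(tid, 0), "rules": rule_counts.get(tid, 0)}
        for tid in technique_ids
    }
```

The number of queries depends on how many inputs the score needs, not on how many techniques are scored.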

3. d3fend_import_service.py

3.1 `_upsert_techniques` — N+1 query problem

Lines: 90–96
Problem type: N+1 — 1 query per technique in loop

for tech_data in techniques:
    existing = db.query(DefensiveTechnique).filter(
        DefensiveTechnique.d3fend_id == tech_data["d3fend_id"]
    ).first()

Extra queries: N (N = number of D3FEND techniques, ~50–100)

Fix: Pre-load all existing techniques into a dict keyed by d3fend_id before the loop.
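
In outline, the pre-load pattern replaces the per-item query with a dict lookup. The sketch below works on plain dicts instead of ORM objects, and the function name and return shape are hypothetical.

```python
def upsert_techniques(existing_rows, incoming):
    """Merge incoming technique data into rows that were loaded with ONE query.

    existing_rows: list of dicts, each with a "d3fend_id" key.
    incoming: parsed technique dicts with the same key.
    """
    by_id = {row["d3fend_id"]: row for row in existing_rows}
    created = updated = 0
    for tech_data in incoming:
        existing = by_id.get(tech_data["d3fend_id"])  # dict lookup, no query
        if existing is None:
            by_id[tech_data["d3fend_id"]] = dict(tech_data)
            created += 1
        else:
            existing.update(tech_data)
            updated += 1
    return by_id, created, updated


# Illustrative sample data only.
stored = [{"d3fend_id": "D3-A", "name": "old name"}]
merged, created, updated = upsert_techniques(
    stored,
    [{"d3fend_id": "D3-A", "name": "new name"}, {"d3fend_id": "D3-B", "name": "added"}],
)
```

Adding new entries to the dict as they are created also prevents duplicate inserts when the incoming list repeats an ID.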

3.2 `import_d3fend_mappings` — N+1 query problem

Lines: 324–331
Problem type: N+1 — 1 query per (mitre_id, d3fend_id) pair in nested loop

for mitre_id, d3fend_ids in _ATTACK_TO_D3FEND.items():
    for d3fend_id in d3fend_ids:
        existing = db.query(DefensiveTechniqueMapping).filter(
            DefensiveTechniqueMapping.attack_technique_id == attack_tech.id,
            DefensiveTechniqueMapping.defensive_technique_id == def_tech.id,
        ).first()

Extra queries: ~200–500 (depends on mapping size)

Fix: Pre-load existing mappings into a set of (attack_tech_id, def_tech_id) tuples.
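
A sketch of the set-of-tuples check, with a hypothetical helper name. It assumes the existing (attack_tech_id, def_tech_id) pairs were loaded with one query before the nested loop starts.

```python
def new_mapping_pairs(existing_pairs, candidate_pairs):
    """Return candidate (attack_tech_id, def_tech_id) pairs not yet stored."""
    seen = set(existing_pairs)
    fresh = []
    for pair in candidate_pairs:
        if pair not in seen:
            seen.add(pair)  # also de-duplicates within the incoming batch
            fresh.append(pair)
    return fresh


pairs_to_insert = new_mapping_pairs(
    existing_pairs=[(1, 10)],
    candidate_pairs=[(1, 10), (1, 11), (1, 11), (2, 10)],
)
```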

3.3 `get_defenses_for_technique` — Missing eager loading

Lines: 428–453
Problem type: Lazy loading — accesses m.defensive_technique in loop

mappings = db.query(DefensiveTechniqueMapping).filter(...).all()
for m in mappings:
    dt = m.defensive_technique  # Lazy load per mapping

Extra queries: N (N = number of mappings for the technique)

Fix: Add joinedload(DefensiveTechniqueMapping.defensive_technique) to the query.
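
By default, joinedload makes SQLAlchemy fetch the related row through a LEFT OUTER JOIN in the same SELECT. The difference in statement count can be shown at the SQL level with sqlite3; the schema below is a simplified assumption, and the trace callback is used only to count statements.

```python
import sqlite3

# Hypothetical, simplified schema and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE defensive_techniques (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE mappings (
        id INTEGER PRIMARY KEY,
        attack_technique_id INTEGER,
        defensive_technique_id INTEGER
    );
    INSERT INTO defensive_techniques VALUES (1, 'File Analysis'), (2, 'Process Analysis');
    INSERT INTO mappings VALUES (1, 100, 1), (2, 100, 2), (3, 200, 1);
    """
)

statements = []  # every SQL statement executed from here on is recorded
conn.set_trace_callback(statements.append)


def defenses_lazy(conn, attack_id):
    """N+1 shape: one query for the mappings, then one query per mapping."""
    mapping_rows = conn.execute(
        "SELECT defensive_technique_id FROM mappings "
        "WHERE attack_technique_id = ? ORDER BY id",
        (attack_id,),
    ).fetchall()
    names = []
    for (def_id,) in mapping_rows:
        row = conn.execute(
            "SELECT name FROM defensive_techniques WHERE id = ?", (def_id,)
        ).fetchone()
        names.append(row[0])
    return names


def defenses_joined(conn, attack_id):
    """Eager shape: the related rows arrive in the same SELECT via a JOIN."""
    rows = conn.execute(
        "SELECT d.name FROM mappings m "
        "LEFT JOIN defensive_techniques d ON d.id = m.defensive_technique_id "
        "WHERE m.attack_technique_id = ? ORDER BY m.id",
        (attack_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

Both functions return the same names; comparing `len(statements)` after each call shows the per-mapping queries disappearing in the joined version.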

4. report_generation_service.py

4.1 `generate_purple_campaign_report` — N+1 query problem

Lines: 36–46
Problem type: N+1 — 1 query per test in loop

for test in campaign_tests:
    technique = db.query(Technique).filter(Technique.id == test.technique_id).first()

Extra queries: N (N = number of campaign tests)

Fix: Eager-load Technique when fetching campaign_tests, or batch-query techniques by IDs.

5. osint_enrichment_service.py

5.1 `enrich_technique_with_cves` — N+1 query problem

Lines: 59–75
Problem type: N+1 — 1 query per CVE in loop

for vuln in data.get("vulnerabilities", []):
    exists = db.query(OsintItem.id).filter(
        OsintItem.technique_id == technique.id,
        OsintItem.source_url.contains(cve_id),
    ).first()

Extra queries: Up to 10 per technique (resultsPerPage=10)

5.2 `enrich_all_techniques` — N+1 cascade

Lines: 134–153
Problem type: Queries in loop — calls enrich_technique_with_cves for each technique

techniques = db.query(Technique).all()
for i, tech in enumerate(techniques):
    total += enrich_technique_with_cves(db, tech)  # N+1 inside

Extra queries: ~10 × N (N = all techniques, ~700+)
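
No fix is listed for this section. One possible approach, sketched under assumptions, is to load a technique's existing source URLs once and do the duplicate check in Python. The helper name and the CVE pattern below are hypothetical. Unlike the substring match in `source_url.contains(cve_id)`, this version compares whole CVE IDs.

```python
import re

# Matches identifiers such as CVE-2024-0001; assumed to be how IDs appear in URLs.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")


def new_cve_ids(existing_source_urls, fetched_cve_ids):
    """Return the fetched CVE IDs that no stored source URL already mentions.

    existing_source_urls: all OsintItem.source_url values for one technique,
    loaded with a single query instead of one query per CVE.
    """
    known = set()
    for url in existing_source_urls:
        known.update(CVE_PATTERN.findall(url))
    return [cve_id for cve_id in fetched_cve_ids if cve_id not in known]


# Illustrative sample data only.
missing = new_cve_ids(
    ["https://nvd.nist.gov/vuln/detail/CVE-2024-0001"],
    ["CVE-2024-0001", "CVE-2024-0002"],
)
```

For enrich_all_techniques, the same idea extends to loading the URLs for all techniques in one query and grouping them by technique_id before the loop.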

6. campaign_service.py

6.1 `get_campaign_progress` — Missing eager loading

Lines: 74–92
Problem type: Lazy loading — accesses ct.test for each CampaignTest

campaign_tests = db.query(CampaignTest).filter(...).all()
for ct in campaign_tests:
    test = ct.test  # Lazy load per CampaignTest

Extra queries: N (N = campaign tests)

Fix: Add joinedload(CampaignTest.test) or selectinload(CampaignTest.test).

6.2 `generate_campaign_from_threat_actor` — N+1 query problem

Lines: 155–168
Problem type: N+1 — 1 query per technique in loop

for tech, _at in gap_techniques:
    template = db.query(TestTemplate).filter(
        TestTemplate.mitre_technique_id == tech.mitre_id,
        ...
    ).first()

Extra queries: N (N = gap techniques for the actor)

Fix: Pre-load templates by mitre_id into a dict before the loop.

7. campaign_scheduler_service.py

7.1 `_clone_campaign` — Missing eager loading

Lines: 76–86
Problem type: Lazy loading — accesses ct.test for each CampaignTest

original_cts = db.query(CampaignTest).filter(...).all()
for ct in original_cts:
    src_test = ct.test  # Lazy load per CampaignTest

Extra queries: N (N = campaign tests)

Fix: Add joinedload(CampaignTest.test).

7.2 `check_and_run_recurring_campaigns` — N+1 query problem

Lines: 175–185
Problem type: N+1 — 1 query per campaign for red_tech users

for campaign in due_campaigns:
    # ... clone ...
    red_techs = db.query(User).filter(User.role == "red_tech", ...).all()
    for user in red_techs:
        create_notification(...)  # Also commits per notification

Extra queries: 1 per due campaign (for User query)
Note: create_notification does db.commit() each time — consider batching.
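
A sketch of hoisting the user query out of the loop and committing once, using sqlite3 in place of the ORM session. It assumes the elided User filter does not depend on the campaign; table layouts and the message text are hypothetical.

```python
import sqlite3

# Hypothetical, simplified schema and sample users.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, role TEXT);
    CREATE TABLE notifications (user_id INTEGER, message TEXT);
    INSERT INTO users (role) VALUES ('red_tech'), ('red_tech'), ('blue_tech');
    """
)


def notify_red_techs(conn, campaign_names):
    """Create one notification per (campaign, red_tech user) in one transaction."""
    # Queried once, outside the campaign loop.
    red_tech_ids = [
        row[0] for row in conn.execute("SELECT id FROM users WHERE role = 'red_tech'")
    ]
    rows = [
        (user_id, f"Campaign '{name}' is ready")
        for name in campaign_names
        for user_id in red_tech_ids
    ]
    with conn:  # commits once on success, rolls back on error
        conn.executemany(
            "INSERT INTO notifications (user_id, message) VALUES (?, ?)", rows
        )
    return len(rows)
```

With SQLAlchemy, the equivalent would be adding all notification objects to the session and calling commit once after the loop, which requires a variant of create_notification that does not commit by itself.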

8. snapshot_service.py

8.1 `create_snapshot` — Severe N+1 via helper

Lines: 41–77
Problem type: Queries in loop — calls calculate_technique_score for every technique

techniques = db.query(Technique).all()
for tech in techniques:
    score_data = calculate_technique_score(tech, db)  # 5+ queries each

Extra queries: 5 × N (N = all techniques, ~700+) → ~3,500+ queries

9. status_service.py

9.1 `recalculate_technique_status` — Potential lazy loading

Lines: 28–29
Problem type: Missing eager loading — accesses technique.tests

tests = technique.tests  # Lazy load if technique was loaded without tests

Extra queries: 1 (only if technique was loaded without selectinload(Technique.tests))

Note: Caller-dependent; callers that already eager-load tests incur no extra query.

10. test_workflow_service.py

10.1 `get_retest_chain` — Redundant queries

Lines: 416–428
Problem type: Redundant — 3 separate queries that could be 1–2

test = db.query(Test).filter(Test.id == tid).first()
original = db.query(Test).filter(Test.id == original_id).first()
retests = db.query(Test).filter(Test.retest_of == original_id).order_by(...).all()

Fix: Fetch the original and all of its retests in one query (Test.id == original_id OR Test.retest_of == original_id). The first test fetch is only needed to determine original_id, so it can be folded in with a CTE, UNION, or subquery.
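
The CTE variant can be sketched in SQL through sqlite3. It assumes that retest_of always points at the original test, as the `Test.retest_of == original_id` filter above suggests, and it orders retests by id because the real ordering is elided.

```python
import sqlite3

# Hypothetical, simplified tests table: tests 2 and 3 are retests of test 1.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tests (id INTEGER PRIMARY KEY, retest_of INTEGER);
    INSERT INTO tests VALUES (1, NULL), (2, 1), (3, 1), (4, NULL);
    """
)


def retest_chain(conn, test_id):
    """Return the original test followed by its retests, using one query."""
    return conn.execute(
        """
        WITH root AS (
            -- Resolve the original: the test itself, or the test it retests.
            SELECT COALESCE(retest_of, id) AS original_id
            FROM tests
            WHERE id = ?
        )
        SELECT t.id, t.retest_of
        FROM tests AS t, root
        WHERE t.id = root.original_id OR t.retest_of = root.original_id
        ORDER BY t.retest_of IS NOT NULL, t.id
        """,
        (test_id,),
    ).fetchall()
```

Calling it with the ID of the original or of any retest yields the same chain, and an unknown ID yields an empty list.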

11. Files with no SQLAlchemy performance issues

The following service files were reviewed and do not exhibit the targeted problems:

File	Notes
`audit_service.py`	Single insert per call, no loops
`atomic_import_service.py`	Pre-loads existing_ids, no N+1
`caldera_import_service.py`	Pre-loads existing_ids, no N+1
`compliance_import_service.py`	Pre-loads all_techniques, existing_controls, existing_mappings
`elastic_import_service.py`	Pre-loads existing_ids
`intel_service.py`	Pre-loads techniques and existing_urls
`jira_service.py`	No db.query in loops
`lolbas_import_service.py`	Pre-loads existing_ids
`mitre_sync_service.py`	Pre-loads existing_techniques
`notification_service.py`	Queries are not in loops (create_notification is called in loops but does single insert)
`report_engine.py`	No database access
`score_cache.py`	No direct db queries
`sigma_import_service.py`	Pre-loads existing_ids
`stale_detection_service.py`	Single query with subquery, no N+1
`tempo_service.py`	Single query per call
`threat_actor_import_service.py`	Pre-loads existing_actors, technique_by_mitre_id, existing_rels
`worklog_service.py`	Simple CRUD, no loops

Summary Table

File	Function	Problem	Est. Extra Queries
operational_metrics_service	calculate_mttd	N+1	2×N (validated tests)
operational_metrics_service	calculate_mttr	N+1	N (remediated tests)
operational_metrics_service	get_operational_trend	N+1	~13–52 (weeks)
operational_metrics_service	calculate_rejection_rate	Redundant	5
scoring_service	calculate_organization_score	N+1	~3,500–4,000
scoring_service	calculate_tactic_score	N+1	5×N (tactic techniques)
scoring_service	calculate_actor_coverage_score	N+1	5×N (actor techniques)
scoring_service	calculate_technique_score	Multiple per call	5 per technique
d3fend_import_service	_upsert_techniques	N+1	N (techniques)
d3fend_import_service	import_d3fend_mappings	N+1	~200–500
d3fend_import_service	get_defenses_for_technique	Missing eager load	N (mappings)
report_generation_service	generate_purple_campaign_report	N+1	N (campaign tests)
osint_enrichment_service	enrich_technique_with_cves	N+1	~10 per technique
osint_enrichment_service	enrich_all_techniques	N+1 cascade	~7,000+
campaign_service	get_campaign_progress	Missing eager load	N (campaign tests)
campaign_service	generate_campaign_from_threat_actor	N+1	N (gap techniques)
campaign_scheduler_service	_clone_campaign	Missing eager load	N (campaign tests)
campaign_scheduler_service	check_and_run_recurring_campaigns	N+1	1 per campaign
snapshot_service	create_snapshot	N+1	~3,500+
status_service	recalculate_technique_status	Lazy load	1
test_workflow_service	get_retest_chain	Redundant	2

Recommended Fix Priority

P0 — scoring_service.py calculate_organization_score: ~3,500+ queries per call.
P0 — snapshot_service.py create_snapshot: ~3,500+ queries per snapshot.
P1 — operational_metrics_service.py calculate_mttd, calculate_mttr, get_operational_trend.
P1 — osint_enrichment_service.py enrich_technique_with_cves and enrich_all_techniques.
P2 — d3fend_import_service.py _upsert_techniques, import_d3fend_mappings, get_defenses_for_technique.
P2 — campaign_service.py and campaign_scheduler_service.py.
P3 — report_generation_service.py, test_workflow_service.py, status_service.py.
