Aegis/docs/SQLALCHEMY_PERFORMANCE_ANALYSIS.md
Kitos 0b65f51d1c
docs: update architecture analysis and tech debt docs to reflect resolved items
2026-02-18 19:27:52 +01:00


SQLAlchemy Performance Analysis — backend/app/services

Analysis Date: 2025-02-18 (updated February 18, 2026)
Scope: All Python files under backend/app/services/
Focus: N+1 queries, missing eager loading, redundant queries, queries in loops

Update (Feb 18, 2026): The most critical N+1 issues have been resolved:

  • scoring_service.py: bulk_technique_scores() now uses 5 aggregated subqueries instead of per-technique loops (~3,500 queries reduced to ~5).
  • heatmap_service.py: extracted to a dedicated service with batch fetching (test_counts and rule_counts come from 2 SQL subqueries instead of per-technique N+1 queries).
  • SATechniqueRepository.find_all_with_test_counts(): a single query with subqueries provides pre-aggregated counts for all techniques.
  • Missing database indexes added via Alembic migrations (b024, b026) covering tests, techniques, audit_logs, and detection_rules tables.
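The pre-aggregation pattern named in the bullets above can be sketched as follows. This is a minimal illustration, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database; the two-column models and the variable names are hypothetical stand-ins, not the actual Aegis schema or the real find_all_with_test_counts() implementation.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Technique(Base):  # hypothetical, reduced model
    __tablename__ = "techniques"
    id = Column(Integer, primary_key=True)
    mitre_id = Column(String, nullable=False)


class Test(Base):  # hypothetical, reduced model
    __tablename__ = "tests"
    id = Column(Integer, primary_key=True)
    technique_id = Column(Integer, ForeignKey("techniques.id"), nullable=False)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as db:
    db.add_all([Technique(id=1, mitre_id="T1059"), Technique(id=2, mitre_id="T1003")])
    db.add_all([Test(technique_id=1), Test(technique_id=1)])
    db.commit()

    # One grouped subquery computes the test count for every technique at once.
    test_counts = (
        db.query(
            Test.technique_id.label("technique_id"),
            func.count(Test.id).label("n_tests"),
        )
        .group_by(Test.technique_id)
        .subquery()
    )

    # The outer join keeps techniques that have no tests; coalesce turns NULL into 0.
    rows = (
        db.query(Technique.mitre_id, func.coalesce(test_counts.c.n_tests, 0))
        .outerjoin(test_counts, test_counts.c.technique_id == Technique.id)
        .all()
    )
    counts_by_technique = {mitre_id: n for mitre_id, n in rows}
```

The query count stays constant as the number of techniques grows, which is the property the per-technique loops lacked.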

Executive Summary

| Severity | Count | Files Affected |
|---|---|---|
| Critical (N+1) | 12 | 8 files |
| High (Missing eager loading) | 4 | 4 files |
| Medium (Redundant queries) | 3 | 3 files |

1. operational_metrics_service.py

1.1 calculate_mttd — N+1 query problem

Lines: 44-79
Problem type: N+1 — 2 queries per test inside loop

tests = db.query(Test).filter(Test.state == TestState.validated).all()
for test in tests:
    red_start = db.query(AuditLog.timestamp).filter(...).first()   # Query per test
    blue_start = db.query(AuditLog.timestamp).filter(...).first()   # Query per test

Extra queries: 2 × N (N = number of validated tests)
Fix: Use a single query with func.max and case to get both timestamps per test, or batch-fetch all audit log entries for the test IDs in one query.
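A sketch of the first option (func.max combined with case, grouped by test) is shown here. It assumes SQLAlchemy 1.4 or later; the action column and the values "red_started" and "blue_started" are hypothetical, because the original filters are elided in the excerpt.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String, case, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class AuditLog(Base):  # hypothetical, reduced model
    __tablename__ = "audit_logs"
    id = Column(Integer, primary_key=True)
    entity_id = Column(String, nullable=False)
    action = Column(String, nullable=False)  # hypothetical column
    timestamp = Column(DateTime, nullable=False)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as db:
    db.add_all([
        AuditLog(entity_id="1", action="red_started", timestamp=datetime(2026, 1, 1, 9)),
        AuditLog(entity_id="1", action="blue_started", timestamp=datetime(2026, 1, 1, 13)),
        AuditLog(entity_id="2", action="red_started", timestamp=datetime(2026, 1, 2, 9)),
    ])
    db.commit()

    test_ids = ["1", "2"]  # str(test.id) for every validated test

    # case() without else_ yields NULL for non-matching rows, and MAX ignores NULLs,
    # so each aggregate only sees the rows for its own action.
    rows = (
        db.query(
            AuditLog.entity_id,
            func.max(case((AuditLog.action == "red_started", AuditLog.timestamp))),
            func.max(case((AuditLog.action == "blue_started", AuditLog.timestamp))),
        )
        .filter(AuditLog.entity_id.in_(test_ids))
        .group_by(AuditLog.entity_id)
        .all()
    )

    # entity_id -> (red_start, blue_start); a missing event shows up as None.
    starts = {entity_id: (red, blue) for entity_id, red, blue in rows}
```

Whether max or min is the right aggregate depends on which audit entry the original .first() calls were meant to pick; that choice is not visible in the excerpt.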


1.2 calculate_mttr — N+1 query problem

Lines: 86-123
Problem type: N+1 — 1 query per test inside loop

tests = db.query(Test).filter(...).all()
for test in tests:
    remediation_complete = db.query(AuditLog.timestamp).filter(
        AuditLog.entity_id == str(test.id), ...
    ).first()

Extra queries: N (N = tests with completed remediation)
Fix: Batch-fetch audit log entries for all test IDs in one query, then build a lookup dict.
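The batch-fetch-then-lookup approach might look like the sketch below. It assumes SQLAlchemy 1.4 or later; the action column, the value "remediation_completed", and the reduced models are hypothetical, since the real filter conditions are elided above.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Test(Base):  # hypothetical, reduced model
    __tablename__ = "tests"
    id = Column(Integer, primary_key=True)
    state = Column(String, nullable=False)


class AuditLog(Base):  # hypothetical, reduced model
    __tablename__ = "audit_logs"
    id = Column(Integer, primary_key=True)
    entity_id = Column(String, nullable=False)
    action = Column(String, nullable=False)  # hypothetical column
    timestamp = Column(DateTime, nullable=False)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as db:
    db.add_all([Test(id=1, state="validated"), Test(id=2, state="validated")])
    db.add_all([
        AuditLog(entity_id="1", action="remediation_completed", timestamp=datetime(2026, 1, 5)),
        AuditLog(entity_id="1", action="remediation_completed", timestamp=datetime(2026, 1, 9)),
    ])
    db.commit()

    tests = db.query(Test).filter(Test.state == "validated").all()
    test_ids = [str(t.id) for t in tests]

    # One query for every test's audit entries, instead of one query per test.
    rows = (
        db.query(AuditLog.entity_id, AuditLog.timestamp)
        .filter(
            AuditLog.entity_id.in_(test_ids),
            AuditLog.action == "remediation_completed",
        )
        .order_by(AuditLog.timestamp)
        .all()
    )

    # Rows arrive oldest first, so setdefault keeps the earliest entry per test.
    completed_at = {}
    for entity_id, ts in rows:
        completed_at.setdefault(entity_id, ts)

    # The per-test loop now performs dictionary lookups only.
    remediation_times = {t.id: completed_at.get(str(t.id)) for t in tests}
```

For very large ID lists it may be worth chunking the IN clause or joining against the tests table instead, depending on the database's parameter limits.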


1.3 get_operational_trend — N+1 query problem

Lines: 354-392
Problem type: N+1 — 1 query per week inside loop

while current < now:
    validated_up_to = db.query(Test).filter(
        Test.state == TestState.validated,
        Test.red_validated_at <= week_end,
    ).all()
    # ... process ...
    current = week_end

Extra queries: ~13 (for 90-day period) or ~52 (for 1-year period)
Fix: Single query with date_trunc and group_by to get counts per week, or fetch all validated tests once and filter in Python.
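The second option (fetch once, filter in Python) can be illustrated without a database. The sketch below assumes week_end is current plus 7 days, which the excerpt does not show, and uses a hypothetical helper name; the input list stands for the result of a single query on Test.red_validated_at.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def cumulative_validated_per_week(validated_at, start, now):
    """Return (week_end, number of timestamps <= week_end) for each week in [start, now)."""
    stamps = sorted(validated_at)  # sort once so every week is a binary search
    trend = []
    current = start
    while current < now:
        week_end = current + timedelta(days=7)  # assumed week length
        # bisect_right counts the entries <= week_end, mirroring the original filter.
        trend.append((week_end, bisect_right(stamps, week_end)))
        current = week_end
    return trend


# Stand-in for: db.query(Test.red_validated_at).filter(Test.state == TestState.validated)
validated_at = [datetime(2026, 1, 2), datetime(2026, 1, 6), datetime(2026, 1, 12)]
trend = cumulative_validated_per_week(validated_at, datetime(2026, 1, 1), datetime(2026, 1, 15))
```

This keeps the loop but removes the database round trip from it: one query up front, then O(log n) work per week.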


1.4 calculate_rejection_rate — Redundant queries

Lines: 286-328
Problem type: Redundant — 6 separate count queries that could be combined

validated_count = db.query(func.count(Test.id)).filter(...).scalar()
rejected_count = db.query(func.count(Test.id)).filter(...).scalar()
red_rejected = db.query(func.count(Test.id)).filter(...).scalar()
red_total = db.query(func.count(Test.id)).filter(...).scalar()
blue_rejected = db.query(func.count(Test.id)).filter(...).scalar()
blue_total = db.query(func.count(Test.id)).filter(...).scalar()

Extra queries: 5 (the 6 counts could be collapsed into 1-2 queries with conditional aggregation)
Fix: Single query with func.count and case for each condition.
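Conditional aggregation of this kind could be written as in the sketch below. It assumes SQLAlchemy 1.4 or later; the rejected_by column and the string states are hypothetical, because the six original filters are elided, and only four of the six counts are shown.

```python
from sqlalchemy import Column, Integer, String, and_, case, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Test(Base):  # hypothetical, reduced model
    __tablename__ = "tests"
    id = Column(Integer, primary_key=True)
    state = Column(String, nullable=False)
    rejected_by = Column(String, nullable=True)  # hypothetical: "red", "blue", or NULL


def count_if(condition):
    # COUNT skips NULLs, and case() without else_ is NULL when the condition fails.
    return func.count(case((condition, Test.id)))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as db:
    db.add_all([
        Test(state="validated"),
        Test(state="validated"),
        Test(state="rejected", rejected_by="red"),
        Test(state="rejected", rejected_by="blue"),
        Test(state="rejected", rejected_by="blue"),
    ])
    db.commit()

    # All counters come back in a single row from a single statement.
    row = db.query(
        count_if(Test.state == "validated").label("validated"),
        count_if(Test.state == "rejected").label("rejected"),
        count_if(and_(Test.state == "rejected", Test.rejected_by == "red")).label("red_rejected"),
        count_if(and_(Test.state == "rejected", Test.rejected_by == "blue")).label("blue_rejected"),
    ).one()

    counts = {
        "validated": row.validated,
        "rejected": row.rejected,
        "red_rejected": row.red_rejected,
        "blue_rejected": row.blue_rejected,
    }
    rejection_rate = counts["rejected"] / (counts["validated"] + counts["rejected"])
```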


2. scoring_service.py

2.1 calculate_technique_score — Multiple queries per call

Lines: 26-204
Problem type: 5+ separate queries per technique (Tests, DetectionRule count, TestDetectionResult count, DefensiveTechniqueMapping count, Test.max)

Each call to calculate_technique_score executes:

  • 1 query for all_tests
  • 1 query for total_rules
  • 1 query for triggered_rules (if total_rules > 0)
  • 1 query for total_countermeasures
  • 1 query for most_recent_test

Extra queries per technique: ~5


2.2 calculate_tactic_score — N+1 via helper

Lines: 209-234
Problem type: Queries in loop — calls calculate_technique_score for each technique

techniques = db.query(Technique).filter(...).all()
for tech in techniques:
    result = calculate_technique_score(tech, db)  # 5+ queries each

Extra queries: 5 × N (N = techniques in tactic, often 10-50)


2.3 calculate_actor_coverage_score — N+1 via helper

Lines: 241-293
Problem type: Queries in loop — calls calculate_technique_score for each technique

for tech in techniques:
    result = calculate_technique_score(tech, db)

Extra queries: 5 × N (N = techniques used by actor)


2.4 calculate_organization_score — Severe N+1

Lines: 300-309
Problem type: Queries in loop — calls calculate_technique_score for every technique

all_techniques = db.query(Technique).all()
for tech in all_techniques:
    result = calculate_technique_score(tech, db)

Extra queries: 5 × N where N = total techniques (~700-800) → ~3,500-4,000 queries


2.5 calculate_organization_score — Second N+1 loop

Lines: 352-355
Problem type: Queries in loop — second pass over critical techniques

for tech in critical_techniques:
    result = calculate_technique_score(tech, db)

Extra queries: 5 × M (M = critical techniques, ~50-200)


3. d3fend_import_service.py

3.1 _upsert_techniques — N+1 query problem

Lines: 90-96
Problem type: N+1 — 1 query per technique in loop

for tech_data in techniques:
    existing = db.query(DefensiveTechnique).filter(
        DefensiveTechnique.d3fend_id == tech_data["d3fend_id"]
    ).first()

Extra queries: N (N = number of D3FEND techniques, ~50-100)

Fix: Pre-load all existing techniques into a dict keyed by d3fend_id before the loop.
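A possible shape for the pre-loaded upsert is sketched here, assuming SQLAlchemy 1.4 or later. The name column, the function signature, and the placeholder IDs are hypothetical; only d3fend_id appears in the original excerpt.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class DefensiveTechnique(Base):  # hypothetical, reduced model
    __tablename__ = "defensive_techniques"
    id = Column(Integer, primary_key=True)
    d3fend_id = Column(String, unique=True, nullable=False)
    name = Column(String, nullable=False)  # hypothetical column


def upsert_techniques(db, techniques):
    """Insert or update techniques using one SELECT instead of one per item."""
    existing = {t.d3fend_id: t for t in db.query(DefensiveTechnique).all()}
    created = updated = 0
    for tech_data in techniques:
        row = existing.get(tech_data["d3fend_id"])
        if row is None:
            row = DefensiveTechnique(d3fend_id=tech_data["d3fend_id"], name=tech_data["name"])
            db.add(row)
            existing[row.d3fend_id] = row  # also guards against duplicates in the batch
            created += 1
        else:
            row.name = tech_data["name"]
            updated += 1
    db.commit()
    return created, updated


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as db:
    db.add(DefensiveTechnique(d3fend_id="D3-AAA", name="old name"))
    db.commit()

    result = upsert_techniques(db, [
        {"d3fend_id": "D3-AAA", "name": "new name"},  # placeholder IDs, not real D3FEND entries
        {"d3fend_id": "D3-BBB", "name": "second"},
    ])
    names = {t.d3fend_id: t.name for t in db.query(DefensiveTechnique).all()}
```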


3.2 import_d3fend_mappings — N+1 query problem

Lines: 324-331
Problem type: N+1 — 1 query per (mitre_id, d3fend_id) pair in nested loop

for mitre_id, d3fend_ids in _ATTACK_TO_D3FEND.items():
    for d3fend_id in d3fend_ids:
        existing = db.query(DefensiveTechniqueMapping).filter(
            DefensiveTechniqueMapping.attack_technique_id == attack_tech.id,
            DefensiveTechniqueMapping.defensive_technique_id == def_tech.id,
        ).first()

Extra queries: ~200-500 (depends on mapping size)

Fix: Pre-load existing mappings into a set of (attack_tech_id, def_tech_id) tuples.
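The set-of-tuples check could look like the following sketch (SQLAlchemy 1.4 or later assumed). The function name and the candidate_pairs argument are hypothetical; in the real service the pairs would be resolved from _ATTACK_TO_D3FEND, a step that is omitted here.

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class DefensiveTechniqueMapping(Base):  # hypothetical, reduced model
    __tablename__ = "defensive_technique_mappings"
    id = Column(Integer, primary_key=True)
    attack_technique_id = Column(Integer, nullable=False)
    defensive_technique_id = Column(Integer, nullable=False)


def import_mappings(db, candidate_pairs):
    """Create missing (attack_id, defense_id) mappings with a single existence query."""
    M = DefensiveTechniqueMapping
    # Unpack each row into a plain tuple so membership tests against tuples work.
    existing = {
        (attack_id, defense_id)
        for attack_id, defense_id in db.query(M.attack_technique_id, M.defensive_technique_id)
    }
    created = 0
    for pair in candidate_pairs:
        if pair in existing:
            continue
        db.add(M(attack_technique_id=pair[0], defensive_technique_id=pair[1]))
        existing.add(pair)  # a pair repeated in the input is only inserted once
        created += 1
    db.commit()
    return created


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as db:
    db.add(DefensiveTechniqueMapping(attack_technique_id=1, defensive_technique_id=10))
    db.commit()

    created = import_mappings(db, [(1, 10), (1, 11), (2, 10), (1, 11)])
    total = db.query(DefensiveTechniqueMapping).count()
```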


3.3 get_defenses_for_technique — Missing eager loading

Lines: 428-453
Problem type: Lazy loading — accesses m.defensive_technique in loop

mappings = db.query(DefensiveTechniqueMapping).filter(...).all()
for m in mappings:
    dt = m.defensive_technique  # Lazy load per mapping

Extra queries: N (N = number of mappings for the technique)

Fix: Add joinedload(DefensiveTechniqueMapping.defensive_technique) to the query.
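The effect of joinedload can be observed by counting SELECT statements. The sketch below assumes SQLAlchemy 1.4 or later; the reduced models, the sample names, and the defense_names helper are hypothetical and exist only to compare the lazy and eager variants.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

Base = declarative_base()


class DefensiveTechnique(Base):  # hypothetical, reduced model
    __tablename__ = "defensive_techniques"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)


class DefensiveTechniqueMapping(Base):  # hypothetical, reduced model
    __tablename__ = "defensive_technique_mappings"
    id = Column(Integer, primary_key=True)
    attack_technique_id = Column(Integer, nullable=False)
    defensive_technique_id = Column(Integer, ForeignKey("defensive_techniques.id"), nullable=False)
    defensive_technique = relationship(DefensiveTechnique)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as db:
    db.add_all([DefensiveTechnique(id=1, name="Defense A"), DefensiveTechnique(id=2, name="Defense B")])
    db.add_all([
        DefensiveTechniqueMapping(attack_technique_id=7, defensive_technique_id=1),
        DefensiveTechniqueMapping(attack_technique_id=7, defensive_technique_id=2),
    ])
    db.commit()

selects = []


@event.listens_for(engine, "before_cursor_execute")
def record_select(conn, cursor, statement, parameters, context, executemany):
    # Record every SELECT the engine sends so the two variants can be compared.
    if statement.lstrip().upper().startswith("SELECT"):
        selects.append(statement)


def defense_names(eager):
    selects.clear()
    with Session(engine) as db:  # fresh session, so nothing is cached in the identity map
        query = db.query(DefensiveTechniqueMapping).filter(
            DefensiveTechniqueMapping.attack_technique_id == 7
        )
        if eager:
            query = query.options(joinedload(DefensiveTechniqueMapping.defensive_technique))
        names = sorted(m.defensive_technique.name for m in query.all())
    return names, len(selects)


lazy_names, lazy_selects = defense_names(eager=False)
eager_names, eager_selects = defense_names(eager=True)
```

Both variants return the same names; the eager one does so with a single joined SELECT, while the lazy one adds one SELECT per distinct related row.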


4. report_generation_service.py

4.1 generate_purple_campaign_report — N+1 query problem

Lines: 36-46
Problem type: N+1 — 1 query per test in loop

for test in campaign_tests:
    technique = db.query(Technique).filter(Technique.id == test.technique_id).first()

Extra queries: N (N = number of campaign tests)

Fix: Eager-load Technique when fetching campaign_tests, or batch-query techniques by IDs.


5. osint_enrichment_service.py

5.1 enrich_technique_with_cves — N+1 query problem

Lines: 59-75
Problem type: N+1 — 1 query per CVE in loop

for vuln in data.get("vulnerabilities", []):
    exists = db.query(OsintItem.id).filter(
        OsintItem.technique_id == technique.id,
        OsintItem.source_url.contains(cve_id),
    ).first()

Extra queries: Up to 10 per technique (resultsPerPage=10)


5.2 enrich_all_techniques — N+1 cascade

Lines: 134-153
Problem type: Queries in loop — calls enrich_technique_with_cves for each technique

techniques = db.query(Technique).all()
for i, tech in enumerate(techniques):
    total += enrich_technique_with_cves(db, tech)  # N+1 inside

Extra queries: ~10 × N (N = all techniques, ~700+)


6. campaign_service.py

6.1 get_campaign_progress — Missing eager loading

Lines: 74-92
Problem type: Lazy loading — accesses ct.test for each CampaignTest

campaign_tests = db.query(CampaignTest).filter(...).all()
for ct in campaign_tests:
    test = ct.test  # Lazy load per CampaignTest

Extra queries: N (N = campaign tests)

Fix: Add joinedload(CampaignTest.test) or selectinload(CampaignTest.test).
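As an illustration of the selectinload variant, the sketch below loads all related tests with one additional IN query and then tallies states in Python. It assumes SQLAlchemy 1.4 or later; the reduced models and the state strings are hypothetical.

```python
from collections import Counter

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload

Base = declarative_base()


class Test(Base):  # hypothetical, reduced model
    __tablename__ = "tests"
    id = Column(Integer, primary_key=True)
    state = Column(String, nullable=False)


class CampaignTest(Base):  # hypothetical, reduced model
    __tablename__ = "campaign_tests"
    id = Column(Integer, primary_key=True)
    campaign_id = Column(Integer, nullable=False)
    test_id = Column(Integer, ForeignKey("tests.id"), nullable=False)
    test = relationship(Test)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as db:
    db.add_all([
        Test(id=1, state="validated"),
        Test(id=2, state="validated"),
        Test(id=3, state="in_progress"),
    ])
    db.add_all([CampaignTest(campaign_id=1, test_id=i) for i in (1, 2, 3)])
    db.commit()

selects = []


@event.listens_for(engine, "before_cursor_execute")
def record_select(conn, cursor, statement, parameters, context, executemany):
    # Record SELECT statements issued after seeding.
    if statement.lstrip().upper().startswith("SELECT"):
        selects.append(statement)


with Session(engine) as db:
    campaign_tests = (
        db.query(CampaignTest)
        .options(selectinload(CampaignTest.test))  # one extra SELECT ... WHERE id IN (...)
        .filter(CampaignTest.campaign_id == 1)
        .all()
    )
    # Accessing ct.test no longer triggers a query per row.
    progress = Counter(ct.test.state for ct in campaign_tests)

select_count = len(selects)
```

joinedload would fold this into one statement; selectinload keeps the primary query free of joins at the cost of a second round trip, which tends to matter more for collections than for a many-to-one like this.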


6.2 generate_campaign_from_threat_actor — N+1 query problem

Lines: 155-168
Problem type: N+1 — 1 query per technique in loop

for tech, _at in gap_techniques:
    template = db.query(TestTemplate).filter(
        TestTemplate.mitre_technique_id == tech.mitre_id,
        ...
    ).first()

Extra queries: N (N = gap techniques for the actor)

Fix: Pre-load templates by mitre_id into a dict before the loop.


7. campaign_scheduler_service.py

7.1 _clone_campaign — Missing eager loading

Lines: 76-86
Problem type: Lazy loading — accesses ct.test for each CampaignTest

original_cts = db.query(CampaignTest).filter(...).all()
for ct in original_cts:
    src_test = ct.test  # Lazy load per CampaignTest

Extra queries: N (N = campaign tests)

Fix: Add joinedload(CampaignTest.test).


7.2 check_and_run_recurring_campaigns — N+1 query problem

Lines: 175-185
Problem type: N+1 — 1 query per campaign for red_tech users

for campaign in due_campaigns:
    # ... clone ...
    red_techs = db.query(User).filter(User.role == "red_tech", ...).all()
    for user in red_techs:
        create_notification(...)  # Also commits per notification

Extra queries: 1 per due campaign (for User query)
Note: create_notification does db.commit() each time — consider batching.
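One way to batch this is sketched below: fetch the recipients once, build every notification in memory, and commit once. It assumes SQLAlchemy 1.4 or later and that the elided filter conditions do not depend on the campaign; the Notification model, the message text, and notify_red_techs are hypothetical.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):  # hypothetical, reduced model
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    role = Column(String, nullable=False)


class Notification(Base):  # hypothetical model
    __tablename__ = "notifications"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, nullable=False)
    message = Column(String, nullable=False)


def notify_red_techs(db, campaign_names):
    """Queue one notification per (campaign, red_tech user) and commit once."""
    # Recipients are fetched once, outside the per-campaign loop.
    red_tech_ids = [user_id for (user_id,) in db.query(User.id).filter(User.role == "red_tech")]
    notifications = [
        Notification(user_id=user_id, message=f"Recurring campaign started: {name}")
        for name in campaign_names
        for user_id in red_tech_ids
    ]
    db.add_all(notifications)
    db.commit()  # a single commit for the whole batch
    return len(notifications)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as db:
    db.add_all([User(role="red_tech"), User(role="red_tech"), User(role="blue_tech")])
    db.commit()

    queued = notify_red_techs(db, ["Weekly A", "Weekly B"])
    stored = db.query(Notification).count()
```

A single commit also means a failure rolls back all notifications together, which may or may not be the desired behavior for the scheduler.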


8. snapshot_service.py

8.1 create_snapshot — Severe N+1 via helper

Lines: 41-77
Problem type: Queries in loop — calls calculate_technique_score for every technique

techniques = db.query(Technique).all()
for tech in techniques:
    score_data = calculate_technique_score(tech, db)  # 5+ queries each

Extra queries: 5 × N (N = all techniques, ~700+) → ~3,500+ queries


9. status_service.py

9.1 recalculate_technique_status — Potential lazy loading

Lines: 28-29
Problem type: Missing eager loading — accesses technique.tests

tests = technique.tests  # Lazy load if technique was loaded without tests

Extra queries: 1 (if technique was loaded without selectinload(Technique.tests))

Note: Caller-dependent; if technique comes from a query without eager loading, this triggers 1 extra query.


10. test_workflow_service.py

10.1 get_retest_chain — Redundant queries

Lines: 416-428
Problem type: Redundant (3 separate queries that could be 1-2)

test = db.query(Test).filter(Test.id == tid).first()
original = db.query(Test).filter(Test.id == original_id).first()
retests = db.query(Test).filter(Test.retest_of == original_id).order_by(...).all()

Fix: Fetch the original and all of its retests in one query (Test.id == original_id OR Test.retest_of == original_id). The first test fetch is only needed to determine original_id; it can be reduced to a scalar lookup or folded into the same statement with a CTE, UNION, or subquery.
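A two-query version of this is sketched below, assuming SQLAlchemy 1.4 or later and that retest_of always points at the root original. The ordering by id is an assumption, since the original order_by argument is elided, and the function signature is hypothetical.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, func, or_
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Test(Base):  # hypothetical, reduced model
    __tablename__ = "tests"
    id = Column(Integer, primary_key=True)
    retest_of = Column(Integer, ForeignKey("tests.id"), nullable=True)


def get_retest_chain(db, tid):
    """Return (original, retests) using a scalar lookup plus one chain query."""
    # Query 1: resolve the root ID without loading a full Test object.
    original_id = (
        db.query(func.coalesce(Test.retest_of, Test.id)).filter(Test.id == tid).scalar()
    )
    if original_id is None:
        return None, []

    # Query 2: the original and all of its retests together.
    chain = (
        db.query(Test)
        .filter(or_(Test.id == original_id, Test.retest_of == original_id))
        .order_by(Test.id)  # assumed ordering
        .all()
    )
    original = next((t for t in chain if t.id == original_id), None)
    retests = [t for t in chain if t.id != original_id]
    return original, retests


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as db:
    db.add_all([Test(id=1), Test(id=2, retest_of=1), Test(id=3, retest_of=1), Test(id=4)])
    db.commit()

    original, retests = get_retest_chain(db, 3)
    chain_ids = (original.id, [t.id for t in retests])
    missing = get_retest_chain(db, 99)
```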


11. Files with no SQLAlchemy performance issues

The following service files were reviewed and do not exhibit the targeted problems:

| File | Notes |
|---|---|
| audit_service.py | Single insert per call, no loops |
| atomic_import_service.py | Pre-loads existing_ids, no N+1 |
| caldera_import_service.py | Pre-loads existing_ids, no N+1 |
| compliance_import_service.py | Pre-loads all_techniques, existing_controls, existing_mappings |
| elastic_import_service.py | Pre-loads existing_ids |
| intel_service.py | Pre-loads techniques and existing_urls |
| jira_service.py | No db.query in loops |
| lolbas_import_service.py | Pre-loads existing_ids |
| mitre_sync_service.py | Pre-loads existing_techniques |
| notification_service.py | Queries are not in loops (create_notification is called in loops but does a single insert) |
| report_engine.py | No database access |
| score_cache.py | No direct db queries |
| sigma_import_service.py | Pre-loads existing_ids |
| stale_detection_service.py | Single query with subquery, no N+1 |
| tempo_service.py | Single query per call |
| threat_actor_import_service.py | Pre-loads existing_actors, technique_by_mitre_id, existing_rels |
| worklog_service.py | Simple CRUD, no loops |

Summary Table

| File | Function | Problem | Est. Extra Queries |
|---|---|---|---|
| operational_metrics_service | calculate_mttd | N+1 | 2×N (validated tests) |
| operational_metrics_service | calculate_mttr | N+1 | N (remediated tests) |
| operational_metrics_service | get_operational_trend | N+1 | ~13-52 (weeks) |
| operational_metrics_service | calculate_rejection_rate | Redundant | 5 |
| scoring_service | calculate_organization_score | N+1 | ~3,500-4,000 |
| scoring_service | calculate_tactic_score | N+1 | 5×N (tactic techniques) |
| scoring_service | calculate_actor_coverage_score | N+1 | 5×N (actor techniques) |
| scoring_service | calculate_technique_score | Multiple per call | 5 per technique |
| d3fend_import_service | _upsert_techniques | N+1 | N (techniques) |
| d3fend_import_service | import_d3fend_mappings | N+1 | ~200-500 |
| d3fend_import_service | get_defenses_for_technique | Missing eager load | N (mappings) |
| report_generation_service | generate_purple_campaign_report | N+1 | N (campaign tests) |
| osint_enrichment_service | enrich_technique_with_cves | N+1 | ~10 per technique |
| osint_enrichment_service | enrich_all_techniques | N+1 cascade | ~7,000+ |
| campaign_service | get_campaign_progress | Missing eager load | N (campaign tests) |
| campaign_service | generate_campaign_from_threat_actor | N+1 | N (gap techniques) |
| campaign_scheduler_service | _clone_campaign | Missing eager load | N (campaign tests) |
| campaign_scheduler_service | check_and_run_recurring_campaigns | N+1 | 1 per campaign |
| snapshot_service | create_snapshot | N+1 | ~3,500+ |
| status_service | recalculate_technique_status | Lazy load | 1 |
| test_workflow_service | get_retest_chain | Redundant | 2 |

  1. P0 — scoring_service.py calculate_organization_score: ~3,500+ queries per call.
  2. P0 — snapshot_service.py create_snapshot: ~3,500+ queries per snapshot.
  3. P1 — operational_metrics_service.py calculate_mttd, calculate_mttr, get_operational_trend.
  4. P1 — osint_enrichment_service.py enrich_technique_with_cves and enrich_all_techniques.
  5. P2 — d3fend_import_service.py _upsert_techniques, import_d3fend_mappings, get_defenses_for_technique.
  6. P2 — campaign_service.py and campaign_scheduler_service.py.
  7. P3 — report_generation_service.py, test_workflow_service.py, status_service.py.