# SQLAlchemy Performance Analysis — backend/app/services
**Analysis Date:** 2025-02-18 (updated February 18, 2026)
**Scope:** All Python files under `backend/app/services/`
**Focus:** N+1 queries, missing eager loading, redundant queries, queries in loops
> **Update (Feb 18, 2026):** The most critical N+1 issues have been resolved:
> - `scoring_service.py` — `bulk_technique_scores()` now uses 5 aggregated subqueries instead of per-technique loops (~3,500 queries reduced to ~5).
> - `heatmap_service.py` — Extracted to a dedicated service with batch-fetching (`test_counts`, `rule_counts` in 2 SQL subqueries instead of per-technique N+1).
> - `SATechniqueRepository.find_all_with_test_counts()` — Single query with subqueries providing pre-aggregated counts for all techniques.
> - Missing database indexes added via Alembic migrations (b024, b026) covering `tests`, `techniques`, `audit_logs`, and `detection_rules` tables.
---
## Executive Summary
| Severity | Count | Files Affected |
|----------|-------|----------------|
| **Critical (N+1)** | 12 | 8 files |
| **High (Missing eager loading)** | 4 | 4 files |
| **Medium (Redundant queries)** | 3 | 3 files |
---
## 1. operational_metrics_service.py
### 1.1 `calculate_mttd` — N+1 query problem
**Lines:** 44-79
**Problem type:** N+1 — 2 queries per test inside loop
```python
tests = db.query(Test).filter(Test.state == TestState.validated).all()
for test in tests:
    red_start = db.query(AuditLog.timestamp).filter(...).first()  # Query per test
    blue_start = db.query(AuditLog.timestamp).filter(...).first()  # Query per test
```
**Extra queries:** 2 × N (N = number of validated tests)
**Fix:** Use a single query with `func.max` and `case` to get both timestamps per test, or batch-fetch all audit log entries for the test IDs in one query.
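A minimal sketch of the conditional-aggregation variant, assuming a simplified `AuditLog` model and hypothetical action names `red_start` and `blue_start` (the real filter conditions are elided above). It returns both timestamps for every test in one grouped query. Whether `func.max` or `func.min` is appropriate depends on which audit entry the metric should use.

```python
from sqlalchemy import Column, DateTime, Integer, String, case, func
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class AuditLog(Base):
    """Simplified stand-in; the real model has more columns."""

    __tablename__ = "audit_logs"
    id = Column(Integer, primary_key=True)
    entity_id = Column(String, index=True)
    action = Column(String)  # assumed values: "red_start", "blue_start"
    timestamp = Column(DateTime)


def start_times_by_test(db, test_ids):
    """Return {entity_id: (red_start, blue_start)} from one grouped query."""
    # CASE without an ELSE yields NULL, and MAX ignores NULLs, so each
    # aggregate only sees the timestamps of its own action.
    red = func.max(case((AuditLog.action == "red_start", AuditLog.timestamp)))
    blue = func.max(case((AuditLog.action == "blue_start", AuditLog.timestamp)))
    rows = (
        db.query(AuditLog.entity_id, red, blue)
        .filter(AuditLog.entity_id.in_([str(t) for t in test_ids]))
        .group_by(AuditLog.entity_id)
        .all()
    )
    return {entity_id: (r, b) for entity_id, r, b in rows}
```

Inside `calculate_mttd`, the loop could then read `starts.get(str(test.id))` from the returned dict instead of issuing two queries per test.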
---
### 1.2 `calculate_mttr` — N+1 query problem
**Lines:** 86-123
**Problem type:** N+1 — 1 query per test inside loop
```python
tests = db.query(Test).filter(...).all()
for test in tests:
    remediation_complete = db.query(AuditLog.timestamp).filter(
        AuditLog.entity_id == str(test.id), ...
    ).first()
```
**Extra queries:** N (N = tests with completed remediation)
**Fix:** Batch-fetch audit log entries for all test IDs in one query, then build a lookup dict.
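The batch-fetch approach might look like the following sketch. The `AuditLog` model is a simplified stand-in, and the `remediation_complete` action name is an assumption because the original filter is elided.

```python
from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class AuditLog(Base):
    """Simplified stand-in; the real model has more columns."""

    __tablename__ = "audit_logs"
    id = Column(Integer, primary_key=True)
    entity_id = Column(String, index=True)
    action = Column(String)  # assumed value: "remediation_complete"
    timestamp = Column(DateTime)


def remediation_times(db, test_ids):
    """Return {entity_id: earliest remediation timestamp} using one query."""
    rows = (
        db.query(AuditLog.entity_id, AuditLog.timestamp)
        .filter(
            AuditLog.entity_id.in_([str(t) for t in test_ids]),
            AuditLog.action == "remediation_complete",
        )
        .order_by(AuditLog.timestamp)
        .all()
    )
    lookup = {}
    for entity_id, timestamp in rows:
        # Rows arrive oldest first, so setdefault keeps the earliest entry.
        lookup.setdefault(entity_id, timestamp)
    return lookup
```

For very large ID lists, the `IN` clause may need to be chunked to stay within the database's parameter limit.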
---
### 1.3 `get_operational_trend` — N+1 query problem
**Lines:** 354-392
**Problem type:** N+1 — 1 query per week inside loop
```python
while current < now:
    validated_up_to = db.query(Test).filter(
        Test.state == TestState.validated,
        Test.red_validated_at <= week_end,
    ).all()
    # ... process ...
    current = week_end
```
**Extra queries:** ~13 (for 90-day period) or ~52 (for 1-year period)
**Fix:** Single query with `date_trunc` and `group_by` to get counts per week, or fetch all validated tests once and filter in Python.
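Where `date_trunc` is unavailable (it is a PostgreSQL function), the fetch-once alternative can be sketched in plain Python. This assumes the loop advances in 7-day windows and counts tests validated on or before each `week_end`, as the excerpt suggests. `weekly_cumulative_counts` is a hypothetical helper name, and its input would come from a single query for `Test.red_validated_at`.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def weekly_cumulative_counts(validated_at, start, end):
    """Return [(week_end, tests validated on or before week_end), ...]."""
    stamps = sorted(validated_at)
    points = []
    current = start
    while current < end:
        week_end = min(current + timedelta(days=7), end)
        # bisect_right counts the timestamps that are <= week_end.
        points.append((week_end, bisect_right(stamps, week_end)))
        current = week_end
    return points


if __name__ == "__main__":
    stamps = [datetime(2026, 1, 2), datetime(2026, 1, 10)]
    for week_end, count in weekly_cumulative_counts(
        stamps, datetime(2026, 1, 1), datetime(2026, 1, 15)
    ):
        print(week_end.date(), count)
```

Sorting once and bisecting per week keeps the work at O(N log N) for N tests, with no additional database round trips.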
---
### 1.4 `calculate_rejection_rate` — Redundant queries
**Lines:** 286-328
**Problem type:** Redundant — 6 separate count queries that could be combined
```python
validated_count = db.query(func.count(Test.id)).filter(...).scalar()
rejected_count = db.query(func.count(Test.id)).filter(...).scalar()
red_rejected = db.query(func.count(Test.id)).filter(...).scalar()
red_total = db.query(func.count(Test.id)).filter(...).scalar()
blue_rejected = db.query(func.count(Test.id)).filter(...).scalar()
blue_total = db.query(func.count(Test.id)).filter(...).scalar()
```
**Extra queries:** 5 (the six counts could be computed in 1-2 queries with conditional aggregation)
**Fix:** Single query with `func.count` and `case` for each condition.
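A sketch of the combined query, assuming a simplified `Test` model. The `state` and `team` columns and their values are hypothetical, since the six original filters are elided.

```python
from sqlalchemy import Column, Integer, String, and_, case, func
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Test(Base):
    """Simplified stand-in for the real Test model."""

    __tablename__ = "tests"
    id = Column(Integer, primary_key=True)
    state = Column(String)  # assumed values: "validated", "rejected"
    team = Column(String)   # assumed values: "red", "blue"


def count_if(condition):
    # COUNT skips NULLs, so only rows matching the condition are counted.
    return func.count(case((condition, Test.id)))


def rejection_counts(db):
    """Compute all six counters in a single SELECT."""
    rejected = Test.state == "rejected"
    row = db.query(
        count_if(Test.state == "validated"),
        count_if(rejected),
        count_if(and_(Test.team == "red", rejected)),
        count_if(Test.team == "red"),
        count_if(and_(Test.team == "blue", rejected)),
        count_if(Test.team == "blue"),
    ).one()
    keys = ("validated", "rejected", "red_rejected", "red_total",
            "blue_rejected", "blue_total")
    return dict(zip(keys, row))
```

An aggregate query without `GROUP BY` always returns exactly one row, so `.one()` is safe even when the table is empty.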
---
## 2. scoring_service.py
### 2.1 `calculate_technique_score` — Multiple queries per call
**Lines:** 26-204
**Problem type:** 5+ separate queries per technique (Tests, DetectionRule count, TestDetectionResult count, DefensiveTechniqueMapping count, Test.max)
Each call to `calculate_technique_score` executes:
- 1 query for `all_tests`
- 1 query for `total_rules`
- 1 query for `triggered_rules` (if total_rules > 0)
- 1 query for `total_countermeasures`
- 1 query for `most_recent_test`
**Extra queries per technique:** ~5
---
### 2.2 `calculate_tactic_score` — N+1 via helper
**Lines:** 209-234
**Problem type:** Queries in loop — calls `calculate_technique_score` for each technique
```python
techniques = db.query(Technique).filter(...).all()
for tech in techniques:
    result = calculate_technique_score(tech, db)  # 5+ queries each
```
**Extra queries:** 5 × N (N = techniques in tactic, often 10-50)
---
### 2.3 `calculate_actor_coverage_score` — N+1 via helper
**Lines:** 241-293
**Problem type:** Queries in loop — calls `calculate_technique_score` for each technique
```python
for tech in techniques:
    result = calculate_technique_score(tech, db)
```
**Extra queries:** 5 × N (N = techniques used by actor)
---
### 2.4 `calculate_organization_score` — Severe N+1
**Lines:** 300-309
**Problem type:** Queries in loop — calls `calculate_technique_score` for every technique
```python
all_techniques = db.query(Technique).all()
for tech in all_techniques:
    result = calculate_technique_score(tech, db)
```
**Extra queries:** 5 × N where N = total techniques (~700-800) → **~3,500-4,000 queries**
---
### 2.5 `calculate_organization_score` — Second N+1 loop
**Lines:** 352-355
**Problem type:** Queries in loop — second pass over critical techniques
```python
for tech in critical_techniques:
    result = calculate_technique_score(tech, db)
```
**Extra queries:** 5 × M (M = critical techniques, ~50-200)
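The per-technique loops in sections 2.2 to 2.5 share one remedy: compute each metric for all techniques in a grouped subquery and join it back once. The sketch below shows the pattern for test counts only, using simplified stand-in models. `count_tests_by_technique` is a hypothetical name and is not the project's `bulk_technique_scores()`.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, func
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Technique(Base):
    """Simplified stand-in for the real Technique model."""

    __tablename__ = "techniques"
    id = Column(Integer, primary_key=True)
    mitre_id = Column(String)


class Test(Base):
    """Simplified stand-in for the real Test model."""

    __tablename__ = "tests"
    id = Column(Integer, primary_key=True)
    technique_id = Column(Integer, ForeignKey("techniques.id"))


def count_tests_by_technique(db):
    """Return {technique_id: test count}, including techniques with no tests."""
    counts = (
        db.query(
            Test.technique_id.label("technique_id"),
            func.count(Test.id).label("n"),
        )
        .group_by(Test.technique_id)
        .subquery()
    )
    rows = (
        db.query(Technique.id, func.coalesce(counts.c.n, 0))
        # The outer join keeps techniques that have no matching tests.
        .outerjoin(counts, counts.c.technique_id == Technique.id)
        .all()
    )
    return {technique_id: n for technique_id, n in rows}
```

Repeating this pattern once per metric yields a fixed number of queries regardless of how many techniques exist; the scoring functions can then read from the resulting dicts.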
---
## 3. d3fend_import_service.py
### 3.1 `_upsert_techniques` — N+1 query problem
**Lines:** 90-96
**Problem type:** N+1 — 1 query per technique in loop
```python
for tech_data in techniques:
    existing = db.query(DefensiveTechnique).filter(
        DefensiveTechnique.d3fend_id == tech_data["d3fend_id"]
    ).first()
```
**Extra queries:** N (N = number of D3FEND techniques, ~50-100)
**Fix:** Pre-load all existing techniques into a dict keyed by `d3fend_id` before the loop.
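A sketch of the pre-load pattern, assuming a simplified `DefensiveTechnique` model with only `d3fend_id` and `name`. The update logic of the real `_upsert_techniques` is not shown in the excerpt, so the assignment to `name` is illustrative.

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class DefensiveTechnique(Base):
    """Simplified stand-in for the real model."""

    __tablename__ = "defensive_techniques"
    id = Column(Integer, primary_key=True)
    d3fend_id = Column(String, unique=True)
    name = Column(String)


def upsert_techniques(db, techniques):
    """Insert or update techniques with one SELECT instead of one per item."""
    existing = {t.d3fend_id: t for t in db.query(DefensiveTechnique).all()}
    created = 0
    for tech_data in techniques:
        row = existing.get(tech_data["d3fend_id"])
        if row is None:
            row = DefensiveTechnique(d3fend_id=tech_data["d3fend_id"])
            db.add(row)
            # Register the new row so duplicates within the batch reuse it.
            existing[row.d3fend_id] = row
            created += 1
        row.name = tech_data["name"]
    db.commit()
    return created
```

The same dict-based lookup applies wherever a loop checks for an existing row by a natural key.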
---
### 3.2 `import_d3fend_mappings` — N+1 query problem
**Lines:** 324-331
**Problem type:** N+1 — 1 query per (mitre_id, d3fend_id) pair in nested loop
```python
for mitre_id, d3fend_ids in _ATTACK_TO_D3FEND.items():
    for d3fend_id in d3fend_ids:
        existing = db.query(DefensiveTechniqueMapping).filter(
            DefensiveTechniqueMapping.attack_technique_id == attack_tech.id,
            DefensiveTechniqueMapping.defensive_technique_id == def_tech.id,
        ).first()
```
**Extra queries:** ~200-500 (depends on mapping size)
**Fix:** Pre-load existing mappings into a set of `(attack_tech_id, def_tech_id)` tuples.
---
### 3.3 `get_defenses_for_technique` — Missing eager loading
**Lines:** 428-453
**Problem type:** Lazy loading — accesses `m.defensive_technique` in loop
```python
mappings = db.query(DefensiveTechniqueMapping).filter(...).all()
for m in mappings:
    dt = m.defensive_technique  # Lazy load per mapping
```
**Extra queries:** N (N = number of mappings for the technique)
**Fix:** Add `joinedload(DefensiveTechniqueMapping.defensive_technique)` to the query.
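A minimal sketch of the eager-loading fix, using simplified stand-in models. `defense_names` is a hypothetical function that returns only names; the real function presumably builds a richer result.

```python
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, joinedload, relationship

Base = declarative_base()


class DefensiveTechnique(Base):
    """Simplified stand-in for the real model."""

    __tablename__ = "defensive_techniques"
    id = Column(Integer, primary_key=True)
    name = Column(String)


class DefensiveTechniqueMapping(Base):
    """Simplified stand-in for the real model."""

    __tablename__ = "defensive_technique_mappings"
    id = Column(Integer, primary_key=True)
    attack_technique_id = Column(Integer)
    defensive_technique_id = Column(Integer, ForeignKey("defensive_techniques.id"))
    defensive_technique = relationship(DefensiveTechnique)


def defense_names(db, attack_technique_id):
    """Load mappings and their defensive techniques in one SELECT."""
    mappings = (
        db.query(DefensiveTechniqueMapping)
        # joinedload adds a LEFT OUTER JOIN, so the related rows arrive
        # with the mappings and no lazy load fires inside the loop.
        .options(joinedload(DefensiveTechniqueMapping.defensive_technique))
        .filter(DefensiveTechniqueMapping.attack_technique_id == attack_technique_id)
        .all()
    )
    return [m.defensive_technique.name for m in mappings]
```

`joinedload` suits many-to-one relationships like this one. For collections, `selectinload` usually avoids row multiplication at the cost of one extra query. Listening to the engine's `before_cursor_execute` event is one way to confirm how many statements a function issues.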
---
## 4. report_generation_service.py
### 4.1 `generate_purple_campaign_report` — N+1 query problem
**Lines:** 36-46
**Problem type:** N+1 — 1 query per test in loop
```python
for test in campaign_tests:
    technique = db.query(Technique).filter(Technique.id == test.technique_id).first()
```
**Extra queries:** N (N = number of campaign tests)
**Fix:** Eager-load Technique when fetching campaign_tests, or batch-query techniques by IDs.
---
## 5. osint_enrichment_service.py
### 5.1 `enrich_technique_with_cves` — N+1 query problem
**Lines:** 59-75
**Problem type:** N+1 — 1 query per CVE in loop
```python
for vuln in data.get("vulnerabilities", []):
    exists = db.query(OsintItem.id).filter(
        OsintItem.technique_id == technique.id,
        OsintItem.source_url.contains(cve_id),
    ).first()
```
**Extra queries:** Up to 10 per technique (resultsPerPage=10)
---
### 5.2 `enrich_all_techniques` — N+1 cascade
**Lines:** 134-153
**Problem type:** Queries in loop — calls `enrich_technique_with_cves` for each technique
```python
techniques = db.query(Technique).all()
for i, tech in enumerate(techniques):
    total += enrich_technique_with_cves(db, tech)  # N+1 inside
```
**Extra queries:** ~10 × N (N = all techniques, ~700+)
---
## 6. campaign_service.py
### 6.1 `get_campaign_progress` — Missing eager loading
**Lines:** 74-92
**Problem type:** Lazy loading — accesses `ct.test` for each CampaignTest
```python
campaign_tests = db.query(CampaignTest).filter(...).all()
for ct in campaign_tests:
    test = ct.test  # Lazy load per CampaignTest
```
**Extra queries:** N (N = campaign tests)
**Fix:** Add `joinedload(CampaignTest.test)` or `selectinload(CampaignTest.test)`.
---
### 6.2 `generate_campaign_from_threat_actor` — N+1 query problem
**Lines:** 155-168
**Problem type:** N+1 — 1 query per technique in loop
```python
for tech, _at in gap_techniques:
    template = db.query(TestTemplate).filter(
        TestTemplate.mitre_technique_id == tech.mitre_id,
        ...
    ).first()
```
**Extra queries:** N (N = gap techniques for the actor)
**Fix:** Pre-load templates by mitre_id into a dict before the loop.
---
## 7. campaign_scheduler_service.py
### 7.1 `_clone_campaign` — Missing eager loading
**Lines:** 76-86
**Problem type:** Lazy loading — accesses `ct.test` for each CampaignTest
```python
original_cts = db.query(CampaignTest).filter(...).all()
for ct in original_cts:
    src_test = ct.test  # Lazy load per CampaignTest
```
**Extra queries:** N (N = campaign tests)
**Fix:** Add `joinedload(CampaignTest.test)`.
---
### 7.2 `check_and_run_recurring_campaigns` — N+1 query problem
**Lines:** 175-185
**Problem type:** N+1 — 1 query per campaign for red_tech users
```python
for campaign in due_campaigns:
    # ... clone ...
    red_techs = db.query(User).filter(User.role == "red_tech", ...).all()
    for user in red_techs:
        create_notification(...)  # Also commits per notification
```
**Extra queries:** 1 per due campaign (for User query)
**Note:** `create_notification` does `db.commit()` each time — consider batching.
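One way to address both points is to fetch the recipients once and commit the notifications as a batch. The sketch below assumes simplified `User` and `Notification` models and a hypothetical `notify_red_techs` helper. It bypasses `create_notification` and ignores the additional user filters that are elided in the excerpt.

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class User(Base):
    """Simplified stand-in for the real User model."""

    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    role = Column(String)


class Notification(Base):
    """Simplified stand-in for the real Notification model."""

    __tablename__ = "notifications"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer)
    message = Column(String)


def notify_red_techs(db, campaign_names):
    """Create one notification per (campaign, red_tech user) pair."""
    # Fetch the recipients once, outside the per-campaign loop.
    rows = db.query(User.id).filter(User.role == "red_tech").all()
    user_ids = [user_id for (user_id,) in rows]
    notifications = [
        Notification(user_id=user_id, message=f"Recurring campaign '{name}' is ready")
        for name in campaign_names
        for user_id in user_ids
    ]
    db.add_all(notifications)
    db.commit()  # a single commit covers the whole batch
    return len(notifications)
```

The trade-off is atomicity: with a single commit, a failure rolls back every notification in the batch rather than only the failing one.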
---
## 8. snapshot_service.py
### 8.1 `create_snapshot` — Severe N+1 via helper
**Lines:** 41-77
**Problem type:** Queries in loop — calls `calculate_technique_score` for every technique
```python
techniques = db.query(Technique).all()
for tech in techniques:
    score_data = calculate_technique_score(tech, db)  # 5+ queries each
```
**Extra queries:** 5 × N (N = all techniques, ~700+) → **~3,500+ queries**
---
## 9. status_service.py
### 9.1 `recalculate_technique_status` — Potential lazy loading
**Lines:** 28-29
**Problem type:** Missing eager loading — accesses `technique.tests`
```python
tests = technique.tests # Lazy load if technique was loaded without tests
```
**Extra queries:** 1 (if technique was loaded without `selectinload(Technique.tests)`)
**Note:** Caller-dependent; if technique comes from a query without eager loading, this triggers 1 extra query.
---
## 10. test_workflow_service.py
### 10.1 `get_retest_chain` — Redundant queries
**Lines:** 416-428
**Problem type:** Redundant — 3 separate queries that could be 1-2
```python
test = db.query(Test).filter(Test.id == tid).first()
original = db.query(Test).filter(Test.id == original_id).first()
retests = db.query(Test).filter(Test.retest_of == original_id).order_by(...).all()
```
**Fix:** Single query: get original by `original_id`, then get all retests in one query. The first test fetch is only needed to determine `original_id`; could use a CTE or single query with `UNION`/subquery.
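A sketch of the two-query variant, assuming a simplified `Test` model. How `original_id` is derived and the ordering by `Test.id` are assumptions, since the excerpt elides both.

```python
from sqlalchemy import Column, Integer, or_
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Test(Base):
    """Simplified stand-in for the real Test model."""

    __tablename__ = "tests"
    id = Column(Integer, primary_key=True)
    retest_of = Column(Integer, nullable=True)  # ID of the original test


def get_retest_chain(db, tid):
    """Return (original, retests) using two queries instead of three."""
    test = db.query(Test).filter(Test.id == tid).first()
    if test is None:
        return None, []
    # Assumption: a test without retest_of is itself the original.
    original_id = test.retest_of if test.retest_of is not None else test.id
    rows = (
        db.query(Test)
        .filter(or_(Test.id == original_id, Test.retest_of == original_id))
        .order_by(Test.id)
        .all()
    )
    original = next((t for t in rows if t.id == original_id), None)
    retests = [t for t in rows if t.id != original_id]
    return original, retests
```

The `or_` filter fetches the original and all of its retests together; splitting the rows in Python replaces the separate lookup of the original.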
---
## 11. Files with no SQLAlchemy performance issues
The following service files were reviewed and do **not** exhibit the targeted problems:
| File | Notes |
|------|-------|
| `audit_service.py` | Single insert per call, no loops |
| `atomic_import_service.py` | Pre-loads existing_ids, no N+1 |
| `caldera_import_service.py` | Pre-loads existing_ids, no N+1 |
| `compliance_import_service.py` | Pre-loads all_techniques, existing_controls, existing_mappings |
| `elastic_import_service.py` | Pre-loads existing_ids |
| `intel_service.py` | Pre-loads techniques and existing_urls |
| `jira_service.py` | No db.query in loops |
| `lolbas_import_service.py` | Pre-loads existing_ids |
| `mitre_sync_service.py` | Pre-loads existing_techniques |
| `notification_service.py` | Queries are not in loops (create_notification is called in loops but does single insert) |
| `report_engine.py` | No database access |
| `score_cache.py` | No direct db queries |
| `sigma_import_service.py` | Pre-loads existing_ids |
| `stale_detection_service.py` | Single query with subquery, no N+1 |
| `tempo_service.py` | Single query per call |
| `threat_actor_import_service.py` | Pre-loads existing_actors, technique_by_mitre_id, existing_rels |
| `worklog_service.py` | Simple CRUD, no loops |
---
## Summary Table
| File | Function | Problem | Est. Extra Queries |
|------|----------|---------|--------------------|
| operational_metrics_service | calculate_mttd | N+1 | 2×N (validated tests) |
| operational_metrics_service | calculate_mttr | N+1 | N (remediated tests) |
| operational_metrics_service | get_operational_trend | N+1 | ~13-52 (weeks) |
| operational_metrics_service | calculate_rejection_rate | Redundant | 5 |
| scoring_service | calculate_organization_score | N+1 | ~3,500-4,000 |
| scoring_service | calculate_tactic_score | N+1 | 5×N (tactic techniques) |
| scoring_service | calculate_actor_coverage_score | N+1 | 5×N (actor techniques) |
| scoring_service | calculate_technique_score | Multiple per call | 5 per technique |
| d3fend_import_service | _upsert_techniques | N+1 | N (techniques) |
| d3fend_import_service | import_d3fend_mappings | N+1 | ~200-500 |
| d3fend_import_service | get_defenses_for_technique | Missing eager load | N (mappings) |
| report_generation_service | generate_purple_campaign_report | N+1 | N (campaign tests) |
| osint_enrichment_service | enrich_technique_with_cves | N+1 | ~10 per technique |
| osint_enrichment_service | enrich_all_techniques | N+1 cascade | ~7,000+ |
| campaign_service | get_campaign_progress | Missing eager load | N (campaign tests) |
| campaign_service | generate_campaign_from_threat_actor | N+1 | N (gap techniques) |
| campaign_scheduler_service | _clone_campaign | Missing eager load | N (campaign tests) |
| campaign_scheduler_service | check_and_run_recurring_campaigns | N+1 | 1 per campaign |
| snapshot_service | create_snapshot | N+1 | ~3,500+ |
| status_service | recalculate_technique_status | Lazy load | 1 |
| test_workflow_service | get_retest_chain | Redundant | 2 |
---
## Recommended Fix Priority
1. **P0 — scoring_service.py** `calculate_organization_score`: ~3,500+ queries per call.
2. **P0 — snapshot_service.py** `create_snapshot`: ~3,500+ queries per snapshot.
3. **P1 — operational_metrics_service.py** `calculate_mttd`, `calculate_mttr`, `get_operational_trend`.
4. **P1 — osint_enrichment_service.py** `enrich_technique_with_cves` and `enrich_all_techniques`.
5. **P2 — d3fend_import_service.py** `_upsert_techniques`, `import_d3fend_mappings`, `get_defenses_for_technique`.
6. **P2 — campaign_service.py** and **campaign_scheduler_service.py**.
7. **P3 — report_generation_service.py**, **test_workflow_service.py**, **status_service.py**.