# SQLAlchemy Performance Analysis — backend/app/services

**Analysis Date:** 2025-02-18 (updated February 18, 2026)
**Scope:** All Python files under `backend/app/services/`
**Focus:** N+1 queries, missing eager loading, redundant queries, queries in loops

> **Update (Feb 18, 2026):** The most critical N+1 issues have been resolved:
> - `scoring_service.py` — `bulk_technique_scores()` now uses 5 aggregated subqueries instead of per-technique loops (~3,500 queries reduced to ~5).
> - `heatmap_service.py` — Extracted to a dedicated service with batch-fetching (`test_counts`, `rule_counts` in 2 SQL subqueries instead of per-technique N+1).
> - `SATechniqueRepository.find_all_with_test_counts()` — Single query with subqueries providing pre-aggregated counts for all techniques.
> - Missing database indexes added via Alembic migrations (b024, b026) covering `tests`, `techniques`, `audit_logs`, and `detection_rules` tables.

---

## Executive Summary

| Severity | Count | Files Affected |
|----------|-------|----------------|
| **Critical (N+1)** | 12 | 8 files |
| **High (Missing eager loading)** | 4 | 4 files |
| **Medium (Redundant queries)** | 3 | 3 files |

---

## 1. operational_metrics_service.py

### 1.1 `calculate_mttd` — N+1 query problem
**Lines:** 44–79
**Problem type:** N+1 — 2 queries per test inside loop

```python
tests = db.query(Test).filter(Test.state == TestState.validated).all()
for test in tests:
    red_start = db.query(AuditLog.timestamp).filter(...).first()  # Query per test
    blue_start = db.query(AuditLog.timestamp).filter(...).first()  # Query per test
```

**Extra queries:** 2 × N (N = number of validated tests)
**Fix:** Use a single query with `func.max` and `case` to get both timestamps per test, or batch-fetch all audit log entries for the test IDs in one query.
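
A minimal sketch of the conditional-aggregation variant, assuming a simplified `AuditLog` model and hypothetical action names (`red_started`, `blue_started`). The real filter conditions are elided in the excerpt, so the `case` predicates here are placeholders. If the metric needs the earliest matching entry, `func.min` can be swapped in for `func.max`.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String, case, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class AuditLog(Base):
    # Simplified stand-in for the real model; column names are assumptions.
    __tablename__ = "audit_logs"
    id = Column(Integer, primary_key=True)
    entity_id = Column(String, index=True)
    action = Column(String)
    timestamp = Column(DateTime)


def fetch_start_times(db, test_ids):
    """Return {entity_id: (red_start, blue_start)} from one grouped query."""
    rows = (
        db.query(
            AuditLog.entity_id,
            # CASE without ELSE yields NULL, which MAX ignores.
            func.max(case((AuditLog.action == "red_started", AuditLog.timestamp))),
            func.max(case((AuditLog.action == "blue_started", AuditLog.timestamp))),
        )
        .filter(AuditLog.entity_id.in_([str(t) for t in test_ids]))
        .group_by(AuditLog.entity_id)
        .all()
    )
    return {entity_id: (red, blue) for entity_id, red, blue in rows}


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)
with Session(engine) as db:
    db.add_all([
        AuditLog(entity_id="1", action="red_started", timestamp=datetime(2025, 1, 1, 9)),
        AuditLog(entity_id="1", action="blue_started", timestamp=datetime(2025, 1, 1, 10)),
        AuditLog(entity_id="2", action="red_started", timestamp=datetime(2025, 1, 2, 9)),
    ])
    db.commit()
    starts = fetch_start_times(db, [1, 2])
```

The loop in `calculate_mttd` would then read both timestamps from `starts[str(test.id)]` instead of issuing two queries per test.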

---

### 1.2 `calculate_mttr` — N+1 query problem
**Lines:** 86–123
**Problem type:** N+1 — 1 query per test inside loop

```python
tests = db.query(Test).filter(...).all()
for test in tests:
    remediation_complete = db.query(AuditLog.timestamp).filter(
        AuditLog.entity_id == str(test.id), ...
    ).first()
```

**Extra queries:** N (N = tests with completed remediation)
**Fix:** Batch-fetch audit log entries for all test IDs in one query, then build a lookup dict.
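
The batch-fetch approach could look roughly like the sketch below. The `AuditLog` columns and the `remediation_completed` action name are assumptions, since the original filter is elided. Keeping the earliest timestamp per test is also an illustrative choice.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class AuditLog(Base):
    # Simplified stand-in; the real model likely has more columns.
    __tablename__ = "audit_logs"
    id = Column(Integer, primary_key=True)
    entity_id = Column(String, index=True)
    action = Column(String)
    timestamp = Column(DateTime)


def remediation_times(db, test_ids):
    """One IN query for all tests, reduced to {entity_id: earliest timestamp}."""
    rows = (
        db.query(AuditLog.entity_id, AuditLog.timestamp)
        .filter(
            AuditLog.entity_id.in_([str(t) for t in test_ids]),
            AuditLog.action == "remediation_completed",  # hypothetical action name
        )
        .order_by(AuditLog.timestamp)
        .all()
    )
    lookup = {}
    for entity_id, ts in rows:
        lookup.setdefault(entity_id, ts)  # rows are sorted, so the first one wins
    return lookup


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as db:
    db.add_all([
        AuditLog(entity_id="1", action="remediation_completed", timestamp=datetime(2025, 1, 5)),
        AuditLog(entity_id="1", action="remediation_completed", timestamp=datetime(2025, 1, 3)),
        AuditLog(entity_id="2", action="created", timestamp=datetime(2025, 1, 1)),
    ])
    db.commit()
    times = remediation_times(db, [1, 2])
```

Tests without a matching entry are simply absent from the dict, so the caller can use `times.get(str(test.id))` and skip `None`.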

---

### 1.3 `get_operational_trend` — N+1 query problem
**Lines:** 354–392
**Problem type:** N+1 — 1 query per week inside loop

```python
while current < now:
    validated_up_to = db.query(Test).filter(
        Test.state == TestState.validated,
        Test.red_validated_at <= week_end,
    ).all()
    # ... process ...
    current = week_end
```

**Extra queries:** ~13 (for 90-day period) or ~52 (for 1-year period)
**Fix:** Single query with `date_trunc` and `group_by` to get counts per week, or fetch all validated tests once and filter in Python.
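
A sketch of the fetch-once variant: the `red_validated_at` values are loaded a single time, sorted, and each week's cumulative count is found with a binary search. The function name and the 7-day step are assumptions, since the excerpt does not show how `week_end` is derived.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def cumulative_weekly_counts(validated_at, start, now):
    """Return [(week_end, tests validated at or before week_end), ...]."""
    stamps = sorted(validated_at)
    trend = []
    current = start
    while current < now:
        week_end = current + timedelta(days=7)  # assumed weekly step
        # bisect_right counts every timestamp <= week_end.
        trend.append((week_end, bisect_right(stamps, week_end)))
        current = week_end
    return trend


# Timestamps that a single query such as
# db.query(Test.red_validated_at).filter(Test.state == TestState.validated)
# would return.
validated = [
    datetime(2025, 1, 3),
    datetime(2025, 1, 8),
    datetime(2025, 1, 10),
    datetime(2025, 1, 20),
]
trend = cumulative_weekly_counts(validated, datetime(2025, 1, 1), datetime(2025, 1, 22))
```

This keeps the `<=` semantics of the original filter while replacing one query per week with one query in total.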

---

### 1.4 `calculate_rejection_rate` — Redundant queries
**Lines:** 286–328
**Problem type:** Redundant — 6 separate count queries that could be combined

```python
validated_count = db.query(func.count(Test.id)).filter(...).scalar()
rejected_count = db.query(func.count(Test.id)).filter(...).scalar()
red_rejected = db.query(func.count(Test.id)).filter(...).scalar()
red_total = db.query(func.count(Test.id)).filter(...).scalar()
blue_rejected = db.query(func.count(Test.id)).filter(...).scalar()
blue_total = db.query(func.count(Test.id)).filter(...).scalar()
```

**Extra queries:** 5 (could be 1–2 with conditional aggregation)
**Fix:** Single query with `func.count` and `case` for each condition.
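
As an illustration of conditional aggregation, the sketch below folds several counts into one `SELECT`. The `state` and `rejected_by` columns are hypothetical, because the real filters are elided. Only four of the six counts are shown.

```python
from sqlalchemy import Column, Integer, String, case, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Test(Base):
    # Simplified stand-in; `rejected_by` is an assumed column.
    __tablename__ = "tests"
    id = Column(Integer, primary_key=True)
    state = Column(String)
    rejected_by = Column(String, nullable=True)  # "red", "blue", or NULL


def count_if(condition):
    # COUNT skips NULLs, and CASE without ELSE is NULL when the condition fails.
    return func.count(case((condition, Test.id)))


def rejection_counts(db):
    row = db.query(
        count_if(Test.state == "validated").label("validated"),
        count_if(Test.state == "rejected").label("rejected"),
        count_if(Test.rejected_by == "red").label("red_rejected"),
        count_if(Test.rejected_by == "blue").label("blue_rejected"),
    ).one()
    return dict(row._mapping)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as db:
    db.add_all([
        Test(state="validated"),
        Test(state="validated"),
        Test(state="rejected", rejected_by="red"),
        Test(state="rejected", rejected_by="red"),
        Test(state="rejected", rejected_by="blue"),
    ])
    db.commit()
    counts = rejection_counts(db)
```

The table is scanned once regardless of how many conditions are counted.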

---

## 2. scoring_service.py

### 2.1 `calculate_technique_score` — Multiple queries per call
**Lines:** 26–204
**Problem type:** 5+ separate queries per technique (Tests, DetectionRule count, TestDetectionResult count, DefensiveTechniqueMapping count, Test.max)

Each call to `calculate_technique_score` executes:
- 1 query for `all_tests`
- 1 query for `total_rules`
- 1 query for `triggered_rules` (if total_rules > 0)
- 1 query for `total_countermeasures`
- 1 query for `most_recent_test`

**Extra queries per technique:** ~5

---

### 2.2 `calculate_tactic_score` — N+1 via helper
**Lines:** 209–234
**Problem type:** Queries in loop — calls `calculate_technique_score` for each technique

```python
techniques = db.query(Technique).filter(...).all()
for tech in techniques:
    result = calculate_technique_score(tech, db)  # 5+ queries each
```

**Extra queries:** 5 × N (N = techniques in tactic, often 10–50)

---

### 2.3 `calculate_actor_coverage_score` — N+1 via helper
**Lines:** 241–293
**Problem type:** Queries in loop — calls `calculate_technique_score` for each technique

```python
for tech in techniques:
    result = calculate_technique_score(tech, db)
```

**Extra queries:** 5 × N (N = techniques used by actor)

---

### 2.4 `calculate_organization_score` — Severe N+1
**Lines:** 300–309
**Problem type:** Queries in loop — calls `calculate_technique_score` for every technique

```python
all_techniques = db.query(Technique).all()
for tech in all_techniques:
    result = calculate_technique_score(tech, db)
```

**Extra queries:** 5 × N where N = total techniques (~700–800) → **~3,500–4,000 queries**

---

### 2.5 `calculate_organization_score` — Second N+1 loop
**Lines:** 352–355
**Problem type:** Queries in loop — second pass over critical techniques

```python
for tech in critical_techniques:
    result = calculate_technique_score(tech, db)
```

**Extra queries:** 5 × M (M = critical techniques, ~50–200)
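
The loops in 2.2 through 2.5 share one shape: a per-technique helper that re-queries the same tables. The update note at the top says `bulk_technique_scores()` replaced this with aggregated subqueries. The sketch below shows the general idea for a single aggregate only (test count and most recent test per technique), with a simplified `Test` model and a hypothetical `executed_at` column; it is not the actual implementation.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Test(Base):
    # Simplified stand-in; `executed_at` is an assumed column name.
    __tablename__ = "tests"
    id = Column(Integer, primary_key=True)
    technique_id = Column(Integer, index=True)
    executed_at = Column(DateTime)


def load_test_stats(db):
    """One GROUP BY query instead of two queries per technique."""
    rows = (
        db.query(Test.technique_id, func.count(Test.id), func.max(Test.executed_at))
        .group_by(Test.technique_id)
        .all()
    )
    return {tid: {"total": total, "latest": latest} for tid, total, latest in rows}


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as db:
    db.add_all([
        Test(technique_id=1, executed_at=datetime(2025, 1, 1)),
        Test(technique_id=1, executed_at=datetime(2025, 2, 1)),
        Test(technique_id=2, executed_at=datetime(2025, 1, 15)),
    ])
    db.commit()
    stats = load_test_stats(db)

# Techniques with no tests are absent from the result, so supply a default.
no_tests = {"total": 0, "latest": None}
totals = {tid: stats.get(tid, no_tests)["total"] for tid in (1, 2, 3)}
```

With one such dict per aggregate, the scoring loop becomes pure in-memory lookups and the query count no longer grows with the number of techniques.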

---

## 3. d3fend_import_service.py

### 3.1 `_upsert_techniques` — N+1 query problem
**Lines:** 90–96
**Problem type:** N+1 — 1 query per technique in loop

```python
for tech_data in techniques:
    existing = db.query(DefensiveTechnique).filter(
        DefensiveTechnique.d3fend_id == tech_data["d3fend_id"]
    ).first()
```

**Extra queries:** N (N = number of D3FEND techniques, ~50–100)

**Fix:** Pre-load all existing techniques into a dict keyed by `d3fend_id` before the loop.
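
A rough sketch of the pre-load pattern, assuming a simplified `DefensiveTechnique` model with only `d3fend_id` and `name`. The function name and the returned counter are illustrative, not taken from the service.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class DefensiveTechnique(Base):
    # Simplified stand-in for the real model.
    __tablename__ = "defensive_techniques"
    id = Column(Integer, primary_key=True)
    d3fend_id = Column(String, unique=True)
    name = Column(String)


def upsert_techniques(db, techniques):
    """Insert or update techniques using one SELECT for the whole batch."""
    existing = {t.d3fend_id: t for t in db.query(DefensiveTechnique).all()}
    created = 0
    for tech_data in techniques:
        row = existing.get(tech_data["d3fend_id"])
        if row is None:
            row = DefensiveTechnique(d3fend_id=tech_data["d3fend_id"])
            db.add(row)
            existing[row.d3fend_id] = row  # guards against duplicates in the input
            created += 1
        row.name = tech_data["name"]
    db.commit()
    return created


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as db:
    db.add(DefensiveTechnique(d3fend_id="D3-AAA", name="old name"))
    db.commit()
    created = upsert_techniques(db, [
        {"d3fend_id": "D3-AAA", "name": "new name"},
        {"d3fend_id": "D3-BBB", "name": "second"},
    ])
    names = {t.d3fend_id: t.name for t in db.query(DefensiveTechnique).all()}
```

The `D3-AAA` and `D3-BBB` identifiers are made-up demo values.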

---

### 3.2 `import_d3fend_mappings` — N+1 query problem
**Lines:** 324–331
**Problem type:** N+1 — 1 query per (mitre_id, d3fend_id) pair in nested loop

```python
for mitre_id, d3fend_ids in _ATTACK_TO_D3FEND.items():
    for d3fend_id in d3fend_ids:
        existing = db.query(DefensiveTechniqueMapping).filter(
            DefensiveTechniqueMapping.attack_technique_id == attack_tech.id,
            DefensiveTechniqueMapping.defensive_technique_id == def_tech.id,
        ).first()
```

**Extra queries:** ~200–500 (depends on mapping size)

**Fix:** Pre-load existing mappings into a set of `(attack_tech_id, def_tech_id)` tuples.
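
One way the tuple-set check might be written, assuming a simplified mapping model with two integer ID columns and a hypothetical `add_missing_mappings` helper that receives already-resolved ID pairs:

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class DefensiveTechniqueMapping(Base):
    # Simplified stand-in; foreign keys are omitted for brevity.
    __tablename__ = "defensive_technique_mappings"
    id = Column(Integer, primary_key=True)
    attack_technique_id = Column(Integer)
    defensive_technique_id = Column(Integer)


def add_missing_mappings(db, pairs):
    """Insert only the (attack_id, defense_id) pairs that are not stored yet."""
    M = DefensiveTechniqueMapping
    # Selecting two columns avoids hydrating full ORM objects.
    existing = {
        (a, d) for a, d in db.query(M.attack_technique_id, M.defensive_technique_id)
    }
    added = 0
    for pair in pairs:
        if pair in existing:
            continue
        db.add(M(attack_technique_id=pair[0], defensive_technique_id=pair[1]))
        existing.add(pair)  # also catches duplicates within `pairs`
        added += 1
    db.commit()
    return added


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as db:
    db.add(DefensiveTechniqueMapping(attack_technique_id=1, defensive_technique_id=10))
    db.commit()
    added = add_missing_mappings(db, [(1, 10), (1, 11), (2, 10), (1, 11)])
    total = db.query(DefensiveTechniqueMapping).count()
```

Membership tests against the set are O(1), so the nested loop costs one query up front instead of one per pair.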

---

### 3.3 `get_defenses_for_technique` — Missing eager loading
**Lines:** 428–453
**Problem type:** Lazy loading — accesses `m.defensive_technique` in loop

```python
mappings = db.query(DefensiveTechniqueMapping).filter(...).all()
for m in mappings:
    dt = m.defensive_technique  # Lazy load per mapping
```

**Extra queries:** N (N = number of mappings for the technique)

**Fix:** Add `joinedload(DefensiveTechniqueMapping.defensive_technique)` to the query.
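
The effect of `joinedload` can be checked by counting the statements sent to the database. The sketch below uses simplified models and SQLAlchemy's `before_cursor_execute` engine event as a counter; the model fields are assumptions.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

Base = declarative_base()


class DefensiveTechnique(Base):
    __tablename__ = "defensive_techniques"
    id = Column(Integer, primary_key=True)
    name = Column(String)


class DefensiveTechniqueMapping(Base):
    __tablename__ = "defensive_technique_mappings"
    id = Column(Integer, primary_key=True)
    attack_technique_id = Column(Integer)
    defensive_technique_id = Column(ForeignKey("defensive_techniques.id"))
    defensive_technique = relationship(DefensiveTechnique)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as db:
    db.add_all([
        DefensiveTechniqueMapping(
            attack_technique_id=1, defensive_technique=DefensiveTechnique(name=n)
        )
        for n in ("Alpha", "Beta", "Gamma")
    ])
    db.commit()

statements = []


@event.listens_for(engine, "before_cursor_execute")
def record(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)  # one entry per SQL statement executed


with Session(engine) as db:  # fresh session, nothing cached
    mappings = (
        db.query(DefensiveTechniqueMapping)
        .options(joinedload(DefensiveTechniqueMapping.defensive_technique))
        .filter(DefensiveTechniqueMapping.attack_technique_id == 1)
        .all()
    )
    # No lazy loads here: the related rows arrived via the JOIN.
    names = sorted(m.defensive_technique.name for m in mappings)
```

Removing the `.options(...)` line from this demo would add one `SELECT` per mapping to `statements`.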

---

## 4. report_generation_service.py

### 4.1 `generate_purple_campaign_report` — N+1 query problem
**Lines:** 36–46
**Problem type:** N+1 — 1 query per test in loop

```python
for test in campaign_tests:
    technique = db.query(Technique).filter(Technique.id == test.technique_id).first()
```

**Extra queries:** N (N = number of campaign tests)

**Fix:** Eager-load Technique when fetching campaign_tests, or batch-query techniques by IDs.

---

## 5. osint_enrichment_service.py

### 5.1 `enrich_technique_with_cves` — N+1 query problem
**Lines:** 59–75
**Problem type:** N+1 — 1 query per CVE in loop

```python
for vuln in data.get("vulnerabilities", []):
    exists = db.query(OsintItem.id).filter(
        OsintItem.technique_id == technique.id,
        OsintItem.source_url.contains(cve_id),
    ).first()
```

**Extra queries:** Up to 10 per technique (resultsPerPage=10)

---

### 5.2 `enrich_all_techniques` — N+1 cascade
**Lines:** 134–153
**Problem type:** Queries in loop — calls `enrich_technique_with_cves` for each technique

```python
techniques = db.query(Technique).all()
for i, tech in enumerate(techniques):
    total += enrich_technique_with_cves(db, tech)  # N+1 inside
```

**Extra queries:** ~10 × N (N = all techniques, ~700+)

---

## 6. campaign_service.py

### 6.1 `get_campaign_progress` — Missing eager loading
**Lines:** 74–92
**Problem type:** Lazy loading — accesses `ct.test` for each CampaignTest

```python
campaign_tests = db.query(CampaignTest).filter(...).all()
for ct in campaign_tests:
    test = ct.test  # Lazy load per CampaignTest
```

**Extra queries:** N (N = campaign tests)

**Fix:** Add `joinedload(CampaignTest.test)` or `selectinload(CampaignTest.test)`.
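
For comparison with `joinedload`, here is a sketch of the `selectinload` option, which issues a second `SELECT ... WHERE id IN (...)` rather than a JOIN. The models are simplified stand-ins, and tallying states with `Counter` is only an illustrative guess at what the progress calculation does.

```python
from collections import Counter

from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload

Base = declarative_base()


class Test(Base):
    __tablename__ = "tests"
    id = Column(Integer, primary_key=True)
    state = Column(String)


class CampaignTest(Base):
    __tablename__ = "campaign_tests"
    id = Column(Integer, primary_key=True)
    campaign_id = Column(Integer)
    test_id = Column(ForeignKey("tests.id"))
    test = relationship(Test)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as db:
    db.add_all([
        CampaignTest(campaign_id=1, test=Test(state=s))
        for s in ("validated", "validated", "draft")
    ])
    db.commit()

statements = []


@event.listens_for(engine, "before_cursor_execute")
def record(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


with Session(engine) as db:
    campaign_tests = (
        db.query(CampaignTest)
        .options(selectinload(CampaignTest.test))
        .filter(CampaignTest.campaign_id == 1)
        .all()
    )
    # ct.test is already populated, so this loop issues no further queries.
    progress = Counter(ct.test.state for ct in campaign_tests)
```

Two statements run in total, independent of the number of campaign tests.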

---

### 6.2 `generate_campaign_from_threat_actor` — N+1 query problem
**Lines:** 155–168
**Problem type:** N+1 — 1 query per technique in loop

```python
for tech, _at in gap_techniques:
    template = db.query(TestTemplate).filter(
        TestTemplate.mitre_technique_id == tech.mitre_id,
        ...
    ).first()
```

**Extra queries:** N (N = gap techniques for the actor)

**Fix:** Pre-load templates by mitre_id into a dict before the loop.

---

## 7. campaign_scheduler_service.py

### 7.1 `_clone_campaign` — Missing eager loading
**Lines:** 76–86
**Problem type:** Lazy loading — accesses `ct.test` for each CampaignTest

```python
original_cts = db.query(CampaignTest).filter(...).all()
for ct in original_cts:
    src_test = ct.test  # Lazy load per CampaignTest
```

**Extra queries:** N (N = campaign tests)

**Fix:** Add `joinedload(CampaignTest.test)`.

---

### 7.2 `check_and_run_recurring_campaigns` — N+1 query problem
**Lines:** 175–185
**Problem type:** N+1 — 1 query per campaign for red_tech users

```python
for campaign in due_campaigns:
    # ... clone ...
    red_techs = db.query(User).filter(User.role == "red_tech", ...).all()
    for user in red_techs:
        create_notification(...)  # Also commits per notification
```

**Extra queries:** 1 per due campaign (for User query)
**Note:** `create_notification` calls `db.commit()` on every invocation, so each notification is its own transaction; consider batching the inserts into a single commit.
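
A sketch of both changes together: the user query is hoisted out of the loop and all notifications are committed once. It assumes the elided part of the `User` filter does not depend on the campaign, and it writes to a hypothetical `Notification` model directly instead of calling `create_notification`.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    role = Column(String)


class Notification(Base):
    # Hypothetical model; the real service goes through create_notification().
    __tablename__ = "notifications"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer)
    message = Column(String)


def notify_red_techs(db, campaign_names):
    """Queue one notification per (campaign, red_tech user), then commit once."""
    # Fetched once, outside the campaign loop.
    red_tech_ids = [uid for (uid,) in db.query(User.id).filter(User.role == "red_tech")]
    for name in campaign_names:
        db.add_all([
            Notification(user_id=uid, message=f"Campaign due: {name}")
            for uid in red_tech_ids
        ])
    db.commit()  # single transaction for the whole batch
    return len(red_tech_ids) * len(campaign_names)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as db:
    db.add_all([User(role="red_tech"), User(role="red_tech"), User(role="blue_tech")])
    db.commit()
    queued = notify_red_techs(db, ["Q1 recurring", "Q2 recurring"])
    stored = db.query(Notification).count()
```

A single commit also makes the batch atomic, which may or may not be desirable if a partial failure should still deliver some notifications.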

---

## 8. snapshot_service.py

### 8.1 `create_snapshot` — Severe N+1 via helper
**Lines:** 41–77
**Problem type:** Queries in loop — calls `calculate_technique_score` for every technique

```python
techniques = db.query(Technique).all()
for tech in techniques:
    score_data = calculate_technique_score(tech, db)  # 5+ queries each
```

**Extra queries:** 5 × N (N = all techniques, ~700+) → **~3,500+ queries**

---

## 9. status_service.py

### 9.1 `recalculate_technique_status` — Potential lazy loading
**Lines:** 28–29
**Problem type:** Missing eager loading — accesses `technique.tests`

```python
tests = technique.tests  # Lazy load if technique was loaded without tests
```

**Extra queries:** 1 (if technique was loaded without `selectinload(Technique.tests)`)

**Note:** Caller-dependent; if technique comes from a query without eager loading, this triggers 1 extra query.

---

## 10. test_workflow_service.py

### 10.1 `get_retest_chain` — Redundant queries
**Lines:** 416–428
**Problem type:** Redundant — 3 separate queries that could be 1–2

```python
test = db.query(Test).filter(Test.id == tid).first()
original = db.query(Test).filter(Test.id == original_id).first()
retests = db.query(Test).filter(Test.retest_of == original_id).order_by(...).all()
```

**Fix:** Fetch the original and all of its retests in one query (`Test.id == original_id` OR `Test.retest_of == original_id`). The first fetch is only needed to determine `original_id`; a CTE or a `UNION`/subquery could fold it into the same statement.
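
A sketch of the combined lookup with `or_`, assuming a simplified `Test` model and that `original_id` is already known. The ordering by `id` is a placeholder because the real `order_by` argument is elided, and `load_chain` is a hypothetical name.

```python
from sqlalchemy import Column, Integer, create_engine, or_
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Test(Base):
    # Simplified stand-in; `retest_of` holds the id of the original test.
    __tablename__ = "tests"
    id = Column(Integer, primary_key=True)
    retest_of = Column(Integer, nullable=True)


def load_chain(db, original_id):
    """Fetch the original test and all of its retests with one SELECT."""
    rows = (
        db.query(Test)
        .filter(or_(Test.id == original_id, Test.retest_of == original_id))
        .order_by(Test.id)  # placeholder ordering
        .all()
    )
    original = next((t for t in rows if t.id == original_id), None)
    retests = [t for t in rows if t.id != original_id]
    return original, retests


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as db:
    db.add_all([
        Test(id=1),
        Test(id=2, retest_of=1),
        Test(id=3, retest_of=1),
        Test(id=4),  # unrelated test, must not appear in the chain
    ])
    db.commit()
    original, retests = load_chain(db, 1)
    original_id = original.id
    retest_ids = [t.id for t in retests]
```

Splitting the rows in Python is cheap because a retest chain is normally short.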

---

## 11. Files with no SQLAlchemy performance issues

The following service files were reviewed and do **not** exhibit the targeted problems:

| File | Notes |
|------|-------|
| `audit_service.py` | Single insert per call, no loops |
| `atomic_import_service.py` | Pre-loads existing_ids, no N+1 |
| `caldera_import_service.py` | Pre-loads existing_ids, no N+1 |
| `compliance_import_service.py` | Pre-loads all_techniques, existing_controls, existing_mappings |
| `elastic_import_service.py` | Pre-loads existing_ids |
| `intel_service.py` | Pre-loads techniques and existing_urls |
| `jira_service.py` | No db.query in loops |
| `lolbas_import_service.py` | Pre-loads existing_ids |
| `mitre_sync_service.py` | Pre-loads existing_techniques |
| `notification_service.py` | Queries are not in loops (create_notification is called in loops but does single insert) |
| `report_engine.py` | No database access |
| `score_cache.py` | No direct db queries |
| `sigma_import_service.py` | Pre-loads existing_ids |
| `stale_detection_service.py` | Single query with subquery, no N+1 |
| `tempo_service.py` | Single query per call |
| `threat_actor_import_service.py` | Pre-loads existing_actors, technique_by_mitre_id, existing_rels |
| `worklog_service.py` | Simple CRUD, no loops |

---

## Summary Table

| File | Function | Problem | Est. Extra Queries |
|------|----------|---------|--------------------|
| operational_metrics_service | calculate_mttd | N+1 | 2×N (validated tests) |
| operational_metrics_service | calculate_mttr | N+1 | N (remediated tests) |
| operational_metrics_service | get_operational_trend | N+1 | ~13–52 (weeks) |
| operational_metrics_service | calculate_rejection_rate | Redundant | 5 |
| scoring_service | calculate_organization_score | N+1 | ~3,500–4,000 |
| scoring_service | calculate_tactic_score | N+1 | 5×N (tactic techniques) |
| scoring_service | calculate_actor_coverage_score | N+1 | 5×N (actor techniques) |
| scoring_service | calculate_technique_score | Multiple per call | 5 per technique |
| d3fend_import_service | _upsert_techniques | N+1 | N (techniques) |
| d3fend_import_service | import_d3fend_mappings | N+1 | ~200–500 |
| d3fend_import_service | get_defenses_for_technique | Missing eager load | N (mappings) |
| report_generation_service | generate_purple_campaign_report | N+1 | N (campaign tests) |
| osint_enrichment_service | enrich_technique_with_cves | N+1 | ~10 per technique |
| osint_enrichment_service | enrich_all_techniques | N+1 cascade | ~7,000+ |
| campaign_service | get_campaign_progress | Missing eager load | N (campaign tests) |
| campaign_service | generate_campaign_from_threat_actor | N+1 | N (gap techniques) |
| campaign_scheduler_service | _clone_campaign | Missing eager load | N (campaign tests) |
| campaign_scheduler_service | check_and_run_recurring_campaigns | N+1 | 1 per campaign |
| snapshot_service | create_snapshot | N+1 | ~3,500+ |
| status_service | recalculate_technique_status | Lazy load | 1 |
| test_workflow_service | get_retest_chain | Redundant | 2 |

---

## Recommended Fix Priority

1. **P0 — scoring_service.py** `calculate_organization_score`: ~3,500+ queries per call.
2. **P0 — snapshot_service.py** `create_snapshot`: ~3,500+ queries per snapshot.
3. **P1 — operational_metrics_service.py** `calculate_mttd`, `calculate_mttr`, `get_operational_trend`.
4. **P1 — osint_enrichment_service.py** `enrich_technique_with_cves` and `enrich_all_techniques`.
5. **P2 — d3fend_import_service.py** `_upsert_techniques`, `import_d3fend_mappings`, `get_defenses_for_technique`.
6. **P2 — campaign_service.py** and **campaign_scheduler_service.py**.
7. **P3 — report_generation_service.py**, **test_workflow_service.py**, **status_service.py**.