Aegis — Data Sources

Aegis imports security data from multiple external sources to populate its test catalog, detection rules, defensive techniques, threat actors, and compliance mappings. This document describes each source, its format, and how to manage imports.


Overview

| Source | Type | Format | Destination |
|---|---|---|---|
| MITRE ATT&CK | Techniques | STIX 2.0 (TAXII / GitHub) | techniques |
| Atomic Red Team | Test Templates | YAML | test_templates |
| SigmaHQ | Detection Rules | YAML | detection_rules |
| Elastic Detection Rules | Detection Rules | TOML | detection_rules |
| CALDERA | Test Templates | YAML (multi-doc) | test_templates |
| LOLBAS | Test Templates | YAML | test_templates |
| MITRE D3FEND | Defensive Techniques | JSON-LD | defensive_techniques + mappings |
| MITRE CTI | Threat Actors | STIX 2.0 (JSON) | threat_actors + technique mappings |
| NIST 800-53 → ATT&CK | Compliance Mappings | STIX 2.0 (JSON) | compliance_* tables |

MITRE ATT&CK (Techniques)

Repository: https://github.com/mitre/cti (Enterprise ATT&CK)
Protocol: TAXII 2.0 with GitHub JSON fallback
Format: STIX 2.0 bundles
Service: mitre_sync_service.py
Schedule: Automatic every 24 hours via APScheduler

Extracted fields:

  • mitre_id — External ID (e.g., T1059, T1059.001)
  • name — Technique name
  • description — Full description
  • tactic — ATT&CK tactic (execution, persistence, etc.)
  • platforms — Target platforms (windows, linux, macos, etc.)
  • is_subtechnique — Whether it's a sub-technique
  • url — Link to MITRE page
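
The field mapping above can be sketched against a single STIX 2.0 attack-pattern object. This is a minimal illustration, not the code in mitre_sync_service.py: the function name extract_technique is hypothetical, and keeping only the first kill-chain phase as the tactic is an assumption about how a single tactic column would be filled.

```python
from typing import Any, Optional


def extract_technique(obj: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Map one STIX attack-pattern object to the fields listed above."""
    if obj.get("type") != "attack-pattern":
        return None
    # The ATT&CK ID and page URL live in the external reference whose
    # source_name is "mitre-attack".
    ref = next(
        (r for r in obj.get("external_references", [])
         if r.get("source_name") == "mitre-attack"),
        None,
    )
    if ref is None or "external_id" not in ref:
        return None
    phases = [
        p["phase_name"]
        for p in obj.get("kill_chain_phases", [])
        if p.get("kill_chain_name") == "mitre-attack"
    ]
    return {
        "mitre_id": ref["external_id"],
        "name": obj.get("name", ""),
        "description": obj.get("description", ""),
        # Assumption: a single tactic column, so only the first phase is kept.
        "tactic": phases[0] if phases else None,
        "platforms": [p.lower() for p in obj.get("x_mitre_platforms", [])],
        "is_subtechnique": bool(obj.get("x_mitre_is_subtechnique", False)),
        "url": ref.get("url"),
    }


# Illustrative sample object, trimmed to the properties used above.
sample = {
    "type": "attack-pattern",
    "name": "PowerShell",
    "description": "Adversaries may abuse PowerShell commands and scripts.",
    "kill_chain_phases": [
        {"kill_chain_name": "mitre-attack", "phase_name": "execution"}
    ],
    "x_mitre_platforms": ["Windows"],
    "x_mitre_is_subtechnique": True,
    "external_references": [
        {
            "source_name": "mitre-attack",
            "external_id": "T1059.001",
            "url": "https://attack.mitre.org/techniques/T1059/001",
        }
    ],
}
technique = extract_technique(sample)
```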

Manual trigger:

# Via API
curl -X POST http://localhost:8000/api/v1/system/sync-mitre \
  -H "Authorization: Bearer $TOKEN"

# Via container
docker exec aegis-backend python -c "from app.services.mitre_sync_service import sync_mitre_attack; sync_mitre_attack()"

Volume: ~700 techniques (Enterprise ATT&CK v16)

Troubleshooting:

  • If TAXII fails (timeout/rate limit), the service automatically falls back to the GitHub JSON bundle
  • Check the data_sources table for last_sync_at and last_sync_stats to confirm when the last sync ran and what it imported
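
The fallback behaviour can be pictured as a try-primary-then-secondary wrapper. The sketch below is only an assumption about the shape of that logic: load_bundle and both fetcher callables are hypothetical names, and the real service may distinguish error types rather than catching every exception.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)

Bundle = dict[str, Any]


def load_bundle(
    fetch_taxii: Callable[[], Bundle],
    fetch_github: Callable[[], Bundle],
) -> tuple[Bundle, str]:
    """Return (bundle, origin), preferring TAXII and falling back to GitHub."""
    try:
        return fetch_taxii(), "taxii"
    except Exception as exc:  # timeout, rate limit, connection error, ...
        logger.warning("TAXII fetch failed (%r); using GitHub bundle", exc)
        return fetch_github(), "github"


# Stub fetchers standing in for the real network calls.
def failing_taxii() -> Bundle:
    raise TimeoutError("simulated TAXII timeout")


def github_stub() -> Bundle:
    return {"type": "bundle", "objects": []}


bundle, origin = load_bundle(failing_taxii, github_stub)
```

Recording the origin alongside the bundle makes it possible to store which path was taken in the sync statistics.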

Atomic Red Team (Test Templates)

Repository: https://github.com/redcanaryco/atomic-red-team
Format: YAML (one file per technique under atomics/T*/T*.yaml)
Service: atomic_import_service.py

Extracted fields:

  • name — Test name
  • description — What the test does
  • mitre_technique_id — ATT&CK technique ID
  • attack_commands — Commands to execute
  • expected_detection — What should be detected
  • platform — Target OS
  • severity — Derived from technique context
  • cleanup_commands — Cleanup procedures
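
As a rough sketch of how one atomics file maps onto these fields, the function below flattens an already-parsed YAML document (a plain dict). The name atomic_to_templates is hypothetical, and emitting one row per supported platform is an assumption. expected_detection and severity are left out because they are derived by the importer rather than read from the YAML.

```python
from typing import Any


def atomic_to_templates(doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten one parsed atomics/T*/T*.yaml document into template rows."""
    templates = []
    for test in doc.get("atomic_tests", []):
        executor = test.get("executor", {})
        # Assumption: one template row per supported platform.
        for platform in test.get("supported_platforms", []):
            templates.append({
                "name": test.get("name", ""),
                "description": test.get("description", "").strip(),
                "mitre_technique_id": doc.get("attack_technique"),
                "attack_commands": executor.get("command", "").strip(),
                "cleanup_commands": executor.get("cleanup_command", "").strip(),
                "platform": platform,
            })
    return templates


# Illustrative document in the shape of an Atomic Red Team technique file.
doc = {
    "attack_technique": "T1053.005",
    "display_name": "Scheduled Task/Job: Scheduled Task",
    "atomic_tests": [{
        "name": "Example scheduled task",
        "description": "Illustrative test entry.\n",
        "supported_platforms": ["windows"],
        "executor": {
            "name": "command_prompt",
            "command": "schtasks /create /tn example /tr calc.exe /sc once /st 23:59\n",
            "cleanup_command": "schtasks /delete /tn example /f\n",
        },
    }],
}
rows = atomic_to_templates(doc)
```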

Import:

curl -X POST http://localhost:8000/api/v1/system/import-atomic-red-team \
  -H "Authorization: Bearer $TOKEN"

Volume: ~3,500 test templates
Frequency: Monthly or after ATT&CK version updates


SigmaHQ (Detection Rules)

Repository: https://github.com/SigmaHQ/sigma
Format: YAML (Sigma rule format)
Service: sigma_import_service.py

Extracted fields:

  • name — Rule title
  • description — Rule description
  • query — Sigma detection logic
  • mitre_technique_id — Extracted from tags: attack.tXXXX
  • severity — From level field (low, medium, high, critical)
  • platforms — From logsource.product
  • source = "sigma"

Example Sigma rule tags:

tags:
  - attack.execution
  - attack.t1059.001
  - attack.defense_evasion
  - attack.t1562.001

Volume: ~3,000 detection rules
Frequency: Monthly recommended

Troubleshooting:

  • Rules without MITRE technique tags in attack.tXXXX format are skipped
  • Duplicates are detected by the combination of name, source, and mitre_technique_id
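
The tag handling and duplicate check described above could look roughly like this. It is a sketch under assumptions: import_rules and technique_ids are hypothetical names, rules are taken as already-parsed dicts, and one row is emitted per technique tag.

```python
import re

# Matches attack.tNNNN and attack.tNNNN.NNN, case-insensitively.
TECHNIQUE_TAG = re.compile(r"^attack\.(t\d{4}(?:\.\d{3})?)$", re.IGNORECASE)


def technique_ids(tags):
    """Return upper-cased ATT&CK technique IDs found in a Sigma tag list."""
    ids = []
    for tag in tags:
        match = TECHNIQUE_TAG.match(tag)
        if match:
            ids.append(match.group(1).upper())
    return ids


def import_rules(rules):
    """Build rows, skipping untagged rules and duplicate keys."""
    seen = set()
    rows = []
    for rule in rules:
        ids = technique_ids(rule.get("tags", []))
        if not ids:
            continue  # no attack.tXXXX tag, so the rule is skipped
        for technique_id in ids:
            key = (rule["title"], "sigma", technique_id)
            if key in seen:
                continue  # same name + source + technique already imported
            seen.add(key)
            rows.append({
                "name": rule["title"],
                "source": "sigma",
                "mitre_technique_id": technique_id,
            })
    return rows


example_tags = [
    "attack.execution",
    "attack.t1059.001",
    "attack.defense_evasion",
    "attack.t1562.001",
]
rules = [
    {"title": "Rule A", "tags": example_tags},
    {"title": "Rule A", "tags": ["attack.t1059.001"]},  # duplicate key
    {"title": "Rule B", "tags": ["attack.execution"]},  # tactic tag only
]
rows = import_rules(rules)
```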

Elastic Detection Rules

Repository: https://github.com/elastic/detection-rules
Format: TOML (one file per rule under rules/)
Service: elastic_import_service.py

Extracted fields:

  • name — Rule name from [rule]
  • description — Rule description
  • query — KQL/EQL query
  • mitre_technique_id — From [[rule.threat]] entries
  • severity — From rule.severity
  • rule_type — eql, query, threshold, etc.
  • source = "elastic"

TOML structure:

[rule]
name = "Scheduled Task Created via Schtasks"
severity = "medium"
type = "eql"

[[rule.threat]]
framework = "MITRE ATT&CK"
[[rule.threat.technique]]
id = "T1053"
name = "Scheduled Task/Job"
[[rule.threat.technique.subtechnique]]
id = "T1053.005"

Volume: ~1,200 detection rules
Frequency: Quarterly recommended


CALDERA (Test Templates)

Repository: https://github.com/mitre/caldera
Format: YAML (multi-document, abilities under data/abilities/)
Service: caldera_import_service.py

Extracted fields:

  • name — Ability name
  • description — What the ability does
  • mitre_technique_id — From technique.attack_id
  • tactic — ATT&CK tactic
  • platforms — Target platforms
  • attack_commands — Commands per platform/executor
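
To illustrate "commands per platform/executor", the sketch below flattens the nested platforms mapping of one already-parsed ability. The function name and output shape are hypothetical; the input layout (platform, then executor key, then command) and the splitting of comma-joined executor keys such as "psh,pwsh" are assumptions about the ability files.

```python
from typing import Any


def caldera_ability_to_template(ability: dict[str, Any]) -> dict[str, Any]:
    """Map one parsed CALDERA ability to the fields listed above."""
    platforms = ability.get("platforms", {})
    commands = []
    for platform, executors in platforms.items():
        for executor_names, spec in executors.items():
            command = (spec or {}).get("command", "").strip()
            # An executor key may name several executors, e.g. "psh,pwsh".
            for executor in executor_names.split(","):
                commands.append({
                    "platform": platform,
                    "executor": executor.strip(),
                    "command": command,
                })
    technique = ability.get("technique", {})
    return {
        "name": ability.get("name", ""),
        "description": ability.get("description", ""),
        "mitre_technique_id": technique.get("attack_id"),
        "tactic": ability.get("tactic"),
        "platforms": sorted(platforms),
        "attack_commands": commands,
    }


# Illustrative ability; the id value is a placeholder.
ability = {
    "id": "hypothetical-ability-id",
    "name": "Find running processes",
    "description": "Illustrative ability entry.",
    "tactic": "discovery",
    "technique": {"attack_id": "T1057", "name": "Process Discovery"},
    "platforms": {
        "windows": {"psh,pwsh": {"command": "Get-Process\n"}},
        "linux": {"sh": {"command": "ps aux\n"}},
    },
}
template = caldera_ability_to_template(ability)
```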

Volume: ~500 abilities
Frequency: Quarterly recommended


LOLBAS (Test Templates)

Repository: https://github.com/LOLBAS-Project/LOLBAS
Format: YAML (one file per binary under yml/OSBinaries/, yml/OtherMSBinaries/, etc.)
Service: lolbas_import_service.py

Extracted fields:

  • name — Binary name (e.g., Mshta.exe)
  • mitre_technique_id — From Commands[].MitreID
  • attack_commands — From Commands[].Command
  • description — From Commands[].Description
  • usecase — From Commands[].Usecase
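
Because most fields come from Commands[], one binary can yield several templates. A minimal sketch of that flattening, working on an already-parsed YAML entry; lolbas_to_templates is a hypothetical name, and skipping commands that lack a MitreID is an assumption.

```python
from typing import Any


def lolbas_to_templates(entry: dict[str, Any]) -> list[dict[str, Any]]:
    """Emit one template per command of a parsed LOLBAS entry."""
    return [
        {
            "name": entry.get("Name"),
            "mitre_technique_id": cmd.get("MitreID"),
            "attack_commands": cmd.get("Command"),
            "description": cmd.get("Description"),
            "usecase": cmd.get("Usecase"),
        }
        for cmd in entry.get("Commands") or []
        if cmd.get("MitreID")  # assumption: unmapped commands are skipped
    ]


# Illustrative entry; command text and descriptions are placeholders.
entry = {
    "Name": "Mshta.exe",
    "Commands": [
        {
            "Command": "mshta.exe example.hta",
            "Description": "Illustrative HTA execution.",
            "Usecase": "Execute code",
            "MitreID": "T1218.005",
        },
        {
            "Command": "mshta.exe other.hta",
            "Description": "Entry without a technique mapping.",
            "Usecase": "Execute code",
        },
    ],
}
templates = lolbas_to_templates(entry)
```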

Volume: ~200 living-off-the-land binaries with ~500 commands
Frequency: Quarterly recommended


MITRE D3FEND (Defensive Techniques)

Repository: https://d3fend.mitre.org/
Format: JSON-LD (REST API at https://d3fend.mitre.org/api/)
Service: d3fend_import_service.py

Extracted fields:

  • d3fend_id — D3FEND identifier (e.g., D3-AL, D3-NI)
  • name — Defensive technique name
  • description — Definition or comment
  • tactic — Defensive tactic (Detect, Isolate, Deceive, Evict, Harden)
  • ATT&CK ↔ D3FEND mappings stored in defensive_technique_mappings
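
A rough sketch of the "definition or comment" fallback on a JSON-LD graph follows. The key names used here ("@graph", "d3f:d3fend-id", "rdfs:label", "d3f:definition", "rdfs:comment") are assumptions about the payload shape and should be checked against the actual D3FEND response; the function name is hypothetical. Resolving the tactic is omitted because it depends on how the hierarchy is represented.

```python
from typing import Any


def extract_defensive_techniques(doc: dict[str, Any]) -> list[dict[str, str]]:
    """Pick nodes carrying a D3FEND ID out of a JSON-LD graph."""
    rows = []
    for node in doc.get("@graph", []):
        d3fend_id = node.get("d3f:d3fend-id")  # assumed key name
        if not d3fend_id:
            continue  # not a defensive technique node
        rows.append({
            "d3fend_id": d3fend_id,
            "name": node.get("rdfs:label", ""),
            # Prefer the definition, fall back to the comment.
            "description": node.get("d3f:definition")
            or node.get("rdfs:comment")
            or "",
        })
    return rows


# Illustrative graph with placeholder descriptions.
doc = {
    "@graph": [
        {
            "@id": "d3f:AccountLocking",
            "d3f:d3fend-id": "D3-AL",
            "rdfs:label": "Account Locking",
            "d3f:definition": "Placeholder definition text.",
        },
        {
            "@id": "d3f:NetworkIsolation",
            "d3f:d3fend-id": "D3-NI",
            "rdfs:label": "Network Isolation",
            "rdfs:comment": "Placeholder comment text.",
        },
        {"@id": "d3f:SomeOtherNode", "rdfs:label": "Not a technique"},
    ]
}
rows = extract_defensive_techniques(doc)
```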

Volume: ~200 defensive techniques
Frequency: Annually (D3FEND updates are infrequent)


MITRE CTI — Threat Actors

Repository: https://github.com/mitre/cti (enterprise-attack)
Format: STIX 2.0 JSON bundles (intrusion-set, relationship, attack-pattern)
Service: threat_actor_import_service.py

Extracted fields:

  • name — Actor name (e.g., APT28)
  • mitre_id — MITRE group ID (e.g., G0007)
  • aliases — Known aliases (JSONB array)
  • description — Full description
  • country — Attribution (when available)
  • motivation — Espionage, financial, etc.
  • target_sectors / target_regions — JSONB arrays
  • Technique mappings via relationship objects
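
The technique mappings come from STIX relationship objects of type "uses" that point from an intrusion-set to an attack-pattern. The join can be sketched as below; actor_technique_map is a hypothetical name, and the sample objects use shortened placeholder STIX IDs.

```python
from collections import defaultdict
from typing import Any, Optional


def attack_id(obj: dict[str, Any]) -> Optional[str]:
    """Return the ATT&CK external ID (e.g. G0007, T1059.001), if present."""
    for ref in obj.get("external_references", []):
        if ref.get("source_name") == "mitre-attack":
            return ref.get("external_id")
    return None


def actor_technique_map(objects: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map group IDs to the technique IDs they are reported to use."""
    by_id = {o["id"]: o for o in objects}
    mapping: dict[str, set[str]] = defaultdict(set)
    for rel in objects:
        if rel.get("type") != "relationship":
            continue
        if rel.get("relationship_type") != "uses":
            continue
        source = by_id.get(rel.get("source_ref"))
        target = by_id.get(rel.get("target_ref"))
        if source is None or target is None:
            continue
        # "uses" also links groups to software, so check both ends.
        if source["type"] == "intrusion-set" and target["type"] == "attack-pattern":
            group_id, technique_id = attack_id(source), attack_id(target)
            if group_id and technique_id:
                mapping[group_id].add(technique_id)
    return {group: sorted(ids) for group, ids in mapping.items()}


# Illustrative bundle contents with placeholder STIX IDs.
objects = [
    {"id": "intrusion-set--a", "type": "intrusion-set", "name": "APT28",
     "external_references": [{"source_name": "mitre-attack", "external_id": "G0007"}]},
    {"id": "attack-pattern--b", "type": "attack-pattern", "name": "PowerShell",
     "external_references": [{"source_name": "mitre-attack", "external_id": "T1059.001"}]},
    {"id": "relationship--c", "type": "relationship", "relationship_type": "uses",
     "source_ref": "intrusion-set--a", "target_ref": "attack-pattern--b"},
]
mapping = actor_technique_map(objects)
```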

Volume: ~140 threat actors with ~2,000 technique mappings
Frequency: Quarterly recommended


NIST 800-53 → ATT&CK (Compliance)

Repository: https://github.com/center-for-threat-informed-defense/attack-control-framework-mappings
Format: STIX 2.0 JSON bundles
Service: compliance_import_service.py

Extracted fields:

  • ComplianceFramework — Framework name and version
  • ComplianceControl — Control ID (e.g., AC-2), title, category
  • ComplianceControlMapping — Control ↔ ATT&CK technique associations

Volume: ~1,000 controls with ~5,000 mappings
Frequency: Annually (mappings are versioned with the framework)


Managing Data Sources

Admin UI

Navigate to System → Data Sources in the Aegis frontend to:

  • View all configured data sources and their sync status
  • Trigger manual imports
  • Enable/disable individual sources
  • View import statistics (imported, updated, errors)

API Endpoints

# List data sources
GET /api/v1/data-sources

# Trigger import for a specific source
POST /api/v1/data-sources/{id}/sync

# Enable/disable a source
PATCH /api/v1/data-sources/{id}
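
For scripting, these endpoints can be wrapped in a small client. The sketch below only builds the requests with the standard library. It assumes the same base URL and Bearer token scheme as the curl examples earlier in this document. The helper names are hypothetical, and the PATCH body is passed through as-is because the accepted fields are not documented here.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8000/api/v1"  # taken from the curl examples


def build_request(
    method: str,
    path: str,
    token: str,
    payload: Optional[dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build an authenticated request; send it with urllib.request.urlopen."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(f"{BASE_URL}{path}", data=data, method=method)
    request.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def list_sources(token: str) -> urllib.request.Request:
    return build_request("GET", "/data-sources", token)


def sync_source(token: str, source_id: int) -> urllib.request.Request:
    return build_request("POST", f"/data-sources/{source_id}/sync", token)


def patch_source(
    token: str, source_id: int, fields: dict[str, Any]
) -> urllib.request.Request:
    # The field names in `fields` are up to the caller; none are assumed here.
    return build_request("PATCH", f"/data-sources/{source_id}", token, fields)


# Requests are only constructed here; nothing is sent over the network.
sync_request = sync_source("example-token", 3)
```

To actually send a request, pass it to urllib.request.urlopen and decode the response body as JSON.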

Recommended import frequency by source:

| Source | Frequency | Reason |
|---|---|---|
| MITRE ATT&CK | Automatic (24h) | Core framework, frequent updates |
| Atomic Red Team | Monthly | Active community contributions |
| SigmaHQ | Monthly | Active community contributions |
| Elastic Rules | Quarterly | Major version-aligned releases |
| CALDERA | Quarterly | Less frequent updates |
| LOLBAS | Quarterly | Less frequent updates |
| D3FEND | Annually | Infrequent updates |
| CTI Actors | Quarterly | New groups and campaigns |
| NIST 800-53 | Annually | Framework revision cycles |