docs: enterprise refactor plan with ralph specs

debian
2026-03-04 16:17:03 -05:00
parent 4c92712d20
commit f8191133c8
204 changed files with 32722 additions and 422 deletions

@@ -0,0 +1,77 @@
# ABE — Production Hardening Specification
## Health Endpoints (no auth required)
### GET /health
Returns 200 if the server process is up and responding.
```json
{ "status": "ok", "version": "0.1.0", "uptime_seconds": 3600 }
```
### GET /ready
Returns 200 if the server is ready to accept requests (DB connected, no critical errors).
Returns 503 otherwise.
```json
{ "status": "ready", "db": "connected", "active_sessions": 2 }
```
These endpoints are used by the Docker `HEALTHCHECK` (`/health`) and by Kubernetes readiness probes (`/ready`).
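
The two payloads above could be assembled by pure helpers like the following. This is a minimal sketch, not the project's implementation: the `ServerState` snapshot, the helper names, and the `"not_ready"` / `"disconnected"` values in the 503 body are assumptions, since the spec only shows the success responses.

```typescript
// Hypothetical snapshot of server state; all field names are assumptions.
interface ServerState {
  version: string;
  startedAtMs: number;
  dbConnected: boolean;
  criticalError: boolean;
  activeSessions: number;
}

interface Reply {
  status: number;
  body: Record<string, string | number>;
}

// GET /health: liveness only, always 200 while the process can respond.
function healthReply(state: ServerState, nowMs: number): Reply {
  return {
    status: 200,
    body: {
      status: "ok",
      version: state.version,
      uptime_seconds: Math.floor((nowMs - state.startedAtMs) / 1000),
    },
  };
}

// GET /ready: 200 only when the DB is connected and no critical error is set.
function readyReply(state: ServerState): Reply {
  const ready = state.dbConnected && !state.criticalError;
  return {
    status: ready ? 200 : 503,
    body: {
      status: ready ? "ready" : "not_ready", // 503 wording is an assumption
      db: state.dbConnected ? "connected" : "disconnected",
      active_sessions: state.activeSessions,
    },
  };
}

// Possible Express wiring (not executed here):
//   app.get("/ready", (_req, res) => {
//     const r = readyReply(currentState());
//     res.status(r.status).json(r.body);
//   });
```

Keeping the decision logic outside the route handler makes the 200/503 rule easy to unit test without starting a server.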
## Docker improvements
### Backend Dockerfile
Add a `HEALTHCHECK` (the command runs inside the container, so `curl` must be installed in the image):
```dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:3001/health || exit 1
```
### docker-compose.yml updates
- Add healthcheck to backend service
- Add `restart: unless-stopped` to both services
- Add `data/` volume for SQLite persistence
- Load `.env` file: `env_file: .env`
- Add `depends_on` on `backend` with `condition: service_healthy` to the frontend service
### .env.example file
Create `.env.example` in the repo root with all variables and example values.
Add `.env` to `.gitignore`.
## Error handling improvements
Global Express error handler in `src/server/index.ts`:
- Catch all unhandled errors
- Log with timestamp and stack trace
- Return consistent JSON error format:
```json
{ "error": "Internal server error", "code": "INTERNAL_ERROR", "timestamp": 1705312200000 }
```
Never expose stack traces in responses in production (`NODE_ENV=production`); they belong in the server logs only.
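
One way the error format and the production rule could fit together is sketched below. The helper name `toErrorBody` is hypothetical, and two choices are assumptions not stated in the spec: that non-production responses may carry a `stack` field, and that the handler answers with HTTP 500.

```typescript
// Shape of the JSON error body from the spec.
interface ErrorBody {
  error: string;
  code: string;
  timestamp: number;
  stack?: string; // assumption: only ever included outside production
}

// Builds the response body. `nodeEnv` and `nowMs` are passed in so the
// function stays pure and testable.
function toErrorBody(err: unknown, nodeEnv: string | undefined, nowMs: number): ErrorBody {
  const body: ErrorBody = {
    error: "Internal server error",
    code: "INTERNAL_ERROR",
    timestamp: nowMs,
  };
  if (nodeEnv !== "production" && err instanceof Error && err.stack) {
    body.stack = err.stack;
  }
  return body;
}

// Possible Express wiring, registered after all routes (not executed here).
// Express recognises error middleware by its four-argument signature:
//   app.use((err, _req, res, _next) => {
//     log.error({ err }, "Unhandled error");
//     res.status(500).json(toErrorBody(err, process.env.NODE_ENV, Date.now()));
//   });
```

The stack trace is always logged server-side; only the response body changes with `NODE_ENV`.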
## Graceful shutdown
On SIGTERM/SIGINT:
1. Stop accepting new sessions
2. Wait for active sessions to finish (max 30s)
3. Close DB connection
4. Exit 0
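
The four steps above can be sketched as a single async function. The `ShutdownDeps` interface and its method names are hypothetical stand-ins for whatever the server actually exposes; the function returns the exit code instead of calling `process.exit` itself so it can be tested.

```typescript
// Hypothetical dependencies; the real server would supply its own.
interface ShutdownDeps {
  stopAcceptingSessions: () => void;
  waitForActiveSessions: () => Promise<void>;
  closeDb: () => Promise<void>;
}

interface ShutdownResult {
  exitCode: number;
  drained: boolean; // false if the wait hit the timeout
}

async function gracefulShutdown(deps: ShutdownDeps, maxWaitMs = 30000): Promise<ShutdownResult> {
  // 1. Stop accepting new sessions.
  deps.stopAcceptingSessions();

  // 2. Wait for active sessions, but no longer than maxWaitMs.
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), maxWaitMs);
  });
  const drained = await Promise.race([
    deps.waitForActiveSessions().then(() => true),
    timeout,
  ]);
  clearTimeout(timer); // avoid keeping the event loop alive after an early drain

  // 3. Close the DB connection.
  await deps.closeDb();

  // 4. Report exit code 0 to the caller.
  return { exitCode: 0, drained };
}

// Possible wiring (not executed here):
//   for (const sig of ["SIGTERM", "SIGINT"] as const) {
//     process.once(sig, () => {
//       gracefulShutdown(deps).then((r) => process.exit(r.exitCode));
//     });
//   }
```

Using `process.once` rather than `process.on` in the wiring means a second signal is not routed through the same handler again; whether a repeated signal should force an immediate exit is left open by the spec.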
## Concurrency limits
- Max concurrent exploration sessions: configurable via `ABE_MAX_CONCURRENT_SESSIONS` (default: 3)
- If limit reached, POST /api/sessions returns 429 with:
```json
{ "error": "Max concurrent sessions reached", "active": 3, "limit": 3 }
```
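
A minimal in-memory sketch of the limit check follows. `readMaxSessions` and `SessionLimiter` are hypothetical names, and falling back to the default of 3 for missing, non-numeric, or non-positive values is an assumption; the spec only states the default.

```typescript
// Reads the limit from an env-like object; falls back to the spec default of 3.
function readMaxSessions(env: Record<string, string | undefined>): number {
  const parsed = Number.parseInt(env["ABE_MAX_CONCURRENT_SESSIONS"] ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : 3;
}

type AcquireResult =
  | { ok: true }
  | { ok: false; status: 429; body: { error: string; active: number; limit: number } };

// Single-process counter; a multi-process deployment would need shared state.
class SessionLimiter {
  private active = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Reserves a slot, or returns the 429 payload from the spec when full.
  tryAcquire(): AcquireResult {
    if (this.active >= this.limit) {
      return {
        ok: false,
        status: 429,
        body: { error: "Max concurrent sessions reached", active: this.active, limit: this.limit },
      };
    }
    this.active += 1;
    return { ok: true };
  }

  // Call when a session ends, whether it succeeded or failed.
  release(): void {
    if (this.active > 0) this.active -= 1;
  }
}
```

In a `POST /api/sessions` handler, a failed `tryAcquire()` would be answered with `res.status(429).json(result.body)`, and `release()` would run in a `finally` block once the session ends.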
## Logging improvements
Replace `console.log` with a structured logger (use `pino`):
```typescript
log.info({ sessionId, url, event: 'session_started' }, 'Session started')
// pino's default error serializer applies to the `err` key, so pass the Error there
log.error({ anomalyId, err: error }, 'Failed to capture screenshot')
```
All logs go to stdout (Docker captures them).
Log level configurable via `ABE_LOG_LEVEL` env var (default: 'info').
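
Resolving the level from the environment could look like the sketch below. `resolveLogLevel` is a hypothetical helper, and silently falling back to `info` for unrecognised values (rather than failing at startup) is an assumption.

```typescript
// Level names pino accepts by default, plus "silent" to disable output.
const PINO_LEVELS = ["fatal", "error", "warn", "info", "debug", "trace", "silent"] as const;
type LogLevel = (typeof PINO_LEVELS)[number];

// Resolves ABE_LOG_LEVEL, falling back to the spec default of "info".
function resolveLogLevel(env: Record<string, string | undefined>): LogLevel {
  const raw = (env["ABE_LOG_LEVEL"] ?? "").trim().toLowerCase();
  const match = PINO_LEVELS.find((level) => level === raw);
  return match ?? "info";
}

// Possible wiring (not executed here; assumes the `pino` package is installed):
//   import pino from "pino";
//   export const log = pino({ level: resolveLogLevel(process.env) });
// pino writes newline-delimited JSON to stdout by default.
```

Normalising case and whitespace before the lookup means `ABE_LOG_LEVEL=DEBUG` behaves the same as `debug`.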