docs: enterprise refactor plan with ralph specs
77
.ralph/specs/legacy/production-hardening.md
Normal file
@@ -0,0 +1,77 @@
# ABE — Production Hardening Specification

## Health Endpoints (no auth required)

### GET /health
Returns 200 if the server is up.
```json
{ "status": "ok", "version": "0.1.0", "uptime_seconds": 3600 }
```

### GET /ready
Returns 200 if the server is ready to accept requests (DB connected, no critical errors).
Returns 503 if not ready.
```json
{ "status": "ready", "db": "connected", "active_sessions": 2 }
```

Used by Docker HEALTHCHECK and Kubernetes readiness probes.
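
A minimal sketch of how the two payloads above might be assembled. The helper names (`healthResponse`, `readyResponse`), their parameters, and the shape of the 503 body are assumptions for illustration; the spec only fixes the 200 bodies and the status codes.

```typescript
// Hypothetical helpers: names, parameters, and the 503 body are illustrative only.
interface HealthBody {
  status: string;
  version: string;
  uptime_seconds: number;
}

interface ReadyBody {
  status: string;
  db: string;
  active_sessions: number;
}

// GET /health: always 200 while the process is able to answer.
function healthResponse(version: string, startedAtMs: number, nowMs: number): { status: number; body: HealthBody } {
  const uptimeSeconds = Math.floor((nowMs - startedAtMs) / 1000);
  return { status: 200, body: { status: "ok", version: version, uptime_seconds: uptimeSeconds } };
}

// GET /ready: 200 only when the DB is connected and no critical error is flagged, otherwise 503.
function readyResponse(
  dbConnected: boolean,
  hasCriticalError: boolean,
  activeSessions: number
): { status: number; body: ReadyBody } {
  const ready = dbConnected && !hasCriticalError;
  return {
    status: ready ? 200 : 503,
    body: {
      status: ready ? "ready" : "not_ready", // "not_ready" is an assumed value
      db: dbConnected ? "connected" : "disconnected", // "disconnected" is an assumed value
      active_sessions: activeSessions,
    },
  };
}
```

In an Express app these would presumably be called from `app.get('/health', ...)` and `app.get('/ready', ...)` handlers registered before any auth middleware, since both endpoints require no auth.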

## Docker improvements

### Backend Dockerfile
Add HEALTHCHECK:
```dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:3001/health || exit 1
```

### docker-compose.yml updates
- Add healthcheck to backend service
- Add `restart: unless-stopped` to both services
- Add `data/` volume for SQLite persistence
- Load `.env` file: `env_file: .env`
- Add `depends_on` on `backend` with `condition: service_healthy` to frontend

### .env.example file
Create `.env.example` in the repo root with all variables and example values.
Add `.env` to `.gitignore`.

## Error handling improvements

Global Express error handler in `src/server/index.ts`:
- Catch all unhandled errors
- Log with timestamp and stack trace
- Return consistent JSON error format:
```json
{ "error": "Internal server error", "code": "INTERNAL_ERROR", "timestamp": 1705312200000 }
```

Never expose stack traces in production (NODE_ENV=production).
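
The handler described above could be sketched as follows. `ResponseLike`, `buildErrorBody`, `makeErrorHandler`, and the injected `logError` callback are hypothetical names, and including `stack` in the body outside production is an assumption; the spec only requires that it never appears in production.

```typescript
// Minimal structural stand-in for the parts of Express's response object used here.
interface ResponseLike {
  status(code: number): ResponseLike;
  json(body: unknown): unknown;
}

interface ErrorBody {
  error: string;
  code: string;
  timestamp: number;
  stack?: string; // assumption: only ever set outside production
}

// Builds the consistent JSON error body; the stack is never included when NODE_ENV=production.
function buildErrorBody(err: unknown, nodeEnv: string | undefined, nowMs: number): ErrorBody {
  const body: ErrorBody = { error: "Internal server error", code: "INTERNAL_ERROR", timestamp: nowMs };
  if (nodeEnv !== "production" && err instanceof Error && err.stack) {
    body.stack = err.stack;
  }
  return body;
}

// Express treats middleware with four parameters as an error handler.
function makeErrorHandler(
  nodeEnv: string | undefined,
  logError: (entry: { err: unknown; time: string }, msg: string) => void
) {
  return (err: unknown, _req: unknown, res: ResponseLike, _next: unknown): void => {
    // Always log the full error (including stack) server-side with a timestamp.
    logError({ err: err, time: new Date().toISOString() }, "Unhandled error");
    res.status(500).json(buildErrorBody(err, nodeEnv, Date.now()));
  };
}
```

Such a handler would typically be registered with `app.use(...)` after all routes, so that errors passed to `next(err)` reach it.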

## Graceful shutdown

On SIGTERM/SIGINT:
1. Stop accepting new sessions
2. Wait for active sessions to finish (max 30s)
3. Close DB connection
4. Exit 0
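
The four steps above can be sketched as a single function. `ShutdownDeps` and its members are hypothetical hooks that the server would have to provide; only the ordering and the 30s cap come from the list.

```typescript
// Hypothetical hooks supplied by the server; none of these names come from the spec.
interface ShutdownDeps {
  stopAcceptingSessions(): void;
  waitForActiveSessions(): Promise<void>;
  closeDb(): Promise<void>;
  exit(code: number): void;
}

async function shutdown(deps: ShutdownDeps, maxWaitMs: number = 30000): Promise<void> {
  // 1. Stop accepting new sessions.
  deps.stopAcceptingSessions();

  // 2. Wait for active sessions, but never longer than maxWaitMs.
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<void>((resolve) => {
    timer = setTimeout(resolve, maxWaitMs);
  });
  try {
    await Promise.race([deps.waitForActiveSessions(), timeout]);
  } finally {
    // Clear the timer so it cannot keep the process alive after sessions finish early.
    if (timer !== undefined) clearTimeout(timer);
  }

  // 3. Close the DB connection.
  await deps.closeDb();

  // 4. Exit 0.
  deps.exit(0);
}
```

In Node this would presumably be attached with `process.on('SIGTERM', ...)` and `process.on('SIGINT', ...)`, passing `process.exit` as the `exit` hook.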

## Concurrency limits

- Max concurrent exploration sessions: configurable via `ABE_MAX_CONCURRENT_SESSIONS` (default: 3)
- If limit reached, POST /api/sessions returns 429 with:
```json
{ "error": "Max concurrent sessions reached", "active": 3, "limit": 3 }
```
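
One way the limit could be enforced is an in-memory counter, sketched below. `parseMaxSessions` and `SessionLimiter` are hypothetical names, and the sketch assumes a single server process (a counter like this is not shared across instances).

```typescript
// Parses ABE_MAX_CONCURRENT_SESSIONS, falling back to the documented default of 3
// when the value is missing, non-numeric, or not a positive integer.
function parseMaxSessions(raw: string | undefined, fallback: number = 3): number {
  const n = Number(raw === undefined ? "" : raw);
  return isFinite(n) && Math.floor(n) === n && n > 0 ? n : fallback;
}

interface LimitReachedBody {
  error: string;
  active: number;
  limit: number;
}

class SessionLimiter {
  private active = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Returns null when a slot was acquired, or the 429 response body when the limit is reached.
  tryAcquire(): LimitReachedBody | null {
    if (this.active >= this.limit) {
      return { error: "Max concurrent sessions reached", active: this.active, limit: this.limit };
    }
    this.active += 1;
    return null;
  }

  // Must be called exactly once when a session ends, whether it succeeded or failed.
  release(): void {
    if (this.active > 0) this.active -= 1;
  }
}
```

A `POST /api/sessions` handler would call `tryAcquire()` first and respond with status 429 and the returned body when it is not `null`.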

## Logging improvements

Replace `console.log` with a structured logger (use `pino`):
```typescript
log.info({ sessionId, url, event: 'session_started' }, 'Session started')
// pino's default error serializer applies to the `err` key, so the Error is passed under that name
log.error({ anomalyId, err: error }, 'Failed to capture screenshot')
```

All logs go to stdout (Docker captures them).
Log level configurable via `ABE_LOG_LEVEL` env var (default: 'info').
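
A small sketch of resolving the level from the environment, assuming unknown values should fall back to the default rather than crash the server (the spec does not say). `resolveLogLevel` is a hypothetical helper; the level names are pino's built-in ones.

```typescript
// pino's built-in level names, plus "silent" to disable logging.
const PINO_LEVELS = ["fatal", "error", "warn", "info", "debug", "trace", "silent"] as const;
type LogLevel = (typeof PINO_LEVELS)[number];

// Reads ABE_LOG_LEVEL from an env-like record; falls back to "info" when unset or unrecognised.
function resolveLogLevel(env: Record<string, string | undefined>): LogLevel {
  const raw = (env["ABE_LOG_LEVEL"] || "").toLowerCase();
  const known = (PINO_LEVELS as readonly string[]).indexOf(raw) !== -1;
  return known ? (raw as LogLevel) : "info";
}

// Assumed wiring (not executed here): const log = pino({ level: resolveLogLevel(process.env) });
```

Because pino writes to stdout by default, no extra transport configuration should be needed for Docker to capture the output.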