Local stack (npuops-platform)

npuops-platform is the Docker Compose stack that every other component talks to. Keep it running in the background while you develop the rest.

Set up

git clone https://github.com/dudaji-vn/npuops-platform.git
cd npuops-platform
./scripts/bootstrap.sh

See Quick start for the bootstrap prompts.

The stack stays up between sessions. The commands you will use most:

docker compose up -d           # bring everything up
docker compose ps              # state of every container
docker compose logs -f <svc>   # tail one service
docker compose down            # stop, keep data volumes
docker compose down -v         # stop and drop all data

Layout

npuops-platform/
├── docker-compose.yml
├── .env                       # generated by bootstrap.sh
├── litellm/
│   ├── config.yaml            # model registry
│   ├── callbacks/             # custom pre/post hooks (prompt-injection.py)
│   └── Dockerfile
├── librechat/
│   └── librechat.yaml         # mounted into the LibreChat container
├── langfuse/                  # Langfuse env templates
├── llm-guard/
│   └── scanners.yml           # LLM Guard scanner config
├── monitoring/
│   ├── prometheus.yml
│   ├── rules/                 # alert rules
│   ├── alertmanager.yml
│   ├── secrets/slack-webhook  # gitignored
│   └── grafana/{dashboards,provisioning}/
├── scripts/
│   ├── bootstrap.sh
│   ├── add-model.sh
│   ├── smoke-test.sh
│   ├── e2e-smoke-test.sh
│   └── postgres-init.sh
└── docs/                      # internal design docs

What you usually iterate on

litellm/config.yaml: the model registry. To add a model, prefer the helper ./scripts/add-model.sh over editing by hand.
librechat/librechat.yaml: chat-product config. After editing, run docker compose restart librechat, or edit live from the admin panel.
monitoring/rules/*.yml: Prometheus alert rules. Reload without a restart:
```
curl -X POST http://localhost:9090/-/reload
```
litellm/callbacks/*.py: custom LiteLLM hooks (prompt injection, PII, custom routing). After editing, restart the litellm-proxy service:
```
docker compose restart litellm-proxy
```
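A broken rule file makes the reload fail, so it can help to validate the rules first. The following is a minimal sketch, not part of the platform repo: it assumes promtool is installed on the host, and the function name reload_alert_rules is hypothetical.

```shell
# Sketch: validate alert rules, then hot-reload Prometheus only if they parse.
# Assumptions: promtool is on PATH; Prometheus listens on localhost:9090,
# as in the reload command above.
reload_alert_rules() {
  local rules_dir="${1:-monitoring/rules}"
  local prom_url="${2:-http://localhost:9090}"

  # promtool exits non-zero if any rule file fails validation.
  if ! promtool check rules "$rules_dir"/*.yml; then
    echo "rules failed validation; not reloading" >&2
    return 1
  fi

  # --fail makes curl return non-zero on an HTTP error status.
  curl --fail --silent --show-error -X POST "$prom_url/-/reload"
}
```

Run it from the repo root as reload_alert_rules. Note that Prometheus only serves /-/reload when it was started with --web.enable-lifecycle; without that flag the POST is rejected.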

Smoke tests

./scripts/smoke-test.sh          # quick: LiteLLM ↔ backend
./scripts/e2e-smoke-test.sh      # full: chat → gateway → backend → trace
./scripts/e2e-smoke-test.sh --rebuild    # rebuild the e2e image first

The e2e test is the regression check you run before merging anything in the stack repo.
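If you want one command for the pre-merge check, a small wrapper can run the quick test first and the slower e2e test only when it passes. This is a sketch under one assumption that this guide does not confirm: both scripts exit non-zero on failure. The function name run_premerge_checks is hypothetical.

```shell
# Sketch: run the quick check first, then the full e2e check.
# Assumption: both scripts signal failure through a non-zero exit code.
run_premerge_checks() {
  local scripts_dir="${1:-./scripts}"

  echo "==> quick smoke test"
  "$scripts_dir/smoke-test.sh" || return 1

  echo "==> full e2e smoke test"
  "$scripts_dir/e2e-smoke-test.sh" || return 1

  echo "==> all checks passed"
}
```

Failing fast on the quick test saves the cost of the e2e run when the LiteLLM-to-backend path is already broken.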

Conventions

Pin every Docker image version. Never use :latest.
Secrets only via .env. Never hardcode in docker-compose.yml.
Every LiteLLM model must carry backend_type and hardware_id in model_info. The add-model.sh script enforces this, so do not bypass it.
Every request must end up with hardware_id in the Langfuse trace, because W6 reports aggregate by it.
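For hand-edited entries, the model_info convention above can be spot-checked with a line-based scan. This is a rough sketch, not the helper script's own enforcement: it assumes the usual LiteLLM layout, where each entry starts with "- model_name:", and it does not parse YAML. The function name check_model_info is hypothetical.

```shell
# Sketch: list models in a LiteLLM config that lack backend_type or hardware_id.
# Assumption: each entry starts with "- model_name:" and both keys appear
# on their own lines before the next entry. Not a real YAML parser.
check_model_info() {
  awk '
    function report() {
      if (name != "" && !(has_backend && has_hw)) { print name; bad = 1 }
    }
    /^[ \t]*-[ \t]*model_name:/ {
      report()                      # close out the previous entry
      name = $0
      sub(/^[ \t]*-[ \t]*model_name:[ \t]*/, "", name)
      has_backend = 0; has_hw = 0
      next
    }
    /^[ \t]*backend_type:[ \t]*[^ \t#]/ { has_backend = 1 }
    /^[ \t]*hardware_id:[ \t]*[^ \t#]/  { has_hw = 1 }
    END { report(); exit bad }
  ' "${1:-litellm/config.yaml}"
}
```

The function prints the name of each offending model and returns non-zero if it found any, so it can gate a commit hook or CI step.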

See npuops-platform/CLAUDE.md for the full project conventions.
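The image-pinning convention is also easy to check mechanically. The sketch below is one possible approach and is not shipped with the repo: it treats an image as unpinned when it has no tag (Docker then pulls :latest) or an explicit :latest tag. The function name find_unpinned_images is hypothetical.

```shell
# Sketch: print image references in a compose file that are not pinned.
# Digest references (@sha256:...) count as pinned. Line-based: it only sees
# literal "image:" lines and cannot resolve ${VARIABLE} substitutions.
find_unpinned_images() {
  awk '
    /^[ \t]*image:[ \t]*/ {
      ref = $0
      sub(/^[ \t]*image:[ \t]*/, "", ref)   # keep only the value
      sub(/[ \t]*(#.*)?$/, "", ref)          # drop trailing comment and spaces
      gsub(/"/, "", ref)                     # drop double quotes
      if (ref ~ /@sha256:/) next

      name = ref
      sub(/^.*\//, "", name)                 # last path component holds the tag
      if (index(name, ":") == 0 || name ~ /:latest$/) { print ref; bad = 1 }
    }
    END { exit bad }
  ' "${1:-docker-compose.yml}"
}
```

Stripping everything up to the last slash before looking for a colon keeps a registry port such as localhost:5000/app from being mistaken for a tag.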

Sibling repos

When you also need to develop a sibling repo, clone it next to the platform:

DudajiVN/
├── npuops-platform/      # running (docker compose up -d)
├── LibreChat/            # cloned for editing
├── nufichat-admin-panel/ # cloned for editing
└── nufi-console/         # cloned for editing

Each sibling repo's dev server points at the platform's exposed ports (http://localhost:4000 for LiteLLM, http://localhost:3080 for LibreChat) and shares the platform's secrets via the sibling's .env.
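Before starting a sibling dev server, it can be useful to confirm that the platform ports actually answer. The following is a minimal sketch: the ports come from the paragraph above, but treating a successful HTTP response on the root path as "up" is an assumption, not a documented health check. The function name wait_for_url is hypothetical.

```shell
# Sketch: block until a platform endpoint answers, or give up after a timeout.
# Assumption: any HTTP status below 400 on the given URL means the service is up.
wait_for_url() {
  local url="$1"
  local timeout="${2:-60}"   # seconds to wait before giving up
  local waited=0

  until curl --fail --silent --output /dev/null --max-time 5 "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "$url is up"
}
```

A typical call would be wait_for_url http://localhost:4000 followed by wait_for_url http://localhost:3080, and only then starting the sibling's dev server.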

Each sub-guide spells out the exact env vars to copy.