Local stack (npuops-platform)
Standing up the full compose stack as your dev backbone.
npuops-platform is the compose stack everything else talks to. Keep
it running in the background while you develop the rest.
Set up
git clone https://github.com/dudaji-vn/npuops-platform.git
cd npuops-platform
./scripts/bootstrap.sh
See Quick start for the bootstrap prompts.
The stack stays up between sessions:
docker compose up -d # bring everything up
docker compose ps # state of every container
docker compose logs -f <svc> # tail one service
docker compose down # stop, keep data volumes
docker compose down -v # stop and drop all data
Layout
npuops-platform/
├── docker-compose.yml
├── .env # generated by bootstrap.sh
├── litellm/
│ ├── config.yaml # model registry
│ ├── callbacks/ # custom pre/post hooks (prompt-injection.py)
│ └── Dockerfile
├── librechat/
│ └── librechat.yaml # mounted into the LibreChat container
├── langfuse/ # Langfuse env templates
├── llm-guard/
│ └── scanners.yml # LLM Guard scanner config
├── monitoring/
│ ├── prometheus.yml
│ ├── rules/ # alert rules
│ ├── alertmanager.yml
│ ├── secrets/slack-webhook # gitignored
│ └── grafana/{dashboards,provisioning}/
├── scripts/
│ ├── bootstrap.sh
│ ├── add-model.sh
│ ├── smoke-test.sh
│ ├── e2e-smoke-test.sh
│ └── postgres-init.sh
└── docs/ # internal design docs
What you usually iterate on
- litellm/config.yaml: adding a model. Prefer the helper: ./scripts/add-model.sh.
- librechat/librechat.yaml: chat-product config. After editing, run: docker compose restart librechat. (Or use the admin panel to edit live.)
- monitoring/rules/*.yml: Prometheus alert rules. Reload without a restart: curl -X POST http://localhost:9090/-/reload
- litellm/callbacks/*.py: custom LiteLLM hooks (prompt injection, PII, custom routing). After editing, restart litellm: docker compose restart litellm-proxy
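To make the callbacks item concrete, here is a minimal sketch of what a pre-call hook could look like. It is not the repo's actual prompt-injection.py. The class name, the instance name, and the deny-list regex are hypothetical. The hook signature follows LiteLLM's custom-callback interface as generally documented, so check it against the LiteLLM version pinned in the stack before relying on it.

```python
import asyncio
import re

try:
    # LiteLLM's base class for custom callbacks.
    from litellm.integrations.custom_logger import CustomLogger
except ImportError:
    # Fallback so this sketch still runs where LiteLLM is not installed.
    CustomLogger = object

# Hypothetical deny-list; a real hook would likely call a proper scanner.
SUSPICIOUS = re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE)


class PromptInjectionGuard(CustomLogger):
    """Hypothetical pre-call hook that rejects obviously suspicious prompts."""

    async def async_pre_call_hook(self, user_api_key_dict, cache, data, call_type):
        # `data` is the request body; inspect each plain-text chat message.
        for message in data.get("messages", []):
            content = message.get("content")
            if isinstance(content, str) and SUSPICIOUS.search(content):
                raise ValueError("request rejected: possible prompt injection")
        return data  # unchanged request continues to the backend


# Assumed instance name that a config entry would reference as a callback.
proxy_handler_instance = PromptInjectionGuard()
```

A hook like this only takes effect after the litellm-proxy restart described above, because the proxy loads callback modules at startup.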
Smoke tests
./scripts/smoke-test.sh # quick: LiteLLM ↔ backend
./scripts/e2e-smoke-test.sh # full: chat → gateway → backend → trace
./scripts/e2e-smoke-test.sh --rebuild # rebuild the e2e image first
The e2e test is the regression check you run before merging anything in the stack repo.
Conventions
- Pin every Docker image version. Never use :latest.
- Secrets only via .env. Never hardcode in docker-compose.yml.
- Every LiteLLM model must carry backend_type and hardware_id in model_info. The script enforces this; resist the urge to skip it.
- Every request must end up with hardware_id in the Langfuse trace, because W6 reports aggregate by it.
See npuops-platform/CLAUDE.md for the full project conventions.
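Two of these conventions are mechanical enough to check in code. The sketch below is illustrative only and is not the repo's own enforcement script. It assumes the model list has already been loaded from litellm/config.yaml into plain dicts (model_name, model_info) and that image references have been collected from docker-compose.yml as strings. Both function names are hypothetical.

```python
REQUIRED_MODEL_INFO = ("backend_type", "hardware_id")


def missing_model_info(model_list):
    """Return {model_name: [missing keys]} for entries that break the convention."""
    problems = {}
    for entry in model_list:
        info = entry.get("model_info") or {}
        # A key counts as missing if it is absent or empty.
        missing = [key for key in REQUIRED_MODEL_INFO if not info.get(key)]
        if missing:
            problems[entry.get("model_name", "<unnamed>")] = missing
    return problems


def unpinned_images(images):
    """Return image references that carry no tag or use :latest."""
    bad = []
    for image in images:
        if "@sha256:" in image:
            continue  # pinned by digest
        last = image.rsplit("/", 1)[-1]  # drop any registry host:port prefix
        _name, sep, tag = last.partition(":")
        if not sep or tag == "latest":
            bad.append(image)
    return bad
```

Both functions return the offending entries instead of raising, so a caller can print every violation in one pass before failing a check.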
Sibling repos
When you also need to develop a sibling repo:
DudajiVN/
├── npuops-platform/ # running (docker compose up -d)
├── LibreChat/ # cloned for editing
├── nufichat-admin-panel/ # cloned for editing
└── nufi-console/ # cloned for editing
Each sibling repo's dev server points at the platform's exposed ports
(http://localhost:4000 for LiteLLM, http://localhost:3080 for
LibreChat) and shares the platform's secrets via the sibling's .env.
Each sub-guide spells out the exact env vars to copy.
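Copying secrets by hand is easy to get wrong, so a small helper can pull the needed entries out of the platform's .env. This is a rough sketch: the key name used in the usage note is a placeholder, the real names come from each sub-guide, and the parser handles only plain KEY=VALUE lines (no quoting or export prefixes).

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def shared_env_lines(platform_env_text, keys):
    """Return KEY=VALUE lines for the requested keys; fail if any is missing."""
    values = parse_env(platform_env_text)
    missing = [key for key in keys if key not in values]
    if missing:
        raise KeyError(f"not found in platform .env: {', '.join(missing)}")
    return [f"{key}={values[key]}" for key in keys]
```

For example, shared_env_lines(open("npuops-platform/.env").read(), ["SOME_SHARED_KEY"]) returns the lines to append to a sibling's .env, where SOME_SHARED_KEY stands in for whatever the sub-guide lists.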