GGUF Host

Pull a Hugging Face GGUF · chat in browser · OpenAI-compatible API

idle

Docker RAM/CPU caps are set in docker-compose.yml (MEM_LIMIT, CPU_LIMIT). Lower n_ctx if you OOM on a small VPS.


    

OpenAI-compatible endpoints on this host:

POST /v1/chat/completions
POST /v1/completions
GET  /health
GET  /api/status

Example: