Hugging Face Not Working?

403 gated model error, Inference API rate-limited (429), Space not loading or sleeping, model download timeout, or Hub down? Fix it step by step.

Hugging Face live status

Updated every 5 minutes. Full history at prismix.dev/service/huggingface.

What's wrong? Diagnose fast

🔒

HTTP 403 on model download

The model is "gated": it requires accepting a license agreement while logged in to an HF account. Go to the model page and click "Agree and access repository". Then run huggingface-cli login or set the HF_TOKEN environment variable.

⚠️

Inference API 429 error

Unauthenticated free-tier requests are limited to roughly 50 per hour per IP. Add an Authorization: Bearer hf_... header to authenticate and get higher limits. Upgrade to HF Pro ($9/mo) for production-level limits.

💤

Space is sleeping

Free Spaces pause after a period of inactivity to save compute. Open the Space to wake it; startup takes 30–60 seconds. Spaces on paid hardware can be configured not to sleep.

🚧

Space stuck in Building

Usually a build or app startup error. Click the "Logs" tab on the Space page to see the build output or the Python traceback. Common causes: a missing dependency in requirements.txt, the wrong Gradio version, or running out of GPU memory.

🐌

Model download slow or hangs

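Large models (7B+ parameters, roughly 10–40 GB) take a long time to download. Authenticate with an HF token so requests are not subject to anonymous limits. Install hf_transfer and set HF_HUB_ENABLE_HF_TRANSFER=1 for faster transfers. If a download is interrupted, re-run it: partially downloaded files resume automatically, and the older resume_download=True argument is deprecated.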

💾

Dataset download fails

Some datasets require HF account acceptance (same as gated models). Use: datasets.load_dataset("name", token="hf_..."). Large datasets need sufficient disk space — check with df -h.
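The disk-space check can also be done from Python before a large download starts. This is a minimal sketch using only the standard library; the helper names free_gb and has_room are hypothetical, not part of any Hugging Face package.

```python
import shutil


def free_gb(path="."):
    """Free space, in gigabytes, on the filesystem that contains path."""
    return shutil.disk_usage(path).free / 1e9


def has_room(required_gb, path="."):
    """True when at least required_gb gigabytes are free at path."""
    return free_gb(path) >= required_gb
```

For example, has_room(50, "/data") could gate a download that is expected to need about 50 GB; both the figure and the path are illustrative.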

Setting up HF token (fixes most 403 and 429 errors)

  1. Go to huggingface.co/settings/tokens → click New token → select the Read scope → copy the hf_... token.
  2. Python / transformers: run huggingface-cli login (interactive) or huggingface-cli login --token hf_xxx (non-interactive).
  3. Environment variable (CI/CD): set HF_TOKEN=hf_xxx. The huggingface_hub library, which transformers and datasets use for downloads, picks it up automatically.
  4. API calls: add the header Authorization: Bearer hf_xxx to your HTTP requests to the Inference API.
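For the API-call case, the header can be built once and reused across requests. The sketch below assumes the token is passed explicitly or stored in the HF_TOKEN environment variable; auth_headers is a hypothetical helper name, not a Hugging Face API.

```python
import os


def auth_headers(token=None):
    """Build the Authorization header for Hugging Face HTTP requests.

    Falls back to the HF_TOKEN environment variable when no token is given.
    """
    token = token or os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("No token found: pass one or set HF_TOKEN")
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.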

Hugging Face error codes

| Error | Meaning | Fix |
| --- | --- | --- |
| 403 Forbidden | Gated model: requires license agreement + HF token | Accept the license on the model page, run huggingface-cli login |
| 401 Unauthorized | HF token invalid or expired | Generate a new token at huggingface.co/settings/tokens |
| 429 Too Many Requests | Inference API rate limit reached | Add HF token to requests, upgrade to HF Pro |
| 404 Not Found | Model ID wrong or model deleted | Verify the exact model ID at huggingface.co/models |
| 503 Service Unavailable | Inference model loading (cold start) | Retry after about 20 s; the model is still loading |
| RepositoryNotFoundError | Model/dataset doesn't exist or is private | Check spelling; if private, ensure HF_TOKEN has access |

Step-by-step fix

  1. Check live status

    Visit prismix.dev/service/huggingface, or status.huggingface.co for the official HF status page. Hub, Spaces, and Inference API statuses are shown separately.

  2. Fix 403 gated model

    Log in to huggingface.co with your HF account, open the model page, find the "Gated model" banner, and click Agree and access repository. Then authenticate locally: run huggingface-cli login and paste your token. Note: some gated models (Llama, Gemma) may require waiting for approval from the publisher (Meta, Google), so check your email.

  3. Fix Inference API 429

    Authenticate your API requests by adding the header Authorization: Bearer hf_your_token. Authenticated free-tier requests have significantly higher limits than anonymous ones. For production workloads, use Hugging Face Inference Endpoints (dedicated infrastructure) instead of the shared Inference API.

  4. Wake a sleeping Space

    Click anywhere in the Space to start it. The yellow banner "This Space is sleeping" means it paused to save compute. Startup takes 30–60 seconds. If the Space is in the "Error" state, click Logs on the Space page to see the startup traceback. Common fixes: upgrade gradio in requirements.txt, fix import errors, or reduce the model size to fit within the Space's hardware limits.

  5. Speed up model downloads

    Install hf_transfer for multi-threaded downloads (pip install hf_transfer), then set HF_HUB_ENABLE_HF_TRANSFER=1. Downloads can be 3–5× faster. For interrupted downloads, just re-run `from_pretrained()`: files that finished downloading are reused from the cache, and partially downloaded files resume where they stopped.

  6. Fix local model cache issues

    If a model loads incorrectly or seems corrupted, clear it and re-download. Find the cache directory with python -c "from huggingface_hub import constants; print(constants.HF_HUB_CACHE)". Delete the model's subfolder (named models--<org>--<name>), then re-run from_pretrained(). Default cache: ~/.cache/huggingface/hub/.
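The cache cleanup in the last step can be scripted. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the path resolution is a simplification that only looks at the HF_HUB_CACHE and HF_HOME environment variables before falling back to the default location.

```python
import os
import shutil
from pathlib import Path


def hub_cache_root():
    """Resolve the Hub cache directory (simplified: ignores XDG_CACHE_HOME)."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    hf_home = os.environ.get("HF_HOME", "~/.cache/huggingface")
    return Path(hf_home).expanduser() / "hub"


def cached_model_dir(repo_id, cache_root=None):
    """Folder that holds one model, e.g. models--org--name for 'org/name'."""
    root = Path(cache_root) if cache_root else hub_cache_root()
    return root / ("models--" + repo_id.replace("/", "--"))


def remove_cached_model(repo_id, cache_root=None):
    """Delete the cached copy of a model; True if something was removed."""
    target = cached_model_dir(repo_id, cache_root)
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False
```

After remove_cached_model("org/name") returns True, the next from_pretrained() call downloads a fresh copy. The repository ID here is a placeholder.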

Frequently asked questions

Why is Hugging Face not working?

403 = gated model (accept license + huggingface-cli login). 429 = Inference API rate limit (add HF token to requests). Space sleeping = click to wake. Hub outage = check prismix.dev/service/huggingface. Model too large for VRAM = use quantized version.

Hugging Face 403 error — how to fix?

Go to the model page on huggingface.co → click "Agree and access repository". Generate a token at huggingface.co/settings/tokens → run "huggingface-cli login" or set HF_TOKEN=hf_... in your environment. Some models (Llama) also require approval from the publisher.
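In Python, a gated repository and a missing one raise different exceptions, which makes the cause easy to tell apart. This is a minimal sketch, assuming huggingface_hub is installed and that the repository contains a config.json file. The helpers fetch_file and model_page_url are hypothetical names; hf_hub_download, GatedRepoError, and RepositoryNotFoundError are provided by huggingface_hub.

```python
def model_page_url(repo_id):
    """URL of the model page where the license can be accepted."""
    return f"https://huggingface.co/{repo_id}"


def fetch_file(repo_id, filename="config.json", token=None):
    """Download one file and turn access errors into actionable messages."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import hf_hub_download
    from huggingface_hub.utils import GatedRepoError, RepositoryNotFoundError

    try:
        # token=None falls back to HF_TOKEN or the token saved by the CLI login.
        return hf_hub_download(repo_id=repo_id, filename=filename, token=token)
    except GatedRepoError as err:
        # Caught first because GatedRepoError subclasses RepositoryNotFoundError.
        raise PermissionError(
            f"Gated model: accept the license at {model_page_url(repo_id)}"
        ) from err
    except RepositoryNotFoundError as err:
        raise FileNotFoundError(
            f"{repo_id} not found, or private and the token lacks access"
        ) from err
```

Fetching a single small file first is a cheap way to confirm access before starting a multi-gigabyte download.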

Hugging Face Inference API not working — 429 error?

Add "Authorization: Bearer hf_your_token" to your API requests. Without a token, the free tier allows roughly 50 requests/hour per IP; with a token the limits are much higher. For production, use dedicated Inference Endpoints rather than the shared Inference API. HF Pro ($9/mo) gives priority access.
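When a 429 still occurs with a token, retrying with increasing delays is usually gentler than retrying immediately. Here is a minimal sketch of exponential backoff using only the standard library; call_with_retry is a hypothetical helper, and the retry counts and delays are illustrative rather than values recommended by Hugging Face.

```python
import time
import urllib.error


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry on HTTP 429/503 with exponential backoff.

    send is any zero-argument callable that raises urllib.error.HTTPError
    when the request fails.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            retryable = err.code in (429, 503)
            if not retryable or attempt == max_attempts - 1:
                raise
            # Honor Retry-After (in seconds) if present, else wait 1s, 2s, 4s, ...
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
```

A request built with urllib.request.Request and the Authorization header can then be sent as call_with_retry(lambda: urllib.request.urlopen(req).read()), where req is your own request object.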

Hugging Face Space not loading — how to fix?

Check the Space status icon: yellow = sleeping (click to wake, 30-60s). Red = error (check Logs tab). Gray = stopped (owner disabled). Building = startup in progress. If in Error, fix the issue in requirements.txt or app code and push a new commit.

Hugging Face model download slow — how to speed it up?

Install hf_transfer (pip install hf_transfer) and set HF_HUB_ENABLE_HF_TRANSFER=1. This enables multi-threaded downloading and can be 3–5× faster. Also authenticate with your HF token: it is required for gated models and keeps your requests out of the anonymous rate limits.
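The flag can also be set from Python, as long as that happens before huggingface_hub is imported. This is a minimal sketch, assuming hf_transfer is installed and that the installed huggingface_hub release honors the flag; download_model is a hypothetical wrapper around the library's snapshot_download function.

```python
import os

# Set before huggingface_hub is imported: the library reads the flag at import time.
os.environ.setdefault("HF_HUB_ENABLE_HF_TRANSFER", "1")


def download_model(repo_id, token=None):
    """Download every file of a model repository into the local cache."""
    # Imported here so the environment variable above is already in place.
    from huggingface_hub import snapshot_download

    # Returns the local path of the downloaded snapshot.
    return snapshot_download(repo_id=repo_id, token=token)
```

Because setdefault is used, a value already exported in the shell takes precedence over the one set in the script.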
