Modal Not Working?
Cold start too slow, image build failing, function timeout, volume writes not persisting, secrets missing, or GPU quota exceeded? Check live status and fix it fast.
Modal — live status
Updated every 5 minutes. Full history at prismix.dev/service/modal.
What's wrong? Diagnose fast
Cold start taking 10-30+ seconds
A GPU cold start is the sum of image pull, container boot, and Python imports. Set keep_warm=1 to keep one container warm (recent Modal releases name this parameter min_containers). Load models outside the function body, either at module level or in a @modal.enter() method, so that every call served by a warm container reuses the loaded model instead of loading the weights again.
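To see which part of a cold start dominates, it can help to time each phase from inside the container. The following is a minimal sketch in plain Python, not a Modal API: the `timed` helper, the `timings` dict, and the phase names are hypothetical, and the `sleep` call stands in for real model loading.

```python
import time
from contextlib import contextmanager

# Hypothetical helper: collects wall-clock durations per startup phase.
timings: dict[str, float] = {}


@contextmanager
def timed(phase: str):
    """Record how long the wrapped block takes under the given phase name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] = time.perf_counter() - start


def slowest_phase() -> str:
    """Return the name of the phase that took the longest so far."""
    return max(timings, key=timings.get)


with timed("imports"):
    import json  # stand-in for a heavy import such as torch

with timed("model_load"):
    time.sleep(0.05)  # stand-in for loading weights from disk

print({phase: round(seconds, 3) for phase, seconds in timings.items()})
print("slowest:", slowest_phase())
```

Wrapping the body of a @modal.enter() method in the same way would show in the logs whether imports or weight loading is the phase worth optimizing.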
Image build failed
Check the Builds tab in the Modal dashboard for the error. Common cause: pip package needs a system library. Add apt_install() before pip_install(). Examples: libgl1 (OpenCV), libsndfile1 (audio), ffmpeg (video processing). Use image.run_commands("...") for complex setup.
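One way to keep system dependencies in step with pip packages is a small lookup that is consulted when the image is defined. This is only a sketch: the `SYSTEM_DEPS` mapping and the `apt_packages_for` helper are hypothetical, and the mapping covers just a few of the pairings mentioned on this page.

```python
# Illustrative mapping only: pip package -> apt packages it commonly needs.
SYSTEM_DEPS = {
    "opencv-python": ["libgl1"],
    "torchaudio": ["libsndfile1"],
    "cryptography": ["libssl-dev"],
}


def apt_packages_for(pip_packages):
    """Return a sorted, de-duplicated list of apt packages for the given pip packages."""
    needed = set()
    for name in pip_packages:
        needed.update(SYSTEM_DEPS.get(name.lower(), []))
    return sorted(needed)


pip_packages = ["torch", "opencv-python", "torchaudio"]
print(apt_packages_for(pip_packages))
```

The result could then be passed as .apt_install(*apt_packages_for(pip_packages)) ahead of .pip_install(*pip_packages) in the image chain.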
Function times out
The default timeout is 300 s (5 min). Long ML jobs need an explicit timeout: @app.function(timeout=3600). The maximum is 86400 s (24 h). Checkpoint to a Modal Volume periodically inside long-running functions, because a function that times out loses all in-memory state. Follow progress through the function's logs in the Modal dashboard.
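Checkpointing only helps if the next run picks up where the last one stopped. The sketch below shows one possible resume pattern in plain Python, with a temporary directory standing in for a Volume mount path such as /data. The `latest_step` and `run` helpers, the `step_N.json` file naming, and the `stop_after` parameter (which simulates a timeout) are all assumptions made for illustration.

```python
import json
import os
import tempfile


def latest_step(checkpoint_dir: str) -> int:
    """Return the highest saved step number, or -1 if no checkpoint exists."""
    steps = [
        int(name[len("step_"):-len(".json")])
        for name in os.listdir(checkpoint_dir)
        if name.startswith("step_") and name.endswith(".json")
    ]
    return max(steps, default=-1)


def run(checkpoint_dir: str, total_steps: int, every: int, stop_after=None) -> int:
    """Resume after the last checkpoint; return how many steps this call executed."""
    start = latest_step(checkpoint_dir) + 1
    executed = 0
    for step in range(start, total_steps):
        if stop_after is not None and executed >= stop_after:
            break  # simulates the function being stopped at its timeout
        executed += 1  # the real work for this step would go here
        if step % every == 0:
            path = os.path.join(checkpoint_dir, f"step_{step}.json")
            with open(path, "w") as f:
                json.dump({"step": step}, f)
    return executed


with tempfile.TemporaryDirectory() as checkpoint_dir:
    first = run(checkpoint_dir, total_steps=100, every=10, stop_after=35)
    second = run(checkpoint_dir, total_steps=100, every=10)
    print(first, second)
```

Steps completed after the last checkpoint are repeated on resume, so the checkpoint interval bounds how much work a timeout can cost. On Modal, each checkpoint write would also be followed by vol.commit().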
Volume writes not persisting
Call vol.commit() inside the function before it returns. Uncommitted writes may be discarded when the container exits. For long training jobs, commit every N steps. To upload local files to a Volume from outside a function, use: with vol.batch_upload() as batch: batch.put_file(...).
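The "commit every N steps" advice can be sketched as a loop that takes the save and commit actions as arguments, which keeps the cadence logic testable without a Modal account. The `train_with_commits` function and its parameters are hypothetical. Inside a Modal function, `commit` would be vol.commit and `save` would write to the Volume mount path.

```python
def train_with_commits(total_steps, commit_every, save, commit):
    """Save and commit every `commit_every` steps, plus once at the end if needed."""
    commits = 0
    for step in range(1, total_steps + 1):
        # one training step would run here
        if step % commit_every == 0:
            save(step)
            commit()
            commits += 1
    if total_steps % commit_every != 0:
        # final partial interval: persist the last steps before returning
        save(total_steps)
        commit()
        commits += 1
    return commits


saved, committed = [], []
n = train_with_commits(25, 10, saved.append, lambda: committed.append(True))
print(n, saved)
```

The extra commit at the end covers the case where the step count is not a multiple of the interval, so the last few steps are not left uncommitted.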
Secret not accessible / os.environ missing
Modal Secrets are not available in local code — only inside functions running on Modal. Add to decorator: @app.function(secrets=[modal.Secret.from_name("MY_SECRET")]). Inside the function: os.environ["MY_VAR"]. To create: modal secret create MY_SECRET MY_VAR=value.
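A missing secret usually surfaces as a bare KeyError deep inside the function. A small guard at the top of the function can report every missing variable at once. This is a sketch in plain Python: `require_env` is a hypothetical helper, not part of Modal, and the `environ` parameter exists only to make it easy to test.

```python
import os


def require_env(*names, environ=None):
    """Return the requested variables as a dict, or raise listing every missing name."""
    env = os.environ if environ is None else environ
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
            + ". Is the secret attached via secrets=[...] on the decorator?"
        )
    return {name: env[name] for name in names}


# Example with an explicit mapping instead of the real environment.
print(require_env("MY_VAR", environ={"MY_VAR": "value"}))
```

Calling require_env("MY_VAR") as the first line of the function body turns a missing secret into an error message that points at the decorator.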
GPU quota exceeded or billing
Free tier: $30 credit at signup. Check usage at modal.com/billing. If GPU requests fail, the account may have reached its concurrency limit; request an increase at modal.com/support. Approximate GPU pricing: A10G $0.30/hr, A100 $2.50/hr, H100 $4.50/hr. For multi-GPU, append a count to the GPU string, for example gpu="A10G:2".
Modal patterns quick reference
Warm-start GPU function with model pre-loading
```python
import modal

app = modal.App("my-app")

image = (
    modal.Image.debian_slim()
    .apt_install("libgl1", "ffmpeg")  # system deps FIRST
    .pip_install("torch", "transformers")
)


@app.cls(gpu="A10G", image=image, keep_warm=1, timeout=600)
class Model:
    @modal.enter()
    def load(self):
        # runs once when the container starts; the model stays loaded
        from transformers import pipeline

        self.pipe = pipeline("text-generation", model="gpt2")

    @modal.method()
    def generate(self, prompt: str) -> str:
        return self.pipe(prompt)[0]["generated_text"]
```
Volume with explicit commit
```python
vol = modal.Volume.from_name("my-volume", create_if_missing=True)


# image=image reuses the image defined earlier so that torch is installed
@app.function(image=image, volumes={"/data": vol}, timeout=3600)
def train():
    import torch

    # MyModel and train_epoch are placeholders for your own code
    model = MyModel()
    for epoch in range(100):
        train_epoch(model)
        # checkpoint every 10 epochs
        if epoch % 10 == 0:
            torch.save(model.state_dict(), f"/data/checkpoint_{epoch}.pt")
            vol.commit()  # REQUIRED: writes are lost without this
```
Secrets usage
```python
# Create secret: modal secret create openai-secret OPENAI_API_KEY=sk-...
@app.function(secrets=[modal.Secret.from_name("openai-secret")])
def call_openai():
    import os

    import openai  # must be pip-installed in the function's image

    # The secret's keys appear as environment variables inside the container
    client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    return client.chat.completions.create(...)
```
Modal GPU options
| GPU | VRAM | Price/hr | Use case |
|---|---|---|---|
| T4 | 16 GB | ~$0.06 | Light inference, dev/test |
| L4 | 24 GB | ~$0.20 | Mid-tier inference, fine-tuning small models |
| A10G | 24 GB | ~$0.30 | General inference, fine-tuning 7B-13B |
| A100 (40GB) | 40 GB | ~$2.50 | Training, large model inference |
| A100 (80GB) | 80 GB | ~$3.50 | Large models (Llama 70B, fine-tune 30B+) |
| H100 | 80 GB | ~$4.50 | Fastest training, cutting-edge models |
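
For rough budgeting, the hourly figures in the table above can be turned into a per-run estimate. This is a sketch only: the prices are the approximate values from the table, the dictionary keys are labels for the table rows and not Modal GPU identifiers, and `estimate_cost` is a hypothetical helper that ignores any CPU, memory, or storage charges.

```python
# Approximate hourly prices copied from the table above; not authoritative.
PRICE_PER_HOUR = {
    "T4": 0.06,
    "L4": 0.20,
    "A10G": 0.30,
    "A100-40GB": 2.50,
    "A100-80GB": 3.50,
    "H100": 4.50,
}


def estimate_cost(gpu: str, seconds: float, count: int = 1) -> float:
    """Estimate GPU cost in dollars for `seconds` of runtime on `count` GPUs."""
    return round(PRICE_PER_HOUR[gpu] * count * seconds / 3600, 4)


print(estimate_cost("A10G", 3600))           # one hour on a single A10G
print(estimate_cost("H100", 1800, count=2))  # half an hour on two H100s
```

Comparing the estimate for a full-timeout run against the remaining credit gives a quick check before launching a long job.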
Step-by-step fix
1. Check live Modal status
   Visit prismix.dev/service/modal. Modal tracks scheduling, builds, and dashboard independently.
2. Fix cold starts
   Add keep_warm=1 to your function/class decorator. Move model loading to a @modal.enter() method inside a @app.cls class. This loads the model once when the container starts, not on every call.
3. Fix image build failures
   Open the Builds tab in the Modal dashboard. Read the error message. Add system dependencies with .apt_install("pkg-name") BEFORE .pip_install("...") in your image definition. Use .run_commands("bash -c ...") for arbitrary build steps.
4. Fix function timeout
   Add timeout=3600 (or higher) to your function decorator. Maximum is 86400 (24h). For long training jobs: add periodic vol.commit() calls to checkpoint progress to a Volume.
5. Fix volume writes / secrets
   Volume writes: call vol.commit() before the function returns. Secrets: add secrets=[modal.Secret.from_name("MY_SECRET")] to the decorator, then access inside the function via os.environ["MY_VAR"].
Frequently asked questions
Why is Modal not working?
The usual causes are: (1) slow cold starts (GPU functions take 5-30 s; use keep_warm=1 and load models in @modal.enter()); (2) a failed image build (add system deps with apt_install before pip_install and check the Builds tab); (3) a function timeout (the default is 300 s; set timeout=3600 or higher); (4) lost volume writes (call vol.commit() before the function returns); (5) a missing secret (add secrets=[modal.Secret.from_name("NAME")] to the decorator); (6) an outage (check prismix.dev/service/modal).
Is Modal down right now?
Check prismix.dev/service/modal for live Modal status, or Modal's own status page at status.modal.com. Modal can have partial outages that affect scheduling, image builds, or storage independently.
Modal cold start slow — how to make it faster?
To speed up Modal cold starts: (1) add keep_warm=1 to keep one container warm at all times; (2) move expensive imports and model loading into a @modal.enter() method in an @app.cls() class, which runs once per container start and not once per call; (3) try memory snapshots (enable_memory_snapshot=True) for large Python dependency trees; (4) for low-latency tasks, use a smaller GPU, since a T4 may start faster than an A100.
Modal volume writes not persisting — why?
Modal Volumes require explicit commit. Call vol.commit() inside the function before it exits. Uncommitted writes are discarded. For long training loops: commit every N steps. Pattern: torch.save(model.state_dict(), "/data/checkpoint.pt"); vol.commit().
Modal image build failing — how to fix?
Check the Builds tab in the Modal dashboard for the full error log. Most common cause: pip package needs a system library not in the base image. Fix: add .apt_install("package-name") before .pip_install("...") in your image chain. Common system deps: libgl1 (OpenCV/cv2), libsndfile1 (torchaudio), ffmpeg (video), libssl-dev (cryptography).