Run Chrome’s tiny Gemma4 (aka Gemini Nano) directly on PC without GPU
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Remember that sneaky Gemini Nano download earlier this month? If you talk to it, it will happily tell you it’s a Gemma.
Some friends were interested but didn’t want to talk to it through dev tools, like talking to some poor house elf through the keyhole of a locked door, so I made a five-minute vibe-coded extension to run it.
Nothing else required: you just need Google Chrome, 16 GB of RAM, and some disk space. No llama.cpp, no vLLM, etc. No tinkering (no fun, I know).
It’s quite fast and smooth: it feels like ~20 t/s or more on my laptop without a GPU, though I haven’t actually measured it. Everything is handled by Chrome. Each session has 9216 tokens available, a limit set by Chrome. The model runs fully locally in Chrome.
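For anyone wondering what the 9216-token session limit and the ~20 t/s guess mean in practice, here is a rough back-of-the-envelope sketch. It does not call Chrome at all. The helper names (`estimateTokens`, `SessionBudget`, `secondsToGenerate`) are made up for illustration, and the 4-characters-per-token ratio is only an assumed rule of thumb, not how the model's tokenizer actually counts.

```typescript
// Per-session quota as reported in the post (set by Chrome).
const SESSION_TOKEN_QUOTA = 9216;

// ASSUMPTION: roughly 4 characters per token for English text.
// The real tokenizer will produce different counts.
const CHARS_PER_TOKEN = 4;

/** Rough token estimate for a piece of text (hypothetical helper). */
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

/** Tracks estimated usage against the session quota (hypothetical helper). */
class SessionBudget {
  private used: number = 0;
  private quota: number;

  constructor(quota: number = SESSION_TOKEN_QUOTA) {
    this.quota = quota;
  }

  /** Estimated tokens still available in this session, never below zero. */
  remaining(): number {
    return Math.max(0, this.quota - this.used);
  }

  /** Records the usage and returns true if the text fits; otherwise returns false. */
  tryConsume(text: string): boolean {
    const cost = estimateTokens(text);
    if (cost > this.remaining()) {
      return false;
    }
    this.used += cost;
    return true;
  }
}

/** Seconds needed to generate `tokens` tokens at a given speed in tokens per second. */
function secondsToGenerate(tokens: number, tokensPerSecond: number): number {
  return tokens / tokensPerSecond;
}
```

By this estimate, a 4,000-character post costs about 1,000 tokens of the session, and generating the full 9216 tokens at 20 t/s would take 460.8 seconds, a bit under eight minutes. Treat both numbers as ballpark figures only.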
Use cases... um, spell checking so Google won’t know my spelling sucks? Quick summaries of long internet posts? Or it’s just cute?
Anyway here is the one click add extension:
Or if you want to tinker a little and don’t want to call it Dobby (the house elf of Chrome), here’s the repo: