r/LocalLLaMA

We gave a Reachy Mini a real-time voice brain

We attended an event the other day and found this little guy lying on our desk, a Reachy Mini from Hugging Face.

It belongs to the daughter of the event organizer. We got curious about how it worked, and an hour later we'd given it a brain.

The model basically becomes Reachy. It hears through its mic, sees through its camera, talks through its speaker, and calls motion tools to physically react while it talks.

Repo: https://github.com/opper-ai/reachy-voice-realtime

Key things:

  • Web UI to watch the camera feed, transcript, and tool calls live.
  • 19 motion and perception tools the model calls mid-conversation (emotes, head/antenna/body movement, camera, sound direction).
  • Mimics you: wave and it waves back, nod and it nods, tilt your head and it tilts.
  • Runs on GPT Realtime 2, routed through Opper so the model is a one-line swap.
  • The realtime client and tool layer are separate, so you can also wire it straight to a provider or a local/open-source realtime model.
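
To make the "client and tool layer are separate" point concrete, here is a minimal sketch of what a provider-agnostic tool layer could look like. It is not code from the repo: `FakeRobot`, `ToolLayer`, and the tool names `move_head` and `play_emote` are hypothetical stand-ins, and the only assumption about the model side is that a tool call arrives as a name plus a JSON string of arguments.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class FakeRobot:
    """Hypothetical stand-in for the robot; records commands instead of moving."""
    log: List[str] = field(default_factory=list)

    def move_head(self, pitch: float, yaw: float) -> None:
        self.log.append(f"head pitch={pitch} yaw={yaw}")

    def play_emote(self, name: str) -> None:
        self.log.append(f"emote {name}")


class ToolLayer:
    """Maps tool names to callables; knows nothing about any model provider."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def names(self) -> List[str]:
        return sorted(self._tools)

    def dispatch(self, name: str, arguments_json: str) -> Dict[str, Any]:
        """Run one tool call and return a JSON-serialisable result for the model."""
        fn = self._tools.get(name)
        if fn is None:
            return {"ok": False, "error": f"unknown tool: {name}"}
        try:
            kwargs = json.loads(arguments_json or "{}")
            result = fn(**kwargs)
        except (TypeError, ValueError) as exc:
            # Bad JSON or wrong arguments: report back instead of crashing mid-talk.
            return {"ok": False, "error": str(exc)}
        return {"ok": True, "result": result}


if __name__ == "__main__":
    robot = FakeRobot()
    tools = ToolLayer()
    tools.register("move_head", robot.move_head)
    tools.register("play_emote", robot.play_emote)

    # Whatever realtime client you use only needs to forward (name, arguments).
    tools.dispatch("play_emote", '{"name": "wave"}')
    tools.dispatch("move_head", '{"pitch": 10, "yaw": -15}')
    print(robot.log)
```

Because `dispatch` only takes a name and a JSON string, the same layer could in principle sit behind a hosted realtime API or a local model; swapping the model would only touch the client that produces those two values.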

Setup's in the README (Python 3.12+), MIT licensed.

We handed it back to the organizer's daughter, so now she can finally talk to her robot.

submitted by /u/facethef