Watch local LLMs escape the rooms you design
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
Hello! I'd like to share my repo for WATCH MY ESCAPE: https://github.com/cjami/watch-my-escape

It's an inverted escape room game: you design the maps, and LLMs have to try to escape them. Like classic adventure games, it uses traditional action verbs (e.g. push, pull, pick-up) to interact with the visible environment.

There are currently 5 model presets (each model is downloaded when you run an escape with it):
All are quantized to Q4_K_M, so they should fit in about 8GB of VRAM. Tested on an RTX 4090, an RTX 3070 and an Apple M1. You can configure it for any model on Hugging Face by changing values in the config file: https://github.com/cjami/watch-my-escape/blob/main/src/watch_my_escape/llm/config.py

It also includes a fully featured map editor, so you can build whatever rooms you like and test models on them. Rendering is entirely font-based, so any emoji your system supports can represent an object. Maps can be imported and exported as JSON.

The main technique is splitting each agent action into two steps, 'Think then Act': a free-form reasoning step followed by a grammar-constrained action step via llama.cpp. This lets small models produce reliable structured output inside a game environment. Note: the models are not reasoning spatially; they simply move from one visible object to another (full spatial reasoning would overwhelm small models).

Quick setup (requires uv and Node.js):

Setup should then auto-detect your hardware and install the matching llama-cpp-python wheel (Metal, CUDA, Vulkan or CPU; ROCm via an override).

I built this over one week for the 'Build Small' hackathon by Hugging Face x Gradio. Use it to try out different LLMs or make your own personal benchmarks! Hopefully it also offers a glimpse of how LLMs could be used in future games :)
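As a rough sanity check on the "about 8GB of VRAM" figure, here is a back-of-the-envelope estimate. It is a sketch under stated assumptions, not how the repo sizes anything: Q4_K_M is taken to average roughly 4.9 bits per weight (an approximate figure that varies by model), the KV cache is assumed to be fp16, and the example model is a hypothetical 8B model shaped like Llama 3 8B (32 layers, 8 KV heads, head dimension 128) at an 8192-token context. None of the function names come from the project.

```python
# Back-of-the-envelope VRAM estimate for a quantized GGUF model plus KV cache.
# Assumptions (not from the repo): Q4_K_M averages ~4.9 bits per weight,
# and the KV cache is stored in fp16 (2 bytes per element).

GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Keys and values (factor 2) for every layer and context position."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem


def estimate_vram_gib(n_params: float, bits_per_weight: float, n_layers: int,
                      n_ctx: int, n_kv_heads: int, head_dim: int,
                      bytes_per_elem: int = 2) -> float:
    """Weights plus KV cache, in GiB; ignores compute/scratch buffers."""
    total = weight_bytes(n_params, bits_per_weight) + kv_cache_bytes(
        n_layers, n_ctx, n_kv_heads, head_dim, bytes_per_elem)
    return total / GIB


if __name__ == "__main__":
    # Hypothetical Llama-3-8B-shaped model at an 8192-token context.
    est = estimate_vram_gib(8.03e9, 4.9, 32, 8192, 8, 128)
    print(f"~{est:.1f} GiB before compute buffers")
```

Under these assumptions the weights come to about 4.6 GiB and the KV cache to exactly 1 GiB, roughly 5.6 GiB in total, which leaves some headroom for llama.cpp's compute buffers on an 8GB card. Longer contexts grow the KV cache linearly.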
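The 'Think then Act' split described above can be sketched as follows. This is an illustrative sketch, not the repo's actual code: the `generate` callable, the prompt wording, and the `Action` type are hypothetical. It builds a GBNF grammar (the grammar format llama.cpp accepts) that only admits "<verb> <visible object>", runs one unconstrained reasoning pass, then a second pass constrained by that grammar. With llama-cpp-python, `generate` could wrap `Llama.create_completion`, passing `LlamaGrammar.from_string(grammar)` as its `grammar` argument on the constrained step.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical generator signature: (prompt, grammar_or_None) -> generated text.
Generate = Callable[[str, Optional[str]], str]

VERBS = ["push", "pull", "pick-up"]  # example verbs named in the post


def gbnf_literal(text: str) -> str:
    """Quote a string as a GBNF literal (assumes no control characters)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def action_grammar(verbs: list[str], objects: list[str]) -> str:
    """GBNF grammar that only admits '<verb> <visible object>'."""
    verb_alts = " | ".join(gbnf_literal(v) for v in verbs)
    obj_alts = " | ".join(gbnf_literal(o) for o in objects)
    return (
        'root ::= verb " " object\n'
        f"verb ::= {verb_alts}\n"
        f"object ::= {obj_alts}\n"
    )


@dataclass
class Action:
    verb: str
    target: str


def think_then_act(generate: Generate, scene: str, verbs: list[str],
                   objects: list[str]) -> tuple[str, Action]:
    # Step 1 (Think): free-form reasoning, no grammar constraint.
    thought = generate(f"{scene}\nThink about what to do next:", None)
    # Step 2 (Act): same context plus the thought, decoded under the grammar
    # so the model can only name an allowed verb and a visible object.
    act_prompt = f"{scene}\nThoughts: {thought}\nAction:"
    raw = generate(act_prompt, action_grammar(verbs, objects)).strip()
    verb, _, target = raw.partition(" ")  # verbs contain no spaces
    if verb not in verbs or target not in objects:
        # Only reachable if the backend ignored the grammar.
        raise ValueError(f"invalid action: {raw!r}")
    return thought, Action(verb, target)


if __name__ == "__main__":
    # Stub generator standing in for a real model.
    def fake_generate(prompt: str, grammar: Optional[str]) -> str:
        if grammar is None:
            return "The door is locked, so the key is probably useful."
        return "pick-up 🔑 key"

    objects = ["🚪 door", "🔑 key"]
    print(action_grammar(VERBS, objects))
    print(think_then_act(fake_generate, "You see: 🚪 door, 🔑 key", VERBS, objects))
```

Enumerating only the currently visible objects in the grammar mirrors the post's point that models move between visible objects rather than reasoning spatially; it also keeps the action space small enough for a small model to handle.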