I did what Microsoft wouldn't - updated POML VS Code extension
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
What's a POML?
Microsoft came up with this really cool HTML style mark-up language that allows you to make modular prompt templates, with all sorts of neat features like local AI support via OpenAI API, setting runtime parameters for your LLM, and embedding documents into the prompt.
You could even send the prompt directly to your LLM via the VS Code extension.
What happened to it?
I don't fucking know.
They supported it for 2-3 months, then ghosted when it didn't hit KPIs or something, I guess.
Then a VS Code or dependency update exposed a bug in how they handled />, which is actually fairly common in POML when you embed documents. This broke the ability to directly send prompts to the LLM - you could copy them out of the preview, but it was slower and less efficient.
What I did
I used OpenCode (which doesn't get enough play here - I only found out about it because someone posted a repo for an extension to it) and the opencode-power-pack (said extension) to try to find the bug and update some of the more egregiously outdated dependencies.
It took me a couple of days to get working, mostly because I wound up breaking the preview panel after updating some of the dependencies. That only showed up when I compiled to VSIX, instead of extension debug mode.
Who should use this?
- Prompt/agent experimenters
- People who want to write/edit with LLMs
- People who have lots of prompts that reuse common elements
Local AI Pointers
- Open up VS Code
Settingsmenu and searchPOML. - Set your
ProvidertoOpenAI Chat Completion. - Set your API target URL.
- You need to set the
API Key, even if your server doesn't use one. - Set a default model and temperature. (These can be overridden in your POML file.)
- Set
Tracetoverbose, as that gives you useful data to for troubleshooting.
Things I MIGHT do
- Add support for LM Studio and Lemonade as providers
- Incorporate TOC-based dynamic loading
[link] [comments]
More from r/LocalLLaMA
-
How small can the orchestration model in an agent be? (separating it from code-gen — that obviously wants a big model)
May 22
-
BeeLlama v0.2.0 – major DFlash update. Single RTX 3090: Qwen 3.6 27B up to 164 tps (4.40x), Gemma 4 31B up to 177.8 tps (4.93x). Prompt processing speed near baseline.
May 22
-
trained a prompt injection detector using ml-intern and DeepSeek v4 Flash, runs in the browser
May 22
-
ByteShape Qwen3.6-35B-A3B: 30% faster than Unsloth IQ on 6GB VRAM laptop
May 22
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.