Embeddings for NVIDIA's Nemotron Personas
Mirrored from r/LocalLLaMA for archival readability.
I extracted embedding vectors for the nvidia/Nemotron-Personas dataset.
It's an incredible resource of millions of synthetic personas with detailed backgrounds (names, ages, occupations, hobbies, and more), but finding specific personas or clustering them is difficult. To make that easier, I used Qwen 0.6B to compute the embeddings. The 0.6B model is lightweight, but its embeddings work well for semantic search and for finding k-nearest neighbors to build out persona groups.
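To show how embeddings support both uses, here is a minimal brute-force sketch in plain Python. It assumes each persona already has an embedding vector keyed by a persona id. The ids, the 3-dimensional toy vectors, the function names `top_k` and `knn_groups`, and the `min_sim` threshold are all hypothetical illustrations, not real Qwen outputs or the author's code. Grouping here links each persona to its nearest neighbors above a similarity cutoff and takes connected components, which is only one of several reasonable ways to turn k-NN results into groups.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def top_k(query: Vector, embeddings: Dict[str, Vector], k: int = 5) -> List[Tuple[str, float]]:
    """Brute-force semantic search: the k persona ids most cosine-similar to the query."""
    q = normalize(query)
    scores = []
    for pid, emb in embeddings.items():
        e = normalize(emb)
        scores.append((pid, sum(a * b for a, b in zip(q, e))))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


def knn_groups(embeddings: Dict[str, Vector], k: int = 2, min_sim: float = 0.8) -> List[List[str]]:
    """Hypothetical grouping rule: link each persona to its k nearest neighbors whose
    cosine similarity is at least min_sim, then return the connected components."""
    ids = list(embeddings)
    parent = {pid: pid for pid in ids}  # union-find forest over persona ids

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pid in ids:
        others = {o: embeddings[o] for o in ids if o != pid}
        for neighbor, sim in top_k(embeddings[pid], others, k):
            if sim >= min_sim:
                parent[find(pid)] = find(neighbor)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for pid in ids:
        groups.setdefault(find(pid), []).append(pid)
    # Largest groups first; ties broken alphabetically for stable output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


if __name__ == "__main__":
    # Made-up 3-d vectors standing in for real persona embeddings.
    toy = {
        "nurse_seoul": [0.9, 0.1, 0.0],
        "doctor_osaka": [0.8, 0.2, 0.1],
        "chef_lyon": [0.1, 0.9, 0.2],
        "baker_texas": [0.0, 0.8, 0.3],
    }
    print(top_k([1.0, 0.0, 0.0], toy, k=2))
    print(knn_groups(toy, k=1, min_sim=0.9))
```

This scan compares every pair, which is fine for a demo but quadratic in the number of personas. At the scale of millions of personas, an approximate nearest-neighbor index (for example FAISS) is the usual replacement for the inner loop.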
Precomputed embedding vectors are available for the Korea, Japan, France, and USA persona sets. Please check out the web demo as well.
- Dataset: https://huggingface.co/collections/tantara/nemotron-personas-embedding
- Web Demo: https://www.microworld.dev/
Let me know what you think or if you end up using it for any of your local agent projects!