r/MachineLearning · June 15, 2026 · 1 min read

Concept-Vector: A design framework for human-interpretable word embeddings [P]


This project distills a model's word embeddings into human-interpretable "concept-vectors": vectors in which each component tracks a specific concern, such as semantics, syntax, or potentially even statistics, and carries a human-readable, human-definable label. These distilled components are then joined with undefined trainable components, and the combined vector is passed to a model.
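To make the joining step concrete, here is a minimal sketch in plain Python (3.9+). It is not taken from the repo: the label names, the `ConceptVector` class, the `make_concept_vector` helper, the number of trainable components, and all values are hypothetical, and the labeled values are hand-assigned placeholders rather than the output of any real distillation. It also assumes that "joined" means simple concatenation.

```python
import random
from dataclasses import dataclass, field

# Hypothetical concept labels; the real label set would come from the project's docs.
CONCEPT_LABELS = ["is_noun", "is_animate", "log_frequency"]
N_TRAINABLE = 4  # assumed number of unlabeled, trainable components


@dataclass
class ConceptVector:
    word: str
    labeled: dict[str, float]  # human-defined, human-readable components
    trainable: list[float] = field(default_factory=list)  # components a model would learn

    def to_input(self) -> list[float]:
        """Concatenate labeled components (in fixed label order) with trainable ones."""
        fixed = [self.labeled[label] for label in CONCEPT_LABELS]
        return fixed + self.trainable


def make_concept_vector(word: str, labeled: dict[str, float], rng: random.Random) -> ConceptVector:
    """Build a concept-vector, rejecting entries that lack a value for any label."""
    missing = [label for label in CONCEPT_LABELS if label not in labeled]
    if missing:
        raise ValueError(f"{word!r} is missing labeled components: {missing}")
    # Small random initial values stand in for weights a model would train.
    trainable = [rng.uniform(-0.1, 0.1) for _ in range(N_TRAINABLE)]
    return ConceptVector(word, dict(labeled), trainable)


rng = random.Random(0)
dog = make_concept_vector(
    "dog",
    {"is_noun": 1.0, "is_animate": 1.0, "log_frequency": 0.7},  # placeholder values
    rng,
)
vec = dog.to_input()
print(len(vec), vec[: len(CONCEPT_LABELS)])
```

Keeping the labeled components in a fixed order at the front of the vector means position `i` always maps back to `CONCEPT_LABELS[i]`, which is what would keep that part of the input readable after it is handed to a model.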

Check the readme/repo and supporting docs for details.

For transparency, this is a data design project. I have quite a bit of experience with data transformation and manipulation, but limited experience with NNs. I have not tested this on any model, and I currently don't have the resources to build a comprehensive database for that kind of testing. I'm posting primarily for human feedback and criticism, and to share the idea, since this is as far as I can currently take it.
