Epicure: Navigating the Emergent Geometry of Food Ingredient Embeddings
Abstract: We present Epicure, a family of three sibling skip-gram ingredient embeddings retrained from scratch on a multilingual recipe corpus. We aggregate 4.14M recipes from 11 sources spanning English (including Indian-English), Chinese, Russian, Vietnamese, Spanish, Turkish, Indonesian, and German, and normalise the raw ingredient strings to 1,790 canonical entries via an LLM-augmented pipeline. Two graphs seed the models: a 203,508-edge ingredient-ingredient NPMI graph and an 80,019-edge typed FlavorDB ingredient-compound graph with 2,247 typed compound nodes across 15 categories. On these graphs we train three Metapath2Vec variants that share architecture and hyperparameters and differ only in the random-walk schema: Cooc walks only the co-occurrence graph, Chem walks only the typed compound metapaths, and Core blends both by injecting ingredient-ingredient walks in controlled proportions. Each model therefore sits at a distinct point on the chemistry-vs-recipe-context spectrum.
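The abstract does not spell out how the NPMI edges are scored or how Core mixes its walks. Purely as an illustration, the sketch below assumes recipe-level probabilities, npmi(a, b) = log(p(a, b) / (p(a) p(b))) / -log p(a, b), keeps only pairs with positive NPMI, and reads "controlled mixing" as drawing each walk from the ingredient-ingredient graph with probability `mix` and otherwise following an ingredient-compound-ingredient metapath. All function names, the `mix` and `threshold` parameters, the untyped compound hops, and uniform neighbour sampling are hypothetical assumptions, not the authors' pipeline.

```python
import math
import random
from collections import Counter
from itertools import combinations


def npmi_edges(recipes, threshold=0.0):
    """Return {(a, b): npmi} for ingredient pairs scoring above threshold.

    Assumption: p(a) is the fraction of recipes containing a, and
    p(a, b) the fraction containing both a and b.
    """
    n = len(recipes)
    single, pair = Counter(), Counter()
    for recipe in recipes:
        items = sorted(set(recipe))
        single.update(items)
        pair.update(combinations(items, 2))
    edges = {}
    for (a, b), c_ab in pair.items():
        p_ab = c_ab / n
        if p_ab == 1.0:
            score = 1.0  # both occur in every recipe: perfect co-occurrence
        else:
            pmi = math.log(p_ab / ((single[a] / n) * (single[b] / n)))
            score = pmi / -math.log(p_ab)
        if score > threshold:
            edges[(a, b)] = score
    return edges


def adjacency(pairs):
    """Undirected adjacency lists from an iterable of (a, b) pairs."""
    adj = {}
    for a, b in pairs:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def cooc_walk(start, ing_adj, length, rng):
    """Uniform random walk on the ingredient-ingredient graph."""
    walk = [start]
    while len(walk) < length and ing_adj.get(walk[-1]):
        walk.append(rng.choice(ing_adj[walk[-1]]))
    return walk


def chem_walk(start, ing2cmp, cmp2ing, length, rng):
    """Ingredient -> compound -> ingredient metapath walk.

    Simplification: compound types/categories are ignored here.
    """
    walk = [start]
    while len(walk) < length:
        compounds = ing2cmp.get(walk[-1])
        if not compounds:
            break
        c = rng.choice(compounds)
        walk.append(c)
        walk.append(rng.choice(cmp2ing[c]))
    return walk[:length]


def core_walks(starts, ing_adj, ing2cmp, cmp2ing, length, mix, seed=0):
    """Blend both schemas: each walk is a co-occurrence walk with
    probability `mix`, else a compound metapath walk.

    mix=1.0 behaves like a Cooc-only schema; mix=0.0 like Chem-only.
    """
    rng = random.Random(seed)
    walks = []
    for s in starts:
        if rng.random() < mix:
            walks.append(cooc_walk(s, ing_adj, length, rng))
        else:
            walks.append(chem_walk(s, ing2cmp, cmp2ing, length, rng))
    return walks
```

The resulting walks would then be fed to a skip-gram model as "sentences"; that training step is omitted here. Treating the mixing proportion as a single per-walk probability keeps the three variants on one knob, which mirrors the abstract's framing of a chemistry-vs-recipe-context spectrum, though the paper's actual mechanism may differ.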
| Subjects: | Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY) |
| Cite as: | arXiv:2605.22391 [cs.AI] |
| | (or arXiv:2605.22391v1 [cs.AI] for this version) |
| | https://doi.org/10.48550/arXiv.2605.22391 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Josef Liyanjun Chen
[v1] Thu, 21 May 2026 12:23:38 UTC (6,566 KB)
Ancillary files:
- csv/direction_orthogonal.csv
- csv/factor_top_alignments_ica_chem_n20.csv
- csv/factor_top_alignments_ica_cooc_n20.csv
- csv/factor_top_alignments_ica_core_n20.csv
- csv/linear_probe.csv
- csv/linear_probe_continuous.csv
- csv/mode_atlas_chem.csv
- csv/mode_atlas_cooc.csv
- csv/mode_atlas_core.csv
- csv/procrustes_sensory.csv
- csv/weat.csv
- epicure_chem.csv
- epicure_cooc.csv
- epicure_core.csv
- supplement.pdf
- vocab.csv