Hugging Face Daily Papers · · 4 min read

The Data Manifold under the Microscope

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2606.15760

Published on Jun 14 · Submitted by Marios Koulakis on Jun 19
Authors: Marios Koulakis, Constantin Seibold

Abstract

A benchmarking framework is introduced to study data-manifold geometry by extending dSprites and COIL-20 datasets with additional transformation dimensions and dense sampling, enabling accurate estimation of curvature, reach, and volume for theoretical analysis and validation.

A significant gap exists between theory and practice in deep learning. Generalization and approximation error bounds are often derived for simplified models or are too loose to be informative. Many rely on the manifold hypothesis and on geometric regularity such as intrinsic dimension, curvature, and reach. Progress requires insight into data-manifold geometry and suitable benchmarks, yet existing options are polarized: analytic manifolds with known geometry but limited applicability, or real-world datasets where geometry is only coarsely estimable. We introduce a benchmarking framework for studying data geometry. We repurpose and extend dSprites and COIL-20 with additional transformation dimensions and dense, axis-aligned sampling, and pair them with finite-difference estimators that recover curvature, reach, and volume at near-ground-truth accuracy in a regime where general-purpose estimators are unreliable or difficult to deploy. The framework is intended as a controlled testbed, useful as a calibration environment for geometric estimators and a sandbox for probing theoretical assumptions. To illustrate its use, we present two application studies, namely assessing the scaling behavior of the bounds of Genovese et al. and Fefferman et al., and tracking the layer-wise geometry of a β-VAE, highlighting the behavior of current bounds and the value of controlled benchmarks for guiding and validating future theory. A reference implementation is available at https://github.com/koulakis/manifold-microscope.
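The abstract does not spell out the estimators, so the following is only a rough illustration of the general idea, not the authors' implementation (that lives in the linked repository). It applies second-order central differences to a closed curve sampled on a uniform parameter grid. The first and second differences approximate the tangent and acceleration. Curvature is the normal part of the acceleration divided by the squared speed, and length (the 1-D volume) is a Riemann sum of the speed. The toy manifold is a circle embedded in a higher-dimensional ambient space to loosely mimic an image space. Its geometry is known in closed form (curvature 1/r, length 2πr, reach r), which mirrors the calibration use case the abstract describes. The function names `sample_circle` and `finite_difference_geometry` are hypothetical, and the sketch assumes a 1-D manifold, whereas the benchmark's manifolds have several transformation dimensions.

```python
import numpy as np


def sample_circle(radius: float, n: int, ambient_dim: int, seed: int = 0):
    """Hypothetical toy manifold: a circle sampled on a uniform angle grid and
    isometrically embedded in R^ambient_dim via a random orthonormal 2-frame."""
    rng = np.random.default_rng(seed)
    # QR of a Gaussian (ambient_dim, 2) matrix gives two orthonormal columns.
    frame, _ = np.linalg.qr(rng.standard_normal((ambient_dim, 2)))
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    planar = radius * np.stack([np.cos(t), np.sin(t)], axis=1)  # shape (n, 2)
    # Orthonormal columns preserve distances, so the geometry is unchanged.
    return planar @ frame.T, t[1] - t[0]  # points (n, ambient_dim), grid step h


def finite_difference_geometry(points: np.ndarray, h: float):
    """Estimate per-sample curvature and total length of a CLOSED curve sampled
    on a uniform parameter grid with spacing h (periodic wrap-around assumed)."""
    fwd = np.roll(points, -1, axis=0)  # gamma(t + h)
    bwd = np.roll(points, 1, axis=0)   # gamma(t - h)
    d1 = (fwd - bwd) / (2.0 * h)              # central difference for gamma'
    d2 = (fwd - 2.0 * points + bwd) / h**2    # central difference for gamma''
    speed = np.linalg.norm(d1, axis=1)
    tangent = d1 / speed[:, None]
    # Keep only the component of gamma'' normal to the tangent direction.
    d2_normal = d2 - np.sum(d2 * tangent, axis=1, keepdims=True) * tangent
    curvature = np.linalg.norm(d2_normal, axis=1) / speed**2
    length = np.sum(speed) * h  # Riemann sum of |gamma'| over the period
    return curvature, length


# Usage: a radius-2 circle in R^64; exact values are 0.5, 4*pi and 2.0.
points, h = sample_circle(radius=2.0, n=400, ambient_dim=64)
curvature, length = finite_difference_geometry(points, h)
# Reach is at most the smallest radius of curvature; a global bottleneck
# can make it smaller, so this is only an upper bound in general.
reach_upper_bound = 1.0 / curvature.max()
print(curvature.mean(), length, reach_upper_bound)
```

For a circle the upper bound on reach is attained, so all three estimates should land close to their closed-form values. For a d-dimensional grid the same construction generalizes by replacing the speed with the volume element sqrt(det(JᵀJ)), where J stacks the finite-difference partial derivatives along each grid axis.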

Community

Paper author and submitter:

We introduce Manifold Microscope, a controlled benchmark for studying data-manifold geometry with finite-difference estimates of curvature, reach, and volume on grid-sampled image manifolds.


