The Hamilton-Jacobi Theory of Deep Learning
Authors: Jose Marie Antonio Miñoza, Erika Fille T. Legara, Christopher P. Monterola
Organization: Center for AI Research
Published: 2026-05-27 (arXiv:2605.28983)
Abstract
Neural network training is formulated as a search over Hamilton-Jacobi initial-value problems in which each gradient step selects the initial data of a viscous Hamilton-Jacobi equation, with connections to residual networks, transformers, and RNNs through shared mathematical structures.
In this paper, training a neural network is identified, exactly, as a search through Hamilton-Jacobi initial-value problems: each gradient step selects the initial data of a viscous Hamilton-Jacobi equation whose Hopf-Cole propagator best fits the observations; at inference, the input is the spatial point at which that solution is evaluated, and the initial condition is already encoded in the weights. The correspondence is exact for log-sum-exp layers and structural for broader architectures: residual networks, transformers, and recurrent architectures (RNNs, LSTMs, SSMs) each discretize the same class of Hamilton-Jacobi equations, with architecture-dependent Hamiltonian and viscosity. A single deformation parameter ε unifies all four perspectives (network, tropical algebra, viscous PDE, convex optimization) in a commutative diagram closed under Lipschitz conditions. Quantitative consequences include: the minimax optimal generalization rate O(n^{-1/(d+2)}) for fixed t; adversarial robustness controlled by ε; backpropagation as the co-state equation of the Hamiltonian system for residual networks (Pontryagin Maximum Principle); scaling exponents consistent with data intrinsic dimension via PDE quadrature; and a closed-form O(N) influence function (softmax attribution weights π_j) whose entropy landscape undergoes fold bifurcations as ε increases, each merging attribution basins.
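As a rough illustration of the role ε plays for a log-sum-exp layer, the sketch below uses the standard temperature-scaled log-sum-exp, ε·log Σ_j exp(z_j/ε), whose gradient is a softmax. This parametrization, the function names, and the sample vector `z` are assumptions made for illustration; the abstract does not state the paper's exact conventions. The sketch only shows two well-known facts: the value approaches the tropical (max) limit as ε shrinks, and the entropy of the softmax weights π_j grows as ε increases.

```python
import numpy as np

def lse_eps(z, eps):
    """Temperature-scaled log-sum-exp: eps * log(sum_j exp(z_j / eps))."""
    z = np.asarray(z, dtype=float)
    m = z.max()  # subtract the max for numerical stability
    return m + eps * np.log(np.exp((z - m) / eps).sum())

def softmax_weights(z, eps):
    """Gradient of lse_eps in z: pi_j = exp(z_j/eps) / sum_k exp(z_k/eps)."""
    z = np.asarray(z, dtype=float)
    w = np.exp((z - z.max()) / eps)
    return w / w.sum()

def entropy(p):
    """Shannon entropy in nats, treating 0 * log 0 as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Hypothetical pre-activations, chosen only for illustration.
z = np.array([1.0, 0.5, -2.0])

for eps in (1.0, 0.1, 0.01):
    pi = softmax_weights(z, eps)
    # As eps -> 0 the value tends to max(z) and pi concentrates on the argmax.
    print(f"eps={eps}: lse={lse_eps(z, eps):.6f}, entropy(pi)={entropy(pi):.6f}")
```

For any ε > 0 the value is sandwiched as max(z) ≤ lse_eps(z, ε) ≤ max(z) + ε·log N, which is one simple way to see the max-plus limit. Whether this corresponds term by term to the paper's Hopf-Cole construction would need to be checked against the full text.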