Derivative-Free Neural Network Optimization: MNIST Case [R]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
A direct optimization test was conducted on a neural network for MNIST image classification. The network features a 784-32-10 architecture with a total of 25,450 continuous parameters (weights and biases). Instead of employing backpropagation or gradient information, the parameters were optimized using MDP, a Derivative-Free Optimization method. The objective was to directly minimize the Cross-Entropy Loss on a subset of 5,000 training images. Final evaluations were performed on independent validation and test sets.

In the best run, MDP achieved an objective loss of 0.0004083, a validation accuracy of 93.7%, and a test accuracy of 93.4%. These results outperform the baseline established by Adam, which achieved a final loss of 0.002945, a validation accuracy of 91.8%, and a test accuracy of 91.7% using the same network architecture.

Notably, this optimization was successfully performed over a 25,450-dimensional search space, achieving convergence across 1,000,000 function evaluations without relying on gradients or population-based methods.

The code for this test, along with other Python implementation examples, is available in the examples folder of the official project repository: [link]
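For readers who want to see what "directly minimizing the loss" means in practice, here is a minimal sketch of the kind of black-box objective a derivative-free optimizer would call: a flat vector of 25,450 numbers in, one cross-entropy value out. Several things here are assumptions and not taken from the post: the tanh hidden activation, the parameter layout inside the flat vector, the synthetic random data standing in for MNIST, and all function names. The `random_search` loop is a plain accept-if-better random search used only to show where the function-evaluation budget goes. It is not the MDP method, whose details the post does not describe.

```python
import numpy as np

# Layer sizes from the post: 784 inputs, 32 hidden units, 10 classes.
N_IN, N_HID, N_OUT = 784, 32, 10
# Weights and biases of both layers: 25,088 + 32 + 320 + 10 = 25,450.
N_PARAMS = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT


def unpack(theta):
    """Split a flat parameter vector into W1, b1, W2, b2 (layout is an assumption)."""
    i = 0
    W1 = theta[i:i + N_IN * N_HID].reshape(N_IN, N_HID)
    i += N_IN * N_HID
    b1 = theta[i:i + N_HID]
    i += N_HID
    W2 = theta[i:i + N_HID * N_OUT].reshape(N_HID, N_OUT)
    i += N_HID * N_OUT
    b2 = theta[i:i + N_OUT]
    return W1, b1, W2, b2


def cross_entropy(theta, X, y):
    """Mean cross-entropy of the 784-32-10 network on inputs X and integer labels y."""
    W1, b1, W2, b2 = unpack(theta)
    hidden = np.tanh(X @ W1 + b1)  # tanh is an assumption; the post names no activation
    logits = hidden @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(y)), y].mean())


def random_search(f, theta0, n_steps, step, rng):
    """Accept-if-better random search: a stand-in optimizer, NOT the MDP method.

    Uses 1 + n_steps evaluations of f and no gradient information.
    """
    best, best_loss = theta0, f(theta0)
    for _ in range(n_steps):
        candidate = best + step * rng.standard_normal(best.shape)
        loss = f(candidate)
        if loss < best_loss:
            best, best_loss = candidate, loss
    return best, best_loss


rng = np.random.default_rng(0)
X = rng.random((64, N_IN))          # synthetic stand-in for MNIST pixel values
y = rng.integers(0, N_OUT, size=64)  # synthetic stand-in for digit labels

theta0 = np.zeros(N_PARAMS)
initial_loss = cross_entropy(theta0, X, y)  # uniform predictions, so ln(10)
theta, final_loss = random_search(
    lambda t: cross_entropy(t, X, y), theta0, n_steps=50, step=0.01, rng=rng
)
```

With all parameters at zero the network predicts every class with probability 1/10, so the starting loss is ln(10), roughly 2.30. The optimizer only ever sees the scalar returned by `cross_entropy`, which is why the post counts cost in function evaluations and not in gradient steps.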