Loss functions in Instance Representation Learning [R]
Mirrored from r/MachineLearning for archival readability.
In Wu et al., the MLE objective is computationally infeasible because the dataset contains so many images. With large n, the denominator in (2), which sums over every instance, is expensive to compute. They therefore use NCE (Noise-Contrastive Estimation): essentially, they replace the expensive loss in (3) with the cheaper loss in (7). However, we still end up estimating the denominator anyway, in (8). Why not just approximate the denominator in (2) with (8) directly? I asked Claude about this and it said something about that being a biased estimator, but I didn't really follow.

I'm also a little confused about the connection between the original NCE formulation, which is a way to estimate a density, and the way it is used here. Do we use it because the NCE loss is easier to compute, and because as m (the number of noise samples) increases, the gradients of the NCE loss approach the gradients of the NLL loss?
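The "biased estimator" remark can be made concrete with a minimal sketch in plain Python on synthetic data. It assumes the usual instance-discrimination setup: similarity scores s_j = v_j . v for unit-norm embeddings, a temperature tau, uniform noise P_n(j) = 1/n, and a Monte Carlo normalizer of the form Z_hat = (n/m) * sum_k exp(s_{j_k}/tau). The sketch assumes this is the form of (8), which is our reading of the paper and is not confirmed by the post. All function names are hypothetical, and the scores are random numbers rather than anything from the paper.

Z_hat is unbiased for Z, so plugging it into (2) looks harmless. However, log is concave, so Jensen's inequality gives E[log Z_hat] <= log Z, and the plug-in NLL underestimates the true NLL on average. The same issue carries into the gradient, because grad log Z_hat is a ratio of two sample sums. The gap shrinks as m grows. The last lines evaluate one NCE loss with Z frozen at a one-off estimate, which is an illustrative choice. In this form the NCE loss is a sum of per-sample logistic terms, so no logarithm of a random sum appears.

```python
import math
import random


def log_z_exact(scores, tau):
    """Exact log Z = log sum_j exp(s_j / tau), computed with log-sum-exp for stability."""
    logits = [s / tau for s in scores]
    mx = max(logits)
    return mx + math.log(sum(math.exp(x - mx) for x in logits))


def log_z_sampled(scores, tau, m, rng):
    """log of Z_hat = (n / m) * sum_k exp(s_{j_k} / tau), with j_k ~ Uniform{0, ..., n-1}.

    E[Z_hat] = Z, but since log is concave, E[log Z_hat] <= log Z (Jensen)."""
    n = len(scores)
    total = sum(math.exp(scores[rng.randrange(n)] / tau) for _ in range(m))
    return math.log(n / m * total)


def plug_in_nll(scores, i, tau, m, rng):
    """-log P(i|v) with the exact denominator replaced by a fresh Z_hat (the 'why not?' idea)."""
    return log_z_sampled(scores, tau, m, rng) - scores[i] / tau


def jensen_gap(scores, i, tau, m, trials, rng):
    """Average amount by which the plug-in NLL falls below the exact NLL."""
    exact = log_z_exact(scores, tau) - scores[i] / tau
    mean_plug_in = sum(plug_in_nll(scores, i, tau, m, rng) for _ in range(trials)) / trials
    return exact - mean_plug_in


def nce_loss(scores, i, tau, m, z_const, rng):
    """NCE loss for positive index i and m noise indices drawn from P_n(j) = 1/n.

    Model density P(j|v) = exp(s_j / tau) / z_const, with z_const held fixed.
    h(j) = P(j|v) / (P(j|v) + m * P_n(j)) is the posterior that j came from the data."""
    n = len(scores)

    def h(j):
        p_model = math.exp(scores[j] / tau) / z_const
        return p_model / (p_model + m / n)

    noise = [rng.randrange(n) for _ in range(m)]
    return -math.log(h(i)) - sum(math.log(1.0 - h(j)) for j in noise)


rng = random.Random(0)
n, tau, i = 1000, 0.1, 0
# Synthetic stand-ins for the similarities v_j . v of unit-norm embeddings.
scores = [rng.uniform(-1.0, 1.0) for _ in range(n)]

# The bias of the plug-in loss shrinks as the number of samples m grows.
gaps = {m: jensen_gap(scores, i, tau, m, trials=1000, rng=rng) for m in (4, 64, 512)}
for m, gap in gaps.items():
    print(f"m={m:4d}  exact NLL - mean plug-in NLL = {gap:.3f}")

# Illustrative choice: estimate Z once with many samples, then freeze it for NCE.
z_const = math.exp(log_z_sampled(scores, tau, 4096, rng))
print(f"NCE loss, one draw of m=64 noise samples: {nce_loss(scores, i, tau, 64, z_const, rng):.3f}")
```

With these settings the printed gap should be clearly positive for m=4 and close to zero for m=512. The exact numbers depend on the random draws. The temperature tau=0.1 was chosen for the demo because it makes exp(s_j/tau) spread over several orders of magnitude, which is what makes a small-m estimate of log Z unreliable.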