ML · medium · loss functions · optimization · ~15 min
Adam vs SGD Generalization Gap
Problem
The Adam optimizer typically drives training loss down faster than SGD, but the resulting models often generalize worse on held-out data. Explain the mechanism behind this generalization gap.
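To make the contrast concrete before you answer, here is a minimal, self-contained sketch (not the reference solution) of the two update rules on a toy ill-conditioned quadratic. The loss, learning rates, and step counts are illustrative choices, not part of the question. It shows the key mechanical difference: SGD's step size is uniform across parameters, so progress along low-curvature directions is slow, while Adam rescales each coordinate by a running estimate of the squared gradient, equalizing effective step sizes.

```python
import numpy as np

def sgd_step(w, grad, lr):
    # Plain SGD: one global step size for every parameter.
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: per-parameter step scale ~ lr / sqrt(EMA[g^2]),
    # so steep and shallow directions make similar per-step progress.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)          # bias-corrected first moment
    v_hat = v / (1 - b2**t)          # bias-corrected second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy ill-conditioned loss: 0.5 * (100*x^2 + y^2); gradient below.
grad_fn = lambda w: np.array([100.0, 1.0]) * w

w_sgd = np.array([1.0, 1.0])
w_adam = np.array([1.0, 1.0])
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 51):
    # SGD's lr is capped by the steepest direction (curvature 100),
    # so the shallow y direction barely moves in 50 steps.
    w_sgd = sgd_step(w_sgd, grad_fn(w_sgd), lr=0.005)
    w_adam, m, v = adam_step(w_adam, grad_fn(w_adam), m, v, t, lr=0.1)

print("SGD :", w_sgd)   # x essentially converged, y still far from 0
print("Adam:", w_adam)  # both coordinates make comparable progress
```

This is exactly the "faster convergence" half of the question; the generalization half, which your answer should supply, concerns how this adaptive rescaling changes which minima the optimizer reaches.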
Reference solution
Reference solution available after you attempt the question.
Ready to solve it?
Start a session on Mockbit #79. You'll get graded with specific critique when you submit.