ML · medium · loss functions · optimization · ~15 min
Adam vs SGD Generalization Gap
Problem
The Adam optimizer typically drives training loss down faster than SGD, but the resulting models often generalize worse on held-out data. Explain the mechanism behind this generalization gap.
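To make the contrast concrete before you answer, here is a minimal, self-contained sketch (not the reference solution) of the two update rules on a toy ill-conditioned quadratic. The loss, learning rates, and step counts are illustrative choices, not part of the question. It shows the key mechanical difference: SGD's step size is uniform across parameters, so progress along low-curvature directions is slow, while Adam rescales each coordinate by a running estimate of the squared gradient, equalizing effective step sizes.

```python
import numpy as np

def sgd_step(w, grad, lr):
    # Plain SGD: one global step size for every parameter.
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: per-parameter step scale ~ lr / sqrt(EMA[g^2]),
    # so steep and shallow directions make similar per-step progress.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)          # bias-corrected first moment
    v_hat = v / (1 - b2**t)          # bias-corrected second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy ill-conditioned loss: 0.5 * (100*x^2 + y^2); gradient below.
grad_fn = lambda w: np.array([100.0, 1.0]) * w

w_sgd = np.array([1.0, 1.0])
w_adam = np.array([1.0, 1.0])
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 51):
    # SGD's lr is capped by the steepest direction (curvature 100),
    # so the shallow y direction barely moves in 50 steps.
    w_sgd = sgd_step(w_sgd, grad_fn(w_sgd), lr=0.005)
    w_adam, m, v = adam_step(w_adam, grad_fn(w_adam), m, v, t, lr=0.1)

print("SGD :", w_sgd)   # x essentially converged, y still far from 0
print("Adam:", w_adam)  # both coordinates make comparable progress
```

This is exactly the "faster convergence" half of the question; the generalization half, which your answer should supply, concerns how this adaptive rescaling changes which minima the optimizer reaches.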
Reference solution
Reference solution available after you attempt the question.
Ready to solve it?
Start a session on Mockbit #79. You'll get graded with specific critique when you submit.