https://jeremy9959.net/Math-3094-Spring-2021/published_notes/notes/GD.html (Math 3094, Spring 2021: course notes on gradient descent)
Gradient descent optimization algorithms:
- Momentum
- Nesterov accelerated gradient
- Adagrad
- Adadelta
- RMSprop
- Adam
- AdaMax
- Nadam
- AMSGrad
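As a concrete illustration of two of the update rules listed above, here is a minimal NumPy sketch (not taken from any of the linked posts) of classical momentum and Adam, in their standard formulations, applied to the toy objective f(w) = ½‖w‖², whose gradient is simply w. Hyperparameter values are illustrative defaults, not prescriptions.

```python
import numpy as np

def grad(w):
    # Gradient of the toy objective f(w) = 0.5 * ||w||^2, minimized at w = 0.
    return w

def momentum_step(w, v, lr=0.1, beta=0.9):
    # Classical momentum: accumulate a velocity, then step along it.
    v = beta * v + grad(w)
    return w - lr * v, v

def adam_step(w, m, s, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: bias-corrected first- and second-moment estimates of the gradient.
    g = grad(w)
    m = b1 * m + (1 - b1) * g          # first moment (mean)
    s = b2 * s + (1 - b2) * g**2       # second moment (uncentered variance)
    m_hat = m / (1 - b1**t)            # bias correction (t starts at 1)
    s_hat = s / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

w0 = np.array([1.0, -2.0])
w_m, v = w0.copy(), np.zeros(2)
w_a, m, s = w0.copy(), np.zeros(2), np.zeros(2)
for t in range(1, 201):
    w_m, v = momentum_step(w_m, v)
    w_a, m, s = adam_step(w_a, m, s, t)

print("momentum:", np.linalg.norm(w_m))
print("adam:    ", np.linalg.norm(w_a))
```

On this quadratic, momentum converges geometrically to the minimum, while Adam's effectively sign-based steps leave it hovering near the minimum at a scale set by the learning rate; the linked Ruder post discusses these behaviors in detail.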
https://rasbt.github.io/mlxtend/user_guide/general_concepts/gradient-optimization/ (mlxtend user guide: gradient descent and gradient ascent)
https://ruder.io/optimizing-gradient-descent/ (Sebastian Ruder, "An overview of gradient descent optimization algorithms"; covers the optimizers listed above)
https://johnchenresearch.github.io/demon/ (blog post on Demon, decaying momentum, with an empirical comparison of optimizers)