Notes for Deep Learning Lessons of Prof. Hung-yi Lee (2)

1737-汪同学

Today, I write up what I learned about the optimization of deep learning. What does optimization mean here? The following ppt shows us the answer.

[lecture slide]

1. SGD with Momentum (SGDM)

Just as the name suggests, SGDM combines SGD with Momentum. For SGDM, the process of updating the parameters is shown in the following ppt. What we should pay attention to, or in other words what helps us better understand SGDM, is that the momentum $v^i$ is actually a weighted sum of all the previous gradients, and the more recent a gradient is, the more influence it has on the current momentum (a code sketch is given at the end of this section).
[lecture slide]
What is the advantage of adding momentum to SGD? Plain SGD can easily get stuck at a local minimum instead of reaching the global minimum. Adding momentum takes the history of the gradients into account, which, to put it more vividly, gives SGDM a way to keep moving even when we are standing at a local minimum where the current gradient alone would leave us stuck.
[lecture slide]
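To make the update rule concrete, here is a minimal sketch of the SGDM step in NumPy; the function name, the hyperparameter values and the toy quadratic loss are my own choices for illustration, not taken from the lecture.

```python
import numpy as np

def sgdm_step(theta, v, grad, lr=0.01, lam=0.9):
    """One SGDM update: v is an exponentially weighted sum of past
    gradients, so more recent gradients have more influence."""
    v = lam * v - lr * grad   # accumulate history plus the current gradient
    theta = theta + v         # move along the accumulated direction
    return theta, v

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta, v = np.array([5.0]), np.zeros(1)
for _ in range(200):
    grad = 2 * theta
    theta, v = sgdm_step(theta, v, grad)
print(theta)   # ends up near 0
```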

2. Adagrad

Adagrad was introduced in the last blog, so I will not explain it again here. (I am a little bit lazy, haha.)
[lecture slide]
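Since the next section compares RMSProp against it, here is a minimal reminder sketch of the Adagrad update in the same toy NumPy style; the names and default values are my own, not from the lecture.

```python
import numpy as np

def adagrad_step(theta, acc, grad, lr=0.1, eps=1e-8):
    """Adagrad: acc is the plain running sum of squared gradients,
    so the effective step size can only shrink over time."""
    acc = acc + grad ** 2
    theta = theta - lr * grad / (np.sqrt(acc) + eps)
    return theta, acc
```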

3. RMSProp

RMSProp makes a small change to the formula of Adagrad. In Adagrad, $v_t$ is the sum of the squares of all past gradients. In RMSProp, however, $v_t = \alpha v_{t-1} + (1-\alpha)(g_{t-1})^2$. We can change the value of $\alpha$ to give $v_{t-1}$ more or less influence on the current update. In practice we usually set $\alpha$ to a large value, for fear that a single very large $g_{t-1}$ would make the step size $\frac{\eta}{\sqrt{v_t}}$ too close to zero.
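A minimal sketch of this update, again in NumPy; `lr` plays the role of $\eta$, and the default values are illustrative rather than numbers from the lecture.

```python
import numpy as np

def rmsprop_step(theta, v, grad, lr=0.001, alpha=0.9, eps=1e-8):
    """RMSProp: v is an exponential moving average of squared gradients,
    not the plain sum that Adagrad keeps."""
    v = alpha * v + (1 - alpha) * grad ** 2   # v_t = alpha*v_{t-1} + (1-alpha)*g^2
    theta = theta - lr * grad / (np.sqrt(v) + eps)
    return theta, v
```

Because $\alpha$ is large, one unusually large gradient only contributes a $(1-\alpha)$ fraction to $v_t$, so the step size $\frac{\eta}{\sqrt{v_t}}$ does not suddenly collapse toward zero.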

[lecture slide]

4. Adam

If we ignore some small differences, Adam can be seen as the combination of SGDM and RMSProp. The small change is that the form of $m_t$ is adjusted, which is called de-biasing. The reason for this change is that $m_t$ starts at zero, so its value is too close to zero at the beginning of training.
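A minimal sketch of the Adam step showing the de-biasing the paragraph refers to; the hyperparameter values follow commonly used defaults, and the code layout is my own rather than the lecture's.

```python
import numpy as np

def adam_step(theta, m, v, grad, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: SGDM-style first moment m plus RMSProp-style second moment v.
    Both start at zero, so they are divided by (1 - beta**t) to de-bias them."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # de-biasing; t counts updates starting from 1
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```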

[lecture slide]
