
# Linear Regression

## Maximum Likelihood Estimation (MLE)

Model each observation as linear in the features plus a noise term:

$$y^{(i)} = \theta^{T} x^{(i)} + \varepsilon^{(i)}$$

The noise terms are assumed independent and identically Gaussian:

$$p\left(\varepsilon^{(i)}\right) = \frac{1}{\sqrt{2 \pi}\, \sigma} \exp \left(-\frac{\left(\varepsilon^{(i)}\right)^{2}}{2 \sigma^{2}}\right)$$

Equivalently, the conditional density of $y^{(i)}$ given $x^{(i)}$ is

$$p\left(y^{(i)} \mid x^{(i)} ; \theta\right) = \frac{1}{\sqrt{2 \pi}\, \sigma} \exp \left(-\frac{\left(y^{(i)}-\theta^{T} x^{(i)}\right)^{2}}{2 \sigma^{2}}\right)$$

By independence, the likelihood of the whole sample is the product

$$L(\theta) = \prod_{i=1}^{m} p\left(y^{(i)} \mid x^{(i)} ; \theta\right) = \prod_{i=1}^{m} \frac{1}{\sqrt{2 \pi}\, \sigma} \exp \left(-\frac{\left(y^{(i)}-\theta^{T} x^{(i)}\right)^{2}}{2 \sigma^{2}}\right)$$
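
As a quick numeric sanity check of this likelihood, a minimal NumPy sketch (the synthetic data, sizes, and `sigma` are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, sigma = 20, 3, 0.5                 # assumed sample size and noise level
X = rng.normal(size=(m, n))
theta_true = np.array([1.0, -2.0, 0.5])  # assumed true parameters
y = X @ theta_true + rng.normal(scale=sigma, size=m)

def likelihood(theta):
    """L(theta): product of per-sample Gaussian densities p(y_i | x_i; theta)."""
    residuals = y - X @ theta
    densities = np.exp(-residuals**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return np.prod(densities)

print(likelihood(theta_true))          # relatively large near the true theta
print(likelihood(theta_true + 1.0))    # orders of magnitude smaller
```

For larger m the product underflows, which is one practical reason the next section works with the log-likelihood instead.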

## Gaussian Log-Likelihood and Least Squares

Taking logs turns the product into a sum:

$$\begin{aligned} \ell(\theta) &=\log L(\theta) \\ &=\log \prod_{i=1}^{m} \frac{1}{\sqrt{2 \pi}\, \sigma} \exp \left(-\frac{\left(y^{(i)}-\theta^{T} x^{(i)}\right)^{2}}{2 \sigma^{2}}\right) \\ &=\sum_{i=1}^{m} \log \frac{1}{\sqrt{2 \pi}\, \sigma} \exp \left(-\frac{\left(y^{(i)}-\theta^{T} x^{(i)}\right)^{2}}{2 \sigma^{2}}\right) \\ &=m \log \frac{1}{\sqrt{2 \pi}\, \sigma}-\frac{1}{\sigma^{2}} \cdot \frac{1}{2} \sum_{i=1}^{m}\left(y^{(i)}-\theta^{T} x^{(i)}\right)^{2} \end{aligned}$$

The first term is constant in $\theta$, so maximizing $\ell(\theta)$ is equivalent to minimizing the least-squares cost

$$J(\theta)=\frac{1}{2} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}$$
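
To see this equivalence concretely, a small 1-D check (the data-generating slope, noise level, and search grid are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
m, sigma = 40, 0.3
x = rng.uniform(-1, 1, size=m)
y = 2.0 * x + rng.normal(scale=sigma, size=m)   # assumed true slope 2.0

def J(theta):
    """Least-squares cost: half the sum of squared residuals."""
    return 0.5 * np.sum((theta * x - y) ** 2)

def log_likelihood(theta):
    """Gaussian log-likelihood, as derived above: constant - J(theta)/sigma^2."""
    return m * np.log(1 / (np.sqrt(2 * np.pi) * sigma)) - J(theta) / sigma**2

grid = np.linspace(0, 4, 401)
print(grid[np.argmin([J(t) for t in grid])])              # ~2.0
print(grid[np.argmax([log_likelihood(t) for t in grid])]) # same theta
```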

## Deriving the Closed-Form Solution for θ

Each row of X is one sample; there are M samples (measurements).
Each column of X is one feature dimension; there are N dimensions (regressors).

In matrix form:

$$J(\theta)=\frac{1}{2} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}=\frac{1}{2}(X \theta-y)^{T}(X \theta-y)$$

Take the gradient with respect to θ:

$$\nabla_{\theta} J(\theta)=\nabla_{\theta}\left(\frac{1}{2}(X \theta-y)^{T}(X \theta-y)\right)=\nabla_{\theta}\left(\frac{1}{2}\left(\theta^{T} X^{T}-y^{T}\right)(X \theta-y)\right)$$

$$\begin{aligned} &=\nabla_{\theta}\left(\frac{1}{2}\left(\theta^{T} X^{T} X \theta-\theta^{T} X^{T} y-y^{T} X \theta+y^{T} y\right)\right) \\ &=\frac{1}{2}\left(2 X^{T} X \theta-X^{T} y-\left(y^{T} X\right)^{T}\right)=X^{T} X \theta-X^{T} y \end{aligned}$$

Set this gradient to $0$ to find the stationary point.
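
A quick finite-difference check of this gradient (the data sizes are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
y = rng.normal(size=30)
theta = rng.normal(size=4)

def J(theta):
    r = X @ theta - y
    return 0.5 * r @ r

analytic = X.T @ X @ theta - X.T @ y       # gradient derived above

eps = 1e-6                                  # central finite differences
numeric = np.array([
    (J(theta + eps * np.eye(4)[j]) - J(theta - eps * np.eye(4)[j])) / (2 * eps)
    for j in range(4)
])
print(np.max(np.abs(analytic - numeric)))   # ~1e-9 or smaller
```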

## Optimal Parameters in the Least-Squares Sense

$$\theta=\left(X^{T} X\right)^{-1} X^{T} y$$

If $X^{T} X$ is not invertible, or to prevent overfitting, add a $\lambda$ perturbation:

$$\theta=\left(X^{T} X+\lambda I\right)^{-1} X^{T} y$$

The same solution follows directly from the normal equations:

$$X \theta=y \;\Rightarrow\; X^{T} X \theta=X^{T} y \;\Rightarrow\; \theta=\left(X^{T} X\right)^{-1} X^{T} y$$
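
A minimal sketch of both solutions on an assumed full-rank synthetic design matrix; solving the linear system is used instead of forming an explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = X @ np.array([1., 2., 3., 4., 5.]) + rng.normal(scale=0.1, size=100)

# Normal equation: solve (X^T X) theta = X^T y.
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Lambda-perturbed version: (X^T X + lambda*I) theta = X^T y.
lam = 0.1                                   # assumed regularization strength
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

print(theta_ols)    # close to [1, 2, 3, 4, 5]
print(theta_ridge)  # slightly shrunk toward zero
```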

## Regularization and Preventing Overfitting

Hyperparameters:

$$\left\{\begin{array}{l}\lambda>0 \\ \rho \in[0,1]\end{array}\right.$$

• L2-norm:

$$J(\theta)=\frac{1}{2} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}+\lambda \sum_{j=1}^{n} \theta_{j}^{2}$$

• L1-norm:

$$J(\theta)=\frac{1}{2} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}+\lambda \sum_{j=1}^{n}\left|\theta_{j}\right|$$

• Elastic Net:

$$J(\theta)=\frac{1}{2} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}+\lambda\left(\rho \cdot \sum_{j=1}^{n}\left|\theta_{j}\right|+(1-\rho) \cdot \sum_{j=1}^{n} \theta_{j}^{2}\right)$$
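
The three penalized costs above fit in one helper; a sketch, with `lam` and `rho` as the hyperparameters defined earlier (the function name and defaults are assumptions):

```python
import numpy as np

def cost(theta, X, y, lam=0.1, rho=0.5, penalty="l2"):
    """Least-squares cost plus the chosen penalty; lam > 0, rho in [0, 1].
    For simplicity every component of theta is penalized, including any bias."""
    base = 0.5 * np.sum((X @ theta - y) ** 2)
    if penalty == "l2":
        return base + lam * np.sum(theta ** 2)
    if penalty == "l1":
        return base + lam * np.sum(np.abs(theta))
    # Elastic Net: convex mix of the L1 and L2 penalties.
    return base + lam * (rho * np.sum(np.abs(theta))
                         + (1 - rho) * np.sum(theta ** 2))
```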

## Moore-Penrose Generalized Inverse (Pseudoinverse)

$$x=\left(A^{T} A\right)^{-1} A^{T} \cdot b$$

From the intuitive meaning of this solution to $Ax = b$, we can define:

$$A^{+}=\left(A^{T} A\right)^{-1} A^{T}$$

When $A$ is invertible, $A^{+}=\left(A^{T} A\right)^{-1} A^{T}$ reduces to $A^{-1}$:

$$\left(A^{T} A\right)^{-1} A^{T}=A^{-1}\left(A^{T}\right)^{-1} A^{T}=A^{-1}$$

• $A^{+}$ is called the generalized inverse (pseudoinverse) of $A$.

### Computing the Pseudoinverse via SVD

• For an m×n matrix $A$ with SVD

$$A=U \cdot \Sigma \cdot V^{T}$$

• the generalized inverse of $A$ is:

$$A^{+}=V \cdot \Sigma^{-1} \cdot U^{T}$$

• If $A$ is an n×n invertible matrix, then $A \cdot A^{+}=I$.
• If $m \neq n$, then $A^{+}=\left(A^{T} A\right)^{-1} A^{T}$.
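
A sketch of the SVD route, checked against NumPy's built-in `np.linalg.pinv` (the test matrix is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(6, 3))                   # tall matrix, m != n

# A = U Sigma V^T  =>  A+ = V Sigma^{-1} U^T (invert the nonzero singular values)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

print(np.allclose(A_pinv, np.linalg.pinv(A)))             # True
print(np.allclose(A_pinv, np.linalg.inv(A.T @ A) @ A.T))  # matches (A^T A)^{-1} A^T
```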

## Gradient Descent

$$J(\theta)=\frac{1}{2} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}$$

• Initialize $\theta$ (randomly).
• Iterate along the negative gradient direction; each update makes $J(\theta)$ smaller:

$$\theta=\theta-\alpha \cdot \frac{\partial J(\theta)}{\partial \theta}$$

$\alpha$: the learning rate (step size).

### Gradient Direction

For a single sample:

$$\begin{aligned} \frac{\partial}{\partial \theta_{j}} J(\theta) &=\frac{\partial}{\partial \theta_{j}} \frac{1}{2}\left(h_{\theta}(x)-y\right)^{2} \\ &=2 \cdot \frac{1}{2}\left(h_{\theta}(x)-y\right) \cdot \frac{\partial}{\partial \theta_{j}}\left(h_{\theta}(x)-y\right) \\ &=\left(h_{\theta}(x)-y\right) \cdot \frac{\partial}{\partial \theta_{j}}\left(\sum_{i=0}^{n} \theta_{i} x_{i}-y\right) \\ &=\left(h_{\theta}(x)-y\right) x_{j} \end{aligned}$$

• Batch gradient descent
Repeat until convergence {

$$\theta_{j}:=\theta_{j}+\alpha \sum_{i=1}^{m}\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right) x_{j}^{(i)}$$

}

• Stochastic gradient descent
Loop {
for i = 1 to m, {

$$\theta_{j}:=\theta_{j}+\alpha\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right) x_{j}^{(i)}$$

}
}

• Mini-batch gradient descent (all three variants are sketched in code after this list)
Repeat until convergence {
for each mini-batch $B$ {

$$\theta_{j}:=\theta_{j}+\alpha \sum_{i \in B}\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right) x_{j}^{(i)}$$

}
}

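
The three update schemes differ only in how many samples feed each step. A minimal NumPy sketch (the learning rate, epoch counts, and batch size are illustrative assumptions):

```python
import numpy as np

def batch_gd(X, y, alpha=0.01, iters=1000):
    """Batch gradient descent: exact gradient over all m samples per step."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta += alpha * X.T @ (y - X @ theta)
    return theta

def sgd(X, y, alpha=0.01, epochs=50, seed=0):
    """Stochastic gradient descent: one sample per update."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            theta += alpha * (y[i] - X[i] @ theta) * X[i]
    return theta

def minibatch_gd(X, y, alpha=0.01, batch=10, epochs=50, seed=0):
    """Mini-batch: gradient over a small batch B per update."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(y))
        for start in range(0, len(y), batch):
            B = idx[start:start + batch]
            theta += alpha * X[B].T @ (y[B] - X[B] @ theta)
    return theta
```

On a well-conditioned problem all three approach the least-squares solution; SGD and mini-batch trade gradient noise for cheaper steps.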

# Logistic Regression

$$h_{\theta}(x)=g\left(\theta^{T} x\right)=\frac{1}{1+e^{-\theta^{T} x}}$$

The sigmoid's derivative has a convenient form:

$$\begin{aligned} g^{\prime}(x) &=\left(\frac{1}{1+e^{-x}}\right)^{\prime}=\frac{e^{-x}}{\left(1+e^{-x}\right)^{2}} \\ &=\frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}}=\frac{1}{1+e^{-x}} \cdot\left(1-\frac{1}{1+e^{-x}}\right) \\ &=g(x) \cdot(1-g(x)) \end{aligned}$$
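
The identity $g^{\prime}=g(1-g)$ is easy to confirm numerically; a minimal sketch:

```python
import numpy as np

def g(x):
    """Logistic sigmoid g(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

# Compare finite differences against g(x) * (1 - g(x)).
x = np.linspace(-5, 5, 11)
eps = 1e-6
numeric = (g(x + eps) - g(x - eps)) / (2 * eps)
print(np.max(np.abs(numeric - g(x) * (1 - g(x)))))   # ~1e-11
```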

## Parameter Estimation for Logistic Regression

$$\begin{aligned} P(y=1 \mid x ; \theta) &=h_{\theta}(x) \\ P(y=0 \mid x ; \theta) &=1-h_{\theta}(x) \end{aligned}$$

The two cases combine into a single Bernoulli density, giving the likelihood:

$$\begin{aligned} p(y \mid x ; \theta) &=\left(h_{\theta}(x)\right)^{y}\left(1-h_{\theta}(x)\right)^{1-y} \\ L(\theta) &=p(\vec{y} \mid X ; \theta) \\ &=\prod_{i=1}^{m} p\left(y^{(i)} \mid x^{(i)} ; \theta\right) \\ &=\prod_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)\right)^{y^{(i)}}\left(1-h_{\theta}\left(x^{(i)}\right)\right)^{1-y^{(i)}} \end{aligned}$$

$$\ell(\theta)=\log L(\theta)=\sum_{i=1}^{m} y^{(i)} \log h\left(x^{(i)}\right)+\left(1-y^{(i)}\right) \log \left(1-h\left(x^{(i)}\right)\right)$$

Differentiating with respect to $\theta_{j}$ and applying $g^{\prime}=g(1-g)$:

$$\begin{aligned} \frac{\partial \ell(\theta)}{\partial \theta_{j}} &=\sum_{i=1}^{m}\left(\frac{y^{(i)}}{h\left(x^{(i)}\right)}-\frac{1-y^{(i)}}{1-h\left(x^{(i)}\right)}\right) \cdot \frac{\partial h\left(x^{(i)}\right)}{\partial \theta_{j}} \\ &=\sum_{i=1}^{m}\left(\frac{y^{(i)}}{g\left(\theta^{T} x^{(i)}\right)}-\frac{1-y^{(i)}}{1-g\left(\theta^{T} x^{(i)}\right)}\right) \cdot \frac{\partial g\left(\theta^{T} x^{(i)}\right)}{\partial \theta_{j}} \\ &=\sum_{i=1}^{m}\left(\frac{y^{(i)}}{g\left(\theta^{T} x^{(i)}\right)}-\frac{1-y^{(i)}}{1-g\left(\theta^{T} x^{(i)}\right)}\right) \cdot g\left(\theta^{T} x^{(i)}\right) \cdot\left(1-g\left(\theta^{T} x^{(i)}\right)\right) \cdot \frac{\partial \theta^{T} x^{(i)}}{\partial \theta_{j}} \\ &=\sum_{i=1}^{m}\left(y^{(i)}\left(1-g\left(\theta^{T} x^{(i)}\right)\right)-\left(1-y^{(i)}\right) g\left(\theta^{T} x^{(i)}\right)\right) \cdot x_{j}^{(i)} \\ &=\sum_{i=1}^{m}\left(y^{(i)}-g\left(\theta^{T} x^{(i)}\right)\right) \cdot x_{j}^{(i)} \end{aligned}$$

### Learning Rule for Logistic Regression Parameters

Since we maximize $\ell(\theta)$, the update ascends the gradient:

$$\theta_{j}:=\theta_{j}+\alpha\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right) x_{j}^{(i)}$$

Repeat until convergence {

$$\theta_{j}:=\theta_{j}+\alpha \sum_{i=1}^{m}\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right) x_{j}^{(i)}$$

}

Loop {
for i = 1 to m, {

$$\theta_{j}:=\theta_{j}+\alpha\left(y^{(i)}-h_{\theta}\left(x^{(i)}\right)\right) x_{j}^{(i)}$$

}
}
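
Putting the batch learning rule together, a minimal gradient-ascent sketch (the synthetic data, learning rate, and iteration count are assumptions for illustration):

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.01, iters=2000):
    """Batch gradient ascent on the log-likelihood, using the rule above:
    theta := theta + alpha * sum_i (y_i - h_theta(x_i)) * x_i."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta += alpha * X.T @ (y - g(X @ theta))
    return theta

# Toy usage on noisy synthetic labels (assumed for illustration).
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
y = (X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=200) > 0).astype(float)
theta = fit_logistic(X, y)
pred = (g(X @ theta) > 0.5).astype(float)
print((pred == y).mean())   # high training accuracy
```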
