Machine Learning Advanced (1): Regression


Linear Regression

  1. Objective function

     J(\theta)=\frac{1}{2}\sum_{i=1}^n\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^2=\frac{1}{2}(X\theta-y)^T(X\theta-y)

  2. Solving methods
     Normal equation:

     \theta=(X^TX)^{-1}X^Ty

     Gradient descent, using the partial derivative

     \frac{\partial J(\theta)}{\partial \theta_j}=\sum_{i=1}^n\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_j^{(i)}

     and the update rule

     \theta_j := \theta_j-\alpha\sum_{i=1}^n\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_j^{(i)}
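     As a quick illustration of the normal equation above, here is a minimal NumPy sketch; the synthetic data, the variable names, and the use of `np.linalg.solve` in place of an explicit matrix inverse are my own illustrative choices, not part of the original derivation.

```python
import numpy as np

# Synthetic data for illustration: y = 2*x + 1 + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2 * x + 1 + rng.normal(0, 0.5, size=100)

# Design matrix with a bias column of ones
X = np.column_stack([np.ones_like(x), x])

# Normal equation: theta = (X^T X)^{-1} X^T y
# (solve is used instead of forming the inverse explicitly, for numerical stability)
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # approximately [1, 2]
```
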
  3. Regularization

     | Regularizer | Form | Meaning | Resulting tendency | When to use |
     | --- | --- | --- | --- | --- |
     | L0 | \sum I_{\{\theta_i \neq 0\}} | the number of non-zero parameters in the model | | |
     | L1 | \sum \vert\theta_i\vert | the sum of the absolute values of the components | tends to keep only a few non-zero features and drives the rest to exactly 0 | when only a few of the features really matter, L1 is the better choice; besides acting as a regularizer it is also very useful for feature selection |
     | L2 | \sum \theta_i^2 | the L2 norm is the square root of the sum of the squared parameters; the penalty uses the squared form \sum\theta_i^2 | keeps more features, but shrinks all of them towards 0 | when most features contribute, and contribute fairly evenly, L2 is the better choice |
     | Elastic Net | \rho\sum \vert\theta_i\vert+(1-\rho)\sum \theta_i^2,\ \rho\in[0,1] | a weighted combination of the L1 and L2 penalties | | |
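     For completeness, a small sketch of fitting L1-, L2- and Elastic-Net-penalized linear models, assuming scikit-learn is available; the penalty strengths and the toy data are arbitrary illustrative values.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two features actually matter (illustrative data)
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, size=200)

for model in (Lasso(alpha=0.1),                       # L1 penalty
              Ridge(alpha=0.1),                       # L2 penalty
              ElasticNet(alpha=0.1, l1_ratio=0.5)):   # weighted L1 + L2
    model.fit(X, y)
    # Lasso tends to zero out the irrelevant coefficients; Ridge only shrinks them
    print(type(model).__name__, np.round(model.coef_, 2))
```
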
  4. R^2 evaluation

     R^2=\frac{\sum_{i=1}^n (\hat{y}_i-\overline{y})^2}{\sum_{i=1}^n (y_i-\overline{y})^2}=1-\frac{\sum_{i=1}^n\hat{\epsilon}_i^2}{\sum_{i=1}^n (y_i-\overline{y})^2}=1-\frac{\sum_{i=1}^n(\hat{y}_i-y_i)^2}{\sum_{i=1}^n (y_i-\overline{y})^2}

     \left(\sum_{i=1}^n (y_i-\overline{y})^2\ge\sum_{i=1}^n (\hat{y}_i-\overline{y})^2+\sum_{i=1}^n\hat{\epsilon}_i^2\right)

     where \hat{y}_i is the fitted value and (x_i, y_i) is a sample point. The larger R^2 is, the better the fit. The equality in the parenthesis holds if and only if \theta is an unbiased estimate.
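     A short sketch of computing R^2 directly from the definition above; the helper name `r_squared` and the toy values are illustrative.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - sum_i (y_i - yhat_i)^2 / sum_i (y_i - ybar)^2."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.2, 8.7])
print(r_squared(y_true, y_pred))  # close to 1, i.e. a good fit
```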

Gradient descent

| | Batch gradient descent | Stochastic gradient descent | Mini-batch gradient descent |
| --- | --- | --- | --- |
| Update rule | \theta_j := \theta_j-\alpha\sum_{i=1}^n(h_{\theta}(x^{(i)})-y^{(i)})x_j^{(i)} | \theta_j := \theta_j-\alpha(h_{\theta}(x^{(i)})-y^{(i)})x_j^{(i)} | \theta_j := \theta_j-\alpha\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})x_j^{(i)},\ m<n |
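The three update rules differ only in how many samples enter each step. The sketch below parameterizes that with a batch size (batch_size = 1 gives stochastic gradient descent, batch_size = n gives batch gradient descent); the learning rate, epoch count, and data are illustrative assumptions.

```python
import numpy as np

def gradient_step(theta, X, y, alpha):
    """theta := theta - alpha * sum_i (h_theta(x_i) - y_i) * x_i over the given rows."""
    residual = X @ theta - y
    return theta - alpha * X.T @ residual

rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 1))])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.1, size=100)

theta = np.zeros(2)
batch_size = 10  # m < n; batch_size = 1 gives SGD, batch_size = 100 gives batch GD
for epoch in range(200):
    idx = rng.permutation(len(y))
    for start in range(0, len(y), batch_size):
        batch = idx[start:start + batch_size]
        theta = gradient_step(theta, X[batch], y[batch], alpha=0.01)
print(theta)  # approximately [1, 2]
```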

Locally Weighted Linear Regression

  1. Objective function

     J(\theta)=\sum_{i=1}^n w^{(i)}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^2

     w^{(i)} is the weight. With a Gaussian kernel,

     w^{(i)}=\exp\left(-\frac{(x^{(i)}-x)^2}{2\tau^2}\right)

     where \tau is the bandwidth.
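     A sketch of locally weighted linear regression for a single query point, assuming one-dimensional inputs and the Gaussian kernel above; the bandwidth and the data are illustrative.

```python
import numpy as np

def lwlr_predict(x0, X, y, tau=0.5):
    """Fit a weighted least-squares line around x0 and return its prediction."""
    # Gaussian weights: w_i = exp(-(x_i - x0)^2 / (2 tau^2))
    w = np.exp(-((X[:, 1] - x0) ** 2) / (2 * tau ** 2))
    W = np.diag(w)
    # Weighted normal equation: theta = (X^T W X)^{-1} X^T W y
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return np.array([1.0, x0]) @ theta

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 100))
y = np.sin(x) + rng.normal(0, 0.1, size=100)
X = np.hstack([np.ones((100, 1)), x[:, None]])
print(lwlr_predict(5.0, X, y))  # should be close to sin(5.0)
```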

Logistic Regression

  1. Log-linear model
     Log odds (logit):

     \mathrm{logit}(p)=\log\frac{p}{1-p}=\log\frac{h_{\theta}(x)}{1-h_{\theta}(x)}=\theta^Tx

  2. Sigmoid function

     g(z)=\frac{1}{1+e^{-z}}

     g'(z)=g(z)(1-g(z))

  3. Parameter estimation
     Assume:

     P(y=1|x;\theta)=h_{\theta}(x),\quad P(y=0|x;\theta)=1-h_{\theta}(x)

     Then

     p(y|x;\theta)=\left(h_{\theta}(x)\right)^y\left(1-h_{\theta}(x)\right)^{1-y}

     \log(L(\theta))=\log\prod_{i=1}^n p(y^{(i)}|x^{(i)};\theta)=\sum_{i=1}^n y^{(i)}\log h(x^{(i)})+(1-y^{(i)})\log(1-h(x^{(i)}))

     Taking the partial derivative with respect to \theta_j:

     \begin{aligned}
     \frac{\partial l(\theta)}{\partial \theta_j}
     &=\sum_{i=1}^n\left(\frac{y^{(i)}}{h(x^{(i)})}-\frac{1-y^{(i)}}{1-h(x^{(i)})}\right)\frac{\partial h(x^{(i)})}{\partial \theta_j}\\
     &=\sum_{i=1}^n\left(\frac{y^{(i)}}{g(\theta^Tx^{(i)})}-\frac{1-y^{(i)}}{1-g(\theta^Tx^{(i)})}\right)\frac{\partial g(\theta^Tx^{(i)})}{\partial \theta_j}\\
     &=\sum_{i=1}^n\left(\frac{y^{(i)}}{g(\theta^Tx^{(i)})}-\frac{1-y^{(i)}}{1-g(\theta^Tx^{(i)})}\right)g(\theta^Tx^{(i)})\left(1-g(\theta^Tx^{(i)})\right)\frac{\partial \theta^Tx^{(i)}}{\partial \theta_j}\\
     &=\sum_{i=1}^n\left(y^{(i)}-g(\theta^Tx^{(i)})\right)x_j^{(i)}
     \end{aligned}
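     The last line of the derivation can be checked numerically; the sketch below compares the analytic gradient \sum_i(y^{(i)}-g(\theta^Tx^{(i)}))x^{(i)} against finite differences of the log-likelihood on illustrative data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    p = sigmoid(X @ theta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.uniform(size=50) < 0.5).astype(float)
theta = rng.normal(size=3)

# Analytic gradient: sum_i (y_i - g(theta^T x_i)) x_i
analytic = X.T @ (y - sigmoid(X @ theta))

# Central finite differences along each coordinate of theta
eps = 1e-6
numeric = np.array([
    (log_likelihood(theta + eps * e, X, y) - log_likelihood(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
print(analytic, numeric)  # the two gradients should match closely
```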

  4. Loss function

     L(\theta)=-\log(L(\theta))

  5. Gradient descent
     Batch gradient descent:

     \theta_j := \theta_j+\alpha\sum_{i=1}^n\left(y^{(i)}-g(\theta^Tx^{(i)})\right)x_j^{(i)}

     Stochastic gradient descent:

     \theta_j := \theta_j+\alpha\left(y^{(i)}-g(\theta^Tx^{(i)})\right)x_j^{(i)}

     Mini-batch gradient descent:

     \theta_j := \theta_j+\alpha\sum_{i=1}^m\left(y^{(i)}-g(\theta^Tx^{(i)})\right)x_j^{(i)},\quad m<n
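     Putting the pieces together, a compact sketch of logistic regression trained by batch gradient ascent on the log-likelihood (equivalently, gradient descent on the loss -\log L(\theta)); averaging the update over the batch, the learning rate, and the data are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative data drawn from a known model
rng = np.random.default_rng(0)
X = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 2))])
true_theta = np.array([-0.5, 2.0, -1.0])
y = (rng.uniform(size=200) < sigmoid(X @ true_theta)).astype(float)

theta = np.zeros(3)
alpha = 0.5
for _ in range(5000):
    # theta := theta + alpha * mean_i (y_i - g(theta^T x_i)) x_i
    theta = theta + alpha * X.T @ (y - sigmoid(X @ theta)) / len(y)
print(theta)  # should land near true_theta
```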

| | Linear regression | Logistic regression |
| --- | --- | --- |
| Hypothesis h_\theta(x^{(i)}) | h_\theta(x^{(i)})=\theta^Tx^{(i)} | h_\theta(x^{(i)})=\frac{1}{1+e^{-\theta^Tx^{(i)}}} |
| Assumption | \epsilon in y=\theta^Tx+\epsilon follows a Gaussian distribution, which belongs to the exponential family | y follows a Bernoulli (binomial) distribution, which belongs to the exponential family |

For exponential-family distributions, the gradient-descent updates all take a similar form.

Softmax Regression

  1. Softmax function

     \mathrm{softmax}(z_k)=\frac{\exp(z_k)}{\sum_{i=1}^K \exp(z_i)}

     \frac{\partial\,\mathrm{softmax}(z_k)}{\partial z_k}=\frac{\exp(z_k)\sum_{i=1}^K \exp(z_i)-\exp(z_k)\exp(z_k)}{\left(\sum_{i=1}^K \exp(z_i)\right)^2}=\mathrm{softmax}(z_k)\left(1-\mathrm{softmax}(z_k)\right)

     \frac{\partial\,\mathrm{softmax}(z_k)}{\partial z_j}=\frac{-\exp(z_k)\exp(z_j)}{\left(\sum_{i=1}^K \exp(z_i)\right)^2}=-\mathrm{softmax}(z_k)\,\mathrm{softmax}(z_j)\qquad (j\neq k)

     \frac{\partial \log \mathrm{softmax}(z_k)}{\partial z_k}=\frac{\partial\left(z_k-\log\sum_{i=1}^K \exp(z_i)\right)}{\partial z_k}=1-\mathrm{softmax}(z_k)

     \left(\text{equivalently, } \frac{\partial \log \mathrm{softmax}(z_k)}{\partial z_k}=\frac{1}{\mathrm{softmax}(z_k)}\frac{\partial\,\mathrm{softmax}(z_k)}{\partial z_k}=1-\mathrm{softmax}(z_k)\right)
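     A small sketch of a numerically stable softmax together with a finite-difference check of the derivative identities above; the shift by max(z) and the toy values are illustrative choices.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)          # shift for numerical stability; the result is unchanged
    e = np.exp(z)
    return e / e.sum()

z = np.array([1.0, 2.0, 0.5])
s = softmax(z)

# Analytic Jacobian: J[k, j] = softmax(z_k) * (1[k == j] - softmax(z_j))
jac = np.diag(s) - np.outer(s, s)

# Central finite-difference check of one off-diagonal entry
eps = 1e-6
k, j = 0, 1
num = (softmax(z + eps * np.eye(3)[j])[k] - softmax(z - eps * np.eye(3)[j])[k]) / (2 * eps)
print(jac[k, j], num)  # the two values should agree closely
```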

  2. K-class classification
     For class k the parameter vector is \theta_k=(\theta_{k1},\dots,\theta_{km})^T, where m is the dimension of the data x, so \Theta is a K\times m matrix. The label of the i-th sample x^{(i)} is the one-hot vector \bm{y}^{(i)}=(y_1^{(i)},\dots,y_K^{(i)}).
     Assume:

     P(y=k|x;\theta)=\frac{\exp(\theta_k^Tx)}{\sum_{l=1}^K \exp(\theta_l^Tx)},\quad k=1,2,\dots,K

     (\bm{\hat{y}}^{(i)})^T=\left(\hat{y}^{(i)}_1,\dots,\hat{y}^{(i)}_K\right)=\left(P(y=1|x^{(i)};\theta),\dots,P(y=K|x^{(i)};\theta)\right)

  3. Log-likelihood

     \begin{aligned}
     \log(L(\theta))&=\log \prod_{i=1}^n p(y^{(i)}|x^{(i)};\theta)\\
     &=\log \prod_{i=1}^n \prod_{k=1}^K \left(P(y=k|x^{(i)};\theta)\right)^{y_k^{(i)}}\\
     &=\log \prod_{i=1}^n \prod_{k=1}^K \left(\frac{\exp(\theta_k^Tx^{(i)})}{\sum_{l=1}^K \exp(\theta_l^Tx^{(i)})}\right)^{y_k^{(i)}}\\
     &\left(=\sum_{i=1}^n \sum_{k=1}^K y_k^{(i)} \log \hat{y}^{(i)}_k =\sum_{i=1}^n (\bm{y}^{(i)})^T\log\bm{\hat{y}}^{(i)}\right)\\
     &=\sum_{i=1}^n \sum_{k=1}^K y_k^{(i)} \left(\theta_k^Tx^{(i)}-\log\sum_{l=1}^K \exp(\theta_l^Tx^{(i)})\right)
     \end{aligned}

     Taking the partial derivative with respect to \theta_j (using \sum_k y_k^{(i)}=1 for one-hot labels):

     \frac{\partial L(\theta)}{\partial \theta_j}=\sum_{i=1}^n \left(y_j^{(i)}-\frac{\exp(\theta_j^Tx^{(i)})}{\sum_{l=1}^K \exp(\theta_l^Tx^{(i)})}\right)(x^{(i)})^T

  4. Loss function

     L(\theta)=-\log(L(\theta))

  5. Gradient descent
     Batch gradient descent:

     \theta_j := \theta_j+\alpha\sum_{i=1}^n \left(y_j^{(i)}-\frac{\exp(\theta_j^Tx^{(i)})}{\sum_{l=1}^K \exp(\theta_l^Tx^{(i)})}\right)(x^{(i)})^T

     Stochastic gradient descent:

     \theta_j := \theta_j+\alpha \left(y_j^{(i)}-\frac{\exp(\theta_j^Tx^{(i)})}{\sum_{l=1}^K \exp(\theta_l^Tx^{(i)})}\right)(x^{(i)})^T

     Mini-batch gradient descent:

     \theta_j := \theta_j+\alpha\sum_{i=1}^m \left(y_j^{(i)}-\frac{\exp(\theta_j^Tx^{(i)})}{\sum_{l=1}^K \exp(\theta_l^Tx^{(i)})}\right)(x^{(i)})^T,\quad m<n
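     Finally, a condensed sketch of softmax regression trained with the batch update above (averaged over the samples for a stable step size); the data generation, class count, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

def softmax_rows(Z):
    Z = Z - Z.max(axis=1, keepdims=True)   # stabilize each row before exponentiating
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, m, K = 300, 3, 4                        # n samples, m features (incl. bias), K classes
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, m - 1))])

# Illustrative labels sampled from a known softmax model
true_Theta = rng.normal(size=(K, m))
labels = np.array([rng.choice(K, p=p) for p in softmax_rows(X @ true_Theta.T)])
Y = np.eye(K)[labels]                      # one-hot labels, shape (n, K)

Theta = np.zeros((K, m))                   # row k holds theta_k
alpha = 0.5
for _ in range(1000):
    Y_hat = softmax_rows(X @ Theta.T)      # predicted class probabilities, shape (n, K)
    # theta_k := theta_k + alpha * mean_i (y_k^(i) - yhat_k^(i)) x^(i)
    Theta = Theta + alpha * (Y - Y_hat).T @ X / n

pred = np.argmax(X @ Theta.T, axis=1)
print("training accuracy:", np.mean(pred == labels))  # well above chance (1/K)
```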
