Code:
$$\begin{aligned}
KPI&=(N+S)W \\
PI&=N+S \\
I&=W
\end{aligned}$$
$$\begin{aligned}
loss&=(y_i-Q(s,a;\theta))^2 \\
&=(r+\gamma \max Q(s^{'},a^{'};\theta^{-})-Q(s,a;\theta)) ^2\\
\end{aligned}$$
Rendered result:
$$\begin{aligned}
KPI&=(N+S)W \\
PI&=N+S \\
I&=W
\end{aligned}$$

$$\begin{aligned}
loss&=(y_i-Q(s,a;\theta))^2 \\
&=(r+\gamma \max Q(s^{'},a^{'};\theta^{-})-Q(s,a;\theta)) ^2\\
\end{aligned}$$
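In each `aligned` block, `&` marks the column where the rows line up (here, just before each `=`), and `\\` ends a row. The snippets in this post assume a renderer such as KaTeX or MathJax, where `$$ ... $$` opens display math. As a minimal sketch, assuming a standalone LaTeX document instead, the same loss equation could be typeset as follows; the document class and the use of `\[ ... \]` are illustrative choices, not something the original specifies:

```latex
\documentclass{article}
\usepackage{amsmath} % aligned is provided by amsmath

\begin{document}

% aligned must sit inside math mode; \[ ... \] opens display math
\[
\begin{aligned}
loss &= (y_i - Q(s,a;\theta))^2 \\                                           % & sets the alignment point
     &= (r + \gamma \max Q(s^{'},a^{'};\theta^{-}) - Q(s,a;\theta))^2        % continuation row aligned at =
\end{aligned}
\]

\end{document}
```

The second row starts with `&` and has no left-hand side, so its `=` sits directly under the `=` of the first row.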