Machine Learning Notes on Markov Chain Monte Carlo Methods (3): The MH Sampling Algorithm


Contents:
- Introduction
- Review: Markov Chains and Stationary Distributions
  - Markov Chains
  - Stationary Distributions
- The MH Sampling Algorithm
  - Sampling Idea
  - The MH Sampling Procedure

Introduction

The previous section introduced Markov chains (Markov Chain) and stationary distributions. This section combines the Markov chain with the Monte Carlo method and introduces the Metropolis-Hastings (MH) sampling algorithm.

Review: Markov Chains and Stationary Distributions

Markov Chains

Under the homogeneous Markov assumption (taking the first-order homogeneous Markov assumption as the example), the random variable $\mathcal X_{t+1}$ at time $t+1$ is related to the random variable $\mathcal X_t$ at time $t$ as follows:
$$P(\mathcal X_{t+1} \mid \mathcal X_t, \mathcal X_{t-1}, \cdots, \mathcal X_1) = P(\mathcal X_{t+1} \mid \mathcal X_t)$$
Consider the case where the random variable $\mathcal X_t$ at any time of the chain takes values in a discrete set, and suppose there are $\mathcal K$ possible choices. The probability distribution $\pi(\mathcal X_t)$ of $\mathcal X_t$ at time $t$ is then written as
$$\pi(\mathcal X_t) = \left[\pi(\mathcal X_t = x_1), \pi(\mathcal X_t = x_2), \cdots, \pi(\mathcal X_t = x_{\mathcal K})\right]^{T}_{\mathcal K \times 1}$$
Correspondingly, the state transition matrix of the Markov chain is a $\mathcal K \times \mathcal K$ square matrix, and each of its elements $a_{ij} \in \mathcal A$ is the conditional probability of the corresponding transition:
$$\mathcal A = [a_{ij}]_{\mathcal K \times \mathcal K}, \quad i,j \in \{1,2,\cdots,\mathcal K\}$$
$$a_{ij} = P(\mathcal X_{t+1} = x_j \mid \mathcal X_t = x_i), \quad t = 1,2,\cdots$$
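As a small illustration of these definitions, here is a hedged sketch in Python; the 3-state matrix `A` and the initial distribution are made-up values for illustration, not taken from the notes. It pushes a distribution over states forward one step, i.e. $\pi(\mathcal X_{t+1}) = \mathcal A^{T}\pi(\mathcal X_t)$ in the column-vector convention used above.

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows = current state, columns = next state).
# Each row sums to 1, matching a_ij = P(X_{t+1} = x_j | X_t = x_i).
A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])

# pi_t is the K x 1 distribution over states at time t (column-vector convention).
pi_t = np.array([1.0, 0.0, 0.0])

# One-step push-forward: pi_{t+1} = A^T pi_t (equivalently pi_t @ A in row form).
pi_t1 = A.T @ pi_t
print(pi_t1)  # [0.5 0.3 0.2]
```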

Stationary Distributions

A stationary distribution (Stationary Distribution) is a probability distribution that the Markov chain preserves from one time step to the next. The underlying reason such a distribution exists lies in the properties of the state transition matrix as a stochastic matrix:

the state transition matrix is a $\mathcal K \times \mathcal K$ square matrix; every row of the state transition matrix sums to 1 (it need not be doubly stochastic, i.e. the columns need not also sum to 1);

From these properties it follows that every eigenvalue of the state transition matrix has absolute value $\leq 1$. One can then show that, once the state transition matrix of the Markov chain is fixed, the chain's distribution necessarily approaches the stationary distribution as the transitions proceed. For the detailed proof, see the previous section.

Suppose the chain reaches the stationary distribution at time $m$; then
$$\pi(\mathcal X_m) = \pi(\mathcal X_{m+1}) = \pi(\mathcal X_{m+2}) = \cdots$$
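To make the convergence claim concrete, the following sketch (again with a hypothetical 3-state stochastic matrix, an assumption used only for illustration) repeatedly applies the transition matrix to an arbitrary initial distribution; the iterates settle at a fixed point, the stationary distribution, which further transitions no longer change.

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows sum to 1); illustrative assumption.
A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])

pi = np.array([1.0, 0.0, 0.0])   # arbitrary initial distribution
for _ in range(100):             # repeatedly apply the transition matrix
    pi = A.T @ pi                # pi_{t+1} = A^T pi_t

print(pi)                        # fixed point: the stationary distribution
print(A.T @ pi)                  # one more transition leaves it unchanged
```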

How do we decide whether the distribution at the current time is a stationary distribution? This requires the detailed balance condition (Detailed Balance):
$$\pi(\mathcal X = x) \cdot P(\mathcal X = x^* \mid \mathcal X = x) = \pi(\mathcal X = x^*) \cdot P(\mathcal X = x \mid \mathcal X = x^*)$$
Detailed balance is a sufficient but not necessary condition for a distribution to be stationary: it can therefore be used to certify that a distribution is stationary, but not the other way around.
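A hedged sketch of such a check: for a hypothetical symmetric transition matrix (whose stationary distribution is uniform), the function below verifies the detailed balance equation element-wise. The matrix, the uniform `pi`, and the helper name `satisfies_detailed_balance` are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def satisfies_detailed_balance(pi, A, tol=1e-10):
    """Check pi(x) * P(x* | x) == pi(x*) * P(x | x*) for every pair of states."""
    flow = pi[:, None] * A          # flow[i, j] = pi(x_i) * P(x_j | x_i)
    return np.allclose(flow, flow.T, atol=tol)

# Hypothetical symmetric transition matrix: detailed balance holds with the uniform pi.
A = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.6]])
pi = np.array([1/3, 1/3, 1/3])

print(satisfies_detailed_balance(pi, A))  # True -> pi is stationary for A
```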

The MH Sampling Algorithm

As mentioned in the introduction to the Monte Carlo method, inference (Inference) is concerned with the posterior $P(\mathcal Z \mid \mathcal X)$, or with expectations taken under $P(\mathcal Z \mid \mathcal X)$:
$$\mathbb E_{\mathcal Z \mid \mathcal X}[f(\mathcal Z)] = \int_{\mathcal Z} P(\mathcal Z \mid \mathcal X) f(\mathcal Z) \, d\mathcal Z$$
With the Monte Carlo method, one draws samples from the distribution $P(\mathcal Z \mid \mathcal X)$ and then averages:
$$z^{(1)}, z^{(2)}, \cdots, z^{(N)} \sim P(\mathcal Z \mid \mathcal X)$$
$$\mathbb E_{\mathcal Z \mid \mathcal X}[f(\mathcal Z)] \approx \frac{1}{N} \sum_{i=1}^{N} f(z^{(i)})$$
In practice, however, sampling directly from $P(\mathcal Z \mid \mathcal X)$ is very hard, so a Markov chain (Markov Chain) is used to obtain the samples indirectly.
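For instance, a minimal Monte Carlo sketch, assuming a distribution we can sample directly (a standard normal standing in for $P(\mathcal Z \mid \mathcal X)$) and $f(z) = z^2$; both choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for P(Z | X): assume it is a standard normal we can sample from directly.
samples = rng.standard_normal(100_000)     # z^(1), ..., z^(N)

f = lambda z: z ** 2                       # the function whose expectation we want
estimate = f(samples).mean()               # (1/N) * sum f(z^(i))

print(estimate)                            # close to the true value E[Z^2] = 1
```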

Sampling Idea

How does a Markov chain carry out the sampling? Suppose we already have a state transition matrix $\mathcal A^*$ such that, under the first-order homogeneous Markov assumption, the distributions of the random variable $\mathcal X$ at any two consecutive times satisfy the detailed balance condition.

In other words, $\mathcal A^*$ makes every random variable in the Markov chain $\{\mathcal X_T\}$ share one and the same distribution, namely the stationary distribution.

Now pick an initial state:
$$\mathcal X_{init} = x_i \quad (i = 1,2,\cdots,\mathcal K)$$
A state transition step then yields a random draw $x_j$ for the next state:
$$x_j \sim P(\mathcal X_1 \mid \mathcal X_{init} = x_i)$$
Repeating this, with the previous state as the condition, yields the next draw $x_k$:
$$x_k \sim P(\mathcal X_2 \mid \mathcal X_1 = x_j)$$

Continuing in this way eventually produces a collection of sample points, all of which (once the chain has reached stationarity) follow the stationary distribution:
$$\{x_i, x_j, x_k, \cdots\}$$
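A hedged sketch of this idea, assuming a hypothetical symmetric (hence reversible) 3-state matrix `A_star`: simulate the chain step by step, discard an initial burn-in stretch, and the empirical state frequencies approach the stationary distribution (uniform for this particular matrix).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical transition matrix assumed to satisfy detailed balance (symmetric here,
# so its stationary distribution is uniform).
A_star = np.array([[0.6, 0.3, 0.1],
                   [0.3, 0.4, 0.3],
                   [0.1, 0.3, 0.6]])

state = 0                                     # X_init = x_i
chain = []
for t in range(50_000):
    state = rng.choice(3, p=A_star[state])    # x_{t+1} ~ P(X_{t+1} | X_t = state)
    chain.append(state)

burn_in = 1_000                               # drop samples drawn before stationarity
counts = np.bincount(chain[burn_in:], minlength=3)
print(counts / counts.sum())                  # empirical frequencies ~ stationary dist.
```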

The MH Sampling Procedure

Following this sampling idea, the problem of obtaining the stationary distribution becomes: how do we find a suitable state transition matrix so that the distributions along the Markov chain are exactly the target stationary distribution?

First, set up a Markov chain over $P(\mathcal Z \mid \mathcal X)$ and pick an arbitrary state transition matrix $\mathcal Q$:
$$\mathcal Q = [q_{ij}]_{\mathcal K \times \mathcal K} \quad (i,j \in \{1,2,\cdots,\mathcal K\})$$
In general the two sides of the detailed balance equation will then differ (it is not that they can never be equal, only that equality is extremely unlikely; if an arbitrary $\mathcal Q$ already gave the stationary distribution, that would be remarkably good luck):
$$P(\mathcal Z = z \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z^* \mid \mathcal Z = z) \neq P(\mathcal Z = z^* \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z \mid \mathcal Z = z^*)$$

Based on this, suppose there exists a function $\alpha(z, z^*)$ of $z, z^*$ such that
$$P(\mathcal Z = z \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z^* \mid \mathcal Z = z) \cdot \alpha(z, z^*) = P(\mathcal Z = z^* \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z \mid \mathcal Z = z^*) \cdot \alpha(z^*, z)$$
Write $\mathcal Q(\mathcal Z = z^* \mid \mathcal Z = z) \cdot \alpha(z, z^*)$ as $\mathcal P(\mathcal Z = z^* \mid \mathcal Z = z)$, and likewise $\mathcal Q(\mathcal Z = z \mid \mathcal Z = z^*) \cdot \alpha(z^*, z)$ as $\mathcal P(\mathcal Z = z \mid \mathcal Z = z^*)$. (This is purely a change of notation; $\mathcal P$ and $P$ are different symbols.) The equation above then becomes
$$P(\mathcal Z = z \mid \mathcal X) \cdot \mathcal P(\mathcal Z = z^* \mid \mathcal Z = z) = P(\mathcal Z = z^* \mid \mathcal X) \cdot \mathcal P(\mathcal Z = z \mid \mathcal Z = z^*)$$
which is exactly the detailed balance condition, so the resulting distribution is the stationary distribution.

Looking back at this construction, the function $\alpha(z, z^*)$ is called the acceptance rate, and it is given by
$$\alpha(z, z^*) = \min\left[1, \frac{P(\mathcal Z = z^* \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z \mid \mathcal Z = z^*)}{P(\mathcal Z = z \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z^* \mid \mathcal Z = z)}\right]$$
Two observations about the acceptance rate function $\alpha(z, z^*)$:

Since $\mathcal Q(\mathcal Z = z^* \mid \mathcal Z = z)$ and $\mathcal Q(\mathcal Z = z \mid \mathcal Z = z^*)$ are both conditional probabilities (as are the two posterior factors), the ratio inside the $\min$ is nonnegative, so the range of $\alpha(z, z^*)$ is
$$\alpha(z, z^*) \in [0, 1]$$
Substituting $\alpha(z, z^*)$ back into the equation above gives
$$\begin{aligned} & P(\mathcal Z = z \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z^* \mid \mathcal Z = z) \cdot \alpha(z, z^*) \\ &= P(\mathcal Z = z \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z^* \mid \mathcal Z = z) \cdot \min\left[1, \frac{P(\mathcal Z = z^* \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z \mid \mathcal Z = z^*)}{P(\mathcal Z = z \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z^* \mid \mathcal Z = z)}\right] \\ &= \min\left[P(\mathcal Z = z \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z^* \mid \mathcal Z = z),\; P(\mathcal Z = z^* \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z \mid \mathcal Z = z^*)\right] \\ &= P(\mathcal Z = z^* \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z \mid \mathcal Z = z^*) \cdot \min\left[1, \frac{P(\mathcal Z = z \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z^* \mid \mathcal Z = z)}{P(\mathcal Z = z^* \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z \mid \mathcal Z = z^*)}\right] \\ &= P(\mathcal Z = z^* \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z \mid \mathcal Z = z^*) \cdot \alpha(z^*, z) \end{aligned}$$
so the detailed balance condition indeed holds with this choice of $\alpha$.
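The identity can also be checked numerically. The sketch below assumes a small discrete target `p` and a random proposal matrix `Q` (both made up for illustration); it forms the acceptance rates $\alpha(z, z^*)$, builds the effective transition probabilities $\mathcal Q(z^* \mid z)\,\alpha(z, z^*)$, and confirms that the probability flow $p(z)\,\mathcal Q(z^* \mid z)\,\alpha(z, z^*)$ is symmetric in $z, z^*$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete target distribution p(z) and an arbitrary proposal matrix Q.
p = np.array([0.2, 0.5, 0.3])
Q = rng.random((3, 3))
Q /= Q.sum(axis=1, keepdims=True)            # make each row of Q sum to 1

# Acceptance rate alpha[i, j] = min(1, p(z_j) Q(z_i | z_j) / (p(z_i) Q(z_j | z_i))).
alpha = np.minimum(1.0, (p[None, :] * Q.T) / (p[:, None] * Q))

# Effective transition probabilities P(z* | z) = Q(z* | z) * alpha(z, z*)
# (the rejected mass stays at the current state, which is irrelevant for this check).
P_eff = Q * alpha

# Detailed balance check: p(z) P(z* | z) == p(z*) P(z | z*) for every pair z, z*.
flow = p[:, None] * P_eff
print(np.allclose(flow, flow.T))             # True
```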

This acceptance-rate construction is precisely the Metropolis-Hastings (MH) sampling algorithm. Concretely, iteration $t$ proceeds as follows:

1. Draw a sample from the uniform distribution on $(0,1)$:
$$u \sim \mathcal U(0,1)$$
2. Conditioned on the previous sample $z^{(t-1)}$, draw a candidate for the current step from the proposal given by the state transition matrix. The draw $z^*$ cannot be taken as $z^{(t)}$ directly; it must first pass the test below:
$$z^* \sim \mathcal Q(\mathcal Z = z \mid \mathcal Z = z^{(t-1)})$$
3. Compute the acceptance rate:
$$\alpha = \min\left[1, \frac{P(\mathcal Z = z^* \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z^{(t-1)} \mid \mathcal Z = z^*)}{P(\mathcal Z = z^{(t-1)} \mid \mathcal X) \cdot \mathcal Q(\mathcal Z = z^* \mid \mathcal Z = z^{(t-1)})}\right]$$
4. Compare $u$ with $\alpha$. (Much as in rejection sampling, the uniform draw $u$ has no meaning of its own; it only serves as the threshold against which $\alpha$ is compared.)
   - If $u \leq \alpha$, set $z^{(t)} = z^*$: the candidate $z^*$ is accepted with probability $\alpha$.
   - Otherwise, set $z^{(t)} = z^{(t-1)}$: the candidate is rejected and the previous sample $z^{(t-1)}$ is reused as this step's output. The iteration is, in that sense, a wasted run; still, the number of samples does not shrink, and the next iteration again conditions on $z^{(t-1)}$ when proposing.

Running these steps yields a sequence of samples
$$\{z^{(1)}, z^{(2)}, \cdots, z^{(N)}\}$$
which are finally plugged into the Monte Carlo estimator to approximate the expectation under $P(\mathcal Z \mid \mathcal X)$:
$$\mathbb E_{\mathcal Z \mid \mathcal X}[f(\mathcal Z)] \approx \frac{1}{N} \sum_{i=1}^{N} f(z^{(i)})$$
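A minimal sketch of the whole procedure, assuming a one-dimensional unnormalized target `p_tilde` (a hypothetical Gaussian mixture standing in for $P(\mathcal Z \mid \mathcal X)$; only the unnormalized density is needed because the normalizing constant cancels in $\alpha$) and a Gaussian random-walk proposal. The proposal here happens to be symmetric, so the $\mathcal Q$ ratio cancels, but it is kept explicit to mirror the formula above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target density, standing in for P(Z | X); here a hypothetical
# two-component Gaussian mixture (the normalizing constant is never needed).
def p_tilde(z):
    return np.exp(-0.5 * (z - 2.0) ** 2) + 0.5 * np.exp(-0.5 * (z + 2.0) ** 2)

# Proposal Q(z* | z): Gaussian random walk centred at the current sample.
sigma = 1.0
def q_pdf(z_star, z):
    return np.exp(-0.5 * ((z_star - z) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

N, burn_in = 20_000, 2_000
z = 0.0                                        # initial state z^(0)
samples = []
for t in range(N + burn_in):
    z_star = rng.normal(z, sigma)              # step 2: z* ~ Q(. | z^(t-1))
    ratio = (p_tilde(z_star) * q_pdf(z, z_star)) / (p_tilde(z) * q_pdf(z_star, z))
    alpha = min(1.0, ratio)                    # step 3: acceptance rate
    u = rng.uniform(0.0, 1.0)                  # step 1: u ~ U(0, 1)
    z = z_star if u <= alpha else z            # step 4: accept z* or keep z^(t-1)
    samples.append(z)

samples = np.array(samples[burn_in:])
f = lambda z: z ** 2
print(samples.mean(), f(samples).mean())       # Monte Carlo estimates of E[Z], E[Z^2]
```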

The next section will introduce the Gibbs sampling algorithm.

Reference: Machine Learning, Whiteboard Derivation Series, Monte Carlo Method 3 (Monte Carlo Method), MH Sampling (Metropolis Hastings)


