[Image Retrieval] The DOLG Paper

2024-03-01 19:11 | Source: compiled from the web

DOLG

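Image retrieval aims to find, in a database, the images that are similar to a query image. Retrieval systems usually retrieve candidate images by the similarity of their global feature vectors and then re-rank the candidates using their local features. The task is therefore split into two stages, the first relying on global features and the second on local features. DOLG instead performs end-to-end retrieval using both the global and the local information in an image. It first extracts local features with multi-atrous convolution layers and self-attention, then extracts from the local features the component that is orthogonal to the global representation, and finally concatenates the orthogonal component with the global representation and aggregates them into the final descriptor.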
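The first stage described above, ranking database images by the similarity of their global descriptors, can be sketched as follows. This is a minimal illustration in plain Python, assuming cosine similarity as the metric and small made-up descriptors; the function names `cosine_similarity` and `rank_by_similarity` are hypothetical and do not come from the DOLG code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two descriptor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, database, top_k=3):
    """Return the indices of the top_k database descriptors most similar to the query."""
    scores = [(cosine_similarity(query, d), idx) for idx, d in enumerate(database)]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [idx for _, idx in scores[:top_k]]


# Toy descriptors, made up for illustration only.
database = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]

candidates = rank_by_similarity(query, database, top_k=2)
print(candidates)
```

In a two-stage system the returned candidates would then be re-ranked with local features; DOLG aims to make a single descriptor sufficient.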


DOLG Model

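DOLG is an information fusion framework for end-to-end image retrieval. It extracts the local features $f_{l}$ and the global feature $f_{g}$ separately, then uses an orthogonal fusion module to extract from $f_{l}$ the component $f_{l,orth}$ that is orthogonal to $f_{g}$. Understanding the orthogonal component requires the notion of vector projection, which is covered in the appendix. Finally, $f_{l,orth}$ and $f_{g}$ are concatenated, pooled, and passed through a fully connected layer to produce the final feature descriptor.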


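    import timm
    import torch.nn as nn
    from pytorch_lightning import LightningModule

    # GeM and ArcFace are not defined in this post; OrthogonalFusion and
    # DolgLocalBranch appear in the sections that follow.


    class DolgNet(LightningModule):
        def __init__(self, input_dim, hidden_dim, output_dim, num_of_classes):
            super().__init__()
            # Backbone: ResNet-101 feature maps from two stages
            self.cnn = timm.create_model(
                'resnet101',
                pretrained=True,
                features_only=True,
                in_chans=input_dim,
                out_indices=(2, 3)
            )
            # Orthogonal fusion
            self.orthogonal_fusion = OrthogonalFusion()
            # Local branch
            self.local_branch = DolgLocalBranch(512, hidden_dim)
            self.gap = nn.AdaptiveAvgPool2d(1)
            self.gem_pool = GeM()
            self.fc_1 = nn.Linear(1024, hidden_dim)
            self.fc_2 = nn.Linear(int(2*hidden_dim), output_dim)

            self.criterion = ArcFace(
                in_features=output_dim,
                out_features=num_of_classes,
                scale_factor=30,
                margin=0.15,
                criterion=nn.CrossEntropyLoss()
            )

        def forward(self, x):
            output = self.cnn(x)
            # Local features: (B, hidden_dim, H, W)
            local_feat = self.local_branch(output[0])
            # Global feature: res4 --> GeM --> FC, giving (B, hidden_dim).
            # flatten(1) replaces squeeze() so a batch of size 1 keeps its batch dimension.
            global_feat = self.fc_1(self.gem_pool(output[1]).flatten(1))
            # Orthogonal fusion of local and global features; the output is the
            # concatenation of the global feature and the orthogonal component
            feat = self.orthogonal_fusion(local_feat, global_feat)
            # Average pooling: (B, 2*hidden_dim)
            feat = self.gap(feat).flatten(1)
            # Fully connected layer
            feat = self.fc_2(feat)
            return feat

Local Branch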

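The local branch consists of multi-atrous convolution layers and a self-attention module. The multi-atrous block imitates a feature pyramid, which lets it cope with scale variations across image instances, and the self-attention module uses an attention mechanism to weight the local features.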

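The multi-atrous block contains three dilated convolution layers and one global average pooling branch. The three dilated convolutions produce feature maps with different receptive fields. The four feature maps are concatenated and passed to a $1 \times 1$ convolution layer. The result then enters the self-attention module, which models the importance of each local feature point. The input of the self-attention module first passes through a $1 \times 1$ conv-BN block and is then split into two paths: one applies $L_{2}$ normalization to the features, and the other applies a ReLU followed by a $1 \times 1$ convolution with a SoftPlus activation to produce an attention map. The two paths are then multiplied element-wise to give the local features.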
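The four feature maps can only be concatenated if they share the same spatial size. A 3 × 3 convolution whose `padding` equals its `dilation` preserves that size, which the sketch below checks with the standard convolution output-size formula. The helper name `conv_out_size`, the 16 × 16 input, and the channel count of 2048 are assumptions chosen for illustration.

```python
def conv_out_size(size, kernel_size, dilation, padding, stride=1):
    """Output size of a convolution along one spatial dimension."""
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


in_size = 16          # assumed feature-map height and width
out_channel = 2048    # assumed channel count after concatenation

# Dilation rates 3, 6, 9 with padding equal to the rate.
sizes = [conv_out_size(in_size, kernel_size=3, dilation=r, padding=r) for r in (3, 6, 9)]
print(sizes)

# Three dilated branches plus the pooled branch, each with out_channel / 4 channels.
total_channels = 4 * (out_channel // 4)
print(total_channels)
```

The pooled branch reaches the same spatial size by upsampling its 1 × 1 output, so all four maps line up along the channel dimension.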

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class MultiAtrous(nn.Module):
        def __init__(self, in_channel, out_channel, size, dilation_rates=(3, 6, 9)):
            super().__init__()
            # Three dilated 3x3 convolutions; padding=rate keeps the spatial size
            self.dilated_convs = [
                nn.Conv2d(in_channel, int(out_channel/4),
                          kernel_size=3, dilation=rate, padding=rate)
                for rate in dilation_rates
            ]
            # Global branch: pooling --> 1x1 conv --> ReLU --> upsample to (size, size)
            self.gap_branch = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(in_channel, int(out_channel/4), kernel_size=1),
                nn.ReLU(),
                nn.Upsample(size=(size, size), mode='bilinear')
            )
            self.dilated_convs.append(self.gap_branch)
            # nn.ModuleList registers the submodules; it implies no execution order
            self.dilated_convs = nn.ModuleList(self.dilated_convs)

        def forward(self, x):
            local_feat = []
            for dilated_conv in self.dilated_convs:
                local_feat.append(dilated_conv(x))
            # Concatenate the four feature maps along the channel dimension
            local_feat = torch.cat(local_feat, dim=1)
            return local_feat


    class DolgLocalBranch(nn.Module):
        def __init__(self, in_channel, out_channel, hidden_channel=2048):
            super().__init__()
            # Multi-atrous block; Config.image_size comes from the project
            # configuration, which is not shown in this post
            self.multi_atrous = MultiAtrous(in_channel, hidden_channel, size=int(Config.image_size/8))
            self.conv1x1_1 = nn.Conv2d(hidden_channel, out_channel, kernel_size=1)
            self.conv1x1_2 = nn.Conv2d(out_channel, out_channel, kernel_size=1, bias=False)
            self.conv1x1_3 = nn.Conv2d(out_channel, out_channel, kernel_size=1)

            self.relu = nn.ReLU()
            self.bn = nn.BatchNorm2d(out_channel)
            self.softplus = nn.Softplus()

        def forward(self, x):
            # Multi-atrous block
            local_feat = self.multi_atrous(x)

            # 1x1 conv --> ReLU
            local_feat = self.conv1x1_1(local_feat)
            local_feat = self.relu(local_feat)

            # 1x1 conv --> BatchNorm
            local_feat = self.conv1x1_2(local_feat)
            local_feat = self.bn(local_feat)

            # Attention path: ReLU --> 1x1 conv --> SoftPlus
            attention_map = self.relu(local_feat)
            attention_map = self.conv1x1_3(attention_map)
            attention_map = self.softplus(attention_map)

            # Feature path: L2 normalization over channels, then weighting by the attention map
            local_feat = F.normalize(local_feat, p=2, dim=1)
            local_feat = local_feat * attention_map

            return local_feat

Orthogonal Fusion Module


Taking $f_{l}$ and $f_{g}$ as input, the module computes, for each local feature point $f_{l}^{(i,j)}$, its projection $f_{l,proj}^{(i,j)}$ onto the global feature $f_{g}$:

$$f_{l,proj}^{(i,j)} = \frac{f_{l}^{(i,j)} \cdot f_{g}}{|f_{g}|^{2}} \, f_{g}$$

where $f_{l}^{(i,j)} \cdot f_{g}$ is the dot product and $|f_{g}|^{2}$ is the squared $L_{2}$ norm of $f_{g}$:

$$\begin{matrix} f_{l}^{(i,j)} \cdot f_{g} = \sum_{c=1}^{C} f_{l,c}^{(i,j)} f_{g,c} \\ |f_{g}|^{2} = \sum_{c=1}^{C} \left( f_{g,c} \right)^{2} \end{matrix}$$

The orthogonal component is the difference between the local feature vector and its projection, so $f_{l,orth}^{(i,j)}$ can be written as:

$$f_{l,orth}^{(i,j)} = f_{l}^{(i,j)} - f_{l,proj}^{(i,j)}$$
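The projection and orthogonal-component formulas above can be checked numerically on a tiny feature map. The following is a minimal sketch in plain Python, assuming a local feature map with two spatial positions and C = 3 channels, with made-up values; `dot` and `orthogonal_component` are hypothetical helpers, not part of the DOLG code. It shows that each orthogonal component has a zero dot product with $f_{g}$.

```python
def dot(u, v):
    """Dot product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_component(f_l, f_g):
    """Remove from f_l its projection onto f_g."""
    scale = dot(f_l, f_g) / dot(f_g, f_g)             # (f_l . f_g) / |f_g|^2
    projection = [scale * g for g in f_g]             # f_{l,proj}
    return [x - p for x, p in zip(f_l, projection)]   # f_{l,orth}


f_g = [1.0, 2.0, 2.0]                  # global feature, |f_g|^2 = 9
local_map = [
    [3.0, 0.0, 3.0],                   # f_l at the first position
    [1.0, 1.0, 1.0],                   # f_l at the second position
]

orth_map = [orthogonal_component(f_l, f_g) for f_l in local_map]
for f_orth in orth_map:
    print(f_orth, dot(f_orth, f_g))
```

Because the orthogonal component carries no information along the direction of $f_{g}$, concatenating it with $f_{g}$ does not duplicate what the global feature already encodes.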

    import torch
    import torch.nn as nn


    class OrthogonalFusion(nn.Module):
        def __init__(self):
            super().__init__()

        def forward(self, local_feat, global_feat):
            # L2 norm of f_g for each sample in the batch: (B,)
            global_feat_norm = torch.norm(global_feat, p=2, dim=1)
            # Dot product f_l . f_g at every position via batched matrix multiplication:
            # (B, 1, C) x (B, C, H*W) -> (B, 1, H*W)
            projection = torch.bmm(global_feat.unsqueeze(1),
                                   torch.flatten(local_feat, start_dim=2))
            # (f_l . f_g) * f_g: (B, C, 1) x (B, 1, H*W) -> (B, C, H*W), reshaped to (B, C, H, W)
            projection = torch.bmm(global_feat.unsqueeze(2),
                                   projection).view(local_feat.size())
            # Divide by |f_g|^2 to obtain the projection
            projection = projection / (global_feat_norm * global_feat_norm).view(-1, 1, 1, 1)
            # Orthogonal component
            orthogonal_comp = local_feat - projection
            global_feat = global_feat.unsqueeze(-1).unsqueeze(-1)
            # Concatenate the global feature (expanded to H x W) with the orthogonal component
            return torch.cat([global_feat.expand(orthogonal_comp.size()), orthogonal_comp], dim=1)

Appendix

Vector projection

Projection refers to casting the shadow of a figure onto a plane or a line.

Consider two vectors $\vec{a}$ and $\vec{b}$, and let the vector $\vec{p}$ be the projection of $\vec{b}$ onto $\vec{a}$.


Suppose $p = xa$; once $x$ is known, so is $p$. Let $e = b - p$ be the difference between $b$ and its projection. Since $a$ is perpendicular to $e$, we have $a \cdot e = a^{T}e = a^{T}\left( b - p \right) = a^{T}\left( b - xa \right) = 0$. Hence $x = \frac{a^{T}b}{a^{T}a}$, and therefore $p = \frac{a^{T}b}{a^{T}a} \cdot a$.
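As a quick check of this result, here is a worked example with two made-up vectors, $a = (1, 2)^{T}$ and $b = (3, 1)^{T}$, chosen only for illustration. With $e = b - p$, the last line confirms that $a$ and $e$ are perpendicular.

```latex
\begin{aligned}
x &= \frac{a^{T}b}{a^{T}a} = \frac{1 \cdot 3 + 2 \cdot 1}{1^{2} + 2^{2}} = \frac{5}{5} = 1, \\
p &= x a = (1, 2)^{T}, \\
e &= b - p = (2, -1)^{T}, \\
a^{T}e &= 1 \cdot 2 + 2 \cdot (-1) = 0 .
\end{aligned}
```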

The softplus activation function

Softplus can be viewed as a smooth approximation of the ReLU activation function. It is defined as:

$$\mathrm{Softplus}\left( x \right) = \log \left( 1 + e^{x} \right)$$
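Evaluating $\log(1 + e^{x})$ directly overflows for large $x$. The sketch below uses the algebraically equivalent form $\max(x, 0) + \log(1 + e^{-|x|})$ and prints it next to ReLU. It is a plain-Python illustration with hypothetical function names, not the implementation behind `nn.Softplus`.

```python
import math


def softplus(x):
    """Numerically stable softplus, equal to log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def relu(x):
    """Rectified linear unit."""
    return max(x, 0.0)


# Softplus is smooth around 0 and approaches ReLU as |x| grows.
for x in (-5.0, 0.0, 5.0, 1000.0):
    print(x, relu(x), softplus(x))
```

Softplus is strictly positive for every finite input, so the attention map it produces in the local branch never zeroes out a position completely.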

