How to restore old photos with AI

Hi everybody! I'm a research engineer on the Mail.ru Group computer vision team. In this article, I'm going to tell the story of how we created an AI-based photo restoration project for old military photos. What is «photo restoration»? It consists of three steps:

- we find all the image defects: fractures, scuffs, holes;
- we inpaint the discovered defects, based on the pixel values around them;
- we colorize the image.

Further, I’ll describe every step of photo restoration and tell you how we got our data, what nets we trained, what we accomplished, and what mistakes we made.

Looking for defects

We want to find all the pixels related to defects in an uploaded photo. First, we need to figure out what kind of pictures people will upload. We talked to the founders of the «Immortal Regiment» project, a non-commercial organization that stores legacy photos of WW2, and they shared their data with us. Upon analyzing it, we noticed that people mostly upload individual or group portraits with a moderate to large number of defects.

Then we had to collect a training set. The training set for a segmentation task is an image plus a mask in which all the defects are marked. The easiest way to get one is to let assessors create the segmentation masks. Of course, people know very well how to find defects, but that would take too long.

Marking the defect pixels in a single photo can take anywhere from one hour to a whole workday. Therefore, it's not easy to collect a training set of more than 100 images in a few weeks. That's why we tried to augment our data and create our own defects: we'd take a good photo, add defects using random walks on the image, and end up with a mask showing the image parts with the defects. Without augmentations, we had 68 manually labeled photos in the training set and 11 photos in the validation set.
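
A minimal sketch of what such random-walk defect generation might look like (the function and its parameters are illustrative, not the project's actual augmentation code):

```python
import numpy as np

def add_random_walk_defects(image, n_walks=8, walk_len=400, rng=None):
    """Scratch a clean photo with random walks; return the damaged image and its defect mask."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)
    for _ in range(n_walks):
        y, x = int(rng.integers(0, h)), int(rng.integers(0, w))
        for _ in range(walk_len):
            mask[y, x] = 1
            y = int(np.clip(y + rng.integers(-1, 2), 0, h - 1))
            x = int(np.clip(x + rng.integers(-1, 2), 0, w - 1))
    damaged = image.copy()
    damaged[mask == 1] = 255  # paint the "scratch" white, like a crease on a faded print
    return damaged, mask
```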

The most popular segmentation approach: take Unet with a pre-trained encoder and minimize the sum of the BCE (binary cross-entropy) and DICE (Sørensen–Dice coefficient) losses.
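
As a rough baseline sketch (assuming the segmentation_models_pytorch library for the Unet; the authors use their own Albunet variant), it could look like this:

```python
import torch
import segmentation_models_pytorch as smp

# Unet with an ImageNet-pretrained ResNet-18 encoder and one defect-probability channel.
model = smp.Unet(encoder_name="resnet18", encoder_weights="imagenet",
                 in_channels=3, classes=1)

bce = torch.nn.BCEWithLogitsLoss()

def dice_loss(logits, target, eps=1e-6):
    probs = torch.sigmoid(logits)
    intersection = (probs * target).sum()
    return 1 - (2 * intersection + eps) / (probs.sum() + target.sum() + eps)

def segmentation_loss(logits, target):
    return bce(logits, target) + dice_loss(logits, target)
```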

What problems arise when we use this segmentation approach for our task?

Even if a photo looks like it has tons of defects and is very old and shabby, the defective area is still much smaller than the undamaged one. To address this, we can increase the positive class weight in BCE; an optimal weight would be the ratio of clean pixels to defective ones.
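
In PyTorch, that weighting could be wired in roughly like this (a sketch; the actual class ratio comes from your own training masks):

```python
import torch

def weighted_bce_from_masks(train_masks: torch.Tensor) -> torch.nn.BCEWithLogitsLoss:
    """Build a BCE loss whose positive-class weight is the clean-to-defective pixel ratio.

    train_masks: an (N, H, W) tensor of 0/1 defect masks from the training set.
    """
    n_defective = train_masks.float().sum().clamp(min=1.0)
    n_clean = train_masks.numel() - n_defective
    return torch.nn.BCEWithLogitsLoss(pos_weight=n_clean / n_defective)
```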

The second problem is that if we use an out-of-the-box Unet with a pre-trained encoder (Albunet-18, for example), we lose a lot of positional information. The first layer of Albunet-18 is a convolution with a kernel of 5 and a stride of 2, which lets the net run fast. We traded off net runtime for better defect localization: we removed the max pooling after the first layer, decreased the stride to 1, and decreased the convolution kernel to 3.
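
As a rough illustration on a stock torchvision ResNet-18 (the real Albunet-18 encoder differs in the details), the change could look like this:

```python
import torch.nn as nn
from torchvision.models import resnet18

encoder = resnet18(weights="IMAGENET1K_V1")

# Stride 1 and a 3x3 kernel in the stem, so small defects keep their exact position.
# Note: the replaced layer is trained from scratch, losing its pretrained weights.
encoder.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
# Drop the max pooling that followed the first layer.
encoder.maxpool = nn.Identity()
```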

If we compress images down to, say, 256 x 256 or 512 x 512 pixels, small defects disappear due to interpolation. Therefore, we need to work with larger images; in production we currently segment defects in 1024 x 1024 photos. That's why we had to train the net on large image crops. However, this causes problems with a small batch size on a single GPU.

During training, we can fit about 20 images on one GPU. Because of that, we end up with inaccurate mean and standard deviation estimates in the BatchNorm layers. We can solve this with In-place BatchNorm, which, on the one hand, saves memory and, on the other hand, has a Synchronized BatchNorm version that synchronizes statistics across all GPUs. Now we calculate the mean and standard deviation not over 20 images on a single GPU but over 80 images from 4 GPUs. This improves the net's convergence.
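
A sketch of the synchronized-statistics part using stock PyTorch (this is not the In-place ABN implementation itself, which additionally saves activation memory):

```python
import torch.nn as nn

def with_synced_batchnorm(model: nn.Module, local_rank: int) -> nn.Module:
    """Make BatchNorm statistics be computed over all GPUs in a distributed run."""
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
    return nn.parallel.DistributedDataParallel(model.cuda(local_rank),
                                               device_ids=[local_rank])
```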

Finally, by increasing the BCE weight, changing the architecture, and using In-place BatchNorm, we made the segmentation better. However, it doesn't cost much to do even better by adding Test-Time Augmentation: we can run the net once on the input picture, then mirror it and run the net again to catch all the small defects.
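
A minimal sketch of that flip-based test-time augmentation:

```python
import torch

@torch.no_grad()
def predict_with_flip_tta(model, image):
    """Average the defect probabilities over the original and the mirrored image.

    image: a (1, 3, H, W) tensor; the model returns per-pixel logits.
    """
    probs = torch.sigmoid(model(image))
    mirrored = torch.flip(image, dims=[3])                         # horizontal flip
    probs_mirrored = torch.flip(torch.sigmoid(model(mirrored)), dims=[3])
    return (probs + probs_mirrored) / 2
```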

The net converges in 18 hours on four GeForce 1080Ti cards. Inference takes 290 ms. That's quite long, but it's the price of our better-than-default performance. Validation DICE equals 0.35, and ROC AUC is 0.93.

Image inpainting

As with the segmentation task, we used Unet. For inpainting, we upload the original image and a mask in which all the clean area is marked with ones and all the pixels we want to inpaint with zeros. This is how we collected the data: for photos from an open-source image dataset, for example OpenImagesV4, we added defects similar to the ones we see in real life. Then we trained the net to restore the missing parts.

How can we modify Unet for this task?

We can use partial convolutions instead of the standard ones. The idea is that when we convolve an area with a kernel, we don't take the defective pixel values into account. This makes the inpainting more precise. Here is an example from a recent NVIDIA paper: they used Unet with a default 2D convolution in the middle picture and a partial convolution in the picture on the right.
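
A simplified sketch of the partial-convolution idea (the official NVIDIA implementation handles more edge cases, such as multi-channel masks):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialConv2d(nn.Conv2d):
    """Convolution that ignores pixels where mask == 0 and renormalizes by the valid count."""

    def forward(self, x, mask):
        # mask: (N, 1, H, W) with ones for clean pixels, zeros for defects.
        with torch.no_grad():
            ones = torch.ones(1, 1, *self.kernel_size, device=x.device)
            valid_count = F.conv2d(mask, ones, stride=self.stride, padding=self.padding)
        # Convolve only the valid pixels.
        out = F.conv2d(x * mask, self.weight, None, self.stride,
                       self.padding, self.dilation, self.groups)
        # Renormalize so areas with few valid pixels are not artificially dimmed.
        window = float(self.kernel_size[0] * self.kernel_size[1])
        out = out * (window / valid_count.clamp(min=1.0))
        if self.bias is not None:
            out = out + self.bias.view(1, -1, 1, 1)
        # A position stays valid in the updated mask if it saw at least one clean pixel.
        return out, (valid_count > 0).float()
```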

We trained the net for five days. On the last day, we froze the BatchNorms to make the borders of the inpainted parts less visible.
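
Freezing BatchNorm can be done by switching those layers to eval mode so they stop updating their running statistics, for example (a sketch, not the authors' exact procedure):

```python
import torch.nn as nn

def freeze_batchnorm(model: nn.Module) -> None:
    """Put all BatchNorm layers into eval mode and stop their parameters from updating."""
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            module.eval()
            for p in module.parameters():
                p.requires_grad = False
```

If you call model.train() at the start of each epoch, re-apply this afterwards so the statistics stay frozen.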

It takes the net 50 ms to process one 512 x 512 picture. Validation PSNR equals 26.4. However, you can't fully rely on metrics for this task. To choose the best model, we ran several good models on validation images, anonymized the results, and then voted for the ones we liked the most. That's how we picked our final model.

I mentioned earlier that we artificially added defects to clean images. You should always track the maximum size of the added defects during training; if you feed the net an image with a defect much larger than anything it saw at the training stage, it will run wild and produce an unusable result. Therefore, if you need to fix large defects, augment your training set with them.

Here is the example of how our algorithm works:

Colorization

We segmented the defects and inpainted them; the third step is color reconstruction. As I said before, there are lots of individual and group portraits among the Immortal Regiment photos, and we wanted our net to work well on them. We decided to build our own colorization since none of the existing services could color the portraits quickly and well enough. We want our colorized photos to be more believable.

GitHub has a popular repository for photo colorization. It does a good job but still has some issues. For example, it tends to paint clothes blue. That’s why we rejected it as well.

So, we decided to create our own algorithm for image colorization. The most obvious idea: take a black-and-white image and predict three channels: red, green, and blue. However, we can make our job easier by working not with the RGB color representation but with YCbCr. The Y component is brightness (luma). An uploaded black-and-white image is the Y channel, and we are going to reuse it. Now we only need to predict Cb and Cr: Cb is the difference between blue and brightness, and Cr is the difference between red and brightness.
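
The Y/CbCr round trip can be sketched with Pillow like this (illustrative; the production pipeline may handle color conversion differently):

```python
import numpy as np
from PIL import Image

def luma_channel(path):
    """Return the Y (luma) channel of a photo as a NumPy array — the net's input."""
    y, _, _ = Image.open(path).convert("YCbCr").split()
    return np.asarray(y)

def recombine(y, cb_pred, cr_pred):
    """Merge the untouched Y channel with the predicted chroma channels into an RGB image."""
    ycbcr = np.stack([y, cb_pred, cr_pred], axis=-1).astype(np.uint8)
    return Image.fromarray(ycbcr, mode="YCbCr").convert("RGB")
```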

Why did we choose the YCbCr representation? The human eye is more sensitive to changes in brightness than to changes in color. That's why we reuse the Y component (brightness), which the eye is most sensitive to, and predict Cb and Cr, where we can afford mistakes, since people don't notice color errors nearly as well. This property was widely used at the dawn of color television, when channel capacity wasn't enough to transmit all the colors: the picture was transmitted in YCbCr, with the Y component unchanged and Cb and Cr reduced by half.

How to create a baseline

We can take Unet with a pre-trained encoder and minimize the L1 loss between the true CbCr values and the predicted ones. We want to color portraits, and therefore, besides OpenImages photos, we need more task-specific photos.
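
A baseline along these lines could be set up roughly as follows (again assuming segmentation_models_pytorch as a stand-in for the authors' own AlbuNet):

```python
import torch
import segmentation_models_pytorch as smp

# One input channel (Y), two output channels (Cb and Cr).
colorizer = smp.Unet(encoder_name="resnet50", encoder_weights="imagenet",
                     in_channels=1, classes=2)

l1_loss = torch.nn.L1Loss()  # minimized between predicted and ground-truth CbCr
```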

Where can we get colorized photos of people in military uniform? There are people on the internet who colorize old photos as a hobby or for a fee. They do it very carefully, trying to be precise: when they color a uniform, shoulder boards, and medals, they consult archive materials, so the results of their work are trustworthy. All in all, we used 200 manually colorized pictures of people in military uniform.

The other useful data source is The Workers’ and Peasants’ Red Army website. One of its founders had his picture taken in pretty much every World War 2 Soviet uniform available.

In some pictures, he imitated the poses of people from famous archive photos. It's a good thing his pictures have a white background: it allowed us to augment the data by adding various natural objects in the background. We also used some regular portraits, supplementing them with insignia and other wartime attributes.

We trained AlbuNet-50, a Unet that uses a pre-trained ResNet-50 as the encoder. The net started to give adequate results: the skin was pink, the eyes gray-blue, the shoulder boards yellowish. However, the problem was that it left some areas of the photo untouched. This happened because, under the L1 loss, it's often optimal to do nothing rather than try to predict a color and get it wrong.

Manual colorization by Klimbim

How can we solve this problem? We need a discriminator: a neural network that receives an image and tells us whether it looks realistic. One of the pictures below was colored manually and the other by our generator, AlbuNet-50. How does a human distinguish a manually colorized photo from an automatic one? By looking at the details. Can you tell which photo was colorized automatically by our baseline solution?

Answer: the picture on the left was colored manually, the one on the right automatically.

We use the discriminator from the Self-Attention GAN paper. It's a small convolutional net with so-called self-attention built into the top layers, which lets it «pay more attention» to image details. We also use spectral normalization; you can find more details in the paper. We trained the net with a combination of the L1 loss and the loss from the discriminator. Now the net colorizes image details better, and the background looks more consistent. One more example: on the left is the work of a net trained with the L1 loss only; on the right, with a combination of L1 and discriminator losses.
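
The combined objective can be sketched as the per-pixel L1 term plus a weighted adversarial term; the hinge formulation below follows the SAGAN setup, and the weight here is illustrative:

```python
import torch.nn.functional as F

def generator_loss(pred_cbcr, true_cbcr, d_fake_logits, adv_weight=0.01):
    """Per-pixel L1 plus an adversarial term pushing the generator toward realistic colors."""
    l1 = F.l1_loss(pred_cbcr, true_cbcr)
    adversarial = -d_fake_logits.mean()
    return l1 + adv_weight * adversarial

def discriminator_loss(d_real_logits, d_fake_logits):
    """Hinge loss for the (spectrally normalized) discriminator."""
    return F.relu(1.0 - d_real_logits).mean() + F.relu(1.0 + d_fake_logits).mean()
```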

Training took two days on four GeForce 1080Ti cards. It takes the net 30 ms to process a 512 x 512 picture. Validation MSE is 34.4. Just as with inpainting, you don't want to rely on the metrics alone. That's why we picked six models with the best validation metrics and blindly voted for the best one.

After we had already created a production system and launched the website, we continued experimenting and concluded that it's better to minimize not a per-pixel L1 loss but a perceptual loss. To calculate it, we feed the net's predictions and the ground-truth photo to a VGG-16 net, take the feature maps from the lower layers, and compare them with MSE. This approach paints more areas and gives more colorful results.
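
A sketch of such a perceptual loss (the choice of VGG-16 layer here is an example, not necessarily the one used in production):

```python
import torch.nn as nn
import torchvision

class PerceptualLoss(nn.Module):
    """MSE between VGG-16 feature maps of the prediction and of the ground truth."""

    def __init__(self, last_layer=8):  # features[:9] ends at relu2_2
        super().__init__()
        vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features[:last_layer + 1]
        for p in vgg.parameters():
            p.requires_grad = False
        self.vgg = vgg.eval()
        self.mse = nn.MSELoss()

    def forward(self, pred_rgb, target_rgb):
        # Both inputs are expected to be ImageNet-normalized RGB tensors.
        return self.mse(self.vgg(pred_rgb), self.vgg(target_rgb))
```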

Recap

Unet is a pretty cool model. For the first task (segmentation), we had to work with high-resolution images during training, which is why we use In-place BatchNorm. For the second task (inpainting), we used partial convolutions instead of the default ones, which gave us better results. When working on colorization, we added a small discriminator net that penalizes the generator for unrealistic images, and we also used a perceptual loss.

The second conclusion: assessors are essential, and not only at the stage of creating segmentation masks but also for validating the final results. In the end, we give the user three photos: the original image with the defects inpainted, a colorized photo with the defects inpainted, and a simply colorized one, in case the defect search and inpainting algorithm got it wrong.

We took some pictures from the War Album project and processed them with these neural nets. Here are the results we got:

Moreover, here you can take a closer look at the original images and all the processing stages.

Translated from: https://habr.com/en/company/mailru/blog/459696/
