Reproducing an Open-Source Project

2024-02-28 01:47 | Source: compiled from the web

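Contents

DreamTalk
1. Installation
2. Download the checkpoints
3. Run inference
4. Errors you may hit when running the inference code
    4.1 Cannot connect to huggingface
    4.2 No audio I/O backend is available
    4.3 Files for jonatasgrosman/wav2vec2-large-xlsr-53-english not found
5. Check the results
6. Preliminary test results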

DreamTalk

Project summary: makes a static portrait talk and sing.

Project URL: https://github.com/ali-vilab/dreamtalk

1. Installation

conda create -n dreamtalk python=3.7.0
conda activate dreamtalk
pip install -r requirements.txt
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
conda update ffmpeg
pip install urllib3==1.26.6
pip install transformers==4.28.1
pip install dlib

2. Download the checkpoints

1) Go to https://modelscope.cn/models/damo/dreamtalk/files
2) Click the checkpoints folder. It contains two files; click each one to open its detail page and download it.
3) Put both files into the local checkpoints folder.
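Before running inference, it can save time to confirm that the packages from the installation step are importable and that the checkpoints folder is populated. The following is a minimal sketch using only the standard library. The helper names `missing_modules` and `checkpoint_files` are hypothetical, and the module list is an assumption derived from the install commands above, not a list the project publishes.

```python
import importlib.util
from pathlib import Path

# Assumed top-level module names for the packages installed above.
REQUIRED_MODULES = ["torch", "torchvision", "torchaudio", "transformers", "dlib"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def checkpoint_files(folder="checkpoints"):
    """Return the sorted names of regular files directly inside `folder`."""
    path = Path(folder)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir() if p.is_file())


if __name__ == "__main__":
    # Run from the project root so that "checkpoints" resolves correctly.
    print("missing modules:", missing_modules(REQUIRED_MODULES))
    print("checkpoint files:", checkpoint_files())
```

If the first list is empty and the second contains the two downloaded files, the environment is probably ready for the inference step.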

3. Run inference

Official example (English)

python inference_for_demo_video.py --wav_path data/audio/acknowledgement_english.m4a --style_clip_path data/style_clip/3DMM/M030_front_neutral_level1_001.mat --pose_path data/pose/RichardShelby_front_neutral_level1_001.mat --image_path data/src_img/uncropped/male_face.png --cfg_scale 1.0 --max_gen_len 30 --output_name acknowledgement_english@M030_front_neutral_level1_001@male_face
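When trying many audio, style, and image combinations, typing this long command by hand gets error-prone. The sketch below assembles the same argument list in Python. It uses only flags that appear in the official examples in this post (including `--disable_img_crop` from the Chinese example); the function name `build_command` is hypothetical and is not part of the DreamTalk repository.

```python
import sys


def build_command(wav_path, style_clip_path, pose_path, image_path, output_name,
                  cfg_scale=1.0, max_gen_len=30, disable_img_crop=False):
    """Build the argument list for inference_for_demo_video.py."""
    cmd = [
        sys.executable, "inference_for_demo_video.py",
        "--wav_path", wav_path,
        "--style_clip_path", style_clip_path,
        "--pose_path", pose_path,
        "--image_path", image_path,
        "--cfg_scale", str(cfg_scale),
        "--max_gen_len", str(max_gen_len),
        "--output_name", output_name,
    ]
    if disable_img_crop:
        # Only pass this flag for images that are already cropped.
        cmd.append("--disable_img_crop")
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "data/audio/acknowledgement_english.m4a",
        "data/style_clip/3DMM/M030_front_neutral_level1_001.mat",
        "data/pose/RichardShelby_front_neutral_level1_001.mat",
        "data/src_img/uncropped/male_face.png",
        "acknowledgement_english@M030_front_neutral_level1_001@male_face",
    )
    print(" ".join(cmd))
```

To actually launch the run, the list can be passed to `subprocess.run(cmd, check=True)` from the project root, inside the activated dreamtalk environment.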

Official example (Chinese)

python inference_for_demo_video.py --wav_path data/audio/acknowledgement_chinese.m4a --style_clip_path data/style_clip/3DMM/M030_front_surprised_level3_001.mat --pose_path data/pose/RichardShelby_front_neutral_level1_001.mat --image_path data/src_img/cropped/zp1.png --disable_img_crop --cfg_scale 1.0 --max_gen_len 30 --output_name acknowledgement_english@M030_front_surprised_level3_001@zp1

4. Errors you may hit when running the inference code

4.1 Cannot connect to huggingface

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like jonatasgrosman/wav2vec2-large-xlsr-53-english is not the path to a directory containing a file named preprocessor_config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

Solution: connect through a VPN so that huggingface.co is reachable, then run the command again.
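To tell quickly whether the VPN is actually in effect before starting a long inference run, a small reachability probe may help. This is a sketch under assumptions: it only tests that a TCP connection to port 443 can be opened, which does not guarantee that the model download itself will succeed, and the function name `can_reach` is hypothetical.

```python
import socket


def can_reach(host="huggingface.co", port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    print("huggingface.co reachable:", can_reach())
```

If this prints False with the VPN on, the problem is more likely the network route than the DreamTalk code.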

4.2 No audio I/O backend is available

RuntimeError: No audio I/O backend is available.

Solution: the error means no audio I/O backend is installed, so install one. On Windows, installing soundfile is enough.

Reference: https://github.com/ali-vilab/dreamtalk/issues/2

pip install soundfile (Windows)
pip install sox (Linux)
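The per-platform advice can be turned into a small check that prints the install command only when the package is missing. This is a sketch: the mapping from operating system to package simply restates the advice quoted from the linked issue, the helper names are hypothetical, and systems the post does not mention get no suggestion.

```python
import importlib.util
import platform

# Package suggested for each operating system in the linked issue.
PACKAGE_BY_SYSTEM = {"Windows": "soundfile", "Linux": "sox"}


def _module_found(name):
    """Return True if a module called `name` can be located."""
    return importlib.util.find_spec(name) is not None


def audio_fix_hint(system=None, is_installed=_module_found):
    """Return a pip command if the suggested package is missing, else None."""
    system = system or platform.system()
    package = PACKAGE_BY_SYSTEM.get(system)
    if package is None:
        return None  # the post gives no advice for this system
    if is_installed(package):
        return None  # nothing to install
    return "pip install " + package


if __name__ == "__main__":
    print(audio_fix_hint() or "no action suggested")
```

The `is_installed` parameter exists only so the logic can be exercised without touching the real environment.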

4.3 Files for jonatasgrosman/wav2vec2-large-xlsr-53-english not found

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like jonatasgrosman/wav2vec2-large-xlsr-53-english is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

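Solution:
1) Go to https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english/tree/main
2) Download the four files shown in the original post's screenshot.
3) In the project root, create the folder "jonatasgrosman/wav2vec2-large-xlsr-53-english" and put the four files in it.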
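This workaround relies on the behaviour hinted at in the error message: the model identifier is also tried as a path to a local directory. The path is relative, so the command has to be run from the project root. The sketch below checks that the folder exists and contains the two file names that the error messages in this post mention; the other downloaded files are not named in the text, so they are deliberately not checked. The helper name `missing_model_files` is hypothetical.

```python
from pathlib import Path

# Relative folder that mirrors the model identifier.
MODEL_DIR = Path("jonatasgrosman") / "wav2vec2-large-xlsr-53-english"

# Only the file names quoted in the two OSError messages in this post.
NAMED_FILES = ["config.json", "preprocessor_config.json"]


def missing_model_files(project_root="."):
    """Return the named files that are absent from the local model folder."""
    folder = Path(project_root) / MODEL_DIR
    return [name for name in NAMED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    print("missing:", missing if missing else "none")
```

An empty result does not prove the folder is complete; it only rules out the two failures shown above.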

5. Check the results

If the last lines of the output look like this, the run succeeded:

...
video:183kB audio:142kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 3.590172%
[libx264 @ 000002851b21c840] frame I:2 Avg QP:18.42 size: 6248
[libx264 @ 000002851b21c840] frame P:111 Avg QP:22.18 size: 1185
[libx264 @ 000002851b21c840] frame B:301 Avg QP:25.02 size: 143
[libx264 @ 000002851b21c840] consecutive B-frames: 0.5% 7.2% 1.4% 90.8%
[libx264 @ 000002851b21c840] mb I I16..4: 6.1% 67.0% 27.0%
[libx264 @ 000002851b21c840] mb P I16..4: 0.0% 0.3% 0.0% P16..4: 38.7% 20.8% 9.5% 0.0% 0.0% skip:30.7%
[libx264 @ 000002851b21c840] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 30.0% 1.6% 0.2% direct: 0.2% skip:68.0% L0:40.6% L1:55.1% BI: 4.3%
[libx264 @ 000002851b21c840] 8x8 transform intra:68.3% inter:67.6%
[libx264 @ 000002851b21c840] coded y,uvDC,uvAC intra: 77.6% 78.6% 48.3% inter: 6.8% 4.4% 0.1%
[libx264 @ 000002851b21c840] i16 v,h,dc,p: 44% 11% 31% 13%
[libx264 @ 000002851b21c840] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 23% 21% 18% 4% 7% 8% 6% 7% 6%
[libx264 @ 000002851b21c840] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 26% 21% 10% 6% 8% 7% 8% 6% 8%
[libx264 @ 000002851b21c840] i8c dc,h,v,p: 47% 20% 23% 10%
[libx264 @ 000002851b21c840] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 000002851b21c840] ref P L0: 59.0% 17.7% 16.2% 7.1%
[libx264 @ 000002851b21c840] ref B L0: 88.6% 8.1% 3.3%
[libx264 @ 000002851b21c840] ref B L1: 97.3% 2.7%
[libx264 @ 000002851b21c840] kb/s:90.41
[aac @ 000002851b2b4380] Qavg: 52903.660

The generated video is in the output_video folder under the project root.
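After several runs the output_video folder fills up, and finding the latest result by eye becomes tedious. A minimal sketch for picking the most recently modified file follows. It makes no assumption about the file extension, and the helper name `newest_file` is hypothetical.

```python
from pathlib import Path


def newest_file(folder="output_video"):
    """Return the most recently modified regular file in `folder`, or None."""
    path = Path(folder)
    if not path.is_dir():
        return None
    files = [p for p in path.iterdir() if p.is_file()]
    # `default=None` covers an existing but empty folder.
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    # Run from the project root so that "output_video" resolves correctly.
    print("latest result:", newest_file())
```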

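6. Preliminary test results

I tried several inputs, including the official examples, and the results were not very good. The input is expected to be a 256x256 frontal portrait; other sizes also work, because the script crops the image for you. A half-body photo failed with a face-not-found error, which is easy to avoid by not using half-body shots. The bigger problem is that with a profile photo, and even with a frontal head shot, the face comes out distorted and blurred and no longer resembles the photo; with my test photo the result looked like a completely different person. Lip sync is sometimes right and sometimes wrong. There is still a long way to go.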
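Because the expected input is a 256x256 portrait, checking an image's size before inference can help decide whether it still needs cropping. The sketch below reads the width and height from a PNG header using only the standard library. The helper name `png_size` is hypothetical, it handles PNG files only, and treating "already 256x256" as the cue for passing --disable_img_crop is an assumption drawn from the cropped official example, not a documented rule.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Bytes 0-7: signature, 8-11: chunk length, 12-15: chunk type "IHDR".
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a valid PNG file: %s" % path)
    # Bytes 16-23: width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        size = png_size(sys.argv[1])
        print(size, "already 256x256" if size == (256, 256) else "will need cropping")
```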
