
Open In Colab | Hugging Face Spaces | SD WebUI-Colab | Replicate | Discord

Wenxuan Zhang *,1,2, Xiaodong Cun *,2, Xuan Wang 3, Yong Zhang 2, Xi Shen 2, Yu Guo 1, Ying Shan 2, Fei Wang 1. 1 Xi'an Jiaotong University; 2 Tencent AI Lab; 3 Ant Group. CVPR 2023.

SadTalker

TL;DR:       single portrait image 🙎‍♂️      +       audio 🎤       =       talking head video 🎞.

Highlights

The license has been updated to Apache 2.0, and we've removed the non-commercial restriction.

SadTalker has now officially been integrated into Discord, where you can use it for free by sending files. You can also generate high-quality videos from text prompts. Join: Discord

We've published a stable-diffusion-webui extension. Check out more details here. Demo Video

Full image mode is now available! More details...

Comparison of still + enhancer in v0.0.1 vs. still + enhancer in v0.0.2 (input image by @bagbag1815; demo videos: still_e_n.mp4, full_body_2.bus_chinese_enhanced.mp4).

Several new modes (Still, reference, and resize modes) are now available!

We're happy to see more community demos on bilibili, YouTube and X (#sadtalker).

Changelog

The previous changelog can be found here.

[2023.06.12]: Added more new features to the WebUI extension; see the discussion here.

[2023.06.05]: Released a new 512x512px (beta) face model. Fixed some bugs and improved the performance.

[2023.04.15]: Added a WebUI Colab notebook by @camenduru: sd webui-colab

[2023.04.12]: Added a more detailed WebUI installation document and fixed a problem when reinstalling.

[2023.04.12]: Fixed WebUI security issues caused by third-party packages, and optimized the output path in the sd-webui extension.

[2023.04.08]: In v0.0.2, we added a logo watermark to the generated video to prevent abuse. This watermark has since been removed in a later release.

[2023.04.08]: In v0.0.2, we added features for full image animation and a link to download checkpoints from Baidu. We also optimized the enhancer logic.

To-Do

We're tracking new updates in issue #280.

Troubleshooting

If you have any problems, please read our FAQs before opening an issue.

1. Installation.

Community tutorials: 中文Windows教程 (Chinese Windows tutorial) | 日本語コース (Japanese tutorial).

Linux/Unix

Install Anaconda, Python and git.

Create the environment and install the requirements:

```bash
git clone https://github.com/OpenTalker/SadTalker.git

cd SadTalker

conda create -n sadtalker python=3.8

conda activate sadtalker

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

conda install ffmpeg

pip install -r requirements.txt

### Coqui TTS is optional for gradio demo.
### pip install TTS
```
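After installation, you can quickly confirm that the CUDA build of PyTorch is visible from the new environment. This is a minimal sanity-check sketch; the expected version string follows from the pip command above.

```python
# Run inside the activated `sadtalker` conda environment.
import torch

print(torch.__version__)          # expected: 1.12.1+cu113, per the install command above
print(torch.cuda.is_available())  # True if PyTorch can see a CUDA-capable GPU
```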

Windows

A video tutorial in Chinese is available here. You can also follow these instructions:

1. Install Python 3.8 and check "Add Python to PATH".
2. Install git manually, or with Scoop: scoop install git.
3. Install ffmpeg, following this tutorial or with Scoop: scoop install ffmpeg.
4. Download the SadTalker repository by running git clone https://github.com/Winfredy/SadTalker.git.
5. Download the checkpoints and gfpgan models from the downloads section.
6. Run start.bat from Windows Explorer as a normal, non-administrator user; a Gradio-powered WebUI demo will start.

macOS

A tutorial on installing SadTalker on macOS can be found here.

Docker, WSL, etc.

Please check out additional tutorials here.

2. Download Models

You can run the following script on Linux/macOS to automatically download all the models:

```bash
bash scripts/download_models.sh
```

We also provide an offline patch (gfpgan/), so no model will be downloaded when generating.
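To confirm the download succeeded, a minimal sketch along these lines checks that the new-version files from the model table below are in place (run it from the repository root):

```python
# Check that the new-version SadTalker checkpoints exist after downloading.
# File names are taken from the model table below.
from pathlib import Path

expected = [
    "checkpoints/mapping_00229-model.pth.tar",
    "checkpoints/mapping_00109-model.pth.tar",
    "checkpoints/SadTalker_V0.0.2_256.safetensors",
    "checkpoints/SadTalker_V0.0.2_512.safetensors",
]
missing = [p for p in expected if not Path(p).exists()]
print("All new-version models present." if not missing else f"Missing: {missing}")
```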

| Model | Download |
|-------|----------|
| Pre-Trained Models | Google Drive / GitHub Releases / Baidu (百度云盘) (Password: sadt) |
| GFPGAN Offline Patch | Google Drive / GitHub Releases / Baidu (百度云盘) (Password: sadt) |

Model Details

Model descriptions:

New version

| Model | Description |
|-------|-------------|
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/SadTalker_V0.0.2_256.safetensors | Packaged SadTalker checkpoints of the old version (256 face render). |
| checkpoints/SadTalker_V0.0.2_512.safetensors | Packaged SadTalker checkpoints of the old version (512 face render). |
| gfpgan/weights | Face detection and enhancement models used in facexlib and gfpgan. |

Old version

| Model | Description |
|-------|-------------|
| checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
| checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from the unofficial reproduction of face-vid2vid. |
| checkpoints/epoch_20.pth | Pre-trained 3DMM extractor from Deep3DFaceReconstruction. |
| checkpoints/wav2lip.pth | Highly accurate lip-sync model from Wav2Lip. |
| checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in dlib. |
| checkpoints/BFM | 3DMM library files. |
| checkpoints/hub | Face detection models used in face-alignment. |
| gfpgan/weights | Face detection and enhancement models used in facexlib and gfpgan. |

The final folder will be shown as:
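For the new-version models, the layout should look roughly like this (a sketch reconstructed from the model table above):

```
SadTalker/
├── checkpoints/
│   ├── mapping_00109-model.pth.tar
│   ├── mapping_00229-model.pth.tar
│   ├── SadTalker_V0.0.2_256.safetensors
│   └── SadTalker_V0.0.2_512.safetensors
└── gfpgan/
    └── weights/
```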

3. Quick Start

Please read our documentation on best practices and configuration tips.

WebUI Demos

Online Demo: HuggingFace | SDWebUI-Colab | Colab

Local WebUI extension: Please refer to WebUI docs.

Local gradio demo (recommended): A Gradio instance similar to our Hugging Face demo can be run locally:

```bash
## You need to manually install TTS (https://github.com/coqui-ai/TTS) via `pip install TTS` in advance.
python app_sadtalker.py
```

You can also start it more easily:

Windows: just double-click webui.bat; the requirements will be installed automatically.

Linux/macOS: run bash webui.sh to start the WebUI.

CLI Usage

Animating a portrait image with the default config:

```bash
python inference.py --driven_audio <audio.wav> \
                    --source_image <portrait.png> \
                    --enhancer gfpgan
```

Here `<audio.wav>` and `<portrait.png>` are placeholders for your own input files.

The results will be saved in results/$SOME_TIMESTAMP/*.mp4.
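To script several runs, a minimal sketch (with hypothetical example paths) can wrap the CLI shown above:

```python
# Minimal sketch: batch-run SadTalker's CLI over several portrait images.
# The file paths below are hypothetical examples; the flags match the command above.
import subprocess

audio = "my_audio.wav"                    # hypothetical driving audio
portraits = ["face_1.png", "face_2.png"]  # hypothetical source images

for image in portraits:
    subprocess.run(
        ["python", "inference.py",
         "--driven_audio", audio,
         "--source_image", image,
         "--enhancer", "gfpgan"],
        check=True,  # raise if inference.py exits with a nonzero status
    )
```

Each run writes its output to a fresh results/$SOME_TIMESTAMP/ directory, as noted above.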

Full body/image Generation:

Use --still to generate a natural full-body video. You can add the enhancer to improve the quality of the generated video.

```bash
python inference.py --driven_audio <audio.wav> \
                    --source_image <full_body_image.png> \
                    --result_dir <results_dir> \
                    --still \
                    --preprocess full \
                    --enhancer gfpgan
```

More examples, configuration options, and tips can be found in the best-practices documentation.


