Bert-vits2-v2.2: local training and inference bundle for the new release (English voice model "miko" of Genshin Impact's Yae Miko)

by Liu Yue/2023-12-18

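    Bert-vits2-v2.2 was recently released on schedule. The main change in v2.2 is that the emotion model has been replaced by the CLAP multimodal model: inference now accepts a text prompt and an audio prompt to guide stylized synthesis, which gives the synthesized voice more emotional character. The release also introduces a new preprocessing WebUI that makes the workflow more approachable.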

    For more details, see the Bert-vits2 release page:

https://github.com/fishaudio/Bert-VITS2/releases/tag/v2.2

    Meanwhile, the FastAPI-based inference web UI project has also been adapted to Bert-vits2-v2.2. Its repository is:

https://github.com/jiangyuxiaoxiao/Bert-VITS2-UI

    In this post we use these two projects to clone miko, an English voice model of the Genshin Impact character Yae Miko.

    Bert-vits2-v2.2: new base model and emotion model

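    First clone the official Bert-vits2-v2.2 project:

git clone -b v2.2 https://github.com/fishaudio/Bert-VITS2.git

    Note that this checks out the v2.2 tag: the official repository is updated constantly, and the main branch may contain bugs.

    Enter the project directory:

cd Bert-VITS2

    Install the dependencies:

pip3 install -r requirements.txt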

    Then download the new base model and emotion model from:

https://openi.pcl.ac.cn/Stardust_minus/Bert-VITS2/modelmanage/show_model

    

    Place the new emotion model clap-htsat-fused in the project's emotional directory. The layout is as follows:

E:\work\Bert-VITS2-v22\emotional>tree /f
Folder PATH listing for volume myssd
Volume serial number is 7CE3-15AE
E:.
├───clap-htsat-fused
│       .gitattributes
│       config.json
│       merges.txt
│       preprocessor_config.json
│       pytorch_model.bin
│       README.md
│       special_tokens_map.json
│       tokenizer.json
│       tokenizer_config.json
│       vocab.json
│
└───wav2vec2-large-robust-12-ft-emotion-msp-dim
        .gitattributes
        config.json
        LICENSE
        preprocessor_config.json
        pytorch_model.bin
        README.md
        vocab.json

    Note that wav2vec2-large-robust-12-ft-emotion-msp-dim is the emotion model from Bert-vits2-v2.1 and must also be kept. For details, see the earlier post "义无反顾马督工,Bert-vits2V210复刻马督工实践(Python3.10)"; they are not repeated here.

    The new models are now in place.
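
    Before moving on, it can help to confirm that the model files really are where the listing above shows them. The following is a minimal sketch, not part of Bert-VITS2: the function name missing_model_files and the choice of which files to check are assumptions made here, taken from the file names in the listing above.

```python
from pathlib import Path

# File names taken from the directory listing above; which ones are
# strictly required by Bert-VITS2 is an assumption of this sketch.
EXPECTED = {
    "clap-htsat-fused": [
        "config.json", "pytorch_model.bin", "preprocessor_config.json",
        "tokenizer.json", "vocab.json", "merges.txt",
    ],
    "wav2vec2-large-robust-12-ft-emotion-msp-dim": [
        "config.json", "pytorch_model.bin", "preprocessor_config.json",
        "vocab.json",
    ],
}


def missing_model_files(emotional_dir, expected=EXPECTED):
    """Return 'model/file' strings for every expected file that is absent."""
    root = Path(emotional_dir)
    missing = []
    for model, files in expected.items():
        for name in files:
            if not (root / model / name).is_file():
                missing.append(f"{model}/{name}")
    return missing


if __name__ == "__main__":
    # Run from the project root, where the emotional directory lives.
    for item in missing_model_files("emotional"):
        print("missing:", item)
```

    An empty result means every file named in the sketch was found; it does not verify that the downloads are complete or uncorrupted.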

    Bert-vits2-v2.2 model training

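    First download the training set. We use the English voice lines of the Genshin Impact character Yae Miko as the example; the dataset is available at: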

https://github.com/AI-Hobbyist/Genshin_Datasets

    Then create the miko character directory:

mkdir miko

    Name the speech annotation file esd.list and place it in the miko directory.

    Also place the segmented audio clips in the raw directory.
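
    The post does not show what esd.list looks like. The sketch below assumes the pipe-separated layout "wav|speaker|language|text" with the language code EN; this layout is an assumption here and should be checked against the annotation file shipped with the dataset. The clip names and transcripts are hypothetical placeholders.

```python
def esd_line(wav_name, speaker, language, text):
    """Format one annotation line as 'wav|speaker|language|text' (assumed layout)."""
    if "|" in text or "\n" in text:
        raise ValueError("text must not contain '|' or a newline")
    return f"{wav_name}|{speaker}|{language}|{text.strip()}"


def build_esd_list(entries, speaker="miko", language="EN"):
    """Build the file content from an iterable of (wav_name, transcript) pairs."""
    return "\n".join(esd_line(w, speaker, language, t) for w, t in entries) + "\n"


if __name__ == "__main__":
    # Hypothetical clip names and transcripts, for illustration only.
    sample = [
        ("miko_0001.wav", "Hello there."),
        ("miko_0002.wav", "Care for some tea?"),
    ]
    print(build_esd_list(sample), end="")
```

    Rejecting the separator character inside transcripts keeps every line at exactly four fields, which is what a pipe-split parser would expect.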

    Finally, create the configuration file miko/configs/config.json:

{
  "train": {
    "log_interval": 50,
    "eval_interval": 50,
    "seed": 42,
    "epochs": 1000,
    "learning_rate": 0.0002,
    "betas": [0.8, 0.99],
    "eps": 1e-09,
    "batch_size": 6,
    "fp16_run": false,
    "lr_decay": 0.99995,
    "segment_size": 16384,
    "init_lr_ratio": 1,
    "warmup_epochs": 0,
    "c_mel": 45,
    "c_kl": 1.0,
    "skip_optimizer": false,
    "freeze_ZH_bert": false,
    "freeze_JP_bert": false,
    "freeze_EN_bert": false
  },
  "data": {
    "training_files": "data/miko/train.list",
    "validation_files": "data/miko/val.list",
    "max_wav_value": 32768.0,
    "sampling_rate": 44100,
    "filter_length": 2048,
    "hop_length": 512,
    "win_length": 2048,
    "n_mel_channels": 128,
    "mel_fmin": 0.0,
    "mel_fmax": null,
    "add_blank": true,
    "n_speakers": 1,
    "cleaned_text": true,
    "spk2id": {
      "miko": 0
    }
  },
  "model": {
    "use_spk_conditioned_encoder": true,
    "use_noise_scaled_mas": true,
    "use_mel_posterior_encoder": false,
    "use_duration_discriminator": true,
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "kernel_size": 3,
    "p_dropout": 0.1,
    "resblock": "1",
    "resblock_kernel_sizes": [3, 7, 11],
    "resblock_dilation_sizes": [
      [1, 3, 5],
      [1, 3, 5],
      [1, 3, 5]
    ],
    "upsample_rates": [8, 8, 2, 2, 2],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [16, 16, 8, 2, 2],
    "n_layers_q": 3,
    "use_spectral_norm": false,
    "gin_channels": 256
  },
  "version": "2.2"
}

    Note the "version": "2.2" entry: the version number must be the latest, v2.2.

    Adjust the other parameters to suit your hardware; for example, lower batch_size if GPU memory is limited.
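
    A few of these fields are easy to get wrong when editing by hand. The following is a small sketch of a sanity check, assuming only what the configuration above shows: a top-level "version" string and a "data" section with n_speakers and spk2id. The function name check_config and the specific rules are choices made here, not checks performed by Bert-VITS2 itself.

```python
import json


def check_config(cfg):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    if cfg.get("version") != "2.2":
        problems.append(f"version is {cfg.get('version')!r}, expected '2.2'")
    data = cfg.get("data", {})
    spk2id = data.get("spk2id", {})
    n_speakers = data.get("n_speakers", 0)
    if not spk2id:
        problems.append("spk2id is empty")
    # Every speaker id must be a valid index below n_speakers.
    for name, idx in spk2id.items():
        if not 0 <= idx < n_speakers:
            problems.append(f"speaker {name!r} has id {idx}, outside 0..{n_speakers - 1}")
    return problems


# The relevant fields from the configuration shown above.
SAMPLE = json.loads(
    '{"data": {"n_speakers": 1, "spk2id": {"miko": 0}}, "version": "2.2"}'
)

if __name__ == "__main__":
    print(check_config(SAMPLE) or "config looks consistent")
```

    To check the real file, load it with json.load and pass the resulting dict to check_config.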

    Then start the preprocessing page:

python3 webui_preprocess.py

    Open http://127.0.0.1:7860/ in a browser.


    Follow the steps on the page; the process is simple and convenient.

    When preprocessing is done, run the training command:

python3 train_ms.py

    The trained models are saved in the data/miko/models directory, laid out as follows:

E:\work\Bert-VITS2-v22\Data\miko\models>tree /f
Folder PATH listing for volume myssd
Volume serial number is 7CE3-15AE
E:.
│   DUR_0.pth
│   DUR_100.pth
│   DUR_150.pth
│   DUR_50.pth
│   D_0.pth
│   D_100.pth
│   D_150.pth
│   D_50.pth
│   events.out.tfevents.1702457087.ly.13044.0
│   events.out.tfevents.1702458207.ly.12416.0
│   githash
│   G_0.pth
│   G_100.pth
│   G_150.pth
│   G_50.pth
│   train.log
│
└───eval
        events.out.tfevents.1702457087.ly.13044.1
        events.out.tfevents.1702458207.ly.12416.1

    This completes the training stage.
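
    In the listing above, DUR_50.pth appears after DUR_150.pth because the names are sorted as text, so "the last file" is not necessarily the newest checkpoint. The following minimal sketch picks the checkpoint with the largest step number; the helper name latest_checkpoint is introduced here for illustration, and the G_/D_/DUR_ naming pattern is taken from the listing.

```python
import re


def latest_checkpoint(filenames, prefix="G"):
    """Return the '<prefix>_<step>.pth' name with the largest step, or None."""
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)\.pth$")
    best_step, best_name = -1, None
    for name in filenames:
        match = pattern.match(name)
        # Compare steps as integers, not as strings.
        if match and int(match.group(1)) > best_step:
            best_step, best_name = int(match.group(1)), name
    return best_name


if __name__ == "__main__":
    names = [
        "DUR_0.pth", "DUR_100.pth", "DUR_150.pth", "DUR_50.pth",
        "D_0.pth", "D_150.pth",
        "G_0.pth", "G_100.pth", "G_150.pth", "G_50.pth", "train.log",
    ]
    print(latest_checkpoint(names))
```

    On a real run, pass os.listdir of the models directory as filenames; the generator checkpoint (prefix G) is the one to load for inference.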

    Bert-vits2-v2.2 model inference

    For inference we use the page from the Bert-vits2-UI project. Clone the web project:

git clone https://github.com/jiangyuxiaoxiao/Bert-VITS2-UI

    Place the Web project in the root directory of Bert-vits2-v2.2. The directory structure is as follows:

E:\work\Bert-VITS2-v22_lilith\Web>tree /f
Folder PATH listing for volume myssd
Volume serial number is 7CE3-15AE
E:.
│   index.html
│
├───assets
│       index-21bc6a28.css
│       index-402c0217.js
│
└───img
        helps1.png
        helps2.png
        Hiyori.ico

    It contains the main page, the stylesheet, and the JS file, and is based on Hiyori.

    Then start the inference page:

python3 server_fastapi.py

    Open http://127.0.0.1:5000/ in a browser.


    Load the model and run inference.

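    Inference is also available through the FastAPI interface: in other words, sending an HTTP request returns the synthesized audio. The interface is described by the following OpenAPI document: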

{
"openapi": "3.1.0",
"info": {
"title": "FastAPI",
"version": "0.1.0"
},
"paths": {
"/": {
"get": {
"summary": "Index",
"operationId": "index__get",
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
}
}
}
},
"/voice": {
"post": {
"summary": "Voice",
"description": "语音接口,若需要上传参考音频请仅使用post请求",
"operationId": "voice_voice_post",
"parameters": [
{
"name": "model_id",
"in": "query",
"required": true,
"schema": {
"type": "integer",
"description": "模型ID",
"title": "Model Id"
},
"description": "模型ID"
},
{
"name": "speaker_name",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "说话人名",
"title": "Speaker Name"
},
"description": "说话人名"
},
{
"name": "speaker_id",
"in": "query",
"required": false,
"schema": {
"type": "integer",
"description": "说话人id,与speaker_name二选一",
"title": "Speaker Id"
},
"description": "说话人id,与speaker_name二选一"
},
{
"name": "sdp_ratio",
"in": "query",
"required": false,
"schema": {
"type": "number",
"description": "SDP/DP混合比",
"default": 0.2,
"title": "Sdp Ratio"
},
"description": "SDP/DP混合比"
},
{
"name": "noise",
"in": "query",
"required": false,
"schema": {
"type": "number",
"description": "感情",
"default": 0.2,
"title": "Noise"
},
"description": "感情"
},
{
"name": "noisew",
"in": "query",
"required": false,
"schema": {
"type": "number",
"description": "音素长度",
"default": 0.9,
"title": "Noisew"
},
"description": "音素长度"
},
{
"name": "length",
"in": "query",
"required": false,
"schema": {
"type": "number",
"description": "语速",
"default": 1,
"title": "Length"
},
"description": "语速"
},
{
"name": "language",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "语言",
"title": "Language"
},
"description": "语言"
},
{
"name": "auto_translate",
"in": "query",
"required": false,
"schema": {
"type": "boolean",
"description": "自动翻译",
"default": false,
"title": "Auto Translate"
},
"description": "自动翻译"
},
{
"name": "auto_split",
"in": "query",
"required": false,
"schema": {
"type": "boolean",
"description": "自动切分",
"default": false,
"title": "Auto Split"
},
"description": "自动切分"
},
{
"name": "emotion",
"in": "query",
"required": false,
"schema": {
"anyOf": [
{
"type": "integer"
},
{
"type": "string"
},
{
"type": "null"
}
],
"description": "emo",
"title": "Emotion"
},
"description": "emo"
}
],
"requestBody": {
"required": true,
"content": {
"multipart/form-data": {
"schema": {
"$ref": "#/components/schemas/Body_voice_voice_post"
}
}
}
},
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
},
"get": {
"summary": "Voice",
"description": "语音接口",
"operationId": "voice_voice_get",
"parameters": [
{
"name": "text",
"in": "query",
"required": true,
"schema": {
"type": "string",
"description": "输入文字",
"title": "Text"
},
"description": "输入文字"
},
{
"name": "model_id",
"in": "query",
"required": true,
"schema": {
"type": "integer",
"description": "模型ID",
"title": "Model Id"
},
"description": "模型ID"
},
{
"name": "speaker_name",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "说话人名",
"title": "Speaker Name"
},
"description": "说话人名"
},
{
"name": "speaker_id",
"in": "query",
"required": false,
"schema": {
"type": "integer",
"description": "说话人id,与speaker_name二选一",
"title": "Speaker Id"
},
"description": "说话人id,与speaker_name二选一"
},
{
"name": "sdp_ratio",
"in": "query",
"required": false,
"schema": {
"type": "number",
"description": "SDP/DP混合比",
"default": 0.2,
"title": "Sdp Ratio"
},
"description": "SDP/DP混合比"
},
{
"name": "noise",
"in": "query",
"required": false,
"schema": {
"type": "number",
"description": "感情",
"default": 0.2,
"title": "Noise"
},
"description": "感情"
},
{
"name": "noisew",
"in": "query",
"required": false,
"schema": {
"type": "number",
"description": "音素长度",
"default": 0.9,
"title": "Noisew"
},
"description": "音素长度"
},
{
"name": "length",
"in": "query",
"required": false,
"schema": {
"type": "number",
"description": "语速",
"default": 1,
"title": "Length"
},
"description": "语速"
},
{
"name": "language",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "语言",
"title": "Language"
},
"description": "语言"
},
{
"name": "auto_translate",
"in": "query",
"required": false,
"schema": {
"type": "boolean",
"description": "自动翻译",
"default": false,
"title": "Auto Translate"
},
"description": "自动翻译"
},
{
"name": "auto_split",
"in": "query",
"required": false,
"schema": {
"type": "boolean",
"description": "自动切分",
"default": false,
"title": "Auto Split"
},
"description": "自动切分"
},
{
"name": "emotion",
"in": "query",
"required": false,
"schema": {
"anyOf": [
{
"type": "integer"
},
{
"type": "string"
},
{
"type": "null"
}
],
"description": "emo",
"title": "Emotion"
},
"description": "emo"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/models/info": {
"get": {
"summary": "Get Loaded Models Info",
"description": "获取已加载模型信息",
"operationId": "get_loaded_models_info_models_info_get",
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
}
}
}
},
"/models/delete": {
"get": {
"summary": "Delete Model",
"description": "删除指定模型",
"operationId": "delete_model_models_delete_get",
"parameters": [
{
"name": "model_id",
"in": "query",
"required": true,
"schema": {
"type": "integer",
"description": "删除模型id",
"title": "Model Id"
},
"description": "删除模型id"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/models/add": {
"get": {
"summary": "Add Model",
"description": "添加指定模型:允许重复添加相同路径模型,且不重复占用内存",
"operationId": "add_model_models_add_get",
"parameters": [
{
"name": "model_path",
"in": "query",
"required": true,
"schema": {
"type": "string",
"description": "添加模型路径",
"title": "Model Path"
},
"description": "添加模型路径"
},
{
"name": "config_path",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "添加模型配置文件路径,不填则使用./config.json或../config.json",
"title": "Config Path"
},
"description": "添加模型配置文件路径,不填则使用./config.json或../config.json"
},
{
"name": "device",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "推理使用设备",
"default": "cuda",
"title": "Device"
},
"description": "推理使用设备"
},
{
"name": "language",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "模型默认语言",
"default": "ZH",
"title": "Language"
},
"description": "模型默认语言"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/models/get_unloaded": {
"get": {
"summary": "Get Unloaded Models Info",
"description": "获取未加载模型",
"operationId": "get_unloaded_models_info_models_get_unloaded_get",
"parameters": [
{
"name": "root_dir",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "搜索根目录",
"default": "Data",
"title": "Root Dir"
},
"description": "搜索根目录"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/models/get_local": {
"get": {
"summary": "Get Local Models Info",
"description": "获取全部本地模型",
"operationId": "get_local_models_info_models_get_local_get",
"parameters": [
{
"name": "root_dir",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "搜索根目录",
"default": "Data",
"title": "Root Dir"
},
"description": "搜索根目录"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/status": {
"get": {
"summary": "Get Status",
"description": "获取电脑运行状态",
"operationId": "get_status_status_get",
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
}
}
}
},
"/tools/translate": {
"get": {
"summary": "Translate",
"description": "翻译",
"operationId": "translate_tools_translate_get",
"parameters": [
{
"name": "texts",
"in": "query",
"required": true,
"schema": {
"type": "string",
"description": "待翻译文本",
"title": "Texts"
},
"description": "待翻译文本"
},
{
"name": "to_language",
"in": "query",
"required": true,
"schema": {
"type": "string",
"description": "翻译目标语言",
"title": "To Language"
},
"description": "翻译目标语言"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/tools/random_example": {
"get": {
"summary": "Random Example",
"description": "获取一个随机音频+文本,用于对比,音频会从本地目录随机选择。",
"operationId": "random_example_tools_random_example_get",
"parameters": [
{
"name": "language",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "指定语言,未指定则随机返回",
"title": "Language"
},
"description": "指定语言,未指定则随机返回"
},
{
"name": "root_dir",
"in": "query",
"required": false,
"schema": {
"type": "string",
"description": "搜索根目录",
"default": "Data",
"title": "Root Dir"
},
"description": "搜索根目录"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
},
"/tools/get_audio": {
"get": {
"summary": "Get Audio",
"operationId": "get_audio_tools_get_audio_get",
"parameters": [
{
"name": "path",
"in": "query",
"required": true,
"schema": {
"type": "string",
"description": "本地音频路径",
"title": "Path"
},
"description": "本地音频路径"
}
],
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
}
}
}
},
"components": {
"schemas": {
"Body_voice_voice_post": {
"properties": {
"text": {
"type": "string",
"title": "Text"
},
"reference_audio": {
"type": "string",
"format": "binary",
"title": "Reference Audio"
}
},
"type": "object",
"required": [
"text"
],
"title": "Body_voice_voice_post"
},
"HTTPValidationError": {
"properties": {
"detail": {
"items": {
"$ref": "#/components/schemas/ValidationError"
},
"type": "array",
"title": "Detail"
}
},
"type": "object",
"title": "HTTPValidationError"
},
"ValidationError": {
"properties": {
"loc": {
"items": {
"anyOf": [
{
"type": "string"
},
{
"type": "integer"
}
]
},
"type": "array",
"title": "Location"
},
"msg": {
"type": "string",
"title": "Message"
},
"type": {
"type": "string",
"title": "Error Type"
}
},
"type": "object",
"required": [
"loc",
"msg",
"type"
],
"title": "ValidationError"
}
}
}
}
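
    As a sketch of how the GET /voice endpoint described above might be called from Python, the following builds the request URL from the query parameters in the schema (text and model_id are required; the rest are optional). The helper names voice_url and synthesize are hypothetical, and treating the response body as raw audio bytes is an assumption: the schema above does not state the actual response format.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Address used when running server_fastapi.py in this post.
BASE_URL = "http://127.0.0.1:5000"


def voice_url(text, model_id, base_url=BASE_URL, **params):
    """Build a GET /voice URL; extra keyword arguments become query parameters."""
    query = {"text": text, "model_id": model_id}
    # Drop parameters left as None so the server falls back to its defaults.
    query.update({k: v for k, v in params.items() if v is not None})
    return f"{base_url}/voice?{urlencode(query)}"


def synthesize(text, model_id, out_path, **params):
    """Request audio and save the response body (assumed to be audio bytes)."""
    with urlopen(voice_url(text, model_id, **params)) as resp:
        data = resp.read()
    with open(out_path, "wb") as f:
        f.write(data)
    return len(data)


if __name__ == "__main__":
    # Only prints the URL; call synthesize(...) once the server is running.
    print(voice_url("Hello there", 0, speaker_name="miko", language="EN", sdp_ratio=0.2))
```

    Uploading a reference audio clip requires the POST variant with a multipart/form-data body, as the schema notes, and is not covered by this sketch.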

    Finally, here is the Bert-vits2-v2.2 local training and inference bundle:

https://pan.baidu.com/s/1OVX9seRwZR6bZ-xsE_nRLg?pwd=v3uc

    Shared with everyone; enjoy.