voice-changer/demo/MMVC_Trainer/notebook/03_MMVC_Interface.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "U7-O9IRzogIx"
},
"source": [
"# モデルの精度を確認するためのインターフェース\n",
"\n",
"ver.2022/08/10"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dK2UHlpmoyLW"
},
"source": [
"## 1 概要\n",
"「Train_MMVC.ipynb」で学習したモデルでTTSと非リアルタイムのVCを行い、モデルの精度を検証します。"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "uRK7k2o7pL5f"
},
"source": [
"## 2 Google Drive をマウント\n",
"**Google Drive にアップロードした MMVC_Trainer を参照できるように、設定します。**\n",
"\n",
"「このノートブックに Google ドライブのファイルへのアクセスを許可しますか?」\n",
"\n",
"といったポップアップが表示されるので、「Google ドライブに接続」を押下し、google アカウントを選択して、「許可」を選択してください。\n",
"\n",
"成功すれば、下記メッセージが出ます。\n",
"```\n",
"Mounted at /content/drive/\n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8s8Ozg6regVi"
},
"outputs": [],
"source": [
"from google.colab import drive\n",
"drive.mount('/content/drive')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6mD03f7LpR97"
},
"source": [
"cdコマンドを実行して、マウントしたGoogle Drive のMMVC_Trainerディレクトリに移動します。\n",
"\n",
"%cd 「MMVC_Trainerをgoogle driveにパップロードしたパス」\n",
"\n",
"としてください。\n",
"\n",
"正しいパスが指定されていれば\n",
"\n",
"-rw------- 1 root root 11780 Mar 4 16:53 attentions.py\n",
"\n",
"-rw------- 1 root root 4778 Mar 4 16:53 commons.py\n",
"\n",
"drwx------ 2 root root 4096 Mar 5 15:20 configs\n",
"\n",
"...といった感じに表示されるはずです。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yaNipgu-enJo"
},
"outputs": [],
"source": [
"%cd /content/drive/MyDrive/\n",
"!ls -la"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2CP8NcKZpZNA"
},
"source": [
"## 3 必要なライブラリのインストール\n",
"\n",
"何も考えず実行してください。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "oOWo15aRewCk"
},
"outputs": [],
"source": [
"!apt-get install espeak\n",
"!pip install Cython==0.29.21\n",
"!pip install librosa==0.8.0\n",
"!pip install matplotlib==3.3.1\n",
"!pip install numpy==1.18.5\n",
"!pip install phonemizer==2.2.1\n",
"!pip install scipy==1.5.2\n",
"!pip install tensorboard==2.3.0\n",
"!pip install torch==1.8.0\n",
"!pip install torchvision==0.9.0\n",
"!pip install torchaudio==0.8.0\n",
"!pip install Unidecode==1.1.1\n",
"!pip install retry\n",
"!pip install resampy==0.2.2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "P3QYLvY4e38A"
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import IPython.display as ipd\n",
"\n",
"import os\n",
"import json\n",
"import math\n",
"import torch\n",
"from torch import nn\n",
"from torch.nn import functional as F\n",
"from torch.utils.data import DataLoader\n",
"\n",
"import commons\n",
"import utils\n",
"from data_utils import TextAudioLoader, TextAudioCollate, TextAudioSpeakerLoader, TextAudioSpeakerCollate\n",
"from models import SynthesizerTrn\n",
"from text.symbols import symbols\n",
"from text import text_to_sequence\n",
"\n",
"from scipy.io.wavfile import write\n",
"\n",
"\n",
"def get_text(text, hps):\n",
" text_norm = text_to_sequence(text, hps.data.text_cleaners)\n",
" if hps.data.add_blank:\n",
" text_norm = commons.intersperse(text_norm, 0)\n",
" text_norm = torch.LongTensor(text_norm)\n",
" return text_norm"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "eAdIexFDoeym"
},
"source": [
"## 4 学習したモデルを読み込む"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "faOeDsFXpql9"
},
"source": [
"CONFIG_PATH に学習に利用したjsonファイルを`「./configs/****.json」`のように指定し、 \n",
"NET_PATHに学習したモデルを`「./configs/xxxx/G_*****.pth」`のように指定してください。\n",
"\n",
"\n",
"CONFIG_PATH = \"./configs/train_config_zundamon.json\" \n",
"CONFIG_PATH = \"./configs/train_config.json\"\n",
"\n",
"\n",
"\n",
"特に設定をいじっていない場合、CONFIG_PATHはどちらかになると思います。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Rm-3oWmarsZt"
},
"outputs": [],
"source": [
"CONFIG_PATH = \"./configs/train_config_zundamon.json\"\n",
"NET_PATH = \"./logs/20220306_24000/G_xxxxx.pth\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9EAsizUNsGAw"
},
"source": [
"指定したファイルをもとにモデルの読み込みを行います。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ecUDV8_ee8OP"
},
"outputs": [],
"source": [
"hps = utils.get_hparams_from_file(CONFIG_PATH)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UYrcO66SfCqD"
},
"outputs": [],
"source": [
"net_g = SynthesizerTrn(\n",
" len(symbols),\n",
" hps.data.filter_length // 2 + 1,\n",
" hps.train.segment_size // hps.data.hop_length,\n",
" n_speakers=hps.data.n_speakers,\n",
" **hps.model)\n",
"_ = net_g.eval()\n",
"\n",
"_ = utils.load_checkpoint(NET_PATH, net_g, None)"
]
},
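{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check before VC, the loaded model can also synthesize speech from text (this is what `get_text` above is for). The cell below is a minimal sketch assuming the VITS-style `net_g.infer` signature; `TTS_TEXT` and `TTS_SPEAKER_ID` are placeholder values, and the `noise_scale`/`noise_scale_w`/`length_scale` settings are common VITS defaults rather than values prescribed by this repository."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal TTS sketch (assumes a VITS-style infer()).\n",
"TTS_TEXT = \"konnichiwa\"  # placeholder; use text your text_cleaners can handle\n",
"TTS_SPEAKER_ID = 100     # placeholder; must be a speaker ID present in your config\n",
"\n",
"with torch.no_grad():\n",
"    stn_tst = get_text(TTS_TEXT, hps)\n",
"    x_tst = stn_tst.unsqueeze(0)\n",
"    x_tst_lengths = torch.LongTensor([stn_tst.size(0)])\n",
"    sid = torch.LongTensor([TTS_SPEAKER_ID])\n",
"    audio = net_g.infer(x_tst, x_tst_lengths, sid=sid, noise_scale=0.667,\n",
"                        noise_scale_w=0.8, length_scale=1.0)[0][0, 0].data.cpu().float().numpy()\n",
"ipd.display(ipd.Audio(audio, rate=hps.data.sampling_rate))"
]
},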
{
"cell_type": "markdown",
"metadata": {
"id": "m-ho5133vpFi"
},
"source": [
"## 5 学習したモデルで非リアルタイムVCを行う"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "uRtrz7GIwyGq"
},
"source": [
"非リアルタイムのVCを行います。\n",
"\n",
"ソース話者のIDとその話者の音声ファイルのパス、変換ターゲットの話者のIDを指定してください。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "uEqm8yA6v9xz"
},
"outputs": [],
"source": [
"SOURCE_WAVFILE = \"dataset/textful/00_myvoice/wav/VOICEACTRESS100_001.wav\"\n",
"SOURCE_SPEAKER_ID = 107\n",
"TARGET_ID = 100"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "NHy-FQAYxLOR"
},
"source": [
"実際にVCを行います。\n",
"\n",
"ここでの性能が悪い場合、学習不足か他に問題があります。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2vkotLtNY_s4"
},
"outputs": [],
"source": [
"with torch.no_grad():\n",
" dataset = TextAudioSpeakerLoader(hps.data.validation_files_notext, hps.data)\n",
" data = dataset.get_audio_text_speaker_pair([SOURCE_WAVFILE, SOURCE_SPEAKER_ID, \"a\"])\n",
" data = TextAudioSpeakerCollate()([data])\n",
" x, x_lengths, spec, spec_lengths, y, y_lengths, sid_src = [x for x in data]\n",
" sid_tgt1 = torch.LongTensor([TARGET_ID])\n",
" audio1 = net_g.voice_conversion(spec, spec_lengths, sid_src=sid_src, sid_tgt=sid_tgt1)[0][0,0].data.cpu().float().numpy()\n",
"print(\"Original SID: %d\" % sid_src.item())\n",
"ipd.display(ipd.Audio(y[0].cpu().numpy(), rate=hps.data.sampling_rate))\n",
"print(\"Converted SID: %d\" % sid_tgt1.item())\n",
"ipd.display(ipd.Audio(audio1, rate=hps.data.sampling_rate))"
]
}
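,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To keep the converted audio, the `write` function imported from `scipy.io.wavfile` above can save it to a wav file. A minimal sketch; the output filename `converted.wav` is just an example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Scale the float waveform to 16-bit PCM and save it (filename is an example).\n",
"write(\"converted.wav\", hps.data.sampling_rate, (audio1 * 32767).astype(np.int16))"
]
}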
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "03_MMVC_Interface.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3.9.6 64-bit",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.9.6"
},
"vscode": {
"interpreter": {
"hash": "d3394867249fd41ee68869925f4586b97ae8a94f3c93a4c25403e9e75f272611"
}
}
},
"nbformat": 4,
"nbformat_minor": 0
}