Merge pull request #80 from w-okada/dev

Dev
w-okada 2022-11-03 19:23:06 +09:00 committed by GitHub
commit af0ebaccd1
16 changed files with 1512 additions and 107 deletions

VoiceChangerDemo_Simple.ipynb Normal file

@@ -0,0 +1,463 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyN7lDdQ3iB8T1SI4BKFzkWz",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU",
"gpuClass": "standard"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/w-okada/voice-changer/blob/dev/VoiceChangerDemo_Simple.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"Voice Changer Simple (デモ版)\n",
"---\n",
"\n",
"このートはVoice ChangerをColab上で動かすデモ版です。\n",
"\n",
"正式版はローカルPCのDocker上で動かすアプリケーションです。\n",
"\n",
"正式版は、多くの場合より少ないタイムラグで滑らかに音声を変換できます。\n",
"\n",
"詳細な使用方法はこちらの[リポジトリ](https://github.com/w-okada/voice-changer)からご確認ください。\n"
],
"metadata": {
"id": "Lbbmx_Vjl0zo"
}
},
{
"cell_type": "markdown",
"source": [
"# GPUを確認\n",
"GPUを用いたほうが高速に処理が行えます。\n",
"\n",
"下記のコマンドでGPUが確認できない場合は、上のメニューから\n",
"\n",
"「ランタイム」→「ランタイムの変更」→「ハードウェア アクセラレータ」\n",
"\n",
"でGPUを選択してください。"
],
"metadata": {
"id": "oUKi1NYMmXrr"
}
},
{
"cell_type": "code",
"source": [
"# (1) GPUの確認\n",
"!nvidia-smi"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "vV1t7PBRm-o6",
"outputId": "2ab5d79e-0fe1-4e48-9fb4-8a61399e0b60"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Sun Oct 30 10:03:39 2022 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |\n",
"|-------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
"| | | MIG M. |\n",
"|===============================+======================+======================|\n",
"| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
"| N/A 35C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |\n",
"| | | N/A |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
"+-----------------------------------------------------------------------------+\n",
"| Processes: |\n",
"| GPU GI CI PID Type Process name GPU Memory |\n",
"| ID ID Usage |\n",
"|=============================================================================|\n",
"| No running processes found |\n",
"+-----------------------------------------------------------------------------+\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# 使用するモデルとコンフィグファイルの指定\n",
"\n",
"使用するトレーニング済みのモデルと、トレーニングで使用したコンフィグファイルのパスを指定してください。\n",
"\n",
"多くの場合はGoogle Driveに格納されているファイルを使用すると思います。その場合は、下の(2-2)のセルを実行してドライブをマウントしてください"
],
"metadata": {
"id": "mHvGrgaWnIPA"
}
},
{
"cell_type": "code",
"source": [
"# # (2-1) 使用するモデルとコンフィグファイルの指定\n",
"# CONFIG=\"/content/drive/MyDrive/VoiceChanger/config.json\"\n",
"# MODEL=\"/content/drive/MyDrive/VoiceChanger/G_326000.pth\""
],
"metadata": {
"id": "nSXATMWYb4Ik"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2wxD-gRSMU5R",
"outputId": "dabd982a-87c7-44d1-b9e8-986691190771"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mounted at /content/drive\n"
]
}
],
"source": [
"# # (2-2) Google Driveのマウント\n",
"# from google.colab import drive\n",
"# drive.mount('/content/drive')"
]
},
{
"cell_type": "markdown",
"source": [
"# リポジトリのクローン\n",
"リポジトリをクローンします"
],
"metadata": {
"id": "sLBfykjBnjWc"
}
},
{
"cell_type": "code",
"source": [
"# (3) リポジトリのクローン\n",
"!git clone --depth 1 https://github.com/isletennos/MMVC_Trainer.git -b v1.3.1.3 /MMVC_Trainer\n",
"!git clone --depth 1 https://github.com/w-okada/voice-changer.git -b dev\n",
"%cd voice-changer/demo/\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "86wTFmqsNMnD",
"outputId": "a52d5b0e-826e-445d-cd3a-4a42cbd52212"
},
"execution_count": 36,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"fatal: destination path '/MMVC_Trainer' already exists and is not an empty directory.\n",
"Cloning into 'voice-changer'...\n",
"remote: Enumerating objects: 88, done.\u001b[K\n",
"remote: Counting objects: 100% (88/88), done.\u001b[K\n",
"remote: Compressing objects: 100% (74/74), done.\u001b[K\n",
"remote: Total 88 (delta 14), reused 57 (delta 6), pack-reused 0\u001b[K\n",
"Unpacking objects: 100% (88/88), done.\n",
"/content/voice-changer/demo/voice-changer/demo/voice-changer/demo/voice-changer/demo/voice-changer/demo\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# ファイルの配置\n",
"アプリケーションの挙動を記した設定ファイルをコピーします(4-1)。(4-2)はコピーした設定ファイルを表示しています。もしかしたらうまく動かないときに役立つかもしれません。"
],
"metadata": {
"id": "jmDY8W_fnuSi"
}
},
{
"cell_type": "code",
"source": [
"# (4-1) 設定ファイルの配置\n",
"!cp ../template/setting_mmvc_colab.json ../frontend/dist/assets/setting.json\n"
],
"metadata": {
"id": "Bn4kV8TgXp8i"
},
"execution_count": 37,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# (4-2) 設定ファイルの確認\n",
"!cat ../frontend/dist/assets/setting.json\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "pjxPsOOaXXTj",
"outputId": "425a36dd-fbdc-4f55-825e-a2c7026f2aab"
},
"execution_count": 38,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"{\n",
" \"app_title\": \"voice-changer\",\n",
" \"majar_mode\": \"colab\",\n",
" \"voice_changer_server_url\": \"/test\",\n",
" \"sample_rate\": 48000,\n",
" \"buffer_size\": 1024,\n",
" \"prefix_chunk_size\": 36,\n",
" \"chunk_size\": 36,\n",
" \"speaker_ids\": [100, 107, 101, 102, 103],\n",
" \"speaker_names\": [\"ずんだもん\", \"user\", \"そら\", \"めたん\", \"つむぎ\"],\n",
" \"src_id\": 107,\n",
" \"dst_id\": 100,\n",
" \"vf_enable\": true,\n",
" \"voice_changer_mode\": \"realtime\",\n",
" \"gpu\": 0,\n",
" \"available_gpus\": [-1, 0, 1, 2, 3, 4],\n",
" \"avatar\": {\n",
" \"enable_avatar\": true, \n",
" \"motion_capture_face\": true,\n",
" \"motion_capture_upperbody\": true,\n",
" \"lip_overwrite_with_voice\": true,\n",
" \"avatar_url\": \"./assets/vrm/zundamon/zundamon.vrm\",\n",
" \"backgournd_image_url\": \"./assets/images/bg_natural_sougen.jpg\",\n",
" \"background_color\": \"#0000dd\",\n",
" \"chroma_key\": \"#0000dd\",\n",
" \"avatar_canvas_size\": [1280, 720],\n",
" \"screen_canvas_size\": [1280, 720]\n",
" },\n",
" \"advance\": {\n",
" \"avatar_draw_skip_rate\": 3,\n",
" \"screen_draw_skip_rate\": 3,\n",
" \"visualizer_draw_skip_rate\": 3,\n",
" \"cross_fade_lower_value\": 0.1,\n",
" \"cross_fade_offset_rate\": 0.3,\n",
" \"cross_fade_end_rate\": 0.6,\n",
" \"cross_fade_type\": 2\n",
" }\n",
"}\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# モジュールのインストール\n",
"\n",
"必要なモジュールをインストールします。"
],
"metadata": {
"id": "8Na2PbLZSWgZ"
}
},
{
"cell_type": "code",
"source": [
"# (5) 設定ファイルの確認\n",
"!apt-get install -y espeak libsndfile1-dev &> /dev/null\n",
"!pip install unidecode &> /dev/null\n",
"!pip install phonemizer &> /dev/null\n",
"!pip install retry &> /dev/null\n",
"!pip install python-socketio &> /dev/null\n",
"!pip install fastapi &> /dev/null\n",
"!pip install python-multipart &> /dev/null\n",
"!pip install uvicorn &> /dev/null\n",
"!pip install websockets &> /dev/null\n",
"!pip install pyOpenSSL &> /dev/null\n"
],
"metadata": {
"id": "LwZAAuqxX7yY"
},
"execution_count": 44,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# サーバの起動\n",
"\n",
"サーバを起動します。(6-1)\n",
"\n",
"サーバの起動状況を確認します。(6-2) \n",
"\n",
"このセルは繰り返し実行することになるのでCtrl+Retでセルを実行してください。\n",
"\n",
"アクセスできるようになるまで、1~2分かかるようです。コーヒーでも飲みに行きましょう。\n",
"\n",
"下記のようなテキストが表示されたら起動完了です。\n",
"\n",
"**`DEBUG:asyncio:Using selector: EpollSelector`**\n",
"\n",
"```\n",
" Phase name:__main__\n",
" PHASE3:__main__\n",
" PHASE1:__main__\n",
"Start MMVC SocketIO Server\n",
" CONFIG:None, MODEL:None\n",
"DEBUG:asyncio:Using selector: EpollSelector\n",
"```\n",
"\n"
],
"metadata": {
"id": "-_2OcN9Borke"
}
},
{
"cell_type": "code",
"source": [
"# (6-1) サーバの起動\n",
"import random\n",
"PORT = 10000 + random.randint(1, 9999)\n",
"LOG_FILE = f\"LOG_FILE_{PORT}\"\n",
"\n",
"get_ipython().system_raw(f'python3 MMVCServerSIO.py -p {PORT} --colab True >{LOG_FILE} 2>&1 &')\n",
"#print(f\"PORT:{PORT}, LOG_FILE:{LOG_FILE}\")"
],
"metadata": {
"id": "G-nMdPxEW1rc",
"outputId": "ed5fc2d9-f1c5-4aa3-df8d-e306de2e2a30",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": 40,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"PORT:19751, LOG_FILE:LOG_FILE_19751\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# (6-2) サーバの起動確認 (Ctrl+Retで実行)\n",
"!tail -20 {LOG_FILE}"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "chu06KpAjEK6",
"outputId": "e6b67606-1279-49aa-e276-4e2bb83284c1"
},
"execution_count": 45,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\u001b[32m Phase name:__main__\u001b[0m\n",
"\u001b[32m PHASE3:__main__\u001b[0m\n",
"\u001b[32m PHASE1:__main__\u001b[0m\n",
"\u001b[17mStart MMVC SocketIO Server\u001b[0m\n",
"\u001b[34m CONFIG:None, MODEL:None\u001b[0m\n",
"DEBUG:asyncio:Using selector: EpollSelector\n",
"\u001b[32m Phase name:MMVCServerSIO\u001b[0m\n",
"\u001b[32m PHASE3:MMVCServerSIO\u001b[0m\n",
"File saved to: G_326000.pth\n",
"Load: config.json, G_326000.pth\n",
"INFO:root:Loaded checkpoint 'model_upload_dir/G_326000.pth' (iteration 1136)\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# プロキシを起動\n",
"ウェブサーバへのアクセスをするためのプロキシを起動します。\n",
"\n",
"表示されたURLをクリックして開くと別タブでアプリが開きます。\n",
"\n",
"Colabなので、ロードにある程度時間がかかります(30秒くらい)。"
],
"metadata": {
"id": "WhxcFLQEpctq"
}
},
{
"cell_type": "code",
"source": [
"# (7) プロキシを起動\n",
"from google.colab.output import eval_js\n",
"proxy = eval_js( \"google.colab.kernel.proxyPort(\" + str(PORT) + \")\" )\n",
"print(f\"{proxy}front/\")"
],
"metadata": {
"id": "nkRjZm95l87C",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "bbc830e9-209a-4b71-891d-8cf78cf3077d"
},
"execution_count": 43,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"https://w6x1mbngbj-496ff2e9c6d22116-19751-colab.googleusercontent.com/front/\n"
]
}
]
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "axkt5BjhoiPV"
},
"execution_count": null,
"outputs": []
}
]
}
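The notebook has you re-run the `tail` cell (6-2) by hand until the startup marker appears. A minimal sketch of automating that wait in a Colab cell, assuming the `LOG_FILE` variable from cell (6-1) and the startup marker quoted in the notebook:

```python
import time

MARKER = "DEBUG:asyncio:Using selector: EpollSelector"

def wait_for_server(log_file: str, timeout_sec: int = 180) -> bool:
    """Poll the server log until the startup marker appears."""
    deadline = time.time() + timeout_sec
    while time.time() < deadline:
        try:
            with open(log_file, encoding="utf-8", errors="replace") as f:
                if MARKER in f.read():
                    return True
        except FileNotFoundError:
            pass  # the server has not created the log file yet
        time.sleep(5)
    return False

# e.g. wait_for_server(LOG_FILE) in a cell after (6-1)
```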

demo/MMVCServerSIO.py Executable file

@@ -0,0 +1,379 @@
import sys, os, struct, argparse, logging, shutil, base64, traceback
sys.path.append("/MMVC_Trainer")
sys.path.append("/MMVC_Trainer/text")

import uvicorn
from fastapi import FastAPI, UploadFile, File, Form
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
from fastapi.encoders import jsonable_encoder
from fastapi import FastAPI, HTTPException
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel

from scipy.io.wavfile import write, read
import socketio
from distutils.util import strtobool
from datetime import datetime

import torch
import numpy as np

from mods.ssl import create_self_signed_cert
from mods.VoiceChanger import VoiceChanger
# from mods.Whisper import Whisper


class UvicornSuppressFilter(logging.Filter):
    def filter(self, record):
        return False


logger = logging.getLogger("uvicorn.error")
logger.addFilter(UvicornSuppressFilter())
# logger.propagate = False

logger = logging.getLogger("multipart.multipart")
logger.propagate = False


class VoiceModel(BaseModel):
    gpu: int
    srcId: int
    dstId: int
    timestamp: int
    prefixChunkSize: int
    buffer: str


class MyCustomNamespace(socketio.AsyncNamespace):
    def __init__(self, namespace):
        super().__init__(namespace)

    def loadModel(self, config, model):
        if hasattr(self, 'voiceChanger') == True:
            self.voiceChanger.destroy()
        self.voiceChanger = VoiceChanger(config, model)

    # def loadWhisperModel(self, model):
    #     self.whisper = Whisper()
    #     self.whisper.loadModel("tiny")
    #     print("load")

    def changeVoice(self, gpu, srcId, dstId, timestamp, prefixChunkSize, unpackedData):
        # if hasattr(self, 'whisper') == True:
        #     self.whisper.addData(unpackedData)
        return self.voiceChanger.on_request(gpu, srcId, dstId, timestamp, prefixChunkSize, unpackedData)

    # def transcribe(self):
    #     if hasattr(self, 'whisper') == True:
    #         self.whisper.transcribe(0)
    #     else:
    #         print("whisper not found")

    def on_connect(self, sid, environ):
        # print('[{}] connet sid : {}'.format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), sid))
        pass

    async def on_request_message(self, sid, msg):
        # print("on_request_message", torch.cuda.memory_allocated())
        gpu = int(msg[0])
        srcId = int(msg[1])
        dstId = int(msg[2])
        timestamp = int(msg[3])
        prefixChunkSize = int(msg[4])
        data = msg[5]
        # print(srcId, dstId, timestamp)
        unpackedData = np.array(struct.unpack('<%sh' % (len(data) // struct.calcsize('<h')), data))
        audio1 = self.changeVoice(gpu, srcId, dstId, timestamp, prefixChunkSize, unpackedData)
        bin = struct.pack('<%sh' % len(audio1), *audio1)
        await self.emit('response', [timestamp, bin])

    def on_disconnect(self, sid):
        # print('[{}] disconnect'.format(datetime.now().strftime('%Y-%m-%d %H:%M:%S')))
        pass


def setupArgParser():
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", type=int, default=8080, help="port")
    parser.add_argument("-c", type=str, help="path for the config.json")
    parser.add_argument("-m", type=str, help="path for the model file")
    parser.add_argument("--https", type=strtobool, default=False, help="use https")
    parser.add_argument("--httpsKey", type=str, default="ssl.key", help="path for the key of https")
    parser.add_argument("--httpsCert", type=str, default="ssl.cert", help="path for the cert of https")
    parser.add_argument("--httpsSelfSigned", type=strtobool, default=True, help="generate self-signed certificate")
    parser.add_argument("--colab", type=strtobool, default=False, help="run on colab")
    return parser


def printMessage(message, level=0):
    if level == 0:
        print(f"\033[17m{message}\033[0m")
    elif level == 1:
        print(f"\033[34m    {message}\033[0m")
    elif level == 2:
        print(f"\033[32m    {message}\033[0m")
    else:
        print(f"\033[47m    {message}\033[0m")


global app_socketio
global app_fastapi

parser = setupArgParser()
args = parser.parse_args()

printMessage(f"Phase name:{__name__}", level=2)
thisFilename = os.path.basename(__file__)[:-3]

if __name__ == thisFilename or args.colab == True:
    printMessage(f"PHASE3:{__name__}", level=2)

    PORT = args.p
    CONFIG = args.c
    MODEL = args.m

    app_fastapi = FastAPI()
    app_fastapi.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )

    app_fastapi.mount("/front", StaticFiles(directory="../frontend/dist", html=True), name="static")

    sio = socketio.AsyncServer(
        async_mode='asgi',
        cors_allowed_origins='*'
    )
    namespace = MyCustomNamespace('/test')
    sio.register_namespace(namespace)
    if CONFIG and MODEL:
        namespace.loadModel(CONFIG, MODEL)
    # namespace.loadWhisperModel("base")

    app_socketio = socketio.ASGIApp(
        sio,
        other_asgi_app=app_fastapi,
        static_files={
            '/assets/icons/github.svg': {
                'filename': '../frontend/dist/assets/icons/github.svg',
                'content_type': 'image/svg+xml'
            },
            '': '../frontend/dist',
            '/': '../frontend/dist/index.html',
        }
    )

    @app_fastapi.get("/api/hello")
    async def index():
        return {"result": "Index"}

    UPLOAD_DIR = "model_upload_dir"
    os.makedirs(UPLOAD_DIR, exist_ok=True)

    # Can colab receive post request "ONLY" at root path?
    @app_fastapi.post("/upload_model_file")
    async def upload_file(configFile: UploadFile = File(...), modelFile: UploadFile = File(...)):
        if configFile and modelFile:
            for file in [modelFile, configFile]:
                filename = file.filename
                fileobj = file.file
                upload_dir = open(os.path.join(UPLOAD_DIR, filename), 'wb+')
                shutil.copyfileobj(fileobj, upload_dir)
                upload_dir.close()
            namespace.loadModel(os.path.join(UPLOAD_DIR, configFile.filename), os.path.join(UPLOAD_DIR, modelFile.filename))
            return {"uploaded files": f"{configFile.filename}, {modelFile.filename} "}
        return {"Error": "uploaded file is not found."}

    @app_fastapi.post("/upload_file")
    async def post_upload_file(
        file: UploadFile = File(...),
        filename: str = Form(...)
    ):
        if file and filename:
            fileobj = file.file
            upload_dir = open(os.path.join(UPLOAD_DIR, filename), 'wb+')
            shutil.copyfileobj(fileobj, upload_dir)
            upload_dir.close()
            return {"uploaded files": f"{filename} "}
        return {"Error": "uploaded file is not found."}

    @app_fastapi.post("/load_model")
    async def post_load_model(
        modelFilename: str = Form(...),
        modelFilenameChunkNum: int = Form(...),
        configFilename: str = Form(...)
    ):
        target_file_name = modelFilename
        with open(os.path.join(UPLOAD_DIR, target_file_name), "ab") as target_file:
            for i in range(modelFilenameChunkNum):
                filename = f"{modelFilename}_{i}"
                chunk_file_path = os.path.join(UPLOAD_DIR, filename)
                stored_chunk_file = open(chunk_file_path, 'rb')
                target_file.write(stored_chunk_file.read())
                stored_chunk_file.close()
                os.unlink(chunk_file_path)
            target_file.close()

        print(f'File saved to: {target_file_name}')
        print(f'Load: {configFilename}, {target_file_name}')
        namespace.loadModel(os.path.join(UPLOAD_DIR, configFilename), os.path.join(UPLOAD_DIR, target_file_name))
        return {"File saved to": f"{target_file_name}"}

    @app_fastapi.get("/transcribe")
    def get_transcribe():
        try:
            namespace.transcribe()
        except Exception as e:
            print("TRANSCRIBE PROCESSING!!!! EXCEPTION!!!", e)
            print(traceback.format_exc())
            return str(e)

    @app_fastapi.post("/test")
    async def post_test(voice: VoiceModel):
        try:
            # print("POST REQUEST PROCESSING....")
            gpu = voice.gpu
            srcId = voice.srcId
            dstId = voice.dstId
            timestamp = voice.timestamp
            prefixChunkSize = voice.prefixChunkSize
            buffer = voice.buffer
            wav = base64.b64decode(buffer)

            if wav == 0:
                samplerate, data = read("dummy.wav")
                unpackedData = data
            else:
                unpackedData = np.array(struct.unpack('<%sh' % (len(wav) // struct.calcsize('<h')), wav))
                write("logs/received_data.wav", 24000, unpackedData.astype(np.int16))

            changedVoice = namespace.changeVoice(gpu, srcId, dstId, timestamp, prefixChunkSize, unpackedData)
            changedVoiceBase64 = base64.b64encode(changedVoice).decode('utf-8')
            data = {
                "gpu": gpu,
                "srcId": srcId,
                "dstId": dstId,
                "timestamp": timestamp,
                "prefixChunkSize": prefixChunkSize,
                "changedVoiceBase64": changedVoiceBase64
            }
            json_compatible_item_data = jsonable_encoder(data)
            return JSONResponse(content=json_compatible_item_data)
        except Exception as e:
            print("REQUEST PROCESSING!!!! EXCEPTION!!!", e)
            print(traceback.format_exc())
            return str(e)

if __name__ == '__mp_main__':
    printMessage(f"PHASE2:{__name__}", level=2)

if __name__ == '__main__':
    printMessage(f"PHASE1:{__name__}", level=2)

    PORT = args.p
    CONFIG = args.c
    MODEL = args.m

    printMessage(f"Start MMVC SocketIO Server", level=0)
    printMessage(f"CONFIG:{CONFIG}, MODEL:{MODEL}", level=1)

    if args.colab == False:
        if os.getenv("EX_PORT"):
            EX_PORT = os.environ["EX_PORT"]
            printMessage(f"External_Port:{EX_PORT} Internal_Port:{PORT}", level=1)
        else:
            printMessage(f"Internal_Port:{PORT}", level=1)

        if os.getenv("EX_IP"):
            EX_IP = os.environ["EX_IP"]
            printMessage(f"External_IP:{EX_IP}", level=1)

        # Create the HTTPS key/cert
        if args.https and args.httpsSelfSigned == 1:
            # HTTPS (generate a self-signed certificate)
            os.makedirs("./key", exist_ok=True)
            key_base_name = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}"
            keyname = f"{key_base_name}.key"
            certname = f"{key_base_name}.cert"
            create_self_signed_cert(certname, keyname, certargs=
                                    {"Country": "JP",
                                     "State": "Tokyo",
                                     "City": "Chuo-ku",
                                     "Organization": "F",
                                     "Org. Unit": "F"}, cert_dir="./key")
            key_path = os.path.join("./key", keyname)
            cert_path = os.path.join("./key", certname)
            printMessage(f"protocol: HTTPS(self-signed), key:{key_path}, cert:{cert_path}", level=1)
        elif args.https and args.httpsSelfSigned == 0:
            # HTTPS
            key_path = args.httpsKey
            cert_path = args.httpsCert
            printMessage(f"protocol: HTTPS, key:{key_path}, cert:{cert_path}", level=1)
        else:
            # HTTP
            printMessage(f"protocol: HTTP", level=1)

        # Print the addresses
        if args.https == 1:
            printMessage(f"open https://<IP>:<PORT>/ with your browser.", level=0)
        else:
            printMessage(f"open http://<IP>:<PORT>/ with your browser.", level=0)

        if EX_PORT and EX_IP and args.https == 1:
            printMessage(f"In many cases it is one of the following", level=1)
            printMessage(f"https://localhost:{EX_PORT}/", level=1)
            for ip in EX_IP.strip().split(" "):
                printMessage(f"https://{ip}:{EX_PORT}/", level=1)
        elif EX_PORT and EX_IP and args.https == 0:
            printMessage(f"In many cases it is one of the following", level=1)
            printMessage(f"http://localhost:{EX_PORT}/", level=1)

    # Start the server
    if args.https:
        # Start the HTTPS server
        uvicorn.run(
            f"{os.path.basename(__file__)[:-3]}:app_socketio",
            host="0.0.0.0",
            port=int(PORT),
            reload=True,
            ssl_keyfile=key_path,
            ssl_certfile=cert_path,
            log_level="critical"
        )
    else:
        # Start the HTTP server
        if args.colab == True:
            uvicorn.run(
                f"{os.path.basename(__file__)[:-3]}:app_fastapi",
                host="0.0.0.0",
                port=int(PORT),
                log_level="critical"
            )
        else:
            uvicorn.run(
                f"{os.path.basename(__file__)[:-3]}:app_socketio",
                host="0.0.0.0",
                port=int(PORT),
                reload=True,
                log_level="critical"
            )
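The `/test` endpoint above accepts a JSON body matching the `VoiceModel` schema and returns the converted audio base64-encoded. A minimal client sketch; the URL is an assumption (a local run on the default port 8080), and the speaker ids are the ones from the demo settings:

```python
import base64

import numpy as np
import requests

def change_voice(pcm: np.ndarray, url: str = "http://localhost:8080/test") -> np.ndarray:
    """Send 16-bit PCM samples to /test and return the converted samples."""
    payload = {
        "gpu": -1,               # -1 selects the CPU path
        "srcId": 107,
        "dstId": 100,
        "timestamp": 0,
        "prefixChunkSize": 24,
        # little-endian int16, base64-encoded, matching the '<h' unpack in the handler
        "buffer": base64.b64encode(pcm.astype("<i2").tobytes()).decode("utf-8"),
    }
    res = requests.post(url, json=payload).json()
    raw = base64.b64decode(res["changedVoiceBase64"])
    return np.frombuffer(raw, dtype="<i2")
```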

demo/mods/VoiceChanger.py Executable file

@@ -0,0 +1,87 @@
import torch
from scipy.io.wavfile import write, read
import numpy as np
import struct, traceback

import utils
import commons
from models import SynthesizerTrn
from text.symbols import symbols
from data_utils import TextAudioSpeakerLoader, TextAudioSpeakerCollate
from mel_processing import spectrogram_torch
from text import text_to_sequence, cleaned_text_to_sequence


class VoiceChanger():
    def __init__(self, config, model):
        self.hps = utils.get_hparams_from_file(config)
        self.net_g = SynthesizerTrn(
            len(symbols),
            self.hps.data.filter_length // 2 + 1,
            self.hps.train.segment_size // self.hps.data.hop_length,
            n_speakers=self.hps.data.n_speakers,
            **self.hps.model)
        self.net_g.eval()
        self.gpu_num = torch.cuda.device_count()
        utils.load_checkpoint(model, self.net_g, None)

        text_norm = text_to_sequence("a", self.hps.data.text_cleaners)
        text_norm = commons.intersperse(text_norm, 0)
        self.text_norm = torch.LongTensor(text_norm)
        self.audio_buffer = torch.zeros(1, 0)
        self.prev_audio = np.zeros(1)
        print(f"VoiceChanger Initialized (GPU_NUM:{self.gpu_num})")

    def destroy(self):
        del self.net_g

    def on_request(self, gpu, srcId, dstId, timestamp, prefixChunkSize, wav):
        unpackedData = wav
        convertSize = unpackedData.shape[0] + (prefixChunkSize * 512)

        try:
            audio = torch.FloatTensor(unpackedData.astype(np.float32))
            audio_norm = audio / self.hps.data.max_wav_value
            audio_norm = audio_norm.unsqueeze(0)
            self.audio_buffer = torch.cat([self.audio_buffer, audio_norm], axis=1)
            audio_norm = self.audio_buffer[:, -convertSize:]
            self.audio_buffer = audio_norm

            spec = spectrogram_torch(audio_norm, self.hps.data.filter_length,
                                     self.hps.data.sampling_rate, self.hps.data.hop_length, self.hps.data.win_length,
                                     center=False)
            spec = torch.squeeze(spec, 0)
            sid = torch.LongTensor([int(srcId)])

            data = (self.text_norm, spec, audio_norm, sid)
            data = TextAudioSpeakerCollate()([data])

            if gpu < 0 or self.gpu_num == 0:
                with torch.no_grad():
                    x, x_lengths, spec, spec_lengths, y, y_lengths, sid_src = [x.cpu() for x in data]
                    sid_tgt1 = torch.LongTensor([dstId]).cpu()
                    audio1 = (self.net_g.cpu().voice_conversion(spec, spec_lengths, sid_src=sid_src, sid_tgt=sid_tgt1)[0][0, 0].data * self.hps.data.max_wav_value).cpu().float().numpy()
            else:
                with torch.no_grad():
                    x, x_lengths, spec, spec_lengths, y, y_lengths, sid_src = [x.cuda(gpu) for x in data]
                    sid_tgt1 = torch.LongTensor([dstId]).cuda(gpu)
                    audio1 = (self.net_g.cuda(gpu).voice_conversion(spec, spec_lengths, sid_src=sid_src, sid_tgt=sid_tgt1)[0][0, 0].data * self.hps.data.max_wav_value).cpu().float().numpy()

            # if len(self.prev_audio) > unpackedData.shape[0]:
            #     prevLastFragment = self.prev_audio[-unpackedData.shape[0]:]
            #     curSecondLastFragment = audio1[-unpackedData.shape[0]*2:-unpackedData.shape[0]]
            #     print("prev, cur", prevLastFragment.shape, curSecondLastFragment.shape)
            # self.prev_audio = audio1
            # print("self.prev_audio", self.prev_audio.shape)

            audio1 = audio1[-unpackedData.shape[0]*2:]

        except Exception as e:
            print("VC PROCESSING!!!! EXCEPTION!!!", e)
            print(traceback.format_exc())

        audio1 = audio1.astype(np.int16)
        return audio1
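`on_request` above keeps a sliding context buffer: each incoming chunk is appended, the buffer is trimmed to the last `N + prefixChunkSize * 512` samples, and the last `2 * N` converted samples are returned so that consecutive chunks can be cross-faded. A toy sketch of just that buffering, with the VITS conversion step stubbed out:

```python
import numpy as np

prefixChunkSize = 24          # value from the demo settings
buffer = np.zeros(0, dtype=np.float32)

def push_chunk(chunk: np.ndarray) -> np.ndarray:
    """Append a chunk, trim the context buffer, return the cross-fade window."""
    global buffer
    convert_size = chunk.shape[0] + prefixChunkSize * 512
    buffer = np.concatenate([buffer, chunk])[-convert_size:]
    converted = buffer        # stands in for net_g.voice_conversion(...)
    return converted[-chunk.shape[0] * 2:]

for _ in range(16):
    out = push_chunk(np.ones(1024, dtype=np.float32))
# buffer has grown to convert_size samples; out spans the last two chunks
print(buffer.shape, out.shape)   # (13312,) (2048,)
```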

demo/mods/Whisper.py Executable file

@@ -0,0 +1,36 @@
import whisper
import numpy as np
import torchaudio
from scipy.io.wavfile import write

_MODELS = {
    "tiny": "/whisper/tiny.pt",
    "base": "/whisper/base.pt",
    "small": "/whisper/small.pt",
    "medium": "/whisper/medium.pt",
}


class Whisper():
    def __init__(self):
        self.storedSizeFromTry = 0

    def loadModel(self, model):
        # self.model = whisper.load_model(_MODELS[model], device="cpu")
        self.model = whisper.load_model(_MODELS[model])
        self.data = np.zeros(1).astype(np.float)

    def addData(self, unpackedData):
        self.data = np.concatenate([self.data, unpackedData], 0)

    def transcribe(self, audio):
        received_data_file = "received_data.wav"
        write(received_data_file, 24000, self.data.astype(np.int16))
        source, sr = torchaudio.load(received_data_file)
        target = torchaudio.functional.resample(source, 24000, 16000)
        result = self.model.transcribe(received_data_file)
        print("WHISPER1:::", result["text"])
        print("WHISPER2:::", result["segments"])
        self.data = np.zeros(1).astype(np.float)
        return result["text"]
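A usage sketch for the wrapper above, assuming the checkpoint files listed in `_MODELS` exist and audio arrives as a 24 kHz stream (note that `transcribe` ignores its `audio` argument and uses the accumulated buffer):

```python
import numpy as np
from mods.Whisper import Whisper  # the module shown above

w = Whisper()
w.loadModel("tiny")           # loads /whisper/tiny.pt per _MODELS
w.addData(np.zeros(24000))    # accumulate one second of (silent) 24 kHz audio
text = w.transcribe(0)        # writes received_data.wav, then transcribes it
print(text)
```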

demo/mods/ssl.py Executable file

@@ -0,0 +1,24 @@
import os
from OpenSSL import crypto


def create_self_signed_cert(certfile, keyfile, certargs, cert_dir="."):
    C_F = os.path.join(cert_dir, certfile)
    K_F = os.path.join(cert_dir, keyfile)
    if not os.path.exists(C_F) or not os.path.exists(K_F):
        k = crypto.PKey()
        k.generate_key(crypto.TYPE_RSA, 2048)
        cert = crypto.X509()
        cert.get_subject().C = certargs["Country"]
        cert.get_subject().ST = certargs["State"]
        cert.get_subject().L = certargs["City"]
        cert.get_subject().O = certargs["Organization"]
        cert.get_subject().OU = certargs["Org. Unit"]
        cert.get_subject().CN = 'Example'
        cert.set_serial_number(1000)
        cert.gmtime_adj_notBefore(0)
        cert.gmtime_adj_notAfter(315360000)
        cert.set_issuer(cert.get_subject())
        cert.set_pubkey(k)
        cert.sign(k, 'sha1')
        open(C_F, "wb").write(crypto.dump_certificate(crypto.FILETYPE_PEM, cert))
        open(K_F, "wb").write(crypto.dump_privatekey(crypto.FILETYPE_PEM, k))
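A usage sketch mirroring the call in MMVCServerSIO.py above; the resulting files are what the server hands to uvicorn via `ssl_keyfile`/`ssl_certfile`:

```python
import os
from mods.ssl import create_self_signed_cert  # the module shown above

os.makedirs("./key", exist_ok=True)
create_self_signed_cert(
    "test.cert", "test.key",
    certargs={"Country": "JP", "State": "Tokyo", "City": "Chuo-ku",
              "Organization": "F", "Org. Unit": "F"},
    cert_dir="./key",
)
print(os.path.exists("./key/test.cert"), os.path.exists("./key/test.key"))  # True True
```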

@@ -22,7 +22,12 @@ from mel_processing import spectrogram_torch
 from text import text_to_sequence, cleaned_text_to_sequence

 class MyCustomNamespace(socketio.Namespace):
-    def __init__(self, namespace, config, model):
+    def __init__(self, namespace):
+        super().__init__(namespace)
+        self.gpu_num = torch.cuda.device_count()
+        print("GPU_NUM:", self.gpu_num)
+
+    def __init__old(self, namespace, config, model):
         super().__init__(namespace)
         self.hps = utils.get_hparams_from_file(config)
         self.net_g = SynthesizerTrn(
@@ -36,12 +41,37 @@ class MyCustomNamespace(socketio.Namespace):
         print("GPU_NUM:", self.gpu_num)
         utils.load_checkpoint(model, self.net_g, None)

+    def loadModel(self, config, model):
+        self.hps = utils.get_hparams_from_file(config)
+        print("before DELETE:", torch.cuda.memory_allocated())
+        if hasattr(self, 'net_g') == True:
+            print("DELETE MODEL:", torch.cuda.memory_allocated())
+            del self.net_g
+        print("before load", torch.cuda.memory_allocated())
+        self.net_g = SynthesizerTrn(
+            len(symbols),
+            self.hps.data.filter_length // 2 + 1,
+            self.hps.train.segment_size // self.hps.data.hop_length,
+            n_speakers=self.hps.data.n_speakers,
+            **self.hps.model)
+        self.net_g.eval()
+        utils.load_checkpoint(model, self.net_g, None)
+        print(torch.cuda.memory_allocated())
+        print("after load", torch.cuda.memory_allocated())
+
     def on_connect(self, sid, environ):
         print('[{}] connet sid : {}'.format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), sid))
         # print('[{}] connet env : {}'.format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), environ))

+    def on_load_model(self, sid, msg):
+        print("on_load_model")
+        print(msg)
+        pass
+
     def on_request_message(self, sid, msg):
         # print("MESSGaa", msg)
+        print("on_request_message", torch.cuda.memory_allocated())
         gpu = int(msg[0])
         srcId = int(msg[1])
         dstId = int(msg[2])
@@ -223,7 +253,17 @@ if __name__ == '__main__':
     # SocketIO setup
     sio = socketio.Server(cors_allowed_origins='*')
-    sio.register_namespace(MyCustomNamespace('/test', CONFIG, MODEL))
+    namespace = MyCustomNamespace('/test')
+    sio.register_namespace(namespace)
+    print("loadmodel1:")
+    namespace.loadModel(CONFIG, MODEL)
+    print("loadmodel2:")
+    namespace.loadModel(CONFIG, MODEL)
+    print("loadmodel3:")
+    namespace.loadModel(CONFIG, MODEL)
+    print("loadmodel4:")
+    namespace.loadModel(CONFIG, MODEL)
+    print("loadmodel5:")
     app = socketio.WSGIApp(sio, static_files={
         '': '../frontend/dist',
         '/': '../frontend/dist/index.html',
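MMVCServerSIO.py above reassembles a model that was posted in pieces named `<modelFilename>_<i>` via `/upload_file`; `/load_model` then concatenates the pieces and loads the result. A client-side sketch of that protocol, assuming a local server and a 1 MiB chunk size (the filenames G_326000.pth and config.json are the ones from the notebook log):

```python
import requests

URL = "http://localhost:8080"   # assumed local run
CHUNK = 1024 * 1024             # 1 MiB per piece (assumed)

def upload_model(model_path: str, config_path: str) -> None:
    # config goes up in one piece
    with open(config_path, "rb") as f:
        requests.post(f"{URL}/upload_file",
                      files={"file": f}, data={"filename": "config.json"})
    # model goes up as numbered pieces that /load_model reassembles
    n = 0
    with open(model_path, "rb") as f:
        while piece := f.read(CHUNK):
            requests.post(f"{URL}/upload_file",
                          files={"file": ("blob", piece)},
                          data={"filename": f"G_326000.pth_{n}"})
            n += 1
    requests.post(f"{URL}/load_model",
                  data={"modelFilename": "G_326000.pth",
                        "modelFilenameChunkNum": n,
                        "configFilename": "config.json"})
```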

@@ -12,32 +12,17 @@ if [[ -e ./setting.json ]]; then
     echo "Using custom settings"
     cp ./setting.json ../frontend/dist/assets/setting.json
 else
-    if [ "${TYPE}" = "SOFT_VC" ] ; then
-        cp ../frontend/dist/assets/setting_softvc.json ../frontend/dist/assets/setting.json
-    elif [ "${TYPE}" = "SOFT_VC_FAST_API" ] ; then
-        cp ../frontend/dist/assets/setting_softvc_colab.json ../frontend/dist/assets/setting.json
-    else
-        cp ../frontend/dist/assets/setting_mmvc.json ../frontend/dist/assets/setting.json
-    fi
+    cp ../frontend/dist/assets/setting_mmvc.json ../frontend/dist/assets/setting.json
 fi

 # Launch
-if [ "${TYPE}" = "SOFT_VC" ] ; then
-    echo "Starting SOFT_VC"
-    python3 SoftVcServerSIO.py $PARAMS 2>stderr.txt
-elif [ "${TYPE}" = "SOFT_VC_VERBOSE" ] ; then
-    echo "Starting SOFT_VC (verbose)"
-    python3 SoftVcServerSIO.py $PARAMS
-elif [ "${TYPE}" = "SOFT_VC_FAST_API" ] ; then
-    echo "Starting SOFT_VC_FAST_API"
-    python3 SoftVcServerFastAPI.py 8080 docker
-elif [ "${TYPE}" = "MMVC" ] ; then
+if [ "${TYPE}" = "MMVC" ] ; then
     echo "Starting MMVC"
-    python3 serverSIO.py $PARAMS 2>stderr.txt
+    python3 MMVCServerSIO.py $PARAMS 2>stderr.txt
 elif [ "${TYPE}" = "MMVC_VERBOSE" ] ; then
     echo "Starting MMVC (verbose)"
-    python3 serverSIO.py $PARAMS
+    python3 MMVCServerSIO.py $PARAMS
 fi

@@ -6,21 +6,44 @@
     "buffer_size": 1024,
     "prefix_chunk_size": 24,
     "chunk_size": 24,
-    "speaker_ids": [100, 107, 101, 102, 103],
-    "speaker_names": ["ずんだもん", "user", "そら", "めたん", "つむぎ"],
+    "speakers": [
+        {
+            "id": 100,
+            "name": "ずんだもん"
+        },
+        {
+            "id": 107,
+            "name": "user"
+        },
+        {
+            "id": 101,
+            "name": "そら"
+        },
+        {
+            "id": 102,
+            "name": "めたん"
+        },
+        {
+            "id": 103,
+            "name": "つむぎ"
+        }
+    ],
     "src_id": 107,
     "dst_id": 100,
     "vf_enable": true,
     "voice_changer_mode": "realtime",
     "gpu": 0,
     "available_gpus": [-1, 0, 1, 2, 3, 4],
+    "screen": {
+        "enable_screen": true,
+        "backgournd_image_url": "./assets/images/bg_natural_sougen.jpg"
+    },
     "avatar": {
-        "enable_avatar": true,
-        "motion_capture_face": true,
-        "motion_capture_upperbody": true,
-        "lip_overwrite_with_voice": true,
+        "enable_avatar": false,
+        "motion_capture_face": false,
+        "motion_capture_upperbody": false,
+        "lip_overwrite_with_voice": false,
         "avatar_url": "./assets/vrm/zundamon/zundamon.vrm",
         "backgournd_image_url": "./assets/images/bg_natural_sougen.jpg",
         "background_color": "#0000dd",
         "chroma_key": "#0000dd",
         "avatar_canvas_size": [1280, 720],
@@ -34,5 +57,9 @@
         "cross_fade_offset_rate": 0.3,
         "cross_fade_end_rate": 0.6,
         "cross_fade_type": 2
-    }
+    },
+    "transcribe": {
+        "lang": "日本語(ja-JP)",
+        "expire_time": 5
+    }
 }
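The settings schema change above replaces the parallel `speaker_ids`/`speaker_names` arrays with a single `speakers` list of objects. A short sketch of migrating an old settings dict to the new shape:

```python
# The parallel speaker_ids / speaker_names arrays become one speakers list.
old = {
    "speaker_ids": [100, 107, 101, 102, 103],
    "speaker_names": ["ずんだもん", "user", "そら", "めたん", "つむぎ"],
}
speakers = [
    {"id": i, "name": n}
    for i, n in zip(old["speaker_ids"], old["speaker_names"])
]
# -> [{'id': 100, 'name': 'ずんだもん'}, {'id': 107, 'name': 'user'}, ...]
```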

@@ -6,21 +6,44 @@
     "buffer_size": 1024,
     "prefix_chunk_size": 24,
     "chunk_size": 24,
-    "speaker_ids": [100, 107, 101, 102, 103],
-    "speaker_names": ["ずんだもん", "user", "そら", "めたん", "つむぎ"],
+    "speakers": [
+        {
+            "id": 100,
+            "name": "ずんだもん"
+        },
+        {
+            "id": 107,
+            "name": "user"
+        },
+        {
+            "id": 101,
+            "name": "そら"
+        },
+        {
+            "id": 102,
+            "name": "めたん"
+        },
+        {
+            "id": 103,
+            "name": "つむぎ"
+        }
+    ],
     "src_id": 107,
     "dst_id": 100,
     "vf_enable": true,
     "voice_changer_mode": "realtime",
     "gpu": 0,
     "available_gpus": [-1, 0, 1, 2, 3, 4],
+    "screen": {
+        "enable_screen": true,
+        "backgournd_image_url": "./assets/images/bg_natural_sougen.jpg"
+    },
     "avatar": {
-        "enable_avatar": true,
-        "motion_capture_face": true,
-        "motion_capture_upperbody": true,
-        "lip_overwrite_with_voice": true,
+        "enable_avatar": false,
+        "motion_capture_face": false,
+        "motion_capture_upperbody": false,
+        "lip_overwrite_with_voice": false,
         "avatar_url": "./assets/vrm/zundamon/zundamon.vrm",
         "backgournd_image_url": "./assets/images/bg_natural_sougen.jpg",
         "background_color": "#0000dd",
         "chroma_key": "#0000dd",
         "avatar_canvas_size": [1280, 720],
@@ -34,5 +57,9 @@
         "cross_fade_offset_rate": 0.3,
         "cross_fade_end_rate": 0.6,
         "cross_fade_type": 2
-    }
+    },
+    "transcribe": {
+        "lang": "日本語(ja-JP)",
+        "expire_time": 5
+    }
 }

File diff suppressed because one or more lines are too long

@@ -1,7 +1,7 @@
 #!/bin/bash
 set -eu

-DOCKER_IMAGE=dannadori/voice-changer:20221028_220714
+DOCKER_IMAGE=dannadori/voice-changer:20221103_180651
 #DOCKER_IMAGE=voice-changer

@@ -75,28 +75,11 @@ elif [ "${MODE}" = "MMVC" ]; then
 #            -p ${EX_PORT}:8080 $DOCKER_IMAGE /bin/bash
     fi
-elif [ "${MODE}" = "SOFT_VC" ]; then
-    if [ "${USE_GPU}" = "on" ]; then
-        echo "Start Soft-vc"
-        docker run -it --gpus all --shm-size=128M \
-            -v `pwd`/vc_resources:/resources \
-            -e LOCAL_UID=$(id -u $USER) \
-            -e LOCAL_GID=$(id -g $USER) \
-            -e EX_IP="`hostname -I`" \
-            -e EX_PORT=${EX_PORT} \
-            -e VERBOSE=${VERBOSE} \
-            -p ${EX_PORT}:8080 $DOCKER_IMAGE "$@"
-    else
-        echo "Start Soft-vc without GPU is not supported"
-    fi
 else
     echo "
 usage:
     $0 <MODE> <params...>
-    MODE: select one of ['MMVC_TRAIN', 'MMVC', 'SOFT_VC']
+    MODE: select one of ['MMVC_TRAIN', 'MMVC']
 " >&2
 fi

start_v0.1.sh Normal file

@@ -0,0 +1,314 @@
#!/bin/bash
set -eu

DOCKER_IMAGE=dannadori/voice-changer:20221028_220714
#DOCKER_IMAGE=voice-changer

MODE=$1
PARAMS=${@:2:($#-1)}

### DEFAULT VAR ###
DEFAULT_EX_PORT=18888
DEFAULT_USE_GPU=on # on|off
DEFAULT_VERBOSE=off # on|off

### ENV VAR ###
EX_PORT=${EX_PORT:-${DEFAULT_EX_PORT}}
USE_GPU=${USE_GPU:-${DEFAULT_USE_GPU}}
VERBOSE=${VERBOSE:-${DEFAULT_VERBOSE}}

#echo $EX_PORT $USE_GPU $VERBOSE

### INTERNAL SETTING ###
TENSORBOARD_PORT=6006
SIO_PORT=8080

###

if [ "${MODE}" = "MMVC_TRAIN" ]; then
    echo "Starting training"
    docker run -it --gpus all --shm-size=128M \
        -v `pwd`/exp/${name}/dataset:/MMVC_Trainer/dataset \
        -v `pwd`/exp/${name}/logs:/MMVC_Trainer/logs \
        -v `pwd`/exp/${name}/filelists:/MMVC_Trainer/filelists \
        -v `pwd`/vc_resources:/resources \
        -e LOCAL_UID=$(id -u $USER) \
        -e LOCAL_GID=$(id -g $USER) \
        -e EX_IP="`hostname -I`" \
        -e EX_PORT=${EX_PORT} \
        -e VERBOSE=${VERBOSE} \
        -p ${EX_PORT}:6006 $DOCKER_IMAGE "$@"
elif [ "${MODE}" = "MMVC" ]; then
    if [ "${USE_GPU}" = "on" ]; then
        echo "Starting MMVC (with gpu)"
        docker run -it --gpus all --shm-size=128M \
            -v `pwd`/vc_resources:/resources \
            -e LOCAL_UID=$(id -u $USER) \
            -e LOCAL_GID=$(id -g $USER) \
            -e EX_IP="`hostname -I`" \
            -e EX_PORT=${EX_PORT} \
            -e VERBOSE=${VERBOSE} \
            -p ${EX_PORT}:8080 $DOCKER_IMAGE "$@"
    else
        echo "Starting MMVC (only cpu)"
        docker run -it --shm-size=128M \
            -v `pwd`/vc_resources:/resources \
            -e LOCAL_UID=$(id -u $USER) \
            -e LOCAL_GID=$(id -g $USER) \
            -e EX_IP="`hostname -I`" \
            -e EX_PORT=${EX_PORT} \
            -e VERBOSE=${VERBOSE} \
            -p ${EX_PORT}:8080 $DOCKER_IMAGE "$@"
        # docker run -it --shm-size=128M \
        #     -v `pwd`/vc_resources:/resources \
        #     -e LOCAL_UID=$(id -u $USER) \
        #     -e LOCAL_GID=$(id -g $USER) \
        #     -e EX_IP="`hostname -I`" \
        #     -e EX_PORT=${EX_PORT} \
        #     -e VERBOSE=${VERBOSE} \
        #     --entrypoint="" \
        #     -p ${EX_PORT}:8080 $DOCKER_IMAGE /bin/bash
    fi
elif [ "${MODE}" = "SOFT_VC" ]; then
    if [ "${USE_GPU}" = "on" ]; then
        echo "Start Soft-vc"
        docker run -it --gpus all --shm-size=128M \
            -v `pwd`/vc_resources:/resources \
            -e LOCAL_UID=$(id -u $USER) \
            -e LOCAL_GID=$(id -g $USER) \
            -e EX_IP="`hostname -I`" \
            -e EX_PORT=${EX_PORT} \
            -e VERBOSE=${VERBOSE} \
            -p ${EX_PORT}:8080 $DOCKER_IMAGE "$@"
    else
        echo "Start Soft-vc without GPU is not supported"
    fi
else
    echo "
usage:
    $0 <MODE> <params...>
    MODE: select one of ['MMVC_TRAIN', 'MMVC', 'SOFT_VC']
" >&2
fi
# echo $EX_PORT
# echo "------"
# echo "$@"
# echo "------"

# # usage() {
# #     echo "
# # usage:
# #     For training
# #         $0 [-t] -n <exp_name> [-b batch_size] [-r]
# #             -t: Specify this to run in training mode. (train)
# #             -n: Name of the training run. (name)
# #             -b: Batch size. (batchsize)
# #             -r: Specify this when resuming training. (resume)
# #     For changing voice
# #         $0 [-v] [-c config] [-m model] [-g on/off]
# #             -v: Specify this to run in voice changer mode. (voice changer)
# #             -c: Filename of the config used for training. (config)
# #             -m: Filename of the trained model. (model)
# #             -g: Use/don't use GPU. Defaults to on, so no need to specify it when using a GPU. (gpu)
# #             -p: Port number
# #     For help
# #         $0 [-h]
# #             -h: show this help
# #     " >&2
# # }
# # warn () {
# #     echo "! ! ! $1 ! ! !"
# #     exit 1
# # }

# # training_flag=false
# # name=999_exp
# # batch_size=10
# # resume_flag=false

# # voice_change_flag=false
# # config=
# # model=
# # gpu=on
# # port=8080
# # escape_flag=false

# # # Parse options
# # while getopts tn:b:rvc:m:g:p:hx OPT; do
# #     case $OPT in
# #     t)
# #         training_flag=true
# #         ;;
# #     n)
# #         name="$OPTARG"
# #         ;;
# #     b)
# #         batch_size="$OPTARG"
# #         ;;
# #     r)
# #         resume_flag=true
# #         ;;
# #     v)
# #         voice_change_flag=true
# #         ;;
# #     c)
# #         config="$OPTARG"
# #         ;;
# #     m)
# #         model="$OPTARG"
# #         ;;
# #     g)
# #         gpu="$OPTARG"
# #         ;;
# #     p)
# #         port="$OPTARG"
# #         ;;
# #     h | \?)
# #         usage && exit 1
# #         ;;
# #     x)
# #         escape_flag=true
# #     esac
# # done

# # # Determine the mode
# # if $training_flag && $voice_change_flag; then
# #     warn "-t training mode and -v voice changer mode cannot be specified at the same time."
# # elif $training_flag; then
# #     echo "■■■ TRAINING MODE ■■■"
# # elif $voice_change_flag; then
# #     echo "■■■ VOICE CHANGER MODE ■■■"
# # elif $escape_flag; then
# #     /bin/bash
# # else
# #     warn "Specify either -t training mode or -v voice changer mode."
# # fi

# if [ "${MODE}" = "MMVC_TRAIN_INITIAL" ]; then
#     echo "Starting training"
# elif [ "${MODE}" = "MMVC" ]; then
#     echo "Starting MMVC"
#     docker run -it --gpus all --shm-size=128M \
#         -v `pwd`/vc_resources:/resources \
#         -e LOCAL_UID=$(id -u $USER) \
#         -e LOCAL_GID=$(id -g $USER) \
#         -e EX_IP="`hostname -I`" \
#         -e EX_PORT=${port} \
#         -p ${port}:8080 $DOCKER_IMAGE -v -c ${config} -m ${model}
# elif [ "${MODE}" = "MMVC_VERBOSE" ]; then
#     echo "Starting MMVC (verbose)"
# elif [ "${MODE}" = "MMVC_CPU" ]; then
#     echo "Starting MMVC (CPU)"
# elif [ "${MODE}" = "MMVC_CPU_VERBOSE" ]; then
#     echo "Starting MMVC (CPU)(verbose)"
# elif [ "${MODE}" = "SOFT_VC" ]; then
#     echo "Start Soft-vc"
# elif [ "${MODE}" = "SOFT_VC_VERBOSE" ]; then
#     echo "Start Soft-vc(verbose)"
# else
#     echo "
# usage:
#     $0 <MODE> <params...>
#     EX_PORT:
#     MODE: one of ['MMVC_TRAIN', 'MMVC', 'SOFT_VC']
#     For 'MMVC_TRAIN':
#         $0 MMVC_TRAIN_INITIAL -n <exp_name> [-b batch_size] [-r]
#             -n: Name of the training run. (name)
#             -b: Batch size. (batchsize)
#             -r: Specify this when resuming training. (resume)
#     For 'MMVC'
#         $0 MMVC [-c config] [-m model] [-g on/off] [-p port] [-v]
#             -c: Filename of the config used for training. (config)
#             -m: Filename of the trained model. (model)
#             -g: Use/don't use GPU. Defaults to on, so no need to specify it when using a GPU. (gpu)
#             -p: Port number exposed from Docker
#             -v: verbose
#     For 'SOFT_VC'
#         $0 SOFT_VC [-c config] [-m model] [-g on/off]
#             -p: port exposed from docker container.
#             -v: verbose
#     " >&2
# fi

# # if $training_flag; then
# #     if $resume_flag; then
# #         echo "Resuming training"
# #         docker run -it --gpus all --shm-size=128M \
# #             -v `pwd`/exp/${name}/dataset:/MMVC_Trainer/dataset \
# #             -v `pwd`/exp/${name}/logs:/MMVC_Trainer/logs \
# #             -v `pwd`/exp/${name}/filelists:/MMVC_Trainer/filelists \
# #             -v `pwd`/vc_resources:/resources \
# #             -e LOCAL_UID=$(id -u $USER) \
# #             -e LOCAL_GID=$(id -g $USER) \
# #             -p ${TENSORBOARD_PORT}:6006 $DOCKER_IMAGE -t -b ${batch_size} -r
# #     else
# #         echo "Starting training"
# #         docker run -it --gpus all --shm-size=128M \
# #             -v `pwd`/exp/${name}/dataset:/MMVC_Trainer/dataset \
# #             -v `pwd`/exp/${name}/logs:/MMVC_Trainer/logs \
# #             -v `pwd`/exp/${name}/filelists:/MMVC_Trainer/filelists \
# #             -v `pwd`/vc_resources:/resources \
# #             -e LOCAL_UID=$(id -u $USER) \
# #             -e LOCAL_GID=$(id -g $USER) \
# #             -p ${TENSORBOARD_PORT}:6006 $DOCKER_IMAGE -t -b ${batch_size}
# #     fi
# # fi

# # if $voice_change_flag; then
# #     if [[ -z "$config" ]]; then
# #         warn "Specify the config file (-c)."
# #     fi
# #     if [[ -z "$model" ]]; then
# #         warn "Specify the model file (-m)."
# #     fi
# #     if [ "${gpu}" = "on" ]; then
# #         echo "Starting with the GPU mounted."
# #         docker run -it --gpus all --shm-size=128M \
# #             -v `pwd`/vc_resources:/resources \
# #             -e LOCAL_UID=$(id -u $USER) \
# #             -e LOCAL_GID=$(id -g $USER) \
# #             -e EX_IP="`hostname -I`" \
# #             -e EX_PORT=${port} \
# #             -p ${port}:8080 $DOCKER_IMAGE -v -c ${config} -m ${model}
# #     elif [ "${gpu}" = "off" ]; then
# #         echo "Running on CPU only. The GPU cannot be used."
# #         docker run -it --shm-size=128M \
# #             -v `pwd`/vc_resources:/resources \
# #             -e LOCAL_UID=$(id -u $USER) \
# #             -e LOCAL_GID=$(id -g $USER) \
# #             -e EX_IP="`hostname -I`" \
# #             -e EX_PORT=${port} \
# #             -p ${port}:8080 $DOCKER_IMAGE -v -c ${config} -m ${model}
# #     else
# #         echo ${gpu}
# #         warn "Specify -g as on or off."
# #     fi
# # fi

@@ -1,26 +1,49 @@
 {
     "app_title": "voice-changer",
     "majar_mode": "docker",
-    "voice_changer_server_url": "./test",
+    "voice_changer_server_url": "/test",
     "sample_rate": 48000,
     "buffer_size": 1024,
     "prefix_chunk_size": 24,
     "chunk_size": 24,
-    "speaker_ids": [100, 107, 101, 102, 103],
-    "speaker_names": ["ずんだもん", "user", "そら", "めたん", "つむぎ"],
+    "speakers": [
+        {
+            "id": 100,
+            "name": "ずんだもん"
+        },
+        {
+            "id": 107,
+            "name": "user"
+        },
+        {
+            "id": 101,
+            "name": "そら"
+        },
+        {
+            "id": 102,
+            "name": "めたん"
+        },
+        {
+            "id": 103,
+            "name": "つむぎ"
+        }
+    ],
     "src_id": 107,
     "dst_id": 100,
     "vf_enable": true,
     "voice_changer_mode": "realtime",
     "gpu": 0,
     "available_gpus": [-1, 0, 1, 2, 3, 4],
+    "screen": {
+        "enable_screen": true,
+        "backgournd_image_url": "./assets/images/bg_natural_sougen.jpg"
+    },
     "avatar": {
-        "enable_avatar": true,
-        "motion_capture_face": true,
-        "motion_capture_upperbody": true,
-        "lip_overwrite_with_voice": true,
+        "enable_avatar": false,
+        "motion_capture_face": false,
+        "motion_capture_upperbody": false,
+        "lip_overwrite_with_voice": false,
         "avatar_url": "./assets/vrm/zundamon/zundamon.vrm",
         "backgournd_image_url": "./assets/images/bg_natural_sougen.jpg",
         "background_color": "#0000dd",
         "chroma_key": "#0000dd",
         "avatar_canvas_size": [1280, 720],
@@ -34,5 +57,9 @@
         "cross_fade_offset_rate": 0.3,
         "cross_fade_end_rate": 0.6,
         "cross_fade_type": 2
-    }
+    },
+    "transcribe": {
+        "lang": "日本語(ja-JP)",
+        "expire_time": 5
+    }
 }

@@ -4,23 +4,46 @@
     "voice_changer_server_url": "/test",
     "sample_rate": 48000,
     "buffer_size": 1024,
-    "prefix_chunk_size": 36,
-    "chunk_size": 36,
-    "speaker_ids": [100, 107, 101, 102, 103],
-    "speaker_names": ["ずんだもん", "user", "そら", "めたん", "つむぎ"],
+    "prefix_chunk_size": 24,
+    "chunk_size": 24,
+    "speakers": [
+        {
+            "id": 100,
+            "name": "ずんだもん"
+        },
+        {
+            "id": 107,
+            "name": "user"
+        },
+        {
+            "id": 101,
+            "name": "そら"
+        },
+        {
+            "id": 102,
+            "name": "めたん"
+        },
+        {
+            "id": 103,
+            "name": "つむぎ"
+        }
+    ],
     "src_id": 107,
     "dst_id": 100,
     "vf_enable": true,
     "voice_changer_mode": "realtime",
     "gpu": 0,
     "available_gpus": [-1, 0, 1, 2, 3, 4],
+    "screen": {
+        "enable_screen": true,
+        "backgournd_image_url": "./assets/images/bg_natural_sougen.jpg"
+    },
     "avatar": {
-        "enable_avatar": true,
-        "motion_capture_face": true,
-        "motion_capture_upperbody": true,
-        "lip_overwrite_with_voice": true,
+        "enable_avatar": false,
+        "motion_capture_face": false,
+        "motion_capture_upperbody": false,
+        "lip_overwrite_with_voice": false,
         "avatar_url": "./assets/vrm/zundamon/zundamon.vrm",
         "backgournd_image_url": "./assets/images/bg_natural_sougen.jpg",
         "background_color": "#0000dd",
         "chroma_key": "#0000dd",
         "avatar_canvas_size": [1280, 720],
@@ -34,5 +57,9 @@
         "cross_fade_offset_rate": 0.3,
         "cross_fade_end_rate": 0.6,
         "cross_fade_type": 2
-    }
+    },
+    "transcribe": {
+        "lang": "日本語(ja-JP)",
+        "expire_time": 5
+    }
 }

@@ -1,4 +1,4 @@
-FROM dannadori/voice-changer-internal:20221028_220538 as front
+FROM dannadori/voice-changer-internal:20221103_180551 as front
 FROM debian:bullseye-slim as base

 ARG DEBIAN_FRONTEND=noninteractive
@@ -8,7 +8,7 @@ RUN apt-get install -y python3-pip git
 RUN apt-get install -y espeak
 RUN apt-get install -y cmake

-RUN git clone --depth 1 https://github.com/isletennos/MMVC_Trainer.git -b v1.3.1.3
+#RUN git clone --depth 1 https://github.com/isletennos/MMVC_Trainer.git -b v1.3.1.3

 RUN pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
@@ -24,17 +24,20 @@ RUN pip install tqdm==4.64.0
 RUN pip install retry==0.9.2
 RUN pip install psutil==5.9.1
 RUN pip install python-socketio==5.7.1
-RUN pip install eventlet==0.33.1
-RUN pip install matplotlib==3.5.3
+RUN pip install fastapi==0.85.0
+RUN pip install python-multipart==0.0.5
+RUN pip install uvicorn==0.18.3
+RUN pip install websockets==10.4
+RUN pip install pyOpenSSL==22.0.0
 RUN pip install pyopenjtalk==0.2.0
 RUN pip install tensorboard==2.10.0
+RUN pip install matplotlib==3.5.3
-RUN pip install pyOpenSSL==22.0.0

-WORKDIR /MMVC_Trainer/monotonic_align
-RUN cythonize -3 -i core.pyx \
-    && mv core.cpython-39-x86_64-linux-gnu.so monotonic_align/
+# WORKDIR /MMVC_Trainer/monotonic_align
+# RUN cythonize -3 -i core.pyx \
+#     && mv core.cpython-39-x86_64-linux-gnu.so monotonic_align/

 FROM debian:bullseye-slim
@@ -64,12 +67,11 @@ COPY --from=front --chmod=777 /voice-changer-internal/frontend/dist /voice-chang
 COPY --from=front --chmod=777 /voice-changer-internal/voice-change-service /voice-changer-internal/voice-change-service
 RUN chmod 0777 /voice-changer-internal/voice-change-service

-##### Soft VC
-COPY --from=front /hubert /hubert
-COPY --from=front /acoustic-model /acoustic-model
-COPY --from=front /hifigan /hifigan
-COPY --from=front /models /models
+# ##### Soft VC
+# COPY --from=front /hubert /hubert
+# COPY --from=front /acoustic-model /acoustic-model
+# COPY --from=front /hifigan /hifigan
+# COPY --from=front /models /models

 ENTRYPOINT ["/bin/bash", "setup.sh"]

@@ -17,23 +17,7 @@ echo "------"

 # Launch
-if [ "${MODE}" = "SOFT_VC" ] ; then
-    cd /voice-changer-internal/voice-change-service
-    cp -r /resources/* .
-    if [[ -e ./setting.json ]]; then
-        cp ./setting.json ../frontend/dist/assets/setting.json
-    else
-        cp ../frontend/dist/assets/setting_softvc.json ../frontend/dist/assets/setting.json
-    fi
-    if [ "${VERBOSE}" = "on" ]; then
-        echo "Starting SOFT_VC (verbose)"
-        python3 SoftVcServerSIO.py $PARAMS
-    else
-        echo "Starting SOFT_VC"
-        python3 SoftVcServerSIO.py $PARAMS 2>stderr.txt
-    fi
-elif [ "${MODE}" = "MMVC" ] ; then
+if [ "${MODE}" = "MMVC" ] ; then
     cd /voice-changer-internal/voice-change-service
     cp -r /resources/* .
@@ -45,10 +29,10 @@ elif [ "${MODE}" = "MMVC" ] ; then
     if [ "${VERBOSE}" = "on" ]; then
         echo "Starting MMVC (verbose)"
-        python3 serverSIO.py $PARAMS
+        python3 MMVCServerSIO.py $PARAMS
     else
         echo "Starting MMVC"
-        python3 serverSIO.py $PARAMS 2>stderr.txt
+        python3 MMVCServerSIO.py $PARAMS 2>stderr.txt
     fi
 elif [ "${MODE}" = "MMVC_TRAIN" ] ; then
     python3 create_dataset_jtalk.py -f train_config -s 24000 -m dataset/multi_speaker_correspondence.txt