Merge pull request #80 from w-okada/dev

Dev
w-okada 2022-11-03 19:23:06 +09:00 committed by GitHub
commit af0ebaccd1
16 changed files with 1512 additions and 107 deletions

VoiceChangerDemo_Simple.ipynb Normal file

@@ -0,0 +1,463 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyN7lDdQ3iB8T1SI4BKFzkWz",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"accelerator": "GPU",
"gpuClass": "standard"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/w-okada/voice-changer/blob/dev/VoiceChangerDemo_Simple.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"Voice Changer Simple (デモ版)\n",
"---\n",
"\n",
"このートはVoice ChangerをColab上で動かすデモ版です。\n",
"\n",
"正式版はローカルPCのDocker上で動かすアプリケーションです。\n",
"\n",
"正式版は、多くの場合より少ないタイムラグで滑らかに音声を変換できます。\n",
"\n",
"詳細な使用方法はこちらの[リポジトリ](https://github.com/w-okada/voice-changer)からご確認ください。\n"
],
"metadata": {
"id": "Lbbmx_Vjl0zo"
}
},
{
"cell_type": "markdown",
"source": [
"# GPUを確認\n",
"GPUを用いたほうが高速に処理が行えます。\n",
"\n",
"下記のコマンドでGPUが確認できない場合は、上のメニューから\n",
"\n",
"「ランタイム」→「ランタイムの変更」→「ハードウェア アクセラレータ」\n",
"\n",
"でGPUを選択してください。"
],
"metadata": {
"id": "oUKi1NYMmXrr"
}
},
{
"cell_type": "code",
"source": [
"# (1) GPUの確認\n",
"!nvidia-smi"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "vV1t7PBRm-o6",
"outputId": "2ab5d79e-0fe1-4e48-9fb4-8a61399e0b60"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Sun Oct 30 10:03:39 2022 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |\n",
"|-------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
"| | | MIG M. |\n",
"|===============================+======================+======================|\n",
"| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
"| N/A 35C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |\n",
"| | | N/A |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
"+-----------------------------------------------------------------------------+\n",
"| Processes: |\n",
"| GPU GI CI PID Type Process name GPU Memory |\n",
"| ID ID Usage |\n",
"|=============================================================================|\n",
"| No running processes found |\n",
"+-----------------------------------------------------------------------------+\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# 使用するモデルとコンフィグファイルの指定\n",
"\n",
"使用するトレーニング済みのモデルと、トレーニングで使用したコンフィグファイルのパスを指定してください。\n",
"\n",
"多くの場合はGoogle Driveに格納されているファイルを使用すると思います。その場合は、下の(2-2)のセルを実行してドライブをマウントしてください"
],
"metadata": {
"id": "mHvGrgaWnIPA"
}
},
{
"cell_type": "code",
"source": [
"# # (2-1) 使用するモデルとコンフィグファイルの指定\n",
"# CONFIG=\"/content/drive/MyDrive/VoiceChanger/config.json\"\n",
"# MODEL=\"/content/drive/MyDrive/VoiceChanger/G_326000.pth\""
],
"metadata": {
"id": "nSXATMWYb4Ik"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2wxD-gRSMU5R",
"outputId": "dabd982a-87c7-44d1-b9e8-986691190771"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mounted at /content/drive\n"
]
}
],
"source": [
"# # (2-2) Google Driveのマウント\n",
"# from google.colab import drive\n",
"# drive.mount('/content/drive')"
]
},
{
"cell_type": "markdown",
"source": [
"# リポジトリのクローン\n",
"リポジトリをクローンします"
],
"metadata": {
"id": "sLBfykjBnjWc"
}
},
{
"cell_type": "code",
"source": [
"# (3) リポジトリのクローン\n",
"!git clone --depth 1 https://github.com/isletennos/MMVC_Trainer.git -b v1.3.1.3 /MMVC_Trainer\n",
"!git clone --depth 1 https://github.com/w-okada/voice-changer.git -b dev\n",
"%cd voice-changer/demo/\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "86wTFmqsNMnD",
"outputId": "a52d5b0e-826e-445d-cd3a-4a42cbd52212"
},
"execution_count": 36,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"fatal: destination path '/MMVC_Trainer' already exists and is not an empty directory.\n",
"Cloning into 'voice-changer'...\n",
"remote: Enumerating objects: 88, done.\u001b[K\n",
"remote: Counting objects: 100% (88/88), done.\u001b[K\n",
"remote: Compressing objects: 100% (74/74), done.\u001b[K\n",
"remote: Total 88 (delta 14), reused 57 (delta 6), pack-reused 0\u001b[K\n",
"Unpacking objects: 100% (88/88), done.\n",
"/content/voice-changer/demo/voice-changer/demo/voice-changer/demo/voice-changer/demo/voice-changer/demo\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# ファイルの配置\n",
"アプリケーションの挙動を記した設定ファイルをコピーします(4-1)。(4-2)はコピーした設定ファイルを表示しています。もしかしたらうまく動かないときに役立つかもしれません。"
],
"metadata": {
"id": "jmDY8W_fnuSi"
}
},
{
"cell_type": "code",
"source": [
"# (4-1) 設定ファイルの配置\n",
"!cp ../template/setting_mmvc_colab.json ../frontend/dist/assets/setting.json\n"
],
"metadata": {
"id": "Bn4kV8TgXp8i"
},
"execution_count": 37,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# (4-2) 設定ファイルの確認\n",
"!cat ../frontend/dist/assets/setting.json\n"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "pjxPsOOaXXTj",
"outputId": "425a36dd-fbdc-4f55-825e-a2c7026f2aab"
},
"execution_count": 38,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"{\n",
" \"app_title\": \"voice-changer\",\n",
" \"majar_mode\": \"colab\",\n",
" \"voice_changer_server_url\": \"/test\",\n",
" \"sample_rate\": 48000,\n",
" \"buffer_size\": 1024,\n",
" \"prefix_chunk_size\": 36,\n",
" \"chunk_size\": 36,\n",
" \"speaker_ids\": [100, 107, 101, 102, 103],\n",
" \"speaker_names\": [\"ずんだもん\", \"user\", \"そら\", \"めたん\", \"つむぎ\"],\n",
" \"src_id\": 107,\n",
" \"dst_id\": 100,\n",
" \"vf_enable\": true,\n",
" \"voice_changer_mode\": \"realtime\",\n",
" \"gpu\": 0,\n",
" \"available_gpus\": [-1, 0, 1, 2, 3, 4],\n",
" \"avatar\": {\n",
" \"enable_avatar\": true, \n",
" \"motion_capture_face\": true,\n",
" \"motion_capture_upperbody\": true,\n",
" \"lip_overwrite_with_voice\": true,\n",
" \"avatar_url\": \"./assets/vrm/zundamon/zundamon.vrm\",\n",
" \"backgournd_image_url\": \"./assets/images/bg_natural_sougen.jpg\",\n",
" \"background_color\": \"#0000dd\",\n",
" \"chroma_key\": \"#0000dd\",\n",
" \"avatar_canvas_size\": [1280, 720],\n",
" \"screen_canvas_size\": [1280, 720]\n",
" },\n",
" \"advance\": {\n",
" \"avatar_draw_skip_rate\": 3,\n",
" \"screen_draw_skip_rate\": 3,\n",
" \"visualizer_draw_skip_rate\": 3,\n",
" \"cross_fade_lower_value\": 0.1,\n",
" \"cross_fade_offset_rate\": 0.3,\n",
" \"cross_fade_end_rate\": 0.6,\n",
" \"cross_fade_type\": 2\n",
" }\n",
"}\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# モジュールのインストール\n",
"\n",
"必要なモジュールをインストールします。"
],
"metadata": {
"id": "8Na2PbLZSWgZ"
}
},
{
"cell_type": "code",
"source": [
"# (5) 設定ファイルの確認\n",
"!apt-get install -y espeak libsndfile1-dev &> /dev/null\n",
"!pip install unidecode &> /dev/null\n",
"!pip install phonemizer &> /dev/null\n",
"!pip install retry &> /dev/null\n",
"!pip install python-socketio &> /dev/null\n",
"!pip install fastapi &> /dev/null\n",
"!pip install python-multipart &> /dev/null\n",
"!pip install uvicorn &> /dev/null\n",
"!pip install websockets &> /dev/null\n",
"!pip install pyOpenSSL &> /dev/null\n"
],
"metadata": {
"id": "LwZAAuqxX7yY"
},
"execution_count": 44,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# サーバの起動\n",
"\n",
"サーバを起動します。(6-1)\n",
"\n",
"サーバの起動状況を確認します。(6-2) \n",
"\n",
"このセルは繰り返し実行することになるのでCtrl+Retでセルを実行してください。\n",
"\n",
"アクセスできるようになるまで、1~2分かかるようです。コーヒーでも飲みに行きましょう。\n",
"\n",
"下記のようなテキストが表示されたら起動完了です。\n",
"\n",
"**`DEBUG:asyncio:Using selector: EpollSelector`**\n",
"\n",
"```\n",
" Phase name:__main__\n",
" PHASE3:__main__\n",
" PHASE1:__main__\n",
"Start MMVC SocketIO Server\n",
" CONFIG:None, MODEL:None\n",
"DEBUG:asyncio:Using selector: EpollSelector\n",
"```\n",
"\n"
],
"metadata": {
"id": "-_2OcN9Borke"
}
},
{
"cell_type": "code",
"source": [
"# (6-1) サーバの起動\n",
"import random\n",
"PORT = 10000 + random.randint(1, 9999)\n",
"LOG_FILE = f\"LOG_FILE_{PORT}\"\n",
"\n",
"get_ipython().system_raw(f'python3 MMVCServerSIO.py -p {PORT} --colab True >{LOG_FILE} 2>&1 &')\n",
"#print(f\"PORT:{PORT}, LOG_FILE:{LOG_FILE}\")"
],
"metadata": {
"id": "G-nMdPxEW1rc",
"outputId": "ed5fc2d9-f1c5-4aa3-df8d-e306de2e2a30",
"colab": {
"base_uri": "https://localhost:8080/"
}
},
"execution_count": 40,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"PORT:19751, LOG_FILE:LOG_FILE_19751\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# (6-2) サーバの起動確認 (Ctrl+Retで実行)\n",
"!tail -20 {LOG_FILE}"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "chu06KpAjEK6",
"outputId": "e6b67606-1279-49aa-e276-4e2bb83284c1"
},
"execution_count": 45,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\u001b[32m Phase name:__main__\u001b[0m\n",
"\u001b[32m PHASE3:__main__\u001b[0m\n",
"\u001b[32m PHASE1:__main__\u001b[0m\n",
"\u001b[17mStart MMVC SocketIO Server\u001b[0m\n",
"\u001b[34m CONFIG:None, MODEL:None\u001b[0m\n",
"DEBUG:asyncio:Using selector: EpollSelector\n",
"\u001b[32m Phase name:MMVCServerSIO\u001b[0m\n",
"\u001b[32m PHASE3:MMVCServerSIO\u001b[0m\n",
"File saved to: G_326000.pth\n",
"Load: config.json, G_326000.pth\n",
"INFO:root:Loaded checkpoint 'model_upload_dir/G_326000.pth' (iteration 1136)\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"# プロキシを起動\n",
"ウェブサーバへのアクセスをするためのプロキシを起動します。\n",
"\n",
"表示されたURLをクリックして開くと別タブでアプリが開きます。\n",
"\n",
"Colabなので、ロードにある程度時間がかかります(30秒くらい)。"
],
"metadata": {
"id": "WhxcFLQEpctq"
}
},
{
"cell_type": "code",
"source": [
"# (7) プロキシを起動\n",
"from google.colab.output import eval_js\n",
"proxy = eval_js( \"google.colab.kernel.proxyPort(\" + str(PORT) + \")\" )\n",
"print(f\"{proxy}front/\")"
],
"metadata": {
"id": "nkRjZm95l87C",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"outputId": "bbc830e9-209a-4b71-891d-8cf78cf3077d"
},
"execution_count": 43,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"https://w6x1mbngbj-496ff2e9c6d22116-19751-colab.googleusercontent.com/front/\n"
]
}
]
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "axkt5BjhoiPV"
},
"execution_count": null,
"outputs": []
}
]
}
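The notebook has you re-run the `tail` cell (6-2) by hand until the startup marker appears. A minimal sketch of automating that wait in a Colab cell, assuming the `LOG_FILE` variable from cell (6-1) and the startup marker quoted in the notebook:

```python
import time

MARKER = "DEBUG:asyncio:Using selector: EpollSelector"

def wait_for_server(log_file: str, timeout_sec: int = 180) -> bool:
    """Poll the server log until the startup marker appears."""
    deadline = time.time() + timeout_sec
    while time.time() < deadline:
        try:
            with open(log_file, encoding="utf-8", errors="replace") as f:
                if MARKER in f.read():
                    return True
        except FileNotFoundError:
            pass  # the server has not created the log file yet
        time.sleep(5)
    return False

# e.g. wait_for_server(LOG_FILE) in a cell after (6-1)
```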

demo/MMVCServerSIO.py Executable file

@@ -0,0 +1,379 @@
import sys, os, struct, argparse, logging, shutil, base64, traceback
sys.path.append("/MMVC_Trainer")
sys.path.append("/MMVC_Trainer/text")

import uvicorn
from fastapi import FastAPI, UploadFile, File, Form
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
from fastapi.encoders import jsonable_encoder
from fastapi import FastAPI, HTTPException
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel

from scipy.io.wavfile import write, read
import socketio
from distutils.util import strtobool
from datetime import datetime

import torch
import numpy as np

from mods.ssl import create_self_signed_cert
from mods.VoiceChanger import VoiceChanger
# from mods.Whisper import Whisper


class UvicornSuppressFilter(logging.Filter):
    def filter(self, record):
        return False


logger = logging.getLogger("uvicorn.error")
logger.addFilter(UvicornSuppressFilter())
# logger.propagate = False

logger = logging.getLogger("multipart.multipart")
logger.propagate = False


class VoiceModel(BaseModel):
    gpu: int
    srcId: int
    dstId: int
    timestamp: int
    prefixChunkSize: int
    buffer: str


class MyCustomNamespace(socketio.AsyncNamespace):
    def __init__(self, namespace):
        super().__init__(namespace)

    def loadModel(self, config, model):
        if hasattr(self, 'voiceChanger') == True:
            self.voiceChanger.destroy()
        self.voiceChanger = VoiceChanger(config, model)

    # def loadWhisperModel(self, model):
    #     self.whisper = Whisper()
    #     self.whisper.loadModel("tiny")
    #     print("load")

    def changeVoice(self, gpu, srcId, dstId, timestamp, prefixChunkSize, unpackedData):
        # if hasattr(self, 'whisper') == True:
        #     self.whisper.addData(unpackedData)
        return self.voiceChanger.on_request(gpu, srcId, dstId, timestamp, prefixChunkSize, unpackedData)

    # def transcribe(self):
    #     if hasattr(self, 'whisper') == True:
    #         self.whisper.transcribe(0)
    #     else:
    #         print("whisper not found")

    def on_connect(self, sid, environ):
        # print('[{}] connet sid : {}'.format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), sid))
        pass

    async def on_request_message(self, sid, msg):
        # print("on_request_message", torch.cuda.memory_allocated())
        gpu = int(msg[0])
        srcId = int(msg[1])
        dstId = int(msg[2])
        timestamp = int(msg[3])
        prefixChunkSize = int(msg[4])
        data = msg[5]
        # print(srcId, dstId, timestamp)
        unpackedData = np.array(struct.unpack('<%sh' % (len(data) // struct.calcsize('<h')), data))
        audio1 = self.changeVoice(gpu, srcId, dstId, timestamp, prefixChunkSize, unpackedData)
        bin = struct.pack('<%sh' % len(audio1), *audio1)
        await self.emit('response', [timestamp, bin])

    def on_disconnect(self, sid):
        # print('[{}] disconnect'.format(datetime.now().strftime('%Y-%m-%d %H:%M:%S')))
        pass


def setupArgParser():
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", type=int, default=8080, help="port")
    parser.add_argument("-c", type=str, help="path for the config.json")
    parser.add_argument("-m", type=str, help="path for the model file")
    parser.add_argument("--https", type=strtobool, default=False, help="use https")
    parser.add_argument("--httpsKey", type=str, default="ssl.key", help="path for the key of https")
    parser.add_argument("--httpsCert", type=str, default="ssl.cert", help="path for the cert of https")
    parser.add_argument("--httpsSelfSigned", type=strtobool, default=True, help="generate self-signed certificate")
    parser.add_argument("--colab", type=strtobool, default=False, help="run on colab")
    return parser


def printMessage(message, level=0):
    if level == 0:
        print(f"\033[17m{message}\033[0m")
    elif level == 1:
        print(f"\033[34m    {message}\033[0m")
    elif level == 2:
        print(f"\033[32m    {message}\033[0m")
    else:
        print(f"\033[47m    {message}\033[0m")


global app_socketio
global app_fastapi

parser = setupArgParser()
args = parser.parse_args()

printMessage(f"Phase name:{__name__}", level=2)
thisFilename = os.path.basename(__file__)[:-3]

if __name__ == thisFilename or args.colab == True:
    printMessage(f"PHASE3:{__name__}", level=2)

    PORT = args.p
    CONFIG = args.c
    MODEL = args.m

    app_fastapi = FastAPI()
    app_fastapi.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )

    app_fastapi.mount("/front", StaticFiles(directory="../frontend/dist", html=True), name="static")

    sio = socketio.AsyncServer(
        async_mode='asgi',
        cors_allowed_origins='*'
    )
    namespace = MyCustomNamespace('/test')
    sio.register_namespace(namespace)
    if CONFIG and MODEL:
        namespace.loadModel(CONFIG, MODEL)
    # namespace.loadWhisperModel("base")

    app_socketio = socketio.ASGIApp(
        sio,
        other_asgi_app=app_fastapi,
        static_files={
            '/assets/icons/github.svg': {
                'filename': '../frontend/dist/assets/icons/github.svg',
                'content_type': 'image/svg+xml'
            },
            '': '../frontend/dist',
            '/': '../frontend/dist/index.html',
        }
    )

    @app_fastapi.get("/api/hello")
    async def index():
        return {"result": "Index"}

    UPLOAD_DIR = "model_upload_dir"
    os.makedirs(UPLOAD_DIR, exist_ok=True)

    # Can colab receive post request "ONLY" at root path?
    @app_fastapi.post("/upload_model_file")
    async def upload_file(configFile: UploadFile = File(...), modelFile: UploadFile = File(...)):
        if configFile and modelFile:
            for file in [modelFile, configFile]:
                filename = file.filename
                fileobj = file.file
                upload_dir = open(os.path.join(UPLOAD_DIR, filename), 'wb+')
                shutil.copyfileobj(fileobj, upload_dir)
                upload_dir.close()
            namespace.loadModel(os.path.join(UPLOAD_DIR, configFile.filename), os.path.join(UPLOAD_DIR, modelFile.filename))
            return {"uploaded files": f"{configFile.filename}, {modelFile.filename} "}
        return {"Error": "uploaded file is not found."}

    @app_fastapi.post("/upload_file")
    async def post_upload_file(
        file: UploadFile = File(...),
        filename: str = Form(...)
    ):
        if file and filename:
            fileobj = file.file
            upload_dir = open(os.path.join(UPLOAD_DIR, filename), 'wb+')
            shutil.copyfileobj(fileobj, upload_dir)
            upload_dir.close()
            return {"uploaded files": f"{filename} "}
        return {"Error": "uploaded file is not found."}

    @app_fastapi.post("/load_model")
    async def post_load_model(
        modelFilename: str = Form(...),
        modelFilenameChunkNum: int = Form(...),
        configFilename: str = Form(...)
    ):
        target_file_name = modelFilename
        with open(os.path.join(UPLOAD_DIR, target_file_name), "ab") as target_file:
            for i in range(modelFilenameChunkNum):
                filename = f"{modelFilename}_{i}"
                chunk_file_path = os.path.join(UPLOAD_DIR, filename)
                stored_chunk_file = open(chunk_file_path, 'rb')
                target_file.write(stored_chunk_file.read())
                stored_chunk_file.close()
                os.unlink(chunk_file_path)
            target_file.close()

        print(f'File saved to: {target_file_name}')
        print(f'Load: {configFilename}, {target_file_name}')
        namespace.loadModel(os.path.join(UPLOAD_DIR, configFilename), os.path.join(UPLOAD_DIR, target_file_name))
        return {"File saved to": f"{target_file_name}"}

    @app_fastapi.get("/transcribe")
    def get_transcribe():
        try:
            namespace.transcribe()
        except Exception as e:
            print("TRANSCRIBE PROCESSING!!!! EXCEPTION!!!", e)
            print(traceback.format_exc())
            return str(e)

    @app_fastapi.post("/test")
    async def post_test(voice: VoiceModel):
        try:
            # print("POST REQUEST PROCESSING....")
            gpu = voice.gpu
            srcId = voice.srcId
            dstId = voice.dstId
            timestamp = voice.timestamp
            prefixChunkSize = voice.prefixChunkSize
            buffer = voice.buffer
            wav = base64.b64decode(buffer)

            if wav == 0:
                samplerate, data = read("dummy.wav")
                unpackedData = data
            else:
                unpackedData = np.array(struct.unpack('<%sh' % (len(wav) // struct.calcsize('<h')), wav))
                write("logs/received_data.wav", 24000, unpackedData.astype(np.int16))

            changedVoice = namespace.changeVoice(gpu, srcId, dstId, timestamp, prefixChunkSize, unpackedData)
            changedVoiceBase64 = base64.b64encode(changedVoice).decode('utf-8')
            data = {
                "gpu": gpu,
                "srcId": srcId,
                "dstId": dstId,
                "timestamp": timestamp,
                "prefixChunkSize": prefixChunkSize,
                "changedVoiceBase64": changedVoiceBase64
            }
            json_compatible_item_data = jsonable_encoder(data)
            return JSONResponse(content=json_compatible_item_data)
        except Exception as e:
            print("REQUEST PROCESSING!!!! EXCEPTION!!!", e)
            print(traceback.format_exc())
            return str(e)

if __name__ == '__mp_main__':
    printMessage(f"PHASE2:{__name__}", level=2)

if __name__ == '__main__':
    printMessage(f"PHASE1:{__name__}", level=2)

    PORT = args.p
    CONFIG = args.c
    MODEL = args.m

    printMessage(f"Start MMVC SocketIO Server", level=0)
    printMessage(f"CONFIG:{CONFIG}, MODEL:{MODEL}", level=1)

    if args.colab == False:
        if os.getenv("EX_PORT"):
            EX_PORT = os.environ["EX_PORT"]
            printMessage(f"External_Port:{EX_PORT} Internal_Port:{PORT}", level=1)
        else:
            printMessage(f"Internal_Port:{PORT}", level=1)

        if os.getenv("EX_IP"):
            EX_IP = os.environ["EX_IP"]
            printMessage(f"External_IP:{EX_IP}", level=1)

        # Create the HTTPS key/cert
        if args.https and args.httpsSelfSigned == 1:
            # HTTPS (generate a self-signed certificate)
            os.makedirs("./key", exist_ok=True)
            key_base_name = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}"
            keyname = f"{key_base_name}.key"
            certname = f"{key_base_name}.cert"
            create_self_signed_cert(certname, keyname, certargs=
                                    {"Country": "JP",
                                     "State": "Tokyo",
                                     "City": "Chuo-ku",
                                     "Organization": "F",
                                     "Org. Unit": "F"}, cert_dir="./key")
            key_path = os.path.join("./key", keyname)
            cert_path = os.path.join("./key", certname)
            printMessage(f"protocol: HTTPS(self-signed), key:{key_path}, cert:{cert_path}", level=1)
        elif args.https and args.httpsSelfSigned == 0:
            # HTTPS
            key_path = args.httpsKey
            cert_path = args.httpsCert
            printMessage(f"protocol: HTTPS, key:{key_path}, cert:{cert_path}", level=1)
        else:
            # HTTP
            printMessage(f"protocol: HTTP", level=1)

        # Print the addresses
        if args.https == 1:
            printMessage(f"open https://<IP>:<PORT>/ with your browser.", level=0)
        else:
            printMessage(f"open http://<IP>:<PORT>/ with your browser.", level=0)

        if EX_PORT and EX_IP and args.https == 1:
            printMessage(f"In many cases it is one of the following", level=1)
            printMessage(f"https://localhost:{EX_PORT}/", level=1)
            for ip in EX_IP.strip().split(" "):
                printMessage(f"https://{ip}:{EX_PORT}/", level=1)
        elif EX_PORT and EX_IP and args.https == 0:
            printMessage(f"In many cases it is one of the following", level=1)
            printMessage(f"http://localhost:{EX_PORT}/", level=1)

    # Start the server
    if args.https:
        # Start the HTTPS server
        uvicorn.run(
            f"{os.path.basename(__file__)[:-3]}:app_socketio",
            host="0.0.0.0",
            port=int(PORT),
            reload=True,
            ssl_keyfile=key_path,
            ssl_certfile=cert_path,
            log_level="critical"
        )
    else:
        # Start the HTTP server
        if args.colab == True:
            uvicorn.run(
                f"{os.path.basename(__file__)[:-3]}:app_fastapi",
                host="0.0.0.0",
                port=int(PORT),
                log_level="critical"
            )
        else:
            uvicorn.run(
                f"{os.path.basename(__file__)[:-3]}:app_socketio",
                host="0.0.0.0",
                port=int(PORT),
                reload=True,
                log_level="critical"
            )
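The `/test` endpoint above accepts a JSON body matching the `VoiceModel` schema and returns the converted audio base64-encoded. A minimal client sketch; the URL is an assumption (a local run on the default port 8080), and the speaker ids are the ones from the demo settings:

```python
import base64

import numpy as np
import requests

def change_voice(pcm: np.ndarray, url: str = "http://localhost:8080/test") -> np.ndarray:
    """Send 16-bit PCM samples to /test and return the converted samples."""
    payload = {
        "gpu": -1,               # -1 selects the CPU path
        "srcId": 107,
        "dstId": 100,
        "timestamp": 0,
        "prefixChunkSize": 24,
        # little-endian int16, base64-encoded, matching the '<h' unpack in the handler
        "buffer": base64.b64encode(pcm.astype("<i2").tobytes()).decode("utf-8"),
    }
    res = requests.post(url, json=payload).json()
    raw = base64.b64decode(res["changedVoiceBase64"])
    return np.frombuffer(raw, dtype="<i2")
```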

demo/mods/VoiceChanger.py Executable file

@@ -0,0 +1,87 @@
import torch
from scipy.io.wavfile import write, read
import numpy as np
import struct, traceback

import utils
import commons
from models import SynthesizerTrn
from text.symbols import symbols
from data_utils import TextAudioSpeakerLoader, TextAudioSpeakerCollate
from mel_processing import spectrogram_torch
from text import text_to_sequence, cleaned_text_to_sequence


class VoiceChanger():
    def __init__(self, config, model):
        self.hps = utils.get_hparams_from_file(config)
        self.net_g = SynthesizerTrn(
            len(symbols),
            self.hps.data.filter_length // 2 + 1,
            self.hps.train.segment_size // self.hps.data.hop_length,
            n_speakers=self.hps.data.n_speakers,
            **self.hps.model)
        self.net_g.eval()
        self.gpu_num = torch.cuda.device_count()
        utils.load_checkpoint(model, self.net_g, None)

        text_norm = text_to_sequence("a", self.hps.data.text_cleaners)
        text_norm = commons.intersperse(text_norm, 0)
        self.text_norm = torch.LongTensor(text_norm)
        self.audio_buffer = torch.zeros(1, 0)
        self.prev_audio = np.zeros(1)
        print(f"VoiceChanger Initialized (GPU_NUM:{self.gpu_num})")

    def destroy(self):
        del self.net_g

    def on_request(self, gpu, srcId, dstId, timestamp, prefixChunkSize, wav):
        unpackedData = wav
        convertSize = unpackedData.shape[0] + (prefixChunkSize * 512)

        try:
            audio = torch.FloatTensor(unpackedData.astype(np.float32))
            audio_norm = audio / self.hps.data.max_wav_value
            audio_norm = audio_norm.unsqueeze(0)
            self.audio_buffer = torch.cat([self.audio_buffer, audio_norm], axis=1)
            audio_norm = self.audio_buffer[:, -convertSize:]
            self.audio_buffer = audio_norm

            spec = spectrogram_torch(audio_norm, self.hps.data.filter_length,
                                     self.hps.data.sampling_rate, self.hps.data.hop_length, self.hps.data.win_length,
                                     center=False)
            spec = torch.squeeze(spec, 0)
            sid = torch.LongTensor([int(srcId)])

            data = (self.text_norm, spec, audio_norm, sid)
            data = TextAudioSpeakerCollate()([data])

            if gpu < 0 or self.gpu_num == 0:
                with torch.no_grad():
                    x, x_lengths, spec, spec_lengths, y, y_lengths, sid_src = [x.cpu() for x in data]
                    sid_tgt1 = torch.LongTensor([dstId]).cpu()
                    audio1 = (self.net_g.cpu().voice_conversion(spec, spec_lengths, sid_src=sid_src, sid_tgt=sid_tgt1)[0][0, 0].data * self.hps.data.max_wav_value).cpu().float().numpy()
            else:
                with torch.no_grad():
                    x, x_lengths, spec, spec_lengths, y, y_lengths, sid_src = [x.cuda(gpu) for x in data]
                    sid_tgt1 = torch.LongTensor([dstId]).cuda(gpu)
                    audio1 = (self.net_g.cuda(gpu).voice_conversion(spec, spec_lengths, sid_src=sid_src, sid_tgt=sid_tgt1)[0][0, 0].data * self.hps.data.max_wav_value).cpu().float().numpy()

            # if len(self.prev_audio) > unpackedData.shape[0]:
            #     prevLastFragment = self.prev_audio[-unpackedData.shape[0]:]
            #     curSecondLastFragment = audio1[-unpackedData.shape[0]*2:-unpackedData.shape[0]]
            #     print("prev, cur", prevLastFragment.shape, curSecondLastFragment.shape)
            # self.prev_audio = audio1
            # print("self.prev_audio", self.prev_audio.shape)

            audio1 = audio1[-unpackedData.shape[0]*2:]

        except Exception as e:
            print("VC PROCESSING!!!! EXCEPTION!!!", e)
            print(traceback.format_exc())

        audio1 = audio1.astype(np.int16)
        return audio1
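`on_request` above keeps a sliding context buffer: each incoming chunk is appended, the buffer is trimmed to the last `N + prefixChunkSize * 512` samples, and the last `2 * N` converted samples are returned so that consecutive chunks can be cross-faded. A toy sketch of just that buffering, with the VITS conversion step stubbed out:

```python
import numpy as np

prefixChunkSize = 24          # value from the demo settings
buffer = np.zeros(0, dtype=np.float32)

def push_chunk(chunk: np.ndarray) -> np.ndarray:
    """Append a chunk, trim the context buffer, return the cross-fade window."""
    global buffer
    convert_size = chunk.shape[0] + prefixChunkSize * 512
    buffer = np.concatenate([buffer, chunk])[-convert_size:]
    converted = buffer        # stands in for net_g.voice_conversion(...)
    return converted[-chunk.shape[0] * 2:]

for _ in range(16):
    out = push_chunk(np.ones(1024, dtype=np.float32))
# buffer has grown to convert_size samples; out spans the last two chunks
print(buffer.shape, out.shape)   # (13312,) (2048,)
```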

demo/mods/Whisper.py Executable file

@@ -0,0 +1,36 @@
import whisper
import numpy as np
import torchaudio
from scipy.io.wavfile import write

_MODELS = {
    "tiny": "/whisper/tiny.pt",
    "base": "/whisper/base.pt",
    "small": "/whisper/small.pt",
    "medium": "/whisper/medium.pt",
}


class Whisper():
    def __init__(self):
        self.storedSizeFromTry = 0

    def loadModel(self, model):
        # self.model = whisper.load_model(_MODELS[model], device="cpu")
        self.model = whisper.load_model(_MODELS[model])
        self.data = np.zeros(1).astype(np.float)

    def addData(self, unpackedData):
        self.data = np.concatenate([self.data, unpackedData], 0)

    def transcribe(self, audio):
        received_data_file = "received_data.wav"
        write(received_data_file, 24000, self.data.astype(np.int16))
        source, sr = torchaudio.load(received_data_file)
        target = torchaudio.functional.resample(source, 24000, 16000)
        result = self.model.transcribe(received_data_file)
        print("WHISPER1:::", result["text"])
        print("WHISPER2:::", result["segments"])
        self.data = np.zeros(1).astype(np.float)
        return result["text"]
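A usage sketch for the wrapper above, assuming the checkpoint files listed in `_MODELS` exist and audio arrives as a 24 kHz stream (note that `transcribe` ignores its `audio` argument and uses the accumulated buffer):

```python
import numpy as np
from mods.Whisper import Whisper  # the module shown above

w = Whisper()
w.loadModel("tiny")           # loads /whisper/tiny.pt per _MODELS
w.addData(np.zeros(24000))    # accumulate one second of (silent) 24 kHz audio
text = w.transcribe(0)        # writes received_data.wav, then transcribes it
print(text)
```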

demo/mods/ssl.py Executable file

@@ -0,0 +1,24 @@
import os
from OpenSSL import crypto


def create_self_signed_cert(certfile, keyfile, certargs, cert_dir="."):
    C_F = os.path.join(cert_dir, certfile)
    K_F = os.path.join(cert_dir, keyfile)
    if not os.path.exists(C_F) or not os.path.exists(K_F):
        k = crypto.PKey()
        k.generate_key(crypto.TYPE_RSA, 2048)
        cert = crypto.X509()
        cert.get_subject().C = certargs["Country"]
        cert.get_subject().ST = certargs["State"]
        cert.get_subject().L = certargs["City"]
        cert.get_subject().O = certargs["Organization"]
        cert.get_subject().OU = certargs["Org. Unit"]
        cert.get_subject().CN = 'Example'
        cert.set_serial_number(1000)
        cert.gmtime_adj_notBefore(0)
        cert.gmtime_adj_notAfter(315360000)
        cert.set_issuer(cert.get_subject())
        cert.set_pubkey(k)
        cert.sign(k, 'sha1')
        open(C_F, "wb").write(crypto.dump_certificate(crypto.FILETYPE_PEM, cert))
        open(K_F, "wb").write(crypto.dump_privatekey(crypto.FILETYPE_PEM, k))
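A usage sketch mirroring the call in MMVCServerSIO.py above; the resulting files are what the server hands to uvicorn via `ssl_keyfile`/`ssl_certfile`:

```python
import os
from mods.ssl import create_self_signed_cert  # the module shown above

os.makedirs("./key", exist_ok=True)
create_self_signed_cert(
    "test.cert", "test.key",
    certargs={"Country": "JP", "State": "Tokyo", "City": "Chuo-ku",
              "Organization": "F", "Org. Unit": "F"},
    cert_dir="./key",
)
print(os.path.exists("./key/test.cert"), os.path.exists("./key/test.key"))  # True True
```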

@@ -22,7 +22,12 @@ from mel_processing import spectrogram_torch
 from text import text_to_sequence, cleaned_text_to_sequence

 class MyCustomNamespace(socketio.Namespace):
-    def __init__(self, namespace, config, model):
+    def __init__(self, namespace):
+        super().__init__(namespace)
+        self.gpu_num = torch.cuda.device_count()
+        print("GPU_NUM:", self.gpu_num)
+
+    def __init__old(self, namespace, config, model):
         super().__init__(namespace)
         self.hps = utils.get_hparams_from_file(config)
         self.net_g = SynthesizerTrn(
@@ -36,12 +41,37 @@ class MyCustomNamespace(socketio.Namespace):
         print("GPU_NUM:", self.gpu_num)
         utils.load_checkpoint(model, self.net_g, None)

+    def loadModel(self, config, model):
+        self.hps = utils.get_hparams_from_file(config)
+        print("before DELETE:", torch.cuda.memory_allocated())
+        if hasattr(self, 'net_g') == True:
+            print("DELETE MODEL:", torch.cuda.memory_allocated())
+            del self.net_g
+        print("before load", torch.cuda.memory_allocated())
+        self.net_g = SynthesizerTrn(
+            len(symbols),
+            self.hps.data.filter_length // 2 + 1,
+            self.hps.train.segment_size // self.hps.data.hop_length,
+            n_speakers=self.hps.data.n_speakers,
+            **self.hps.model)
+        self.net_g.eval()
+        utils.load_checkpoint(model, self.net_g, None)
+        print(torch.cuda.memory_allocated())
+        print("after load", torch.cuda.memory_allocated())
+
     def on_connect(self, sid, environ):
         print('[{}] connet sid : {}'.format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), sid))
         # print('[{}] connet env : {}'.format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), environ))

+    def on_load_model(self, sid, msg):
+        print("on_load_model")
+        print(msg)
+        pass
+
     def on_request_message(self, sid, msg):
         # print("MESSGaa", msg)
+        print("on_request_message", torch.cuda.memory_allocated())
         gpu = int(msg[0])
         srcId = int(msg[1])
         dstId = int(msg[2])
@@ -223,7 +253,17 @@ if __name__ == '__main__':
     # SocketIO setup
     sio = socketio.Server(cors_allowed_origins='*')
-    sio.register_namespace(MyCustomNamespace('/test', CONFIG, MODEL))
+    namespace = MyCustomNamespace('/test')
+    sio.register_namespace(namespace)
+    print("loadmodel1:")
+    namespace.loadModel(CONFIG, MODEL)
+    print("loadmodel2:")
+    namespace.loadModel(CONFIG, MODEL)
+    print("loadmodel3:")
+    namespace.loadModel(CONFIG, MODEL)
+    print("loadmodel4:")
+    namespace.loadModel(CONFIG, MODEL)
+    print("loadmodel5:")
     app = socketio.WSGIApp(sio, static_files={
         '': '../frontend/dist',
         '/': '../frontend/dist/index.html',
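MMVCServerSIO.py above reassembles a model that was posted in pieces named `<modelFilename>_<i>` via `/upload_file`; `/load_model` then concatenates the pieces and loads the result. A client-side sketch of that protocol, assuming a local server and a 1 MiB chunk size (the filenames G_326000.pth and config.json are the ones from the notebook log):

```python
import requests

URL = "http://localhost:8080"   # assumed local run
CHUNK = 1024 * 1024             # 1 MiB per piece (assumed)

def upload_model(model_path: str, config_path: str) -> None:
    # config goes up in one piece
    with open(config_path, "rb") as f:
        requests.post(f"{URL}/upload_file",
                      files={"file": f}, data={"filename": "config.json"})
    # model goes up as numbered pieces that /load_model reassembles
    n = 0
    with open(model_path, "rb") as f:
        while piece := f.read(CHUNK):
            requests.post(f"{URL}/upload_file",
                          files={"file": ("blob", piece)},
                          data={"filename": f"G_326000.pth_{n}"})
            n += 1
    requests.post(f"{URL}/load_model",
                  data={"modelFilename": "G_326000.pth",
                        "modelFilenameChunkNum": n,
                        "configFilename": "config.json"})
```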

@@ -12,32 +12,17 @@ if [[ -e ./setting.json ]]; then
     echo "Using custom settings"
     cp ./setting.json ../frontend/dist/assets/setting.json
 else
-    if [ "${TYPE}" = "SOFT_VC" ] ; then
-        cp ../frontend/dist/assets/setting_softvc.json ../frontend/dist/assets/setting.json
-    elif [ "${TYPE}" = "SOFT_VC_FAST_API" ] ; then
-        cp ../frontend/dist/assets/setting_softvc_colab.json ../frontend/dist/assets/setting.json
-    else
-        cp ../frontend/dist/assets/setting_mmvc.json ../frontend/dist/assets/setting.json
-    fi
+    cp ../frontend/dist/assets/setting_mmvc.json ../frontend/dist/assets/setting.json
 fi

 # Launch
-if [ "${TYPE}" = "SOFT_VC" ] ; then
-    echo "Starting SOFT_VC"
-    python3 SoftVcServerSIO.py $PARAMS 2>stderr.txt
-elif [ "${TYPE}" = "SOFT_VC_VERBOSE" ] ; then
-    echo "Starting SOFT_VC (verbose)"
-    python3 SoftVcServerSIO.py $PARAMS
-elif [ "${TYPE}" = "SOFT_VC_FAST_API" ] ; then
-    echo "Starting SOFT_VC_FAST_API"
-    python3 SoftVcServerFastAPI.py 8080 docker
-elif [ "${TYPE}" = "MMVC" ] ; then
+if [ "${TYPE}" = "MMVC" ] ; then
     echo "Starting MMVC"
-    python3 serverSIO.py $PARAMS 2>stderr.txt
+    python3 MMVCServerSIO.py $PARAMS 2>stderr.txt
 elif [ "${TYPE}" = "MMVC_VERBOSE" ] ; then
     echo "Starting MMVC (verbose)"
-    python3 serverSIO.py $PARAMS
+    python3 MMVCServerSIO.py $PARAMS
 fi

@@ -6,21 +6,44 @@
     "buffer_size": 1024,
     "prefix_chunk_size": 24,
     "chunk_size": 24,
-    "speaker_ids": [100, 107, 101, 102, 103],
-    "speaker_names": ["ずんだもん", "user", "そら", "めたん", "つむぎ"],
+    "speakers": [
+        {
+            "id": 100,
+            "name": "ずんだもん"
+        },
+        {
+            "id": 107,
+            "name": "user"
+        },
+        {
+            "id": 101,
+            "name": "そら"
+        },
+        {
+            "id": 102,
+            "name": "めたん"
+        },
+        {
+            "id": 103,
+            "name": "つむぎ"
+        }
+    ],
     "src_id": 107,
     "dst_id": 100,
     "vf_enable": true,
     "voice_changer_mode": "realtime",
     "gpu": 0,
     "available_gpus": [-1, 0, 1, 2, 3, 4],
+    "screen": {
+        "enable_screen": true,
+        "backgournd_image_url": "./assets/images/bg_natural_sougen.jpg"
+    },
     "avatar": {
-        "enable_avatar": true,
-        "motion_capture_face": true,
-        "motion_capture_upperbody": true,
-        "lip_overwrite_with_voice": true,
+        "enable_avatar": false,
+        "motion_capture_face": false,
+        "motion_capture_upperbody": false,
+        "lip_overwrite_with_voice": false,
         "avatar_url": "./assets/vrm/zundamon/zundamon.vrm",
         "backgournd_image_url": "./assets/images/bg_natural_sougen.jpg",
         "background_color": "#0000dd",
         "chroma_key": "#0000dd",
         "avatar_canvas_size": [1280, 720],
@@ -34,5 +57,9 @@
         "cross_fade_offset_rate": 0.3,
         "cross_fade_end_rate": 0.6,
         "cross_fade_type": 2
-    }
+    },
+    "transcribe": {
+        "lang": "日本語(ja-JP)",
+        "expire_time": 5
+    }
 }
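The settings schema change above replaces the parallel `speaker_ids`/`speaker_names` arrays with a single `speakers` list of objects. A short sketch of migrating an old settings dict to the new shape:

```python
# The parallel speaker_ids / speaker_names arrays become one speakers list.
old = {
    "speaker_ids": [100, 107, 101, 102, 103],
    "speaker_names": ["ずんだもん", "user", "そら", "めたん", "つむぎ"],
}
speakers = [
    {"id": i, "name": n}
    for i, n in zip(old["speaker_ids"], old["speaker_names"])
]
# -> [{'id': 100, 'name': 'ずんだもん'}, {'id': 107, 'name': 'user'}, ...]
```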

@@ -6,21 +6,44 @@
     "buffer_size": 1024,
     "prefix_chunk_size": 24,
     "chunk_size": 24,
-    "speaker_ids": [100, 107, 101, 102, 103],
-    "speaker_names": ["ずんだもん", "user", "そら", "めたん", "つむぎ"],
+    "speakers": [
+        {
+            "id": 100,
+            "name": "ずんだもん"
+        },
+        {
+            "id": 107,
+            "name": "user"
+        },
+        {
+            "id": 101,
+            "name": "そら"
+        },
+        {
+            "id": 102,
+            "name": "めたん"
+        },
+        {
+            "id": 103,
+            "name": "つむぎ"
+        }
+    ],
     "src_id": 107,
     "dst_id": 100,
     "vf_enable": true,
     "voice_changer_mode": "realtime",
     "gpu": 0,
     "available_gpus": [-1, 0, 1, 2, 3, 4],
+    "screen": {
+        "enable_screen": true,
+        "backgournd_image_url": "./assets/images/bg_natural_sougen.jpg"
+    },
     "avatar": {
-        "enable_avatar": true,
-        "motion_capture_face": true,
-        "motion_capture_upperbody": true,
-        "lip_overwrite_with_voice": true,
+        "enable_avatar": false,
+        "motion_capture_face": false,
+        "motion_capture_upperbody": false,
+        "lip_overwrite_with_voice": false,
         "avatar_url": "./assets/vrm/zundamon/zundamon.vrm",
         "backgournd_image_url": "./assets/images/bg_natural_sougen.jpg",
         "background_color": "#0000dd",
         "chroma_key": "#0000dd",
         "avatar_canvas_size": [1280, 720],
@@ -34,5 +57,9 @@
         "cross_fade_offset_rate": 0.3,
         "cross_fade_end_rate": 0.6,
         "cross_fade_type": 2
-    }
+    },
+    "transcribe": {
+        "lang": "日本語(ja-JP)",
+        "expire_time": 5
+    }
 }

File diff suppressed because one or more lines are too long

@@ -1,7 +1,7 @@
 #!/bin/bash
 set -eu

-DOCKER_IMAGE=dannadori/voice-changer:20221028_220714
+DOCKER_IMAGE=dannadori/voice-changer:20221103_180651
 #DOCKER_IMAGE=voice-changer

@@ -75,28 +75,11 @@ elif [ "${MODE}" = "MMVC" ]; then
 #            -p ${EX_PORT}:8080 $DOCKER_IMAGE /bin/bash
     fi
-elif [ "${MODE}" = "SOFT_VC" ]; then
-    if [ "${USE_GPU}" = "on" ]; then
-        echo "Start Soft-vc"
-        docker run -it --gpus all --shm-size=128M \
-            -v `pwd`/vc_resources:/resources \
-            -e LOCAL_UID=$(id -u $USER) \
-            -e LOCAL_GID=$(id -g $USER) \
-            -e EX_IP="`hostname -I`" \
-            -e EX_PORT=${EX_PORT} \
-            -e VERBOSE=${VERBOSE} \
-            -p ${EX_PORT}:8080 $DOCKER_IMAGE "$@"
-    else
-        echo "Start Soft-vc without GPU is not supported"
-    fi
 else
     echo "
 usage:
     $0 <MODE> <params...>
-    MODE: select one of ['MMVC_TRAIN', 'MMVC', 'SOFT_VC']
+    MODE: select one of ['MMVC_TRAIN', 'MMVC']
 " >&2
 fi

start_v0.1.sh Normal file

@@ -0,0 +1,314 @@
#!/bin/bash
set -eu

DOCKER_IMAGE=dannadori/voice-changer:20221028_220714
#DOCKER_IMAGE=voice-changer

MODE=$1
PARAMS=${@:2:($#-1)}

### DEFAULT VAR ###
DEFAULT_EX_PORT=18888
DEFAULT_USE_GPU=on # on|off
DEFAULT_VERBOSE=off # on|off

### ENV VAR ###
EX_PORT=${EX_PORT:-${DEFAULT_EX_PORT}}
USE_GPU=${USE_GPU:-${DEFAULT_USE_GPU}}
VERBOSE=${VERBOSE:-${DEFAULT_VERBOSE}}

#echo $EX_PORT $USE_GPU $VERBOSE

### INTERNAL SETTING ###
TENSORBOARD_PORT=6006
SIO_PORT=8080

###

if [ "${MODE}" = "MMVC_TRAIN" ]; then
    echo "Starting training"
    docker run -it --gpus all --shm-size=128M \
        -v `pwd`/exp/${name}/dataset:/MMVC_Trainer/dataset \
        -v `pwd`/exp/${name}/logs:/MMVC_Trainer/logs \
        -v `pwd`/exp/${name}/filelists:/MMVC_Trainer/filelists \
        -v `pwd`/vc_resources:/resources \
        -e LOCAL_UID=$(id -u $USER) \
        -e LOCAL_GID=$(id -g $USER) \
        -e EX_IP="`hostname -I`" \
        -e EX_PORT=${EX_PORT} \
        -e VERBOSE=${VERBOSE} \
        -p ${EX_PORT}:6006 $DOCKER_IMAGE "$@"
elif [ "${MODE}" = "MMVC" ]; then
    if [ "${USE_GPU}" = "on" ]; then
        echo "Starting MMVC (with gpu)"
        docker run -it --gpus all --shm-size=128M \
            -v `pwd`/vc_resources:/resources \
            -e LOCAL_UID=$(id -u $USER) \
            -e LOCAL_GID=$(id -g $USER) \
            -e EX_IP="`hostname -I`" \
            -e EX_PORT=${EX_PORT} \
            -e VERBOSE=${VERBOSE} \
            -p ${EX_PORT}:8080 $DOCKER_IMAGE "$@"
    else
        echo "Starting MMVC (only cpu)"
        docker run -it --shm-size=128M \
            -v `pwd`/vc_resources:/resources \
            -e LOCAL_UID=$(id -u $USER) \
            -e LOCAL_GID=$(id -g $USER) \
            -e EX_IP="`hostname -I`" \
            -e EX_PORT=${EX_PORT} \
            -e VERBOSE=${VERBOSE} \
            -p ${EX_PORT}:8080 $DOCKER_IMAGE "$@"
        # docker run -it --shm-size=128M \
        #     -v `pwd`/vc_resources:/resources \
        #     -e LOCAL_UID=$(id -u $USER) \
        #     -e LOCAL_GID=$(id -g $USER) \
        #     -e EX_IP="`hostname -I`" \
        #     -e EX_PORT=${EX_PORT} \
        #     -e VERBOSE=${VERBOSE} \
        #     --entrypoint="" \
        #     -p ${EX_PORT}:8080 $DOCKER_IMAGE /bin/bash
    fi
elif [ "${MODE}" = "SOFT_VC" ]; then
    if [ "${USE_GPU}" = "on" ]; then
        echo "Start Soft-vc"
        docker run -it --gpus all --shm-size=128M \
            -v `pwd`/vc_resources:/resources \
            -e LOCAL_UID=$(id -u $USER) \
            -e LOCAL_GID=$(id -g $USER) \
            -e EX_IP="`hostname -I`" \
            -e EX_PORT=${EX_PORT} \
            -e VERBOSE=${VERBOSE} \
            -p ${EX_PORT}:8080 $DOCKER_IMAGE "$@"
    else
        echo "Start Soft-vc without GPU is not supported"
    fi
else
    echo "
usage:
    $0 <MODE> <params...>
    MODE: select one of ['MMVC_TRAIN', 'MMVC', 'SOFT_VC']
" >&2
fi
# echo $EX_PORT
# echo "------"
# echo "$@"
# echo "------"

# # usage() {
# #     echo "
# # usage:
# #     For training
# #         $0 [-t] -n <exp_name> [-b batch_size] [-r]
# #             -t: Specify this to run in training mode. (train)
# #             -n: Name of the training run. (name)
# #             -b: Batch size. (batchsize)
# #             -r: Specify this when resuming training. (resume)
# #     For changing voice
# #         $0 [-v] [-c config] [-m model] [-g on/off]
# #             -v: Specify this to run in voice changer mode. (voice changer)
# #             -c: Filename of the config used for training. (config)
# #             -m: Filename of the trained model. (model)
# #             -g: Use/don't use GPU. Defaults to on, so no need to specify it when using a GPU. (gpu)
# #             -p: Port number
# #     For help
# #         $0 [-h]
# #             -h: show this help
# #     " >&2
# # }
# # warn () {
# #     echo "! ! ! $1 ! ! !"
# #     exit 1
# # }

# # training_flag=false
# # name=999_exp
# # batch_size=10
# # resume_flag=false

# # voice_change_flag=false
# # config=
# # model=
# # gpu=on
# # port=8080
# # escape_flag=false

# # # Parse options
# # while getopts tn:b:rvc:m:g:p:hx OPT; do
# #     case $OPT in
# #     t)
# #         training_flag=true
# #         ;;
# #     n)
# #         name="$OPTARG"
# #         ;;
# #     b)
# #         batch_size="$OPTARG"
# #         ;;
# #     r)
# #         resume_flag=true
# #         ;;
# #     v)
# #         voice_change_flag=true
# #         ;;
# #     c)
# #         config="$OPTARG"
# #         ;;
# #     m)
# #         model="$OPTARG"
# #         ;;
# #     g)
# #         gpu="$OPTARG"
# #         ;;
# #     p)
# #         port="$OPTARG"
# #         ;;
# #     h | \?)
# #         usage && exit 1
# #         ;;
# #     x)
# #         escape_flag=true
# #     esac
# # done

# # # Determine the mode
# # if $training_flag && $voice_change_flag; then
# #     warn "-t training mode and -v voice changer mode cannot be specified at the same time."
# # elif $training_flag; then
# #     echo "■■■ TRAINING MODE ■■■"
# # elif $voice_change_flag; then
# #     echo "■■■ VOICE CHANGER MODE ■■■"
# # elif $escape_flag; then
# #     /bin/bash
# # else
# #     warn "Specify either -t training mode or -v voice changer mode."
# # fi

# if [ "${MODE}" = "MMVC_TRAIN_INITIAL" ]; then
#     echo "Starting training"
# elif [ "${MODE}" = "MMVC" ]; then
#     echo "Starting MMVC"
#     docker run -it --gpus all --shm-size=128M \
#         -v `pwd`/vc_resources:/resources \
#         -e LOCAL_UID=$(id -u $USER) \
#         -e LOCAL_GID=$(id -g $USER) \
#         -e EX_IP="`hostname -I`" \
#         -e EX_PORT=${port} \
#         -p ${port}:8080 $DOCKER_IMAGE -v -c ${config} -m ${model}
# elif [ "${MODE}" = "MMVC_VERBOSE" ]; then
#     echo "Starting MMVC (verbose)"
# elif [ "${MODE}" = "MMVC_CPU" ]; then
#     echo "Starting MMVC (CPU)"
# elif [ "${MODE}" = "MMVC_CPU_VERBOSE" ]; then
#     echo "Starting MMVC (CPU)(verbose)"
# elif [ "${MODE}" = "SOFT_VC" ]; then
#     echo "Start Soft-vc"
# elif [ "${MODE}" = "SOFT_VC_VERBOSE" ]; then
#     echo "Start Soft-vc(verbose)"
# else
#     echo "
# usage:
#     $0 <MODE> <params...>
#     EX_PORT:
#     MODE: one of ['MMVC_TRAIN', 'MMVC', 'SOFT_VC']
#     For 'MMVC_TRAIN':
#         $0 MMVC_TRAIN_INITIAL -n <exp_name> [-b batch_size] [-r]
#             -n: Name of the training run. (name)
#             -b: Batch size. (batchsize)
#             -r: Specify this when resuming training. (resume)
#     For 'MMVC'
#         $0 MMVC [-c config] [-m model] [-g on/off] [-p port] [-v]
#             -c: Filename of the config used for training. (config)
#             -m: Filename of the trained model. (model)
#             -g: Use/don't use GPU. Defaults to on, so no need to specify it when using a GPU. (gpu)
#             -p: Port number exposed from Docker
#             -v: verbose
#     For 'SOFT_VC'
#         $0 SOFT_VC [-c config] [-m model] [-g on/off]
#             -p: port exposed from docker container.
#             -v: verbose
#     " >&2
# fi

# # if $training_flag; then
# #     if $resume_flag; then
# #         echo "Resuming training"
# #         docker run -it --gpus all --shm-size=128M \
# #             -v `pwd`/exp/${name}/dataset:/MMVC_Trainer/dataset \
# #             -v `pwd`/exp/${name}/logs:/MMVC_Trainer/logs \
# #             -v `pwd`/exp/${name}/filelists:/MMVC_Trainer/filelists \
# #             -v `pwd`/vc_resources:/resources \
# #             -e LOCAL_UID=$(id -u $USER) \
# #             -e LOCAL_GID=$(id -g $USER) \
# #             -p ${TENSORBOARD_PORT}:6006 $DOCKER_IMAGE -t -b ${batch_size} -r
# #     else
# #         echo "Starting training"
# #         docker run -it --gpus all --shm-size=128M \
# #             -v `pwd`/exp/${name}/dataset:/MMVC_Trainer/dataset \
# #             -v `pwd`/exp/${name}/logs:/MMVC_Trainer/logs \
# #             -v `pwd`/exp/${name}/filelists:/MMVC_Trainer/filelists \
# #             -v `pwd`/vc_resources:/resources \
# #             -e LOCAL_UID=$(id -u $USER) \
# #             -e LOCAL_GID=$(id -g $USER) \
# #             -p ${TENSORBOARD_PORT}:6006 $DOCKER_IMAGE -t -b ${batch_size}
# #     fi
# # fi

# # if $voice_change_flag; then
# #     if [[ -z "$config" ]]; then
# #         warn "Specify the config file (-c)."
# #     fi
# #     if [[ -z "$model" ]]; then
# #         warn "Specify the model file (-m)."
# #     fi
# #     if [ "${gpu}" = "on" ]; then
# #         echo "Starting with the GPU mounted."
# #         docker run -it --gpus all --shm-size=128M \
# #             -v `pwd`/vc_resources:/resources \
# #             -e LOCAL_UID=$(id -u $USER) \
# #             -e LOCAL_GID=$(id -g $USER) \
# #             -e EX_IP="`hostname -I`" \
# #             -e EX_PORT=${port} \
# #             -p ${port}:8080 $DOCKER_IMAGE -v -c ${config} -m ${model}
# #     elif [ "${gpu}" = "off" ]; then
# #         echo "Running on CPU only. The GPU cannot be used."
# #         docker run -it --shm-size=128M \
# #             -v `pwd`/vc_resources:/resources \
# #             -e LOCAL_UID=$(id -u $USER) \
# #             -e LOCAL_GID=$(id -g $USER) \
# #             -e EX_IP="`hostname -I`" \
# #             -e EX_PORT=${port} \
# #             -p ${port}:8080 $DOCKER_IMAGE -v -c ${config} -m ${model}
# #     else
# #         echo ${gpu}
# #         warn "Specify -g as on or off."
# #     fi
# # fi

@@ -1,26 +1,49 @@
 {
     "app_title": "voice-changer",
     "majar_mode": "docker",
-    "voice_changer_server_url": "./test",
+    "voice_changer_server_url": "/test",
     "sample_rate": 48000,
     "buffer_size": 1024,
     "prefix_chunk_size": 24,
     "chunk_size": 24,
-    "speaker_ids": [100, 107, 101, 102, 103],
-    "speaker_names": ["ずんだもん", "user", "そら", "めたん", "つむぎ"],
+    "speakers": [
+        {
+            "id": 100,
+            "name": "ずんだもん"
+        },
+        {
+            "id": 107,
+            "name": "user"
+        },
+        {
+            "id": 101,
+            "name": "そら"
+        },
+        {
+            "id": 102,
+            "name": "めたん"
+        },
+        {
+            "id": 103,
+            "name": "つむぎ"
+        }
+    ],
     "src_id": 107,
     "dst_id": 100,
     "vf_enable": true,
     "voice_changer_mode": "realtime",
     "gpu": 0,
     "available_gpus": [-1, 0, 1, 2, 3, 4],
+    "screen": {
+        "enable_screen": true,
+        "backgournd_image_url": "./assets/images/bg_natural_sougen.jpg"
+    },
     "avatar": {
-        "enable_avatar": true,
-        "motion_capture_face": true,
-        "motion_capture_upperbody": true,
-        "lip_overwrite_with_voice": true,
+        "enable_avatar": false,
+        "motion_capture_face": false,
+        "motion_capture_upperbody": false,
+        "lip_overwrite_with_voice": false,
         "avatar_url": "./assets/vrm/zundamon/zundamon.vrm",
         "backgournd_image_url": "./assets/images/bg_natural_sougen.jpg",
         "background_color": "#0000dd",
         "chroma_key": "#0000dd",
         "avatar_canvas_size": [1280, 720],
@@ -34,5 +57,9 @@
         "cross_fade_offset_rate": 0.3,
         "cross_fade_end_rate": 0.6,
         "cross_fade_type": 2
-    }
+    },
+    "transcribe": {
+        "lang": "日本語(ja-JP)",
+        "expire_time": 5
+    }
 }

@@ -4,23 +4,46 @@
     "voice_changer_server_url": "/test",
     "sample_rate": 48000,
     "buffer_size": 1024,
-    "prefix_chunk_size": 36,
-    "chunk_size": 36,
-    "speaker_ids": [100, 107, 101, 102, 103],
-    "speaker_names": ["ずんだもん", "user", "そら", "めたん", "つむぎ"],
+    "prefix_chunk_size": 24,
+    "chunk_size": 24,
+    "speakers": [
+        {
+            "id": 100,
+            "name": "ずんだもん"
+        },
+        {
+            "id": 107,
+            "name": "user"
+        },
+        {
+            "id": 101,
+            "name": "そら"
+        },
+        {
+            "id": 102,
+            "name": "めたん"
+        },
+        {
+            "id": 103,
+            "name": "つむぎ"
+        }
+    ],
     "src_id": 107,
     "dst_id": 100,
     "vf_enable": true,
     "voice_changer_mode": "realtime",
     "gpu": 0,
     "available_gpus": [-1, 0, 1, 2, 3, 4],
+    "screen": {
+        "enable_screen": true,
+        "backgournd_image_url": "./assets/images/bg_natural_sougen.jpg"
+    },
     "avatar": {
-        "enable_avatar": true,
-        "motion_capture_face": true,
-        "motion_capture_upperbody": true,
-        "lip_overwrite_with_voice": true,
+        "enable_avatar": false,
+        "motion_capture_face": false,
+        "motion_capture_upperbody": false,
+        "lip_overwrite_with_voice": false,
         "avatar_url": "./assets/vrm/zundamon/zundamon.vrm",
         "backgournd_image_url": "./assets/images/bg_natural_sougen.jpg",
         "background_color": "#0000dd",
         "chroma_key": "#0000dd",
         "avatar_canvas_size": [1280, 720],
@@ -34,5 +57,9 @@
         "cross_fade_offset_rate": 0.3,
         "cross_fade_end_rate": 0.6,
         "cross_fade_type": 2
-    }
+    },
+    "transcribe": {
+        "lang": "日本語(ja-JP)",
+        "expire_time": 5
+    }
 }

@@ -1,4 +1,4 @@
-FROM dannadori/voice-changer-internal:20221028_220538 as front
+FROM dannadori/voice-changer-internal:20221103_180551 as front
 FROM debian:bullseye-slim as base

 ARG DEBIAN_FRONTEND=noninteractive
@@ -8,7 +8,7 @@ RUN apt-get install -y python3-pip git
 RUN apt-get install -y espeak
 RUN apt-get install -y cmake

-RUN git clone --depth 1 https://github.com/isletennos/MMVC_Trainer.git -b v1.3.1.3
+#RUN git clone --depth 1 https://github.com/isletennos/MMVC_Trainer.git -b v1.3.1.3

 RUN pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
@@ -24,17 +24,20 @@ RUN pip install tqdm==4.64.0
 RUN pip install retry==0.9.2
 RUN pip install psutil==5.9.1
 RUN pip install python-socketio==5.7.1
-RUN pip install eventlet==0.33.1
-RUN pip install matplotlib==3.5.3
+RUN pip install fastapi==0.85.0
+RUN pip install python-multipart==0.0.5
+RUN pip install uvicorn==0.18.3
+RUN pip install websockets==10.4
+RUN pip install pyOpenSSL==22.0.0
 RUN pip install pyopenjtalk==0.2.0
 RUN pip install tensorboard==2.10.0
+RUN pip install matplotlib==3.5.3
-RUN pip install pyOpenSSL==22.0.0

-WORKDIR /MMVC_Trainer/monotonic_align
-RUN cythonize -3 -i core.pyx \
-    && mv core.cpython-39-x86_64-linux-gnu.so monotonic_align/
+# WORKDIR /MMVC_Trainer/monotonic_align
+# RUN cythonize -3 -i core.pyx \
+#     && mv core.cpython-39-x86_64-linux-gnu.so monotonic_align/

 FROM debian:bullseye-slim
@@ -64,12 +67,11 @@ COPY --from=front --chmod=777 /voice-changer-internal/frontend/dist /voice-chang
 COPY --from=front --chmod=777 /voice-changer-internal/voice-change-service /voice-changer-internal/voice-change-service
 RUN chmod 0777 /voice-changer-internal/voice-change-service

-##### Soft VC
-COPY --from=front /hubert /hubert
-COPY --from=front /acoustic-model /acoustic-model
-COPY --from=front /hifigan /hifigan
-COPY --from=front /models /models
+# ##### Soft VC
+# COPY --from=front /hubert /hubert
+# COPY --from=front /acoustic-model /acoustic-model
+# COPY --from=front /hifigan /hifigan
+# COPY --from=front /models /models

 ENTRYPOINT ["/bin/bash", "setup.sh"]

@@ -17,23 +17,7 @@ echo "------"

 # Launch
-if [ "${MODE}" = "SOFT_VC" ] ; then
-    cd /voice-changer-internal/voice-change-service
-    cp -r /resources/* .
-    if [[ -e ./setting.json ]]; then
-        cp ./setting.json ../frontend/dist/assets/setting.json
-    else
-        cp ../frontend/dist/assets/setting_softvc.json ../frontend/dist/assets/setting.json
-    fi
-    if [ "${VERBOSE}" = "on" ]; then
-        echo "Starting SOFT_VC (verbose)"
-        python3 SoftVcServerSIO.py $PARAMS
-    else
-        echo "Starting SOFT_VC"
-        python3 SoftVcServerSIO.py $PARAMS 2>stderr.txt
-    fi
-elif [ "${MODE}" = "MMVC" ] ; then
+if [ "${MODE}" = "MMVC" ] ; then
     cd /voice-changer-internal/voice-change-service
     cp -r /resources/* .
@@ -45,10 +29,10 @@ elif [ "${MODE}" = "MMVC" ] ; then
     if [ "${VERBOSE}" = "on" ]; then
         echo "Starting MMVC (verbose)"
-        python3 serverSIO.py $PARAMS
+        python3 MMVCServerSIO.py $PARAMS
     else
         echo "Starting MMVC"
-        python3 serverSIO.py $PARAMS 2>stderr.txt
+        python3 MMVCServerSIO.py $PARAMS 2>stderr.txt
     fi
 elif [ "${MODE}" = "MMVC_TRAIN" ] ; then
     python3 create_dataset_jtalk.py -f train_config -s 24000 -m dataset/multi_speaker_correspondence.txt