diff --git a/VoiceRecorder.ipynb b/VoiceRecorder.ipynb new file mode 100644 index 00000000..30ec9d9d --- /dev/null +++ b/VoiceRecorder.ipynb @@ -0,0 +1,646 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [], + "collapsed_sections": [], + "authorship_tag": "ABX9TyNuz5ToQB/hiwJTFCBOyGT/", + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + }, + "gpuClass": "standard" + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "source": [ + "Voice Recorder\n", + "---\n", + "\n", + "This notebook launches \"Voice Recorder\", an app for recording audio for MMVC training, from Colab.\n", + "\n", + "Recorded audio can be uploaded to Google Drive through this notebook.\n", + "\n", + "As with the original Voice Recorder, you can also download it to your local PC.\n", + "\n", + "After each recording, the browser exchanges data with the server on Colab, so updates lag slightly.\n", + "\n", + "If you plan to train on your own PC, recording on the Colab server offers almost no benefit; for a smoother recording experience, use the [Voice Recorder hosted on GitHub](https://w-okada.github.io/voice-changer/) instead.\n", + "\n", + "\n", + "For more details, see the [repository](https://github.com/w-okada/voice-changer).\n" + ], + "metadata": { + "id": "Lbbmx_Vjl0zo" + } + }, + { + "cell_type": "markdown", + "source": [ + "# Specify the folders for recorded data\n", + "\n", + "You need to specify the following two folders.\n", + "1. The cache data folder for the recording app\n", + "2. 
The training data folder\n", + "\n", + "Recorded data is normally stored in a folder on Google Drive.\n", + "\n", + "First, run (1-1) to mount the drive.\n", + "\n", + "Then specify the folders above in (1-2)." + ], + "metadata": { + "id": "mHvGrgaWnIPA" + } + }, + { + "cell_type": "code", + "source": [ + "# (1-1) Mount Google Drive\n", + "from google.colab import drive\n", + "drive.mount('/content/drive')" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Eihm8H2X-7wm", + "outputId": "e51016e6-7f6e-4b95-8822-a4713017a6a6" + }, + "execution_count": 1, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Mounted at /content/drive\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "# (1-2) Specify the recorder cache folder and the training data folder\n", + "RECORDER_DATA_DIR=\"/content/drive/MyDrive/VoiceChanger/voice_data\"\n", + "MMVC_DATA_DIR=\"/content/drive/MyDrive/VoiceChanger/dataset\"\n" + ], + "metadata": { + "id": "nSXATMWYb4Ik" + }, + "execution_count": 2, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "# Clone the repository\n", + "Clone the repository." + ], + "metadata": { + "id": "sLBfykjBnjWc" + } + }, + { + "cell_type": "code", + "source": [ + "# (2) Clone the repository\n", + "!git clone https://github.com/w-okada/voice-changer.git\n", + "%cd voice-changer/docs/\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "86wTFmqsNMnD", + "outputId": "63e02151-2e55-49f3-8219-ba16cbb28233" + }, + "execution_count": 3, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Cloning into 'voice-changer'...\n", + "remote: Enumerating objects: 499, done.\u001b[K\n", + "remote: Counting objects: 100% (83/83), done.\u001b[K\n", + "remote: Compressing objects: 100% (65/65), done.\u001b[K\n", + "remote: Total 499 (delta 26), reused 30 (delta 18), pack-reused 416\u001b[K\n", + "Receiving objects: 100% (499/499), 21.10 MiB | 13.43 MiB/s, done.\n", + "Resolving deltas: 100% (253/253), done.\n", + 
"/content/voice-changer/docs\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "# ファイルの配置\n", + "アプリケーションの挙動を記した設定ファイルをコピーします(3-1)。(3-2)はコピーした設定ファイルを表示しています。もしかしたらうまく動かないときに役立つかもしれません。" + ], + "metadata": { + "id": "jmDY8W_fnuSi" + } + }, + { + "cell_type": "code", + "source": [ + "# (3-1) 設定ファイルのコピー\n", + "!cp ../template/setting_recorder_colab.json assets/setting.json" + ], + "metadata": { + "id": "ow88ZaubluOJ" + }, + "execution_count": 4, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# (3-2) 設定ファイルの内容確認\n", + "\n", + "!cat assets/setting.json" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "rpWUobjlBCNF", + "outputId": "0dd8bbc1-dd1e-47fe-fef6-fbc22540dc7a" + }, + "execution_count": 5, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "{\n", + " \"app_title\": \"voice-recorder\",\n", + " \"storage_type\":\"server\",\n", + " \"use_mel_spectrogram\":true,\n", + " \"text\": [\n", + " {\n", + " \"title\": \"ITA-emotion\",\n", + " \"wavPrefix\": \"emotion\",\n", + " \"file\": \"./assets/text/ITA_emotion_all.txt\",\n", + " \"file_hira\": \"./assets/text/ITA_emotion_all_hira.txt\"\n", + " },\n", + " {\n", + " \"title\": \"ITA-recitation\",\n", + " \"wavPrefix\": \"recitation\",\n", + " \"file\": \"./assets/text/ITA_recitation_all.txt\",\n", + " \"file_hira\": \"./assets/text/ITA_recitation_all_hira.txt\"\n", + " },\n", + " {\n", + " \"title\": \"wagahaiwa\",\n", + " \"wavPrefix\": \"wagahaiwa\",\n", + " \"file\": \"./assets/text/wagahaiwa.txt\",\n", + " \"file_hira\": \"./assets/text/wagahaiwa_hira.txt\"\n", + " }\n", + " ]\n", + "}\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "# モジュールのインストール\n", + "\n", + "必要なモジュールをインストールします。" + ], + "metadata": { + "id": "8Na2PbLZSWgZ" + } + }, + { + "cell_type": "code", + "source": [ + "# (4) 設定ファイルの確認\n", + "!pip install flask\n", + "!pip install flask_cors\n" + ], + "metadata": { + "colab": { + 
"base_uri": "https://localhost:8080/" + }, + "id": "LwZAAuqxX7yY", + "outputId": "627e09e8-bc64-4110-ce0a-5b3f84e8bf1d" + }, + "execution_count": 6, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", + "Requirement already satisfied: flask in /usr/local/lib/python3.7/dist-packages (1.1.4)\n", + "Requirement already satisfied: click<8.0,>=5.1 in /usr/local/lib/python3.7/dist-packages (from flask) (7.1.2)\n", + "Requirement already satisfied: itsdangerous<2.0,>=0.24 in /usr/local/lib/python3.7/dist-packages (from flask) (1.1.0)\n", + "Requirement already satisfied: Werkzeug<2.0,>=0.15 in /usr/local/lib/python3.7/dist-packages (from flask) (1.0.1)\n", + "Requirement already satisfied: Jinja2<3.0,>=2.10.1 in /usr/local/lib/python3.7/dist-packages (from flask) (2.11.3)\n", + "Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.7/dist-packages (from Jinja2<3.0,>=2.10.1->flask) (2.0.1)\n", + "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", + "Collecting flask_cors\n", + " Downloading Flask_Cors-3.0.10-py2.py3-none-any.whl (14 kB)\n", + "Requirement already satisfied: Flask>=0.9 in /usr/local/lib/python3.7/dist-packages (from flask_cors) (1.1.4)\n", + "Requirement already satisfied: Six in /usr/local/lib/python3.7/dist-packages (from flask_cors) (1.15.0)\n", + "Requirement already satisfied: itsdangerous<2.0,>=0.24 in /usr/local/lib/python3.7/dist-packages (from Flask>=0.9->flask_cors) (1.1.0)\n", + "Requirement already satisfied: Werkzeug<2.0,>=0.15 in /usr/local/lib/python3.7/dist-packages (from Flask>=0.9->flask_cors) (1.0.1)\n", + "Requirement already satisfied: click<8.0,>=5.1 in /usr/local/lib/python3.7/dist-packages (from Flask>=0.9->flask_cors) (7.1.2)\n", + "Requirement already satisfied: Jinja2<3.0,>=2.10.1 in /usr/local/lib/python3.7/dist-packages (from 
Flask>=0.9->flask_cors) (2.11.3)\n", + "Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.7/dist-packages (from Jinja2<3.0,>=2.10.1->Flask>=0.9->flask_cors) (2.0.1)\n", + "Installing collected packages: flask-cors\n", + "Successfully installed flask-cors-3.0.10\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "# Start the server\n", + "\n", + "Start the server. (5-1) \n", + "\n", + "Check the server's startup status. (5-2) \n", + "\n", + "You will run this cell repeatedly, so run it with Ctrl+Enter.\n", + "\n", + "It takes a few seconds before the server becomes reachable.\n", + "\n", + "Startup is complete once text like the following appears.\n", + "\n", + "```\n", + "[2022-09-13 22:20:49,936] INFO in recorderServer: START APP\n", + " * Serving Flask app \"recorderServer\" (lazy loading)\n", + " * Environment: production\n", + " WARNING: This is a development server. Do not use it in a production deployment.\n", + " Use a production WSGI server instead.\n", + " * Debug mode: on\n", + "[2022-09-13 22:20:49,946] INFO in _internal: * Running on http://0.0.0.0:8018/ (Press CTRL+C to quit)\n", + "[2022-09-13 22:20:49,947] INFO in _internal: * Restarting with stat\n", + "[2022-09-13 22:20:50,166] INFO in recorderServer: START APP\n", + "[2022-09-13 22:20:50,174] WARNING in _internal: * Debugger is active!\n", + "[2022-09-13 22:20:50,200] INFO in _internal: * Debugger PIN: 334-166-753\n", + "```\n", + "\n" + ], + "metadata": { + "id": "-_2OcN9Borke" + } + }, + { + "cell_type": "code", + "source": [ + "# (5-1) Start the server\n", + "PORT=8018\n", + "get_ipython().system_raw(f'python3 recorderServer.py {PORT} {RECORDER_DATA_DIR} >foo 2>&1 &')" + ], + "metadata": { + "id": "iNOAB7zISI6J" + }, + "execution_count": 7, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# (5-2) Check server startup (run with Ctrl+Enter)\n", + "!cat foo" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "chu06KpAjEK6", + "outputId": "a42873c4-2826-4b54-f497-01adb1683875" + }, + "execution_count": 8, + "outputs": [ + { + "output_type": "stream", + "name": 
"stdout", + "text": [ + "[2022-09-13 22:45:22,054] INFO in recorderServer: START APP\n", + " * Serving Flask app \"recorderServer\" (lazy loading)\n", + " * Environment: production\n", + " WARNING: This is a development server. Do not use it in a production deployment.\n", + " Use a production WSGI server instead.\n", + " * Debug mode: on\n", + "[2022-09-13 22:45:22,062] INFO in _internal: * Running on http://0.0.0.0:8018/ (Press CTRL+C to quit)\n", + "[2022-09-13 22:45:22,063] INFO in _internal: * Restarting with stat\n", + "[2022-09-13 22:45:22,238] INFO in recorderServer: START APP\n", + "[2022-09-13 22:45:22,244] WARNING in _internal: * Debugger is active!\n", + "[2022-09-13 22:45:22,268] INFO in _internal: * Debugger PIN: 334-166-753\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "# プロキシを起動\n", + "ウェブサーバへのアクセスをするためのプロキシを起動します。\n", + "\n", + "表示されたURLをクリックして開くと別タブでアプリが開きます。\n", + "\n", + "Colabなので、ロードにある程度時間がかかります(30秒くらい)。" + ], + "metadata": { + "id": "WhxcFLQEpctq" + } + }, + { + "cell_type": "code", + "source": [ + "# (7) プロキシを起動\n", + "from google.colab import output\n", + "\n", + "output.serve_kernel_port_as_window(PORT)" + ], + "metadata": { + "id": "nkRjZm95l87C", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + }, + "outputId": "768df0ee-9499-430b-ab4f-c602311114ae" + }, + "execution_count": 9, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "application/javascript": [ + "(async (port, path, text, element) => {\n", + " if (!google.colab.kernel.accessAllowed) {\n", + " return;\n", + " }\n", + " element.appendChild(document.createTextNode(''));\n", + " const url = await google.colab.kernel.proxyPort(port);\n", + " const anchor = document.createElement('a');\n", + " anchor.href = new URL(path, url).toString();\n", + " anchor.target = '_blank';\n", + " anchor.setAttribute('data-href', url + path);\n", + " anchor.textContent = text;\n", + " 
element.appendChild(anchor);\n", + " })(8018, \"/\", \"https://localhost:8018/\", window.element)" + ] + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "# Training data folder\n", + "\n", + "The cells below create the folders for training.\n", + "\n", + "\n" + ], + "metadata": { + "id": "ZGuYYN7oCSM4" + } + }, + { + "cell_type": "code", + "source": [ + "corpus_id = \"14oXoQqLxRkP8NJK8qMYGee1_q2uEED1z\"\n", + "\n", + "data_setting = [\n", + " [\"user\", \"\", \"\", \"00_myvoice\", \"107\"],\n", + " [\"zundamon\", \"1h8Ajyvoig7Hl3LSSt2vYX0sUHX3JDF3R\", \"1205_zundamon\", \"01_target_zundamon\", \"100\"],\n", + " [\"tsumugi\", \"14zE0F_5ZCQWXf6m6SUPF5Y3gpL6yb7zk\", \"344_tsumugi\", \"02_target_tsumugi\", \"103\"],\n", + " [\"metan\", \"1iCrpzhqXm-0YdktOPM8M1pMtgQIDF3r4\", \"459_methane\", \"03_target_metan\", \"102\"],\n", + " [\"sora\", \"1MXfMRG_sjbsaLihm7wEASG2PwuCponZF\", \"912_sora\", \"04_target_ksora\", \"101\"],\n", + "]" + ], + "metadata": { + "id": "3PhrmCD2LaCH" + }, + "execution_count": 43, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "import os, glob\n", + "\n", + "os.makedirs(MMVC_DATA_DIR, exist_ok=True)\n", + "speaker_list = os.path.join(MMVC_DATA_DIR, \"multi_speaker_correspondence.txt\")\n", + "!echo \"00_myvoice|107\" > {speaker_list}\n", + "!echo \"01_target_zundamon|100\" >> {speaker_list}\n", + "!echo \"02_target_tsumugi|103\" >> {speaker_list}\n", + "!echo \"03_target_metan|102\" >> {speaker_list}\n", + "!echo \"04_target_ksora|101\" >> {speaker_list}\n", + "\n", + "!cat {speaker_list}\n", + "\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "f5l6ggSyACLs", + "outputId": "4db3571a-46e6-4fd9-c560-628cf4af9284" + }, + "execution_count": 57, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "00_myvoice|107\n", + "01_target_zundamon|100\n", + "02_target_tsumugi|103\n", + "03_target_metan|102\n", + "04_target_ksora|101\n" + ] + } + ] + }, + { + "cell_type": 
"code", + "source": [ + "!rm -rf /content/drive/MyDrive/VoiceChanger/train_data/00_myvoice/wav/*" + ], + "metadata": { + "id": "UEVb2GGZSesY" + }, + "execution_count": 71, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "import gdown\n", + "\n", + "gdown.download(f'https://drive.google.com/uc?id={corpus_id}', f'ita_corpus.zip', quiet=False)\n", + "!unzip -oq 'ita_corpus.zip'\n", + "\n", + "for chara in data_setting:\n", + " chara_root_dir = os.path.join(MMVC_DATA_DIR, chara[3])\n", + " os.makedirs(chara_root_dir, exist_ok=True)\n", + " \n", + " chara_text_dir = os.path.join(chara_root_dir, \"text\")\n", + " os.makedirs(chara_text_dir, exist_ok=True)\n", + " chara_wav_dir = os.path.join(chara_root_dir, \"wav\")\n", + " os.makedirs(chara_wav_dir, exist_ok=True)\n", + "\n", + " if chara[0] != \"user\":\n", + " gdown.download(f'https://drive.google.com/uc?id={chara[1]}', f'{chara[0]}.zip', quiet=False)\n", + " !unzip -f '{chara[0]}.zip'\n", + " !cp -rf {chara[2]}/* {chara_root_dir}/\n", + "\n", + " if chara[0] == \"user\":\n", + " !cp MMVC向けITAコーパス文章ファイル_配布用/ITA_emotion_hira_100file/* {chara_text_dir}\n", + " !cp MMVC向けITAコーパス文章ファイル_配布用/ITA_recitation_hira_324file/* {chara_text_dir}\n", + "\n", + " file_list = [os.path.abspath(p) for p in glob.glob(f\"{RECORDER_DATA_DIR}/*/*.zip\")]\n", + " for f in list(file_list):\n", + " # print(f)\n", + " basename = os.path.basename(f)\n", + " wavname = os.path.splitext(basename)[0] + \".wav\"\n", + " full_path = os.path.join(chara_wav_dir, wavname)\n", + " # print(basename, wavname, full_path)\n", + " !unzip -oq {f} vf24kTrim.wav\n", + " !cp vf24kTrim.wav {full_path}\n", + "\n", + "\n", + "\n", + "\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "L8UsVp3dDs4R", + "outputId": "5d640caf-87b0-45a6-aa0c-76295e537f6a" + }, + "execution_count": 73, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "Downloading...\n", + "From: 
https://drive.google.com/uc?id=14oXoQqLxRkP8NJK8qMYGee1_q2uEED1z\n", + "To: /content/voice-changer/docs/ita_corpus.zip\n", + "100%|██████████| 1.20M/1.20M [00:00<00:00, 87.9MB/s]\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "/content/drive/MyDrive/VoiceChanger/voice_data/ITA-emotion/emotion000.zip\n", + "/content/drive/MyDrive/VoiceChanger/voice_data/ITA-emotion/emotion002.zip\n", + "/content/drive/MyDrive/VoiceChanger/voice_data/ITA-emotion/emotion001.zip\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "Downloading...\n", + "From: https://drive.google.com/uc?id=1h8Ajyvoig7Hl3LSSt2vYX0sUHX3JDF3R\n", + "To: /content/voice-changer/docs/zundamon.zip\n", + "100%|██████████| 55.6M/55.6M [00:00<00:00, 251MB/s]\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Archive: zundamon.zip\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "Downloading...\n", + "From: https://drive.google.com/uc?id=14zE0F_5ZCQWXf6m6SUPF5Y3gpL6yb7zk\n", + "To: /content/voice-changer/docs/tsumugi.zip\n", + "100%|██████████| 73.0M/73.0M [00:00<00:00, 226MB/s]\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Archive: tsumugi.zip\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "Downloading...\n", + "From: https://drive.google.com/uc?id=1iCrpzhqXm-0YdktOPM8M1pMtgQIDF3r4\n", + "To: /content/voice-changer/docs/metan.zip\n", + "100%|██████████| 51.8M/51.8M [00:00<00:00, 219MB/s]\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Archive: metan.zip\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "Downloading...\n", + "From: https://drive.google.com/uc?id=1MXfMRG_sjbsaLihm7wEASG2PwuCponZF\n", + "To: /content/voice-changer/docs/sora.zip\n", + "100%|██████████| 70.2M/70.2M [00:00<00:00, 184MB/s]\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Archive: 
sora.zip\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [], + "metadata": { + "id": "yHmaXx31EOta" + }, + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file