Merge branch 'master' into tutorial_for_rvc

This commit is contained in:
nadare 2023-04-28 20:12:37 +09:00 committed by GitHub
commit 7db3047999
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
80 changed files with 6243 additions and 1420 deletions

View File

@ -4,13 +4,18 @@
## What's New!

- v.1.5.2.5
  - RVC: Support pitch-less models and rvc-webui models
  - so-vits-svc40: some bug fixes
- v.1.5.2.4a
  - Fix: Export ONNX
- v.1.5.2.4
  - RVC: implemented switching between multiple models
  - Fixed the communication path to 48 kHz

# What is VC Client
@ -21,12 +26,13 @@
- [MMVC](https://github.com/isletennos/MMVC_Trainer)
- [so-vits-svc](https://github.com/svc-develop-team/so-vits-svc)
- [RVC(Retrieval-based-Voice-Conversion)](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI)
- [DDSP-SVC](https://github.com/yxlllc/DDSP-SVC)

2. This software can also be used over a network; when it is run alongside resource-intensive applications such as games, the voice-conversion workload can be offloaded to another machine.

![image](https://user-images.githubusercontent.com/48346627/206640768-53f6052d-0a96-403b-a06c-6714a0b7471d.png)

3. It supports multiple platforms.

- Windows, Mac (M1), Linux, Google Colab (MMVC only)
@ -59,14 +65,10 @@ Windows 版と Mac 版を提供しています。
- For the Windows version, unzip the downloaded zip file and run `start_http.bat`.
- For the Mac version, unzip the downloaded file and then run `startHttp.command`. If a message appears saying the developer cannot be verified, control-click the file and run it again (or run it from the right-click menu).
- When connecting remotely, use the `.bat` file (Windows) or `.command` file (Mac) in which http has been replaced with https.
- On Windows with an Nvidia GPU, the `ONNX(cpu,cuda), PyTorch(cpu,cuda)` edition works in most cases.
- On Windows without an Nvidia GPU, the `ONNX(cpu,DirectML), PyTorch(cpu)` edition works in most cases.
- Tsukuyomi-chan, Amitaro, Kikoto Mahiro, and Kikoto Kurage require the ContentVec model. Download the ContentVec_legacy 500 model from [this repository](https://github.com/auspicious3000/contentvec) and place it in the same folder as the `startHttp.command` or `start_http.bat` you run.
- so-vits-svc 4.0/so-vits-svc 4.0v2 and RVC (Retrieval-based-Voice-Conversion) require the hubert model. Download `hubert_base.pt` from [this repository](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main) and place it in the folder containing the batch files.
@ -74,14 +76,16 @@ Windows 版と Mac 版を提供しています。
- DDSP-SVC requires the hubert-soft and enhancer models. Download hubert-soft from [this link](https://github.com/bshall/hubert/releases/download/v0.1/hubert-soft-0d54a1f4.pt) and place it in the folder containing the batch files. For the enhancer, download `nsf_hifigan_20221211.zip` from [this site](https://github.com/openvpi/vocoders/releases/tag/nsf-hifigan-v1), then place the extracted `nsf_hifigan` folder in the folder containing the batch files.
- The DDSP-SVC encoder only supports hubert-soft.
- For a description of each GUI item when using RVC, see [here](tutorials/tutorial_rvc_ja.md).

| Version | OS | Framework | link | VC Support | Size |
| --- | --- | --- | --- | --- | --- |
| v.1.5.2.4a | mac | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1fR86gRWalhpi8kQURJmMfWuDvi53V2Ah&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 795MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1lttvCgnZengcKkP4f0O2UBAVOcOph4b2&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC, DDSP-SVC | 2871MB |
| v.1.5.2.4 | mac | ONNX(cpu,cuda), PyTorch(cpu,mps) | [normal](https://drive.google.com/uc?id=1UC0n6Lgyy4ugPznJ-Erd7lskKaOE6--X&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 795MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1OmSug85MUR58cnYo_P6Xe_GtNAG7PkKO&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC, DDSP-SVC | 2871MB |

※ Also published (experimental) on [hugging_face](https://huggingface.co/wok000/vcclient/tree/main)

- Download from the links below.

| Version | OS | Framework | link | VC Support | Size |
| --- | --- | --- | --- | --- | --- |
| v.1.5.2.6 | mac | ONNX(cpu), PyTorch(cpu,mps) | [normal](https://drive.google.com/uc?id=1NTdtBeKU1bdQKP0_LpbmU3xAjuua1dCT&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 784MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1XdoMQoghBOjW__rE2a02zMyQDz8Gi56n&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC, DDSP-SVC | 2861MB |

(\*1) If you cannot download from Google Drive, try downloading from [hugging_face](https://huggingface.co/wok000/vcclient000/tree/main).

- For individual characters (to be provided as RVC versions soon)
@ -99,12 +103,6 @@ Windows 版と Mac 版を提供しています。
\*2 If unzipping or startup is slow, your antivirus software may be scanning the files. Try excluding the files or folders from scanning and run it again. (At your own risk)

\*3 This software is not signed by the developer. A warning like the one below will appear, but you can run it by clicking the icon while holding down the control key. This is due to Apple's security policy. Running it is at your own risk.

![image](https://user-images.githubusercontent.com/48346627/212567711-c4a8d599-e24c-4fa3-8145-a5df7211f023.png)

https://user-images.githubusercontent.com/48346627/212569645-e30b7f4e-079d-4504-8cf8-7816c5f40b00.mp4

## (3) Usage after setting up an environment such as Docker or Anaconda

Clone this repository to use it. On Windows, setting up WSL2 is required, and a Docker or Anaconda virtual environment must be built on top of WSL2. On Mac, a Python virtual environment such as Anaconda is required. Some preparation is needed, but in many environments this method runs the fastest. **<font color="red">Even without a GPU, it may run well enough on a reasonably recent CPU</font> (see the section on real-time performance below).**
@ -117,7 +115,7 @@ Docker での実行は、[Docker を使用する](docker_vcclient/README.md)を
To run in an Anaconda virtual environment, start the server by following the [page for server developers](README_dev_ja.md).

# Real-time performance

MMVC

With a GPU, conversion is possible with almost no time lag.
@ -129,6 +127,12 @@ https://twitter.com/DannadoriYellow/status/1613553862773997569?s=20&t=7CLD79h1F3
With an old CPU (i7-4770), it takes about 1000 msec.

# About the developer signature

This software is not signed by the developer. A warning like the one below will appear, but you can run it by clicking the icon while holding down the control key. This is due to Apple's security policy. Running it is at your own risk.

![image](https://user-images.githubusercontent.com/48346627/212567711-c4a8d599-e24c-4fa3-8145-a5df7211f023.png)

# Acknowledgments

- [Tachizunda-mon materials](https://seiga.nicovideo.jp/seiga/im10792934)
@ -185,28 +189,3 @@ Github Pages 上で実行できるため、ブラウザのみあれば様々な
[Recording app on Github Pages](https://w-okada.github.io/voice-changer/)

[Explanation video](https://youtu.be/s_GirFEGvaA)
# Past versions

| Version | OS | Framework | link | VC Support | Size |
| --- | --- | --- | --- | --- | --- |
| v.1.5.2.3a | mac | ONNX(cpu,cuda), PyTorch(cpu,mps) | [normal](https://drive.google.com/uc?id=1Ll6_m2ArZrOhwvbqz4lcHNVFFJnZXHRk&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 797MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1sZhcrx6sZmmBnfXz_jFEr9Wqez2DGhgj&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC, DDSP-SVC | 2871MB |
| v.1.5.2.3 | mac | ONNX(cpu,cuda), PyTorch(cpu,mps) | [standard](https://drive.google.com/uc?id=1isX5N9FyC125D5FynJ7NuMnjBCf5dAll&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 798MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [standard](https://drive.google.com/uc?id=1UezbE-QTa5jK4mXHRvZz4w07qRnMaPL5&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC, DDSP-SVC | 2871MB |
| v.1.5.2.2 | mac | ONNX(cpu), PyTorch(cpu) | [normal](https://drive.google.com/uc?id=1dbAiGkPtGWWcQDNL0IHXl4OyTRZR8SIQ&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC | 635MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1vIGnrhrU6d_HjvD6JqyWZKT0NruISdj3&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC | 2795MB |
| v.1.5.2.1 | mac | ONNX(cpu), PyTorch(cpu) | [normal](https://drive.google.com/uc?id=1jaK1ZBdvFpnMmi0PBV8zETw7OY28cKI2&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC | 635MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1F7WUSO5P7PT77Zw5xD8pK6KMYFJNV9Ip&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC | 2794MB |

| Version | OS | Framework | link | VC Support | Size |
| --- | --- | --- | --- | --- | --- |
| v.1.5.1.15b | <span style="color: blue;">win</span> | ONNX(cpu,cuda), PyTorch(cpu) | [normal](https://drive.google.com/uc?id=1nb5DxHQJqnYgzWFTBNxCDOx64__uQqyR&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, RVC | 773MB |
| | <span style="color: blue;">win</span> | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=197U6ip9ypBSyxhIf3oGnkWfBP-M3Gc12&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC | 2794MB |
| | <span style="color: blue;">win</span> | ONNX(cpu,DirectML), PyTorch(cpu) | [normal](https://drive.google.com/uc?id=18Q9CDBnjgTHwOeklVLWAVMFZI-kk9j3l&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, RVC | 488MB |
| | <span style="color: blue;">win</span> | ONNX(cpu,DirectML), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1rlGewdhvenv1Yn3WFOLcsWQeuo8ecIQ1&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC | 2665MB |
| | <span style="color: red;">mac</span> | ONNX(cpu), PyTorch(cpu) | [normal](https://drive.google.com/uc?id=1saAe8vycI4zv0LRbvNmFLfYt0utGRWyZ&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC | 615MB |

| Version | OS | Framework | link | VC Support | Size |
| --- | --- | --- | --- | --- | --- |
| v.1.5.1.15a | <span style="color: blue;">win</span> | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1lCo4P3D3QVvrl-0DRh305e34d_YmsI10&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC | 2641MB |

View File

@ -4,6 +4,11 @@
## What's New!

- v.1.5.2.5
  - RVC: Support pitch-less models and rvc-webui models
  - so-vits-svc40: some bug fixes
- v.1.5.2.4a
  - Fix: Export ONNX
@ -15,21 +20,21 @@
# What is VC Client

1. This is client software for performing real-time voice conversion using various Voice Conversion (VC) AI. The supported AIs are as follows.

   - [MMVC](https://github.com/isletennos/MMVC_Trainer)
   - [so-vits-svc](https://github.com/svc-develop-team/so-vits-svc)
   - [RVC(Retrieval-based-Voice-Conversion)](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI)
   - [DDSP-SVC](https://github.com/yxlllc/DDSP-SVC)

2. Distribute the load by running Voice Changer on a different PC

   The real-time voice changer of this application works on a server-client configuration. By running the MMVC server on a separate PC, you can run it while minimizing the impact on other resource-intensive processes such as gaming commentary.

   ![image](https://user-images.githubusercontent.com/48346627/206640768-53f6052d-0a96-403b-a06c-6714a0b7471d.png)

3. Cross-platform compatibility

   Supports Windows, Mac (including Apple Silicon M1), Linux, and Google Colaboratory.
# usage

Details are summarized [here](https://zenn.dev/wok/books/0004_vc-client-v_1_5_1_x).
@ -58,30 +63,28 @@ You can run it on Google's machine learning platform, Colaboratory. If you have
You can download and run executable binaries.
We offer Windows and Mac versions.

- For Windows users, after unzipping the downloaded zip file, run the `start_http.bat` file corresponding to your VC.
- For the Mac version, after unzipping the downloaded file, double-click the `startHttp.command` file corresponding to your VC. If a message indicating that the developer cannot be verified is displayed, press the control key and click to run it again (or right-click to run it).
- If you are connecting remotely, use the `.command` file (Mac) or `.bat` file (Windows) with https instead of http.
- Tsukuyomi-chan, Amitaro, Kikoto Mahiro, and Kikoto Kurage require the ContentVec model. Download the ContentVec_legacy 500 model from [this repository](https://github.com/auspicious3000/contentvec) and place it in the same folder as the `startHttp.command` or `start_http.bat` you run.
- so-vits-svc 4.0/so-vits-svc 4.0v2 and RVC (Retrieval-based-Voice-Conversion) require the hubert model. Download `hubert_base.pt` from [this repository](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main) and store it in the folder where the batch files are located.
- To run DDSP-SVC, you need the hubert-soft and enhancer models. Download hubert-soft from [this link](https://github.com/bshall/hubert/releases/download/v0.1/hubert-soft-0d54a1f4.pt) and store it in the folder with the batch files. For the enhancer, download `nsf_hifigan_20221211.zip` from [this site](https://github.com/openvpi/vocoders/releases/tag/nsf-hifigan-v1). After unzipping, store the `nsf_hifigan` folder in the folder with the batch files.
- The encoder of DDSP-SVC only supports hubert-soft.
- See [here](tutorials/tutorial_rvc_en.md) for a description of each GUI item used with RVC.

| Version | OS | Framework | link | VC Support | Size |
| --- | --- | --- | --- | --- | --- |
| v.1.5.2.4a | mac | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1fR86gRWalhpi8kQURJmMfWuDvi53V2Ah&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 795MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1lttvCgnZengcKkP4f0O2UBAVOcOph4b2&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC, DDSP-SVC | 2871MB |
| v.1.5.2.4 | mac | ONNX(cpu,cuda), PyTorch(cpu,mps) | [normal](https://drive.google.com/uc?id=1UC0n6Lgyy4ugPznJ-Erd7lskKaOE6--X&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 795MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1OmSug85MUR58cnYo_P6Xe_GtNAG7PkKO&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC, DDSP-SVC | 2871MB |

- Download (if you cannot download from Google Drive, try [hugging_face](https://huggingface.co/wok000/vcclient000/tree/main))

| Version | OS | Framework | link | VC Support | Size |
| --- | --- | --- | --- | --- | --- |
| v.1.5.2.6 | mac | ONNX(cpu), PyTorch(cpu,mps) | [normal](https://drive.google.com/uc?id=1NTdtBeKU1bdQKP0_LpbmU3xAjuua1dCT&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, RVC | 784MB |
| | win | ONNX(cpu,cuda), PyTorch(cpu,cuda) | [normal](https://drive.google.com/uc?id=1XdoMQoghBOjW__rE2a02zMyQDz8Gi56n&export=download) | MMVC v.1.5.x, MMVC v.1.3.x, so-vits-svc 4.0, so-vits-svc 4.0v2, RVC, DDSP-SVC | 2861MB |
| Version | OS | Framework | link | VC Support | Size |
| --- | --- | --- | --- | --- | --- |
@ -94,17 +97,9 @@ We offer Windows and Mac versions.
| | <span style="color: blue;">win</span> | - | [Kikoto Kurage](https://drive.google.com/uc?id=1fiymPcoYzwE1yxyIfC_FTPiFfGEC2jA8&export=download) | - | 823MB |
| | <span style="color: blue;">win</span> | - | [Amitaro](https://drive.google.com/uc?id=1Vt4WBEOAz0EhIWs3ZRFIcg7ELtSHnYfe&export=download) | - | 821MB |

\*1 Tsukuyomi-chan uses the voice data of the free character "Tsukuyomi-chan", which is publicly available for free. (Details such as terms of use are at the end of the document.)

\*2 If unpacking or startup is slow, your antivirus software may be scanning the files. Try excluding the file or folder from scanning. (At your own risk)
\*3 If unpacking or starting is slow, there is a possibility that virus checking is running on your antivirus software. Please try running it with the file or folder excluded from the target. (At your own risk)
\*4 This software is not signed by the developer. A warning message will appear, but you can run the software by clicking the icon while holding down the control key. This is due to Apple's security policy. Running the software is at your own risk.
![image](https://user-images.githubusercontent.com/48346627/212567711-c4a8d599-e24c-4fa3-8145-a5df7211f023.png)
https://user-images.githubusercontent.com/48346627/212569645-e30b7f4e-079d-4504-8cf8-7816c5f40b00.mp4
## (2-3) Usage after setting up the environment such as Docker or Anaconda ## (2-3) Usage after setting up the environment such as Docker or Anaconda
@ -118,7 +113,7 @@ To run docker, see [start docker](docker_vcclient/README_en.md).
To run on Anaconda venv, see [server developer's guide](README_dev_en.md)

# Real-time performance

Conversion is almost instantaneous when using GPU.
@ -130,6 +125,14 @@ https://twitter.com/DannadoriYellow/status/1613553862773997569?s=20&t=7CLD79h1F3
With an old CPU (i7-4770), it takes about 1000 msec for conversion.
# Software Signing
This software is not signed by the developer. A warning message will appear, but you can run the software by clicking the icon while holding down the control key. This is due to Apple's security policy. Running the software is at your own risk.
![image](https://user-images.githubusercontent.com/48346627/212567711-c4a8d599-e24c-4fa3-8145-a5df7211f023.png)
https://user-images.githubusercontent.com/48346627/212569645-e30b7f4e-079d-4504-8cf8-7816c5f40b00.mp4
# Acknowledgments

- [Tachizunda-mon materials](https://seiga.nicovideo.jp/seiga/im10792934)

View File

@ -43,6 +43,7 @@
"showFeature": false, "showFeature": false,
"showIndex": false, "showIndex": false,
"showHalfPrecision": false, "showHalfPrecision": false,
"showPyTorchEnableCheckBox": true,
"defaultEnablePyTorch": true, "defaultEnablePyTorch": true,
"showOnnxExportButton": false "showOnnxExportButton": false

View File

@ -40,6 +40,7 @@
"showCorrespondence": false, "showCorrespondence": false,
"showPyTorchCluster": false, "showPyTorchCluster": false,
"showPyTorchEnableCheckBox": true,
"defaultEnablePyTorch": false "defaultEnablePyTorch": false
} }
}, },

View File

@ -39,7 +39,7 @@
"showPyTorch": true, "showPyTorch": true,
"showCorrespondence": true, "showCorrespondence": true,
"showPyTorchCluster": false, "showPyTorchCluster": false,
"showPyTorchEnableCheckBox": true,
"defaultEnablePyTorch": false "defaultEnablePyTorch": false
} }
}, },

View File

@ -36,6 +36,14 @@
{
"name": "onnxExport",
"options": {}
},
{
"name": "onnxExecutor",
"options": {}
},
{
"name": "modelSamplingRate",
"options": {}
}
],
"modelSetting": [
@ -43,29 +51,23 @@
"name": "modelUploader", "name": "modelUploader",
"options": { "options": {
"showModelSlot": true, "showModelSlot": true,
"showFrameworkSelector": false,
"showConfig": false, "showConfig": false,
"showOnnx": true, "oneModelFileType": true,
"showPyTorch": true, "showOnnx": false,
"showPyTorch": false,
"showCorrespondence": false, "showCorrespondence": false,
"showPyTorchCluster": false, "showPyTorchCluster": false,
"showFeature": true, "showFeature": true,
"showIndex": true, "showIndex": true,
"showHalfPrecision": true, "showHalfPrecision": true,
"showPyTorchEnableCheckBox": false,
"defaultEnablePyTorch": true, "defaultEnablePyTorch": true,
"onlySelectedFramework": true,
"showDefaultTune": true "showDefaultTune": true
} }
},
{
"name": "framework",
"options": {
"showFramework": true
}
},
{
"name": "modelSamplingRate",
"options": {}
} }
], ],
"deviceSetting": [ "deviceSetting": [

View File

@ -0,0 +1,183 @@
{
"type": "demo",
"id": "RVC",
"front": {
"title": [
{
"name": "title",
"options": {
"mainTitle": "Realtime Voice Changer Client",
"subTitle": "for RVC",
"lineNum": 1
}
},
{
"name": "clearSetting",
"options": {}
}
],
"serverControl": [
{
"name": "startButton",
"options": {}
},
{
"name": "performance",
"options": {}
},
{
"name": "serverInfo",
"options": {}
},
{
"name": "modelSwitch",
"options": {}
},
{
"name": "onnxExport",
"options": {}
}
],
"modelSetting": [
{
"name": "modelUploader",
"options": {
"showModelSlot": true,
"showConfig": false,
"showOnnx": true,
"showPyTorch": true,
"showCorrespondence": false,
"showPyTorchCluster": false,
"showFeature": true,
"showIndex": true,
"showHalfPrecision": true,
"defaultEnablePyTorch": true,
"showDefaultTune": true
}
},
{
"name": "framework",
"options": {
"showFramework": true
}
},
{
"name": "modelSamplingRate",
"options": {}
}
],
"deviceSetting": [
{
"name": "audioInput",
"options": {}
},
{
"name": "audioOutput",
"options": {}
}
],
"qualityControl": [
{
"name": "noiseControl",
"options": {}
},
{
"name": "gainControl",
"options": {}
},
{
"name": "f0Detector",
"options": {
"detectors": ["pm", "harvest"]
}
},
{
"name": "divider",
"options": {}
},
{
"name": "analyzer",
"options": {}
}
],
"speakerSetting": [
{
"name": "dstId",
"options": {
"showF0": true,
"useServerInfo": false
}
},
{
"name": "tune",
"options": {}
},
{
"name": "indexRatio",
"options": {}
},
{
"name": "silentThreshold",
"options": {}
}
],
"converterSetting": [
{
"name": "inputChunkNum",
"options": {}
},
{
"name": "extraDataLength",
"options": {}
},
{
"name": "gpu",
"options": {}
}
],
"advancedSetting": [
{
"name": "protocol",
"options": {}
},
{
"name": "crossFadeOverlapSize",
"options": {}
},
{
"name": "crossFadeOffsetRate",
"options": {}
},
{
"name": "crossFadeEndRate",
"options": {}
},
{
"name": "trancateNumThreshold",
"options": {}
},
{
"name": "rvcQuality",
"options": {}
},
{
"name": "silenceFront",
"options": {}
}
]
},
"dialogs": {
"license": [
{
"title": "Retrieval-based-Voice-Conversion-WebUI",
"auther": "liujing04",
"contact": "",
"url": "https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI",
"license": "MIT"
}
]
}
}
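The new RVC setting file added above follows the same layout as the other gui_settings files in this commit. A minimal TypeScript sketch of that layout, for orientation; the type names here are invented for illustration and are not taken from the client code:

```typescript
// Invented names for illustration; the client code may type this differently.
type ComponentEntry = {
    name: string                      // looked up in the GUI component catalog
    options: Record<string, unknown>  // passed to the component as props
}

type GuiSetting = {
    type: string                               // e.g. "demo"
    id: string                                 // e.g. "RVC"
    front: Record<string, ComponentEntry[]>    // title, serverControl, modelSetting, ...
    dialogs: {
        license: { title: string; auther: string; contact: string; url: string; license: string }[]
    }
}

// Example: reading the pitch-detector choices out of a loaded setting.
const detectors = (setting: GuiSetting) =>
    setting.front["qualityControl"]?.find((e) => e.name === "f0Detector")?.options["detectors"]
```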

View File

@ -39,7 +39,7 @@
"showPyTorch": true, "showPyTorch": true,
"showCorrespondence": false, "showCorrespondence": false,
"showPyTorchCluster": true, "showPyTorchCluster": true,
"showPyTorchEnableCheckBox": true,
"defaultEnablePyTorch": true "defaultEnablePyTorch": true
} }
}, },

View File

@ -39,7 +39,7 @@
"showPyTorch": true, "showPyTorch": true,
"showCorrespondence": false, "showCorrespondence": false,
"showPyTorchCluster": true, "showPyTorchCluster": true,
"showPyTorchEnableCheckBox": true,
"defaultEnablePyTorch": true "defaultEnablePyTorch": true
} }
}, },

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

View File

@ -23,7 +23,7 @@
"@babel/preset-env": "^7.21.4", "@babel/preset-env": "^7.21.4",
"@babel/preset-react": "^7.18.6", "@babel/preset-react": "^7.18.6",
"@babel/preset-typescript": "^7.21.4", "@babel/preset-typescript": "^7.21.4",
"@types/node": "^18.15.13", "@types/node": "^18.16.0",
"@types/react": "^18.0.38", "@types/react": "^18.0.38",
"@types/react-dom": "^18.0.11", "@types/react-dom": "^18.0.11",
"autoprefixer": "^10.4.14", "autoprefixer": "^10.4.14",
@ -40,7 +40,7 @@
"npm-run-all": "^4.1.5", "npm-run-all": "^4.1.5",
"postcss-loader": "^7.2.4", "postcss-loader": "^7.2.4",
"postcss-nested": "^6.0.1", "postcss-nested": "^6.0.1",
"prettier": "^2.8.7", "prettier": "^2.8.8",
"rimraf": "^5.0.0", "rimraf": "^5.0.0",
"style-loader": "^3.3.2", "style-loader": "^3.3.2",
"ts-loader": "^9.4.2", "ts-loader": "^9.4.2",
@ -3691,9 +3691,9 @@
"license": "MIT" "license": "MIT"
}, },
"node_modules/@types/node": { "node_modules/@types/node": {
"version": "18.15.13", "version": "18.16.0",
"resolved": "https://registry.npmjs.org/@types/node/-/node-18.15.13.tgz", "resolved": "https://registry.npmjs.org/@types/node/-/node-18.16.0.tgz",
"integrity": "sha512-N+0kuo9KgrUQ1Sn/ifDXsvg0TTleP7rIy4zOBGECxAljqvqfqpTfzx0Q1NUedOixRMBfe2Whhb056a42cWs26Q==" "integrity": "sha512-BsAaKhB+7X+H4GnSjGhJG9Qi8Tw+inU9nJDwmD5CgOmBLEI6ArdhikpLX7DjbjDRDTbqZzU2LSQNZg8WGPiSZQ=="
}, },
"node_modules/@types/prop-types": { "node_modules/@types/prop-types": {
"version": "15.7.5", "version": "15.7.5",
@ -8489,9 +8489,10 @@
}
},
"node_modules/prettier": {
"version": "2.8.8",
"resolved": "https://registry.npmjs.org/prettier/-/prettier-2.8.8.tgz",
"integrity": "sha512-tdN8qQGvNjw4CHbY+XXk0JgCXn9QiF21a55rBe5LJAU+kDyC4WQn4+awm2Xfk2lQMk5fKup9XgzTZtGkjBdP9Q==",
"dev": true,
"license": "MIT",
"bin": {
"prettier": "bin-prettier.js"
},
@ -13358,9 +13359,9 @@
"dev": true "dev": true
}, },
"@types/node": { "@types/node": {
"version": "18.15.13", "version": "18.16.0",
"resolved": "https://registry.npmjs.org/@types/node/-/node-18.15.13.tgz", "resolved": "https://registry.npmjs.org/@types/node/-/node-18.16.0.tgz",
"integrity": "sha512-N+0kuo9KgrUQ1Sn/ifDXsvg0TTleP7rIy4zOBGECxAljqvqfqpTfzx0Q1NUedOixRMBfe2Whhb056a42cWs26Q==" "integrity": "sha512-BsAaKhB+7X+H4GnSjGhJG9Qi8Tw+inU9nJDwmD5CgOmBLEI6ArdhikpLX7DjbjDRDTbqZzU2LSQNZg8WGPiSZQ=="
}, },
"@types/prop-types": { "@types/prop-types": {
"version": "15.7.5", "version": "15.7.5",
@ -16405,7 +16406,9 @@
"dev": true "dev": true
}, },
"prettier": { "prettier": {
"version": "2.8.7", "version": "2.8.8",
"resolved": "https://registry.npmjs.org/prettier/-/prettier-2.8.8.tgz",
"integrity": "sha512-tdN8qQGvNjw4CHbY+XXk0JgCXn9QiF21a55rBe5LJAU+kDyC4WQn4+awm2Xfk2lQMk5fKup9XgzTZtGkjBdP9Q==",
"dev": true "dev": true
}, },
"prettier-linter-helpers": { "prettier-linter-helpers": {

View File

@ -23,7 +23,7 @@
"@babel/preset-env": "^7.21.4", "@babel/preset-env": "^7.21.4",
"@babel/preset-react": "^7.18.6", "@babel/preset-react": "^7.18.6",
"@babel/preset-typescript": "^7.21.4", "@babel/preset-typescript": "^7.21.4",
"@types/node": "^18.15.13", "@types/node": "^18.16.0",
"@types/react": "^18.0.38", "@types/react": "^18.0.38",
"@types/react-dom": "^18.0.11", "@types/react-dom": "^18.0.11",
"autoprefixer": "^10.4.14", "autoprefixer": "^10.4.14",
@ -40,7 +40,7 @@
"npm-run-all": "^4.1.5", "npm-run-all": "^4.1.5",
"postcss-loader": "^7.2.4", "postcss-loader": "^7.2.4",
"postcss-nested": "^6.0.1", "postcss-nested": "^6.0.1",
"prettier": "^2.8.7", "prettier": "^2.8.8",
"rimraf": "^5.0.0", "rimraf": "^5.0.0",
"style-loader": "^3.3.2", "style-loader": "^3.3.2",
"ts-loader": "^9.4.2", "ts-loader": "^9.4.2",

View File

@ -43,6 +43,7 @@
"showFeature": false, "showFeature": false,
"showIndex": false, "showIndex": false,
"showHalfPrecision": false, "showHalfPrecision": false,
"showPyTorchEnableCheckBox": true,
"defaultEnablePyTorch": true, "defaultEnablePyTorch": true,
"showOnnxExportButton": false "showOnnxExportButton": false

View File

@ -40,6 +40,7 @@
"showCorrespondence": false, "showCorrespondence": false,
"showPyTorchCluster": false, "showPyTorchCluster": false,
"showPyTorchEnableCheckBox": true,
"defaultEnablePyTorch": false "defaultEnablePyTorch": false
} }
}, },

View File

@ -39,7 +39,7 @@
"showPyTorch": true, "showPyTorch": true,
"showCorrespondence": true, "showCorrespondence": true,
"showPyTorchCluster": false, "showPyTorchCluster": false,
"showPyTorchEnableCheckBox": true,
"defaultEnablePyTorch": false "defaultEnablePyTorch": false
} }
}, },

View File

@ -36,6 +36,14 @@
{
"name": "onnxExport",
"options": {}
},
{
"name": "onnxExecutor",
"options": {}
},
{
"name": "modelSamplingRate",
"options": {}
}
],
"modelSetting": [
@ -43,29 +51,23 @@
"name": "modelUploader", "name": "modelUploader",
"options": { "options": {
"showModelSlot": true, "showModelSlot": true,
"showFrameworkSelector": false,
"showConfig": false, "showConfig": false,
"showOnnx": true, "oneModelFileType": true,
"showPyTorch": true, "showOnnx": false,
"showPyTorch": false,
"showCorrespondence": false, "showCorrespondence": false,
"showPyTorchCluster": false, "showPyTorchCluster": false,
"showFeature": true, "showFeature": true,
"showIndex": true, "showIndex": true,
"showHalfPrecision": true, "showHalfPrecision": true,
"showPyTorchEnableCheckBox": false,
"defaultEnablePyTorch": true, "defaultEnablePyTorch": true,
"onlySelectedFramework": true,
"showDefaultTune": true "showDefaultTune": true
} }
},
{
"name": "framework",
"options": {
"showFramework": true
}
},
{
"name": "modelSamplingRate",
"options": {}
} }
], ],
"deviceSetting": [ "deviceSetting": [

View File

@ -0,0 +1,183 @@
{
"type": "demo",
"id": "RVC",
"front": {
"title": [
{
"name": "title",
"options": {
"mainTitle": "Realtime Voice Changer Client",
"subTitle": "for RVC",
"lineNum": 1
}
},
{
"name": "clearSetting",
"options": {}
}
],
"serverControl": [
{
"name": "startButton",
"options": {}
},
{
"name": "performance",
"options": {}
},
{
"name": "serverInfo",
"options": {}
},
{
"name": "modelSwitch",
"options": {}
},
{
"name": "onnxExport",
"options": {}
}
],
"modelSetting": [
{
"name": "modelUploader",
"options": {
"showModelSlot": true,
"showConfig": false,
"showOnnx": true,
"showPyTorch": true,
"showCorrespondence": false,
"showPyTorchCluster": false,
"showFeature": true,
"showIndex": true,
"showHalfPrecision": true,
"defaultEnablePyTorch": true,
"showDefaultTune": true
}
},
{
"name": "framework",
"options": {
"showFramework": true
}
},
{
"name": "modelSamplingRate",
"options": {}
}
],
"deviceSetting": [
{
"name": "audioInput",
"options": {}
},
{
"name": "audioOutput",
"options": {}
}
],
"qualityControl": [
{
"name": "noiseControl",
"options": {}
},
{
"name": "gainControl",
"options": {}
},
{
"name": "f0Detector",
"options": {
"detectors": ["pm", "harvest"]
}
},
{
"name": "divider",
"options": {}
},
{
"name": "analyzer",
"options": {}
}
],
"speakerSetting": [
{
"name": "dstId",
"options": {
"showF0": true,
"useServerInfo": false
}
},
{
"name": "tune",
"options": {}
},
{
"name": "indexRatio",
"options": {}
},
{
"name": "silentThreshold",
"options": {}
}
],
"converterSetting": [
{
"name": "inputChunkNum",
"options": {}
},
{
"name": "extraDataLength",
"options": {}
},
{
"name": "gpu",
"options": {}
}
],
"advancedSetting": [
{
"name": "protocol",
"options": {}
},
{
"name": "crossFadeOverlapSize",
"options": {}
},
{
"name": "crossFadeOffsetRate",
"options": {}
},
{
"name": "crossFadeEndRate",
"options": {}
},
{
"name": "trancateNumThreshold",
"options": {}
},
{
"name": "rvcQuality",
"options": {}
},
{
"name": "silenceFront",
"options": {}
}
]
},
"dialogs": {
"license": [
{
"title": "Retrieval-based-Voice-Conversion-WebUI",
"auther": "liujing04",
"contact": "",
"url": "https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI",
"license": "MIT"
}
]
}
}

View File

@ -39,7 +39,7 @@
"showPyTorch": true, "showPyTorch": true,
"showCorrespondence": false, "showCorrespondence": false,
"showPyTorchCluster": true, "showPyTorchCluster": true,
"showPyTorchEnableCheckBox": true,
"defaultEnablePyTorch": true "defaultEnablePyTorch": true
} }
}, },

View File

@ -39,7 +39,7 @@
"showPyTorch": true, "showPyTorch": true,
"showCorrespondence": false, "showCorrespondence": false,
"showPyTorchCluster": true, "showPyTorchCluster": true,
"showPyTorchEnableCheckBox": true,
"defaultEnablePyTorch": true "defaultEnablePyTorch": true
} }
}, },

View File

@ -42,6 +42,7 @@ import { DstIdRow2, DstIdRow2Props } from "./components/602v2_DstIdRow2"
import { SilenceFrontRow, SilenceFrontRowProps } from "./components/812_SilenceFrontRow"
import { ModelSwitchRow, ModelSwitchRowProps } from "./components/204_ModelSwitchRow"
import { ONNXExportRow, ONNXExportRowProps } from "./components/205_ONNXExportRow"
import { ONNXExecutorRow, ONNXExecutorRowProps } from "./components/206_ONNXExecutorRow"
export const catalog: { [key: string]: (props: any) => JSX.Element } = {}
@ -68,6 +69,7 @@ const initialize = () => {
addToCatalog("serverInfo", (props: ServerInfoRowProps) => { return <ServerInfoRow {...props} /> }) addToCatalog("serverInfo", (props: ServerInfoRowProps) => { return <ServerInfoRow {...props} /> })
addToCatalog("modelSwitch", (props: ModelSwitchRowProps) => { return <ModelSwitchRow {...props} /> }) addToCatalog("modelSwitch", (props: ModelSwitchRowProps) => { return <ModelSwitchRow {...props} /> })
addToCatalog("onnxExport", (props: ONNXExportRowProps) => { return <ONNXExportRow {...props} /> }) addToCatalog("onnxExport", (props: ONNXExportRowProps) => { return <ONNXExportRow {...props} /> })
addToCatalog("onnxExecutor", (props: ONNXExecutorRowProps) => { return <ONNXExecutorRow {...props} /> })

View File

@ -1,3 +1,4 @@
import { Framework } from "@dannadori/voice-changer-client-js"
import React, { useMemo } from "react"
import { useAppState } from "../../../001_provider/001_AppStateProvider"
@ -9,23 +10,40 @@ export const ModelSwitchRow = (_props: ModelSwitchRowProps) => {
const appState = useAppState()
const modelSwitchRow = useMemo(() => {
const slot = appState.serverSetting.serverSetting.modelSlotIndex
const onSwitchModelClicked = async (index: number, filename: string) => {
const framework: Framework = filename.endsWith(".onnx") ? "ONNX" : "PyTorch"
// Quick hack for when the same slot is selected. The lower digits hold the actual slot ID.
const dummyModelSlotIndex = (Math.floor(Date.now() / 1000)) * 1000 + index
await appState.serverSetting.updateServerSettings({ ...appState.serverSetting.serverSetting, modelSlotIndex: dummyModelSlotIndex, framework: framework })
}
const modelOptions = appState.serverSetting.serverSetting.modelSlots.map((x, index) => {
const className = index == slot ? "body-button-active left-margin-1" : "body-button left-margin-1"
let filename = ""
if (x.pyTorchModelFile && x.pyTorchModelFile.length > 0) {
filename = x.pyTorchModelFile.replace(/^.*[\\\/]/, '')
} else if (x.onnxModelFile && x.onnxModelFile.length > 0) {
filename = x.onnxModelFile.replace(/^.*[\\\/]/, '')
} else {
return <div key={index} ></div>
}
const f0str = x.f0 == true ? "f0" : "nof0"
const srstr = Math.floor(x.samplingRate / 1000) + "K"
const embedstr = x.embChannels
const typestr = x.modelType == 0 ? "org" : "webui"
const metadata = x.deprecated ? "[deprecated version]" : `[${f0str},${srstr},${embedstr},${typestr}]`
return (
<div key={index} className={className} onClick={() => { onSwitchModelClicked(index, filename) }}>
<div>
{filename}
</div>
<div>{metadata}</div>
</div>
)
})
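The switch handler above encodes the selected slot into a throwaway value so that re-selecting the already active slot still triggers a server update. A small sketch of that encoding, with a hypothetical decode step; how the server actually recovers the slot is an assumption, not shown in this diff:

```typescript
// Sketch of the slot-index trick used above: a timestamp occupies the upper digits and
// the real slot ID stays in the lower three digits, so the value changes on every click.
const encodeSlotIndex = (index: number): number =>
    Math.floor(Date.now() / 1000) * 1000 + index

// Assumed decode step (not shown in this diff): modulo 1000 returns the original index
// as long as there are fewer than 1000 slots.
const decodeSlotIndex = (dummyModelSlotIndex: number): number =>
    dummyModelSlotIndex % 1000

// Example: encodeSlotIndex(2) might yield 1682673157002; decodeSlotIndex of that is 2.
```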

View File

@ -12,6 +12,10 @@ export const ONNXExportRow = (_props: ONNXExportRowProps) => {
const guiState = useGuiState()
const onnxExporthRow = useMemo(() => {
if (appState.serverSetting.serverSetting.framework != "PyTorch") {
return <></>
}
const onnxExportButtonAction = async () => {
if (guiState.isConverting) {

View File

@ -0,0 +1,42 @@
import { OnnxExecutionProvider } from "@dannadori/voice-changer-client-js"
import React, { useMemo } from "react"
import { useAppState } from "../../../001_provider/001_AppStateProvider"
export type ONNXExecutorRowProps = {
}
export const ONNXExecutorRow = (_props: ONNXExecutorRowProps) => {
const appState = useAppState()
const onnxExecutorRow = useMemo(() => {
if (appState.serverSetting.serverSetting.framework != "ONNX") {
return <></>
}
const onOnnxExecutionProviderChanged = async (val: OnnxExecutionProvider) => {
appState.serverSetting.updateServerSettings({ ...appState.serverSetting.serverSetting, onnxExecutionProvider: val })
}
return (
<div className="body-row split-3-7 left-padding-1">
<div className="body-item-title left-padding-2">OnnxExecutionProvider</div>
<div className="body-select-container">
<select className="body-select" value={appState.serverSetting.serverSetting.onnxExecutionProvider} onChange={(e) => {
onOnnxExecutionProviderChanged(e.target.value as
OnnxExecutionProvider)
}}>
{
Object.values(OnnxExecutionProvider).map(x => {
return <option key={x} value={x}>{x}</option>
})
}
</select>
</div>
</div>
)
}, [appState.getInfo, appState.serverSetting.serverSetting])
return onnxExecutorRow
}

View File

@ -0,0 +1,73 @@
import React, { useMemo } from "react"
import { fileSelector } from "@dannadori/voice-changer-client-js"
import { useAppState } from "../../../001_provider/001_AppStateProvider"
import { useGuiState } from "../001_GuiStateProvider"
export const ModelSelectRow = () => {
const appState = useAppState()
const guiState = useGuiState()
const onnxSelectRow = useMemo(() => {
const slot = guiState.modelSlotNum
const fileUploadSetting = appState.serverSetting.fileUploadSettings[slot]
if (!fileUploadSetting) {
return <></>
}
const onnxModelFilenameText = fileUploadSetting.onnxModel?.filename || fileUploadSetting.onnxModel?.file?.name || ""
const pyTorchFilenameText = fileUploadSetting.pyTorchModel?.filename || fileUploadSetting.pyTorchModel?.file?.name || ""
const modelFilenameText = onnxModelFilenameText + pyTorchFilenameText
const onModelFileLoadClicked = async () => {
const file = await fileSelector("")
if (file.name.endsWith(".onnx") == false && file.name.endsWith(".pth") == false) {
alert("モデルファイルの拡張子は.onnxか.pthである必要があります。(Extension of the model file should be .onnx or .pth.)")
return
}
if (file.name.endsWith(".onnx") == true) {
appState.serverSetting.setFileUploadSetting(slot, {
...appState.serverSetting.fileUploadSettings[slot],
onnxModel: {
file: file
},
pyTorchModel: null
})
return
}
if (file.name.endsWith(".pth") == true) {
appState.serverSetting.setFileUploadSetting(slot, {
...appState.serverSetting.fileUploadSettings[slot],
pyTorchModel: {
file: file
},
onnxModel: null
})
return
}
}
const onModelFileClearClicked = () => {
appState.serverSetting.setFileUploadSetting(slot, {
...appState.serverSetting.fileUploadSettings[slot],
onnxModel: null,
pyTorchModel: null
})
}
return (
<div className="body-row split-3-3-4 left-padding-1 guided">
<div className="body-item-title left-padding-2">Model(.onnx or .pth)</div>
<div className="body-item-text">
<div>{modelFilenameText}</div>
</div>
<div className="body-button-container">
<div className="body-button" onClick={onModelFileLoadClicked}>select</div>
<div className="body-button left-margin-1" onClick={onModelFileClearClicked}>clear</div>
</div>
</div>
)
}, [appState.serverSetting.fileUploadSettings, appState.serverSetting.setFileUploadSetting, guiState.modelSlotNum])
return onnxSelectRow
}

View File

@ -3,13 +3,21 @@ import { fileSelector } from "@dannadori/voice-changer-client-js"
import { useAppState } from "../../../001_provider/001_AppStateProvider"
import { useGuiState } from "../001_GuiStateProvider"
type ONNXSelectRowProps = {
onlyWhenSelected: boolean
}
export const ONNXSelectRow = (props: ONNXSelectRowProps) => {
const appState = useAppState()
const guiState = useGuiState()
const onnxSelectRow = useMemo(() => {
const slot = guiState.modelSlotNum
if (props.onlyWhenSelected && appState.serverSetting.fileUploadSettings[slot]?.framework != "ONNX") {
return <></>
}
const onnxModelFilenameText = appState.serverSetting.fileUploadSettings[slot]?.onnxModel?.filename || appState.serverSetting.fileUploadSettings[slot]?.onnxModel?.file?.name || ""
const onOnnxFileLoadClicked = async () => {
const file = await fileSelector("")

View File

@ -3,15 +3,24 @@ import { fileSelector } from "@dannadori/voice-changer-client-js"
import { useAppState } from "../../../001_provider/001_AppStateProvider"
import { useGuiState } from "../001_GuiStateProvider"
export type PyTorchSelectRowProps = {
onlyWhenSelected: boolean
}
export const PyTorchSelectRow = (props: PyTorchSelectRowProps) => {
const appState = useAppState()
const guiState = useGuiState()
const pyTorchSelectRow = useMemo(() => {
if (guiState.showPyTorchModelUpload == false) {
return <></>
}
const slot = guiState.modelSlotNum
if (props.onlyWhenSelected && appState.serverSetting.fileUploadSettings[slot]?.framework != "PyTorch") {
return <></>
}
const pyTorchFilenameText = appState.serverSetting.fileUploadSettings[slot]?.pyTorchModel?.filename || appState.serverSetting.fileUploadSettings[slot]?.pyTorchModel?.file?.name || ""
const onPyTorchFileLoadClicked = async () => {
const file = await fileSelector("")

View File

@ -9,6 +9,11 @@ export const HalfPrecisionRow = () => {
const halfPrecisionSelectRow = useMemo(() => {
const slot = guiState.modelSlotNum
const fileUploadSetting = appState.serverSetting.fileUploadSettings[slot]
if (!fileUploadSetting) {
return <></>
}
const currentValue = fileUploadSetting ? fileUploadSetting.isHalf : true
const onHalfPrecisionChanged = () => {
appState.serverSetting.setFileUploadSetting(slot, {
...appState.serverSetting.fileUploadSettings[slot],
@ -16,16 +21,13 @@ export const HalfPrecisionRow = () => {
})
}
return (
<div className="body-row split-3-3-4 left-padding-1 guided">
<div className="body-item-title left-padding-2">-</div>
<div className="body-item-text">
<input type="checkbox" checked={currentValue} onChange={() => onHalfPrecisionChanged()} /> half-precision
</div>
<div className="body-button-container">
</div>
</div>
)

View File

@ -27,7 +27,7 @@ export const ModelUploadButtonRow = () => {
</div>
<div className="body-button-container">
<div className={uploadButtonClassName} onClick={uploadButtonAction}>{uploadButtonLabel}</div>
<div className="body-item-text-em" >{uploadedText}</div>
</div>
</div>

View File

@ -8,6 +8,10 @@ export const DefaultTuneRow = () => {
const defaultTuneRow = useMemo(() => {
const slot = guiState.modelSlotNum
const fileUploadSetting = appState.serverSetting.fileUploadSettings[slot]
if (!fileUploadSetting) {
return <></>
}
const currentValue = fileUploadSetting.defaultTune
const onDefaultTuneChanged = (val: number) => { const onDefaultTuneChanged = (val: number) => {
appState.serverSetting.setFileUploadSetting(slot, { appState.serverSetting.setFileUploadSetting(slot, {
@ -20,10 +24,10 @@ export const DefaultTuneRow = () => {
<div className="body-row split-3-2-1-4 left-padding-1 guided"> <div className="body-row split-3-2-1-4 left-padding-1 guided">
<div className="body-item-title left-padding-2 ">Default Tune</div> <div className="body-item-title left-padding-2 ">Default Tune</div>
<div> <div>
<input type="range" className="body-item-input-slider" min="-50" max="50" step="1" value={fileUploadSetting?.defaultTune || 0} onChange={(e) => { <input type="range" className="body-item-input-slider" min="-50" max="50" step="1" value={currentValue} onChange={(e) => {
onDefaultTuneChanged(Number(e.target.value)) onDefaultTuneChanged(Number(e.target.value))
}}></input> }}></input>
<span className="body-item-input-slider-val">{fileUploadSetting?.defaultTune || 0}</span> <span className="body-item-input-slider-val">{currentValue}</span>
</div> </div>
<div> <div>
</div> </div>

View File

@ -0,0 +1,42 @@
import { Framework } from "@dannadori/voice-changer-client-js"
import React, { useMemo } from "react"
import { useAppState } from "../../../001_provider/001_AppStateProvider"
import { useGuiState } from "../001_GuiStateProvider"
export const FrameworkSelectorRow = () => {
const appState = useAppState()
const guiState = useGuiState()
const frameworkSelectorRow = useMemo(() => {
const slot = guiState.modelSlotNum
const fileUploadSetting = appState.serverSetting.fileUploadSettings[slot]
const currentValue = fileUploadSetting?.framework || Framework.PyTorch
const onFrameworkChanged = (val: Framework) => {
appState.serverSetting.setFileUploadSetting(slot, {
...appState.serverSetting.fileUploadSettings[slot],
framework: val
})
}
return (
<div className="body-row split-3-7 left-padding-1 guided">
<div className="body-item-title left-padding-2">Framework</div>
<div className="body-input-container">
<div className="body-select-container">
<select className="body-select" value={currentValue} onChange={(e) => {
onFrameworkChanged(e.target.value as Framework)
}}>
{
Object.values(Framework).map(x => {
return <option key={x} value={x}>{x}</option>
})
}
</select>
</div>
</div>
</div>
)
}, [appState.serverSetting.fileUploadSettings, appState.serverSetting.setFileUploadSetting, guiState.modelSlotNum])
return frameworkSelectorRow
}

View File

@ -1,6 +1,7 @@
import React, { useMemo, useEffect } from "react"
import { useGuiState } from "../001_GuiStateProvider"
import { ConfigSelectRow } from "./301-1_ConfigSelectRow"
import { ModelSelectRow } from "./301-2-5_ModelSelectRow"
import { ONNXSelectRow } from "./301-2_ONNXSelectRow"
import { PyTorchSelectRow } from "./301-3_PyTorchSelectRow"
import { CorrespondenceSelectRow } from "./301-4_CorrespondenceSelectRow"
@ -11,9 +12,11 @@ import { HalfPrecisionRow } from "./301-8_HalfPrescisionRow"
import { ModelUploadButtonRow } from "./301-9_ModelUploadButtonRow"
import { ModelSlotRow } from "./301-a_ModelSlotRow"
import { DefaultTuneRow } from "./301-c_DefaultTuneRow"
import { FrameworkSelectorRow } from "./301-d_FrameworkSelector"
export type ModelUploaderRowProps = {
showModelSlot: boolean
showFrameworkSelector: boolean
showConfig: boolean
showOnnx: boolean
showPyTorch: boolean
@ -26,7 +29,10 @@ export type ModelUploaderRowProps = {
showDescription: boolean
showDefaultTune: boolean
showPyTorchEnableCheckBox: boolean
defaultEnablePyTorch: boolean
onlySelectedFramework: boolean
oneModelFileType: boolean
showOnnxExportButton: boolean
}
@ -38,6 +44,15 @@ export const ModelUploaderRow = (props: ModelUploaderRowProps) => {
}, []) }, [])
const modelUploaderRow = useMemo(() => { const modelUploaderRow = useMemo(() => {
const pytorchEnableCheckBox = props.showPyTorchEnableCheckBox ?
<div>
<input type="checkbox" checked={guiState.showPyTorchModelUpload} onChange={(e) => {
guiState.setShowPyTorchModelUpload(e.target.checked)
}} /> enable PyTorch
</div>
:
<></>
return ( return (
<> <>
<div className="body-row split-3-3-4 left-padding-1 guided"> <div className="body-row split-3-3-4 left-padding-1 guided">
@ -46,17 +61,17 @@ export const ModelUploaderRow = (props: ModelUploaderRowProps) => {
<div></div> <div></div>
</div> </div>
<div className="body-item-text"> <div className="body-item-text">
<div> {pytorchEnableCheckBox}
<input type="checkbox" checked={guiState.showPyTorchModelUpload} onChange={(e) => {
guiState.setShowPyTorchModelUpload(e.target.checked)
}} /> enable PyTorch
</div> </div>
</div> </div>
</div> {props.showModelSlot ? <ModelSlotRow /> : <></>}
<ModelSlotRow /> {props.showFrameworkSelector ? <FrameworkSelectorRow /> : <></>}
{props.showConfig ? <ConfigSelectRow /> : <></>} {props.showConfig ? <ConfigSelectRow /> : <></>}
{props.showOnnx ? <ONNXSelectRow /> : <></>}
{props.showPyTorch && guiState.showPyTorchModelUpload ? <PyTorchSelectRow /> : <></>} {props.oneModelFileType ? <ModelSelectRow /> : <></>}
{props.showOnnx ? <ONNXSelectRow onlyWhenSelected={props.onlySelectedFramework} /> : <></>}
{props.showPyTorch ? <PyTorchSelectRow onlyWhenSelected={props.onlySelectedFramework} /> : <></>}
{props.showCorrespondence ? <CorrespondenceSelectRow /> : <></>} {props.showCorrespondence ? <CorrespondenceSelectRow /> : <></>}
{props.showPyTorchCluster ? <PyTorchClusterSelectRow /> : <></>} {props.showPyTorchCluster ? <PyTorchClusterSelectRow /> : <></>}
{props.showFeature ? <FeatureSelectRow /> : <></>} {props.showFeature ? <FeatureSelectRow /> : <></>}

View File

@ -1,5 +1,5 @@
import React, { useMemo } from "react" import React, { useMemo } from "react"
import { fileSelector, ModelSamplingRate } from "@dannadori/voice-changer-client-js" import { ModelSamplingRate } from "@dannadori/voice-changer-client-js"
import { useAppState } from "../../../001_provider/001_AppStateProvider" import { useAppState } from "../../../001_provider/001_AppStateProvider"
export type ModelSamplingRateRowProps = { export type ModelSamplingRateRowProps = {

View File

@ -535,6 +535,14 @@ body {
color: rgb(30, 30, 30); color: rgb(30, 30, 30);
font-size: 0.7rem; font-size: 0.7rem;
} }
.body-item-text-em {
color: rgb(250, 30, 30);
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
font-weight: 700;
}
.body-input-container { .body-input-container {
display: flex; display: flex;
} }

View File

@ -1,12 +1,12 @@
{ {
"name": "@dannadori/voice-changer-client-js", "name": "@dannadori/voice-changer-client-js",
"version": "1.0.114", "version": "1.0.115",
"lockfileVersion": 2, "lockfileVersion": 2,
"requires": true, "requires": true,
"packages": { "packages": {
"": { "": {
"name": "@dannadori/voice-changer-client-js", "name": "@dannadori/voice-changer-client-js",
"version": "1.0.114", "version": "1.0.115",
"license": "ISC", "license": "ISC",
"dependencies": { "dependencies": {
"@types/readable-stream": "^2.3.15", "@types/readable-stream": "^2.3.15",
@ -18,17 +18,17 @@
"socket.io-client": "^4.6.1" "socket.io-client": "^4.6.1"
}, },
"devDependencies": { "devDependencies": {
"@types/audioworklet": "^0.0.41", "@types/audioworklet": "^0.0.42",
"@types/node": "^18.15.13", "@types/node": "^18.16.0",
"@types/react": "18.0.38", "@types/react": "18.0.38",
"@types/react-dom": "18.0.11", "@types/react-dom": "18.0.11",
"eslint": "^8.38.0", "eslint": "^8.39.0",
"eslint-config-prettier": "^8.8.0", "eslint-config-prettier": "^8.8.0",
"eslint-plugin-prettier": "^4.2.1", "eslint-plugin-prettier": "^4.2.1",
"eslint-plugin-react": "^7.32.2", "eslint-plugin-react": "^7.32.2",
"eslint-webpack-plugin": "^4.0.1", "eslint-webpack-plugin": "^4.0.1",
"npm-run-all": "^4.1.5", "npm-run-all": "^4.1.5",
"prettier": "^2.8.7", "prettier": "^2.8.8",
"raw-loader": "^4.0.2", "raw-loader": "^4.0.2",
"rimraf": "^5.0.0", "rimraf": "^5.0.0",
"ts-loader": "^9.4.2", "ts-loader": "^9.4.2",
@ -1451,9 +1451,9 @@
} }
}, },
"node_modules/@eslint/js": { "node_modules/@eslint/js": {
"version": "8.38.0", "version": "8.39.0",
"resolved": "https://registry.npmjs.org/@eslint/js/-/js-8.38.0.tgz", "resolved": "https://registry.npmjs.org/@eslint/js/-/js-8.39.0.tgz",
"integrity": "sha512-IoD2MfUnOV58ghIHCiil01PcohxjbYR/qCxsoC+xNgUwh1EY8jOOrYmu3d3a71+tJJ23uscEV4X2HJWMsPJu4g==", "integrity": "sha512-kf9RB0Fg7NZfap83B3QOqOGg9QmD9yBudqQXzzOtn3i4y7ZUXe5ONeW34Gwi+TxhH4mvj72R1Zc300KUMa9Bng==",
"dev": true, "dev": true,
"engines": { "engines": {
"node": "^12.22.0 || ^14.17.0 || >=16.0.0" "node": "^12.22.0 || ^14.17.0 || >=16.0.0"
@ -1686,9 +1686,9 @@
"integrity": "sha512-+9jVqKhRSpsc591z5vX+X5Yyw+he/HCB4iQ/RYxw35CEPaY1gnsNE43nf9n9AaYjAQrTiI/mOwKUKdUs9vf7Xg==" "integrity": "sha512-+9jVqKhRSpsc591z5vX+X5Yyw+he/HCB4iQ/RYxw35CEPaY1gnsNE43nf9n9AaYjAQrTiI/mOwKUKdUs9vf7Xg=="
}, },
"node_modules/@types/audioworklet": { "node_modules/@types/audioworklet": {
"version": "0.0.41", "version": "0.0.42",
"resolved": "https://registry.npmjs.org/@types/audioworklet/-/audioworklet-0.0.41.tgz", "resolved": "https://registry.npmjs.org/@types/audioworklet/-/audioworklet-0.0.42.tgz",
"integrity": "sha512-8BWffzGoSRz436IviQVPye75YYWfac4OKdcLgkZxb3APZxSmAOp2SMtsH1yuM1x57/z/J7bsm05Yq98Hzk1t/w==", "integrity": "sha512-vUHhMkam6BjeomsxZc2f7g0d4fI7PV5EnAoaHo83iy4hNlYphgBgRbcWRK0UEY7jUgfY46kCLYO1riZUdH/P+g==",
"dev": true "dev": true
}, },
"node_modules/@types/body-parser": { "node_modules/@types/body-parser": {
@ -1829,9 +1829,9 @@
"dev": true "dev": true
}, },
"node_modules/@types/node": { "node_modules/@types/node": {
"version": "18.15.13", "version": "18.16.0",
"resolved": "https://registry.npmjs.org/@types/node/-/node-18.15.13.tgz", "resolved": "https://registry.npmjs.org/@types/node/-/node-18.16.0.tgz",
"integrity": "sha512-N+0kuo9KgrUQ1Sn/ifDXsvg0TTleP7rIy4zOBGECxAljqvqfqpTfzx0Q1NUedOixRMBfe2Whhb056a42cWs26Q==" "integrity": "sha512-BsAaKhB+7X+H4GnSjGhJG9Qi8Tw+inU9nJDwmD5CgOmBLEI6ArdhikpLX7DjbjDRDTbqZzU2LSQNZg8WGPiSZQ=="
}, },
"node_modules/@types/prop-types": { "node_modules/@types/prop-types": {
"version": "15.7.5", "version": "15.7.5",
@ -3230,15 +3230,15 @@
} }
}, },
"node_modules/eslint": { "node_modules/eslint": {
"version": "8.38.0", "version": "8.39.0",
"resolved": "https://registry.npmjs.org/eslint/-/eslint-8.38.0.tgz", "resolved": "https://registry.npmjs.org/eslint/-/eslint-8.39.0.tgz",
"integrity": "sha512-pIdsD2jwlUGf/U38Jv97t8lq6HpaU/G9NKbYmpWpZGw3LdTNhZLbJePqxOXGB5+JEKfOPU/XLxYxFh03nr1KTg==", "integrity": "sha512-mwiok6cy7KTW7rBpo05k6+p4YVZByLNjAZ/ACB9DRCu4YDRwjXI01tWHp6KAUWelsBetTxKK/2sHB0vdS8Z2Og==",
"dev": true, "dev": true,
"dependencies": { "dependencies": {
"@eslint-community/eslint-utils": "^4.2.0", "@eslint-community/eslint-utils": "^4.2.0",
"@eslint-community/regexpp": "^4.4.0", "@eslint-community/regexpp": "^4.4.0",
"@eslint/eslintrc": "^2.0.2", "@eslint/eslintrc": "^2.0.2",
"@eslint/js": "8.38.0", "@eslint/js": "8.39.0",
"@humanwhocodes/config-array": "^0.11.8", "@humanwhocodes/config-array": "^0.11.8",
"@humanwhocodes/module-importer": "^1.0.1", "@humanwhocodes/module-importer": "^1.0.1",
"@nodelib/fs.walk": "^1.2.8", "@nodelib/fs.walk": "^1.2.8",
@ -3248,7 +3248,7 @@
"debug": "^4.3.2", "debug": "^4.3.2",
"doctrine": "^3.0.0", "doctrine": "^3.0.0",
"escape-string-regexp": "^4.0.0", "escape-string-regexp": "^4.0.0",
"eslint-scope": "^7.1.1", "eslint-scope": "^7.2.0",
"eslint-visitor-keys": "^3.4.0", "eslint-visitor-keys": "^3.4.0",
"espree": "^9.5.1", "espree": "^9.5.1",
"esquery": "^1.4.2", "esquery": "^1.4.2",
@ -3361,9 +3361,9 @@
} }
}, },
"node_modules/eslint-scope": { "node_modules/eslint-scope": {
"version": "7.1.1", "version": "7.2.0",
"resolved": "https://registry.npmjs.org/eslint-scope/-/eslint-scope-7.1.1.tgz", "resolved": "https://registry.npmjs.org/eslint-scope/-/eslint-scope-7.2.0.tgz",
"integrity": "sha512-QKQM/UXpIiHcLqJ5AOyIW7XZmzjkzQXYE54n1++wb0u9V/abW3l9uQnxX8Z5Xd18xyKIMTUAyQ0k1e8pz6LUrw==", "integrity": "sha512-DYj5deGlHBfMt15J7rdtyKNq/Nqlv5KfU4iodrQ019XESsRnwXH9KAE0y3cwtUHDo2ob7CypAnCqefh6vioWRw==",
"dev": true, "dev": true,
"dependencies": { "dependencies": {
"esrecurse": "^4.3.0", "esrecurse": "^4.3.0",
@ -3371,6 +3371,9 @@
}, },
"engines": { "engines": {
"node": "^12.22.0 || ^14.17.0 || >=16.0.0" "node": "^12.22.0 || ^14.17.0 || >=16.0.0"
},
"funding": {
"url": "https://opencollective.com/eslint"
} }
}, },
"node_modules/eslint-visitor-keys": { "node_modules/eslint-visitor-keys": {
@ -5843,9 +5846,9 @@
} }
}, },
"node_modules/prettier": { "node_modules/prettier": {
"version": "2.8.7", "version": "2.8.8",
"resolved": "https://registry.npmjs.org/prettier/-/prettier-2.8.7.tgz", "resolved": "https://registry.npmjs.org/prettier/-/prettier-2.8.8.tgz",
"integrity": "sha512-yPngTo3aXUUmyuTjeTUT75txrf+aMh9FiD7q9ZE/i6r0bPb22g4FsE6Y338PQX1bmfy08i9QQCB7/rcUAVntfw==", "integrity": "sha512-tdN8qQGvNjw4CHbY+XXk0JgCXn9QiF21a55rBe5LJAU+kDyC4WQn4+awm2Xfk2lQMk5fKup9XgzTZtGkjBdP9Q==",
"dev": true, "dev": true,
"bin": { "bin": {
"prettier": "bin-prettier.js" "prettier": "bin-prettier.js"
@ -9132,9 +9135,9 @@
} }
}, },
"@eslint/js": { "@eslint/js": {
"version": "8.38.0", "version": "8.39.0",
"resolved": "https://registry.npmjs.org/@eslint/js/-/js-8.38.0.tgz", "resolved": "https://registry.npmjs.org/@eslint/js/-/js-8.39.0.tgz",
"integrity": "sha512-IoD2MfUnOV58ghIHCiil01PcohxjbYR/qCxsoC+xNgUwh1EY8jOOrYmu3d3a71+tJJ23uscEV4X2HJWMsPJu4g==", "integrity": "sha512-kf9RB0Fg7NZfap83B3QOqOGg9QmD9yBudqQXzzOtn3i4y7ZUXe5ONeW34Gwi+TxhH4mvj72R1Zc300KUMa9Bng==",
"dev": true "dev": true
}, },
"@humanwhocodes/config-array": { "@humanwhocodes/config-array": {
@ -9330,9 +9333,9 @@
"integrity": "sha512-+9jVqKhRSpsc591z5vX+X5Yyw+he/HCB4iQ/RYxw35CEPaY1gnsNE43nf9n9AaYjAQrTiI/mOwKUKdUs9vf7Xg==" "integrity": "sha512-+9jVqKhRSpsc591z5vX+X5Yyw+he/HCB4iQ/RYxw35CEPaY1gnsNE43nf9n9AaYjAQrTiI/mOwKUKdUs9vf7Xg=="
}, },
"@types/audioworklet": { "@types/audioworklet": {
"version": "0.0.41", "version": "0.0.42",
"resolved": "https://registry.npmjs.org/@types/audioworklet/-/audioworklet-0.0.41.tgz", "resolved": "https://registry.npmjs.org/@types/audioworklet/-/audioworklet-0.0.42.tgz",
"integrity": "sha512-8BWffzGoSRz436IviQVPye75YYWfac4OKdcLgkZxb3APZxSmAOp2SMtsH1yuM1x57/z/J7bsm05Yq98Hzk1t/w==", "integrity": "sha512-vUHhMkam6BjeomsxZc2f7g0d4fI7PV5EnAoaHo83iy4hNlYphgBgRbcWRK0UEY7jUgfY46kCLYO1riZUdH/P+g==",
"dev": true "dev": true
}, },
"@types/body-parser": { "@types/body-parser": {
@ -9473,9 +9476,9 @@
"dev": true "dev": true
}, },
"@types/node": { "@types/node": {
"version": "18.15.13", "version": "18.16.0",
"resolved": "https://registry.npmjs.org/@types/node/-/node-18.15.13.tgz", "resolved": "https://registry.npmjs.org/@types/node/-/node-18.16.0.tgz",
"integrity": "sha512-N+0kuo9KgrUQ1Sn/ifDXsvg0TTleP7rIy4zOBGECxAljqvqfqpTfzx0Q1NUedOixRMBfe2Whhb056a42cWs26Q==" "integrity": "sha512-BsAaKhB+7X+H4GnSjGhJG9Qi8Tw+inU9nJDwmD5CgOmBLEI6ArdhikpLX7DjbjDRDTbqZzU2LSQNZg8WGPiSZQ=="
}, },
"@types/prop-types": { "@types/prop-types": {
"version": "15.7.5", "version": "15.7.5",
@ -10563,15 +10566,15 @@
"dev": true "dev": true
}, },
"eslint": { "eslint": {
"version": "8.38.0", "version": "8.39.0",
"resolved": "https://registry.npmjs.org/eslint/-/eslint-8.38.0.tgz", "resolved": "https://registry.npmjs.org/eslint/-/eslint-8.39.0.tgz",
"integrity": "sha512-pIdsD2jwlUGf/U38Jv97t8lq6HpaU/G9NKbYmpWpZGw3LdTNhZLbJePqxOXGB5+JEKfOPU/XLxYxFh03nr1KTg==", "integrity": "sha512-mwiok6cy7KTW7rBpo05k6+p4YVZByLNjAZ/ACB9DRCu4YDRwjXI01tWHp6KAUWelsBetTxKK/2sHB0vdS8Z2Og==",
"dev": true, "dev": true,
"requires": { "requires": {
"@eslint-community/eslint-utils": "^4.2.0", "@eslint-community/eslint-utils": "^4.2.0",
"@eslint-community/regexpp": "^4.4.0", "@eslint-community/regexpp": "^4.4.0",
"@eslint/eslintrc": "^2.0.2", "@eslint/eslintrc": "^2.0.2",
"@eslint/js": "8.38.0", "@eslint/js": "8.39.0",
"@humanwhocodes/config-array": "^0.11.8", "@humanwhocodes/config-array": "^0.11.8",
"@humanwhocodes/module-importer": "^1.0.1", "@humanwhocodes/module-importer": "^1.0.1",
"@nodelib/fs.walk": "^1.2.8", "@nodelib/fs.walk": "^1.2.8",
@ -10581,7 +10584,7 @@
"debug": "^4.3.2", "debug": "^4.3.2",
"doctrine": "^3.0.0", "doctrine": "^3.0.0",
"escape-string-regexp": "^4.0.0", "escape-string-regexp": "^4.0.0",
"eslint-scope": "^7.1.1", "eslint-scope": "^7.2.0",
"eslint-visitor-keys": "^3.4.0", "eslint-visitor-keys": "^3.4.0",
"espree": "^9.5.1", "espree": "^9.5.1",
"esquery": "^1.4.2", "esquery": "^1.4.2",
@ -10661,9 +10664,9 @@
} }
}, },
"eslint-scope": { "eslint-scope": {
"version": "7.1.1", "version": "7.2.0",
"resolved": "https://registry.npmjs.org/eslint-scope/-/eslint-scope-7.1.1.tgz", "resolved": "https://registry.npmjs.org/eslint-scope/-/eslint-scope-7.2.0.tgz",
"integrity": "sha512-QKQM/UXpIiHcLqJ5AOyIW7XZmzjkzQXYE54n1++wb0u9V/abW3l9uQnxX8Z5Xd18xyKIMTUAyQ0k1e8pz6LUrw==", "integrity": "sha512-DYj5deGlHBfMt15J7rdtyKNq/Nqlv5KfU4iodrQ019XESsRnwXH9KAE0y3cwtUHDo2ob7CypAnCqefh6vioWRw==",
"dev": true, "dev": true,
"requires": { "requires": {
"esrecurse": "^4.3.0", "esrecurse": "^4.3.0",
@ -12477,9 +12480,9 @@
"dev": true "dev": true
}, },
"prettier": { "prettier": {
"version": "2.8.7", "version": "2.8.8",
"resolved": "https://registry.npmjs.org/prettier/-/prettier-2.8.7.tgz", "resolved": "https://registry.npmjs.org/prettier/-/prettier-2.8.8.tgz",
"integrity": "sha512-yPngTo3aXUUmyuTjeTUT75txrf+aMh9FiD7q9ZE/i6r0bPb22g4FsE6Y338PQX1bmfy08i9QQCB7/rcUAVntfw==", "integrity": "sha512-tdN8qQGvNjw4CHbY+XXk0JgCXn9QiF21a55rBe5LJAU+kDyC4WQn4+awm2Xfk2lQMk5fKup9XgzTZtGkjBdP9Q==",
"dev": true "dev": true
}, },
"prettier-linter-helpers": { "prettier-linter-helpers": {

View File

@ -1,6 +1,6 @@
{ {
"name": "@dannadori/voice-changer-client-js", "name": "@dannadori/voice-changer-client-js",
"version": "1.0.114", "version": "1.0.115",
"description": "", "description": "",
"main": "dist/index.js", "main": "dist/index.js",
"directories": { "directories": {
@ -26,17 +26,17 @@
"author": "wataru.okada@flect.co.jp", "author": "wataru.okada@flect.co.jp",
"license": "ISC", "license": "ISC",
"devDependencies": { "devDependencies": {
"@types/audioworklet": "^0.0.41", "@types/audioworklet": "^0.0.42",
"@types/node": "^18.15.13", "@types/node": "^18.16.0",
"@types/react": "18.0.38", "@types/react": "18.0.38",
"@types/react-dom": "18.0.11", "@types/react-dom": "18.0.11",
"eslint": "^8.38.0", "eslint": "^8.39.0",
"eslint-config-prettier": "^8.8.0", "eslint-config-prettier": "^8.8.0",
"eslint-plugin-prettier": "^4.2.1", "eslint-plugin-prettier": "^4.2.1",
"eslint-plugin-react": "^7.32.2", "eslint-plugin-react": "^7.32.2",
"eslint-webpack-plugin": "^4.0.1", "eslint-webpack-plugin": "^4.0.1",
"npm-run-all": "^4.1.5", "npm-run-all": "^4.1.5",
"prettier": "^2.8.7", "prettier": "^2.8.8",
"raw-loader": "^4.0.2", "raw-loader": "^4.0.2",
"rimraf": "^5.0.0", "rimraf": "^5.0.0",
"ts-loader": "^9.4.2", "ts-loader": "^9.4.2",

View File

@ -110,6 +110,10 @@ export class ServerConfigurator {
} }
loadModel = async (slot: number, configFilename: string, pyTorchModelFilename: string | null, onnxModelFilename: string | null, clusterTorchModelFilename: string | null, featureFilename: string | null, indexFilename: string | null, isHalf: boolean, params: string = "{}") => { loadModel = async (slot: number, configFilename: string, pyTorchModelFilename: string | null, onnxModelFilename: string | null, clusterTorchModelFilename: string | null, featureFilename: string | null, indexFilename: string | null, isHalf: boolean, params: string = "{}") => {
if (isHalf == undefined || isHalf == null) {
console.warn("isHalf is invalid value", isHalf)
isHalf = false
}
const url = this.serverUrl + "/load_model" const url = this.serverUrl + "/load_model"
const info = new Promise<ServerInfo>(async (resolve) => { const info = new Promise<ServerInfo>(async (resolve) => {
const formData = new FormData(); const formData = new FormData();

View File

@ -138,13 +138,28 @@ export type VoiceChangerServerSetting = {
inputSampleRate: InputSampleRate inputSampleRate: InputSampleRate
} }
type ModelSlot = {
onnxModelFile: string,
pyTorchModelFile: string
featureFile: string,
indexFile: string,
defaultTrans: number,
modelType: number,
embChannels: number,
f0: boolean,
samplingRate: number
deprecated: boolean
}
export type ServerInfo = VoiceChangerServerSetting & { export type ServerInfo = VoiceChangerServerSetting & {
status: string status: string
configFile: string, configFile: string,
pyTorchModelFile: string, pyTorchModelFile: string,
onnxModelFile: string, onnxModelFile: string,
onnxExecutionProviders: OnnxExecutionProvider[] onnxExecutionProviders: OnnxExecutionProvider[]
modelSlots: any[] modelSlots: ModelSlot[]
} }
export type ServerInfoSoVitsSVC = ServerInfo & { export type ServerInfoSoVitsSVC = ServerInfo & {

View File

@ -1,5 +1,5 @@
import { useState, useMemo, useEffect } from "react" import { useState, useMemo, useEffect } from "react"
import { VoiceChangerServerSetting, ServerInfo, ServerSettingKey, INDEXEDDB_KEY_SERVER, INDEXEDDB_KEY_MODEL_DATA, ClientType, DefaultServerSetting_MMVCv13, DefaultServerSetting_MMVCv15, DefaultServerSetting_so_vits_svc_40v2, DefaultServerSetting_so_vits_svc_40, DefaultServerSetting_so_vits_svc_40_c, DefaultServerSetting_RVC, OnnxExporterInfo, DefaultServerSetting_DDSP_SVC, MAX_MODEL_SLOT_NUM } from "../const" import { VoiceChangerServerSetting, ServerInfo, ServerSettingKey, INDEXEDDB_KEY_SERVER, INDEXEDDB_KEY_MODEL_DATA, ClientType, DefaultServerSetting_MMVCv13, DefaultServerSetting_MMVCv15, DefaultServerSetting_so_vits_svc_40v2, DefaultServerSetting_so_vits_svc_40, DefaultServerSetting_so_vits_svc_40_c, DefaultServerSetting_RVC, OnnxExporterInfo, DefaultServerSetting_DDSP_SVC, MAX_MODEL_SLOT_NUM, Framework } from "../const"
import { VoiceChangerClient } from "../VoiceChangerClient" import { VoiceChangerClient } from "../VoiceChangerClient"
import { useIndexedDB } from "./useIndexedDB" import { useIndexedDB } from "./useIndexedDB"
@ -22,6 +22,7 @@ export type FileUploadSetting = {
isHalf: boolean isHalf: boolean
uploaded: boolean uploaded: boolean
defaultTune: number defaultTune: number
framework: Framework
params: string params: string
} }
@ -38,6 +39,7 @@ const InitialFileUploadSetting: FileUploadSetting = {
isHalf: true, isHalf: true,
uploaded: false, uploaded: false,
defaultTune: 0, defaultTune: 0,
framework: Framework.PyTorch,
params: "{}" params: "{}"
} }
@ -267,8 +269,11 @@ export const useServerSetting = (props: UseServerSettingProps): ServerSettingSta
const configFileName = fileUploadSetting.configFile ? fileUploadSetting.configFile.filename || "-" : "-" const configFileName = fileUploadSetting.configFile ? fileUploadSetting.configFile.filename || "-" : "-"
const params = JSON.stringify({ const params = JSON.stringify({
trans: fileUploadSetting.defaultTune trans: fileUploadSetting.defaultTune || 0
}) })
if (fileUploadSetting.isHalf == undefined) {
fileUploadSetting.isHalf = false
}
const loadPromise = props.voiceChangerClient.loadModel( const loadPromise = props.voiceChangerClient.loadModel(
slot, slot,
configFileName, configFileName,
@ -279,7 +284,6 @@ export const useServerSetting = (props: UseServerSettingProps): ServerSettingSta
fileUploadSetting.index?.filename || null, fileUploadSetting.index?.filename || null,
fileUploadSetting.isHalf, fileUploadSetting.isHalf,
params, params,
) )
// サーバでロード中にキャッシュにセーブ // サーバでロード中にキャッシュにセーブ
@ -322,6 +326,7 @@ export const useServerSetting = (props: UseServerSettingProps): ServerSettingSta
isHalf: fileUploadSetting.isHalf, // キャッシュとしては不使用。guiで上書きされる。 isHalf: fileUploadSetting.isHalf, // キャッシュとしては不使用。guiで上書きされる。
uploaded: false, // キャッシュから読み込まれるときには、まだuploadされていないから。 uploaded: false, // キャッシュから読み込まれるときには、まだuploadされていないから。
defaultTune: fileUploadSetting.defaultTune, defaultTune: fileUploadSetting.defaultTune,
framework: fileUploadSetting.framework,
params: fileUploadSetting.params params: fileUploadSetting.params
} }
setItem(`${INDEXEDDB_KEY_MODEL_DATA}_${slot}`, saveData) setItem(`${INDEXEDDB_KEY_MODEL_DATA}_${slot}`, saveData)

View File

@ -61,6 +61,9 @@ RUN pip install einops==0.6.0
RUN pip install local_attention==1.8.5 RUN pip install local_attention==1.8.5
RUN pip install websockets==11.0.2 RUN pip install websockets==11.0.2
WORKDIR /
ADD dummy /
RUN git clone https://github.com/w-okada/voice-changer.git RUN git clone https://github.com/w-okada/voice-changer.git
ADD /setup.sh /voice-changer/server ADD /setup.sh /voice-changer/server

View File

@ -7,7 +7,7 @@
"build:docker": "date +%Y%m%d%H%M%S > docker/dummy && DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile docker/ -t voice-changer", "build:docker": "date +%Y%m%d%H%M%S > docker/dummy && DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile docker/ -t voice-changer",
"build:docker:onnx": "DOCKER_BUILDKIT=1 docker build -f docker_onnx/Dockerfile docker/ -t onnx-converter", "build:docker:onnx": "DOCKER_BUILDKIT=1 docker build -f docker_onnx/Dockerfile docker/ -t onnx-converter",
"build:docker:trainer": "date +%Y%m%d%H%M%S > docker_trainer/dummy && DOCKER_BUILDKIT=1 docker build -f docker_trainer/Dockerfile docker_trainer/ -t trainer", "build:docker:trainer": "date +%Y%m%d%H%M%S > docker_trainer/dummy && DOCKER_BUILDKIT=1 docker build -f docker_trainer/Dockerfile docker_trainer/ -t trainer",
"build:docker:vcclient": "date +%Y%m%d%H%M%S > docker/dummy && DOCKER_BUILDKIT=1 docker build -f docker_vcclient/Dockerfile docker_vcclient/ -t vcclient", "build:docker:vcclient": "date +%Y%m%d%H%M%S > docker_vcclient/dummy && DOCKER_BUILDKIT=1 docker build -f docker_vcclient/Dockerfile docker_vcclient/ -t vcclient",
"push:docker": "bash script/001_pushDocker.sh", "push:docker": "bash script/001_pushDocker.sh",
"push:docker:trainer": "bash script/002_pushDockerTrainer.sh", "push:docker:trainer": "bash script/002_pushDockerTrainer.sh",
"push:docker:vcclient": "bash script/003_pushDockerVCClient.sh", "push:docker:vcclient": "bash script/003_pushDockerVCClient.sh",

server/.vscode/settings.json (new file, +16 lines)
View File

@ -0,0 +1,16 @@
{
"workbench.colorCustomizations": {
"tab.activeBackground": "#65952acc"
},
"python.formatting.provider": "black",
"python.linting.mypyEnabled": true,
"[python]": {
"editor.defaultFormatter": null, // Prettier 使
"editor.formatOnSave": true //
},
"flake8.args": [
"--ignore=E501,E402,E722,E741,W503"
// "--max-line-length=150",
// "--max-complexity=20"
]
}

View File

@ -1,12 +1,13 @@
class NoModeLoadedException(Exception):
    def __init__(self, framework):
        self.framework = framework

    def __str__(self):
        return repr(
            f"No model for {self.framework} loaded. Please confirm the model uploaded."
        )


class ONNXInputArgumentException(Exception):
    def __str__(self):
        return repr("ONNX received invalid argument.")

View File

@ -2,12 +2,12 @@ import sys
from distutils.util import strtobool from distutils.util import strtobool
from datetime import datetime from datetime import datetime
from dataclasses import dataclass
import misc.log_control
import socket import socket
import platform import platform
import os import os
import argparse import argparse
from voice_changer.utils.VoiceChangerParams import VoiceChangerParams
import uvicorn import uvicorn
from mods.ssl import create_self_signed_cert from mods.ssl import create_self_signed_cert
from voice_changer.VoiceChangerManager import VoiceChangerManager from voice_changer.VoiceChangerManager import VoiceChangerManager
@ -16,35 +16,56 @@ from restapi.MMVC_Rest import MMVC_Rest
from const import NATIVE_CLIENT_FILE_MAC, NATIVE_CLIENT_FILE_WIN, SSL_KEY_DIR from const import NATIVE_CLIENT_FILE_MAC, NATIVE_CLIENT_FILE_WIN, SSL_KEY_DIR
import subprocess import subprocess
import multiprocessing as mp import multiprocessing as mp
from misc.log_control import setup_loggers
setup_loggers()
def setupArgParser():
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", type=int, default=18888, help="port")
    parser.add_argument("--https", type=strtobool, default=False, help="use https")
    parser.add_argument(
        "--httpsKey", type=str, default="ssl.key", help="path for the key of https"
    )
    parser.add_argument(
        "--httpsCert", type=str, default="ssl.cert", help="path for the cert of https"
    )
    parser.add_argument(
        "--httpsSelfSigned",
        type=strtobool,
        default=True,
        help="generate self-signed certificate",
    )
    parser.add_argument(
        "--content_vec_500", type=str, help="path to content_vec_500 model(pytorch)"
    )
    parser.add_argument(
        "--content_vec_500_onnx", type=str, help="path to content_vec_500 model(onnx)"
    )
    parser.add_argument(
        "--content_vec_500_onnx_on",
        type=strtobool,
        default=False,
        help="use or not onnx for content_vec_500",
    )
    parser.add_argument(
        "--hubert_base", type=str, help="path to hubert_base model(pytorch)"
    )
    parser.add_argument(
        "--hubert_soft", type=str, help="path to hubert_soft model(pytorch)"
    )
    parser.add_argument(
        "--nsf_hifigan", type=str, help="path to nsf_hifigan model(pytorch)"
    )
    return parser
def printMessage(message, level=0): def printMessage(message, level=0):
pf = platform.system() pf = platform.system()
if pf == 'Windows': if pf == "Windows":
if level == 0: if level == 0:
print(f"{message}") print(f"{message}")
elif level == 1: elif level == 1:
@ -78,37 +99,38 @@ def localServer():
host="0.0.0.0", host="0.0.0.0",
port=int(PORT), port=int(PORT),
reload=False if hasattr(sys, "_MEIPASS") else True, reload=False if hasattr(sys, "_MEIPASS") else True,
log_level="warning" log_level="warning",
) )
if __name__ == 'MMVCServerSIO': if __name__ == "MMVCServerSIO":
voiceChangerManager = VoiceChangerManager.get_instance({ voiceChangerParams = VoiceChangerParams(
"content_vec_500": args.content_vec_500, content_vec_500=args.content_vec_500,
"content_vec_500_onnx": args.content_vec_500_onnx, content_vec_500_onnx=args.content_vec_500_onnx,
"content_vec_500_onnx_on": args.content_vec_500_onnx_on, content_vec_500_onnx_on=args.content_vec_500_onnx_on,
"hubert_base": args.hubert_base, hubert_base=args.hubert_base,
"hubert_soft": args.hubert_soft, hubert_soft=args.hubert_soft,
"nsf_hifigan": args.nsf_hifigan, nsf_hifigan=args.nsf_hifigan,
}) )
voiceChangerManager = VoiceChangerManager.get_instance(voiceChangerParams)
print("voiceChangerManager", voiceChangerManager)
app_fastapi = MMVC_Rest.get_instance(voiceChangerManager) app_fastapi = MMVC_Rest.get_instance(voiceChangerManager)
app_socketio = MMVC_SocketIOApp.get_instance(app_fastapi, voiceChangerManager) app_socketio = MMVC_SocketIOApp.get_instance(app_fastapi, voiceChangerManager)
if __name__ == '__mp_main__': if __name__ == "__mp_main__":
printMessage(f"サーバプロセスを起動しています。", level=2) printMessage("サーバプロセスを起動しています。", level=2)
if __name__ == '__main__': if __name__ == "__main__":
mp.freeze_support() mp.freeze_support()
printMessage(f"Voice Changerを起動しています。", level=2) printMessage("Voice Changerを起動しています。", level=2)
PORT = args.p PORT = args.p
if os.getenv("EX_PORT"): if os.getenv("EX_PORT"):
EX_PORT = os.environ["EX_PORT"] EX_PORT = os.environ["EX_PORT"]
printMessage( printMessage(f"External_Port:{EX_PORT} Internal_Port:{PORT}", level=1)
f"External_Port:{EX_PORT} Internal_Port:{PORT}", level=1)
else: else:
printMessage(f"Internal_Port:{PORT}", level=1) printMessage(f"Internal_Port:{PORT}", level=1)
@ -123,38 +145,42 @@ if __name__ == '__main__':
key_base_name = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}" key_base_name = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}"
keyname = f"{key_base_name}.key" keyname = f"{key_base_name}.key"
certname = f"{key_base_name}.cert" certname = f"{key_base_name}.cert"
create_self_signed_cert(
    certname,
    keyname,
    certargs={
        "Country": "JP",
        "State": "Tokyo",
        "City": "Chuo-ku",
        "Organization": "F",
        "Org. Unit": "F",
    },
    cert_dir=SSL_KEY_DIR,
)
key_path = os.path.join(SSL_KEY_DIR, keyname) key_path = os.path.join(SSL_KEY_DIR, keyname)
cert_path = os.path.join(SSL_KEY_DIR, certname) cert_path = os.path.join(SSL_KEY_DIR, certname)
printMessage( printMessage(
f"protocol: HTTPS(self-signed), key:{key_path}, cert:{cert_path}", level=1) f"protocol: HTTPS(self-signed), key:{key_path}, cert:{cert_path}", level=1
)
elif args.https and args.httpsSelfSigned == 0: elif args.https and args.httpsSelfSigned == 0:
# HTTPS # HTTPS
key_path = args.httpsKey key_path = args.httpsKey
cert_path = args.httpsCert cert_path = args.httpsCert
printMessage( printMessage(f"protocol: HTTPS, key:{key_path}, cert:{cert_path}", level=1)
f"protocol: HTTPS, key:{key_path}, cert:{cert_path}", level=1)
else: else:
# HTTP # HTTP
printMessage(f"protocol: HTTP", level=1) printMessage("protocol: HTTP", level=1)
printMessage(f"-- ---- -- ", level=1) printMessage("-- ---- -- ", level=1)
# アドレス表示 # アドレス表示
printMessage( printMessage("ブラウザで次のURLを開いてください.", level=2)
f"ブラウザで次のURLを開いてください.", level=2)
if args.https == 1: if args.https == 1:
printMessage( printMessage("https://<IP>:<PORT>/", level=1)
f"https://<IP>:<PORT>/", level=1)
else: else:
printMessage( printMessage("http://<IP>:<PORT>/", level=1)
f"http://<IP>:<PORT>/", level=1)
printMessage(f"多くの場合は次のいずれかのURLにアクセスすると起動します。", level=2) printMessage("多くの場合は次のいずれかのURLにアクセスすると起動します。", level=2)
if "EX_PORT" in locals() and "EX_IP" in locals(): # シェルスクリプト経由起動(docker) if "EX_PORT" in locals() and "EX_IP" in locals(): # シェルスクリプト経由起動(docker)
if args.https == 1: if args.https == 1:
printMessage(f"https://localhost:{EX_PORT}/", level=1) printMessage(f"https://localhost:{EX_PORT}/", level=1)
@ -175,7 +201,7 @@ if __name__ == '__main__':
# サーバ起動 # サーバ起動
if args.https: if args.https:
# HTTPS サーバ起動 # HTTPS サーバ起動
res = uvicorn.run( uvicorn.run(
f"{os.path.basename(__file__)[:-3]}:app_socketio", f"{os.path.basename(__file__)[:-3]}:app_socketio",
host="0.0.0.0", host="0.0.0.0",
port=int(PORT), port=int(PORT),
@ -188,13 +214,17 @@ if __name__ == '__main__':
p = mp.Process(name="p", target=localServer) p = mp.Process(name="p", target=localServer)
p.start() p.start()
try: try:
if sys.platform.startswith('win'): if sys.platform.startswith("win"):
process = subprocess.Popen([NATIVE_CLIENT_FILE_WIN, "-u", f"http://localhost:{PORT}/"]) process = subprocess.Popen(
[NATIVE_CLIENT_FILE_WIN, "-u", f"http://localhost:{PORT}/"]
)
return_code = process.wait() return_code = process.wait()
print("client closed.") print("client closed.")
p.terminate() p.terminate()
elif sys.platform.startswith('darwin'): elif sys.platform.startswith("darwin"):
process = subprocess.Popen([NATIVE_CLIENT_FILE_MAC, "-u", f"http://localhost:{PORT}/"]) process = subprocess.Popen(
[NATIVE_CLIENT_FILE_MAC, "-u", f"http://localhost:{PORT}/"]
)
return_code = process.wait() return_code = process.wait()
print("client closed.") print("client closed.")
p.terminate() p.terminate()
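
The hunks above replace the old settings dict with a VoiceChangerParams dataclass built from the parsed arguments. A minimal sketch of the same construction, assuming the dataclass exposes exactly the keyword fields used above; the file paths are placeholders (in the server they come from the argparse options):

from voice_changer.utils.VoiceChangerParams import VoiceChangerParams
from voice_changer.VoiceChangerManager import VoiceChangerManager

# Placeholder paths; real values are supplied via --content_vec_500, --hubert_base, etc.
voiceChangerParams = VoiceChangerParams(
    content_vec_500="pretrain/checkpoint_best_legacy_500.pt",
    content_vec_500_onnx=None,
    content_vec_500_onnx_on=False,
    hubert_base="hubert_base.pt",
    hubert_soft="hubert-soft-0d54a1f4.pt",
    nsf_hifigan="nsf_hifigan/model",
)
voiceChangerManager = VoiceChangerManager.get_instance(voiceChangerParams)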

View File

@ -4,7 +4,15 @@ import tempfile
from typing import Literal, TypeAlias from typing import Literal, TypeAlias
ModelType: TypeAlias = Literal[
    "MMVCv15",
    "MMVCv13",
    "so-vits-svc-40v2",
    "so-vits-svc-40",
    "so-vits-svc-40_c",
    "DDSP-SVC",
    "RVC",
]
ERROR_NO_ONNX_SESSION = "ERROR_NO_ONNX_SESSION" ERROR_NO_ONNX_SESSION = "ERROR_NO_ONNX_SESSION"
@ -13,27 +21,45 @@ tmpdir = tempfile.TemporaryDirectory()
# print("generate tmpdir:::",tmpdir) # print("generate tmpdir:::",tmpdir)
SSL_KEY_DIR = os.path.join(tmpdir.name, "keys") if hasattr(sys, "_MEIPASS") else "keys" SSL_KEY_DIR = os.path.join(tmpdir.name, "keys") if hasattr(sys, "_MEIPASS") else "keys"
MODEL_DIR = os.path.join(tmpdir.name, "logs") if hasattr(sys, "_MEIPASS") else "logs" MODEL_DIR = os.path.join(tmpdir.name, "logs") if hasattr(sys, "_MEIPASS") else "logs"
UPLOAD_DIR = os.path.join(tmpdir.name, "upload_dir") if hasattr(sys, "_MEIPASS") else "upload_dir" UPLOAD_DIR = (
NATIVE_CLIENT_FILE_WIN = os.path.join(sys._MEIPASS, "voice-changer-native-client.exe") if hasattr(sys, "_MEIPASS") else "voice-changer-native-client" os.path.join(tmpdir.name, "upload_dir")
NATIVE_CLIENT_FILE_MAC = os.path.join(sys._MEIPASS, "voice-changer-native-client.app", "Contents", "MacOS", if hasattr(sys, "_MEIPASS")
"voice-changer-native-client") if hasattr(sys, "_MEIPASS") else "voice-changer-native-client" else "upload_dir"
)
NATIVE_CLIENT_FILE_WIN = (
os.path.join(sys._MEIPASS, "voice-changer-native-client.exe") # type: ignore
if hasattr(sys, "_MEIPASS")
else "voice-changer-native-client"
)
NATIVE_CLIENT_FILE_MAC = (
os.path.join(
sys._MEIPASS, # type: ignore
"voice-changer-native-client.app",
"Contents",
"MacOS",
"voice-changer-native-client",
)
if hasattr(sys, "_MEIPASS")
else "voice-changer-native-client"
)
HUBERT_ONNX_MODEL_PATH = os.path.join(sys._MEIPASS, "model_hubert/hubert_simple.onnx") if hasattr(sys, HUBERT_ONNX_MODEL_PATH = (
"_MEIPASS") else "model_hubert/hubert_simple.onnx" os.path.join(sys._MEIPASS, "model_hubert/hubert_simple.onnx") # type: ignore
if hasattr(sys, "_MEIPASS")
else "model_hubert/hubert_simple.onnx"
)
TMP_DIR = os.path.join(tmpdir.name, "tmp_dir") if hasattr(sys, "_MEIPASS") else "tmp_dir" TMP_DIR = (
os.path.join(tmpdir.name, "tmp_dir") if hasattr(sys, "_MEIPASS") else "tmp_dir"
)
os.makedirs(TMP_DIR, exist_ok=True) os.makedirs(TMP_DIR, exist_ok=True)
# modelType: ModelType = "MMVCv15"
# def getModelType() -> ModelType:
# return modelType
# def setModelType(_modelType: ModelType):
# global modelType
# modelType = _modelType
def getFrontendPath(): def getFrontendPath():
frontend_path = os.path.join(sys._MEIPASS, "dist") if hasattr(sys, "_MEIPASS") else "../client/demo/dist" frontend_path = (
os.path.join(sys._MEIPASS, "dist")
if hasattr(sys, "_MEIPASS")
else "../client/demo/dist"
)
return frontend_path return frontend_path
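
The path constants above all follow one pattern: use the PyInstaller extraction directory (sys._MEIPASS) when running as a frozen executable, otherwise fall back to a relative path. A minimal sketch of that pattern as a helper, purely for illustration (the project defines each constant inline instead):

import os
import sys

def resource_path(bundled: str, fallback: str) -> str:
    # sys._MEIPASS exists only inside a PyInstaller bundle.
    if hasattr(sys, "_MEIPASS"):
        return os.path.join(sys._MEIPASS, bundled)  # type: ignore
    return fallback

# e.g. resource_path("dist", "../client/demo/dist") mirrors getFrontendPath().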

View File

@ -8,32 +8,31 @@ class UvicornSuppressFilter(logging.Filter):
return False return False
def setup_loggers():
    # logger = logging.getLogger("uvicorn.error")
    # logger.addFilter(UvicornSuppressFilter())

    logger = logging.getLogger("fairseq.tasks.hubert_pretraining")
    logger.addFilter(UvicornSuppressFilter())

    logger = logging.getLogger("fairseq.models.hubert.hubert")
    logger.addFilter(UvicornSuppressFilter())

    logger = logging.getLogger("fairseq.tasks.text_to_speech")
    logger.addFilter(UvicornSuppressFilter())

    logger = logging.getLogger("numba.core.ssa")
    logger.addFilter(UvicornSuppressFilter())

    logger = logging.getLogger("numba.core.interpreter")
    logger.addFilter(UvicornSuppressFilter())

    logger = logging.getLogger("numba.core.byteflow")
    logger.addFilter(UvicornSuppressFilter())

    # logger.propagate = False
    logger = logging.getLogger("multipart.multipart")
    logger.propagate = False

    logging.getLogger("asyncio").setLevel(logging.WARNING)
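
A minimal usage sketch (mirroring the MMVCServerSIO.py change above): the entry point calls this once, before the noisy libraries emit their first records.

from misc.log_control import setup_loggers

setup_loggers()  # install the suppression filters before fairseq / numba start logging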

View File

@ -17,7 +17,6 @@ scipy==1.10.1
matplotlib==3.7.1 matplotlib==3.7.1
fairseq==0.12.2 fairseq==0.12.2
websockets==11.0.2 websockets==11.0.2
praat-parselmouth==0.4.3
faiss-cpu==1.7.3 faiss-cpu==1.7.3
torchcrepe==0.0.18 torchcrepe==0.0.18
librosa==0.9.1 librosa==0.9.1

View File

@ -1,7 +1,8 @@
from fastapi import FastAPI, Request, Response from fastapi import FastAPI, Request, Response, HTTPException
from fastapi.routing import APIRoute from fastapi.routing import APIRoute
from fastapi.middleware.cors import CORSMiddleware from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles from fastapi.staticfiles import StaticFiles
from fastapi.exceptions import RequestValidationError
from typing import Callable from typing import Callable
from voice_changer.VoiceChangerManager import VoiceChangerManager from voice_changer.VoiceChangerManager import VoiceChangerManager
@ -18,7 +19,7 @@ class ValidationErrorLoggingRoute(APIRoute):
async def custom_route_handler(request: Request) -> Response: async def custom_route_handler(request: Request) -> Response:
try: try:
return await original_route_handler(request) return await original_route_handler(request)
except Exception as exc: except RequestValidationError as exc:
print("Exception", request.url, str(exc)) print("Exception", request.url, str(exc))
body = await request.body() body = await request.body()
detail = {"errors": exc.errors(), "body": body.decode()} detail = {"errors": exc.errors(), "body": body.decode()}
@ -28,10 +29,11 @@ class ValidationErrorLoggingRoute(APIRoute):
class MMVC_Rest: class MMVC_Rest:
_instance = None
@classmethod @classmethod
def get_instance(cls, voiceChangerManager: VoiceChangerManager): def get_instance(cls, voiceChangerManager: VoiceChangerManager):
if not hasattr(cls, "_instance"): if cls._instance is None:
app_fastapi = FastAPI() app_fastapi = FastAPI()
app_fastapi.router.route_class = ValidationErrorLoggingRoute app_fastapi.router.route_class = ValidationErrorLoggingRoute
app_fastapi.add_middleware( app_fastapi.add_middleware(
@ -43,15 +45,25 @@ class MMVC_Rest:
) )
app_fastapi.mount( app_fastapi.mount(
"/front", StaticFiles(directory=f'{getFrontendPath()}', html=True), name="static") "/front",
StaticFiles(directory=f"{getFrontendPath()}", html=True),
name="static",
)
app_fastapi.mount( app_fastapi.mount(
"/trainer", StaticFiles(directory=f'{getFrontendPath()}', html=True), name="static") "/trainer",
StaticFiles(directory=f"{getFrontendPath()}", html=True),
name="static",
)
app_fastapi.mount( app_fastapi.mount(
"/recorder", StaticFiles(directory=f'{getFrontendPath()}', html=True), name="static") "/recorder",
StaticFiles(directory=f"{getFrontendPath()}", html=True),
name="static",
)
app_fastapi.mount( app_fastapi.mount(
"/tmp", StaticFiles(directory=f'{TMP_DIR}'), name="static") "/tmp", StaticFiles(directory=f"{TMP_DIR}"), name="static"
)
restHello = MMVC_Rest_Hello() restHello = MMVC_Rest_Hello()
app_fastapi.include_router(restHello.router) app_fastapi.include_router(restHello.router)
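
A minimal sketch of the singleton access pattern introduced above, shown with a stand-in class name and assuming get_instance assigns cls._instance once the FastAPI app is built (that assignment sits outside the shown hunk):

class Example:
    _instance = None

    @classmethod
    def get_instance(cls):
        # Explicit None check instead of hasattr(cls, "_instance"), which is
        # always true once the class attribute is declared.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance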

View File

@ -4,12 +4,16 @@ from typing import Union
from fastapi import APIRouter from fastapi import APIRouter
from fastapi.encoders import jsonable_encoder from fastapi.encoders import jsonable_encoder
from fastapi.responses import JSONResponse from fastapi.responses import JSONResponse
from fastapi import HTTPException, FastAPI, UploadFile, File, Form from fastapi import UploadFile, File, Form
from restapi.mods.FileUploader import upload_file, concat_file_chunks from restapi.mods.FileUploader import upload_file, concat_file_chunks
from voice_changer.VoiceChangerManager import VoiceChangerManager from voice_changer.VoiceChangerManager import VoiceChangerManager
from const import MODEL_DIR, UPLOAD_DIR, ModelType from const import MODEL_DIR, UPLOAD_DIR, ModelType
from voice_changer.utils.LoadModelParams import FilePaths, LoadModelParams
from dataclasses import fields
os.makedirs(UPLOAD_DIR, exist_ok=True) os.makedirs(UPLOAD_DIR, exist_ok=True)
os.makedirs(MODEL_DIR, exist_ok=True) os.makedirs(MODEL_DIR, exist_ok=True)
@ -19,12 +23,16 @@ class MMVC_Rest_Fileuploader:
self.voiceChangerManager = voiceChangerManager self.voiceChangerManager = voiceChangerManager
self.router = APIRouter() self.router = APIRouter()
self.router.add_api_route("/info", self.get_info, methods=["GET"]) self.router.add_api_route("/info", self.get_info, methods=["GET"])
self.router.add_api_route("/upload_file", self.post_upload_file, methods=["POST"]) self.router.add_api_route(
self.router.add_api_route("/concat_uploaded_file", self.post_concat_uploaded_file, methods=["POST"]) "/upload_file", self.post_upload_file, methods=["POST"]
self.router.add_api_route("/update_settings", self.post_update_settings, methods=["POST"]) )
self.router.add_api_route(
"/concat_uploaded_file", self.post_concat_uploaded_file, methods=["POST"]
)
self.router.add_api_route(
"/update_settings", self.post_update_settings, methods=["POST"]
)
self.router.add_api_route("/load_model", self.post_load_model, methods=["POST"]) self.router.add_api_route("/load_model", self.post_load_model, methods=["POST"])
self.router.add_api_route("/load_model_for_train", self.post_load_model_for_train, methods=["POST"])
self.router.add_api_route("/extract_voices", self.post_extract_voices, methods=["POST"])
self.router.add_api_route("/model_type", self.post_model_type, methods=["POST"]) self.router.add_api_route("/model_type", self.post_model_type, methods=["POST"])
self.router.add_api_route("/model_type", self.get_model_type, methods=["GET"]) self.router.add_api_route("/model_type", self.get_model_type, methods=["GET"])
self.router.add_api_route("/onnx", self.get_onnx, methods=["GET"]) self.router.add_api_route("/onnx", self.get_onnx, methods=["GET"])
@ -34,9 +42,13 @@ class MMVC_Rest_Fileuploader:
json_compatible_item_data = jsonable_encoder(res) json_compatible_item_data = jsonable_encoder(res)
return JSONResponse(content=json_compatible_item_data) return JSONResponse(content=json_compatible_item_data)
def post_concat_uploaded_file(self, filename: str = Form(...), filenameChunkNum: int = Form(...)): def post_concat_uploaded_file(
self, filename: str = Form(...), filenameChunkNum: int = Form(...)
):
slot = 0 slot = 0
res = concat_file_chunks(slot, UPLOAD_DIR, filename, filenameChunkNum, UPLOAD_DIR) res = concat_file_chunks(
slot, UPLOAD_DIR, filename, filenameChunkNum, UPLOAD_DIR
)
json_compatible_item_data = jsonable_encoder(res) json_compatible_item_data = jsonable_encoder(res)
return JSONResponse(content=json_compatible_item_data) return JSONResponse(content=json_compatible_item_data)
@ -45,7 +57,9 @@ class MMVC_Rest_Fileuploader:
json_compatible_item_data = jsonable_encoder(info) json_compatible_item_data = jsonable_encoder(info)
return JSONResponse(content=json_compatible_item_data) return JSONResponse(content=json_compatible_item_data)
def post_update_settings(self, key: str = Form(...), val: Union[int, str, float] = Form(...)): def post_update_settings(
self, key: str = Form(...), val: Union[int, str, float] = Form(...)
):
print("post_update_settings", key, val) print("post_update_settings", key, val)
info = self.voiceChangerManager.update_settings(key, val) info = self.voiceChangerManager.update_settings(key, val)
json_compatible_item_data = jsonable_encoder(info) json_compatible_item_data = jsonable_encoder(info)
@ -63,72 +77,42 @@ class MMVC_Rest_Fileuploader:
isHalf: bool = Form(...), isHalf: bool = Form(...),
params: str = Form(...), params: str = Form(...),
): ):
files = FilePaths(
configFilename=configFilename,
pyTorchModelFilename=pyTorchModelFilename,
onnxModelFilename=onnxModelFilename,
clusterTorchModelFilename=clusterTorchModelFilename,
featureFilename=featureFilename,
indexFilename=indexFilename,
)
props: LoadModelParams = LoadModelParams(
slot=slot, isHalf=isHalf, params=params, files=files
)
props = {
"slot": slot,
"isHalf": isHalf,
"files": {
"configFilename": configFilename,
"pyTorchModelFilename": pyTorchModelFilename,
"onnxModelFilename": onnxModelFilename,
"clusterTorchModelFilename": clusterTorchModelFilename,
"featureFilename": featureFilename,
"indexFilename": indexFilename
},
"params": params
}
# Change Filepath # Change Filepath
for key, val in props["files"].items(): for field in fields(props.files):
key = field.name
val = getattr(props.files, key)
if val != "-": if val != "-":
uploadPath = os.path.join(UPLOAD_DIR, val) uploadPath = os.path.join(UPLOAD_DIR, val)
storeDir = os.path.join(UPLOAD_DIR, f"{slot}") storeDir = os.path.join(UPLOAD_DIR, f"{slot}")
os.makedirs(storeDir, exist_ok=True) os.makedirs(storeDir, exist_ok=True)
storePath = os.path.join(storeDir, val) storePath = os.path.join(storeDir, val)
shutil.move(uploadPath, storePath) shutil.move(uploadPath, storePath)
props["files"][key] = storePath setattr(props.files, key, storePath)
else: else:
props["files"][key] = None setattr(props.files, key, None)
# print("---------------------------------------------------2>", props)
info = self.voiceChangerManager.loadModel(props) info = self.voiceChangerManager.loadModel(props)
json_compatible_item_data = jsonable_encoder(info) json_compatible_item_data = jsonable_encoder(info)
return JSONResponse(content=json_compatible_item_data) return JSONResponse(content=json_compatible_item_data)
# return {"load": f"{configFilePath}, {pyTorchModelFilePath}, {onnxModelFilePath}"}
def post_load_model_for_train( def post_model_type(self, modelType: ModelType = Form(...)):
self,
modelGFilename: str = Form(...),
modelGFilenameChunkNum: int = Form(...),
modelDFilename: str = Form(...),
modelDFilenameChunkNum: int = Form(...),
):
modelGFilePath = concat_file_chunks(
UPLOAD_DIR, modelGFilename, modelGFilenameChunkNum, MODEL_DIR)
modelDFilePath = concat_file_chunks(
UPLOAD_DIR, modelDFilename, modelDFilenameChunkNum, MODEL_DIR)
return {"File saved": f"{modelGFilePath}, {modelDFilePath}"}
def post_extract_voices(
self,
zipFilename: str = Form(...),
zipFileChunkNum: int = Form(...),
):
zipFilePath = concat_file_chunks(
UPLOAD_DIR, zipFilename, zipFileChunkNum, UPLOAD_DIR)
shutil.unpack_archive(zipFilePath, "MMVC_Trainer/dataset/textful/")
return {"Zip file unpacked": f"{zipFilePath}"}
def post_model_type(
self,
modelType: ModelType = Form(...),
):
info = self.voiceChangerManager.switchModelType(modelType) info = self.voiceChangerManager.switchModelType(modelType)
json_compatible_item_data = jsonable_encoder(info) json_compatible_item_data = jsonable_encoder(info)
return JSONResponse(content=json_compatible_item_data) return JSONResponse(content=json_compatible_item_data)
def get_model_type( def get_model_type(self):
self,
):
info = self.voiceChangerManager.getModelType() info = self.voiceChangerManager.getModelType()
json_compatible_item_data = jsonable_encoder(info) json_compatible_item_data = jsonable_encoder(info)
return JSONResponse(content=json_compatible_item_data) return JSONResponse(content=json_compatible_item_data)
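
The upload handler now builds a typed LoadModelParams instead of a raw dict. A minimal construction sketch, with field names taken from the handler above and placeholder values ("-" marks a file that was not uploaded and is mapped to None before loading):

from voice_changer.utils.LoadModelParams import FilePaths, LoadModelParams

files = FilePaths(
    configFilename="config.json",
    pyTorchModelFilename="model.pth",
    onnxModelFilename="-",
    clusterTorchModelFilename="-",
    featureFilename="-",
    indexFilename="-",
)
props = LoadModelParams(slot=0, isHalf=False, params="{}", files=files)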

View File

@ -1,6 +1,6 @@
from fastapi import APIRouter from fastapi import APIRouter
from fastapi.encoders import jsonable_encoder
from fastapi.responses import JSONResponse
class MMVC_Rest_Hello: class MMVC_Rest_Hello:
def __init__(self): def __init__(self):
self.router = APIRouter() self.router = APIRouter()
@ -8,6 +8,3 @@ class MMVC_Rest_Hello:
def hello(self): def hello(self):
return {"result": "Index"} return {"result": "Index"}

View File

@ -31,24 +31,24 @@ class MMVC_Rest_VoiceChanger:
buffer = voice.buffer buffer = voice.buffer
wav = base64.b64decode(buffer) wav = base64.b64decode(buffer)
if wav == 0: # if wav == 0:
samplerate, data = read("dummy.wav") # samplerate, data = read("dummy.wav")
unpackedData = data # unpackedData = data
else: # else:
unpackedData = np.array(struct.unpack( # unpackedData = np.array(
'<%sh' % (len(wav) // struct.calcsize('<h')), wav)) # struct.unpack("<%sh" % (len(wav) // struct.calcsize("<h")), wav)
# write("logs/received_data.wav", 24000, # )
# unpackedData.astype(np.int16))
unpackedData = np.array(
struct.unpack("<%sh" % (len(wav) // struct.calcsize("<h")), wav)
)
self.tlock.acquire() self.tlock.acquire()
changedVoice = self.voiceChangerManager.changeVoice(unpackedData) changedVoice = self.voiceChangerManager.changeVoice(unpackedData)
self.tlock.release() self.tlock.release()
changedVoiceBase64 = base64.b64encode(changedVoice[0]).decode('utf-8') changedVoiceBase64 = base64.b64encode(changedVoice[0]).decode("utf-8")
data = { data = {"timestamp": timestamp, "changedVoiceBase64": changedVoiceBase64}
"timestamp": timestamp,
"changedVoiceBase64": changedVoiceBase64
}
json_compatible_item_data = jsonable_encoder(data) json_compatible_item_data = jsonable_encoder(data)
return JSONResponse(content=json_compatible_item_data) return JSONResponse(content=json_compatible_item_data)
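
A minimal sketch of the decode/encode round trip performed above, assuming the base64 payload carries 16-bit little-endian PCM; np.frombuffer is an equivalent, shorter alternative to the struct.unpack call in the handler:

import base64
import numpy as np

def decode_voice(buffer_b64: str) -> np.ndarray:
    # base64 -> raw bytes -> int16 samples
    wav = base64.b64decode(buffer_b64)
    return np.frombuffer(wav, dtype="<i2").astype(np.int16)

def encode_voice(changed: np.ndarray) -> str:
    # int16 samples -> raw bytes -> base64 string for the JSON response
    return base64.b64encode(changed.astype(np.int16).tobytes()).decode("utf-8")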

View File

@ -30,7 +30,6 @@ class MMVC_Namespace(socketio.AsyncNamespace):
else: else:
unpackedData = np.array(struct.unpack('<%sh' % (len(data) // struct.calcsize('<h')), data)).astype(np.int16) unpackedData = np.array(struct.unpack('<%sh' % (len(data) // struct.calcsize('<h')), data)).astype(np.int16)
# audio1, perf = self.voiceChangerManager.changeVoice(unpackedData)
res = self.voiceChangerManager.changeVoice(unpackedData) res = self.voiceChangerManager.changeVoice(unpackedData)
audio1 = res[0] audio1 = res[0]
perf = res[1] if len(res) == 2 else [0, 0, 0] perf = res[1] if len(res) == 2 else [0, 0, 0]

Binary file not shown.

Binary file not shown.

View File

@ -1,6 +1,11 @@
import sys import sys
import os import os
if sys.platform.startswith('darwin'): from voice_changer.utils.LoadModelParams import LoadModelParams
from voice_changer.utils.VoiceChangerModel import AudioInOut
from voice_changer.utils.VoiceChangerParams import VoiceChangerParams
if sys.platform.startswith("darwin"):
baseDir = [x for x in sys.path if x.endswith("Contents/MacOS")] baseDir = [x for x in sys.path if x.endswith("Contents/MacOS")]
if len(baseDir) != 1: if len(baseDir) != 1:
print("baseDir should be only one ", baseDir) print("baseDir should be only one ", baseDir)
@ -10,24 +15,25 @@ if sys.platform.startswith('darwin'):
else: else:
sys.path.append("DDSP-SVC") sys.path.append("DDSP-SVC")
import io
from dataclasses import dataclass, asdict, field from dataclasses import dataclass, asdict, field
from functools import reduce
import numpy as np import numpy as np
import torch import torch
import onnxruntime import ddsp.vocoder as vo # type:ignore
import pyworld as pw from ddsp.core import upsample # type:ignore
import ddsp.vocoder as vo from enhancer import Enhancer # type:ignore
from ddsp.core import upsample
from enhancer import Enhancer
from Exceptions import NoModeLoadedException from Exceptions import NoModeLoadedException
providers = ['OpenVINOExecutionProvider', "CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"] providers = [
"OpenVINOExecutionProvider",
"CUDAExecutionProvider",
"DmlExecutionProvider",
"CPUExecutionProvider",
]
@dataclass @dataclass
class DDSP_SVCSettings(): class DDSP_SVCSettings:
gpu: int = 0 gpu: int = 0
dstId: int = 0 dstId: int = 0
@ -45,18 +51,26 @@ class DDSP_SVCSettings():
onnxModelFile: str = "" onnxModelFile: str = ""
configFile: str = "" configFile: str = ""
speakers: dict[str, int] = field( speakers: dict[str, int] = field(default_factory=lambda: {})
default_factory=lambda: {}
)
# ↓mutableな物だけ列挙 # ↓mutableな物だけ列挙
intData = ["gpu", "dstId", "tran", "predictF0", "extraConvertSize", "enableEnhancer", "enhancerTune"] intData = [
"gpu",
"dstId",
"tran",
"predictF0",
"extraConvertSize",
"enableEnhancer",
"enhancerTune",
]
floatData = ["silentThreshold", "clusterInferRatio"] floatData = ["silentThreshold", "clusterInferRatio"]
strData = ["framework", "f0Detector"] strData = ["framework", "f0Detector"]
class DDSP_SVC: class DDSP_SVC:
def __init__(self, params): audio_buffer: AudioInOut | None = None
def __init__(self, params: VoiceChangerParams):
self.settings = DDSP_SVCSettings() self.settings = DDSP_SVCSettings()
self.net_g = None self.net_g = None
self.onnx_session = None self.onnx_session = None
@ -72,24 +86,30 @@ class DDSP_SVC:
else: else:
return torch.device("cpu") return torch.device("cpu")
def loadModel(self, props): def loadModel(self, props: LoadModelParams):
# self.settings.configFile = props["files"]["configFilename"] # 同じフォルダにあるyamlを使う self.settings.pyTorchModelFile = props.files.pyTorchModelFilename
self.settings.pyTorchModelFile = props["files"]["pyTorchModelFilename"]
# model # model
model, args = vo.load_model(self.settings.pyTorchModelFile, device=self.useDevice()) model, args = vo.load_model(
self.settings.pyTorchModelFile, device=self.useDevice()
)
self.model = model self.model = model
self.args = args self.args = args
self.sampling_rate = args.data.sampling_rate self.sampling_rate = args.data.sampling_rate
self.hop_size = int(self.args.data.block_size * self.sampling_rate / self.args.data.sampling_rate) self.hop_size = int(
self.args.data.block_size
* self.sampling_rate
/ self.args.data.sampling_rate
)
# hubert # hubert
self.vec_path = self.params["hubert_soft"] self.vec_path = self.params.hubert_soft
self.encoder = vo.Units_Encoder( self.encoder = vo.Units_Encoder(
self.args.data.encoder, self.args.data.encoder,
self.vec_path, self.vec_path,
self.args.data.encoder_sample_rate, self.args.data.encoder_sample_rate,
self.args.data.encoder_hop_size, self.args.data.encoder_hop_size,
device=self.useDevice()) device=self.useDevice(),
)
# ort_options = onnxruntime.SessionOptions() # ort_options = onnxruntime.SessionOptions()
# ort_options.intra_op_num_threads = 8 # ort_options.intra_op_num_threads = 8
@ -111,36 +131,59 @@ class DDSP_SVC:
self.sampling_rate, self.sampling_rate,
self.hop_size, self.hop_size,
float(50), float(50),
float(1100)) float(1100),
)
self.volume_extractor = vo.Volume_Extractor(self.hop_size) self.volume_extractor = vo.Volume_Extractor(self.hop_size)
self.enhancer_path = self.params["nsf_hifigan"] self.enhancer_path = self.params.nsf_hifigan
self.enhancer = Enhancer(self.args.enhancer.type, self.enhancer_path, device=self.useDevice()) self.enhancer = Enhancer(
self.args.enhancer.type, self.enhancer_path, device=self.useDevice()
)
return self.get_info() return self.get_info()
def update_settings(self, key: str, val: any): def update_settings(self, key: str, val: int | float | str):
if key == "onnxExecutionProvider" and self.onnx_session != None: if key == "onnxExecutionProvider" and self.onnx_session is not None:
if val == "CUDAExecutionProvider": if val == "CUDAExecutionProvider":
if self.settings.gpu < 0 or self.settings.gpu >= self.gpu_num: if self.settings.gpu < 0 or self.settings.gpu >= self.gpu_num:
self.settings.gpu = 0 self.settings.gpu = 0
provider_options = [{'device_id': self.settings.gpu}] provider_options = [{"device_id": self.settings.gpu}]
self.onnx_session.set_providers(providers=[val], provider_options=provider_options) self.onnx_session.set_providers(
providers=[val], provider_options=provider_options
)
else: else:
self.onnx_session.set_providers(providers=[val]) self.onnx_session.set_providers(providers=[val])
elif key in self.settings.intData: elif key in self.settings.intData:
setattr(self.settings, key, int(val)) val = int(val)
if key == "gpu" and val >= 0 and val < self.gpu_num and self.onnx_session != None: setattr(self.settings, key, val)
if (
key == "gpu"
and val >= 0
and val < self.gpu_num
and self.onnx_session is not None
):
providers = self.onnx_session.get_providers() providers = self.onnx_session.get_providers()
print("Providers:", providers) print("Providers:", providers)
if "CUDAExecutionProvider" in providers: if "CUDAExecutionProvider" in providers:
provider_options = [{'device_id': self.settings.gpu}] provider_options = [{"device_id": self.settings.gpu}]
self.onnx_session.set_providers(providers=["CUDAExecutionProvider"], provider_options=provider_options) self.onnx_session.set_providers(
providers=["CUDAExecutionProvider"],
provider_options=provider_options,
)
if key == "gpu" and len(self.settings.pyTorchModelFile) > 0: if key == "gpu" and len(self.settings.pyTorchModelFile) > 0:
model, _args = vo.load_model(self.settings.pyTorchModelFile, device=self.useDevice()) model, _args = vo.load_model(
self.settings.pyTorchModelFile, device=self.useDevice()
)
self.model = model self.model = model
self.enhancer = Enhancer(self.args.enhancer.type, self.enhancer_path, device=self.useDevice()) self.enhancer = Enhancer(
self.encoder = vo.Units_Encoder(self.args.data.encoder, self.vec_path, self.args.data.encoder_sample_rate, self.args.enhancer.type, self.enhancer_path, device=self.useDevice()
self.args.data.encoder_hop_size, device=self.useDevice()) )
self.encoder = vo.Units_Encoder(
self.args.data.encoder,
self.vec_path,
self.args.data.encoder_sample_rate,
self.args.data.encoder_hop_size,
device=self.useDevice(),
)
elif key in self.settings.floatData: elif key in self.settings.floatData:
setattr(self.settings, key, float(val)) setattr(self.settings, key, float(val))
@ -148,19 +191,16 @@ class DDSP_SVC:
setattr(self.settings, key, str(val)) setattr(self.settings, key, str(val))
if key == "f0Detector": if key == "f0Detector":
print("f0Detector update", val) print("f0Detector update", val)
if val == "dio": # if val == "dio":
val = "parselmouth" # val = "parselmouth"
if hasattr(self, "sampling_rate") == False: if hasattr(self, "sampling_rate") is False:
self.sampling_rate = 44100 self.sampling_rate = 44100
self.hop_size = 512 self.hop_size = 512
self.f0_detector = vo.F0_Extractor( self.f0_detector = vo.F0_Extractor(
val, val, self.sampling_rate, self.hop_size, float(50), float(1100)
self.sampling_rate, )
self.hop_size,
float(50),
float(1100))
else: else:
return False return False
@ -169,10 +209,12 @@ class DDSP_SVC:
def get_info(self): def get_info(self):
data = asdict(self.settings) data = asdict(self.settings)
data["onnxExecutionProviders"] = self.onnx_session.get_providers() if self.onnx_session != None else [] data["onnxExecutionProviders"] = (
self.onnx_session.get_providers() if self.onnx_session is not None else []
)
files = ["configFile", "pyTorchModelFile", "onnxModelFile"] files = ["configFile", "pyTorchModelFile", "onnxModelFile"]
for f in files: for f in files:
if data[f] != None and os.path.exists(data[f]): if data[f] is not None and os.path.exists(data[f]):
data[f] = os.path.basename(data[f]) data[f] = os.path.basename(data[f])
else: else:
data[f] = "" data[f] = ""
@ -182,41 +224,64 @@ class DDSP_SVC:
def get_processing_sampling_rate(self): def get_processing_sampling_rate(self):
return self.sampling_rate return self.sampling_rate
def generate_input(self, newData: any, inputSize: int, crossfadeSize: int, solaSearchFrame: int = 0): def generate_input(
self,
newData: AudioInOut,
inputSize: int,
crossfadeSize: int,
solaSearchFrame: int = 0,
):
newData = newData.astype(np.float32) / 32768.0 newData = newData.astype(np.float32) / 32768.0
if hasattr(self, "audio_buffer"): if self.audio_buffer is not None:
self.audio_buffer = np.concatenate([self.audio_buffer, newData], 0) # concatenate with past data self.audio_buffer = np.concatenate(
[self.audio_buffer, newData], 0
) # concatenate with past data
else: else:
self.audio_buffer = newData self.audio_buffer = newData
convertSize = inputSize + crossfadeSize + solaSearchFrame + self.settings.extraConvertSize convertSize = (
inputSize + crossfadeSize + solaSearchFrame + self.settings.extraConvertSize
)
if convertSize % self.hop_size != 0: # pad because the model output is truncated to its hop size if convertSize % self.hop_size != 0: # pad because the model output is truncated to its hop size
convertSize = convertSize + (self.hop_size - (convertSize % self.hop_size)) convertSize = convertSize + (self.hop_size - (convertSize % self.hop_size))
self.audio_buffer = self.audio_buffer[-1 * convertSize:] # extract only the portion to convert convertOffset = -1 * convertSize
self.audio_buffer = self.audio_buffer[convertOffset:] # extract only the portion to convert
# f0 # f0
f0 = self.f0_detector.extract(self.audio_buffer * 32768.0, uv_interp=True, f0 = self.f0_detector.extract(
silence_front=self.settings.extraConvertSize / self.sampling_rate) self.audio_buffer * 32768.0,
uv_interp=True,
silence_front=self.settings.extraConvertSize / self.sampling_rate,
)
f0 = torch.from_numpy(f0).float().unsqueeze(-1).unsqueeze(0) f0 = torch.from_numpy(f0).float().unsqueeze(-1).unsqueeze(0)
f0 = f0 * 2 ** (float(self.settings.tran) / 12) f0 = f0 * 2 ** (float(self.settings.tran) / 12)
# volume, mask # volume, mask
volume = self.volume_extractor.extract(self.audio_buffer) volume = self.volume_extractor.extract(self.audio_buffer)
mask = (volume > 10 ** (float(-60) / 20)).astype('float') mask = (volume > 10 ** (float(-60) / 20)).astype("float")
mask = np.pad(mask, (4, 4), constant_values=(mask[0], mask[-1])) mask = np.pad(mask, (4, 4), constant_values=(mask[0], mask[-1]))
mask = np.array([np.max(mask[n: n + 9]) for n in range(len(mask) - 8)]) mask = np.array(
[np.max(mask[n : n + 9]) for n in range(len(mask) - 8)] # noqa: E203
)
mask = torch.from_numpy(mask).float().unsqueeze(-1).unsqueeze(0) mask = torch.from_numpy(mask).float().unsqueeze(-1).unsqueeze(0)
mask = upsample(mask, self.args.data.block_size).squeeze(-1) mask = upsample(mask, self.args.data.block_size).squeeze(-1)
volume = torch.from_numpy(volume).float().unsqueeze(-1).unsqueeze(0) volume = torch.from_numpy(volume).float().unsqueeze(-1).unsqueeze(0)
# embed # embed
audio = torch.from_numpy(self.audio_buffer).float().to(self.useDevice()).unsqueeze(0) audio = (
torch.from_numpy(self.audio_buffer)
.float()
.to(self.useDevice())
.unsqueeze(0)
)
seg_units = self.encoder.encode(audio, self.sampling_rate, self.hop_size) seg_units = self.encoder.encode(audio, self.sampling_rate, self.hop_size)
crop = self.audio_buffer[-1 * (inputSize + crossfadeSize):-1 * (crossfadeSize)] cropOffset = -1 * (inputSize + crossfadeSize)
cropEnd = -1 * (crossfadeSize)
crop = self.audio_buffer[cropOffset:cropEnd]
rms = np.sqrt(np.square(crop).mean(axis=0)) rms = np.sqrt(np.square(crop).mean(axis=0))
vol = max(rms, self.prevVol * 0.0) vol = max(rms, self.prevVol * 0.0)
@ -225,15 +290,14 @@ class DDSP_SVC:
return (seg_units, f0, volume, mask, convertSize, vol) return (seg_units, f0, volume, mask, convertSize, vol)
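The transposition applied to f0 above is the usual equal-temperament shift: multiplying by 2 ** (tran / 12) moves the contour by `tran` semitones. A minimal standalone sketch of that step (the f0 values are made up for illustration):

```python
import numpy as np

def transpose_f0(f0: np.ndarray, semitones: float) -> np.ndarray:
    """Shift an f0 contour by a number of semitones (12 semitones = one octave)."""
    return f0 * 2 ** (semitones / 12)

f0 = np.array([220.0, 230.0, 0.0, 245.0])  # 0.0 marks unvoiced frames and stays 0.0
print(transpose_f0(f0, 12))   # one octave up  -> [440. 460.   0. 490.]
print(transpose_f0(f0, -12))  # one octave down
```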
def _onnx_inference(self, data): def _onnx_inference(self, data):
if hasattr(self, "onnx_session") == False or self.onnx_session == None: if hasattr(self, "onnx_session") is False or self.onnx_session is None:
print("[Voice Changer] No onnx session.") print("[Voice Changer] No onnx session.")
raise NoModeLoadedException("ONNX") raise NoModeLoadedException("ONNX")
raise NoModeLoadedException("ONNX") raise NoModeLoadedException("ONNX")
def _pyTorch_inference(self, data): def _pyTorch_inference(self, data):
if hasattr(self, "model") is False or self.model is None:
if hasattr(self, "model") == False or self.model == None:
print("[Voice Changer] No pyTorch session.") print("[Voice Changer] No pyTorch session.")
raise NoModeLoadedException("pytorch") raise NoModeLoadedException("pytorch")
@ -242,15 +306,19 @@ class DDSP_SVC:
volume = data[2].to(self.useDevice()) volume = data[2].to(self.useDevice())
mask = data[3].to(self.useDevice()) mask = data[3].to(self.useDevice())
convertSize = data[4] # convertSize = data[4]
vol = data[5] # vol = data[5]
# if vol < self.settings.silentThreshold: # if vol < self.settings.silentThreshold:
# print("threshold") # print("threshold")
# return np.zeros(convertSize).astype(np.int16) # return np.zeros(convertSize).astype(np.int16)
with torch.no_grad(): with torch.no_grad():
spk_id = torch.LongTensor(np.array([[self.settings.dstId]])).to(self.useDevice()) spk_id = torch.LongTensor(np.array([[self.settings.dstId]])).to(
seg_output, _, (s_h, s_n) = self.model(c, f0, volume, spk_id=spk_id, spk_mix_dict=None) self.useDevice()
)
seg_output, _, (s_h, s_n) = self.model(
c, f0, volume, spk_id=spk_id, spk_mix_dict=None
)
seg_output *= mask seg_output *= mask
if self.settings.enableEnhancer: if self.settings.enableEnhancer:
@ -260,8 +328,9 @@ class DDSP_SVC:
f0, f0,
self.args.data.block_size, self.args.data.block_size,
# adaptive_key=float(self.settings.enhancerTune), # adaptive_key=float(self.settings.enhancerTune),
adaptive_key='auto', adaptive_key="auto",
silence_front=self.settings.extraConvertSize / self.sampling_rate) silence_front=self.settings.extraConvertSize / self.sampling_rate,
)
result = seg_output.squeeze().cpu().numpy() * 32768.0 result = seg_output.squeeze().cpu().numpy() * 32768.0
return np.array(result).astype(np.int16) return np.array(result).astype(np.int16)
@ -282,7 +351,7 @@ class DDSP_SVC:
del self.onnx_session del self.onnx_session
remove_path = os.path.join("DDSP-SVC") remove_path = os.path.join("DDSP-SVC")
sys.path = [x for x in sys.path if x.endswith(remove_path) == False] sys.path = [x for x in sys.path if x.endswith(remove_path) is False]
for key in list(sys.modules): for key in list(sys.modules):
val = sys.modules.get(key) val = sys.modules.get(key)
@ -291,5 +360,5 @@ class DDSP_SVC:
if file_path.find("DDSP-SVC" + os.path.sep) >= 0: if file_path.find("DDSP-SVC" + os.path.sep) >= 0:
print("remove", key, file_path) print("remove", key, file_path)
sys.modules.pop(key) sys.modules.pop(key)
except Exception as e: except: # type:ignore
pass pass

View File

@ -1,40 +0,0 @@
import os
import numpy as np
import pylab
import librosa
import librosa.display
import pyworld as pw
class IOAnalyzer:
def _get_f0_dio(self, y, sr):
_f0, time = pw.dio(y, sr, frame_period=5)
f0 = pw.stonemask(y, _f0, time, sr)
time = np.linspace(0, y.shape[0] / sr, len(time))
return f0, time
def _get_f0_harvest(self, y, sr):
_f0, time = pw.harvest(y, sr, frame_period=5)
f0 = pw.stonemask(y, _f0, time, sr)
time = np.linspace(0, y.shape[0] / sr, len(time))
return f0, time
def analyze(self, inputDataFile: str, dioImageFile: str, harvestImageFile: str, samplingRate: int):
y, sr = librosa.load(inputDataFile, samplingRate)
y = y.astype(np.float64)
spec = librosa.amplitude_to_db(np.abs(librosa.stft(y, n_fft=2048, win_length=2048, hop_length=128)), ref=np.max)
f0_dio, times = self._get_f0_dio(y, sr=samplingRate)
f0_harvest, times = self._get_f0_harvest(y, sr=samplingRate)
pylab.close()
HOP_LENGTH = 128
img = librosa.display.specshow(spec, sr=samplingRate, hop_length=HOP_LENGTH, x_axis='time', y_axis='log', )
pylab.plot(times, f0_dio, label='f0', color=(0, 1, 1, 0.6), linewidth=3)
pylab.savefig(dioImageFile)
pylab.close()
HOP_LENGTH = 128
img = librosa.display.specshow(spec, sr=samplingRate, hop_length=HOP_LENGTH, x_axis='time', y_axis='log', )
pylab.plot(times, f0_harvest, label='f0', color=(0, 1, 1, 0.6), linewidth=3)
pylab.savefig(harvestImageFile)

View File

@ -1,6 +1,10 @@
import sys import sys
import os import os
if sys.platform.startswith('darwin'):
from voice_changer.utils.LoadModelParams import LoadModelParams
from voice_changer.utils.VoiceChangerModel import AudioInOut
if sys.platform.startswith("darwin"):
baseDir = [x for x in sys.path if x.endswith("Contents/MacOS")] baseDir = [x for x in sys.path if x.endswith("Contents/MacOS")]
if len(baseDir) != 1: if len(baseDir) != 1:
print("baseDir should be only one ", baseDir) print("baseDir should be only one ", baseDir)
@ -12,23 +16,32 @@ else:
sys.path.append(modulePath) sys.path.append(modulePath)
from dataclasses import dataclass, asdict from dataclasses import dataclass, asdict, field
import numpy as np import numpy as np
import torch import torch
import onnxruntime import onnxruntime
import pyworld as pw
from symbols import symbols from symbols import symbols # type:ignore
from models import SynthesizerTrn from models import SynthesizerTrn # type:ignore
from voice_changer.MMVCv13.TrainerFunctions import TextAudioSpeakerCollate, spectrogram_torch, load_checkpoint, get_hparams_from_file from voice_changer.MMVCv13.TrainerFunctions import (
TextAudioSpeakerCollate,
spectrogram_torch,
load_checkpoint,
get_hparams_from_file,
)
from Exceptions import NoModeLoadedException from Exceptions import NoModeLoadedException
providers = ['OpenVINOExecutionProvider', "CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"] providers = [
"OpenVINOExecutionProvider",
"CUDAExecutionProvider",
"DmlExecutionProvider",
"CPUExecutionProvider",
]
@dataclass @dataclass
class MMVCv13Settings(): class MMVCv13Settings:
gpu: int = 0 gpu: int = 0
srcId: int = 0 srcId: int = 0
dstId: int = 101 dstId: int = 101
@ -40,11 +53,13 @@ class MMVCv13Settings():
# ↓ list only the mutable fields # ↓ list only the mutable fields
intData = ["gpu", "srcId", "dstId"] intData = ["gpu", "srcId", "dstId"]
floatData = [] floatData: list[str] = field(default_factory=lambda: [])
strData = ["framework"] strData = ["framework"]
class MMVCv13: class MMVCv13:
audio_buffer: AudioInOut | None = None
def __init__(self): def __init__(self):
self.settings = MMVCv13Settings() self.settings = MMVCv13Settings()
self.net_g = None self.net_g = None
@ -53,51 +68,62 @@ class MMVCv13:
self.gpu_num = torch.cuda.device_count() self.gpu_num = torch.cuda.device_count()
self.text_norm = torch.LongTensor([0, 6, 0]) self.text_norm = torch.LongTensor([0, 6, 0])
def loadModel(self, props): def loadModel(self, props: LoadModelParams):
self.settings.configFile = props["files"]["configFilename"] self.settings.configFile = props.files.configFilename
self.hps = get_hparams_from_file(self.settings.configFile) self.hps = get_hparams_from_file(self.settings.configFile)
self.settings.pyTorchModelFile = props["files"]["pyTorchModelFilename"] self.settings.pyTorchModelFile = props.files.pyTorchModelFilename
self.settings.onnxModelFile = props["files"]["onnxModelFilename"] self.settings.onnxModelFile = props.files.onnxModelFilename
# build the PyTorch model # build the PyTorch model
if self.settings.pyTorchModelFile != None: if self.settings.pyTorchModelFile is not None:
self.net_g = SynthesizerTrn( self.net_g = SynthesizerTrn(
len(symbols), len(symbols),
self.hps.data.filter_length // 2 + 1, self.hps.data.filter_length // 2 + 1,
self.hps.train.segment_size // self.hps.data.hop_length, self.hps.train.segment_size // self.hps.data.hop_length,
n_speakers=self.hps.data.n_speakers, n_speakers=self.hps.data.n_speakers,
**self.hps.model) **self.hps.model
)
self.net_g.eval() self.net_g.eval()
load_checkpoint(self.settings.pyTorchModelFile, self.net_g, None) load_checkpoint(self.settings.pyTorchModelFile, self.net_g, None)
# build the ONNX model # build the ONNX model
if self.settings.onnxModelFile != None: if self.settings.onnxModelFile is not None:
ort_options = onnxruntime.SessionOptions() ort_options = onnxruntime.SessionOptions()
ort_options.intra_op_num_threads = 8 ort_options.intra_op_num_threads = 8
self.onnx_session = onnxruntime.InferenceSession( self.onnx_session = onnxruntime.InferenceSession(
self.settings.onnxModelFile, self.settings.onnxModelFile, providers=providers
providers=providers
) )
return self.get_info() return self.get_info()
def update_settings(self, key: str, val: any): def update_settings(self, key: str, val: int | float | str):
if key == "onnxExecutionProvider" and self.onnx_session != None: if key == "onnxExecutionProvider" and self.onnx_session is not None:
if val == "CUDAExecutionProvider": if val == "CUDAExecutionProvider":
if self.settings.gpu < 0 or self.settings.gpu >= self.gpu_num: if self.settings.gpu < 0 or self.settings.gpu >= self.gpu_num:
self.settings.gpu = 0 self.settings.gpu = 0
provider_options = [{'device_id': self.settings.gpu}] provider_options = [{"device_id": self.settings.gpu}]
self.onnx_session.set_providers(providers=[val], provider_options=provider_options) self.onnx_session.set_providers(
providers=[val], provider_options=provider_options
)
else: else:
self.onnx_session.set_providers(providers=[val]) self.onnx_session.set_providers(providers=[val])
elif key in self.settings.intData: elif key in self.settings.intData:
setattr(self.settings, key, int(val)) val = int(val)
if key == "gpu" and val >= 0 and val < self.gpu_num and self.onnx_session != None: setattr(self.settings, key, val)
if (
key == "gpu"
and val >= 0
and val < self.gpu_num
and self.onnx_session is not None
):
providers = self.onnx_session.get_providers() providers = self.onnx_session.get_providers()
print("Providers:", providers) print("Providers:", providers)
if "CUDAExecutionProvider" in providers: if "CUDAExecutionProvider" in providers:
provider_options = [{'device_id': self.settings.gpu}] provider_options = [{"device_id": self.settings.gpu}]
self.onnx_session.set_providers(providers=["CUDAExecutionProvider"], provider_options=provider_options) self.onnx_session.set_providers(
providers=["CUDAExecutionProvider"],
provider_options=provider_options,
)
elif key in self.settings.floatData: elif key in self.settings.floatData:
setattr(self.settings, key, float(val)) setattr(self.settings, key, float(val))
elif key in self.settings.strData: elif key in self.settings.strData:
@ -110,10 +136,12 @@ class MMVCv13:
def get_info(self): def get_info(self):
data = asdict(self.settings) data = asdict(self.settings)
data["onnxExecutionProviders"] = self.onnx_session.get_providers() if self.onnx_session != None else [] data["onnxExecutionProviders"] = (
self.onnx_session.get_providers() if self.onnx_session is not None else []
)
files = ["configFile", "pyTorchModelFile", "onnxModelFile"] files = ["configFile", "pyTorchModelFile", "onnxModelFile"]
for f in files: for f in files:
if data[f] != None and os.path.exists(data[f]): if data[f] is not None and os.path.exists(data[f]):
data[f] = os.path.basename(data[f]) data[f] = os.path.basename(data[f])
else: else:
data[f] = "" data[f] = ""
@ -121,22 +149,35 @@ class MMVCv13:
return data return data
def get_processing_sampling_rate(self): def get_processing_sampling_rate(self):
if hasattr(self, "hps") == False: if hasattr(self, "hps") is False:
raise NoModeLoadedException("config") raise NoModeLoadedException("config")
return self.hps.data.sampling_rate return self.hps.data.sampling_rate
def _get_spec(self, audio: any): def _get_spec(self, audio: AudioInOut):
spec = spectrogram_torch(audio, self.hps.data.filter_length, spec = spectrogram_torch(
self.hps.data.sampling_rate, self.hps.data.hop_length, self.hps.data.win_length, audio,
center=False) self.hps.data.filter_length,
self.hps.data.sampling_rate,
self.hps.data.hop_length,
self.hps.data.win_length,
center=False,
)
spec = torch.squeeze(spec, 0) spec = torch.squeeze(spec, 0)
return spec return spec
def generate_input(self, newData: any, inputSize: int, crossfadeSize: int, solaSearchFrame: int = 0): def generate_input(
self,
newData: AudioInOut,
inputSize: int,
crossfadeSize: int,
solaSearchFrame: int = 0,
):
newData = newData.astype(np.float32) / self.hps.data.max_wav_value newData = newData.astype(np.float32) / self.hps.data.max_wav_value
if hasattr(self, "audio_buffer"): if self.audio_buffer is not None:
self.audio_buffer = np.concatenate([self.audio_buffer, newData], 0) # concatenate with past data self.audio_buffer = np.concatenate(
[self.audio_buffer, newData], 0
) # concatenate with past data
else: else:
self.audio_buffer = newData self.audio_buffer = newData
@ -145,9 +186,12 @@ class MMVCv13:
if convertSize < 8192: if convertSize < 8192:
convertSize = 8192 convertSize = 8192
if convertSize % self.hps.data.hop_length != 0: # pad because the model output is truncated to its hop size if convertSize % self.hps.data.hop_length != 0: # pad because the model output is truncated to its hop size
convertSize = convertSize + (self.hps.data.hop_length - (convertSize % self.hps.data.hop_length)) convertSize = convertSize + (
self.hps.data.hop_length - (convertSize % self.hps.data.hop_length)
)
self.audio_buffer = self.audio_buffer[-1 * convertSize:] # extract only the portion to convert convertOffset = -1 * convertSize
self.audio_buffer = self.audio_buffer[convertOffset:] # extract only the portion to convert
audio = torch.FloatTensor(self.audio_buffer) audio = torch.FloatTensor(self.audio_buffer)
audio_norm = audio.unsqueeze(0) # unsqueeze audio_norm = audio.unsqueeze(0) # unsqueeze
@ -160,25 +204,29 @@ class MMVCv13:
return data return data
def _onnx_inference(self, data): def _onnx_inference(self, data):
if hasattr(self, "onnx_session") == False or self.onnx_session == None: if hasattr(self, "onnx_session") is False or self.onnx_session is None:
print("[Voice Changer] No ONNX session.") print("[Voice Changer] No ONNX session.")
raise NoModeLoadedException("ONNX") raise NoModeLoadedException("ONNX")
x, x_lengths, spec, spec_lengths, y, y_lengths, sid_src = [x for x in data] x, x_lengths, spec, spec_lengths, y, y_lengths, sid_src = [x for x in data]
sid_tgt1 = torch.LongTensor([self.settings.dstId]) sid_tgt1 = torch.LongTensor([self.settings.dstId])
# if spec.size()[2] >= 8: # if spec.size()[2] >= 8:
audio1 = self.onnx_session.run( audio1 = (
self.onnx_session.run(
["audio"], ["audio"],
{ {
"specs": spec.numpy(), "specs": spec.numpy(),
"lengths": spec_lengths.numpy(), "lengths": spec_lengths.numpy(),
"sid_src": sid_src.numpy(), "sid_src": sid_src.numpy(),
"sid_tgt": sid_tgt1.numpy() "sid_tgt": sid_tgt1.numpy(),
})[0][0, 0] * self.hps.data.max_wav_value },
)[0][0, 0]
* self.hps.data.max_wav_value
)
return audio1 return audio1
def _pyTorch_inference(self, data): def _pyTorch_inference(self, data):
if hasattr(self, "net_g") == False or self.net_g == None: if hasattr(self, "net_g") is False or self.net_g is None:
print("[Voice Changer] No pyTorch session.") print("[Voice Changer] No pyTorch session.")
raise NoModeLoadedException("pytorch") raise NoModeLoadedException("pytorch")
@ -188,11 +236,19 @@ class MMVCv13:
dev = torch.device("cuda", index=self.settings.gpu) dev = torch.device("cuda", index=self.settings.gpu)
with torch.no_grad(): with torch.no_grad():
x, x_lengths, spec, spec_lengths, y, y_lengths, sid_src = [x.to(dev) for x in data] x, x_lengths, spec, spec_lengths, y, y_lengths, sid_src = [
x.to(dev) for x in data
]
sid_target = torch.LongTensor([self.settings.dstId]).to(dev) sid_target = torch.LongTensor([self.settings.dstId]).to(dev)
audio1 = (self.net_g.to(dev).voice_conversion(spec, spec_lengths, sid_src=sid_src, audio1 = (
sid_tgt=sid_target)[0, 0].data * self.hps.data.max_wav_value) self.net_g.to(dev)
.voice_conversion(
spec, spec_lengths, sid_src=sid_src, sid_tgt=sid_target
)[0, 0]
.data
* self.hps.data.max_wav_value
)
result = audio1.float().cpu().numpy() result = audio1.float().cpu().numpy()
return result return result
@ -208,7 +264,7 @@ class MMVCv13:
del self.net_g del self.net_g
del self.onnx_session del self.onnx_session
remove_path = os.path.join("MMVC_Client_v13", "python") remove_path = os.path.join("MMVC_Client_v13", "python")
sys.path = [x for x in sys.path if x.endswith(remove_path) == False] sys.path = [x for x in sys.path if x.endswith(remove_path) is False]
for key in list(sys.modules): for key in list(sys.modules):
val = sys.modules.get(key) val = sys.modules.get(key)
@ -217,5 +273,5 @@ class MMVCv13:
if file_path.find(remove_path + os.path.sep) >= 0: if file_path.find(remove_path + os.path.sep) >= 0:
print("remove", key, file_path) print("remove", key, file_path)
sys.modules.pop(key) sys.modules.pop(key)
except Exception as e: except: # type:ignore
pass pass

View File

@ -1,36 +1,58 @@
import torch import torch
import os, sys, json import os
import sys
import json
import logging import logging
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG) logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logger = logging logger = logging
hann_window = {} hann_window = {}
def spectrogram_torch(y, n_fft, sampling_rate, hop_size, win_size, center=False): def spectrogram_torch(y, n_fft, sampling_rate, hop_size, win_size, center=False):
if torch.min(y) < -1.: if torch.min(y) < -1.0:
print('min value is ', torch.min(y)) print("min value is ", torch.min(y))
if torch.max(y) > 1.: if torch.max(y) > 1.0:
print('max value is ', torch.max(y)) print("max value is ", torch.max(y))
global hann_window global hann_window
dtype_device = str(y.dtype) + '_' + str(y.device) dtype_device = str(y.dtype) + "_" + str(y.device)
wnsize_dtype_device = str(win_size) + '_' + dtype_device wnsize_dtype_device = str(win_size) + "_" + dtype_device
if wnsize_dtype_device not in hann_window: if wnsize_dtype_device not in hann_window:
hann_window[wnsize_dtype_device] = torch.hann_window(win_size).to(dtype=y.dtype, device=y.device) hann_window[wnsize_dtype_device] = torch.hann_window(win_size).to(
dtype=y.dtype, device=y.device
)
y = torch.nn.functional.pad(y.unsqueeze(1), (int((n_fft-hop_size)/2), int((n_fft-hop_size)/2)), mode='reflect') y = torch.nn.functional.pad(
y.unsqueeze(1),
(int((n_fft - hop_size) / 2), int((n_fft - hop_size) / 2)),
mode="reflect",
)
y = y.squeeze(1) y = y.squeeze(1)
spec = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size, window=hann_window[wnsize_dtype_device], spec = torch.stft(
center=center, pad_mode='reflect', normalized=False, onesided=True, return_complex=True) y,
n_fft,
hop_length=hop_size,
win_length=win_size,
window=hann_window[wnsize_dtype_device],
center=center,
pad_mode="reflect",
normalized=False,
onesided=True,
return_complex=True,
)
spec = torch.view_as_real(spec) spec = torch.view_as_real(spec)
spec = torch.sqrt(spec.pow(2).sum(-1) + 1e-6) spec = torch.sqrt(spec.pow(2).sum(-1) + 1e-6)
return spec return spec
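A small usage sketch of the spectrogram_torch helper above; the STFT parameters here are illustrative, the real ones come from the loaded hparams:

```python
import torch

# one second of synthetic audio at 24 kHz, shaped (batch, samples) and clamped to [-1, 1]
y = torch.randn(1, 24000).clamp(-1.0, 1.0)
spec = spectrogram_torch(
    y, n_fft=512, sampling_rate=24000, hop_size=128, win_size=512, center=False
)
print(spec.shape)  # (batch, n_fft // 2 + 1, frames)
```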
class TextAudioSpeakerCollate():
""" Zero-pads model inputs and targets class TextAudioSpeakerCollate:
""" """Zero-pads model inputs and targets"""
def __init__(self, return_ids=False, no_text = False):
def __init__(self, return_ids=False, no_text=False):
self.return_ids = return_ids self.return_ids = return_ids
self.no_text = no_text self.no_text = no_text
@ -42,8 +64,8 @@ class TextAudioSpeakerCollate():
""" """
# Right zero-pad all one-hot text sequences to max input length # Right zero-pad all one-hot text sequences to max input length
_, ids_sorted_decreasing = torch.sort( _, ids_sorted_decreasing = torch.sort(
torch.LongTensor([x[1].size(1) for x in batch]), torch.LongTensor([x[1].size(1) for x in batch]), dim=0, descending=True
dim=0, descending=True) )
max_text_len = max([len(x[0]) for x in batch]) max_text_len = max([len(x[0]) for x in batch])
max_spec_len = max([x[1].size(1) for x in batch]) max_spec_len = max([x[1].size(1) for x in batch])
@ -64,49 +86,69 @@ class TextAudioSpeakerCollate():
row = batch[ids_sorted_decreasing[i]] row = batch[ids_sorted_decreasing[i]]
text = row[0] text = row[0]
text_padded[i, :text.size(0)] = text text_padded[i, : text.size(0)] = text
text_lengths[i] = text.size(0) text_lengths[i] = text.size(0)
spec = row[1] spec = row[1]
spec_padded[i, :, :spec.size(1)] = spec spec_padded[i, :, : spec.size(1)] = spec
spec_lengths[i] = spec.size(1) spec_lengths[i] = spec.size(1)
wav = row[2] wav = row[2]
wav_padded[i, :, :wav.size(1)] = wav wav_padded[i, :, : wav.size(1)] = wav
wav_lengths[i] = wav.size(1) wav_lengths[i] = wav.size(1)
sid[i] = row[3] sid[i] = row[3]
if self.return_ids: if self.return_ids:
return text_padded, text_lengths, spec_padded, spec_lengths, wav_padded, wav_lengths, sid, ids_sorted_decreasing return (
return text_padded, text_lengths, spec_padded, spec_lengths, wav_padded, wav_lengths, sid text_padded,
text_lengths,
spec_padded,
spec_lengths,
wav_padded,
wav_lengths,
sid,
ids_sorted_decreasing,
)
return (
text_padded,
text_lengths,
spec_padded,
spec_lengths,
wav_padded,
wav_lengths,
sid,
)
def load_checkpoint(checkpoint_path, model, optimizer=None): def load_checkpoint(checkpoint_path, model, optimizer=None):
assert os.path.isfile(checkpoint_path), f"No such file or directory: {checkpoint_path}" assert os.path.isfile(
checkpoint_dict = torch.load(checkpoint_path, map_location='cpu') checkpoint_path
iteration = checkpoint_dict['iteration'] ), f"No such file or directory: {checkpoint_path}"
learning_rate = checkpoint_dict['learning_rate'] checkpoint_dict = torch.load(checkpoint_path, map_location="cpu")
iteration = checkpoint_dict["iteration"]
learning_rate = checkpoint_dict["learning_rate"]
if optimizer is not None: if optimizer is not None:
optimizer.load_state_dict(checkpoint_dict['optimizer']) optimizer.load_state_dict(checkpoint_dict["optimizer"])
saved_state_dict = checkpoint_dict['model'] saved_state_dict = checkpoint_dict["model"]
if hasattr(model, 'module'): if hasattr(model, "module"):
state_dict = model.module.state_dict() state_dict = model.module.state_dict()
else: else:
state_dict = model.state_dict() state_dict = model.state_dict()
new_state_dict= {} new_state_dict = {}
for k, v in state_dict.items(): for k, v in state_dict.items():
try: try:
new_state_dict[k] = saved_state_dict[k] new_state_dict[k] = saved_state_dict[k]
except: except:
logger.info("%s is not in the checkpoint" % k) logger.info("%s is not in the checkpoint" % k)
new_state_dict[k] = v new_state_dict[k] = v
if hasattr(model, 'module'): if hasattr(model, "module"):
model.module.load_state_dict(new_state_dict) model.module.load_state_dict(new_state_dict)
else: else:
model.load_state_dict(new_state_dict) model.load_state_dict(new_state_dict)
logger.info("Loaded checkpoint '{}' (iteration {})" .format( logger.info(
checkpoint_path, iteration)) "Loaded checkpoint '{}' (iteration {})".format(checkpoint_path, iteration)
)
return model, optimizer, learning_rate, iteration return model, optimizer, learning_rate, iteration
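A hedged round-trip sketch for load_checkpoint above, using a toy module and a checkpoint dict with the same keys the function reads (the file name is a placeholder):

```python
import torch
from torch import nn

toy = nn.Linear(4, 4)
torch.save(
    {"model": toy.state_dict(), "iteration": 100, "learning_rate": 2e-4},
    "toy_checkpoint.pth",  # placeholder path
)
toy, _optimizer, lr, iteration = load_checkpoint("toy_checkpoint.pth", toy, optimizer=None)
print(iteration, lr)  # 100 0.0002
```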
@ -115,10 +157,11 @@ def get_hparams_from_file(config_path):
data = f.read() data = f.read()
config = json.loads(data) config = json.loads(data)
hparams =HParams(**config) hparams = HParams(**config)
return hparams return hparams
class HParams():
class HParams:
def __init__(self, **kwargs): def __init__(self, **kwargs):
for k, v in kwargs.items(): for k, v in kwargs.items():
if type(v) == dict: if type(v) == dict:
@ -148,4 +191,3 @@ class HParams():
def __repr__(self): def __repr__(self):
return self.__dict__.__repr__() return self.__dict__.__repr__()
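A short sketch of how HParams exposes nested config dicts as attributes, assuming the nested-dict handling shown above assigns attributes the way the hps.data.* / hps.train.* accesses elsewhere in this file rely on (the values are made up):

```python
hps = HParams(**{"data": {"sampling_rate": 24000, "hop_length": 128},
                 "train": {"segment_size": 8192}})
print(hps.data.sampling_rate)                         # 24000
print(hps.train.segment_size // hps.data.hop_length)  # 64
```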

View File

@ -1,6 +1,10 @@
import sys import sys
import os import os
if sys.platform.startswith('darwin'):
from voice_changer.utils.LoadModelParams import LoadModelParams
from voice_changer.utils.VoiceChangerModel import AudioInOut
if sys.platform.startswith("darwin"):
baseDir = [x for x in sys.path if x.endswith("Contents/MacOS")] baseDir = [x for x in sys.path if x.endswith("Contents/MacOS")]
if len(baseDir) != 1: if len(baseDir) != 1:
print("baseDir should be only one ", baseDir) print("baseDir should be only one ", baseDir)
@ -17,16 +21,26 @@ import torch
import onnxruntime import onnxruntime
import pyworld as pw import pyworld as pw
from models import SynthesizerTrn from models import SynthesizerTrn # type:ignore
from voice_changer.MMVCv15.client_modules import convert_continuos_f0, spectrogram_torch, get_hparams_from_file, load_checkpoint from voice_changer.MMVCv15.client_modules import (
convert_continuos_f0,
spectrogram_torch,
get_hparams_from_file,
load_checkpoint,
)
from Exceptions import NoModeLoadedException, ONNXInputArgumentException from Exceptions import NoModeLoadedException, ONNXInputArgumentException
providers = ['OpenVINOExecutionProvider', "CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"] providers = [
"OpenVINOExecutionProvider",
"CUDAExecutionProvider",
"DmlExecutionProvider",
"CPUExecutionProvider",
]
@dataclass @dataclass
class MMVCv15Settings(): class MMVCv15Settings:
gpu: int = 0 gpu: int = 0
srcId: int = 0 srcId: int = 0
dstId: int = 101 dstId: int = 101
@ -46,6 +60,8 @@ class MMVCv15Settings():
class MMVCv15: class MMVCv15:
audio_buffer: AudioInOut | None = None
def __init__(self): def __init__(self):
self.settings = MMVCv15Settings() self.settings = MMVCv15Settings()
self.net_g = None self.net_g = None
@ -53,13 +69,12 @@ class MMVCv15:
self.gpu_num = torch.cuda.device_count() self.gpu_num = torch.cuda.device_count()
def loadModel(self, props): def loadModel(self, props: LoadModelParams):
self.settings.configFile = props.files.configFilename
self.settings.configFile = props["files"]["configFilename"]
self.hps = get_hparams_from_file(self.settings.configFile) self.hps = get_hparams_from_file(self.settings.configFile)
self.settings.pyTorchModelFile = props["files"]["pyTorchModelFilename"] self.settings.pyTorchModelFile = props.files.pyTorchModelFilename
self.settings.onnxModelFile = props["files"]["onnxModelFilename"] self.settings.onnxModelFile = props.files.onnxModelFilename
# build the PyTorch model # build the PyTorch model
self.net_g = SynthesizerTrn( self.net_g = SynthesizerTrn(
@ -78,20 +93,19 @@ class MMVCv15:
requires_grad_pe=self.hps.requires_grad.pe, requires_grad_pe=self.hps.requires_grad.pe,
requires_grad_flow=self.hps.requires_grad.flow, requires_grad_flow=self.hps.requires_grad.flow,
requires_grad_text_enc=self.hps.requires_grad.text_enc, requires_grad_text_enc=self.hps.requires_grad.text_enc,
requires_grad_dec=self.hps.requires_grad.dec requires_grad_dec=self.hps.requires_grad.dec,
) )
if self.settings.pyTorchModelFile != None: if self.settings.pyTorchModelFile is not None:
self.net_g.eval() self.net_g.eval()
load_checkpoint(self.settings.pyTorchModelFile, self.net_g, None) load_checkpoint(self.settings.pyTorchModelFile, self.net_g, None)
# build the ONNX model # build the ONNX model
self.onxx_input_length = 8192 self.onxx_input_length = 8192
if self.settings.onnxModelFile != None: if self.settings.onnxModelFile is not None:
ort_options = onnxruntime.SessionOptions() ort_options = onnxruntime.SessionOptions()
ort_options.intra_op_num_threads = 8 ort_options.intra_op_num_threads = 8
self.onnx_session = onnxruntime.InferenceSession( self.onnx_session = onnxruntime.InferenceSession(
self.settings.onnxModelFile, self.settings.onnxModelFile, providers=providers
providers=providers
) )
inputs_info = self.onnx_session.get_inputs() inputs_info = self.onnx_session.get_inputs()
for i in inputs_info: for i in inputs_info:
@ -100,23 +114,39 @@ class MMVCv15:
self.onxx_input_length = i.shape[2] self.onxx_input_length = i.shape[2]
return self.get_info() return self.get_info()
def update_settings(self, key: str, val: any): def update_settings(self, key: str, val: int | float | str):
if key == "onnxExecutionProvider" and self.settings.onnxModelFile != "": # self.onnx_session != None: if (
key == "onnxExecutionProvider"
and self.settings.onnxModelFile != ""
and self.settings.onnxModelFile is not None
):
if val == "CUDAExecutionProvider": if val == "CUDAExecutionProvider":
if self.settings.gpu < 0 or self.settings.gpu >= self.gpu_num: if self.settings.gpu < 0 or self.settings.gpu >= self.gpu_num:
self.settings.gpu = 0 self.settings.gpu = 0
provider_options = [{'device_id': self.settings.gpu}] provider_options = [{"device_id": self.settings.gpu}]
self.onnx_session.set_providers(providers=[val], provider_options=provider_options) self.onnx_session.set_providers(
providers=[val], provider_options=provider_options
)
else: else:
self.onnx_session.set_providers(providers=[val]) self.onnx_session.set_providers(providers=[val])
elif key in self.settings.intData: elif key in self.settings.intData:
setattr(self.settings, key, int(val)) val = int(val)
if key == "gpu" and val >= 0 and val < self.gpu_num and self.settings.onnxModelFile != "": # self.onnx_session != None: setattr(self.settings, key, val)
if (
key == "gpu"
and val >= 0
and val < self.gpu_num
and self.settings.onnxModelFile != ""
and self.settings.onnxModelFile is not None
):
providers = self.onnx_session.get_providers() providers = self.onnx_session.get_providers()
print("Providers:", providers) print("Providers:", providers)
if "CUDAExecutionProvider" in providers: if "CUDAExecutionProvider" in providers:
provider_options = [{'device_id': self.settings.gpu}] provider_options = [{"device_id": self.settings.gpu}]
self.onnx_session.set_providers(providers=["CUDAExecutionProvider"], provider_options=provider_options) self.onnx_session.set_providers(
providers=["CUDAExecutionProvider"],
provider_options=provider_options,
)
elif key in self.settings.floatData: elif key in self.settings.floatData:
setattr(self.settings, key, float(val)) setattr(self.settings, key, float(val))
elif key in self.settings.strData: elif key in self.settings.strData:
@ -129,10 +159,15 @@ class MMVCv15:
def get_info(self): def get_info(self):
data = asdict(self.settings) data = asdict(self.settings)
data["onnxExecutionProviders"] = self.onnx_session.get_providers() if self.settings.onnxModelFile != "" else [] data["onnxExecutionProviders"] = (
self.onnx_session.get_providers()
if self.settings.onnxModelFile != ""
and self.settings.onnxModelFile is not None
else []
)
files = ["configFile", "pyTorchModelFile", "onnxModelFile"] files = ["configFile", "pyTorchModelFile", "onnxModelFile"]
for f in files: for f in files:
if data[f] != None and os.path.exists(data[f]): if data[f] is not None and os.path.exists(data[f]):
data[f] = os.path.basename(data[f]) data[f] = os.path.basename(data[f])
else: else:
data[f] = "" data[f] = ""
@ -140,36 +175,58 @@ class MMVCv15:
return data return data
def get_processing_sampling_rate(self): def get_processing_sampling_rate(self):
if hasattr(self, "hps") == False: if hasattr(self, "hps") is False:
raise NoModeLoadedException("config") raise NoModeLoadedException("config")
return self.hps.data.sampling_rate return self.hps.data.sampling_rate
def _get_f0(self, detector: str, newData: any): def _get_f0(self, detector: str, newData: AudioInOut):
audio_norm_np = newData.astype(np.float64) audio_norm_np = newData.astype(np.float64)
if detector == "dio": if detector == "dio":
_f0, _time = pw.dio(audio_norm_np, self.hps.data.sampling_rate, frame_period=5.5) _f0, _time = pw.dio(
audio_norm_np, self.hps.data.sampling_rate, frame_period=5.5
)
f0 = pw.stonemask(audio_norm_np, _f0, _time, self.hps.data.sampling_rate) f0 = pw.stonemask(audio_norm_np, _f0, _time, self.hps.data.sampling_rate)
else: else:
f0, t = pw.harvest(audio_norm_np, self.hps.data.sampling_rate, frame_period=5.5, f0_floor=71.0, f0_ceil=1000.0) f0, t = pw.harvest(
f0 = convert_continuos_f0(f0, int(audio_norm_np.shape[0] / self.hps.data.hop_length)) audio_norm_np,
self.hps.data.sampling_rate,
frame_period=5.5,
f0_floor=71.0,
f0_ceil=1000.0,
)
f0 = convert_continuos_f0(
f0, int(audio_norm_np.shape[0] / self.hps.data.hop_length)
)
f0 = torch.from_numpy(f0.astype(np.float32)) f0 = torch.from_numpy(f0.astype(np.float32))
return f0 return f0
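For reference, the two pyworld extractors used by _get_f0 can be exercised on their own. A minimal sketch with a synthetic 220 Hz tone; the sampling rate and frame period mirror the values above:

```python
import numpy as np
import pyworld as pw

sr = 24000
t = np.arange(sr) / sr
x = (0.5 * np.sin(2 * np.pi * 220.0 * t)).astype(np.float64)  # pyworld expects float64

_f0, _time = pw.dio(x, sr, frame_period=5.5)        # coarse estimate
f0_dio = pw.stonemask(x, _f0, _time, sr)            # refined estimate
f0_harvest, _ = pw.harvest(x, sr, frame_period=5.5, f0_floor=71.0, f0_ceil=1000.0)
print(f0_dio[f0_dio > 0].mean(), f0_harvest[f0_harvest > 0].mean())  # both close to 220 Hz
```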
def _get_spec(self, newData: any): def _get_spec(self, newData: AudioInOut):
audio = torch.FloatTensor(newData) audio = torch.FloatTensor(newData)
audio_norm = audio.unsqueeze(0) # unsqueeze audio_norm = audio.unsqueeze(0) # unsqueeze
spec = spectrogram_torch(audio_norm, self.hps.data.filter_length, spec = spectrogram_torch(
self.hps.data.sampling_rate, self.hps.data.hop_length, self.hps.data.win_length, audio_norm,
center=False) self.hps.data.filter_length,
self.hps.data.sampling_rate,
self.hps.data.hop_length,
self.hps.data.win_length,
center=False,
)
spec = torch.squeeze(spec, 0) spec = torch.squeeze(spec, 0)
return spec return spec
def generate_input(self, newData: any, inputSize: int, crossfadeSize: int, solaSearchFrame: int = 0): def generate_input(
self,
newData: AudioInOut,
inputSize: int,
crossfadeSize: int,
solaSearchFrame: int = 0,
):
newData = newData.astype(np.float32) / self.hps.data.max_wav_value newData = newData.astype(np.float32) / self.hps.data.max_wav_value
if hasattr(self, "audio_buffer"): if self.audio_buffer is not None:
self.audio_buffer = np.concatenate([self.audio_buffer, newData], 0) # concatenate with past data self.audio_buffer = np.concatenate(
[self.audio_buffer, newData], 0
) # concatenate with past data
else: else:
self.audio_buffer = newData self.audio_buffer = newData
@ -178,13 +235,16 @@ class MMVCv15:
if convertSize < 8192: if convertSize < 8192:
convertSize = 8192 convertSize = 8192
if convertSize % self.hps.data.hop_length != 0: # pad because the model output is truncated to its hop size if convertSize % self.hps.data.hop_length != 0: # pad because the model output is truncated to its hop size
convertSize = convertSize + (self.hps.data.hop_length - (convertSize % self.hps.data.hop_length)) convertSize = convertSize + (
self.hps.data.hop_length - (convertSize % self.hps.data.hop_length)
)
# ONNX input length is fixed # ONNX input length is fixed
if self.settings.framework == "ONNX": if self.settings.framework == "ONNX":
convertSize = self.onxx_input_length convertSize = self.onxx_input_length
self.audio_buffer = self.audio_buffer[-1 * convertSize:] # extract only the portion to convert convertOffset = -1 * convertSize
self.audio_buffer = self.audio_buffer[convertOffset:] # extract only the portion to convert
f0 = self._get_f0(self.settings.f0Detector, self.audio_buffer) # torch f0 = self._get_f0(self.settings.f0Detector, self.audio_buffer) # torch
f0 = (f0 * self.settings.f0Factor).unsqueeze(0).unsqueeze(0) f0 = (f0 * self.settings.f0Factor).unsqueeze(0).unsqueeze(0)
@ -193,7 +253,7 @@ class MMVCv15:
return [spec, f0, sid] return [spec, f0, sid]
def _onnx_inference(self, data): def _onnx_inference(self, data):
if self.settings.onnxModelFile == "" or self.settings.onnxModelFile == None: if self.settings.onnxModelFile == "" and self.settings.onnxModelFile is None:
print("[Voice Changer] No ONNX session.") print("[Voice Changer] No ONNX session.")
raise NoModeLoadedException("ONNX") raise NoModeLoadedException("ONNX")
@ -203,7 +263,8 @@ class MMVCv15:
sid_tgt1 = torch.LongTensor([self.settings.dstId]) sid_tgt1 = torch.LongTensor([self.settings.dstId])
sin, d = self.net_g.make_sin_d(f0) sin, d = self.net_g.make_sin_d(f0)
(d0, d1, d2, d3) = d (d0, d1, d2, d3) = d
audio1 = self.onnx_session.run( audio1 = (
self.onnx_session.run(
["audio"], ["audio"],
{ {
"specs": spec.numpy(), "specs": spec.numpy(),
@ -214,12 +275,18 @@ class MMVCv15:
"d2": d2.numpy(), "d2": d2.numpy(),
"d3": d3.numpy(), "d3": d3.numpy(),
"sid_src": sid_src.numpy(), "sid_src": sid_src.numpy(),
"sid_tgt": sid_tgt1.numpy() "sid_tgt": sid_tgt1.numpy(),
})[0][0, 0] * self.hps.data.max_wav_value },
)[0][0, 0]
* self.hps.data.max_wav_value
)
return audio1 return audio1
def _pyTorch_inference(self, data): def _pyTorch_inference(self, data):
if self.settings.pyTorchModelFile == "" or self.settings.pyTorchModelFile == None: if (
self.settings.pyTorchModelFile == ""
or self.settings.pyTorchModelFile is None
):
print("[Voice Changer] No pyTorch session.") print("[Voice Changer] No pyTorch session.")
raise NoModeLoadedException("pytorch") raise NoModeLoadedException("pytorch")
@ -236,7 +303,12 @@ class MMVCv15:
sid_src = sid_src.to(dev) sid_src = sid_src.to(dev)
sid_target = torch.LongTensor([self.settings.dstId]).to(dev) sid_target = torch.LongTensor([self.settings.dstId]).to(dev)
audio1 = self.net_g.to(dev).voice_conversion(spec, spec_lengths, f0, sid_src, sid_target)[0, 0].data * self.hps.data.max_wav_value audio1 = (
self.net_g.to(dev)
.voice_conversion(spec, spec_lengths, f0, sid_src, sid_target)[0, 0]
.data
* self.hps.data.max_wav_value
)
result = audio1.float().cpu().numpy() result = audio1.float().cpu().numpy()
return result return result
@ -256,7 +328,7 @@ class MMVCv15:
del self.onnx_session del self.onnx_session
remove_path = os.path.join("MMVC_Client_v15", "python") remove_path = os.path.join("MMVC_Client_v15", "python")
sys.path = [x for x in sys.path if x.endswith(remove_path) == False] sys.path = [x for x in sys.path if x.endswith(remove_path) is False]
for key in list(sys.modules): for key in list(sys.modules):
val = sys.modules.get(key) val = sys.modules.get(key)
@ -265,5 +337,5 @@ class MMVCv15:
if file_path.find(remove_path + os.path.sep) >= 0: if file_path.find(remove_path + os.path.sep) >= 0:
print("remove", key, file_path) print("remove", key, file_path)
sys.modules.pop(key) sys.modules.pop(key)
except Exception as e: except: # type:ignore
pass pass

View File

@ -0,0 +1,17 @@
from dataclasses import dataclass
from voice_changer.RVC.const import RVC_MODEL_TYPE_RVC
@dataclass
class ModelSlot:
pyTorchModelFile: str = ""
onnxModelFile: str = ""
featureFile: str = ""
indexFile: str = ""
defaultTrans: int = 0
modelType: int = RVC_MODEL_TYPE_RVC
samplingRate: int = -1
f0: bool = True
embChannels: int = 256
deprecated: bool = False
embedder: str = "hubert_base" # "hubert_base", "contentvec", "distilhubert"

View File

@ -1,6 +1,8 @@
import onnxruntime import onnxruntime
import torch import torch
import numpy as np import numpy as np
import json
# providers = ['OpenVINOExecutionProvider', "CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"] # providers = ['OpenVINOExecutionProvider', "CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]
providers = ["CPUExecutionProvider"] providers = ["CPUExecutionProvider"]
@ -12,8 +14,7 @@ class ModelWrapper:
# ort_options = onnxruntime.SessionOptions() # ort_options = onnxruntime.SessionOptions()
# ort_options.intra_op_num_threads = 8 # ort_options.intra_op_num_threads = 8
self.onnx_session = onnxruntime.InferenceSession( self.onnx_session = onnxruntime.InferenceSession(
self.onnx_model, self.onnx_model, providers=providers
providers=providers
) )
# input_info = s # input_info = s
first_input_type = self.onnx_session.get_inputs()[0].type first_input_type = self.onnx_session.get_inputs()[0].type
@ -21,21 +22,89 @@ class ModelWrapper:
self.is_half = False self.is_half = False
else: else:
self.is_half = True self.is_half = True
modelmeta = self.onnx_session.get_modelmeta()
try:
metadata = json.loads(modelmeta.custom_metadata_map["metadata"])
self.samplingRate = metadata["samplingRate"]
self.f0 = metadata["f0"]
self.embChannels = metadata["embChannels"]
self.modelType = metadata["modelType"]
self.deprecated = False
self.embedder = (
metadata["embedder"] if "embedder" in metadata else "hubert_base"
)
print(
f"[Voice Changer] Onnx metadata: sr:{self.samplingRate}, f0:{self.f0}, embedder:{self.embedder}"
)
except:
self.samplingRate = 48000
self.f0 = True
self.embChannels = 256
self.modelType = 0
self.deprecated = True
self.embedder = "hubert_base"
print(
"[Voice Changer] ############## !!!! CAUTION !!!! ####################"
)
print(
"[Voice Changer] This onnx's version is depricated. Please regenerate onnxfile. Fallback to default"
)
print(
f"[Voice Changer] Onnx metadata: sr:{self.samplingRate}, f0:{self.f0}"
)
print(
"[Voice Changer] ############## !!!! CAUTION !!!! ####################"
)
def getSamplingRate(self):
return self.samplingRate
def getF0(self):
return self.f0
def getEmbChannels(self):
return self.embChannels
def getModelType(self):
return self.modelType
def getDeprecated(self):
return self.deprecated
def getEmbedder(self):
return self.embedder
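The block above reads a JSON string stored under the custom metadata key "metadata". A hedged sketch of how an exporter could write that key with the onnx package (the file names are placeholders; the keys mirror the ones parsed above):

```python
import json
import onnx

model = onnx.load("exported_model.onnx")  # placeholder path
entry = model.metadata_props.add()
entry.key = "metadata"
entry.value = json.dumps(
    {"samplingRate": 48000, "f0": True, "embChannels": 256,
     "modelType": 0, "embedder": "hubert_base"}
)
onnx.save(model, "exported_model_with_meta.onnx")
```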
def set_providers(self, providers, provider_options=[{}]): def set_providers(self, providers, provider_options=[{}]):
self.onnx_session.set_providers(providers=providers, provider_options=provider_options) self.onnx_session.set_providers(
providers=providers, provider_options=provider_options
)
def get_providers(self): def get_providers(self):
return self.onnx_session.get_providers() return self.onnx_session.get_providers()
def infer_pitchless(self, feats, p_len, sid):
if self.is_half:
audio1 = self.onnx_session.run(
["audio"],
{
"feats": feats.cpu().numpy().astype(np.float16),
"p_len": p_len.cpu().numpy().astype(np.int64),
"sid": sid.cpu().numpy().astype(np.int64),
},
)
else:
audio1 = self.onnx_session.run(
["audio"],
{
"feats": feats.cpu().numpy().astype(np.float32),
"p_len": p_len.cpu().numpy().astype(np.int64),
"sid": sid.cpu().numpy().astype(np.int64),
},
)
return torch.tensor(np.array(audio1))
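An illustrative call pattern for the wrapper, switching between the pitched and pitch-less paths based on the metadata flag; a real exported ONNX file is required and the tensor shapes here are only indicative:

```python
import torch

wrapper = ModelWrapper("rvc_model.onnx")   # placeholder path
feats = torch.randn(1, 200, 256)           # hubert features (batch, frames, channels)
p_len = torch.tensor([200])
sid = torch.tensor([0])

if wrapper.getF0():
    pitch = torch.zeros(1, 200, dtype=torch.int64)
    pitchf = torch.zeros(1, 200)
    audio = wrapper.infer(feats, p_len, pitch, pitchf, sid)
else:
    audio = wrapper.infer_pitchless(feats, p_len, sid)
```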
def infer(self, feats, p_len, pitch, pitchf, sid): def infer(self, feats, p_len, pitch, pitchf, sid):
if self.is_half: if self.is_half:
# print("feats", feats.cpu().numpy().dtype)
# print("p_len", p_len.cpu().numpy().dtype)
# print("pitch", pitch.cpu().numpy().dtype)
# print("pitchf", pitchf.cpu().numpy().dtype)
# print("sid", sid.cpu().numpy().dtype)
audio1 = self.onnx_session.run( audio1 = self.onnx_session.run(
["audio"], ["audio"],
{ {
@ -44,7 +113,8 @@ class ModelWrapper:
"pitch": pitch.cpu().numpy().astype(np.int64), "pitch": pitch.cpu().numpy().astype(np.int64),
"pitchf": pitchf.cpu().numpy().astype(np.float32), "pitchf": pitchf.cpu().numpy().astype(np.float32),
"sid": sid.cpu().numpy().astype(np.int64), "sid": sid.cpu().numpy().astype(np.int64),
}) },
)
else: else:
audio1 = self.onnx_session.run( audio1 = self.onnx_session.run(
["audio"], ["audio"],
@ -54,6 +124,7 @@ class ModelWrapper:
"pitch": pitch.cpu().numpy().astype(np.int64), "pitch": pitch.cpu().numpy().astype(np.int64),
"pitchf": pitchf.cpu().numpy().astype(np.float32), "pitchf": pitchf.cpu().numpy().astype(np.float32),
"sid": sid.cpu().numpy().astype(np.int64), "sid": sid.cpu().numpy().astype(np.int64),
}) },
)
return torch.tensor(np.array(audio1)) return torch.tensor(np.array(audio1))

View File

@ -4,11 +4,27 @@ import json
import resampy import resampy
from voice_changer.RVC.ModelWrapper import ModelWrapper from voice_changer.RVC.ModelWrapper import ModelWrapper
from Exceptions import NoModeLoadedException from Exceptions import NoModeLoadedException
from voice_changer.RVC.RVCSettings import RVCSettings
from voice_changer.utils.LoadModelParams import LoadModelParams
from voice_changer.utils.VoiceChangerModel import AudioInOut
from voice_changer.utils.VoiceChangerParams import VoiceChangerParams
from dataclasses import asdict
from typing import cast
import numpy as np
import torch
from fairseq import checkpoint_utils
import traceback
import faiss
from const import TMP_DIR # type:ignore
# avoiding parse arg error in RVC # avoiding parse arg error in RVC
sys.argv = ["MMVCServerSIO.py"] sys.argv = ["MMVCServerSIO.py"]
if sys.platform.startswith('darwin'): if sys.platform.startswith("darwin"):
baseDir = [x for x in sys.path if x.endswith("Contents/MacOS")] baseDir = [x for x in sys.path if x.endswith("Contents/MacOS")]
if len(baseDir) != 1: if len(baseDir) != 1:
print("baseDir should be only one ", baseDir) print("baseDir should be only one ", baseDir)
@ -18,112 +34,93 @@ if sys.platform.startswith('darwin'):
else: else:
sys.path.append("RVC") sys.path.append("RVC")
import io
from dataclasses import dataclass, asdict, field
from functools import reduce
import numpy as np
import torch
import onnxruntime
# onnxruntime.set_default_logger_severity(3)
from const import HUBERT_ONNX_MODEL_PATH, TMP_DIR
import pyworld as pw
from .models import SynthesizerTrnMsNSFsid as SynthesizerTrnMsNSFsid_webui
from .models import SynthesizerTrnMsNSFsidNono as SynthesizerTrnMsNSFsidNono_webui
from .const import RVC_MODEL_TYPE_RVC, RVC_MODEL_TYPE_WEBUI
from voice_changer.RVC.custom_vc_infer_pipeline import VC from voice_changer.RVC.custom_vc_infer_pipeline import VC
from infer_pack.models import SynthesizerTrnMs256NSFsid from infer_pack.models import ( # type:ignore
from fairseq import checkpoint_utils SynthesizerTrnMs256NSFsid,
providers = ['OpenVINOExecutionProvider', "CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"] SynthesizerTrnMs256NSFsid_nono,
)
providers = [
@dataclass "OpenVINOExecutionProvider",
class ModelSlot(): "CUDAExecutionProvider",
pyTorchModelFile: str = "" "DmlExecutionProvider",
onnxModelFile: str = "" "CPUExecutionProvider",
featureFile: str = "" ]
indexFile: str = ""
defaultTrans: int = ""
@dataclass
class RVCSettings():
gpu: int = 0
dstId: int = 0
f0Detector: str = "pm" # pm or harvest
tran: int = 20
silentThreshold: float = 0.00001
extraConvertSize: int = 1024 * 32
clusterInferRatio: float = 0.1
framework: str = "PyTorch" # PyTorch or ONNX
pyTorchModelFile: str = ""
onnxModelFile: str = ""
configFile: str = ""
modelSlots: list[ModelSlot] = field(
default_factory=lambda: [
ModelSlot(), ModelSlot(), ModelSlot()
]
)
indexRatio: float = 0
rvcQuality: int = 0
silenceFront: int = 1 # 0:off, 1:on
modelSamplingRate: int = 48000
modelSlotIndex: int = 0
speakers: dict[str, int] = field(
default_factory=lambda: {}
)
# ↓ list only the mutable fields
intData = ["gpu", "dstId", "tran", "extraConvertSize", "rvcQuality", "modelSamplingRate", "silenceFront", "modelSlotIndex"]
floatData = ["silentThreshold", "indexRatio"]
strData = ["framework", "f0Detector"]
class RVC: class RVC:
def __init__(self, params): audio_buffer: AudioInOut | None = None
def __init__(self, params: VoiceChangerParams):
self.initialLoad = True self.initialLoad = True
self.settings = RVCSettings() self.settings = RVCSettings()
self.inferenceing: bool = False
self.net_g = None self.net_g = None
self.onnx_session = None self.onnx_session = None
self.feature_file = None self.feature_file = None
self.index_file = None self.index_file = None
# self.net_g2 = None
# self.onnx_session2 = None
# self.feature_file2 = None
# self.index_file2 = None
self.gpu_num = torch.cuda.device_count() self.gpu_num = torch.cuda.device_count()
self.prevVol = 0 self.prevVol = 0
self.params = params self.params = params
self.mps_enabled: bool = getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available()
self.mps_enabled: bool = (
getattr(torch.backends, "mps", None) is not None
and torch.backends.mps.is_available()
)
self.currentSlot = -1 self.currentSlot = -1
print("RVC initialization: ", params) print("RVC initialization: ", params)
print("mps: ", self.mps_enabled) print("mps: ", self.mps_enabled)
def loadModel(self, props): def loadModel(self, props: LoadModelParams):
self.is_half = props["isHalf"] """
self.tmp_slot = props["slot"] loadModelはスロットへのエントリ推論向けにはロードしない
params_str = props["params"] 例外的にまだ一つも推論向けにロードされていない場合はロードする
"""
self.is_half = props.isHalf
tmp_slot = props.slot
params_str = props.params
params = json.loads(params_str) params = json.loads(params_str)
self.settings.modelSlots[self.tmp_slot] = ModelSlot( self.settings.modelSlots[
pyTorchModelFile=props["files"]["pyTorchModelFilename"], tmp_slot
onnxModelFile=props["files"]["onnxModelFilename"], ].pyTorchModelFile = props.files.pyTorchModelFilename
featureFile=props["files"]["featureFilename"], self.settings.modelSlots[tmp_slot].onnxModelFile = props.files.onnxModelFilename
indexFile=props["files"]["indexFilename"], self.settings.modelSlots[tmp_slot].featureFile = props.files.featureFilename
defaultTrans=params["trans"] self.settings.modelSlots[tmp_slot].indexFile = props.files.indexFilename
self.settings.modelSlots[tmp_slot].defaultTrans = params["trans"]
isONNX = (
True
if self.settings.modelSlots[tmp_slot].onnxModelFile is not None
else False
) )
print("[Voice Changer] RVC loading... slot:", self.tmp_slot) # メタデータ設定
if isONNX:
self._setInfoByONNX(
tmp_slot, self.settings.modelSlots[tmp_slot].onnxModelFile
)
else:
self._setInfoByPytorch(
tmp_slot, self.settings.modelSlots[tmp_slot].pyTorchModelFile
)
print(
f"[Voice Changer] RVC loading... slot:{tmp_slot}",
asdict(self.settings.modelSlots[tmp_slot]),
)
# load hubert
try: try:
hubert_path = self.params["hubert_base"] hubert_path = self.params.hubert_base
models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task([hubert_path], suffix="",) models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task(
[hubert_path],
suffix="",
)
model = models[0] model = models[0]
model.eval() model.eval()
if self.is_half: if self.is_half:
@ -133,85 +130,194 @@ class RVC:
except Exception as e: except Exception as e:
print("EXCEPTION during loading hubert/contentvec model", e) print("EXCEPTION during loading hubert/contentvec model", e)
# self.switchModel(self.slot) # load only on the first call
if self.initialLoad: if self.initialLoad or tmp_slot == self.currentSlot:
self.prepareModel(self.tmp_slot) self.prepareModel(tmp_slot)
self.slot = self.tmp_slot self.settings.modelSlotIndex = tmp_slot
self.currentSlot = self.slot self.currentSlot = self.settings.modelSlotIndex
self.switchModel() self.switchModel()
self.initialLoad = False self.initialLoad = False
return self.get_info() return self.get_info()
def _setInfoByPytorch(self, slot, file):
cpt = torch.load(file, map_location="cpu")
config_len = len(cpt["config"])
if config_len == 18:
self.settings.modelSlots[slot].modelType = RVC_MODEL_TYPE_RVC
self.settings.modelSlots[slot].embChannels = 256
self.settings.modelSlots[slot].embedder = "hubert_base"
else:
self.settings.modelSlots[slot].modelType = RVC_MODEL_TYPE_WEBUI
self.settings.modelSlots[slot].embChannels = cpt["config"][17]
self.settings.modelSlots[slot].embedder = cpt["embedder_name"]
if self.settings.modelSlots[slot].embedder.endswith("768"):
self.settings.modelSlots[slot].embedder = self.settings.modelSlots[
slot
].embedder[:-3]
self.settings.modelSlots[slot].f0 = True if cpt["f0"] == 1 else False
self.settings.modelSlots[slot].samplingRate = cpt["config"][-1]
# self.settings.modelSamplingRate = cpt["config"][-1]
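The branch above tells the two checkpoint layouts apart by the length of cpt["config"] (18 entries for the original RVC format). A hedged inspection snippet; the file name is a placeholder:

```python
import torch

cpt = torch.load("some_model.pth", map_location="cpu")  # placeholder path
kind = "RVC" if len(cpt["config"]) == 18 else "rvc-webui"
print(kind, "f0:", cpt.get("f0"), "sampling rate:", cpt["config"][-1])
```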
def _setInfoByONNX(self, slot, file):
tmp_onnx_session = ModelWrapper(file)
self.settings.modelSlots[slot].modelType = tmp_onnx_session.getModelType()
self.settings.modelSlots[slot].embChannels = tmp_onnx_session.getEmbChannels()
self.settings.modelSlots[slot].embedder = tmp_onnx_session.getEmbedder()
self.settings.modelSlots[slot].f0 = tmp_onnx_session.getF0()
self.settings.modelSlots[slot].samplingRate = tmp_onnx_session.getSamplingRate()
self.settings.modelSlots[slot].deprecated = tmp_onnx_session.getDeprecated()
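For ONNX models the same information is read back through `ModelWrapper`, whose internals are not shown in this diff. The exporter later in this commit embeds the values as a JSON blob under the `metadata` key of the ONNX model's `metadata_props`, so a minimal sketch of reading it with the plain `onnx` API (ModelWrapper presumably does something equivalent) looks like:

```python
import json
import onnx

def read_vc_metadata(onnx_path: str) -> dict:
    # Returns the dict written by export2onnx (modelType, samplingRate, f0,
    # embChannels, embedder, ...), or {} if the model has no such entry.
    model = onnx.load(onnx_path)
    for prop in model.metadata_props:
        if prop.key == "metadata":
            return json.loads(prop.value)
    return {}
```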
    def prepareModel(self, slot: int):
        if slot < 0:
            return self.get_info()
        print("[Voice Changer] Prepare Model of slot:", slot)
        onnxModelFile = self.settings.modelSlots[slot].onnxModelFile
        isONNX = (
            True if self.settings.modelSlots[slot].onnxModelFile is not None else False
        )

        # load the model
        if isONNX:
            print("[Voice Changer] Loading ONNX Model...")
            self.next_onnx_session = ModelWrapper(onnxModelFile)
            self.next_net_g = None
        else:
            print("[Voice Changer] Loading Pytorch Model...")
            torchModelSlot = self.settings.modelSlots[slot]
            cpt = torch.load(torchModelSlot.pyTorchModelFile, map_location="cpu")

            if (
                torchModelSlot.modelType == RVC_MODEL_TYPE_RVC
                and torchModelSlot.f0 is True
            ):
                net_g = SynthesizerTrnMs256NSFsid(*cpt["config"], is_half=self.is_half)
            elif (
                torchModelSlot.modelType == RVC_MODEL_TYPE_RVC
                and torchModelSlot.f0 is False
            ):
                net_g = SynthesizerTrnMs256NSFsid_nono(*cpt["config"])
            elif (
                torchModelSlot.modelType == RVC_MODEL_TYPE_WEBUI
                and torchModelSlot.f0 is True
            ):
                net_g = SynthesizerTrnMsNSFsid_webui(
                    **cpt["params"], is_half=self.is_half
                )
            else:
                net_g = SynthesizerTrnMsNSFsidNono_webui(
                    **cpt["params"], is_half=self.is_half
                )
            net_g.eval()
            net_g.load_state_dict(cpt["weight"], strict=False)
            if self.is_half:
                net_g = net_g.half()
            self.next_net_g = net_g
            self.next_onnx_session = None

        # load the index
        print("[Voice Changer] Loading index...")
        self.next_feature_file = self.settings.modelSlots[slot].featureFile
        self.next_index_file = self.settings.modelSlots[slot].indexFile

        if (
            self.settings.modelSlots[slot].featureFile is not None
            and self.settings.modelSlots[slot].indexFile is not None
        ):
            if (
                os.path.exists(self.settings.modelSlots[slot].featureFile) is True
                and os.path.exists(self.settings.modelSlots[slot].indexFile) is True
            ):
                try:
                    self.next_index = faiss.read_index(
                        self.settings.modelSlots[slot].indexFile
                    )
                    self.next_feature = np.load(
                        self.settings.modelSlots[slot].featureFile
                    )
                except:
                    print("[Voice Changer] load index failed. Use no index.")
                    traceback.print_exc()
                    self.next_index = self.next_feature = None
            else:
                print("[Voice Changer] Index file is not found. Use no index.")
                self.next_index = self.next_feature = None
        else:
            self.next_index = self.next_feature = None

        self.next_trans = self.settings.modelSlots[slot].defaultTrans
        self.next_samplingRate = self.settings.modelSlots[slot].samplingRate
        self.next_framework = (
            "ONNX" if self.next_onnx_session is not None else "PyTorch"
        )
        print("[Voice Changer] Prepare done.")
        return self.get_info()
    def switchModel(self):
        print("[Voice Changer] Switching model..")
        # del self.net_g
        # del self.onnx_session
        self.net_g = self.next_net_g
        self.onnx_session = self.next_onnx_session
        self.feature_file = self.next_feature_file
        self.index_file = self.next_index_file
        self.feature = self.next_feature
        self.index = self.next_index
        self.settings.tran = self.next_trans
        self.settings.framework = self.next_framework
        self.settings.modelSamplingRate = self.next_samplingRate
        self.next_net_g = None
        self.next_onnx_session = None
        print(
            "[Voice Changer] Switching model..done",
        )
    def update_settings(self, key: str, val: int | float | str):
        if key == "onnxExecutionProvider" and self.onnx_session is not None:
            if val == "CUDAExecutionProvider":
                if self.settings.gpu < 0 or self.settings.gpu >= self.gpu_num:
                    self.settings.gpu = 0
                provider_options = [{"device_id": self.settings.gpu}]
                self.onnx_session.set_providers(
                    providers=[val], provider_options=provider_options
                )
                if hasattr(self, "hubert_onnx"):
                    self.hubert_onnx.set_providers(
                        providers=[val], provider_options=provider_options
                    )
            else:
                self.onnx_session.set_providers(providers=[val])
                if hasattr(self, "hubert_onnx"):
                    self.hubert_onnx.set_providers(providers=[val])
        elif key == "onnxExecutionProvider" and self.onnx_session is None:
            print("Onnx is not enabled. Please load model.")
            return False
        elif key in self.settings.intData:
            val = cast(int, val)
            if (
                key == "gpu"
                and val >= 0
                and val < self.gpu_num
                and self.onnx_session is not None
            ):
                providers = self.onnx_session.get_providers()
                print("Providers:", providers)
                if "CUDAExecutionProvider" in providers:
                    provider_options = [{"device_id": self.settings.gpu}]
                    self.onnx_session.set_providers(
                        providers=["CUDAExecutionProvider"],
                        provider_options=provider_options,
                    )
            if key == "modelSlotIndex":
                # self.switchModel(int(val))
                val = int(val) % 1000  # quick hack so re-selecting the same slot still triggers a reload
                self.prepareModel(val)
                self.currentSlot = -1
            setattr(self.settings, key, int(val))
        elif key in self.settings.floatData:
            setattr(self.settings, key, float(val))
        elif key in self.settings.strData:
@ -224,10 +330,12 @@ class RVC:
    def get_info(self):
        data = asdict(self.settings)

        data["onnxExecutionProviders"] = (
            self.onnx_session.get_providers() if self.onnx_session is not None else []
        )
        files = ["configFile", "pyTorchModelFile", "onnxModelFile"]
        for f in files:
            if data[f] is not None and os.path.exists(data[f]):
                data[f] = os.path.basename(data[f])
            else:
                data[f] = ""
@ -237,22 +345,35 @@ class RVC:
    def get_processing_sampling_rate(self):
        return self.settings.modelSamplingRate

    def generate_input(
        self,
        newData: AudioInOut,
        inputSize: int,
        crossfadeSize: int,
        solaSearchFrame: int = 0,
    ):
        newData = newData.astype(np.float32) / 32768.0

        if self.audio_buffer is not None:
            # append to the previously buffered audio
            self.audio_buffer = np.concatenate([self.audio_buffer, newData], 0)
        else:
            self.audio_buffer = newData

        convertSize = (
            inputSize + crossfadeSize + solaSearchFrame + self.settings.extraConvertSize
        )

        if convertSize % 128 != 0:  # pad up to the model's output hop size to avoid truncation
            convertSize = convertSize + (128 - (convertSize % 128))

        convertOffset = -1 * convertSize
        self.audio_buffer = self.audio_buffer[convertOffset:]  # keep only the region to be converted

        # crop the output region and check its volume (TODO: fade out gradually)
        cropOffset = -1 * (inputSize + crossfadeSize)
        cropEnd = -1 * (crossfadeSize)
        crop = self.audio_buffer[cropOffset:cropEnd]
        rms = np.sqrt(np.square(crop).mean(axis=0))
        vol = max(rms, self.prevVol * 0.0)
        self.prevVol = vol

@ -260,7 +381,7 @@ class RVC:
        return (self.audio_buffer, convertSize, vol)
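A quick numeric check of the hop-size rounding above, using hypothetical buffer sizes (the defaults in the settings class give extraConvertSize = 32768):

```python
# hypothetical sizes, just to illustrate the rounding to the 128-sample hop
inputSize, crossfadeSize, solaSearchFrame, extraConvertSize = 4096, 2048, 512, 32768
convertSize = inputSize + crossfadeSize + solaSearchFrame + extraConvertSize  # 39424
if convertSize % 128 != 0:
    convertSize = convertSize + (128 - (convertSize % 128))
print(convertSize)  # 39424: already a multiple of 128; a raw sum of 39430 would become 39552
```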
    def _onnx_inference(self, data):
        if hasattr(self, "onnx_session") is False or self.onnx_session is None:
            print("[Voice Changer] No onnx session.")
            raise NoModeLoadedException("ONNX")

@ -285,41 +406,54 @@ class RVC:
        repeat *= self.settings.rvcQuality  # 0 or 3
        vc = VC(self.settings.modelSamplingRate, dev, self.is_half, repeat)
        sid = 0
        f0_up_key = self.settings.tran
        f0_method = self.settings.f0Detector
        index_rate = self.settings.indexRatio
        if_f0 = 1 if self.settings.modelSlots[self.currentSlot].f0 else 0

        embChannels = self.settings.modelSlots[self.currentSlot].embChannels
        audio_out = vc.pipeline(
            self.hubert_model,
            self.onnx_session,
            sid,
            audio,
            f0_up_key,
            f0_method,
            self.index,
            self.feature,
            index_rate,
            if_f0,
            silence_front=self.settings.extraConvertSize
            / self.settings.modelSamplingRate,
            embChannels=embChannels,
        )

        result = audio_out * np.sqrt(vol)

        return result
    def _pyTorch_inference(self, data):
        if hasattr(self, "net_g") is False or self.net_g is None:
            print(
                "[Voice Changer] No pyTorch session.",
                hasattr(self, "net_g"),
                self.net_g,
            )
            raise NoModeLoadedException("pytorch")

        if self.settings.gpu < 0 or (self.gpu_num == 0 and self.mps_enabled is False):
            dev = torch.device("cpu")
        elif self.mps_enabled:
            dev = torch.device("mps")
        else:
            dev = torch.device("cuda", index=self.settings.gpu)

        self.hubert_model = self.hubert_model.to(dev)
        self.net_g = self.net_g.to(dev)

        audio = data[0]
        convertSize = data[1]
        vol = data[2]

        audio = resampy.resample(audio, self.settings.modelSamplingRate, 16000)

        if vol < self.settings.silentThreshold:
@ -330,29 +464,44 @@ class RVC:
        repeat *= self.settings.rvcQuality  # 0 or 3
        vc = VC(self.settings.modelSamplingRate, dev, self.is_half, repeat)
        sid = 0
        f0_up_key = self.settings.tran
        f0_method = self.settings.f0Detector
        index_rate = self.settings.indexRatio
        if_f0 = 1 if self.settings.modelSlots[self.currentSlot].f0 else 0

        embChannels = self.settings.modelSlots[self.currentSlot].embChannels
        audio_out = vc.pipeline(
            self.hubert_model,
            self.net_g,
            sid,
            audio,
            f0_up_key,
            f0_method,
            self.index,
            self.feature,
            index_rate,
            if_f0,
            silence_front=self.settings.extraConvertSize
            / self.settings.modelSamplingRate,
            embChannels=embChannels,
        )

        result = audio_out * np.sqrt(vol)

        return result
    def inference(self, data):
        if self.settings.modelSlotIndex < 0:
            print(
                "[Voice Changer] wait for loading model...",
                self.settings.modelSlotIndex,
                self.currentSlot,
            )
            raise NoModeLoadedException("model_common")

        if self.currentSlot != self.settings.modelSlotIndex:
            print(f"Switch model {self.currentSlot} -> {self.settings.modelSlotIndex}")
            self.currentSlot = self.settings.modelSlotIndex
            self.switchModel()

        if self.settings.framework == "ONNX":
@ -367,7 +516,7 @@ class RVC:
        del self.onnx_session

        remove_path = os.path.join("RVC")
        sys.path = [x for x in sys.path if x.endswith(remove_path) is False]

        for key in list(sys.modules):
            val = sys.modules.get(key)
@ -377,29 +526,63 @@ class RVC:
                    print("remove", key, file_path)
                    sys.modules.pop(key)
            except Exception as e:
                print(e)
                pass
    def export2onnx(self):
        if hasattr(self, "net_g") is False or self.net_g is None:
            print("[Voice Changer] export2onnx, No pyTorch session.")
            return {"status": "ng", "path": ""}

        pyTorchModelFile = self.settings.modelSlots[
            self.settings.modelSlotIndex
        ].pyTorchModelFile  # use the selected slot (not currentSlot) so export works before inference

        if pyTorchModelFile is None:
            print("[Voice Changer] export2onnx, No pyTorch filepath.")
            return {"status": "ng", "path": ""}
        import voice_changer.RVC.export2onnx as onnxExporter

        output_file = os.path.splitext(os.path.basename(pyTorchModelFile))[0] + ".onnx"
        output_file_simple = (
            os.path.splitext(os.path.basename(pyTorchModelFile))[0] + "_simple.onnx"
        )
        output_path = os.path.join(TMP_DIR, output_file)
        output_path_simple = os.path.join(TMP_DIR, output_file_simple)
        print(
            "embChannels",
            self.settings.modelSlots[self.settings.modelSlotIndex].embChannels,
        )
        metadata = {
            "application": "VC_CLIENT",
            "version": "1",
            "modelType": self.settings.modelSlots[
                self.settings.modelSlotIndex
            ].modelType,
            "samplingRate": self.settings.modelSlots[
                self.settings.modelSlotIndex
            ].samplingRate,
            "f0": self.settings.modelSlots[self.settings.modelSlotIndex].f0,
            "embChannels": self.settings.modelSlots[
                self.settings.modelSlotIndex
            ].embChannels,
            "embedder": self.settings.modelSlots[self.settings.modelSlotIndex].embedder,
        }

        if torch.cuda.device_count() > 0:
            onnxExporter.export2onnx(
                pyTorchModelFile, output_path, output_path_simple, True, metadata
            )
        else:
            print(
                "[Voice Changer] Warning!!! onnx export with float32. maybe size is doubled."
            )
            onnxExporter.export2onnx(
                pyTorchModelFile, output_path, output_path_simple, False, metadata
            )

        return {
            "status": "ok",
            "path": f"/tmp/{output_file_simple}",
            "filename": output_file_simple,
        }
View File
@ -0,0 +1,44 @@
from dataclasses import dataclass, field
from voice_changer.RVC.ModelSlot import ModelSlot
@dataclass
class RVCSettings:
gpu: int = 0
dstId: int = 0
f0Detector: str = "pm" # pm or harvest
tran: int = 20
silentThreshold: float = 0.00001
extraConvertSize: int = 1024 * 32
clusterInferRatio: float = 0.1
framework: str = "PyTorch" # PyTorch or ONNX
pyTorchModelFile: str = ""
onnxModelFile: str = ""
configFile: str = ""
modelSlots: list[ModelSlot] = field(
default_factory=lambda: [ModelSlot(), ModelSlot(), ModelSlot()]
)
indexRatio: float = 0
rvcQuality: int = 0
silenceFront: int = 1 # 0:off, 1:on
modelSamplingRate: int = 48000
modelSlotIndex: int = -1
speakers: dict[str, int] = field(default_factory=lambda: {})
    # only mutable fields are listed below
intData = [
"gpu",
"dstId",
"tran",
"extraConvertSize",
"rvcQuality",
"modelSamplingRate",
"silenceFront",
"modelSlotIndex",
]
floatData = ["silentThreshold", "indexRatio"]
strData = ["framework", "f0Detector"]
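These three lists drive the dispatch in `update_settings`: the value is cast according to which list the key belongs to before being stored. A hedged usage sketch (the `RVC` constructor is not shown in this hunk, so its argument is an assumption):

```python
rvc = RVC(params)  # params: VoiceChangerParams, construction details assumed

rvc.update_settings("tran", "12")             # in intData   -> stored as int(12)
rvc.update_settings("indexRatio", "0.5")      # in floatData -> stored as float(0.5)
rvc.update_settings("f0Detector", "harvest")  # in strData   -> stored unchanged
rvc.update_settings("modelSlotIndex", "1001") # int key with the %1000 quick hack -> slot 1
```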
View File
@ -0,0 +1,2 @@
RVC_MODEL_TYPE_RVC = 0
RVC_MODEL_TYPE_WEBUI = 1
View File
@ -1,15 +1,11 @@
import numpy as np

# import parselmouth
import torch
import torch.nn.functional as F
from config import x_pad, x_query, x_center, x_max  # type:ignore
import scipy.signal as signal
import pyworld
class VC(object):
@ -18,34 +14,27 @@ class VC(object):
        self.window = 160  # samples per frame
        self.t_pad = self.sr * x_pad  # padding before/after each chunk
        self.t_pad_tgt = tgt_sr * x_pad
        self.t_query = self.sr * x_query  # search window around the cut point
        self.t_center = self.sr * x_center  # cut point position
        self.t_max = self.sr * x_max  # duration threshold below which no cut-point search is done
        self.device = device
        self.is_half = is_half

    def get_f0(self, audio, p_len, f0_up_key, f0_method, silence_front=0):
        n_frames = int(len(audio) // self.window) + 1
        start_frame = int(silence_front * self.sr / self.window)
        real_silence_front = start_frame * self.window / self.sr

        silence_front_offset = int(np.round(real_silence_front * self.sr))
        audio = audio[silence_front_offset:]

        # time_step = self.window / self.sr * 1000
        f0_min = 50
        f0_max = 1100
        f0_mel_min = 1127 * np.log(1 + f0_min / 700)
        f0_mel_max = 1127 * np.log(1 + f0_max / 700)

        if f0_method == "pm":
            print("not implemented. use harvest")
            f0, t = pyworld.harvest(
                audio.astype(np.double),
                fs=self.sr,
@ -55,36 +44,98 @@ class VC(object):
            f0 = pyworld.stonemask(audio.astype(np.double), f0, t, self.sr)
            f0 = signal.medfilt(f0, 3)
            f0 = np.pad(
                f0.astype("float"), (start_frame, n_frames - len(f0) - start_frame)
            )
        else:
            f0, t = pyworld.harvest(
                audio.astype(np.double),
                fs=self.sr,
                f0_ceil=f0_max,
                frame_period=10,
            )
            f0 = pyworld.stonemask(audio.astype(np.double), f0, t, self.sr)
            f0 = signal.medfilt(f0, 3)
            f0 = np.pad(
                f0.astype("float"), (start_frame, n_frames - len(f0) - start_frame)
            )

        f0 *= pow(2, f0_up_key / 12)
        f0bak = f0.copy()
        f0_mel = 1127 * np.log(1 + f0 / 700)
        f0_mel[f0_mel > 0] = (f0_mel[f0_mel > 0] - f0_mel_min) * 254 / (
            f0_mel_max - f0_mel_min
        ) + 1
        f0_mel[f0_mel <= 1] = 1
        f0_mel[f0_mel > 255] = 255
        f0_coarse = np.rint(f0_mel).astype(np.int)

        # Volume Extract
        # volume = self.extractVolume(audio, 512)
        # volume = np.pad(
        #     volume.astype("float"), (start_frame, n_frames - len(volume) - start_frame)
        # )
        # return f0_coarse, f0bak, volume  # 1-0

        return f0_coarse, f0bak
# def extractVolume(self, audio, hopsize):
# n_frames = int(len(audio) // hopsize) + 1
# audio2 = audio**2
# audio2 = np.pad(
# audio2,
# (int(hopsize // 2), int((hopsize + 1) // 2)),
# mode="reflect",
# )
# volume = np.array(
# [
# np.mean(audio2[int(n * hopsize) : int((n + 1) * hopsize)]) # noqa:E203
# for n in range(n_frames)
# ]
# )
# volume = np.sqrt(volume)
# return volume
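A worked example of the f0-to-coarse mapping used in `get_f0` above: the pitch track is converted to the mel scale and squashed into the 1..255 range that the `emb_pitch` embedding table (256 entries) expects. The input frequency below is arbitrary:

```python
import numpy as np

f0_min, f0_max = 50, 1100
f0_mel_min = 1127 * np.log(1 + f0_min / 700)   # ~77.8
f0_mel_max = 1127 * np.log(1 + f0_max / 700)   # ~1064.4

f0 = 220.0  # Hz, e.g. A3 (example value)
f0_mel = 1127 * np.log(1 + f0 / 700)           # ~308.1
coarse = int(np.rint((f0_mel - f0_mel_min) * 254 / (f0_mel_max - f0_mel_min) + 1))
print(coarse)  # ~60, one of the 1..255 buckets fed to the pitch embedding
```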
    def pipeline(
        self,
        embedder,
        model,
        sid,
        audio,
        f0_up_key,
        f0_method,
        index,
        big_npy,
        index_rate,
        if_f0,
        silence_front=0,
        embChannels=256,
    ):
        audio_pad = np.pad(audio, (self.t_pad, self.t_pad), mode="reflect")
        p_len = audio_pad.shape[0] // self.window
        sid = torch.tensor(sid, device=self.device).unsqueeze(0).long()

        # pitch detection
        pitch, pitchf = None, None
        if if_f0 == 1:
            pitch, pitchf = self.get_f0(
                audio_pad,
                p_len,
                f0_up_key,
                f0_method,
                silence_front=silence_front,
            )
            pitch = pitch[:p_len]
            pitchf = pitchf[:p_len]
            pitch = torch.tensor(pitch, device=self.device).unsqueeze(0).long()
            pitchf = torch.tensor(
                pitchf, device=self.device, dtype=torch.float
            ).unsqueeze(0)

        # tensor
        feats = torch.from_numpy(audio_pad)
        if self.is_half is True:
            feats = feats.half()
        else:
            feats = feats.float()
@ -92,86 +143,95 @@ class VC(object):
            feats = feats.mean(-1)
        assert feats.dim() == 1, feats.dim()
        feats = feats.view(1, -1)

        # embedding
        padding_mask = torch.BoolTensor(feats.shape).to(self.device).fill_(False)
        if embChannels == 256:
            inputs = {
                "source": feats.to(self.device),
                "padding_mask": padding_mask,
                "output_layer": 9,  # layer 9
            }
        else:
            inputs = {
                "source": feats.to(self.device),
                "padding_mask": padding_mask,
            }

        with torch.no_grad():
            logits = embedder.extract_features(**inputs)
            if embChannels == 256:
                feats = embedder.final_proj(logits[0])
            else:
                feats = logits[0]

        # Index - feature lookup
        if (
            isinstance(index, type(None)) is False
            and isinstance(big_npy, type(None)) is False
            and index_rate != 0
        ):
            npy = feats[0].cpu().numpy()
            if self.is_half is True:
                npy = npy.astype("float32")
            D, I = index.search(npy, 1)
            npy = big_npy[I.squeeze()]
            if self.is_half is True:
                npy = npy.astype("float16")

            feats = (
                torch.from_numpy(npy).unsqueeze(0).to(self.device) * index_rate
                + (1 - index_rate) * feats
            )

        #
        feats = F.interpolate(feats.permute(0, 2, 1), scale_factor=2).permute(0, 2, 1)

        # trim pitch to the feature length
        p_len = audio_pad.shape[0] // self.window
        if feats.shape[1] < p_len:
            p_len = feats.shape[1]
            if pitch is not None and pitchf is not None:
                pitch = pitch[:, :p_len]
                pitchf = pitchf[:, :p_len]

        p_len = torch.tensor([p_len], device=self.device).long()

        # run inference
        with torch.no_grad():
            if pitch is not None:
                audio1 = (
                    (model.infer(feats, p_len, pitch, pitchf, sid)[0][0, 0] * 32768)
                    .data.cpu()
                    .float()
                    .numpy()
                    .astype(np.int16)
                )
            else:
                if hasattr(model, "infer_pitchless"):
                    audio1 = (
                        (model.infer_pitchless(feats, p_len, sid)[0][0, 0] * 32768)
                        .data.cpu()
                        .float()
                        .numpy()
                        .astype(np.int16)
                    )
                else:
                    audio1 = (
                        (model.infer(feats, p_len, sid)[0][0, 0] * 32768)
                        .data.cpu()
                        .float()
                        .numpy()
                        .astype(np.int16)
                    )

        del feats, p_len, padding_mask
        torch.cuda.empty_cache()

        if self.t_pad_tgt != 0:
            offset = self.t_pad_tgt
            end = -1 * self.t_pad_tgt
            audio1 = audio1[offset:end]

        del pitch, pitchf, sid
        torch.cuda.empty_cache()
        return audio1
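The padding bookkeeping in the pipeline can be checked in isolation. Assuming `x_pad` from `config.py` acts as a per-side padding factor in seconds and that the input arriving here is the 16 kHz resampled audio from the caller above (both assumptions, since `config.py` and the hidden `self.sr` assignment are not part of this hunk), the reflect pad on the input and the `t_pad_tgt` trim on the output cancel out in wall-clock time:

```python
# sketch only; x_pad and the sampling rates are hypothetical example values
x_pad = 1
sr, tgt_sr = 16000, 40000
t_pad, t_pad_tgt = sr * x_pad, tgt_sr * x_pad
print(t_pad / sr, t_pad_tgt / tgt_sr)  # 1.0 1.0 -> same duration padded in, trimmed out
```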
View File
@ -1,137 +1,78 @@
import json
import torch
from onnxsim import simplify
import onnx
from voice_changer.RVC.onnx.SynthesizerTrnMs256NSFsid_ONNX import (
    SynthesizerTrnMs256NSFsid_ONNX,
)
from voice_changer.RVC.onnx.SynthesizerTrnMs256NSFsid_nono_ONNX import (
    SynthesizerTrnMs256NSFsid_nono_ONNX,
)
from voice_changer.RVC.onnx.SynthesizerTrnMsNSFsidNono_webui_ONNX import (
    SynthesizerTrnMsNSFsidNono_webui_ONNX,
)
from voice_changer.RVC.onnx.SynthesizerTrnMsNSFsid_webui_ONNX import (
    SynthesizerTrnMsNSFsid_webui_ONNX,
)
from .const import RVC_MODEL_TYPE_RVC, RVC_MODEL_TYPE_WEBUI
def export2onnx(input_model, output_model, output_model_simple, is_half, metadata):
    cpt = torch.load(input_model, map_location="cpu")
    if is_half:
        dev = torch.device("cuda", index=0)
    else:
        dev = torch.device("cpu")

    if metadata["f0"] is True and metadata["modelType"] == RVC_MODEL_TYPE_RVC:
        net_g_onnx = SynthesizerTrnMs256NSFsid_ONNX(*cpt["config"], is_half=is_half)
    elif metadata["f0"] is True and metadata["modelType"] == RVC_MODEL_TYPE_WEBUI:
        net_g_onnx = SynthesizerTrnMsNSFsid_webui_ONNX(**cpt["params"], is_half=is_half)
    elif metadata["f0"] is False and metadata["modelType"] == RVC_MODEL_TYPE_RVC:
        net_g_onnx = SynthesizerTrnMs256NSFsid_nono_ONNX(*cpt["config"])
    elif metadata["f0"] is False and metadata["modelType"] == RVC_MODEL_TYPE_WEBUI:
        net_g_onnx = SynthesizerTrnMsNSFsidNono_webui_ONNX(**cpt["params"])

    net_g_onnx.eval().to(dev)
    net_g_onnx.load_state_dict(cpt["weight"], strict=False)
    if is_half:
        net_g_onnx = net_g_onnx.half()

    if is_half:
        feats = torch.HalfTensor(1, 2192, metadata["embChannels"]).to(dev)
    else:
        feats = torch.FloatTensor(1, 2192, metadata["embChannels"]).to(dev)
    p_len = torch.LongTensor([2192]).to(dev)
    sid = torch.LongTensor([0]).to(dev)

    if metadata["f0"] is True:
        pitch = torch.zeros(1, 2192, dtype=torch.int64).to(dev)
        pitchf = torch.FloatTensor(1, 2192).to(dev)
        input_names = ["feats", "p_len", "pitch", "pitchf", "sid"]
        inputs = (
            feats,
            p_len,
            pitch,
            pitchf,
            sid,
        )
    else:
        input_names = ["feats", "p_len", "sid"]
        inputs = (
            feats,
            p_len,
            sid,
        )

    output_names = [
        "audio",
    ]

    torch.onnx.export(
        net_g_onnx,
        inputs,
        output_model,
        dynamic_axes={
            "feats": [1],
@ -142,8 +83,12 @@ def export2onnx(input_model, output_model, output_model_simple, is_half):
        opset_version=17,
        verbose=False,
        input_names=input_names,
        output_names=output_names,
    )

    model_onnx2 = onnx.load(output_model)
    model_simp, check = simplify(model_onnx2)
    meta = model_simp.metadata_props.add()
    meta.key = "metadata"
    meta.value = json.dumps(metadata)
    onnx.save(model_simp, output_model_simple)
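For reference, a hedged direct invocation of this exporter. `RVC.export2onnx` earlier in this commit builds the same `metadata` dict from the selected model slot; the file paths, sampling rate and other values below are purely illustrative:

```python
metadata = {
    "application": "VC_CLIENT",
    "version": "1",
    "modelType": RVC_MODEL_TYPE_RVC,   # from voice_changer.RVC.const
    "samplingRate": 40000,             # example value
    "f0": True,
    "embChannels": 256,
    "embedder": "hubert_base",
}
export2onnx(
    "model.pth",                # hypothetical input checkpoint
    "/tmp/model.onnx",
    "/tmp/model_simple.onnx",
    is_half=False,              # float32 export; True requires a CUDA device
    metadata=metadata,
)
```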
View File
@ -0,0 +1,277 @@
import math
import torch
from torch import nn
from infer_pack.models import ( # type:ignore
GeneratorNSF,
PosteriorEncoder,
ResidualCouplingBlock,
Generator,
)
from infer_pack import commons, attentions # type:ignore
class TextEncoder(nn.Module):
def __init__(
self,
out_channels,
hidden_channels,
filter_channels,
emb_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
f0=True,
):
super().__init__()
self.out_channels = out_channels
self.hidden_channels = hidden_channels
self.filter_channels = filter_channels
self.emb_channels = emb_channels
self.n_heads = n_heads
self.n_layers = n_layers
self.kernel_size = kernel_size
self.p_dropout = p_dropout
self.emb_phone = nn.Linear(emb_channels, hidden_channels)
self.lrelu = nn.LeakyReLU(0.1, inplace=True)
if f0 is True:
self.emb_pitch = nn.Embedding(256, hidden_channels) # pitch 256
self.encoder = attentions.Encoder(
hidden_channels, filter_channels, n_heads, n_layers, kernel_size, p_dropout
)
self.proj = nn.Conv1d(hidden_channels, out_channels * 2, 1)
def forward(self, phone, pitch, lengths):
if pitch is None:
x = self.emb_phone(phone)
else:
x = self.emb_phone(phone) + self.emb_pitch(pitch)
x = x * math.sqrt(self.hidden_channels) # [b, t, h]
x = self.lrelu(x)
x = torch.transpose(x, 1, -1) # [b, h, t]
x_mask = torch.unsqueeze(commons.sequence_mask(lengths, x.size(2)), 1).to(
x.dtype
)
x = self.encoder(x * x_mask, x_mask)
stats = self.proj(x) * x_mask
m, logs = torch.split(stats, self.out_channels, dim=1)
return m, logs, x_mask
class SynthesizerTrnMsNSFsid(nn.Module):
def __init__(
self,
spec_channels,
segment_size,
inter_channels,
hidden_channels,
filter_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
resblock,
resblock_kernel_sizes,
resblock_dilation_sizes,
upsample_rates,
upsample_initial_channel,
upsample_kernel_sizes,
spk_embed_dim,
gin_channels,
emb_channels,
sr,
**kwargs
):
super().__init__()
self.spec_channels = spec_channels
self.inter_channels = inter_channels
self.hidden_channels = hidden_channels
self.filter_channels = filter_channels
self.n_heads = n_heads
self.n_layers = n_layers
self.kernel_size = kernel_size
self.p_dropout = p_dropout
self.resblock = resblock
self.resblock_kernel_sizes = resblock_kernel_sizes
self.resblock_dilation_sizes = resblock_dilation_sizes
self.upsample_rates = upsample_rates
self.upsample_initial_channel = upsample_initial_channel
self.upsample_kernel_sizes = upsample_kernel_sizes
self.segment_size = segment_size
self.gin_channels = gin_channels
self.emb_channels = emb_channels
# self.hop_length = hop_length#
self.spk_embed_dim = spk_embed_dim
self.enc_p = TextEncoder(
inter_channels,
hidden_channels,
filter_channels,
emb_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
)
self.dec = GeneratorNSF(
inter_channels,
resblock,
resblock_kernel_sizes,
resblock_dilation_sizes,
upsample_rates,
upsample_initial_channel,
upsample_kernel_sizes,
gin_channels=gin_channels,
sr=sr,
is_half=kwargs["is_half"],
)
self.enc_q = PosteriorEncoder(
spec_channels,
inter_channels,
hidden_channels,
5,
1,
16,
gin_channels=gin_channels,
)
self.flow = ResidualCouplingBlock(
inter_channels, hidden_channels, 5, 1, 3, gin_channels=gin_channels
)
self.emb_g = nn.Embedding(self.spk_embed_dim, gin_channels)
print("gin_channels:", gin_channels, "self.spk_embed_dim:", self.spk_embed_dim)
def remove_weight_norm(self):
self.dec.remove_weight_norm()
self.flow.remove_weight_norm()
self.enc_q.remove_weight_norm()
def forward(
self, phone, phone_lengths, pitch, pitchf, y, y_lengths, ds
    ):  # ds here is the speaker id, shape [bs, 1]
        # print(1,pitch.shape)#[bs,t]
        g = self.emb_g(ds).unsqueeze(-1)  # [b, 256, 1]; the trailing 1 broadcasts over t
m_p, logs_p, x_mask = self.enc_p(phone, pitch, phone_lengths)
z, m_q, logs_q, y_mask = self.enc_q(y, y_lengths, g=g)
z_p = self.flow(z, y_mask, g=g)
z_slice, ids_slice = commons.rand_slice_segments(
z, y_lengths, self.segment_size
)
# print(-1,pitchf.shape,ids_slice,self.segment_size,self.hop_length,self.segment_size//self.hop_length)
pitchf = commons.slice_segments2(pitchf, ids_slice, self.segment_size)
# print(-2,pitchf.shape,z_slice.shape)
o = self.dec(z_slice, pitchf, g=g)
return o, ids_slice, x_mask, y_mask, (z, z_p, m_p, logs_p, m_q, logs_q)
def infer(self, phone, phone_lengths, pitch, nsff0, sid, max_len=None):
g = self.emb_g(sid).unsqueeze(-1)
m_p, logs_p, x_mask = self.enc_p(phone, pitch, phone_lengths)
z_p = (m_p + torch.exp(logs_p) * torch.randn_like(m_p) * 0.66666) * x_mask
z = self.flow(z_p, x_mask, g=g, reverse=True)
o = self.dec((z * x_mask)[:, :, :max_len], nsff0, g=g)
return o, x_mask, (z, z_p, m_p, logs_p)
class SynthesizerTrnMsNSFsidNono(nn.Module):
def __init__(
self,
spec_channels,
segment_size,
inter_channels,
hidden_channels,
filter_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
resblock,
resblock_kernel_sizes,
resblock_dilation_sizes,
upsample_rates,
upsample_initial_channel,
upsample_kernel_sizes,
spk_embed_dim,
gin_channels,
emb_channels,
sr=None,
**kwargs
):
super().__init__()
self.spec_channels = spec_channels
self.inter_channels = inter_channels
self.hidden_channels = hidden_channels
self.filter_channels = filter_channels
self.n_heads = n_heads
self.n_layers = n_layers
self.kernel_size = kernel_size
self.p_dropout = p_dropout
self.resblock = resblock
self.resblock_kernel_sizes = resblock_kernel_sizes
self.resblock_dilation_sizes = resblock_dilation_sizes
self.upsample_rates = upsample_rates
self.upsample_initial_channel = upsample_initial_channel
self.upsample_kernel_sizes = upsample_kernel_sizes
self.segment_size = segment_size
self.gin_channels = gin_channels
self.emb_channels = emb_channels
# self.hop_length = hop_length#
self.spk_embed_dim = spk_embed_dim
self.enc_p = TextEncoder(
inter_channels,
hidden_channels,
filter_channels,
emb_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
f0=False,
)
self.dec = Generator(
inter_channels,
resblock,
resblock_kernel_sizes,
resblock_dilation_sizes,
upsample_rates,
upsample_initial_channel,
upsample_kernel_sizes,
gin_channels=gin_channels,
)
self.enc_q = PosteriorEncoder(
spec_channels,
inter_channels,
hidden_channels,
5,
1,
16,
gin_channels=gin_channels,
)
self.flow = ResidualCouplingBlock(
inter_channels, hidden_channels, 5, 1, 3, gin_channels=gin_channels
)
self.emb_g = nn.Embedding(self.spk_embed_dim, gin_channels)
print("gin_channels:", gin_channels, "self.spk_embed_dim:", self.spk_embed_dim)
def remove_weight_norm(self):
self.dec.remove_weight_norm()
self.flow.remove_weight_norm()
self.enc_q.remove_weight_norm()
    def forward(self, phone, phone_lengths, y, y_lengths, ds):  # ds here is the speaker id, shape [bs, 1]
        g = self.emb_g(ds).unsqueeze(-1)  # [b, 256, 1]; the trailing 1 broadcasts over t
m_p, logs_p, x_mask = self.enc_p(phone, None, phone_lengths)
z, m_q, logs_q, y_mask = self.enc_q(y, y_lengths, g=g)
z_p = self.flow(z, y_mask, g=g)
z_slice, ids_slice = commons.rand_slice_segments(
z, y_lengths, self.segment_size
)
o = self.dec(z_slice, g=g)
return o, ids_slice, x_mask, y_mask, (z, z_p, m_p, logs_p, m_q, logs_q)
def infer(self, phone, phone_lengths, sid, max_len=None):
g = self.emb_g(sid).unsqueeze(-1)
m_p, logs_p, x_mask = self.enc_p(phone, None, phone_lengths)
z_p = (m_p + torch.exp(logs_p) * torch.randn_like(m_p) * 0.66666) * x_mask
z = self.flow(z_p, x_mask, g=g, reverse=True)
o = self.dec((z * x_mask)[:, :, :max_len], g=g)
return o, x_mask, (z, z_p, m_p, logs_p)
View File
@ -0,0 +1,95 @@
from torch import nn
from infer_pack.models import ( # type:ignore
TextEncoder256,
GeneratorNSF,
PosteriorEncoder,
ResidualCouplingBlock,
)
import torch
class SynthesizerTrnMs256NSFsid_ONNX(nn.Module):
def __init__(
self,
spec_channels,
segment_size,
inter_channels,
hidden_channels,
filter_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
resblock,
resblock_kernel_sizes,
resblock_dilation_sizes,
upsample_rates,
upsample_initial_channel,
upsample_kernel_sizes,
spk_embed_dim,
gin_channels,
sr,
**kwargs
):
super().__init__()
self.spec_channels = spec_channels
self.inter_channels = inter_channels
self.hidden_channels = hidden_channels
self.filter_channels = filter_channels
self.n_heads = n_heads
self.n_layers = n_layers
self.kernel_size = kernel_size
self.p_dropout = p_dropout
self.resblock = resblock
self.resblock_kernel_sizes = resblock_kernel_sizes
self.resblock_dilation_sizes = resblock_dilation_sizes
self.upsample_rates = upsample_rates
self.upsample_initial_channel = upsample_initial_channel
self.upsample_kernel_sizes = upsample_kernel_sizes
self.segment_size = segment_size
self.gin_channels = gin_channels
# self.hop_length = hop_length#
self.spk_embed_dim = spk_embed_dim
self.enc_p = TextEncoder256(
inter_channels,
hidden_channels,
filter_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
)
self.dec = GeneratorNSF(
inter_channels,
resblock,
resblock_kernel_sizes,
resblock_dilation_sizes,
upsample_rates,
upsample_initial_channel,
upsample_kernel_sizes,
gin_channels=gin_channels,
sr=sr,
is_half=kwargs["is_half"],
)
self.enc_q = PosteriorEncoder(
spec_channels,
inter_channels,
hidden_channels,
5,
1,
16,
gin_channels=gin_channels,
)
self.flow = ResidualCouplingBlock(
inter_channels, hidden_channels, 5, 1, 3, gin_channels=gin_channels
)
self.emb_g = nn.Embedding(self.spk_embed_dim, gin_channels)
print("gin_channels:", gin_channels, "self.spk_embed_dim:", self.spk_embed_dim)
def forward(self, phone, phone_lengths, pitch, nsff0, sid, max_len=None):
g = self.emb_g(sid).unsqueeze(-1)
m_p, logs_p, x_mask = self.enc_p(phone, pitch, phone_lengths)
z_p = (m_p + torch.exp(logs_p) * torch.randn_like(m_p) * 0.66666) * x_mask
z = self.flow(z_p, x_mask, g=g, reverse=True)
o = self.dec((z * x_mask)[:, :, :max_len], nsff0, g=g)
return o, x_mask, (z, z_p, m_p, logs_p)
View File
@ -0,0 +1,94 @@
from torch import nn
from infer_pack.models import ( # type:ignore
TextEncoder256,
PosteriorEncoder,
ResidualCouplingBlock,
Generator,
)
import torch
class SynthesizerTrnMs256NSFsid_nono_ONNX(nn.Module):
def __init__(
self,
spec_channels,
segment_size,
inter_channels,
hidden_channels,
filter_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
resblock,
resblock_kernel_sizes,
resblock_dilation_sizes,
upsample_rates,
upsample_initial_channel,
upsample_kernel_sizes,
spk_embed_dim,
gin_channels,
sr=None,
**kwargs
):
super().__init__()
self.spec_channels = spec_channels
self.inter_channels = inter_channels
self.hidden_channels = hidden_channels
self.filter_channels = filter_channels
self.n_heads = n_heads
self.n_layers = n_layers
self.kernel_size = kernel_size
self.p_dropout = p_dropout
self.resblock = resblock
self.resblock_kernel_sizes = resblock_kernel_sizes
self.resblock_dilation_sizes = resblock_dilation_sizes
self.upsample_rates = upsample_rates
self.upsample_initial_channel = upsample_initial_channel
self.upsample_kernel_sizes = upsample_kernel_sizes
self.segment_size = segment_size
self.gin_channels = gin_channels
# self.hop_length = hop_length#
self.spk_embed_dim = spk_embed_dim
self.enc_p = TextEncoder256(
inter_channels,
hidden_channels,
filter_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
f0=False,
)
self.dec = Generator(
inter_channels,
resblock,
resblock_kernel_sizes,
resblock_dilation_sizes,
upsample_rates,
upsample_initial_channel,
upsample_kernel_sizes,
gin_channels=gin_channels,
)
self.enc_q = PosteriorEncoder(
spec_channels,
inter_channels,
hidden_channels,
5,
1,
16,
gin_channels=gin_channels,
)
self.flow = ResidualCouplingBlock(
inter_channels, hidden_channels, 5, 1, 3, gin_channels=gin_channels
)
self.emb_g = nn.Embedding(self.spk_embed_dim, gin_channels)
print("gin_channels:", gin_channels, "self.spk_embed_dim:", self.spk_embed_dim)
def forward(self, phone, phone_lengths, sid, max_len=None):
g = self.emb_g(sid).unsqueeze(-1)
m_p, logs_p, x_mask = self.enc_p(phone, None, phone_lengths)
z_p = (m_p + torch.exp(logs_p) * torch.randn_like(m_p) * 0.66666) * x_mask
z = self.flow(z_p, x_mask, g=g, reverse=True)
o = self.dec((z * x_mask)[:, :, :max_len], g=g)
return o, x_mask, (z, z_p, m_p, logs_p)
View File
@ -0,0 +1,97 @@
from torch import nn
from infer_pack.models import ( # type:ignore
PosteriorEncoder,
ResidualCouplingBlock,
Generator,
)
from voice_changer.RVC.models import TextEncoder
import torch
class SynthesizerTrnMsNSFsidNono_webui_ONNX(nn.Module):
def __init__(
self,
spec_channels,
segment_size,
inter_channels,
hidden_channels,
filter_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
resblock,
resblock_kernel_sizes,
resblock_dilation_sizes,
upsample_rates,
upsample_initial_channel,
upsample_kernel_sizes,
spk_embed_dim,
gin_channels,
emb_channels,
sr=None,
**kwargs
):
super().__init__()
self.spec_channels = spec_channels
self.inter_channels = inter_channels
self.hidden_channels = hidden_channels
self.filter_channels = filter_channels
self.n_heads = n_heads
self.n_layers = n_layers
self.kernel_size = kernel_size
self.p_dropout = p_dropout
self.resblock = resblock
self.resblock_kernel_sizes = resblock_kernel_sizes
self.resblock_dilation_sizes = resblock_dilation_sizes
self.upsample_rates = upsample_rates
self.upsample_initial_channel = upsample_initial_channel
self.upsample_kernel_sizes = upsample_kernel_sizes
self.segment_size = segment_size
self.gin_channels = gin_channels
self.emb_channels = emb_channels
# self.hop_length = hop_length#
self.spk_embed_dim = spk_embed_dim
self.enc_p = TextEncoder(
inter_channels,
hidden_channels,
filter_channels,
emb_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
f0=False,
)
self.dec = Generator(
inter_channels,
resblock,
resblock_kernel_sizes,
resblock_dilation_sizes,
upsample_rates,
upsample_initial_channel,
upsample_kernel_sizes,
gin_channels=gin_channels,
)
self.enc_q = PosteriorEncoder(
spec_channels,
inter_channels,
hidden_channels,
5,
1,
16,
gin_channels=gin_channels,
)
self.flow = ResidualCouplingBlock(
inter_channels, hidden_channels, 5, 1, 3, gin_channels=gin_channels
)
self.emb_g = nn.Embedding(self.spk_embed_dim, gin_channels)
print("gin_channels:", gin_channels, "self.spk_embed_dim:", self.spk_embed_dim)
def forward(self, phone, phone_lengths, sid, max_len=None):
g = self.emb_g(sid).unsqueeze(-1)
m_p, logs_p, x_mask = self.enc_p(phone, None, phone_lengths)
z_p = (m_p + torch.exp(logs_p) * torch.randn_like(m_p) * 0.66666) * x_mask
z = self.flow(z_p, x_mask, g=g, reverse=True)
o = self.dec((z * x_mask)[:, :, :max_len], g=g)
View File

@ -0,0 +1,98 @@
from torch import nn
from infer_pack.models import ( # type:ignore
GeneratorNSF,
PosteriorEncoder,
ResidualCouplingBlock,
)
from voice_changer.RVC.models import TextEncoder
import torch
class SynthesizerTrnMsNSFsid_webui_ONNX(nn.Module):
def __init__(
self,
spec_channels,
segment_size,
inter_channels,
hidden_channels,
filter_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
resblock,
resblock_kernel_sizes,
resblock_dilation_sizes,
upsample_rates,
upsample_initial_channel,
upsample_kernel_sizes,
spk_embed_dim,
gin_channels,
emb_channels,
sr,
**kwargs
):
super().__init__()
self.spec_channels = spec_channels
self.inter_channels = inter_channels
self.hidden_channels = hidden_channels
self.filter_channels = filter_channels
self.n_heads = n_heads
self.n_layers = n_layers
self.kernel_size = kernel_size
self.p_dropout = p_dropout
self.resblock = resblock
self.resblock_kernel_sizes = resblock_kernel_sizes
self.resblock_dilation_sizes = resblock_dilation_sizes
self.upsample_rates = upsample_rates
self.upsample_initial_channel = upsample_initial_channel
self.upsample_kernel_sizes = upsample_kernel_sizes
self.segment_size = segment_size
self.gin_channels = gin_channels
self.emb_channels = emb_channels
# self.hop_length = hop_length#
self.spk_embed_dim = spk_embed_dim
self.enc_p = TextEncoder(
inter_channels,
hidden_channels,
filter_channels,
emb_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
)
self.dec = GeneratorNSF(
inter_channels,
resblock,
resblock_kernel_sizes,
resblock_dilation_sizes,
upsample_rates,
upsample_initial_channel,
upsample_kernel_sizes,
gin_channels=gin_channels,
sr=sr,
is_half=kwargs["is_half"],
)
self.enc_q = PosteriorEncoder(
spec_channels,
inter_channels,
hidden_channels,
5,
1,
16,
gin_channels=gin_channels,
)
self.flow = ResidualCouplingBlock(
inter_channels, hidden_channels, 5, 1, 3, gin_channels=gin_channels
)
self.emb_g = nn.Embedding(self.spk_embed_dim, gin_channels)
print("gin_channels:", gin_channels, "self.spk_embed_dim:", self.spk_embed_dim)
def forward(self, phone, phone_lengths, pitch, nsff0, sid, max_len=None):
g = self.emb_g(sid).unsqueeze(-1)
m_p, logs_p, x_mask = self.enc_p(phone, pitch, phone_lengths)
z_p = (m_p + torch.exp(logs_p) * torch.randn_like(m_p) * 0.66666) * x_mask
z = self.flow(z_p, x_mask, g=g, reverse=True)
o = self.dec((z * x_mask)[:, :, :max_len], nsff0, g=g)
return o, x_mask, (z, z_p, m_p, logs_p)
View File
@ -1,6 +1,11 @@
import sys import sys
import os import os
if sys.platform.startswith('darwin'):
from voice_changer.utils.LoadModelParams import LoadModelParams
from voice_changer.utils.VoiceChangerModel import AudioInOut
from voice_changer.utils.VoiceChangerParams import VoiceChangerParams
if sys.platform.startswith("darwin"):
baseDir = [x for x in sys.path if x.endswith("Contents/MacOS")] baseDir = [x for x in sys.path if x.endswith("Contents/MacOS")]
if len(baseDir) != 1: if len(baseDir) != 1:
print("baseDir should be only one ", baseDir) print("baseDir should be only one ", baseDir)
@ -12,17 +17,16 @@ else:
import io import io
from dataclasses import dataclass, asdict, field from dataclasses import dataclass, asdict, field
from functools import reduce
import numpy as np import numpy as np
import torch import torch
import onnxruntime import onnxruntime
# onnxruntime.set_default_logger_severity(3) # onnxruntime.set_default_logger_severity(3)
from const import HUBERT_ONNX_MODEL_PATH
import pyworld as pw import pyworld as pw
from models import SynthesizerTrn from models import SynthesizerTrn # type:ignore
import cluster import cluster # type:ignore
import utils import utils
from fairseq import checkpoint_utils from fairseq import checkpoint_utils
import librosa import librosa
@ -30,11 +34,16 @@ import librosa
from Exceptions import NoModeLoadedException from Exceptions import NoModeLoadedException
providers = ['OpenVINOExecutionProvider', "CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"] providers = [
"OpenVINOExecutionProvider",
"CUDAExecutionProvider",
"DmlExecutionProvider",
"CPUExecutionProvider",
]
@dataclass @dataclass
class SoVitsSvc40Settings(): class SoVitsSvc40Settings:
gpu: int = 0 gpu: int = 0
dstId: int = 0 dstId: int = 0
@ -51,9 +60,7 @@ class SoVitsSvc40Settings():
onnxModelFile: str = "" onnxModelFile: str = ""
configFile: str = "" configFile: str = ""
speakers: dict[str, int] = field( speakers: dict[str, int] = field(default_factory=lambda: {})
default_factory=lambda: {}
)
# ↓mutableな物だけ列挙 # ↓mutableな物だけ列挙
intData = ["gpu", "dstId", "tran", "predictF0", "extraConvertSize"] intData = ["gpu", "dstId", "tran", "predictF0", "extraConvertSize"]
@ -62,7 +69,9 @@ class SoVitsSvc40Settings():
class SoVitsSvc40: class SoVitsSvc40:
def __init__(self, params): audio_buffer: AudioInOut | None = None
def __init__(self, params: VoiceChangerParams):
self.settings = SoVitsSvc40Settings() self.settings = SoVitsSvc40Settings()
self.net_g = None self.net_g = None
self.onnx_session = None self.onnx_session = None
@@ -74,32 +83,30 @@ class SoVitsSvc40:
         print("so-vits-svc40 initialization:", params)
 
     # def loadModel(self, config: str, pyTorch_model_file: str = None, onnx_model_file: str = None, clusterTorchModel: str = None):
-    def loadModel(self, props):
-        self.settings.configFile = props["files"]["configFilename"]
+    def loadModel(self, props: LoadModelParams):
+        self.settings.configFile = props.files.configFilename
         self.hps = utils.get_hparams_from_file(self.settings.configFile)
         self.settings.speakers = self.hps.spk
-        self.settings.pyTorchModelFile = props["files"]["pyTorchModelFilename"]
-        self.settings.onnxModelFile = props["files"]["onnxModelFilename"]
-        clusterTorchModel = props["files"]["clusterTorchModelFilename"]
+        self.settings.pyTorchModelFile = props.files.pyTorchModelFilename
+        self.settings.onnxModelFile = props.files.onnxModelFilename
+        clusterTorchModel = props.files.clusterTorchModelFilename
 
-        content_vec_path = self.params["content_vec_500"]
-        content_vec_onnx_path = self.params["content_vec_500_onnx"]
-        content_vec_onnx_on = self.params["content_vec_500_onnx_on"]
-        hubert_base_path = self.params["hubert_base"]
+        content_vec_path = self.params.content_vec_500
+        content_vec_onnx_path = self.params.content_vec_500_onnx
+        content_vec_onnx_on = self.params.content_vec_500_onnx_on
+        hubert_base_path = self.params.hubert_base
 
         # hubert model
         try:
-            if os.path.exists(content_vec_path) == False:
+            if os.path.exists(content_vec_path) is False:
                 content_vec_path = hubert_base_path
-            if content_vec_onnx_on == True:
+            if content_vec_onnx_on is True:
                 ort_options = onnxruntime.SessionOptions()
                 ort_options.intra_op_num_threads = 8
                 self.content_vec_onnx = onnxruntime.InferenceSession(
-                    content_vec_onnx_path,
-                    providers=providers
+                    content_vec_onnx_path, providers=providers
                 )
             else:
                 models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task(
@ -114,7 +121,7 @@ class SoVitsSvc40:
# cluster # cluster
try: try:
if clusterTorchModel != None and os.path.exists(clusterTorchModel): if clusterTorchModel is not None and os.path.exists(clusterTorchModel):
self.cluster_model = cluster.get_cluster_model(clusterTorchModel) self.cluster_model = cluster.get_cluster_model(clusterTorchModel)
else: else:
self.cluster_model = None self.cluster_model = None
@ -122,22 +129,22 @@ class SoVitsSvc40:
print("EXCEPTION during loading cluster model ", e) print("EXCEPTION during loading cluster model ", e)
# PyTorchモデル生成 # PyTorchモデル生成
if self.settings.pyTorchModelFile != None: if self.settings.pyTorchModelFile is not None:
self.net_g = SynthesizerTrn( net_g = SynthesizerTrn(
self.hps.data.filter_length // 2 + 1, self.hps.data.filter_length // 2 + 1,
self.hps.train.segment_size // self.hps.data.hop_length, self.hps.train.segment_size // self.hps.data.hop_length,
**self.hps.model **self.hps.model,
) )
self.net_g.eval() net_g.eval()
self.net_g = net_g
utils.load_checkpoint(self.settings.pyTorchModelFile, self.net_g, None) utils.load_checkpoint(self.settings.pyTorchModelFile, self.net_g, None)
# ONNXモデル生成 # ONNXモデル生成
if self.settings.onnxModelFile != None: if self.settings.onnxModelFile is not None:
ort_options = onnxruntime.SessionOptions() ort_options = onnxruntime.SessionOptions()
ort_options.intra_op_num_threads = 8 ort_options.intra_op_num_threads = 8
self.onnx_session = onnxruntime.InferenceSession( self.onnx_session = onnxruntime.InferenceSession(
self.settings.onnxModelFile, self.settings.onnxModelFile, providers=providers
providers=providers
) )
# input_info = self.onnx_session.get_inputs() # input_info = self.onnx_session.get_inputs()
# for i in input_info: # for i in input_info:
@ -147,30 +154,43 @@ class SoVitsSvc40:
# print("output", i) # print("output", i)
return self.get_info() return self.get_info()
def update_settings(self, key: str, val: any): def update_settings(self, key: str, val: int | float | str):
if key == "onnxExecutionProvider" and self.onnx_session != None: if key == "onnxExecutionProvider" and self.onnx_session is not None:
if val == "CUDAExecutionProvider": if val == "CUDAExecutionProvider":
if self.settings.gpu < 0 or self.settings.gpu >= self.gpu_num: if self.settings.gpu < 0 or self.settings.gpu >= self.gpu_num:
self.settings.gpu = 0 self.settings.gpu = 0
provider_options = [{'device_id': self.settings.gpu}] provider_options = [{"device_id": self.settings.gpu}]
self.onnx_session.set_providers(providers=[val], provider_options=provider_options) self.onnx_session.set_providers(
providers=[val], provider_options=provider_options
)
if hasattr(self, "content_vec_onnx"): if hasattr(self, "content_vec_onnx"):
self.content_vec_onnx.set_providers(providers=[val], provider_options=provider_options) self.content_vec_onnx.set_providers(
providers=[val], provider_options=provider_options
)
else: else:
self.onnx_session.set_providers(providers=[val]) self.onnx_session.set_providers(providers=[val])
if hasattr(self, "content_vec_onnx"): if hasattr(self, "content_vec_onnx"):
self.content_vec_onnx.set_providers(providers=[val]) self.content_vec_onnx.set_providers(providers=[val])
elif key == "onnxExecutionProvider" and self.onnx_session == None: elif key == "onnxExecutionProvider" and self.onnx_session is None:
print("Onnx is not enabled. Please load model.") print("Onnx is not enabled. Please load model.")
return False return False
elif key in self.settings.intData: elif key in self.settings.intData:
setattr(self.settings, key, int(val)) val = int(val)
if key == "gpu" and val >= 0 and val < self.gpu_num and self.onnx_session != None: setattr(self.settings, key, val)
if (
key == "gpu"
and val >= 0
and val < self.gpu_num
and self.onnx_session is not None
):
providers = self.onnx_session.get_providers() providers = self.onnx_session.get_providers()
print("Providers:", providers) print("Providers:", providers)
if "CUDAExecutionProvider" in providers: if "CUDAExecutionProvider" in providers:
provider_options = [{'device_id': self.settings.gpu}] provider_options = [{"device_id": self.settings.gpu}]
self.onnx_session.set_providers(providers=["CUDAExecutionProvider"], provider_options=provider_options) self.onnx_session.set_providers(
providers=["CUDAExecutionProvider"],
provider_options=provider_options,
)
elif key in self.settings.floatData: elif key in self.settings.floatData:
setattr(self.settings, key, float(val)) setattr(self.settings, key, float(val))
elif key in self.settings.strData: elif key in self.settings.strData:
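Editor's note: the `update_settings` hunk above switches onnxruntime execution providers at runtime and pins CUDA sessions to a device index. A hedged sketch of the same pattern, filtering to providers actually available on the machine (model path and thread count are placeholder assumptions, not values from this repository):

```python
import onnxruntime

# Preferred provider order used in this file; unavailable ones are filtered out
# so the sketch runs anywhere.
preferred = [
    "OpenVINOExecutionProvider",
    "CUDAExecutionProvider",
    "DmlExecutionProvider",
    "CPUExecutionProvider",
]
available = onnxruntime.get_available_providers()
providers = [p for p in preferred if p in available]

so = onnxruntime.SessionOptions()
so.intra_op_num_threads = 8  # mirrors the value used above

# "model.onnx" is a placeholder path.
session = onnxruntime.InferenceSession("model.onnx", sess_options=so, providers=providers)

# Pinning the session to a specific GPU, as update_settings does for key == "gpu":
if "CUDAExecutionProvider" in available:
    session.set_providers(
        providers=["CUDAExecutionProvider"],
        provider_options=[{"device_id": 0}],  # assumed device index
    )
```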
@ -183,10 +203,12 @@ class SoVitsSvc40:
def get_info(self): def get_info(self):
data = asdict(self.settings) data = asdict(self.settings)
data["onnxExecutionProviders"] = self.onnx_session.get_providers() if self.onnx_session != None else [] data["onnxExecutionProviders"] = (
self.onnx_session.get_providers() if self.onnx_session is not None else []
)
files = ["configFile", "pyTorchModelFile", "onnxModelFile"] files = ["configFile", "pyTorchModelFile", "onnxModelFile"]
for f in files: for f in files:
if data[f] != None and os.path.exists(data[f]): if data[f] is not None and os.path.exists(data[f]):
data[f] = os.path.basename(data[f]) data[f] = os.path.basename(data[f])
else: else:
data[f] = "" data[f] = ""
@ -194,22 +216,30 @@ class SoVitsSvc40:
return data return data
def get_processing_sampling_rate(self): def get_processing_sampling_rate(self):
if hasattr(self, "hps") == False: if hasattr(self, "hps") is False:
raise NoModeLoadedException("config") raise NoModeLoadedException("config")
return self.hps.data.sampling_rate return self.hps.data.sampling_rate
def get_unit_f0(self, audio_buffer, tran): def get_unit_f0(self, audio_buffer, tran):
wav_44k = audio_buffer wav_44k = audio_buffer
# f0 = utils.compute_f0_parselmouth(wav, sampling_rate=self.target_sample, hop_length=self.hop_size)
# f0 = utils.compute_f0_dio(wav_44k, sampling_rate=self.hps.data.sampling_rate, hop_length=self.hps.data.hop_length)
if self.settings.f0Detector == "dio": if self.settings.f0Detector == "dio":
f0 = compute_f0_dio(wav_44k, sampling_rate=self.hps.data.sampling_rate, hop_length=self.hps.data.hop_length) f0 = compute_f0_dio(
wav_44k,
sampling_rate=self.hps.data.sampling_rate,
hop_length=self.hps.data.hop_length,
)
else: else:
f0 = compute_f0_harvest(wav_44k, sampling_rate=self.hps.data.sampling_rate, hop_length=self.hps.data.hop_length) f0 = compute_f0_harvest(
wav_44k,
sampling_rate=self.hps.data.sampling_rate,
hop_length=self.hps.data.hop_length,
)
if wav_44k.shape[0] % self.hps.data.hop_length != 0: if wav_44k.shape[0] % self.hps.data.hop_length != 0:
print(f" !!! !!! !!! wav size not multiple of hopsize: {wav_44k.shape[0] / self.hps.data.hop_length}") print(
f" !!! !!! !!! wav size not multiple of hopsize: {wav_44k.shape[0] / self.hps.data.hop_length}"
)
f0, uv = utils.interpolate_f0(f0) f0, uv = utils.interpolate_f0(f0)
f0 = torch.FloatTensor(f0) f0 = torch.FloatTensor(f0)
@ -218,11 +248,14 @@ class SoVitsSvc40:
f0 = f0.unsqueeze(0) f0 = f0.unsqueeze(0)
uv = uv.unsqueeze(0) uv = uv.unsqueeze(0)
# wav16k = librosa.resample(audio_buffer, orig_sr=24000, target_sr=16000) wav16k_numpy = librosa.resample(
wav16k_numpy = librosa.resample(audio_buffer, orig_sr=self.hps.data.sampling_rate, target_sr=16000) audio_buffer, orig_sr=self.hps.data.sampling_rate, target_sr=16000
)
wav16k_tensor = torch.from_numpy(wav16k_numpy) wav16k_tensor = torch.from_numpy(wav16k_numpy)
if (self.settings.gpu < 0 or self.gpu_num == 0) or self.settings.framework == "ONNX": if (
self.settings.gpu < 0 or self.gpu_num == 0
) or self.settings.framework == "ONNX":
dev = torch.device("cpu") dev = torch.device("cpu")
else: else:
dev = torch.device("cuda", index=self.settings.gpu) dev = torch.device("cuda", index=self.settings.gpu)
@ -232,53 +265,87 @@ class SoVitsSvc40:
["units"], ["units"],
{ {
"audio": wav16k_numpy.reshape(1, -1), "audio": wav16k_numpy.reshape(1, -1),
}) },
)
c = torch.from_numpy(np.array(c)).squeeze(0).transpose(1, 2) c = torch.from_numpy(np.array(c)).squeeze(0).transpose(1, 2)
# print("onnx hubert:", self.content_vec_onnx.get_providers()) # print("onnx hubert:", self.content_vec_onnx.get_providers())
else: else:
if self.hps.model.ssl_dim == 768: if self.hps.model.ssl_dim == 768:
self.hubert_model = self.hubert_model.to(dev) self.hubert_model = self.hubert_model.to(dev)
wav16k_tensor = wav16k_tensor.to(dev) wav16k_tensor = wav16k_tensor.to(dev)
c = get_hubert_content_layer9(self.hubert_model, wav_16k_tensor=wav16k_tensor) c = get_hubert_content_layer9(
self.hubert_model, wav_16k_tensor=wav16k_tensor
)
else: else:
self.hubert_model = self.hubert_model.to(dev) self.hubert_model = self.hubert_model.to(dev)
wav16k_tensor = wav16k_tensor.to(dev) wav16k_tensor = wav16k_tensor.to(dev)
c = utils.get_hubert_content(self.hubert_model, wav_16k_tensor=wav16k_tensor) c = utils.get_hubert_content(
self.hubert_model, wav_16k_tensor=wav16k_tensor
)
uv = uv.to(dev) uv = uv.to(dev)
f0 = f0.to(dev) f0 = f0.to(dev)
c = utils.repeat_expand_2d(c.squeeze(0), f0.shape[1]) c = utils.repeat_expand_2d(c.squeeze(0), f0.shape[1])
if self.settings.clusterInferRatio != 0 and hasattr(self, "cluster_model") and self.cluster_model != None: if (
speaker = [key for key, value in self.settings.speakers.items() if value == self.settings.dstId] self.settings.clusterInferRatio != 0
and hasattr(self, "cluster_model")
and self.cluster_model is not None
):
speaker = [
key
for key, value in self.settings.speakers.items()
if value == self.settings.dstId
]
if len(speaker) != 1: if len(speaker) != 1:
print("not only one speaker found.", speaker) pass
# print("not only one speaker found.", speaker)
else: else:
cluster_c = cluster.get_cluster_center_result(self.cluster_model, c.cpu().numpy().T, speaker[0]).T cluster_c = cluster.get_cluster_center_result(
self.cluster_model, c.cpu().numpy().T, speaker[0]
).T
cluster_c = torch.FloatTensor(cluster_c).to(dev) cluster_c = torch.FloatTensor(cluster_c).to(dev)
c = c.to(dev) c = c.to(dev)
c = self.settings.clusterInferRatio * cluster_c + (1 - self.settings.clusterInferRatio) * c c = (
self.settings.clusterInferRatio * cluster_c
+ (1 - self.settings.clusterInferRatio) * c
)
c = c.unsqueeze(0) c = c.unsqueeze(0)
return c, f0, uv return c, f0, uv
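Editor's note: `get_unit_f0` above blends the hubert content features with their cluster-centre counterparts using `clusterInferRatio`. A small sketch of that blend with assumed shapes and ratio:

```python
# Linear mix of content features and cluster-centre features; values are illustrative.
import torch

dims, frames = 256, 120                  # assumed feature size / frame count
c = torch.randn(dims, frames)            # hubert content features
cluster_c = torch.randn(dims, frames)    # cluster-centre features for the target speaker
cluster_infer_ratio = 0.5                # assumed setting

blended = cluster_infer_ratio * cluster_c + (1 - cluster_infer_ratio) * c
blended = blended.unsqueeze(0)           # add batch dimension, as in the code above
```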
def generate_input(self, newData: any, inputSize: int, crossfadeSize: int, solaSearchFrame: int = 0): def generate_input(
self,
newData: AudioInOut,
inputSize: int,
crossfadeSize: int,
solaSearchFrame: int = 0,
):
newData = newData.astype(np.float32) / self.hps.data.max_wav_value newData = newData.astype(np.float32) / self.hps.data.max_wav_value
if hasattr(self, "audio_buffer"): if self.audio_buffer is not None:
self.audio_buffer = np.concatenate([self.audio_buffer, newData], 0) # 過去のデータに連結 self.audio_buffer = np.concatenate(
[self.audio_buffer, newData], 0
) # 過去のデータに連結
else: else:
self.audio_buffer = newData self.audio_buffer = newData
convertSize = inputSize + crossfadeSize + solaSearchFrame + self.settings.extraConvertSize convertSize = (
inputSize + crossfadeSize + solaSearchFrame + self.settings.extraConvertSize
)
if convertSize % self.hps.data.hop_length != 0: # モデルの出力のホップサイズで切り捨てが発生するので補う。 if convertSize % self.hps.data.hop_length != 0: # モデルの出力のホップサイズで切り捨てが発生するので補う。
convertSize = convertSize + (self.hps.data.hop_length - (convertSize % self.hps.data.hop_length)) convertSize = convertSize + (
self.hps.data.hop_length - (convertSize % self.hps.data.hop_length)
)
self.audio_buffer = self.audio_buffer[-1 * convertSize:] # 変換対象の部分だけ抽出 convertOffset = -1 * convertSize
self.audio_buffer = self.audio_buffer[convertOffset:] # 変換対象の部分だけ抽出
crop = self.audio_buffer[-1 * (inputSize + crossfadeSize):-1 * (crossfadeSize)] cropOffset = -1 * (inputSize + crossfadeSize)
cropEnd = -1 * (crossfadeSize)
crop = self.audio_buffer[cropOffset:cropEnd]
rms = np.sqrt(np.square(crop).mean(axis=0)) rms = np.sqrt(np.square(crop).mean(axis=0))
vol = max(rms, self.prevVol * 0.0) vol = max(rms, self.prevVol * 0.0)
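Editor's note: `generate_input` above pads the conversion window up to a multiple of the model hop length, then crops the rolling buffer with negative indices and estimates volume for silence gating. A sketch with assumed sizes:

```python
import numpy as np

hop_length = 512
input_size, crossfade_size, sola_search_frame, extra = 4096, 1024, 512, 8192

convert_size = input_size + crossfade_size + sola_search_frame + extra
if convert_size % hop_length != 0:
    # round up so the hop-based model output is not truncated
    convert_size += hop_length - (convert_size % hop_length)

audio_buffer = np.zeros(48000, dtype=np.float32)        # stand-in rolling buffer
audio_buffer = audio_buffer[-convert_size:]              # keep only the part to convert
crop = audio_buffer[-(input_size + crossfade_size):-crossfade_size]
rms = np.sqrt(np.square(crop).mean(axis=0))              # volume estimate for silence gating
```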
@ -288,38 +355,46 @@ class SoVitsSvc40:
return (c, f0, uv, convertSize, vol) return (c, f0, uv, convertSize, vol)
def _onnx_inference(self, data): def _onnx_inference(self, data):
if hasattr(self, "onnx_session") == False or self.onnx_session == None: if hasattr(self, "onnx_session") is False or self.onnx_session is None:
print("[Voice Changer] No onnx session.") print("[Voice Changer] No onnx session.")
raise NoModeLoadedException("ONNX") raise NoModeLoadedException("ONNX")
convertSize = data[3] convertSize = data[3]
vol = data[4] vol = data[4]
data = (data[0], data[1], data[2],) data = (
data[0],
data[1],
data[2],
)
if vol < self.settings.silentThreshold: if vol < self.settings.silentThreshold:
return np.zeros(convertSize).astype(np.int16) return np.zeros(convertSize).astype(np.int16)
c, f0, uv = [x.numpy() for x in data] c, f0, uv = [x.numpy() for x in data]
sid_target = torch.LongTensor([self.settings.dstId]).unsqueeze(0).numpy() sid_target = torch.LongTensor([self.settings.dstId]).unsqueeze(0).numpy()
audio1 = self.onnx_session.run( audio1 = (
self.onnx_session.run(
["audio"], ["audio"],
{ {
"c": c.astype(np.float32), "c": c.astype(np.float32),
"f0": f0.astype(np.float32), "f0": f0.astype(np.float32),
"uv": uv.astype(np.float32), "uv": uv.astype(np.float32),
"g": sid_target.astype(np.int64), "g": sid_target.astype(np.int64),
"noise_scale": np.array([self.settings.noiseScale]).astype(np.float32), "noise_scale": np.array([self.settings.noiseScale]).astype(
np.float32
),
# "predict_f0": np.array([self.settings.dstId]).astype(np.int64), # "predict_f0": np.array([self.settings.dstId]).astype(np.int64),
},
)[0][0, 0]
})[0][0, 0] * self.hps.data.max_wav_value * self.hps.data.max_wav_value
)
audio1 = audio1 * vol audio1 = audio1 * vol
result = audio1 result = audio1
return result return result
def _pyTorch_inference(self, data): def _pyTorch_inference(self, data):
if hasattr(self, "net_g") == False or self.net_g == None: if hasattr(self, "net_g") is False or self.net_g is None:
print("[Voice Changer] No pyTorch session.") print("[Voice Changer] No pyTorch session.")
raise NoModeLoadedException("pytorch") raise NoModeLoadedException("pytorch")
@ -330,19 +405,29 @@ class SoVitsSvc40:
convertSize = data[3] convertSize = data[3]
vol = data[4] vol = data[4]
data = (data[0], data[1], data[2],) data = (
data[0],
data[1],
data[2],
)
if vol < self.settings.silentThreshold: if vol < self.settings.silentThreshold:
return np.zeros(convertSize).astype(np.int16) return np.zeros(convertSize).astype(np.int16)
with torch.no_grad(): with torch.no_grad():
c, f0, uv = [x.to(dev)for x in data] c, f0, uv = [x.to(dev) for x in data]
sid_target = torch.LongTensor([self.settings.dstId]).to(dev).unsqueeze(0) sid_target = torch.LongTensor([self.settings.dstId]).to(dev).unsqueeze(0)
self.net_g.to(dev) self.net_g.to(dev)
# audio1 = self.net_g.infer(c, f0=f0, g=sid_target, uv=uv, predict_f0=True, noice_scale=0.1)[0][0, 0].data.float() # audio1 = self.net_g.infer(c, f0=f0, g=sid_target, uv=uv, predict_f0=True, noice_scale=0.1)[0][0, 0].data.float()
predict_f0_flag = True if self.settings.predictF0 == 1 else False predict_f0_flag = True if self.settings.predictF0 == 1 else False
audio1 = self.net_g.infer(c, f0=f0, g=sid_target, uv=uv, predict_f0=predict_f0_flag, audio1 = self.net_g.infer(
noice_scale=self.settings.noiseScale) c,
f0=f0,
g=sid_target,
uv=uv,
predict_f0=predict_f0_flag,
noice_scale=self.settings.noiseScale,
)
audio1 = audio1[0][0].data.float() audio1 = audio1[0][0].data.float()
# audio1 = self.net_g.infer(c, f0=f0, g=sid_target, uv=uv, predict_f0=predict_f0_flag, # audio1 = self.net_g.infer(c, f0=f0, g=sid_target, uv=uv, predict_f0=predict_f0_flag,
# noice_scale=self.settings.noiceScale)[0][0, 0].data.float() # noice_scale=self.settings.noiceScale)[0][0, 0].data.float()
@ -367,7 +452,7 @@ class SoVitsSvc40:
del self.net_g del self.net_g
del self.onnx_session del self.onnx_session
remove_path = os.path.join("so-vits-svc-40") remove_path = os.path.join("so-vits-svc-40")
sys.path = [x for x in sys.path if x.endswith(remove_path) == False] sys.path = [x for x in sys.path if x.endswith(remove_path) is False]
for key in list(sys.modules): for key in list(sys.modules):
val = sys.modules.get(key) val = sys.modules.get(key)
@ -376,14 +461,18 @@ class SoVitsSvc40:
if file_path.find("so-vits-svc-40" + os.path.sep) >= 0: if file_path.find("so-vits-svc-40" + os.path.sep) >= 0:
print("remove", key, file_path) print("remove", key, file_path)
sys.modules.pop(key) sys.modules.pop(key)
except Exception as e: except Exception: # type:ignore
pass pass
 def resize_f0(x, target_len):
     source = np.array(x)
     source[source < 0.001] = np.nan
-    target = np.interp(np.arange(0, len(source) * target_len, len(source)) / target_len, np.arange(0, len(source)), source)
+    target = np.interp(
+        np.arange(0, len(source) * target_len, len(source)) / target_len,
+        np.arange(0, len(source)),
+        source,
+    )
     res = np.nan_to_num(target)
     return res
@@ -406,7 +495,13 @@ def compute_f0_dio(wav_numpy, p_len=None, sampling_rate=44100, hop_length=512):
 def compute_f0_harvest(wav_numpy, p_len=None, sampling_rate=44100, hop_length=512):
     if p_len is None:
         p_len = wav_numpy.shape[0] // hop_length
-    f0, t = pw.harvest(wav_numpy.astype(np.double), fs=sampling_rate, frame_period=5.5, f0_floor=71.0, f0_ceil=1000.0)
+    f0, t = pw.harvest(
+        wav_numpy.astype(np.double),
+        fs=sampling_rate,
+        frame_period=5.5,
+        f0_floor=71.0,
+        f0_ceil=1000.0,
+    )
     for index, pitch in enumerate(f0):
         f0[index] = round(pitch, 1)
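Editor's note: a hedged sketch of the harvest-based f0 path above, extracting a pitch contour with pyworld and resampling it to the model's frame count the same way `resize_f0` does (sampling rate and hop length are assumed example values):

```python
import numpy as np
import pyworld as pw

sampling_rate, hop_length = 44100, 512
wav = np.random.uniform(-0.1, 0.1, sampling_rate).astype(np.double)  # 1 s of dummy audio

f0, t = pw.harvest(
    wav,
    fs=sampling_rate,
    frame_period=5.5,
    f0_floor=71.0,
    f0_ceil=1000.0,
)
f0 = np.round(f0, 1)

p_len = wav.shape[0] // hop_length   # target number of frames
source = np.array(f0, dtype=np.float64)
source[source < 0.001] = np.nan      # mark unvoiced frames
target = np.interp(
    np.arange(0, len(source) * p_len, len(source)) / p_len,
    np.arange(0, len(source)),
    source,
)
f0_resized = np.nan_to_num(target)   # unvoiced frames back to 0
```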

View File

@@ -1,6 +1,11 @@
 import sys
 import os
-if sys.platform.startswith('darwin'):
+
+from voice_changer.utils.LoadModelParams import LoadModelParams
+from voice_changer.utils.VoiceChangerModel import AudioInOut
+from voice_changer.utils.VoiceChangerParams import VoiceChangerParams
+
+if sys.platform.startswith("darwin"):
     baseDir = [x for x in sys.path if x.endswith("Contents/MacOS")]
     if len(baseDir) != 1:
         print("baseDir should be only one ", baseDir)
@@ -12,25 +17,29 @@ else:
 import io
 from dataclasses import dataclass, asdict, field
-from functools import reduce
 import numpy as np
 import torch
 import onnxruntime
 import pyworld as pw
-from models import SynthesizerTrn
+from models import SynthesizerTrn  # type:ignore
-import cluster
+import cluster  # type:ignore
 import utils
 from fairseq import checkpoint_utils
 import librosa
 from Exceptions import NoModeLoadedException
 
-providers = ['OpenVINOExecutionProvider', "CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]
+providers = [
+    "OpenVINOExecutionProvider",
+    "CUDAExecutionProvider",
+    "DmlExecutionProvider",
+    "CPUExecutionProvider",
+]
 
 @dataclass
-class SoVitsSvc40v2Settings():
+class SoVitsSvc40v2Settings:
     gpu: int = 0
     dstId: int = 0
@ -47,9 +56,7 @@ class SoVitsSvc40v2Settings():
onnxModelFile: str = "" onnxModelFile: str = ""
configFile: str = "" configFile: str = ""
speakers: dict[str, int] = field( speakers: dict[str, int] = field(default_factory=lambda: {})
default_factory=lambda: {}
)
# ↓mutableな物だけ列挙 # ↓mutableな物だけ列挙
intData = ["gpu", "dstId", "tran", "predictF0", "extraConvertSize"] intData = ["gpu", "dstId", "tran", "predictF0", "extraConvertSize"]
@ -58,7 +65,9 @@ class SoVitsSvc40v2Settings():
class SoVitsSvc40v2: class SoVitsSvc40v2:
def __init__(self, params): audio_buffer: AudioInOut | None = None
def __init__(self, params: VoiceChangerParams):
self.settings = SoVitsSvc40v2Settings() self.settings = SoVitsSvc40v2Settings()
self.net_g = None self.net_g = None
self.onnx_session = None self.onnx_session = None
@ -69,23 +78,21 @@ class SoVitsSvc40v2:
self.params = params self.params = params
print("so-vits-svc 40v2 initialization:", params) print("so-vits-svc 40v2 initialization:", params)
def loadModel(self, props): def loadModel(self, props: LoadModelParams):
self.settings.configFile = props["files"]["configFilename"] self.settings.configFile = props.files.configFilename
self.hps = utils.get_hparams_from_file(self.settings.configFile) self.hps = utils.get_hparams_from_file(self.settings.configFile)
self.settings.speakers = self.hps.spk self.settings.speakers = self.hps.spk
self.settings.pyTorchModelFile = props["files"]["pyTorchModelFilename"] self.settings.pyTorchModelFile = props.files.pyTorchModelFilename
self.settings.onnxModelFile = props["files"]["onnxModelFilename"] self.settings.onnxModelFile = props.files.onnxModelFilename
clusterTorchModel = props["files"]["clusterTorchModelFilename"] clusterTorchModel = props.files.clusterTorchModelFilename
content_vec_path = self.params["content_vec_500"] content_vec_path = self.params.content_vec_500
# content_vec_hubert_onnx_path = self.params["content_vec_500_onnx"] hubert_base_path = self.params.hubert_base
# content_vec_hubert_onnx_on = self.params["content_vec_500_onnx_on"]
hubert_base_path = self.params["hubert_base"]
# hubert model # hubert model
try: try:
if os.path.exists(content_vec_path) == False: if os.path.exists(content_vec_path) is False:
content_vec_path = hubert_base_path content_vec_path = hubert_base_path
models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task( models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task(
@ -100,7 +107,7 @@ class SoVitsSvc40v2:
# cluster # cluster
try: try:
if clusterTorchModel != None and os.path.exists(clusterTorchModel): if clusterTorchModel is not None and os.path.exists(clusterTorchModel):
self.cluster_model = cluster.get_cluster_model(clusterTorchModel) self.cluster_model = cluster.get_cluster_model(clusterTorchModel)
else: else:
self.cluster_model = None self.cluster_model = None
@ -108,41 +115,50 @@ class SoVitsSvc40v2:
print("EXCEPTION during loading cluster model ", e) print("EXCEPTION during loading cluster model ", e)
# PyTorchモデル生成 # PyTorchモデル生成
if self.settings.pyTorchModelFile != None: if self.settings.pyTorchModelFile is not None:
self.net_g = SynthesizerTrn( net_g = SynthesizerTrn(self.hps)
self.hps net_g.eval()
) self.net_g = net_g
self.net_g.eval()
utils.load_checkpoint(self.settings.pyTorchModelFile, self.net_g, None) utils.load_checkpoint(self.settings.pyTorchModelFile, self.net_g, None)
# ONNXモデル生成 # ONNXモデル生成
if self.settings.onnxModelFile != None: if self.settings.onnxModelFile is not None:
ort_options = onnxruntime.SessionOptions() ort_options = onnxruntime.SessionOptions()
ort_options.intra_op_num_threads = 8 ort_options.intra_op_num_threads = 8
self.onnx_session = onnxruntime.InferenceSession( self.onnx_session = onnxruntime.InferenceSession(
self.settings.onnxModelFile, self.settings.onnxModelFile, providers=providers
providers=providers
) )
input_info = self.onnx_session.get_inputs() # input_info = self.onnx_session.get_inputs()
return self.get_info() return self.get_info()
def update_settings(self, key: str, val: any): def update_settings(self, key: str, val: int | float | str):
if key == "onnxExecutionProvider" and self.onnx_session != None: if key == "onnxExecutionProvider" and self.onnx_session is not None:
if val == "CUDAExecutionProvider": if val == "CUDAExecutionProvider":
if self.settings.gpu < 0 or self.settings.gpu >= self.gpu_num: if self.settings.gpu < 0 or self.settings.gpu >= self.gpu_num:
self.settings.gpu = 0 self.settings.gpu = 0
provider_options = [{'device_id': self.settings.gpu}] provider_options = [{"device_id": self.settings.gpu}]
self.onnx_session.set_providers(providers=[val], provider_options=provider_options) self.onnx_session.set_providers(
providers=[val], provider_options=provider_options
)
else: else:
self.onnx_session.set_providers(providers=[val]) self.onnx_session.set_providers(providers=[val])
elif key in self.settings.intData: elif key in self.settings.intData:
setattr(self.settings, key, int(val)) val = int(val)
if key == "gpu" and val >= 0 and val < self.gpu_num and self.onnx_session != None: setattr(self.settings, key, val)
if (
key == "gpu"
and val >= 0
and val < self.gpu_num
and self.onnx_session is not None
):
providers = self.onnx_session.get_providers() providers = self.onnx_session.get_providers()
print("Providers:", providers) print("Providers:", providers)
if "CUDAExecutionProvider" in providers: if "CUDAExecutionProvider" in providers:
provider_options = [{'device_id': self.settings.gpu}] provider_options = [{"device_id": self.settings.gpu}]
self.onnx_session.set_providers(providers=["CUDAExecutionProvider"], provider_options=provider_options) self.onnx_session.set_providers(
providers=["CUDAExecutionProvider"],
provider_options=provider_options,
)
elif key in self.settings.floatData: elif key in self.settings.floatData:
setattr(self.settings, key, float(val)) setattr(self.settings, key, float(val))
elif key in self.settings.strData: elif key in self.settings.strData:
@ -155,10 +171,12 @@ class SoVitsSvc40v2:
def get_info(self): def get_info(self):
data = asdict(self.settings) data = asdict(self.settings)
data["onnxExecutionProviders"] = self.onnx_session.get_providers() if self.onnx_session != None else [] data["onnxExecutionProviders"] = (
self.onnx_session.get_providers() if self.onnx_session is not None else []
)
files = ["configFile", "pyTorchModelFile", "onnxModelFile"] files = ["configFile", "pyTorchModelFile", "onnxModelFile"]
for f in files: for f in files:
if data[f] != None and os.path.exists(data[f]): if data[f] is not None and os.path.exists(data[f]):
data[f] = os.path.basename(data[f]) data[f] = os.path.basename(data[f])
else: else:
data[f] = "" data[f] = ""
@ -166,7 +184,7 @@ class SoVitsSvc40v2:
return data return data
def get_processing_sampling_rate(self): def get_processing_sampling_rate(self):
if hasattr(self, "hps") == False: if hasattr(self, "hps") is False:
raise NoModeLoadedException("config") raise NoModeLoadedException("config")
return self.hps.data.sampling_rate return self.hps.data.sampling_rate
@ -175,12 +193,22 @@ class SoVitsSvc40v2:
# f0 = utils.compute_f0_parselmouth(wav, sampling_rate=self.target_sample, hop_length=self.hop_size) # f0 = utils.compute_f0_parselmouth(wav, sampling_rate=self.target_sample, hop_length=self.hop_size)
# f0 = utils.compute_f0_dio(wav_44k, sampling_rate=self.hps.data.sampling_rate, hop_length=self.hps.data.hop_length) # f0 = utils.compute_f0_dio(wav_44k, sampling_rate=self.hps.data.sampling_rate, hop_length=self.hps.data.hop_length)
if self.settings.f0Detector == "dio": if self.settings.f0Detector == "dio":
f0 = compute_f0_dio(wav_44k, sampling_rate=self.hps.data.sampling_rate, hop_length=self.hps.data.hop_length) f0 = compute_f0_dio(
wav_44k,
sampling_rate=self.hps.data.sampling_rate,
hop_length=self.hps.data.hop_length,
)
else: else:
f0 = compute_f0_harvest(wav_44k, sampling_rate=self.hps.data.sampling_rate, hop_length=self.hps.data.hop_length) f0 = compute_f0_harvest(
wav_44k,
sampling_rate=self.hps.data.sampling_rate,
hop_length=self.hps.data.hop_length,
)
if wav_44k.shape[0] % self.hps.data.hop_length != 0: if wav_44k.shape[0] % self.hps.data.hop_length != 0:
print(f" !!! !!! !!! wav size not multiple of hopsize: {wav_44k.shape[0] / self.hps.data.hop_length}") print(
f" !!! !!! !!! wav size not multiple of hopsize: {wav_44k.shape[0] / self.hps.data.hop_length}"
)
f0, uv = utils.interpolate_f0(f0) f0, uv = utils.interpolate_f0(f0)
f0 = torch.FloatTensor(f0) f0 = torch.FloatTensor(f0)
@ -190,10 +218,14 @@ class SoVitsSvc40v2:
uv = uv.unsqueeze(0) uv = uv.unsqueeze(0)
# wav16k = librosa.resample(audio_buffer, orig_sr=24000, target_sr=16000) # wav16k = librosa.resample(audio_buffer, orig_sr=24000, target_sr=16000)
wav16k = librosa.resample(audio_buffer, orig_sr=self.hps.data.sampling_rate, target_sr=16000) wav16k = librosa.resample(
audio_buffer, orig_sr=self.hps.data.sampling_rate, target_sr=16000
)
wav16k = torch.from_numpy(wav16k) wav16k = torch.from_numpy(wav16k)
if (self.settings.gpu < 0 or self.gpu_num == 0) or self.settings.framework == "ONNX": if (
self.settings.gpu < 0 or self.gpu_num == 0
) or self.settings.framework == "ONNX":
dev = torch.device("cpu") dev = torch.device("cpu")
else: else:
dev = torch.device("cuda", index=self.settings.gpu) dev = torch.device("cuda", index=self.settings.gpu)
@ -206,36 +238,64 @@ class SoVitsSvc40v2:
c = utils.get_hubert_content(self.hubert_model, wav_16k_tensor=wav16k) c = utils.get_hubert_content(self.hubert_model, wav_16k_tensor=wav16k)
c = utils.repeat_expand_2d(c.squeeze(0), f0.shape[1]) c = utils.repeat_expand_2d(c.squeeze(0), f0.shape[1])
if self.settings.clusterInferRatio != 0 and hasattr(self, "cluster_model") and self.cluster_model != None: if (
speaker = [key for key, value in self.settings.speakers.items() if value == self.settings.dstId] self.settings.clusterInferRatio != 0
and hasattr(self, "cluster_model")
and self.cluster_model is not None
):
speaker = [
key
for key, value in self.settings.speakers.items()
if value == self.settings.dstId
]
if len(speaker) != 1: if len(speaker) != 1:
print("not only one speaker found.", speaker) pass
# print("not only one speaker found.", speaker)
else: else:
cluster_c = cluster.get_cluster_center_result(self.cluster_model, c.cpu().numpy().T, speaker[0]).T cluster_c = cluster.get_cluster_center_result(
self.cluster_model, c.cpu().numpy().T, speaker[0]
).T
# cluster_c = cluster.get_cluster_center_result(self.cluster_model, c.cpu().numpy().T, self.settings.dstId).T # cluster_c = cluster.get_cluster_center_result(self.cluster_model, c.cpu().numpy().T, self.settings.dstId).T
cluster_c = torch.FloatTensor(cluster_c).to(dev) cluster_c = torch.FloatTensor(cluster_c).to(dev)
# print("cluster DEVICE", cluster_c.device, c.device) # print("cluster DEVICE", cluster_c.device, c.device)
c = self.settings.clusterInferRatio * cluster_c + (1 - self.settings.clusterInferRatio) * c c = (
self.settings.clusterInferRatio * cluster_c
+ (1 - self.settings.clusterInferRatio) * c
)
c = c.unsqueeze(0) c = c.unsqueeze(0)
return c, f0, uv return c, f0, uv
def generate_input(self, newData: any, inputSize: int, crossfadeSize: int, solaSearchFrame: int = 0): def generate_input(
self,
newData: AudioInOut,
inputSize: int,
crossfadeSize: int,
solaSearchFrame: int = 0,
):
newData = newData.astype(np.float32) / self.hps.data.max_wav_value newData = newData.astype(np.float32) / self.hps.data.max_wav_value
if hasattr(self, "audio_buffer"): if self.audio_buffer is not None:
self.audio_buffer = np.concatenate([self.audio_buffer, newData], 0) # 過去のデータに連結 self.audio_buffer = np.concatenate(
[self.audio_buffer, newData], 0
) # 過去のデータに連結
else: else:
self.audio_buffer = newData self.audio_buffer = newData
convertSize = inputSize + crossfadeSize + solaSearchFrame + self.settings.extraConvertSize convertSize = (
inputSize + crossfadeSize + solaSearchFrame + self.settings.extraConvertSize
)
if convertSize % self.hps.data.hop_length != 0: # モデルの出力のホップサイズで切り捨てが発生するので補う。 if convertSize % self.hps.data.hop_length != 0: # モデルの出力のホップサイズで切り捨てが発生するので補う。
convertSize = convertSize + (self.hps.data.hop_length - (convertSize % self.hps.data.hop_length)) convertSize = convertSize + (
self.hps.data.hop_length - (convertSize % self.hps.data.hop_length)
)
convertOffset = -1 * convertSize
self.audio_buffer = self.audio_buffer[convertOffset:] # 変換対象の部分だけ抽出
self.audio_buffer = self.audio_buffer[-1 * convertSize:] # 変換対象の部分だけ抽出 cropOffset = -1 * (inputSize + crossfadeSize)
cropEnd = -1 * (crossfadeSize)
crop = self.audio_buffer[-1 * (inputSize + crossfadeSize):-1 * (crossfadeSize)] crop = self.audio_buffer[cropOffset:cropEnd]
rms = np.sqrt(np.square(crop).mean(axis=0)) rms = np.sqrt(np.square(crop).mean(axis=0))
vol = max(rms, self.prevVol * 0.0) vol = max(rms, self.prevVol * 0.0)
@ -245,19 +305,24 @@ class SoVitsSvc40v2:
return (c, f0, uv, convertSize, vol) return (c, f0, uv, convertSize, vol)
def _onnx_inference(self, data): def _onnx_inference(self, data):
if hasattr(self, "onnx_session") == False or self.onnx_session == None: if hasattr(self, "onnx_session") is False or self.onnx_session is None:
print("[Voice Changer] No onnx session.") print("[Voice Changer] No onnx session.")
raise NoModeLoadedException("ONNX") raise NoModeLoadedException("ONNX")
convertSize = data[3] convertSize = data[3]
vol = data[4] vol = data[4]
data = (data[0], data[1], data[2],) data = (
data[0],
data[1],
data[2],
)
if vol < self.settings.silentThreshold: if vol < self.settings.silentThreshold:
return np.zeros(convertSize).astype(np.int16) return np.zeros(convertSize).astype(np.int16)
c, f0, uv = [x.numpy() for x in data] c, f0, uv = [x.numpy() for x in data]
audio1 = self.onnx_session.run( audio1 = (
self.onnx_session.run(
["audio"], ["audio"],
{ {
"c": c, "c": c,
@ -266,9 +331,10 @@ class SoVitsSvc40v2:
"uv": np.array([self.settings.dstId]).astype(np.int64), "uv": np.array([self.settings.dstId]).astype(np.int64),
"predict_f0": np.array([self.settings.dstId]).astype(np.int64), "predict_f0": np.array([self.settings.dstId]).astype(np.int64),
"noice_scale": np.array([self.settings.dstId]).astype(np.int64), "noice_scale": np.array([self.settings.dstId]).astype(np.int64),
},
)[0][0, 0]
})[0][0, 0] * self.hps.data.max_wav_value * self.hps.data.max_wav_value
)
audio1 = audio1 * vol audio1 = audio1 * vol
@ -277,7 +343,7 @@ class SoVitsSvc40v2:
return result return result
def _pyTorch_inference(self, data): def _pyTorch_inference(self, data):
if hasattr(self, "net_g") == False or self.net_g == None: if hasattr(self, "net_g") is False or self.net_g is None:
print("[Voice Changer] No pyTorch session.") print("[Voice Changer] No pyTorch session.")
raise NoModeLoadedException("pytorch") raise NoModeLoadedException("pytorch")
@ -288,19 +354,29 @@ class SoVitsSvc40v2:
convertSize = data[3] convertSize = data[3]
vol = data[4] vol = data[4]
data = (data[0], data[1], data[2],) data = (
data[0],
data[1],
data[2],
)
if vol < self.settings.silentThreshold: if vol < self.settings.silentThreshold:
return np.zeros(convertSize).astype(np.int16) return np.zeros(convertSize).astype(np.int16)
with torch.no_grad(): with torch.no_grad():
c, f0, uv = [x.to(dev)for x in data] c, f0, uv = [x.to(dev) for x in data]
sid_target = torch.LongTensor([self.settings.dstId]).to(dev) sid_target = torch.LongTensor([self.settings.dstId]).to(dev)
self.net_g.to(dev) self.net_g.to(dev)
# audio1 = self.net_g.infer(c, f0=f0, g=sid_target, uv=uv, predict_f0=True, noice_scale=0.1)[0][0, 0].data.float() # audio1 = self.net_g.infer(c, f0=f0, g=sid_target, uv=uv, predict_f0=True, noice_scale=0.1)[0][0, 0].data.float()
predict_f0_flag = True if self.settings.predictF0 == 1 else False predict_f0_flag = True if self.settings.predictF0 == 1 else False
audio1 = self.net_g.infer(c, f0=f0, g=sid_target, uv=uv, predict_f0=predict_f0_flag, audio1 = self.net_g.infer(
noice_scale=self.settings.noiseScale)[0][0, 0].data.float() c,
f0=f0,
g=sid_target,
uv=uv,
predict_f0=predict_f0_flag,
noice_scale=self.settings.noiseScale,
)[0][0, 0].data.float()
audio1 = audio1 * self.hps.data.max_wav_value audio1 = audio1 * self.hps.data.max_wav_value
audio1 = audio1 * vol audio1 = audio1 * vol
@ -322,7 +398,7 @@ class SoVitsSvc40v2:
del self.onnx_session del self.onnx_session
remove_path = os.path.join("so-vits-svc-40v2") remove_path = os.path.join("so-vits-svc-40v2")
sys.path = [x for x in sys.path if x.endswith(remove_path) == False] sys.path = [x for x in sys.path if x.endswith(remove_path) is False]
for key in list(sys.modules): for key in list(sys.modules):
val = sys.modules.get(key) val = sys.modules.get(key)
@ -331,14 +407,18 @@ class SoVitsSvc40v2:
if file_path.find("so-vits-svc-40v2" + os.path.sep) >= 0: if file_path.find("so-vits-svc-40v2" + os.path.sep) >= 0:
print("remove", key, file_path) print("remove", key, file_path)
sys.modules.pop(key) sys.modules.pop(key)
except Exception as e: except: # type:ignore
pass pass
def resize_f0(x, target_len): def resize_f0(x, target_len):
source = np.array(x) source = np.array(x)
source[source < 0.001] = np.nan source[source < 0.001] = np.nan
target = np.interp(np.arange(0, len(source) * target_len, len(source)) / target_len, np.arange(0, len(source)), source) target = np.interp(
np.arange(0, len(source) * target_len, len(source)) / target_len,
np.arange(0, len(source)),
source,
)
res = np.nan_to_num(target) res = np.nan_to_num(target)
return res return res
@ -361,7 +441,13 @@ def compute_f0_dio(wav_numpy, p_len=None, sampling_rate=44100, hop_length=512):
def compute_f0_harvest(wav_numpy, p_len=None, sampling_rate=44100, hop_length=512): def compute_f0_harvest(wav_numpy, p_len=None, sampling_rate=44100, hop_length=512):
if p_len is None: if p_len is None:
p_len = wav_numpy.shape[0] // hop_length p_len = wav_numpy.shape[0] // hop_length
f0, t = pw.harvest(wav_numpy.astype(np.double), fs=sampling_rate, frame_period=5.5, f0_floor=71.0, f0_ceil=1000.0) f0, t = pw.harvest(
wav_numpy.astype(np.double),
fs=sampling_rate,
frame_period=5.5,
f0_floor=71.0,
f0_ceil=1000.0,
)
for index, pitch in enumerate(f0): for index, pitch in enumerate(f0):
f0[index] = round(pitch, 1) f0[index] = round(pitch, 1)

View File

@@ -1,4 +1,4 @@
-from typing import Any, Callable, Optional, Protocol, TypeAlias, Union, cast
+from typing import Any, Union, cast
 from const import TMP_DIR, ModelType
 import torch
 import os
@@ -9,23 +9,26 @@ import resampy
 from voice_changer.IORecorder import IORecorder
-# from voice_changer.IOAnalyzer import IOAnalyzer
+from voice_changer.utils.LoadModelParams import LoadModelParams
 from voice_changer.utils.Timer import Timer
 from voice_changer.utils.VoiceChangerModel import VoiceChangerModel, AudioInOut
-import time
 from Exceptions import NoModeLoadedException, ONNXInputArgumentException
+from voice_changer.utils.VoiceChangerParams import VoiceChangerParams
 
-providers = ['OpenVINOExecutionProvider', "CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]
+providers = [
+    "OpenVINOExecutionProvider",
+    "CUDAExecutionProvider",
+    "DmlExecutionProvider",
+    "CPUExecutionProvider",
+]
 
 STREAM_INPUT_FILE = os.path.join(TMP_DIR, "in.wav")
 STREAM_OUTPUT_FILE = os.path.join(TMP_DIR, "out.wav")
-STREAM_ANALYZE_FILE_DIO = os.path.join(TMP_DIR, "analyze-dio.png")
-STREAM_ANALYZE_FILE_HARVEST = os.path.join(TMP_DIR, "analyze-harvest.png")
 
 @dataclass
-class VoiceChangerSettings():
+class VoiceChangerSettings:
     inputSampleRate: int = 48000  # 48000 or 24000
     crossFadeOffsetRate: float = 0.1
@ -41,35 +44,40 @@ class VoiceChangerSettings():
floatData: list[str] = field( floatData: list[str] = field(
default_factory=lambda: ["crossFadeOffsetRate", "crossFadeEndRate"] default_factory=lambda: ["crossFadeOffsetRate", "crossFadeEndRate"]
) )
strData: list[str] = field( strData: list[str] = field(default_factory=lambda: [])
default_factory=lambda: []
)
class VoiceChanger(): class VoiceChanger:
settings: VoiceChangerSettings settings: VoiceChangerSettings
voiceChanger: VoiceChangerModel voiceChanger: VoiceChangerModel
ioRecorder: IORecorder
sola_buffer: AudioInOut
def __init__(self, params): def __init__(self, params: VoiceChangerParams):
# 初期化 # 初期化
self.settings = VoiceChangerSettings() self.settings = VoiceChangerSettings()
self.onnx_session = None self.onnx_session = None
self.currentCrossFadeOffsetRate = 0 self.currentCrossFadeOffsetRate = 0.0
self.currentCrossFadeEndRate = 0 self.currentCrossFadeEndRate = 0.0
self.currentCrossFadeOverlapSize = 0 # setting self.currentCrossFadeOverlapSize = 0 # setting
self.crossfadeSize = 0 # calculated self.crossfadeSize = 0 # calculated
self.voiceChanger = None self.voiceChanger = None
self.modelType = None self.modelType: ModelType | None = None
self.params = params self.params = params
self.gpu_num = torch.cuda.device_count() self.gpu_num = torch.cuda.device_count()
self.prev_audio = np.zeros(4096) self.prev_audio = np.zeros(4096)
self.mps_enabled: bool = getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available() self.mps_enabled: bool = (
getattr(torch.backends, "mps", None) is not None
and torch.backends.mps.is_available()
)
print(f"VoiceChanger Initialized (GPU_NUM:{self.gpu_num}, mps_enabled:{self.mps_enabled})") print(
f"VoiceChanger Initialized (GPU_NUM:{self.gpu_num}, mps_enabled:{self.mps_enabled})"
)
def switchModelType(self, modelType: ModelType): def switchModelType(self, modelType: ModelType):
if hasattr(self, "voiceChanger") and self.voiceChanger != None: if hasattr(self, "voiceChanger") and self.voiceChanger is not None:
# return {"status": "ERROR", "msg": "vc is already selected. currently re-select is not implemented"} # return {"status": "ERROR", "msg": "vc is already selected. currently re-select is not implemented"}
del self.voiceChanger del self.voiceChanger
self.voiceChanger = None self.voiceChanger = None
@ -77,58 +85,49 @@ class VoiceChanger():
self.modelType = modelType self.modelType = modelType
if self.modelType == "MMVCv15": if self.modelType == "MMVCv15":
from voice_changer.MMVCv15.MMVCv15 import MMVCv15 from voice_changer.MMVCv15.MMVCv15 import MMVCv15
self.voiceChanger = MMVCv15() # type: ignore self.voiceChanger = MMVCv15() # type: ignore
elif self.modelType == "MMVCv13": elif self.modelType == "MMVCv13":
from voice_changer.MMVCv13.MMVCv13 import MMVCv13 from voice_changer.MMVCv13.MMVCv13 import MMVCv13
self.voiceChanger = MMVCv13() self.voiceChanger = MMVCv13()
elif self.modelType == "so-vits-svc-40v2": elif self.modelType == "so-vits-svc-40v2":
from voice_changer.SoVitsSvc40v2.SoVitsSvc40v2 import SoVitsSvc40v2 from voice_changer.SoVitsSvc40v2.SoVitsSvc40v2 import SoVitsSvc40v2
self.voiceChanger = SoVitsSvc40v2(self.params) self.voiceChanger = SoVitsSvc40v2(self.params)
elif self.modelType == "so-vits-svc-40" or self.modelType == "so-vits-svc-40_c": elif self.modelType == "so-vits-svc-40" or self.modelType == "so-vits-svc-40_c":
from voice_changer.SoVitsSvc40.SoVitsSvc40 import SoVitsSvc40 from voice_changer.SoVitsSvc40.SoVitsSvc40 import SoVitsSvc40
self.voiceChanger = SoVitsSvc40(self.params) self.voiceChanger = SoVitsSvc40(self.params)
elif self.modelType == "DDSP-SVC": elif self.modelType == "DDSP-SVC":
from voice_changer.DDSP_SVC.DDSP_SVC import DDSP_SVC from voice_changer.DDSP_SVC.DDSP_SVC import DDSP_SVC
self.voiceChanger = DDSP_SVC(self.params) self.voiceChanger = DDSP_SVC(self.params)
elif self.modelType == "RVC": elif self.modelType == "RVC":
from voice_changer.RVC.RVC import RVC from voice_changer.RVC.RVC import RVC
self.voiceChanger = RVC(self.params) self.voiceChanger = RVC(self.params)
else: else:
from voice_changer.MMVCv13.MMVCv13 import MMVCv13 from voice_changer.MMVCv13.MMVCv13 import MMVCv13
self.voiceChanger = MMVCv13() self.voiceChanger = MMVCv13()
return {"status": "OK", "msg": "vc is switched."} return {"status": "OK", "msg": "vc is switched."}
def getModelType(self): def getModelType(self):
if self.modelType != None: if self.modelType is not None:
return {"status": "OK", "vc": self.modelType} return {"status": "OK", "vc": self.modelType}
else: else:
return {"status": "OK", "vc": "none"} return {"status": "OK", "vc": "none"}
def loadModel( def loadModel(self, props: LoadModelParams):
self,
props,
):
try: try:
return self.voiceChanger.loadModel(props) return self.voiceChanger.loadModel(props)
except Exception as e: except Exception as e:
print(traceback.format_exc())
print("[Voice Changer] Model Load Error! Check your model is valid.", e) print("[Voice Changer] Model Load Error! Check your model is valid.", e)
return {"status": "NG"} return {"status": "NG"}
# try:
# if self.modelType == "MMVCv15" or self.modelType == "MMVCv13":
# return self.voiceChanger.loadModel(config, pyTorch_model_file, onnx_model_file)
# elif self.modelType == "so-vits-svc-40" or self.modelType == "so-vits-svc-40_c" or self.modelType == "so-vits-svc-40v2":
# return self.voiceChanger.loadModel(config, pyTorch_model_file, onnx_model_file, clusterTorchModel)
# elif self.modelType == "RVC":
# return self.voiceChanger.loadModel(slot, config, pyTorch_model_file, onnx_model_file, feature_file, index_file, is_half)
# else:
# return self.voiceChanger.loadModel(config, pyTorch_model_file, onnx_model_file, clusterTorchModel)
# except Exception as e:
# print("[Voice Changer] Model Load Error! Check your model is valid.", e)
# return {"status": "NG"}
def get_info(self): def get_info(self):
data = asdict(self.settings) data = asdict(self.settings)
if hasattr(self, "voiceChanger"): if hasattr(self, "voiceChanger"):
@ -143,7 +142,9 @@ class VoiceChanger():
if key == "recordIO" and val == 1: if key == "recordIO" and val == 1:
if hasattr(self, "ioRecorder"): if hasattr(self, "ioRecorder"):
self.ioRecorder.close() self.ioRecorder.close()
self.ioRecorder = IORecorder(STREAM_INPUT_FILE, STREAM_OUTPUT_FILE, self.settings.inputSampleRate) self.ioRecorder = IORecorder(
STREAM_INPUT_FILE, STREAM_OUTPUT_FILE, self.settings.inputSampleRate
)
if key == "recordIO" and val == 0: if key == "recordIO" and val == 0:
if hasattr(self, "ioRecorder"): if hasattr(self, "ioRecorder"):
self.ioRecorder.close() self.ioRecorder.close()
@ -152,14 +153,6 @@ class VoiceChanger():
if hasattr(self, "ioRecorder"): if hasattr(self, "ioRecorder"):
self.ioRecorder.close() self.ioRecorder.close()
# if hasattr(self, "ioAnalyzer") == False:
# self.ioAnalyzer = IOAnalyzer()
# try:
# self.ioAnalyzer.analyze(STREAM_INPUT_FILE, STREAM_ANALYZE_FILE_DIO, STREAM_ANALYZE_FILE_HARVEST, self.settings.inputSampleRate)
# except Exception as e:
# print("recordIO exception", e)
elif key in self.settings.floatData: elif key in self.settings.floatData:
setattr(self.settings, key, float(val)) setattr(self.settings, key, float(val))
elif key in self.settings.strData: elif key in self.settings.strData:
@ -167,19 +160,19 @@ class VoiceChanger():
else: else:
if hasattr(self, "voiceChanger"): if hasattr(self, "voiceChanger"):
ret = self.voiceChanger.update_settings(key, val) ret = self.voiceChanger.update_settings(key, val)
if ret == False: if ret is False:
print(f"{key} is not mutable variable or unknown variable!") print(f"{key} is not mutable variable or unknown variable!")
else: else:
print(f"voice changer is not initialized!") print("voice changer is not initialized!")
return self.get_info() return self.get_info()
def _generate_strength(self, crossfadeSize: int): def _generate_strength(self, crossfadeSize: int):
if (
if self.crossfadeSize != crossfadeSize or \ self.crossfadeSize != crossfadeSize
self.currentCrossFadeOffsetRate != self.settings.crossFadeOffsetRate or \ or self.currentCrossFadeOffsetRate != self.settings.crossFadeOffsetRate
self.currentCrossFadeEndRate != self.settings.crossFadeEndRate or \ or self.currentCrossFadeEndRate != self.settings.crossFadeEndRate
self.currentCrossFadeOverlapSize != self.settings.crossFadeOverlapSize: or self.currentCrossFadeOverlapSize != self.settings.crossFadeOverlapSize
):
self.crossfadeSize = crossfadeSize self.crossfadeSize = crossfadeSize
self.currentCrossFadeOffsetRate = self.settings.crossFadeOffsetRate self.currentCrossFadeOffsetRate = self.settings.crossFadeOffsetRate
self.currentCrossFadeEndRate = self.settings.crossFadeEndRate self.currentCrossFadeEndRate = self.settings.crossFadeEndRate
@@ -193,30 +186,54 @@ class VoiceChanger():
             np_prev_strength = np.cos(percent * 0.5 * np.pi) ** 2
             np_cur_strength = np.cos((1 - percent) * 0.5 * np.pi) ** 2
 
-            self.np_prev_strength = np.concatenate([np.ones(cf_offset), np_prev_strength,
-                                                    np.zeros(crossfadeSize - cf_offset - len(np_prev_strength))])
-            self.np_cur_strength = np.concatenate([np.zeros(cf_offset), np_cur_strength, np.ones(crossfadeSize - cf_offset - len(np_cur_strength))])
+            self.np_prev_strength = np.concatenate(
+                [
+                    np.ones(cf_offset),
+                    np_prev_strength,
+                    np.zeros(crossfadeSize - cf_offset - len(np_prev_strength)),
+                ]
+            )
+            self.np_cur_strength = np.concatenate(
+                [
+                    np.zeros(cf_offset),
+                    np_cur_strength,
+                    np.ones(crossfadeSize - cf_offset - len(np_cur_strength)),
+                ]
+            )
 
-            print(f"Generated Strengths: for prev:{self.np_prev_strength.shape}, for cur:{self.np_cur_strength.shape}")
+            print(
+                f"Generated Strengths: for prev:{self.np_prev_strength.shape}, for cur:{self.np_cur_strength.shape}"
+            )
 
             # ひとつ前の結果とサイズが変わるため、記録は消去する。
-            if hasattr(self, 'np_prev_audio1') == True:
+            if hasattr(self, "np_prev_audio1") is True:
                 delattr(self, "np_prev_audio1")
-            if hasattr(self, "sola_buffer"):
+            if hasattr(self, "sola_buffer") is True:
                 del self.sola_buffer
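Editor's note: the strengths generated above form an equal-power crossfade, so the weight given to the previous chunk and the weight given to the current chunk always sum to one across the overlap. A sketch with assumed crossfade parameters:

```python
import numpy as np

crossfade_size = 1024
cf_offset = int(crossfade_size * 0.1)   # assumed crossFadeOffsetRate = 0.1
cf_end = int(crossfade_size * 0.9)      # assumed crossFadeEndRate = 0.9
cf_range = cf_end - cf_offset

percent = np.arange(cf_range) / cf_range
np_prev_strength = np.cos(percent * 0.5 * np.pi) ** 2        # fades the previous chunk out
np_cur_strength = np.cos((1 - percent) * 0.5 * np.pi) ** 2   # fades the current chunk in

# cos^2(x) + sin^2(x) = 1, so the two strengths are complementary.
assert np.allclose(np_prev_strength + np_cur_strength, 1.0)
```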
# receivedData: tuple of short # receivedData: tuple of short
def on_request(self, receivedData: AudioInOut) -> tuple[AudioInOut, list[Union[int, float]]]: def on_request(
self, receivedData: AudioInOut
) -> tuple[AudioInOut, list[Union[int, float]]]:
return self.on_request_sola(receivedData) return self.on_request_sola(receivedData)
def on_request_sola(self, receivedData: AudioInOut) -> tuple[AudioInOut, list[Union[int, float]]]: def on_request_sola(
self, receivedData: AudioInOut
) -> tuple[AudioInOut, list[Union[int, float]]]:
try: try:
processing_sampling_rate = self.voiceChanger.get_processing_sampling_rate() processing_sampling_rate = self.voiceChanger.get_processing_sampling_rate()
# 前処理 # 前処理
with Timer("pre-process") as t: with Timer("pre-process") as t:
if self.settings.inputSampleRate != processing_sampling_rate: if self.settings.inputSampleRate != processing_sampling_rate:
newData = cast(AudioInOut, resampy.resample(receivedData, self.settings.inputSampleRate, processing_sampling_rate)) newData = cast(
AudioInOut,
resampy.resample(
receivedData,
self.settings.inputSampleRate,
processing_sampling_rate,
),
)
else: else:
newData = receivedData newData = receivedData
@ -226,7 +243,9 @@ class VoiceChanger():
crossfade_frame = min(self.settings.crossFadeOverlapSize, block_frame) crossfade_frame = min(self.settings.crossFadeOverlapSize, block_frame)
self._generate_strength(crossfade_frame) self._generate_strength(crossfade_frame)
data = self.voiceChanger.generate_input(newData, block_frame, crossfade_frame, sola_search_frame) data = self.voiceChanger.generate_input(
newData, block_frame, crossfade_frame, sola_search_frame
)
preprocess_time = t.secs preprocess_time = t.secs
# 変換処理 # 変換処理
@ -234,15 +253,31 @@ class VoiceChanger():
# Inference # Inference
audio = self.voiceChanger.inference(data) audio = self.voiceChanger.inference(data)
if hasattr(self, 'sola_buffer') == True: if hasattr(self, "sola_buffer") is True:
np.set_printoptions(threshold=10000) np.set_printoptions(threshold=10000)
audio = audio[-sola_search_frame - crossfade_frame - block_frame:] audio_offset = -1 * (
sola_search_frame + crossfade_frame + block_frame
)
audio = audio[audio_offset:]
a = 0
audio = audio[a:]
# SOLA algorithm from https://github.com/yxlllc/DDSP-SVC, https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI # SOLA algorithm from https://github.com/yxlllc/DDSP-SVC, https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI
cor_nom = np.convolve(audio[: crossfade_frame + sola_search_frame], np.flip(self.sola_buffer), 'valid') cor_nom = np.convolve(
cor_den = np.sqrt(np.convolve(audio[: crossfade_frame + sola_search_frame] ** 2, np.ones(crossfade_frame), 'valid') + 1e-3) audio[: crossfade_frame + sola_search_frame],
sola_offset = np.argmax(cor_nom / cor_den) np.flip(self.sola_buffer),
"valid",
output_wav = audio[sola_offset: sola_offset + block_frame].astype(np.float64) )
cor_den = np.sqrt(
np.convolve(
audio[: crossfade_frame + sola_search_frame] ** 2,
np.ones(crossfade_frame),
"valid",
)
+ 1e-3
)
sola_offset = int(np.argmax(cor_nom / cor_den))
sola_end = sola_offset + block_frame
output_wav = audio[sola_offset:sola_end].astype(np.float64)
output_wav[:crossfade_frame] *= self.np_cur_strength output_wav[:crossfade_frame] *= self.np_cur_strength
output_wav[:crossfade_frame] += self.sola_buffer[:] output_wav[:crossfade_frame] += self.sola_buffer[:]
@ -251,11 +286,16 @@ class VoiceChanger():
print("[Voice Changer] no sola buffer. (You can ignore this.)") print("[Voice Changer] no sola buffer. (You can ignore this.)")
result = np.zeros(4096).astype(np.int16) result = np.zeros(4096).astype(np.int16)
if hasattr(self, 'sola_buffer') == True and sola_offset < sola_search_frame: if (
sola_buf_org = audio[- sola_search_frame - crossfade_frame + sola_offset: -sola_search_frame + sola_offset] hasattr(self, "sola_buffer") is True
and sola_offset < sola_search_frame
):
offset = -1 * (sola_search_frame + crossfade_frame - sola_offset)
end = -1 * (sola_search_frame - sola_offset)
sola_buf_org = audio[offset:end]
self.sola_buffer = sola_buf_org * self.np_prev_strength self.sola_buffer = sola_buf_org * self.np_prev_strength
else: else:
self.sola_buffer = audio[- crossfade_frame:] * self.np_prev_strength self.sola_buffer = audio[-crossfade_frame:] * self.np_prev_strength
# self.sola_buffer = audio[- crossfade_frame:] # self.sola_buffer = audio[- crossfade_frame:]
mainprocess_time = t.secs mainprocess_time = t.secs
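Editor's note: the SOLA step above cross-correlates the head of the newly converted chunk with the tail of the previous output and splices at the offset with the highest normalised correlation. A hedged sketch with assumed frame sizes:

```python
import numpy as np

crossfade_frame, sola_search_frame, block_frame = 1024, 512, 4096
audio = np.random.randn(sola_search_frame + crossfade_frame + block_frame)  # converted chunk
sola_buffer = np.random.randn(crossfade_frame)                              # tail of previous output

head = audio[: crossfade_frame + sola_search_frame]
cor_nom = np.convolve(head, np.flip(sola_buffer), "valid")                   # correlation numerator
cor_den = np.sqrt(np.convolve(head**2, np.ones(crossfade_frame), "valid") + 1e-3)
sola_offset = int(np.argmax(cor_nom / cor_den))                              # best splice point

output_wav = audio[sola_offset : sola_offset + block_frame].astype(np.float64)
# output_wav[:crossfade_frame] is then weighted by np_cur_strength and summed
# with the stored sola_buffer, as in the code above.
```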
@@ -263,12 +303,20 @@ class VoiceChanger():
        with Timer("post-process") as t:
            result = result.astype(np.int16)
            if self.settings.inputSampleRate != processing_sampling_rate:
                outputData = cast(
                    AudioInOut,
                    resampy.resample(
                        result,
                        processing_sampling_rate,
                        self.settings.inputSampleRate,
                    ).astype(np.int16),
                )
            else:
                outputData = result

            print_convert_processing(
                f" Output data size of {result.shape[0]}/{processing_sampling_rate}hz {outputData.shape[0]}/{self.settings.inputSampleRate}hz"
            )

            if self.settings.recordIO == 1:
                self.ioRecorder.writeInput(receivedData)
@@ -281,7 +329,9 @@ class VoiceChanger():
            # # f" Padded!, Output data size of {result.shape[0]}/{processing_sampling_rate}hz {outputData.shape[0]}/{self.settings.inputSampleRate}hz")
        postprocess_time = t.secs

        print_convert_processing(
            f" [fin] Input/Output size:{receivedData.shape[0]},{outputData.shape[0]}"
        )
        perf = [preprocess_time, mainprocess_time, postprocess_time]
        return outputData, perf
@@ -299,14 +349,15 @@ class VoiceChanger():
    def export2onnx(self):
        return self.voiceChanger.export2onnx()


##############
PRINT_CONVERT_PROCESSING: bool = False
# PRINT_CONVERT_PROCESSING = True


def print_convert_processing(mess: str):
    if PRINT_CONVERT_PROCESSING is True:
        print(mess)
@@ -318,5 +369,7 @@ def pad_array(arr: AudioInOut, target_length: int):
    pad_width = target_length - current_length
    pad_left = pad_width // 2
    pad_right = pad_width - pad_left
    padded_arr = np.pad(
        arr, (pad_left, pad_right), "constant", constant_values=(0, 0)
    )
    return padded_arr
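
`pad_array` centers a short input inside a zero-padded buffer of the requested length. A small worked example with hypothetical sizes:

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int16)
# With target_length=8: pad_width = 5, pad_left = 5 // 2 = 2, pad_right = 5 - 2 = 3
padded = np.pad(arr, (2, 3), "constant", constant_values=(0, 0))
# padded -> array([0, 0, 1, 2, 3, 0, 0, 0], dtype=int16)
```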

View File

@@ -1,17 +1,23 @@
import numpy as np

from voice_changer.VoiceChanger import VoiceChanger
from const import ModelType
from voice_changer.utils.LoadModelParams import LoadModelParams
from voice_changer.utils.VoiceChangerModel import AudioInOut
from voice_changer.utils.VoiceChangerParams import VoiceChangerParams


class VoiceChangerManager(object):
    _instance = None
    voiceChanger: VoiceChanger = None

    @classmethod
    def get_instance(cls, params: VoiceChangerParams):
        if cls._instance is None:
            cls._instance = cls()
            cls._instance.voiceChanger = VoiceChanger(params)
        return cls._instance

    def loadModel(self, props: LoadModelParams):
        info = self.voiceChanger.loadModel(props)
        if hasattr(info, "status") and info["status"] == "NG":
            return info
@@ -20,23 +26,23 @@ class VoiceChangerManager():
        return info

    def get_info(self):
        if hasattr(self, "voiceChanger"):
            info = self.voiceChanger.get_info()
            info["status"] = "OK"
            return info
        else:
            return {"status": "ERROR", "msg": "no model loaded"}

    def update_settings(self, key: str, val: str | int | float):
        if hasattr(self, "voiceChanger"):
            info = self.voiceChanger.update_settings(key, val)
            info["status"] = "OK"
            return info
        else:
            return {"status": "ERROR", "msg": "no model loaded"}

    def changeVoice(self, receivedData: AudioInOut):
        if hasattr(self, "voiceChanger") is True:
            return self.voiceChanger.on_request(receivedData)
        else:
            print("Voice Change is not loaded. Did you load a correct model?")

View File

@@ -0,0 +1,19 @@
from dataclasses import dataclass


@dataclass
class FilePaths:
    configFilename: str | None
    pyTorchModelFilename: str | None
    onnxModelFilename: str | None
    clusterTorchModelFilename: str | None
    featureFilename: str | None
    indexFilename: str | None


@dataclass
class LoadModelParams:
    slot: int
    isHalf: bool
    files: FilePaths
    params: str
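
A hypothetical call site showing how these two dataclasses bundle a model-slot upload; every file name below is a placeholder.

```python
from voice_changer.utils.LoadModelParams import FilePaths, LoadModelParams

files = FilePaths(
    configFilename=None,
    pyTorchModelFilename="upload_dir/example_model.pth",  # placeholder path
    onnxModelFilename=None,
    clusterTorchModelFilename=None,
    featureFilename=None,
    indexFilename=None,
)
props = LoadModelParams(slot=0, isHalf=True, files=files, params="{}")
# `props` would then be handed to VoiceChangerManager.loadModel(props).
```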

View File

@@ -1,14 +1,30 @@
from typing import Any, Protocol, TypeAlias

import numpy as np

from voice_changer.utils.LoadModelParams import LoadModelParams

AudioInOut: TypeAlias = np.ndarray[Any, np.dtype[np.int16]]


class VoiceChangerModel(Protocol):
    # loadModel: Callable[..., dict[str, Any]]
    def loadModel(self, params: LoadModelParams):
        ...

    def get_processing_sampling_rate(self) -> int:
        ...

    def get_info(self) -> dict[str, Any]:
        ...

    def inference(self, data: tuple[Any, ...]) -> Any:
        ...

    def generate_input(
        self, newData: AudioInOut, inputSize: int, crossfadeSize: int
    ) -> tuple[Any, ...]:
        ...

    def update_settings(self, key: str, val: Any) -> bool:
        ...
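
Because `VoiceChangerModel` is a `typing.Protocol`, backends are accepted structurally: any class that provides these methods type-checks against it, without inheriting from anything. A deliberately trivial pass-through model, purely as an illustration of the required surface:

```python
from typing import Any

from voice_changer.utils.LoadModelParams import LoadModelParams
from voice_changer.utils.VoiceChangerModel import AudioInOut, VoiceChangerModel


class PassthroughModel:
    """Dummy backend that returns its input block unchanged."""

    def loadModel(self, params: LoadModelParams):
        return {"status": "OK"}

    def get_processing_sampling_rate(self) -> int:
        return 48000

    def get_info(self) -> dict[str, Any]:
        return {"modelType": "passthrough"}

    def inference(self, data: tuple[Any, ...]) -> Any:
        (audio,) = data
        return audio

    def generate_input(
        self, newData: AudioInOut, inputSize: int, crossfadeSize: int
    ) -> tuple[Any, ...]:
        return (newData,)

    def update_settings(self, key: str, val: Any) -> bool:
        return True


model: VoiceChangerModel = PassthroughModel()  # satisfies the protocol structurally
```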

View File

@@ -0,0 +1,11 @@
from dataclasses import dataclass


@dataclass
class VoiceChangerParams():
    content_vec_500: str
    content_vec_500_onnx: str
    content_vec_500_onnx_on: bool
    hubert_base: str
    hubert_soft: str
    nsf_hifigan: str

View File

@@ -1,7 +1,7 @@
#!/bin/bash
set -eu

-DOCKER_IMAGE=dannadori/vcclient:20230420_003000
+DOCKER_IMAGE=dannadori/vcclient:20230428_190513
#DOCKER_IMAGE=vcclient

### DEFAULT VAR ###