mirror of
https://github.com/w-okada/voice-changer.git
synced 2025-02-02 16:23:58 +03:00
update readme
This commit is contained in:
parent
7d261a14c2
commit
1ad44bf98f
@ -77,7 +77,7 @@ We offer Windows and Mac versions.
|
|||||||
|
|
||||||
- The encoder of DDSP-SVC only supports hubert-soft.
|
- The encoder of DDSP-SVC only supports hubert-soft.
|
||||||
|
|
||||||
- Please refer to [here](tutorials/tutorial_rvc_ja.md) for the description of each item of GUI to be used in RVC.
|
- Please refer to [here](tutorials/tutorial_rvc_en.md) for the description of each item of GUI to be used in RVC.
|
||||||
|
|
||||||
- Download (When you cannot download from google drive, try [hugging_face](https://huggingface.co/wok000/vcclient/tree/main))
|
- Download (When you cannot download from google drive, try [hugging_face](https://huggingface.co/wok000/vcclient/tree/main))
|
||||||
|
|
||||||
|
@ -1,119 +1,146 @@
|
|||||||
Realtime Voice Changer Client for RVC Tutorial (v.1.5.2.4)
|
# Realtime Voice Changer Client for RVC Tutorial (v.1.5.2.4)
|
||||||
================================================================
|
|
||||||
|
|
||||||
# Introduction
|
# Introduction
|
||||||
|
|
||||||
This application is client software for real-time voice conversion that supports various voice conversion models. This document provides a description for voice conversion limited to [RVC(Retrieval-based-Voice-Conversion)](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI).
|
This application is client software for real-time voice conversion that supports various voice conversion models. This document provides a description for voice conversion limited to [RVC(Retrieval-based-Voice-Conversion)](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI).
|
||||||
|
|
||||||
## Notes
|
## Notes
|
||||||
|
|
||||||
- Model training must be done separately.
|
- Model training must be done separately.
|
||||||
- If you want to train a model yourself, please go to [RVC(Retrieval-based-Voice-Conversion)](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI).
|
- If you want to train a model yourself, please go to [RVC(Retrieval-based-Voice-Conversion)](https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI).
|
||||||
- The [Recording app on Github Pages](https://w-okada.github.io/voice-changer/) is convenient for preparing training audio in the browser.
|
- The [Recording app on Github Pages](https://w-okada.github.io/voice-changer/) is convenient for preparing training audio in the browser.
|
||||||
- [Commentary video](https://youtu.be/s_GirFEGvaA)
|
- [Commentary video](https://youtu.be/s_GirFEGvaA)
|
||||||
- [TIPS for training](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/docs/training_tips_en.md) has been published, so please refer to it.
|
- [TIPS for training](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/docs/training_tips_en.md) has been published, so please refer to it.
|
||||||
|
|
||||||
# Steps up to startup
|
|
||||||
## Installing HuBERT
|
|
||||||
HuBERT is required to run RVC.
|
|
||||||
Download `hubert_base.pt` from [this repository](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main) and store it in the folder containing the batch file.
|
|
||||||
|
|
||||||
# Steps up to startup
|
# Steps up to startup
|
||||||
|
|
||||||
## Installing HuBERT
|
## Installing HuBERT
|
||||||
|
|
||||||
HuBERT is required to run RVC.
|
HuBERT is required to run RVC.
|
||||||
Download `hubert_base.pt` from [this repository](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main) and store it in the folder containing the batch file.
|
Download `hubert_base.pt` from [this repository](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main) and store it in the folder containing the batch file.
|
||||||
|
|
||||||
## Start GUI
|
## Start GUI
|
||||||
|
|
||||||
### Windows version
|
### Windows version
|
||||||
|
|
||||||
Unzip the downloaded zip file and run `start_http.bat`.
|
Unzip the downloaded zip file and run `start_http.bat`.
|
||||||
|
|
||||||
### Mac version
|
### Mac version
|
||||||
|
|
||||||
After extracting the downloaded file, execute `startHttp.command`. If a message says that the developer cannot be verified, hold down the Control key and click to execute it again (or right-click to execute).
|
After extracting the downloaded file, execute `startHttp.command`. If a message says that the developer cannot be verified, hold down the Control key and click to execute it again (or right-click to execute).
|
||||||
|
|
||||||
### Precautions when connecting remotely
|
### Precautions when connecting remotely
|
||||||
|
|
||||||
When connecting remotely, please use the `.bat` file (Windows) or `.command` file (Mac) whose name has http replaced with https.
|
When connecting remotely, please use the `.bat` file (Windows) or `.command` file (Mac) whose name has http replaced with https.
|
||||||
|
|
||||||
## client selection
|
## client selection
|
||||||
|
|
||||||
Startup has succeeded if a Launcher screen like the one below appears. Select RVC from this screen.
|
Startup has succeeded if a Launcher screen like the one below appears. Select RVC from this screen.
|
||||||
|
|
||||||
<img src="/tutorials/images/launcher.png" alt="launcher" width="800" loading="lazy">
|
<img src="/tutorials/images/launcher.png" alt="launcher" width="800" loading="lazy">
|
||||||
|
|
||||||
## Screen for RVC
|
## Screen for RVC
|
||||||
|
|
||||||
It is successful if the following screen appears.
|
It is successful if the following screen appears.
|
||||||
|
|
||||||
<img src="/tutorials/images/RVC_GUI.png" alt="launcher" width="800" loading="lazy">
|
<img src="/tutorials/images/RVC_GUI.png" alt="launcher" width="800" loading="lazy">
|
||||||
|
|
||||||
## GUI item details
|
## GUI item details
|
||||||
|
|
||||||
## server control
|
## server control
|
||||||
|
|
||||||
### start
|
### start
|
||||||
|
|
||||||
`start` starts the server, `stop` stops the server
|
`start` starts the server, `stop` stops the server
|
||||||
|
|
||||||
### monitor
|
### monitor
|
||||||
|
|
||||||
Indicates the status of real-time conversion.
|
Indicates the status of real-time conversion.
|
||||||
|
|
||||||
The lag from voicing to conversion is `buf + res seconds`. Adjust so that the buf time is longer than res.
|
The lag from voicing to conversion is `buf + res seconds`. Adjust so that the buf time is longer than res.
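
As a rough illustration of the rule above, the following sketch adds the two monitor values and checks that buf is longer than res. The function names and the sample values are hypothetical, both values are assumed to be in milliseconds, and the reading that conversion must finish before the next chunk is ready is an interpretation, not something the application documents.

```python
def total_lag_ms(buf_ms: float, res_ms: float) -> float:
    """Approximate delay from speaking to hearing the converted voice."""
    return buf_ms + res_ms


def keeps_up(buf_ms: float, res_ms: float) -> bool:
    """True when buf is longer than res, as the tutorial recommends.

    Assumed interpretation: converting one chunk then finishes
    before the next chunk has been collected.
    """
    return res_ms < buf_ms


# Hypothetical monitor readings, in milliseconds.
print(total_lag_ms(256.0, 180.0))  # 436.0
print(keeps_up(256.0, 180.0))      # True
```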
|
||||||
|
|
||||||
#### vol
|
#### vol
|
||||||
|
|
||||||
This is the volume after voice conversion.
|
This is the volume after voice conversion.
|
||||||
|
|
||||||
#### buf
|
#### buf
|
||||||
|
|
||||||
The length (ms) of each segment the audio is cut into. Shortening Input Chunk reduces this number.
|
The length (ms) of each segment the audio is cut into. Shortening Input Chunk reduces this number.
|
||||||
|
|
||||||
#### res
|
#### res
|
||||||
|
|
||||||
The time it takes to convert one block of data whose length is the sum of Input Chunk and Extra Data Length. Shortening Input Chunk and Extra Data Length reduces this number.
|
The time it takes to convert one block of data whose length is the sum of Input Chunk and Extra Data Length. Shortening Input Chunk and Extra Data Length reduces this number.
|
||||||
|
|
||||||
### Model Info
|
### Model Info
|
||||||
|
|
||||||
Shows information held by the server. If the server and client appear to be out of sync, press the Reload button.
|
Shows information held by the server. If the server and client appear to be out of sync, press the Reload button.
|
||||||
|
|
||||||
### Switch Model
|
### Switch Model
|
||||||
|
|
||||||
You can switch between uploaded models.
|
You can switch between uploaded models.
|
||||||
|
|
||||||
## Model Setting
|
## Model Setting
|
||||||
|
|
||||||
### Model Uploader
|
### Model Uploader
|
||||||
|
|
||||||
If enable PyTorch is turned on, you can select a PyTorch model (extension `.pth`). If you turn this on when using a model converted from RVC, the PyTorch item will appear. (From the next version, you can only choose either PyTorch or ONNX for each slot.)
|
If enable PyTorch is turned on, you can select a PyTorch model (extension `.pth`). If you turn this on when using a model converted from RVC, the PyTorch item will appear. (From the next version, you can only choose either PyTorch or ONNX for each slot.)
|
||||||
|
|
||||||
#### Model Slot
|
#### Model Slot
|
||||||
|
|
||||||
You can choose which slot to load the model into. The loaded model can be switched with Switch Model in Server Control.
|
You can choose which slot to load the model into. The loaded model can be switched with Switch Model in Server Control.
|
||||||
|
|
||||||
#### Onnx(.onnx)
|
#### Onnx(.onnx)
|
||||||
|
|
||||||
Specify the model in .onnx format here. This or PyTorch (.pth) is required.
|
Specify the model in .onnx format here. This or PyTorch (.pth) is required.
|
||||||
|
|
||||||
#### PyTorch(.pth)
|
#### PyTorch(.pth)
|
||||||
|
|
||||||
Specify the model in .pth format here. This or Onnx (.onnx) is required.
|
Specify the model in .pth format here. This or Onnx (.onnx) is required.
|
||||||
If you trained with RVC-WebUI, it's in `/logs/weights`.
|
If you trained with RVC-WebUI, it's in `/logs/weights`.
|
||||||
|
|
||||||
#### feature(.npy)
|
#### feature(.npy)
|
||||||
|
|
||||||
This is an additional function that brings the features extracted by HuBERT closer to the training data. Used in pairs with index(.index).
|
This is an additional function that brings the features extracted by HuBERT closer to the training data. Used in pairs with index(.index).
|
||||||
If you trained with RVC-WebUI, it is saved as `/logs/weights/total_fea.npy`.
|
If you trained with RVC-WebUI, it is saved as `/logs/weights/total_fea.npy`.
|
||||||
|
|
||||||
#### index(.index)
|
#### index(.index)
|
||||||
|
|
||||||
This is an additional function that brings the features extracted by HuBERT closer to the training data. Used in pairs with feature(.npy).
|
This is an additional function that brings the features extracted by HuBERT closer to the training data. Used in pairs with feature(.npy).
|
||||||
If you trained with RVC-WebUI, it is saved as `/logs/weights/add_XXX.index`.
|
If you trained with RVC-WebUI, it is saved as `/logs/weights/add_XXX.index`.
|
||||||
|
|
||||||
#### half-precision
|
#### half-precision
|
||||||
|
|
||||||
Choose whether inference runs in float32 or float16.
|
Choose whether inference runs in float32 or float16.
|
||||||
Turning this on (float16) can speed up inference at the expense of accuracy.
|
Turning this on (float16) can speed up inference at the expense of accuracy.
|
||||||
Turn it off if it doesn't work.
|
Turn it off if it doesn't work.
|
||||||
|
|
||||||
#### Default Tune
|
#### Default Tune
|
||||||
|
|
||||||
Enter the default amount by which to shift the pitch of the voice. You can also change it during inference. Below is a guideline for the settings.
|
Enter the default amount by which to shift the pitch of the voice. You can also change it during inference. Below is a guideline for the settings.
|
||||||
|
|
||||||
- +12 for male voice to female voice conversion
|
- +12 for male voice to female voice conversion
|
||||||
- -12 for female voice to male voice conversion
|
- -12 for female voice to male voice conversion
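
To make the ±12 guideline concrete, here is a minimal sketch that assumes the tune value is counted in semitones, as is common in RVC tools; the tutorial itself does not state the unit. Under that assumption, 12 steps correspond to one octave, which doubles or halves the fundamental frequency. The function name is hypothetical.

```python
def pitch_ratio(semitones: float) -> float:
    """Frequency multiplier for a shift of `semitones` (assumed unit)."""
    return 2.0 ** (semitones / 12.0)


print(pitch_ratio(12))   # 2.0, one octave up
print(pitch_ratio(-12))  # 0.5, one octave down
```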
|
||||||
|
|
||||||
#### upload
|
#### upload
|
||||||
|
|
||||||
After setting the above items, press this button to upload the model and make it ready for use.
|
After setting the above items, press this button to upload the model and make it ready for use.
|
||||||
|
|
||||||
#### Framework
|
#### Framework
|
||||||
|
|
||||||
Choose which of the uploaded model files to use (PyTorch or ONNX). This option will be removed in the next version.
|
Choose which of the uploaded model files to use (PyTorch or ONNX). This option will be removed in the next version.
|
||||||
|
|
||||||
## Device Setting
|
## Device Setting
|
||||||
|
|
||||||
### Audio Input
|
### Audio Input
|
||||||
|
|
||||||
Choose an input device
|
Choose an input device
|
||||||
|
|
||||||
### Audio Output
|
### Audio Output
|
||||||
|
|
||||||
Choose an output device
|
Choose an output device
|
||||||
|
|
||||||
#### output record
|
#### output record
|
||||||
|
|
||||||
Audio is recorded from when you press start until you press stop.
|
Audio is recorded from when you press start until you press stop.
|
||||||
Pressing this button does not start real-time conversion.
|
Pressing this button does not start real-time conversion.
|
||||||
Use start in Server Control to begin real-time conversion.
|
Use start in Server Control to begin real-time conversion.
|
||||||
@ -121,50 +148,62 @@ Press Server Control for real-time conversion
|
|||||||
## Quality Control
|
## Quality Control
|
||||||
|
|
||||||
### Noise Supression
|
### Noise Supression
|
||||||
|
|
||||||
Turns the browser's built-in noise removal function on or off.
|
Turns the browser's built-in noise removal function on or off.
|
||||||
|
|
||||||
### Gain Control
|
### Gain Control
|
||||||
|
|
||||||
- input: Increase or decrease the volume of the input audio to the model. 1 is the default value
|
- input: Increase or decrease the volume of the input audio to the model. 1 is the default value
|
||||||
- output: Increase or decrease the volume of the output audio from the model. 1 is the default value
|
- output: Increase or decrease the volume of the output audio from the model. 1 is the default value
|
||||||
|
|
||||||
### F0Detector
|
### F0Detector
|
||||||
|
|
||||||
Choose an algorithm for extracting the pitch. You can choose from the following two types.
|
Choose an algorithm for extracting the pitch. You can choose from the following two types.
|
||||||
|
|
||||||
- Lightweight `pm`
|
- Lightweight `pm`
|
||||||
- Highly accurate `harvest`
|
- Highly accurate `harvest`
|
||||||
|
|
||||||
### Analyzer(Experimental)
|
### Analyzer(Experimental)
|
||||||
|
|
||||||
Record input and output on the server side.
|
Record input and output on the server side.
|
||||||
As for the input, the sound of the microphone is sent to the server and recorded as it is. It can be used to check the communication path from the microphone to the server.
|
As for the input, the sound of the microphone is sent to the server and recorded as it is. It can be used to check the communication path from the microphone to the server.
|
||||||
For output, the data output from the model is recorded in the server. You can see how the model behaves (once you've verified that your input is correct).
|
For output, the data output from the model is recorded in the server. You can see how the model behaves (once you've verified that your input is correct).
|
||||||
|
|
||||||
|
|
||||||
## Speaker Setting
|
## Speaker Setting
|
||||||
|
|
||||||
### Destination Speaker Id
|
### Destination Speaker Id
|
||||||
|
|
||||||
This appears to be a setting for models that support multiple speakers, but it is not used at present because the upstream RVC project does not support multiple speakers (and seems unlikely to).
|
This appears to be a setting for models that support multiple speakers, but it is not used at present because the upstream RVC project does not support multiple speakers (and seems unlikely to).
|
||||||
|
|
||||||
### Tuning
|
### Tuning
|
||||||
|
|
||||||
Adjust the pitch of your voice. Below is a guideline for the settings.
|
Adjust the pitch of your voice. Below is a guideline for the settings.
|
||||||
|
|
||||||
- +12 for male voice to female voice conversion
|
- +12 for male voice to female voice conversion
|
||||||
- -12 for female voice to male voice conversion
|
- -12 for female voice to male voice conversion
|
||||||
|
|
||||||
### index ratio
|
### index ratio
|
||||||
|
|
||||||
Specify how far to shift the features toward those used in training. This takes effect only when both feature and index are set in Model Setting.
|
Specify how far to shift the features toward those used in training. This takes effect only when both feature and index are set in Model Setting.
|
||||||
0 uses the output of HuBERT as it is; 1 replaces it entirely with the features from training.
|
0 uses the output of HuBERT as it is; 1 replaces it entirely with the features from training.
|
||||||
If the index ratio is greater than 0, the search may take a long time.
|
If the index ratio is greater than 0, the search may take a long time.
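
One way to picture the index ratio is as a linear blend between the HuBERT output and the matching training features. The sketch below assumes exactly that; the actual blending formula is not given in this tutorial, the function and argument names are hypothetical, and the index search that finds the training features is not shown.

```python
def blend_features(hubert_feat, retrieved_feat, index_ratio):
    """Blend HuBERT features with training features found via the index.

    Assumed linear interpolation: 0 keeps the HuBERT output,
    1 uses only the retrieved training features.
    """
    if not 0.0 <= index_ratio <= 1.0:
        raise ValueError("index_ratio must be between 0 and 1")
    return [
        (1.0 - index_ratio) * h + index_ratio * r
        for h, r in zip(hubert_feat, retrieved_feat)
    ]


print(blend_features([1.0, 2.0], [3.0, 4.0], 0.5))  # [2.0, 3.0]
```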
|
||||||
|
|
||||||
### Silent Threshold
|
### Silent Threshold
|
||||||
|
|
||||||
The volume threshold for audio conversion. If the rms is smaller than this value, no voice conversion is performed and silence is returned.
|
The volume threshold for audio conversion. If the rms is smaller than this value, no voice conversion is performed and silence is returned.
|
||||||
(In this case, the conversion process is skipped, so the load is less.)
|
(In this case, the conversion process is skipped, so the load is less.)
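
The gating described above can be sketched as follows. This is an illustration only: `process_chunk` and the `convert` callback are hypothetical names standing in for the real conversion pipeline, and samples are assumed to be floats in the range -1 to 1.

```python
import math


def rms(samples):
    """Root mean square of a chunk of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def process_chunk(samples, silent_threshold, convert):
    """Return silence for quiet chunks, otherwise run the converter."""
    if rms(samples) < silent_threshold:
        # Conversion is skipped entirely, which is what saves load.
        return [0.0] * len(samples)
    return convert(samples)


quiet = [0.0001, -0.0001, 0.0001, -0.0001]
print(process_chunk(quiet, 0.01, lambda chunk: chunk))  # [0.0, 0.0, 0.0, 0.0]
```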
|
||||||
|
|
||||||
## Converter Setting
|
## Converter Setting
|
||||||
|
|
||||||
### InputChunk Num(128sample / chunk)
|
### InputChunk Num(128sample / chunk)
|
||||||
|
|
||||||
Decide how much audio to cut out and convert in one conversion. The higher the value, the more efficient the conversion, but buf also grows, which lengthens the maximum wait before conversion starts. The approximate time is displayed in buf.
|
Decide how much audio to cut out and convert in one conversion. The higher the value, the more efficient the conversion, but buf also grows, which lengthens the maximum wait before conversion starts. The approximate time is displayed in buf.
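
For a feel of how the chunk count maps to time, the sketch below converts a number of 128-sample chunks into milliseconds. The 128 samples per chunk come from the heading; the 48000 Hz sample rate and the function name are assumptions made for illustration, so the value shown in the GUI may differ.

```python
SAMPLES_PER_CHUNK = 128  # from the "128sample / chunk" heading


def chunk_duration_ms(num_chunks: int, sample_rate_hz: int = 48000) -> float:
    """Length of audio covered by `num_chunks` chunks, in milliseconds.

    The default sample rate is an assumption, not a documented value.
    """
    return num_chunks * SAMPLES_PER_CHUNK * 1000.0 / sample_rate_hz


print(chunk_duration_ms(96))  # 256.0
```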
|
||||||
|
|
||||||
### Extra Data Length
|
### Extra Data Length
|
||||||
|
|
||||||
Determines how much past audio to include in the input when converting audio. The more past audio is included, the better the accuracy of the conversion, but res grows because the calculation takes longer.
|
Determines how much past audio to include in the input when converting audio. The more past audio is included, the better the accuracy of the conversion, but res grows because the calculation takes longer.
|
||||||
(Probably because the Transformer is the bottleneck, the calculation time grows with the square of this length.)
|
(Probably because the Transformer is the bottleneck, the calculation time grows with the square of this length.)
|
||||||
|
|
||||||
### GPU
|
### GPU
|
||||||
|
|
||||||
If you have 2 or more GPUs, you can choose your GPU here.
|
If you have 2 or more GPUs, you can choose your GPU here.
|
||||||
|