sensevoice

2024-07-16 14:45:02 +08:00
commit 21d3cf9f23
parent a1ffeaf0b7
1 changed file with 59 additions and 44 deletions
--- a/README.md
+++ b/README.md
@@ -80,10 +80,67 @@ rec_result = inference_pipeline('https://isv-data.oss-cn-hangzhou.aliyuncs.com/i
 print(rec_result)
 ```
 ### Inference with funasr
 Supports audio input in any format and of any duration.
 ```python
 from funasr import AutoModel
 from funasr.utils.postprocess_utils import rich_transcription_postprocess
 model_dir = "iic/SenseVoiceSmall"
 model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cpu",
 )
 # en
 res = model.generate(
    input=f"{model.model_path}/example/en.mp3",
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,  # merge short VAD fragments into longer segments
    merge_length_s=15,
 )
 text = rich_transcription_postprocess(res[0]["text"])
 print(text)
 ```
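 Parameters:
 - `model_dir`: the model name, or the path to a model on local disk.
 - `max_single_segment_time`: the maximum duration of an audio segment cut by `vad_model`, in milliseconds (ms).
 - `use_itn`: whether the output includes punctuation and inverse text normalization (ITN).
 - `batch_size_s`: enables dynamic batching; the value is the total audio duration per batch, in seconds (s).
 - `merge_vad`: whether to merge the short audio fragments cut by the VAD model; the merged length is set by `merge_length_s`, in seconds (s).
 If all inputs are short clips (under 30 s) and you need batched inference, you can remove the VAD model and set `batch_size` instead to speed up inference.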
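As a rough illustration of what `merge_vad`/`merge_length_s` and `batch_size_s` describe, the sketch below greedily merges short VAD fragments and then packs them into batches whose total audio duration stays within a cap. It is a hypothetical approximation written for this README, not FunASR's internal implementation; the helpers `merge_segments` and `dynamic_batches` and the `(start_ms, end_ms)` segment format are assumptions.

```python
from typing import List, Tuple

# Hypothetical segment format: (start_ms, end_ms), as a VAD might emit.
Segment = Tuple[int, int]


def merge_segments(segments: List[Segment], merge_length_s: float) -> List[Segment]:
    """Hypothetical: join consecutive segments while the merged span
    (first start to last end, gaps included) stays <= merge_length_s."""
    limit_ms = merge_length_s * 1000
    merged: List[Segment] = []
    for start, end in segments:
        if merged and end - merged[-1][0] <= limit_ms:
            # Extend the current merged segment so it also covers this one.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def dynamic_batches(segments: List[Segment], batch_size_s: float) -> List[List[Segment]]:
    """Hypothetical: pack segments into batches whose summed audio duration
    stays <= batch_size_s. A single over-long segment gets its own batch."""
    limit_ms = batch_size_s * 1000
    batches: List[List[Segment]] = []
    current: List[Segment] = []
    total_ms = 0
    for start, end in segments:
        dur = end - start
        if current and total_ms + dur > limit_ms:
            # Adding this segment would exceed the cap: close the batch.
            batches.append(current)
            current, total_ms = [], 0
        current.append((start, end))
        total_ms += dur
    if current:
        batches.append(current)
    return batches


vad_segments = [(0, 4000), (4500, 9000), (9500, 20000), (21000, 30000)]
merged = merge_segments(vad_segments, merge_length_s=15)
print(merged)
print(dynamic_batches(merged, batch_size_s=20))
```

In this sketch segments are in milliseconds (matching `max_single_segment_time`), while `merge_length_s` and `batch_size_s` are in seconds, hence the `* 1000` conversions.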
 ```python
 model = AutoModel(model=model_dir, trust_remote_code=True, device="cuda:0")
 res = model.generate(
    input=f"{model.model_path}/example/en.mp3",
    cache={},
    language="auto", # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size=64,
 )
 ```
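Choosing between the two setups can be expressed as a small decision helper. This is a hypothetical sketch: `plan_inference` is not part of funasr, clip durations are assumed to be known in advance, and the kwarg values simply mirror the examples above.

```python
from typing import Dict, List, Tuple

SHORT_CLIP_S = 30.0  # short-audio threshold from the text above


def plan_inference(durations_s: List[float]) -> Tuple[bool, Dict[str, object]]:
    """Hypothetical helper: return (use_vad, generate_kwargs) for a set of clips."""
    base: Dict[str, object] = {"cache": {}, "language": "auto", "use_itn": True}
    if durations_s and all(d < SHORT_CLIP_S for d in durations_s):
        # Every clip is short: drop the VAD model and batch by clip count.
        return False, {**base, "batch_size": 64}
    # At least one long clip (or no info): keep vad_model, batch by total duration.
    return True, {**base, "batch_size_s": 60, "merge_vad": True, "merge_length_s": 15}


use_vad, kwargs = plan_inference([3.2, 12.0, 29.5])
print(use_vad, kwargs["batch_size"])
```

When `use_vad` is `False`, the model would be built without `vad_model` as in the block above, and the returned dict passed as `model.generate(input=..., **kwargs)`.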
 For more detailed usage, see the [documentation](https://github.com/modelscope/FunASR/blob/main/docs/tutorial/README.md)
 ### Direct inference
 Supports audio input in any format; input audio must be under 30 s long.
 ```python
 from model import SenseVoiceSmall
 from funasr.utils.postprocess_utils import rich_transcription_postprocess
 model_dir = "iic/SenseVoiceSmall"
 m, kwargs = SenseVoiceSmall.from_pretrained(model=model_dir)
@@ -96,52 +153,10 @@ res = m.inference(
    **kwargs,
 )
-print(res)
+text = rich_transcription_postprocess(res[0]["text"])
 print(text)
 ```
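Because direct inference is limited to inputs under 30 s, longer audio has to be shortened first. One naive option, sketched here under assumptions (16 kHz mono samples already loaded into a Python sequence; `split_fixed` is a hypothetical helper), is fixed-length chunking. Unlike the funasr path with `vad_model`, it ignores speech boundaries and may cut words in half; whether each chunk can be passed straight to `m.inference` is not covered here.

```python
from typing import List, Sequence

SAMPLE_RATE = 16000  # assumption: 16 kHz mono input
MAX_SECONDS = 30     # direct-inference limit stated above


def split_fixed(samples: Sequence[float], sample_rate: int = SAMPLE_RATE,
                max_seconds: float = MAX_SECONDS) -> List[Sequence[float]]:
    """Hypothetical: cut a waveform into consecutive chunks of at most
    max_seconds each; the last chunk holds whatever remains."""
    step = int(sample_rate * max_seconds)
    return [samples[i:i + step] for i in range(0, len(samples), step)]


wave = [0.0] * (SAMPLE_RATE * 70)  # 70 s of silence as a stand-in signal
chunks = split_fixed(wave)
print([len(c) / SAMPLE_RATE for c in chunks])
```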
 ### Inference with funasr
 ```python
 from funasr import AutoModel
 model_dir = "iic/SenseVoiceSmall"
 input_file = (
    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav"
 )
 model = AutoModel(model=model_dir,
                  vad_model="fsmn-vad",
                  vad_kwargs={"max_single_segment_time": 30000},
                  trust_remote_code=True, device="cuda:0")
 res = model.generate(
    input=input_file,
    cache={},
    language="auto", # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=False,
    batch_size_s=0,
 )
 print(res)
 ```
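 The funasr version integrates the VAD model, so it supports audio input of any duration; `batch_size_s` is in seconds.
 If all inputs are short clips and you need batched inference, you can remove the VAD model and set `batch_size` instead to speed up inference.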
 ```python
 model = AutoModel(model=model_dir, trust_remote_code=True, device="cuda:0")
 res = model.generate(
    input=input_file,
    cache={},
    language="auto", # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=False,
    batch_size=64,
 )
 ```
 For more detailed usage, see the [documentation](https://github.com/modelscope/FunASR/blob/main/docs/tutorial/README.md)
 ## Model download