sensevoice

2024-07-16 14:45:02 +08:00 · 21d3cf9f23
commit 21d3cf9f23
parent a1ffeaf0b7
1 changed file with 59 additions and 44 deletions
--- a/README.md
+++ b/README.md
@@ -80,10 +80,67 @@ rec_result = inference_pipeline('https://isv-data.oss-cn-hangzhou.aliyuncs.com/i
 print(rec_result)
 ```

+### Inference with funasr
+
+Supports audio input in any format and of any duration.
+
+```python
+from funasr import AutoModel
+from funasr.utils.postprocess_utils import rich_transcription_postprocess
+
+model_dir = "iic/SenseVoiceSmall"
+
+
+model = AutoModel(
+    model=model_dir,
+    vad_model="fsmn-vad",
+    vad_kwargs={"max_single_segment_time": 30000},
+    device="cpu",
+)
+
+# en
+res = model.generate(
+    input=f"{model.model_path}/example/en.mp3",
+    cache={},
+    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
+    use_itn=True,
+    batch_size_s=60,
+    merge_vad=True,  # merge the short audio fragments cut by the VAD model
+    merge_length_s=15,
+)
+text = rich_transcription_postprocess(res[0]["text"])
+print(text)
+```
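+Parameter notes:
+- `model_dir`: the model name, or the path to the model on local disk.
+- `max_single_segment_time`: the maximum duration of an audio segment cut by `vad_model`, in milliseconds (ms).
+- `use_itn`: whether the output includes punctuation and inverse text normalization.
+- `batch_size_s`: enables dynamic batching; the value is the total audio duration in a batch, in seconds (s).
+- `merge_vad`: whether to merge the short audio fragments cut by the VAD model; the merged length is `merge_length_s`, in seconds (s).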
+
+If all inputs are short audio clips (under 30 s) and batched inference is needed, you can remove the VAD model and set `batch_size` to speed up inference.
+
+```python
+model = AutoModel(model=model_dir, trust_remote_code=True, device="cuda:0")
+
+res = model.generate(
+    input=f"{model.model_path}/example/en.mp3",
+    cache={},
+    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
+    use_itn=True,
+    batch_size=64,
+)
+```
+
+For more detailed usage, see the [documentation](https://github.com/modelscope/FunASR/blob/main/docs/tutorial/README.md).
+
 ### Direct inference

+Supports audio input in any format; input audio duration is limited to under 30 s.
+
 ```python
 from model import SenseVoiceSmall
+from funasr.utils.postprocess_utils import rich_transcription_postprocess

 model_dir = "iic/SenseVoiceSmall"
 m, kwargs = SenseVoiceSmall.from_pretrained(model=model_dir)
@@ -96,52 +153,10 @@ res = m.inference(
    **kwargs,
 )

-print(res)
+text = rich_transcription_postprocess(res[0]["text"])
+print(text)
 ```

-### Inference with funasr
-
-```python
-from funasr import AutoModel
-
-model_dir = "iic/SenseVoiceSmall"
-input_file = (
-    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav"
-)
-
-model = AutoModel(model=model_dir,
-                  vad_model="fsmn-vad",
-                  vad_kwargs={"max_single_segment_time": 30000},
-                  trust_remote_code=True, device="cuda:0")
-
-res = model.generate(
-    input=input_file,
-    cache={},
-    language="auto", # "zn", "en", "yue", "ja", "ko", "nospeech"
-    use_itn=False,
-    batch_size_s=0,
-)
-
-print(res)
-```
-
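-The funasr version already integrates a VAD model and supports audio input of any duration; `batch_size_s` is in seconds.
-If all inputs are short audio clips and batched inference is needed, you can remove the VAD model and set `batch_size` to speed up inference.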
-
-```python
-model = AutoModel(model=model_dir, trust_remote_code=True, device="cuda:0")
-
-res = model.generate(
-    input=input_file,
-    cache={},
-    language="auto", # "zn", "en", "yue", "ja", "ko", "nospeech"
-    use_itn=False,
-    batch_size=64,
-)
-```
-
-For more detailed usage, see the [documentation](https://github.com/modelscope/FunASR/blob/main/docs/tutorial/README.md).
-
 ## Model download