diff --git a/README.md b/README.md
index 179c613..3def17a 100644
--- a/README.md
+++ b/README.md
@@ -80,10 +80,67 @@ rec_result = inference_pipeline('https://isv-data.oss-cn-hangzhou.aliyuncs.com/i
 print(rec_result)
 ```
 
+### Inference with funasr
+
+Supports audio input in any format and of any duration.
+
+```python
+from funasr import AutoModel
+from funasr.utils.postprocess_utils import rich_transcription_postprocess
+
+model_dir = "iic/SenseVoiceSmall"
+
+
+model = AutoModel(
+    model=model_dir,
+    vad_model="fsmn-vad",
+    vad_kwargs={"max_single_segment_time": 30000},
+    device="cpu",
+)
+
+# en
+res = model.generate(
+    input=f"{model.model_path}/example/en.mp3",
+    cache={},
+    language="auto",  # "zn", "en", "yue", "ja", "ko", "nospeech"
+    use_itn=True,
+    batch_size_s=60,
+    merge_vad=True,  #
+    merge_length_s=15,
+)
+text = rich_transcription_postprocess(res[0]["text"])
+print(text)
+```
+Parameter description:
+- `model_dir`: the model name, or the path to the model on local disk.
+- `max_single_segment_time`: the maximum duration of an audio segment cut by `vad_model`, in milliseconds (ms).
+- `use_itn`: whether the output includes punctuation and inverse text normalization.
+- `batch_size_s`: enables dynamic batching; the value is the total audio duration in a batch, in seconds (s).
+- `merge_vad`: whether to merge the short audio fragments cut by the VAD model; the merged length is `merge_length_s`, in seconds (s).
+
+If all inputs are short audio (under 30 s) and batch inference is needed, you can remove the VAD model and set `batch_size` to speed up inference.
+
+```python
+model = AutoModel(model=model_dir, trust_remote_code=True, device="cuda:0")
+
+res = model.generate(
+    input=f"{model.model_path}/example/en.mp3",
+    cache={},
+    language="auto",  # "zn", "en", "yue", "ja", "ko", "nospeech"
+    use_itn=True,
+    batch_size=64,
+)
+```
+
+For more detailed usage, see the [documentation](https://github.com/modelscope/FunASR/blob/main/docs/tutorial/README.md).
+
 ### Direct inference
 
+Supports audio input in any format; input audio duration is limited to under 30 s.
+
 ```python
 from model import SenseVoiceSmall
+from funasr.utils.postprocess_utils import rich_transcription_postprocess
 
 model_dir = "iic/SenseVoiceSmall"
 m, kwargs = SenseVoiceSmall.from_pretrained(model=model_dir)
@@ -96,52 +153,10 @@ res = m.inference(
     **kwargs,
 )
 
-print(res)
+text = rich_transcription_postprocess(res[0]["text"])
+print(text)
 ```
 
-### Inference with funasr
-
-```python
-from funasr import AutoModel
-
-model_dir = "iic/SenseVoiceSmall"
-input_file = (
-    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav"
-)
-
-model = AutoModel(model=model_dir,
-                  vad_model="fsmn-vad",
-                  vad_kwargs={"max_single_segment_time": 30000},
-                  trust_remote_code=True, device="cuda:0")
-
-res = model.generate(
-    input=input_file,
-    cache={},
-    language="auto",  # "zn", "en", "yue", "ja", "ko", "nospeech"
-    use_itn=False,
-    batch_size_s=0,
-)
-
-print(res)
-```
-
-The funasr version integrates a VAD model and supports audio input of any duration; `batch_size_s` is in seconds.
-If all inputs are short audio and batch inference is needed, you can remove the VAD model and set `batch_size` to speed up inference.
-
-```python
-model = AutoModel(model=model_dir, trust_remote_code=True, device="cuda:0")
-
-res = model.generate(
-    input=input_file,
-    cache={},
-    language="auto",  # "zn", "en", "yue", "ja", "ko", "nospeech"
-    use_itn=False,
-    batch_size=64,
-)
-```
-
-For more detailed usage, see the [documentation](https://github.com/modelscope/FunASR/blob/main/docs/tutorial/README.md).
-
 ## Model download