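diff --git a/README.md b/README.md
index 765a466..3763ad6 100644
--- a/README.md
+++ b/README.md
@@ -56,7 +56,85 @@ tasks:
 | [**Model Zoo**](https://github.com/alibaba-damo-academy/FunASR/tree/main/model_zoo)
 | [**Contact Us**](https://github.com/alibaba-damo-academy/FunASR#contact)
+# Model Architecture
+SenseVoice is a multilingual audio understanding model that supports speech recognition, spoken language identification, speech emotion recognition, acoustic event detection, and inverse text normalization. It is trained on hundreds of thousands of hours of industrial-grade labeled audio, which underpins its general-purpose recognition performance. The model can transcribe Chinese, Cantonese, English, Japanese, and Korean audio, and it outputs rich-text transcripts annotated with emotion and event labels.
+
+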

+SenseVoice model architecture
+
+
+
 # Usage
+
+# Usage
+
+## Inference
+
+### Direct inference
+
+```python
+from model import SenseVoiceSmall
+
+model_dir = "iic/SenseVoiceSmall"
+m, kwargs = SenseVoiceSmall.from_pretrained(model=model_dir)
+
+
+res = m.inference(
+    data_in="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
+    language="auto", # "zh", "en", "yue", "ja", "ko", "nospeech"
+    use_itn=False,
+    **kwargs,
+)
+
+print(res)
+```
+
+### Inference with funasr
+
+```python
+from funasr import AutoModel
+
+model_dir = "iic/SenseVoiceSmall"
+input_file = (
+    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav"
+)
+
+model = AutoModel(model=model_dir,
+                  vad_model="fsmn-vad",
+                  vad_kwargs={"max_single_segment_time": 30000},
+                  trust_remote_code=True, device="cuda:0")
+
+res = model.generate(
+    input=input_file,
+    cache={},
+    language="auto", # "zh", "en", "yue", "ja", "ko", "nospeech"
+    use_itn=False,
+    batch_size_s=0,
+)
+
+print(res)
+```
+
+The funasr version integrates a VAD model and accepts audio input of arbitrary length. `batch_size_s` is specified in seconds.
+If all inputs are short audio clips and you need batched inference, you can speed up inference by removing the VAD model and setting `batch_size`.
+
+```python
+model = AutoModel(model=model_dir, trust_remote_code=True, device="cuda:0")
+
+res = model.generate(
+    input=input_file,
+    cache={},
+    language="auto", # "zh", "en", "yue", "ja", "ko", "nospeech"
+    use_itn=False,
+    batch_size=64,
+)
+```
+
+For more detailed usage, see the [documentation](https://github.com/modelscope/FunASR/blob/main/docs/tutorial/README.md)
+
+## Model Download
+
+
 SDK download
 ```bash
 # Install ModelScope
@@ -73,4 +151,9 @@ Git download
 git clone https://www.modelscope.cn/iic/SenseVoiceSmall.git
 ```
+## Service Deployment
+
+Undo
+
+

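 If you are a contributor to this model, we invite you to complete the model card promptly, following the model contribution documentation.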

diff --git a/fig/sensevoice.png b/fig/sensevoice.png
new file mode 100644
index 0000000..8b8786b
Binary files /dev/null and b/fig/sensevoice.png differ