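diff --git a/README.md b/README.md
index 765a466..3763ad6 100644
--- a/README.md
+++ b/README.md
@@ -56,7 +56,85 @@ tasks:
| [**Model Zoo**](https://github.com/alibaba-damo-academy/FunASR/tree/main/model_zoo)
| [**Contact Us**](https://github.com/alibaba-damo-academy/FunASR#contact)

+# Model architecture diagram
+SenseVoice is a multilingual audio understanding model that supports speech recognition, language identification, speech emotion recognition, acoustic event detection, and inverse text normalization. It is trained on several hundred thousand hours of industrial-grade labeled audio, which underpins its general-purpose recognition performance. The model can recognize Chinese, Cantonese, English, Japanese, and Korean audio and outputs rich-text transcriptions annotated with emotions and events.
+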
+
+
+
+
# Usage
+
+# Usage
+
+## Inference
+
+### Direct inference
+
+```python
+from model import SenseVoiceSmall
+
+model_dir = "iic/SenseVoiceSmall"
+m, kwargs = SenseVoiceSmall.from_pretrained(model=model_dir)
+
+
+res = m.inference(
+    data_in="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
+    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
+    use_itn=False,
+    **kwargs,
+)
+
+print(res)
+```
+
+### Inference with funasr
+
+```python
+from funasr import AutoModel
+
+model_dir = "iic/SenseVoiceSmall"
+input_file = (
+    "https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav"
+)
+
+model = AutoModel(model=model_dir,
+                  vad_model="fsmn-vad",
+                  vad_kwargs={"max_single_segment_time": 30000},
+                  trust_remote_code=True, device="cuda:0")
+
+res = model.generate(
+    input=input_file,
+    cache={},
+    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
+    use_itn=False,
+    batch_size_s=0,
+)
+
+print(res)
+```
+
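+The funasr version integrates a VAD model and accepts audio of any duration; `batch_size_s` is measured in seconds.
+If all inputs are short clips and you need batched inference, you can speed it up by removing the VAD model and setting `batch_size` instead.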
+
+```python
+model = AutoModel(model=model_dir, trust_remote_code=True, device="cuda:0")
+
+res = model.generate(
+    input=input_file,
+    cache={},
+    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
+    use_itn=False,
+    batch_size=64,
+)
+```
+
+For more detailed usage, see the [documentation](https://github.com/modelscope/FunASR/blob/main/docs/tutorial/README.md).
+
+## Model download
+
+
SDK download
```bash
# Install ModelScope
@@ -73,4 +151,9 @@ Git download
git clone https://www.modelscope.cn/iic/SenseVoiceSmall.git
```
+## Service deployment
+
+Undo
+
+
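If you are a contributor to this model, we invite you to complete the model card promptly, following the model contribution documentation.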
diff --git a/fig/sensevoice.png b/fig/sensevoice.png
new file mode 100644
index 0000000..8b8786b
Binary files /dev/null and b/fig/sensevoice.png differ
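
The README's inference section distinguishes `batch_size_s` (a budget in seconds, used together with the VAD model) from `batch_size` (a number of clips, used for short audio without VAD). The sketch below only illustrates the difference between those two batching ideas. It is not FunASR's implementation: the helper names `batch_by_count` and `batch_by_duration`, the clip names, and the durations are all hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation of a clip: (file name, duration in seconds).
Clip = Tuple[str, float]


def batch_by_count(clips: Iterable[Clip], batch_size: int) -> List[List[Clip]]:
    """Group clips into batches holding at most `batch_size` clips each."""
    clips = list(clips)
    return [clips[i:i + batch_size] for i in range(0, len(clips), batch_size)]


def batch_by_duration(clips: Iterable[Clip], batch_size_s: float) -> List[List[Clip]]:
    """Group clips so each batch's summed duration stays within `batch_size_s` seconds.

    A single clip longer than the budget is placed in a batch of its own.
    """
    batches: List[List[Clip]] = []
    current: List[Clip] = []
    current_s = 0.0
    for clip in clips:
        _, duration = clip
        # Start a new batch when adding this clip would exceed the budget.
        if current and current_s + duration > batch_size_s:
            batches.append(current)
            current, current_s = [], 0.0
        current.append(clip)
        current_s += duration
    if current:
        batches.append(current)
    return batches


# Made-up clip list, for illustration only.
clips = [("a.wav", 12.0), ("b.wav", 20.0), ("c.wav", 25.0), ("d.wav", 5.0), ("e.wav", 40.0)]
by_count = batch_by_count(clips, batch_size=2)
by_duration = batch_by_duration(clips, batch_size_s=60.0)
print([len(b) for b in by_count])     # clips per batch under a count limit
print([len(b) for b in by_duration])  # clips per batch under a duration budget
```

A count limit keeps the number of clips per batch fixed regardless of their length, while a duration budget packs more short clips or fewer long ones into a batch. That may be why the README suggests `batch_size` only when all inputs are short.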