---
license: apache-2.0
base_model:
- Qwen/Qwen3-32B
base_model_relation: quantized
library_name: transformers
tags:
- Qwen
- fp4
---

## Evaluation

The results in the following table are based on the MMLU benchmark.

To speed up evaluation, we cap the length of the model's chain-of-thought, so the scores may differ from those obtained with longer reasoning chains.

In our experiments, **the accuracy of the FP4 quantized version is nearly identical to that of the BF16 version, while allowing faster inference.**

| Data Format | MMLU Score |
|:---|:---|
| BF16 (official) | 88.21 |
| FP4 (quantized) | 87.43 |
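
This card does not specify the exact evaluation harness used. As an illustrative sketch only, the command below shows one way an MMLU run could be reproduced with EleutherAI's lm-evaluation-harness pointed at the Chitu server from the Quickstart section, assuming the server exposes an OpenAI-compatible completions endpoint that returns logprobs; because of the capped reasoning length described above, the numbers will not exactly match the table.

```bash
# Illustrative sketch, not the authors' evaluation setup.
# Assumes a Chitu server is already running on port 21002 (see Quickstart)
# and that it serves an OpenAI-compatible /v1/completions API with logprobs.
pip install lm-eval
lm_eval --model local-completions \
    --model_args model=Qwen3-32B-fp4,base_url=http://localhost:21002/v1/completions,num_concurrent=1 \
    --tasks mmlu \
    --batch_size 1
```
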
## Quickstart

We recommend using the [Chitu inference framework](https://github.com/thu-pacman/chitu) to run this model.

The following is a simple command showing how to run Qwen3-32B-fp4:

```bash
# Launch a Chitu server for Qwen3-32B-fp4 on port 21002 (single GPU: tp_size=1, pp_size=1).
torchrun --nproc_per_node 1 \
    --master_port=22525 \
    -m chitu \
    serve.port=21002 \
    infer.cache_type=paged \
    infer.pp_size=1 \
    infer.tp_size=1 \
    models=Qwen3-32B-fp4 \
    models.ckpt_dir="your model path" \
    models.tokenizer_path="your model path" \
    dtype=float16 \
    infer.do_load=True \
    infer.max_reqs=1 \
    scheduler.prefill_first.num_tasks=100 \
    infer.max_seq_len=4096 \
    request.max_new_tokens=100 \
    infer.use_cuda_graph=True
```
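
Once the server is up, you can send it a request. This is a minimal sketch assuming Chitu exposes an OpenAI-compatible `/v1/chat/completions` endpoint on the `serve.port` configured above; adjust the URL if your Chitu version serves a different API.

```bash
# Assumes an OpenAI-compatible chat API on serve.port=21002 (see the launch command above).
curl http://localhost:21002/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "Qwen3-32B-fp4",
          "messages": [{"role": "user", "content": "What is FP4 quantization?"}]
        }'
```

Note that `request.max_new_tokens=100` in the launch command caps the length of each response.
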
## Contact
solution@qingcheng.ai