# Qwen3-32B-fp4
---
license: apache-2.0
base_model: Qwen/Qwen3-32B
base_model_relation: quantized
library_name: transformers
tags:
- Qwen
- fp4
---

## Evaluation

The results in the following table are from the MMLU benchmark.

To speed up evaluation, we limit the length of the model's chain of thought, so the scores may differ from those obtained with longer reasoning chains.

In our experiments, the FP4 quantized version is almost as accurate as the BF16 version while enabling faster inference.
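
To see why FP4 is attractive, note that weights stored at 4 bits take a quarter of the memory of BF16. The sketch below is back-of-the-envelope arithmetic only (the parameter count is rounded to 32B; KV cache, activations, and any layers kept in higher precision are ignored, so real usage will be higher):

```python
# Rough weight-memory estimate for a 32B-parameter model.
PARAMS = 32e9  # approximate parameter count

def weight_gib(bits_per_param: float) -> float:
    """Approximate weight storage in GiB at the given precision."""
    return PARAMS * bits_per_param / 8 / 2**30

bf16 = weight_gib(16)  # roughly 60 GiB
fp4 = weight_gib(4)    # roughly 15 GiB
print(f"BF16 weights: ~{bf16:.1f} GiB, FP4 weights: ~{fp4:.1f} GiB")
```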

| Data Format | MMLU Score |
| --- | --- |
| BF16 (Official) | 88.21 |
| FP4 (Quantized) | 87.43 |
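
The scores above work out to the FP4 model retaining about 99% of the BF16 MMLU accuracy; two lines of arithmetic make the comparison explicit:

```python
# Score retention of FP4 relative to BF16, using the MMLU numbers above.
bf16_score = 88.21
fp4_score = 87.43

retention = fp4_score / bf16_score * 100  # percent of BF16 accuracy kept
drop = bf16_score - fp4_score             # absolute point drop
print(f"retention: {retention:.2f}%, drop: {drop:.2f} points")
```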

## Quickstart

We recommend using the Chitu inference framework (https://github.com/thu-pacman/chitu) to run this model. The following command shows a simple way to serve Qwen3-32B-fp4:

```shell
torchrun --nproc_per_node 1 \
    --master_port=22525 \
    -m chitu \
    serve.port=21002 \
    infer.cache_type=paged \
    infer.pp_size=1 \
    infer.tp_size=1 \
    models=Qwen3-32B-fp4 \
    models.ckpt_dir="your model path" \
    models.tokenizer_path="your model path" \
    dtype=float16 \
    infer.do_load=True \
    infer.max_reqs=1 \
    scheduler.prefill_first.num_tasks=100 \
    infer.max_seq_len=4096 \
    request.max_new_tokens=100 \
    infer.use_cuda_graph=True
```
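
Once the server is up, you can send requests to it over HTTP. The client sketch below assumes the Chitu server exposes an OpenAI-compatible `/v1/chat/completions` route on `serve.port` (21002); the exact endpoint path is an assumption, so check the Chitu documentation for the route exposed by your version.

```python
import json
import urllib.request

# ASSUMPTION: the Chitu server launched above serves an OpenAI-compatible
# chat completions endpoint on serve.port (21002). Verify the route in the
# Chitu docs for your version.
URL = "http://localhost:21002/v1/chat/completions"

def build_request(prompt: str, url: str = URL) -> urllib.request.Request:
    """Build the HTTP request for a single chat completion."""
    payload = {
        "model": "Qwen3-32B-fp4",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 100,  # matches request.max_new_tokens above
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def send(prompt: str) -> dict:
    """Send one prompt to the running server and return the parsed reply."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)
```

With the server running, `send("Hello!")` returns the parsed JSON response.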

## Contact

solution@qingcheng.ai