# Qwen3-32B-fp4
---
license: apache-2.0
base_model: Qwen/Qwen3-32B
base_model_relation: quantized
library_name: transformers
tags:
- Qwen
- fp4
---

## Evaluation

The results in the following table are from the MMLU benchmark.

To speed up evaluation, we limit the length of the model's chain of thought, so the scores may differ from those obtained with longer reasoning chains.

In our experiments, the FP4 quantized version is almost as accurate as the BF16 version while enabling faster inference.
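
To see why FP4 is attractive, note that weights stored at 4 bits take a quarter of the memory of BF16. The sketch below is back-of-the-envelope arithmetic only (the parameter count is rounded to 32B; KV cache, activations, and any layers kept in higher precision are ignored, so real usage will be higher):

```python
# Rough weight-memory estimate for a 32B-parameter model.
PARAMS = 32e9  # approximate parameter count

def weight_gib(bits_per_param: float) -> float:
    """Approximate weight storage in GiB at the given precision."""
    return PARAMS * bits_per_param / 8 / 2**30

bf16 = weight_gib(16)  # roughly 60 GiB
fp4 = weight_gib(4)    # roughly 15 GiB
print(f"BF16 weights: ~{bf16:.1f} GiB, FP4 weights: ~{fp4:.1f} GiB")
```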

| Data Format | MMLU Score |
| --- | --- |
| BF16 (Official) | 88.21 |
| FP4 (Quantized) | 87.43 |
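
The scores above work out to the FP4 model retaining about 99% of the BF16 MMLU accuracy; two lines of arithmetic make the comparison explicit:

```python
# Score retention of FP4 relative to BF16, using the MMLU numbers above.
bf16_score = 88.21
fp4_score = 87.43

retention = fp4_score / bf16_score * 100  # percent of BF16 accuracy kept
drop = bf16_score - fp4_score             # absolute point drop
print(f"retention: {retention:.2f}%, drop: {drop:.2f} points")
```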

## Quickstart

We recommend using the Chitu inference framework (https://github.com/thu-pacman/chitu) to run this model. The following command shows a simple way to serve Qwen3-32B-fp4:

```shell
torchrun --nproc_per_node 1 \
    --master_port=22525 \
    -m chitu \
    serve.port=21002 \
    infer.cache_type=paged \
    infer.pp_size=1 \
    infer.tp_size=1 \
    models=Qwen3-32B-fp4 \
    models.ckpt_dir="your model path" \
    models.tokenizer_path="your model path" \
    dtype=float16 \
    infer.do_load=True \
    infer.max_reqs=1 \
    scheduler.prefill_first.num_tasks=100 \
    infer.max_seq_len=4096 \
    request.max_new_tokens=100 \
    infer.use_cuda_graph=True
```
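
Once the server is up, you can send requests to it over HTTP. The client sketch below assumes the Chitu server exposes an OpenAI-compatible `/v1/chat/completions` route on `serve.port` (21002); the exact endpoint path is an assumption, so check the Chitu documentation for the route exposed by your version.

```python
import json
import urllib.request

# ASSUMPTION: the Chitu server launched above serves an OpenAI-compatible
# chat completions endpoint on serve.port (21002). Verify the route in the
# Chitu docs for your version.
URL = "http://localhost:21002/v1/chat/completions"

def build_request(prompt: str, url: str = URL) -> urllib.request.Request:
    """Build the HTTP request for a single chat completion."""
    payload = {
        "model": "Qwen3-32B-fp4",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 100,  # matches request.max_new_tokens above
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def send(prompt: str) -> dict:
    """Send one prompt to the running server and return the parsed reply."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)
```

With the server running, `send("Hello!")` returns the parsed JSON response.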

## Contact

solution@qingcheng.ai