---
license: apache-2.0
base_model:
- Qwen/Qwen3-32B
base_model_relation: quantized
library_name: transformers
tags:
- Qwen
- fp4
---
## Evaluation

The results in the following table are based on the MMLU benchmark.

To speed up testing, we prevent the model from generating overly long chains of thought, so the scores may differ from those obtained with longer reasoning chains.

In our experiments, **the accuracy of the FP4 quantized version is nearly identical to that of the BF16 version (a drop of only 0.78 points), while enabling faster inference.**

| Data Format | MMLU Score |
|:---|:---|
| BF16 Official | 88.21 |
| FP4 Quantized | 87.43 |

## Quickstart

We recommend using the [Chitu](https://github.com/thu-pacman/chitu) inference framework to run this model. The following command shows how to serve Qwen3-32B-fp4 on a single GPU:

```bash
# Launch a Chitu server for Qwen3-32B-fp4 on a single GPU (tp=1, pp=1)
torchrun --nproc_per_node 1 \
    --master_port=22525 \
    -m chitu \
    serve.port=21002 \
    infer.cache_type=paged \
    infer.pp_size=1 \
    infer.tp_size=1 \
    models=Qwen3-32B-fp4 \
    models.ckpt_dir="your model path" \
    models.tokenizer_path="your model path" \
    dtype=float16 \
    infer.do_load=True \
    infer.max_reqs=1 \
    scheduler.prefill_first.num_tasks=100 \
    infer.max_seq_len=4096 \
    request.max_new_tokens=100 \
    infer.use_cuda_graph=True
```
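
Once the server is running, you can send it a request. The sketch below is a minimal example and assumes Chitu exposes an OpenAI-style `/v1/chat/completions` endpoint on the configured `serve.port` (21002 above); check the Chitu repository for the exact API it serves.

```bash
# Minimal request sketch (assumes an OpenAI-compatible chat endpoint;
# verify the actual route in the Chitu documentation)
curl http://localhost:21002/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen3-32B-fp4",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
    }'
```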

## Contact

solution@qingcheng.ai