Upload folder using ModelScope SDK

This commit is contained in:
Cherrytest 2025-06-04 06:15:51 +00:00
parent 7330a56b96
commit c0cad47273
16 changed files with 152283 additions and 38 deletions

4
.gitattributes vendored
View File

@@ -44,4 +44,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

208
README.md
View File

@@ -1,47 +1,181 @@
---
license: Apache License 2.0
---

# Qwen3-Embedding-8B

<p align="center">
    <img src="https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/logo_qwen3.png" width="400"/>
</p>

## Highlights

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embedding and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational model. The Qwen3 Embedding series represents significant advancements in multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

**Exceptional Versatility**: The embedding model has achieved state-of-the-art performance across a wide range of downstream application evaluations. The 8B embedding model ranks **No.1** on the MTEB multilingual leaderboard (as of May 26, 2025, score **70.58**), while the reranking model excels in various text retrieval scenarios.

**Comprehensive Flexibility**: The Qwen3 Embedding series offers a full spectrum of sizes (from 0.6B to 8B) for both embedding and reranking models, catering to diverse use cases that prioritize efficiency and effectiveness. Developers can seamlessly combine these two modules. Additionally, the embedding model allows for flexible vector definitions across all dimensions, and both embedding and reranking models support user-defined instructions to enhance performance for specific tasks, languages, or scenarios.

**Multilingual Capability**: The Qwen3 Embedding series supports over 100 languages, including various programming languages, and provides robust multilingual, cross-lingual, and code retrieval capabilities.
## Model Overview

**Qwen3-Embedding-8B** has the following features:

- Model Type: Text Embedding
- Supported Languages: 100+ Languages
- Number of Parameters: 8B
- Context Length: 32k
- Embedding Dimension: Up to 4096, supports user-defined output dimensions ranging from 32 to 4096

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3-Embedding/) and [GitHub](https://github.com/QwenLM/Qwen3-Embedding).
## Qwen3 Embedding Series Model List
| Model Type | Models | Size | Layers | Sequence Length | Embedding Dimension | MRL Support | Instruct Aware |
|------------------|----------------------|------|--------|-----------------|---------------------|-------------|----------------|
| Text Embedding | [Qwen3-Embedding-0.6B](https://modelscope.cn/models/tongyi/Qwen3-Embedding-0.6B) | 0.6B | 28 | 32K | 1024 | Yes | Yes |
| Text Embedding | [Qwen3-Embedding-4B](https://modelscope.cn/models/tongyi/Qwen3-Embedding-4B) | 4B | 36 | 32K | 2560 | Yes | Yes |
| Text Embedding | [Qwen3-Embedding-8B](https://modelscope.cn/models/tongyi/Qwen3-Embedding-8B) | 8B | 36 | 32K | 4096 | Yes | Yes |
| Text Reranking | [Qwen3-Reranker-0.6B](https://modelscope.cn/models/tongyi/Qwen3-Reranker-0.6B) | 0.6B | 28 | 32K | - | - | Yes |
| Text Reranking | [Qwen3-Reranker-4B](https://modelscope.cn/models/tongyi/Qwen3-Reranker-4B) | 4B | 36 | 32K | - | - | Yes |
| Text Reranking | [Qwen3-Reranker-8B](https://modelscope.cn/models/tongyi/Qwen3-Reranker-8B) | 8B | 36 | 32K | - | - | Yes |
> **Note**: `MRL Support` indicates whether the embedding model supports custom dimensions for the final embedding. `Instruct Aware` notes whether the embedding or reranking model supports customizing the input instruction according to different tasks.
## Usage
With Transformers versions earlier than 4.51.0, you may encounter the following error:
```
KeyError: 'qwen3'
```
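In that case, upgrade Transformers before loading the model, for example:
```bash
pip install "transformers>=4.51.0"
```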
### Transformers Usage
```python
# Requires transformers>=4.51.0
import torch
import torch.nn.functional as F
from torch import Tensor
from modelscope import AutoTokenizer, AutoModel
def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

def tokenize(tokenizer, input_texts, eod_id, max_length):
    # Truncate to max_length - 2 to leave room for the EOD token appended below
    batch_dict = tokenizer(input_texts, padding=False, truncation=True, max_length=max_length - 2)
    for seq, att in zip(batch_dict["input_ids"], batch_dict["attention_mask"]):
        seq.append(eod_id)
        att.append(1)
    batch_dict = tokenizer.pad(batch_dict, padding=True, return_tensors="pt")
    return batch_dict
# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents
tokenizer = AutoTokenizer.from_pretrained('tongyi/Qwen3-Embedding-8B', padding_side='left')
model = AutoModel.from_pretrained('tongyi/Qwen3-Embedding-8B')
# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModel.from_pretrained('tongyi/Qwen3-Embedding-8B', attn_implementation="flash_attention_2", torch_dtype=torch.float16).cuda()
eod_id = tokenizer.convert_tokens_to_ids("<|endoftext|>")
max_length = 8192
# Tokenize the input texts
batch_dict = tokenize(tokenizer, input_texts, eod_id, max_length)
batch_dict.to(model.device)
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
```
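The Model Overview states that output dimensions from 32 to 4096 are supported (MRL). One common way to use a lower-dimensional vector with an MRL-capable embedding model is to truncate and re-normalize; a minimal sketch continuing from the snippet above (the 256-dimension choice is an arbitrary illustration, not from the original):
```python
# Hypothetical MRL-style dimension reduction (assumes `embeddings` and `F`
# from the example above): keep the first `dim` components, then re-normalize.
dim = 256
truncated = F.normalize(embeddings[:, :dim], p=2, dim=1)
scores_truncated = truncated[:2] @ truncated[2:].T
print(scores_truncated.tolist())
```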
📌 **Tip**: We recommend that developers customize the `instruct` according to their specific scenarios, tasks, and languages. Our tests have shown that in most retrieval scenarios, not using an `instruct` on the query side can lead to a drop in retrieval performance by approximately 1% to 5%.
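For instance, a scenario-specific instruction can be passed through the same `get_detailed_instruct` helper; the code-retrieval task description below is an illustrative assumption, not from the original:
```python
# Hypothetical custom instruction for a code-retrieval scenario
code_task = 'Given a natural language description, retrieve relevant code snippets'
code_query = get_detailed_instruct(code_task, 'binary search implementation in Python')
```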
## Evaluation
### MTEB (Multilingual)
| Model | Size | Mean (Task) | Mean (Type) | Bitext Mining | Class. | Clust. | Inst. Retri. | Multi. Class. | Pair. Class. | Rerank | Retri. | STS |
|----------------------------------|:-------:|:-------------:|:-------------:|:--------------:|:--------:|:--------:|:--------------:|:---------------:|:--------------:|:--------:|:--------:|:------:|
| NV-Embed-v2 | 7B | 56.29 | 49.58 | 57.84 | 57.29 | 40.80 | 1.04 | 18.63 | 78.94 | 63.82 | 56.72 | 71.10|
| GritLM-7B | 7B | 60.92 | 53.74 | 70.53 | 61.83 | 49.75 | 3.45 | 22.77 | 79.94 | 63.78 | 58.31 | 73.33|
| BGE-M3 | 0.6B | 59.56 | 52.18 | 79.11 | 60.35 | 40.88 | -3.11 | 20.1 | 80.76 | 62.79 | 54.60 | 74.12|
| multilingual-e5-large-instruct | 0.6B | 63.22 | 55.08 | 80.13 | 64.94 | 50.75 | -0.40 | 22.91 | 80.86 | 62.61 | 57.12 | 76.81|
| gte-Qwen2-1.5B-instruct | 1.5B | 59.45 | 52.69 | 62.51 | 58.32 | 52.05 | 0.74 | 24.02 | 81.58 | 62.58 | 60.78 | 71.61|
| gte-Qwen2-7B-instruct | 7B | 62.51 | 55.93 | 73.92 | 61.55 | 52.77 | 4.94 | 25.48 | 85.13 | 65.55 | 60.08 | 73.98|
| text-embedding-3-large | - | 58.93 | 51.41 | 62.17 | 60.27 | 46.89 | -2.68 | 22.03 | 79.17 | 63.89 | 59.27 | 71.68|
| Cohere-embed-multilingual-v3.0 | - | 61.12 | 53.23 | 70.50 | 62.95 | 46.89 | -1.89 | 22.74 | 79.88 | 64.07 | 59.16 | 74.80|
| gemini-embedding-exp-03-07 | - | 68.37 | 59.59 | 79.28 | 71.82 | 54.59 | 5.18 | **29.16** | 83.63 | 65.58 | 67.71 | 79.40|
| **Qwen3-Embedding-0.6B** | 0.6B | 64.33 | 56.00 | 72.22 | 66.83 | 52.33 | 5.09 | 24.59 | 80.83 | 61.41 | 64.64 | 76.17|
| **Qwen3-Embedding-4B** | 4B | 69.45 | 60.86 | 79.36 | 72.33 | 57.15 | **11.56** | 26.77 | 85.05 | 65.08 | 69.60 | 80.86|
| **Qwen3-Embedding-8B** | 8B | **70.58** | **61.69** | **80.89** | **74.00** | **57.65** | 10.06 | 28.66 | **86.40** | **65.63** | **70.88** | **81.08** |
> **Note**: For compared models, the scores are retrieved from MTEB online [leaderboard](https://huggingface.co/spaces/mteb/leaderboard) on May 24th, 2025.
### MTEB (Eng v2)
| MTEB English / Models | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retri. | STS | Summ. |
|--------------------------------|:--------:|:------------:|:------------:|:--------:|:--------:|:-------------:|:---------:|:--------:|:-------:|:-------:|
| multilingual-e5-large-instruct | 0.6B | 65.53 | 61.21 | 75.54 | 49.89 | 86.24 | 48.74 | 53.47 | 84.72 | 29.89 |
| NV-Embed-v2 | 7.8B | 69.81 | 65.00 | 87.19 | 47.66 | 88.69 | 49.61 | 62.84 | 83.82 | 35.21 |
| GritLM-7B | 7.2B | 67.07 | 63.22 | 81.25 | 50.82 | 87.29 | 49.59 | 54.95 | 83.03 | 35.65 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.20 | 63.26 | 85.84 | 53.54 | 87.52 | 49.25 | 50.25 | 82.51 | 33.94 |
| stella_en_1.5B_v5 | 1.5B | 69.43 | 65.32 | 89.38 | 57.06 | 88.02 | 50.19 | 52.42 | 83.27 | 36.91 |
| gte-Qwen2-7B-instruct | 7.6B | 70.72 | 65.77 | 88.52 | 58.97 | 85.9 | 50.47 | 58.09 | 82.69 | 35.74 |
| gemini-embedding-exp-03-07 | - | 73.3 | 67.67 | 90.05 | **59.39** | **87.7** | 48.59 | 64.35 | 85.29 | **38.28** |
| **Qwen3-Embedding-0.6B** | 0.6B | 70.70 | 64.88 | 85.76 | 54.05 | 84.37 | 48.18 | 61.83 | 86.57 | 33.43 |
| **Qwen3-Embedding-4B** | 4B | 74.60 | 68.10 | 89.84 | 57.51 | 87.01 | 50.76 | 68.46 | **88.72** | 34.39 |
| **Qwen3-Embedding-8B** | 8B | **75.22** | **68.71** | **90.43** | 58.57 | 87.52 | **51.56** | **69.44** | 88.58 | 34.83 |
### C-MTEB (MTEB Chinese)
| C-MTEB | Param. | Mean(Task) | Mean(Type) | Class. | Clust. | Pair Class. | Rerank. | Retri. | STS |
|------------------|--------|------------|------------|--------|--------|-------------|---------|-------|-------|
| multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
| bge-multilingual-gemma2 | 9B | 67.64 |68.52 | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.5 | 68.21 | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
| ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.5 | **85.98** | **72.86** | 76.97 | **63.92** |
| **Qwen3-Embedding-0.6B** | 0.6B | 66.33 | 67.45 | 71.40 | 68.74 | 76.42 | 62.58 | 71.03 | 54.52 |
| **Qwen3-Embedding-4B** | 4B | 72.27 | 73.51 | 75.46 | 77.89 | 83.34 | 66.05 | 77.03 | 61.26 |
| **Qwen3-Embedding-8B** | 8B | **73.84** | **75.00** | **76.97** | **80.08** | 84.23 | 66.99 | **78.21** | 63.53 |
## Citation
If you find our work helpful, feel free to give us a cite.
```
@misc{qwen3-embedding,
title = {Qwen3-Embedding},
url = {https://qwenlm.github.io/blog/qwen3/},
author = {Qwen Team},
month = {May},
year = {2025}
}
```

24
added_tokens.json Normal file
View File

@@ -0,0 +1,24 @@
{
"</tool_call>": 151658,
"<tool_call>": 151657,
"<|box_end|>": 151649,
"<|box_start|>": 151648,
"<|endoftext|>": 151643,
"<|file_sep|>": 151664,
"<|fim_middle|>": 151660,
"<|fim_pad|>": 151662,
"<|fim_prefix|>": 151659,
"<|fim_suffix|>": 151661,
"<|im_end|>": 151645,
"<|im_start|>": 151644,
"<|image_pad|>": 151655,
"<|object_ref_end|>": 151647,
"<|object_ref_start|>": 151646,
"<|quad_end|>": 151651,
"<|quad_start|>": 151650,
"<|repo_name|>": 151663,
"<|video_pad|>": 151656,
"<|vision_end|>": 151653,
"<|vision_pad|>": 151654,
"<|vision_start|>": 151652
}

30
config.json Normal file
View File

@@ -0,0 +1,30 @@
{
"architectures": [
"Qwen3Model"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 12288,
"max_position_embeddings": 40960,
"max_window_layers": 36,
"model_type": "qwen3",
"num_attention_heads": 32,
"num_hidden_layers": 36,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 1000000,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.51.2",
"use_cache": true,
"use_sliding_window": false,
"vocab_size": 151665
}

1
configuration.json Normal file
View File

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

6
generation_config.json Normal file
View File

@@ -0,0 +1,6 @@
{
"bos_token_id": 151643,
"eos_token_id": 151643,
"max_new_tokens": 2048,
"transformers_version": "4.51.3"
}

151388
merges.txt Normal file

File diff suppressed because it is too large Load Diff

3
model-00001-of-00004.safetensors Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:36d51a63bec203fa535d5e0e4a660333b4f4712605c0e9d65825d7b9a8526d09
size 135

3
model-00002-of-00004.safetensors Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6c06ac12cf9291485c3d33726b2b86ee698e44d7d8056fc2183ac1466fc7f4c5
size 135

3
model-00003-of-00004.safetensors Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cfbd4dc045c348879e434df2720e654825f7f633b7216a79c3a69fec0b8a0eed
size 135

3
model-00004-of-00004.safetensors Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aca092b8436b4416fc19992c77d7abbff26f38a7b01021bbde9101b833c49a91
size 134

405
model.safetensors.index.json Normal file
View File

@@ -0,0 +1,405 @@
{
"metadata": {
"total_size": 15134590976
},
"weight_map": {
"embed_tokens.weight": "model-00001-of-00004.safetensors",
"layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.0.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"layers.0.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.1.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"layers.1.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.10.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"layers.10.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.11.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"layers.11.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.12.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"layers.12.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.13.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"layers.13.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.14.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"layers.14.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.15.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"layers.15.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.16.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"layers.16.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.17.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"layers.17.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.18.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"layers.18.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"layers.19.input_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.19.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"layers.19.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.19.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"layers.19.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.2.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"layers.2.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"layers.20.input_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.20.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"layers.20.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"layers.20.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"layers.20.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.20.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"layers.20.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"layers.20.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"layers.20.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"layers.20.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"layers.21.input_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.21.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"layers.21.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
"layers.21.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"layers.21.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.21.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"layers.21.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"layers.21.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"layers.21.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"layers.21.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"layers.21.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.22.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
"layers.22.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
"layers.22.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
"layers.22.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
"layers.22.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
"layers.22.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
"layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.23.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"layers.23.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.24.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"layers.24.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.25.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"layers.25.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.26.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"layers.26.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.27.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"layers.27.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.28.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"layers.28.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.29.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"layers.29.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.3.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"layers.3.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.30.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"layers.30.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"layers.31.input_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.31.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"layers.31.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"layers.31.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.31.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"layers.31.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"layers.31.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"layers.31.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"layers.32.input_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.32.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"layers.32.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"layers.32.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"layers.32.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.32.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"layers.32.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"layers.32.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"layers.32.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"layers.32.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"layers.32.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"layers.33.input_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.33.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"layers.33.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"layers.33.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"layers.33.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.33.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"layers.33.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"layers.33.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"layers.33.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"layers.33.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"layers.33.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"layers.34.input_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.34.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
"layers.34.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
"layers.34.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
"layers.34.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
"layers.34.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
"layers.34.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"layers.34.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
"layers.34.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
"layers.34.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"layers.34.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"layers.35.input_layernorm.weight": "model-00004-of-00004.safetensors",
"layers.35.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
"layers.35.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
"layers.35.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
"layers.35.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
"layers.35.self_attn.k_norm.weight": "model-00004-of-00004.safetensors",
"layers.35.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
"layers.35.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
"layers.35.self_attn.q_norm.weight": "model-00004-of-00004.safetensors",
"layers.35.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
"layers.35.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
"layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.4.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"layers.4.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.5.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"layers.5.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.6.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"layers.6.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.7.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"layers.7.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
"layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
"layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
"layers.8.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"layers.8.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
"layers.9.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
"layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
"layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
"layers.9.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
"layers.9.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
"layers.9.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
"layers.9.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
"layers.9.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
"layers.9.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
"norm.weight": "model-00004-of-00004.safetensors"
}
}

31
special_tokens_map.json Normal file
View File

@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

3
tokenizer.json Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
size 11421896

208
tokenizer_config.json Normal file
View File

@@ -0,0 +1,208 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 131072,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}

1
vocab.json Normal file

File diff suppressed because one or more lines are too long