diff --git a/.gitattributes b/.gitattributes
index 53d7257..21b3632 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -44,4 +44,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
\ No newline at end of file
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
\ No newline at end of file
diff --git a/README.md b/README.md
index c35c9e1..c36def4 100644
--- a/README.md
+++ b/README.md
@@ -1,47 +1,627 @@
 ---
-license: Apache License 2.0
-
-#model-type:
-##如 gpt、phi、llama、chatglm、baichuan 等
-#- gpt
-
-#domain:
-##如 nlp、cv、audio、multi-modal
-#- nlp
-
-#language:
-##语言代码列表 https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
-#- cn 
-
-#metrics:
-##如 CIDEr、Blue、ROUGE 等
-#- CIDEr
-
-#tags:
-##各种自定义，包括 pretrained、fine-tuned、instruction-tuned、RL-tuned 等训练方法和其他
-#- pretrained
-
-#tools:
-##如 vllm、fastchat、llamacpp、AdaSeq 等
-#- vllm
+license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- vllm
+language:
+- en
+- zh
+base_model:
+- ByteDance-Seed/Seed-OSS-36B-Base
 ---
-### 当前模型的贡献者未提供更加详细的模型介绍。模型文件和权重，可浏览“模型文件”页面获取。
-#### 您可以通过如下git clone命令，或者ModelScope SDK来下载模型
 
-SDK下载
-```bash
-#安装ModelScope
-pip install modelscope
+<div align="center">
+ 👋 Hi, everyone!
+    <br>
+    We are <b>ByteDance Seed Team.</b>
+</div>
+
+<p align="center">
+  You can get to know us better through the following channels👇
+  <br>
+  <a href="https://seed.bytedance.com/">
+    <img src="https://img.shields.io/badge/Website-%231e37ff?style=for-the-badge&logo=bytedance&logoColor=white"></a>
+</p>
+
+![seed logo](https://github.com/user-attachments/assets/c42e675e-497c-4508-8bb9-093ad4d1f216)
+
+
+# Seed-OSS Open-Source Models
+<p align="center">
+  <a href="https://github.com/ByteDance-Seed/seed-oss">
+    <img src="https://img.shields.io/badge/Seed-Project Page-yellow"></a>
+  <a href="https://github.com/ByteDance-Seed/seed-oss">
+    <img src="https://img.shields.io/badge/Seed-Tech Report Coming Soon-red"></a>
+  <a href="https://huggingface.co/ByteDance-Seed">
+    <img src="https://img.shields.io/badge/Seed-Hugging Face-orange"></a>
+  <br>
+  <a href="./LICENSE">
+    <img src="https://img.shields.io/badge/License-Apache2.0-blue"></a>
+</p>
+
+> [!NOTE]
+> This model card is dedicated to the `Seed-OSS-36B-Instruct` model.
+
+## News
+- [2025/08/20]🔥 We released `Seed-OSS-36B-Base` (versions with and without synthetic data) and `Seed-OSS-36B-Instruct`.
+
+## Introduction
+Seed-OSS is a series of open-source large language models developed by ByteDance's Seed Team, designed for strong long-context, reasoning, agentic, and general capabilities, along with versatile developer-friendly features. Although trained on only 12T tokens, Seed-OSS achieves excellent performance on several popular open benchmarks.
+
+We release this series of models to the open-source community under the Apache-2.0 license.
+
+> [!NOTE]
+> Seed-OSS is primarily optimized for international (i18n) use cases.
+
+### Key Features
+- **Flexible Control of Thinking Budget**: Users can adjust the reasoning length as needed. Dynamically controlling the reasoning length improves inference efficiency in practical applications.
+- **Enhanced Reasoning Capability**: Specifically optimized for reasoning tasks while maintaining balanced and excellent general capabilities.
+- **Agentic Intelligence**: Performs exceptionally well in agentic tasks such as tool use and issue resolution.
+- **Research-Friendly**: Because including synthetic instruction data in pre-training may affect post-training research, we released pre-trained models both with and without instruction data, providing the research community with more diverse options.
+- **Native Long Context**: Natively trained with a context length of up to 512K tokens.
+
+### Model Summary
+
+Seed-OSS adopts the popular causal language model architecture with RoPE, GQA, RMSNorm, and SwiGLU activation.
+
+<div align="center">
+
+| | |
+|:---:|:---:|
+| | **Seed-OSS-36B** |
+| **Parameters** | 36B |
+| **Attention** | GQA |
+| **Activation Function** | SwiGLU |
+| **Number of Layers** | 64 |
+| **Number of QKV Heads** | 80 / 8 / 8 |
+| **Head Size** | 128 |
+| **Hidden Size** | 5120 |
+| **Vocabulary Size** | 155K |
+| **Context Length** | 512K |
+| **RoPE Base Frequency** | 1e7 |
+
+</div>
+
+
+## Evaluation Results
+
+### Seed-OSS-36B-Base
+
+Incorporating synthetic instruction data into pretraining leads to improved performance on most benchmarks. We adopt the version augmented with synthetic instruction data (i.e., *w/ syn.*) as `Seed-OSS-36B-Base`. We also release `Seed-OSS-36B-Base-woSyn` trained without such data (i.e., *w/o syn.*), offering the community a high-performance foundation model unaffected by synthetic instruction data.
+
+<div align="center">
+<table>
+<thead>
+<tr>
+<th align="center">Benchmark</th>
+<th align="center"><sup><a href="https://seed.bytedance.com/en/seed1_6">Seed1.6-Base</a></sup></th>
+<th align="center"><sup>Qwen3-30B-A3B-Base-2507*</sup></th>
+<th align="center"><sup>Qwen2.5-32B-Base*</sup></th>
+<th align="center"><sup>Seed-OSS-36B-Base<br>(<i>w/ syn.</i>)</sup></th>
+<th align="center"><sup>Seed-OSS-36B-Base-woSyn<br>(<i>w/o syn.</i>)</sup></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td align="center" colspan=6><strong>Knowledge</strong></td>
+</tr>
+<tr>
+<td align="center">MMLU-Pro</td>
+<td align="center">70</td>
+<td align="center">59.8</td>
+<td align="center">58.5 (55.1)</td>
+<td align="center"><b>65.1</b></td>
+<td align="center">60.4</td>
+</tr>
+<tr>
+<td align="center">MMLU</td>
+<td align="center">88.8</td>
+<td align="center">82.7</td>
+<td align="center">84 (83.3)</td>
+<td align="center"><b>84.9</b></td>
+<td align="center">84.8</td>
+</tr>
+<tr>
+<td align="center">TriviaQA</td>
+<td align="center">91</td>
+<td align="center">76.2</td>
+<td align="center">76</td>
+<td align="center"><b>82.1</b></td>
+<td align="center">81.9</td>
+</tr>
+<tr>
+<td align="center">GPQA-D</td>
+<td align="center">43.4</td>
+<td align="center"><b>37</b></td>
+<td align="center">29.3</td>
+<td align="center">31.7</td>
+<td align="center">35.2</td>
+</tr>
+<tr>
+<td align="center">SimpleQA</td>
+<td align="center">17.1</td>
+<td align="center">7.2</td>
+<td align="center">6.1</td>
+<td align="center">5.8</td>
+<td align="center"><b>7.4</b></td>
+</tr>
+
+<tr>
+<td align="center" colspan=6><strong>Reasoning</strong></td>
+</tr>
+<tr>
+<td align="center">BBH</td>
+<td align="center">92.1</td>
+<td align="center">81.4</td>
+<td align="center">79.1 (84.5)</td>
+<td align="center"><b>87.7</b></td>
+<td align="center">87.2</td>
+</tr>
+<tr>
+<td align="center">AGIEval-en</td>
+<td align="center">78</td>
+<td align="center">66.4</td>
+<td align="center">65.6</td>
+<td align="center"><b>70.7</b></td>
+<td align="center">70.1</td>
+</tr>
+
+<tr>
+<td align="center" colspan=6><strong>Math</strong></td>
+</tr>
+<tr>
+<td align="center">GSM8K</td>
+<td align="center">93.1</td>
+<td align="center">87</td>
+<td align="center">87.5 (92.9)</td>
+<td align="center"><b>90.8</b></td>
+<td align="center">90.3</td>
+</tr>
+<tr>
+<td align="center">MATH</td>
+<td align="center">72.9</td>
+<td align="center">61.1</td>
+<td align="center">63.5 (57.7)</td>
+<td align="center"><b>81.7</b></td>
+<td align="center">61.3</td>
+</tr>
+
+<tr>
+<td align="center" colspan=6><strong>Coding</strong></td>
+</tr>
+<tr>
+<td align="center">MBPP</td>
+<td align="center">83.6</td>
+<td align="center">78.8</td>
+<td align="center">77.8 (84.5)</td>
+<td align="center"><b>80.6</b></td>
+<td align="center">74.6</td>
+</tr>
+<tr>
+<td align="center">HumanEval</td>
+<td align="center">78</td>
+<td align="center">70.7</td>
+<td align="center">47.6 (58.5)</td>
+<td align="center"><b>76.8</b></td>
+<td align="center">75.6</td>
+</tr>
+</tbody>
+</table>
+</div>
+
+<sup>
+- <b>Bold</b> denotes open-source SOTA.
+</sup><br/><sup>
+- "*" indicates that the results in this column are presented in the format of "reproduced_results (reported_results_if_any)".
+</sup>
+
+### Seed-OSS-36B-Instruct
+
+<div align="center">
+<table>
+<thead>
+<tr>
+<th align="center">Benchmark</th>
+<th align="center"><sup><a href="https://console.volcengine.com/ark/region:ark+cn-beijing/model/detail?Id=doubao-seed-1-6-thinking">Seed1.6-Thinking-0715</a></sup></th>
+<th align="center"><sup>OAI-OSS-20B*</sup></th>
+<th align="center"><sup>Qwen3-30B-A3B-Thinking-2507*</sup></th>
+<th align="center"><sup>Qwen3-32B*</sup></th>
+<th align="center"><sup>Gemma3-27B</sup></th>
+<th align="center"><sup>Seed-OSS-36B-Instruct</sup></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td align="center" colspan=7><strong>Knowledge</strong></td>
+</tr>
+<tr>
+<td align="center">MMLU-Pro</td>
+<td align="center">86.6</td>
+<td align="center">76.2</td>
+<td align="center"><ins>81.9</ins> (80.9)</td>
+<td align="center">81.8</td>
+<td align="center">67.5</td>
+<td align="center"><b>82.7</b></td>
+</tr>
+<tr>
+<td align="center">MMLU</td>
+<td align="center">90.6</td>
+<td align="center">81.7 (85.3)</td>
+<td align="center"><ins>86.9</ins></td>
+<td align="center">86.2</td>
+<td align="center">76.9</td>
+<td align="center"><b>87.4</b></td>
+</tr>
+<tr>
+<td align="center">GPQA-D</td>
+<td align="center">80.7</td>
+<td align="center"><b>72.2</b> (71.5)</td>
+<td align="center"><ins>71.4</ins> (73.4)</td>
+<td align="center">66.7 (68.4)</td>
+<td align="center">42.4</td>
+<td align="center"><ins>71.4</ins></td>
+</tr>
+<tr>
+<td align="center">SuperGPQA</td>
+<td align="center">63.4</td>
+<td align="center">50.1</td>
+<td align="center"><b>57.3</b> (56.8)</td>
+<td align="center">49.3</td>
+<td align="center">-</td>
+<td align="center"><ins>55.7</ins></td>
+</tr>
+<tr>
+<td align="center">SimpleQA</td>
+<td align="center">23.7</td>
+<td align="center">6.7</td>
+<td align="center"><b>23.6</b></td>
+<td align="center">8.6</td>
+<td align="center"><ins>10</ins></td>
+<td align="center">9.7</td>
+</tr>
+
+<tr>
+<td align="center" colspan=7><strong>Math</strong></td>
+</tr>
+<tr>
+<td align="center">AIME24</td>
+<td align="center">90.3</td>
+<td align="center"><b>92.7</b> (92.1)</td>
+<td align="center">87.7</td>
+<td align="center">82.7 (81.4)</td>
+<td align="center">-</td>
+<td align="center"><ins>91.7</ins></td>
+</tr>
+<tr>
+<td align="center">AIME25</td>
+<td align="center">86</td>
+<td align="center"><b>90.3</b> (91.7)</td>
+<td align="center">81.3 (85)</td>
+<td align="center">73.3 (72.9)</td>
+<td align="center">-</td>
+<td align="center"><ins>84.7</ins></td>
+</tr>
+<tr>
+<td align="center">BeyondAIME</td>
+<td align="center">60</td>
+<td align="center"><b>69</b></td>
+<td align="center">56</td>
+<td align="center">29</td>
+<td align="center">-</td>
+<td align="center"><ins>65</ins></td>
+</tr>
+
+<tr>
+<td align="center" colspan=7><strong>Reasoning</strong></td>
+</tr>
+<tr>
+<td align="center">ArcAGI V2</td>
+<td align="center">50.3</td>
+<td align="center"><b>41.7</b></td>
+<td align="center">37.8</td>
+<td align="center">14.4</td>
+<td align="center">-</td>
+<td align="center"><ins>40.6</ins></td>
+</tr>
+<tr>
+<td align="center">KORBench</td>
+<td align="center">74.8</td>
+<td align="center"><b>72.3</b></td>
+<td align="center">70.2</td>
+<td align="center">65.4</td>
+<td align="center">-</td>
+<td align="center"><ins>70.6</ins></td>
+</tr>
+
+<tr>
+<td align="center" colspan=7><strong>Coding</strong></td>
+</tr>
+<tr>
+<td align="center">LiveCodeBench v6<br/><sup>(02/2025-05/2025)</sup></td>
+<td align="center">66.8</td>
+<td align="center"><ins>63.8</ins></td>
+<td align="center">60.3 (66)</td>
+<td align="center">53.4</td>
+<td align="center">-</td>
+<td align="center"><b>67.4</b></td>
+</tr>
+<tr>
+<td align="center">HLE</td>
+<td align="center">13.9</td>
+<td align="center"><b>12.7</b> (10.9)</td>
+<td align="center">8.7</td>
+<td align="center">6.9</td>
+<td align="center">-</td>
+<td align="center"><ins>10.1</ins></td>
+</tr>
+
+<tr>
+<td align="center" colspan=7><strong>Instruction Following</strong></td>
+</tr>
+<tr>
+<td align="center">IFEval</td>
+<td align="center">86.3</td>
+<td align="center"><b>92.8</b></td>
+<td align="center">88 (88.9)</td>
+<td align="center">88.4 (85)</td>
+<td align="center"><ins>90.4</ins></td>
+<td align="center">85.8</td>
+</tr>
+
+
+<tr>
+<td align="center" colspan=7><strong>Agent</strong></td>
+</tr>
+<tr>
+<td align="center">TAU1-Retail</td>
+<td align="center">63</td>
+<td align="center">(54.8)</td>
+<td align="center"><ins>58.7</ins> (67.8)</td>
+<td align="center">40.9</td>
+<td align="center">-</td>
+<td align="center"><b>70.4</b></td>
+</tr>
+<tr>
+<td align="center">TAU1-Airline</td>
+<td align="center">49</td>
+<td align="center">(38)</td>
+<td align="center"><b>47</b> (48)</td>
+<td align="center">38</td>
+<td align="center">-</td>
+<td align="center"><ins>46</ins></td>
+</tr>
+<tr>
+<td align="center">SWE-Bench Verified<br/><sup>(OpenHands)</sup></td>
+<td align="center">41.8</td>
+<td align="center"><b>(60.7)</b></td>
+<td align="center">31</td>
+<td align="center">23.4</td>
+<td align="center">-</td>
+<td align="center"><ins>56</ins></td>
+</tr>
+<tr>
+<td align="center">SWE-Bench Verified<br/><sup>(AgentLess 4*10)</sup></td>
+<td align="center">48.4</td>
+<td align="center">-</td>
+<td align="center">33.5</td>
+<td align="center"><ins>39.7</ins></td>
+<td align="center">-</td>
+<td align="center"><b>47</b></td>
+</tr>
+<tr>
+<td align="center">Multi-SWE-Bench</td>
+<td align="center">17.7</td>
+<td align="center">-</td>
+<td align="center"><ins>9.5</ins></td>
+<td align="center">7.7</td>
+<td align="center">-</td>
+<td align="center"><b>17</b></td>
+</tr>
+
+<tr>
+<td align="center" colspan=7><strong>Multilingualism</strong></td>
+</tr>
+<tr>
+<td align="center">MMMLU</td>
+<td align="center">84.3</td>
+<td align="center">77.4 (75.7)</td>
+<td align="center"><b>79</b></td>
+<td align="center"><b>79</b> (80.6)</td>
+<td align="center">-</td>
+<td align="center"><ins>78.4</ins></td>
+</tr>
+
+<tr>
+<td align="center" colspan=7><strong>Long Context</strong></td>
+</tr>
+<tr>
+<td align="center">RULER<br/><sup>(128K)</sup></td>
+<td align="center">94.5</td>
+<td align="center">78.7</td>
+<td align="center"><ins>94.5</ins></td>
+<td align="center">77.5</td>
+<td align="center">-</td>
+<td align="center"><b>94.6</b></td>
+</tr>
+
+<tr>
+<td align="center" colspan=7><strong>Safety</strong></td>
+</tr>
+<tr>
+<td align="center">AIR-Bench</td>
+<td align="center">-</td>
+<td align="center">-</td>
+<td align="center">-</td>
+<td align="center">-</td>
+<td align="center">-</td>
+<td align="center">75.6</td>
+</tr>
+</tbody>
+</table>
+</div>
+
+<sup>
+- <b>Bold</b> denotes open-source SOTA. <ins>Underlined</ins> indicates second place among open-source models.
+</sup><br/><sup>
+- "*" indicates that the results in this column are presented in the format of "reproduced_results (reported_results_if_any)". Some results have been omitted due to the failure of the evaluation run.
+</sup><br/><sup>
+- The results of Gemma3-27B are sourced directly from its technical report.
+</sup><br/><sup>
+- Generation configs for Seed-OSS-36B-Instruct: temperature=1.1, top_p=0.95. Specifically, for Taubench, temperature=1, top_p=0.7.
+</sup><br/><sup>
+</sup>
+
+> [!NOTE]
+> We recommend sampling with `temperature=1.1` and `top_p=0.95`.
+
+### Thinking Budget
+
+Users can flexibly specify the model's thinking budget. The figure below shows the performance curves across different tasks as the thinking budget varies. For simpler tasks (such as IFEval), the model's chain of thought (CoT) is shorter, and the score exhibits fluctuations as the thinking budget increases. For more challenging tasks (such as AIME and LiveCodeBench), the model's CoT is longer, and the score improves with an increase in the thinking budget.
+
+![thinking_budget](./thinking_budget.png)
+
+Here is an example with a thinking budget set to 512: during the reasoning process, the model periodically triggers self-reflection to estimate the consumed and remaining budget, and delivers the final response once the budget is exhausted or the reasoning concludes.
 ```
+<seed:think>
+Got it, let's try to solve this problem step by step. The problem says ... ...
+<seed:cot_budget_reflect>I have used 129 tokens, and there are 383 tokens remaining for use.</seed:cot_budget_reflect>
+Using the power rule, ... ...
+<seed:cot_budget_reflect>I have used 258 tokens, and there are 254 tokens remaining for use.</seed:cot_budget_reflect>
+Alternatively, remember that ... ...
+<seed:cot_budget_reflect>I have used 393 tokens, and there are 119 tokens remaining for use.</seed:cot_budget_reflect>
+Because if ... ...
+<seed:cot_budget_reflect>I have exhausted my token budget, and now I will start answering the question.</seed:cot_budget_reflect>
+</seed:think>
+To solve the problem, we start by using the properties of logarithms to simplify the given equations: (full answer omitted).
+```
+
+If no thinking budget is set (default mode), Seed-OSS will initiate thinking with unlimited length. If a thinking budget is specified, users are advised to prioritize values that are integer multiples of 512 (e.g., 512, 1K, 2K, 4K, 8K, or 16K), as the model has been extensively trained on these intervals. Models are instructed to output a direct response when the thinking budget is 0, and we recommend setting any budget below 512 to this value.
+
+## Quick Start
+```shell
+pip3 install -r requirements.txt
+pip install git+ssh://git@github.com/Fazziekey/transformers.git@seed-oss
+```
+
 ```python
-#SDK模型下载
-from modelscope import snapshot_download
-model_dir = snapshot_download('ByteDance-Seed/Seed-OSS-36B-Instruct')
-```
-Git下载
-```
-#Git模型下载
-git clone https://www.modelscope.cn/ByteDance-Seed/Seed-OSS-36B-Instruct.git
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import os
+import re
+
+model_name_or_path = "ByteDance-Seed/Seed-OSS-36B-Instruct"
+
+tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
+model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")  # You may want to use bfloat16 and/or move to GPU here
+messages = [
+    {"role": "user", "content": "How to make pasta?"},
+]
+tokenized_chat = tokenizer.apply_chat_template(
+  messages, 
+  tokenize=True, 
+  add_generation_prompt=True, 
+  return_tensors="pt", 
+  thinking_budget=512 # control the thinking budget
+)
+
+outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=2048)
+
+output_text = tokenizer.decode(outputs[0])
 ```
 
-<p style="color: lightgrey;">如果您是本模型的贡献者，我们邀请您根据<a href="https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88" style="color: lightgrey; text-decoration: underline;">模型贡献文档</a>，及时完善模型卡片内容。</p>
\ No newline at end of file
+## Inference
+
+### Download Model
+
+Download the Seed-OSS checkpoint to `./Seed-OSS-36B-Instruct`.
+
+### Transformers
+The `generate.py` script provides a simple interface for model inference with configurable options.
+
+#### Basic Usage
+```shell
+cd inference
+python3 generate.py --model_path /path/to/model
+```
+
+#### Key Parameters
+| Parameter | Description |
+|-----------|-------------|
+| `--model_path` | Path to the pretrained model directory (required) |
+| `--prompts` | Input prompts (default: sample cooking/code questions) |
+| `--max_new_tokens` | Maximum tokens to generate (default: 4096) |
+| `--attn_implementation` | Attention mechanism: `flash_attention_2` (default) or `eager` |
+| `--load_in_4bit/8bit` | Enable 4-bit/8-bit quantization (reduces memory usage) |
+| `--thinking_budget` | Thinking budget in tokens (default: -1 for unlimited budget) |
+
+#### Quantization Examples
+```shell
+# 8-bit quantization
+python3 generate.py --model_path /path/to/model --load_in_8bit True
+
+# 4-bit quantization
+python3 generate.py --model_path /path/to/model --load_in_4bit True
+```
+
+#### Custom Prompts
+```shell
+python3 generate.py --model_path /path/to/model --prompts "['What is machine learning?', 'Explain quantum computing']"
+```
+
+### vLLM
+Use vLLM 0.10.0 or later for inference.
+
+- First, install the vLLM version with Seed-OSS support:
+```shell
+VLLM_USE_PRECOMPILED=1 VLLM_TEST_USE_PRECOMPILED_NIGHTLY_WHEEL=1 pip install git+ssh://git@github.com/FoolPlayer/vllm.git@seed-oss
+```
+
+- Start vLLM API server:
+```shell
+python3 -m vllm.entrypoints.openai.api_server \
+    --host localhost \
+    --port 4321 \
+    --enable-auto-tool-choice \
+    --tool-call-parser seed_oss \
+    --trust-remote-code \
+    --model ./Seed-OSS-36B-Instruct \
+    --chat-template ./Seed-OSS-36B-Instruct/chat_template.jinja \
+    --tensor-parallel-size 8 \
+    --dtype bfloat16 \
+    --served-model-name seed_oss
+```
+
+- Test with OpenAI client:
+
+Chat
+
+```shell
+python3 inference/vllm_chat.py
+```
+
+Tool Call
+```shell
+python3 inference/vllm_tool_call.py
+```
+
+
+## Model Card
+See [MODEL_CARD](./MODEL_CARD.md).
+
+## License
+This project is licensed under Apache-2.0. See the [LICENSE](./LICENSE) file for details.
+
+## Citation
+
+```bibtex
+@misc{seed2025seed-oss,
+  author={ByteDance Seed Team},
+  title={Seed-OSS Open-Source Models},
+  year={2025},
+  howpublished={\url{https://github.com/ByteDance-Seed/seed-oss}}
+}
+```
+
+## About [ByteDance Seed Team](https://seed.bytedance.com/)
+
+Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.
\ No newline at end of file
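Reviewer note on the README Quick Start above: the snippet imports `re` but never uses it, and the decoded output still contains the `<seed:think>` block. The following is a minimal sketch of how the reasoning could be separated from the final answer, assuming the output follows the format shown in the thinking-budget example. `split_thinking` is a hypothetical helper, not part of the repository, and it does not strip `<seed:eos>` or other special tokens.

```python
import re

# Tag names are taken from the README example and chat_template.jinja.
THINK_RE = re.compile(r"<seed:think>(.*?)</seed:think>", re.DOTALL)
REFLECT_RE = re.compile(
    r"<seed:cot_budget_reflect>.*?</seed:cot_budget_reflect>", re.DOTALL
)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (thinking, answer); thinking is empty if no think block exists."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    # Drop the periodic budget self-reflections from the reasoning text.
    thinking = REFLECT_RE.sub("", match.group(1)).strip()
    answer = text[match.end():].strip()
    return thinking, answer


sample = (
    "<seed:think>\nStep one ...\n"
    "<seed:cot_budget_reflect>I have used 129 tokens, and there are 383 "
    "tokens remaining for use.</seed:cot_budget_reflect>\n"
    "Step two ...\n</seed:think>\nFinal answer."
)
thinking, answer = split_thinking(sample)
print(answer)
```

In the Quick Start, this would be applied to `output_text` after slicing off the prompt tokens.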
diff --git a/chat_template.jinja b/chat_template.jinja
new file mode 100644
index 0000000..d92d003
--- /dev/null
+++ b/chat_template.jinja
@@ -0,0 +1,171 @@
+{# ---------- special token variables ---------- #}
+{%- set bos_token              = '<seed:bos>'               -%}
+{%- set eos_token              = '<seed:eos>'               -%}
+{%- set pad_token              = '<seed:pad>'               -%}
+{%- set toolcall_begin_token   = '<seed:tool_call>'         -%}
+{%- set toolcall_end_token     = '</seed:tool_call>'        -%}
+{%- set think_begin_token      = '<seed:think>'             -%}
+{%- set think_end_token        = '</seed:think>'            -%}
+{%- set budget_begin_token     = '<seed:cot_budget_reflect>'-%}
+{%- set budget_end_token       = '</seed:cot_budget_reflect>'-%}
+{# -------------- reflection-interval lookup -------------- #}
+{%- if not thinking_budget is defined %}
+{%- set thinking_budget = -1 -%}
+{%- endif -%}
+{%- set budget_reflections_v05 = {
+     0:      0,
+     512:    128,
+     1024:   256,
+     2048:   512,
+     4096:   512,
+     8192:   1024,
+     16384:  1024
+} -%}
+{# Find the first tier that is greater than or equal to thinking_budget #}
+{%- set ns = namespace(interval = None) -%}
+{%- for k, v in budget_reflections_v05 | dictsort -%}
+    {%- if ns.interval is none and thinking_budget <= k -%}
+        {%- set ns.interval = v -%}
+    {%- endif -%}
+{%- endfor -%}
+{# If the budget exceeds the largest tier, use the value of the last tier #}
+{%- if ns.interval is none -%}
+    {%- set ns.interval = budget_reflections_v05[16384] -%}
+{%- endif -%}
+{# ---------- Preprocess the system message ---------- #}
+{%- if messages[0]["role"] == "system" %}
+{%- set system_message = messages[0]["content"] %}
+{%- set loop_messages = messages[1:] %}
+{%- else %}
+{%- set loop_messages = messages %}
+{%- endif %}
+{# ---------- Ensure tools is defined ---------- #}
+{%- if not tools is defined or tools is none %}
+{%- set tools = [] %}
+{%- endif %}
+{# tools2doc.jinja #}
+{%- macro py_type(t) -%}
+    {%- if t == "string" -%}str
+    {%- elif t in ("number", "integer") -%}int
+    {%- elif t == "boolean" -%}bool
+    {%- elif t == "array" -%}list
+    {%- else -%}Any{%- endif -%}
+{%- endmacro -%}
+{# ---------- Emit the system block ---------- #}
+{%- if system_message is defined %}
+{{ bos_token + "system\n" + system_message }}
+{%- else %}
+{%- if tools is iterable and tools | length > 0 %}
+{{ bos_token + "system\nYou are Doubao, a helpful AI assistant. You may call one or more functions to assist with the user query." }}
+{%- endif %}
+{%- endif %}
+{%- if use_json_tooldef is defined and use_json_tooldef %}
+
+{{"Tool List:\nYou are authorized to use the following tools (described in JSON Schema format). Before performing any task, you must decide how to call them based on the descriptions and parameters of these tools."}}
+{{ tools | tojson(ensure_ascii=False) }}
+{%- else %}
+{%- for item in tools if item.type == "function" %}
+
+
+Function:
+def {{ item.function.name }}(
+{%- for name, spec in item.function.parameters.properties.items() %}
+        {{- name }}: {{ py_type(spec.type) }}{% if not loop.last %},{% endif %}
+{%- endfor %}):
+    """
+    {{ item.function.description | trim }}
+
+    {# ---------- Args ---------- #}
+    {%- if item.function.parameters.properties %}
+    Args:
+    {%- for name, spec in item.function.parameters.properties.items() %}
+
+    - {{ name }} ({{ py_type(spec.type) }})
+      {%- if name in item.function.parameters.required %} [必填]{% else %} [选填]{% endif %}:
+      {{- " " ~ (spec.description or "") }}
+    {%- endfor %}
+    {%- endif %}
+
+    {# ---------- Returns ---------- #}
+    {%- if item.function.returns is defined
+          and item.function.returns.properties is defined
+          and item.function.returns.properties %}
+    Returns:
+    {%- for name, spec in item.function.returns.properties.items() %}
+
+    - {{ name }} ({{ py_type(spec.type) }}):
+      {{- " " ~ (spec.description or "") }}
+    {%- endfor %}
+    {%- endif %}
+
+    """
+{%- endfor %}
+{%- endif %}
+{%- if tools is iterable and tools | length > 0 %}
+
+{{"工具调用请遵循如下格式:\n<seed:tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>value_1</parameter>\n<parameter=example_parameter_2>This is the value for the second parameter\nthat can span\nmultiple lines</parameter>\n</function>\n</seed:tool_call>\n"}}
+{%- endif %}
+{# Close the system block #}
+{%- if system_message is defined or tools is iterable and tools | length > 0 %}
+{{ eos_token }}
+{%- endif %}
+{# ---------- Thinking Budget ---------- #}
+{%- if thinking_budget is defined %}
+{%- if thinking_budget == 0 %}
+{{ bos_token+"system" }}
+{{ "You are an intelligent assistant that can answer questions in one step without the need for reasoning and thinking, that is, your thinking budget is 0. Next, please skip the thinking process and directly start answering the user's questions." }}
+{{ eos_token }}
+{%- elif not thinking_budget == -1 %}
+{{ bos_token+"system" }}
+{{ "You are an intelligent assistant with reflective ability. In the process of thinking and reasoning, you need to strictly follow the thinking budget, which is "}}{{thinking_budget}}{{". That is, you need to complete your thinking within "}}{{thinking_budget}}{{" tokens and start answering the user's questions. You will reflect on your thinking process every "}}{{ns.interval}}{{" tokens, stating how many tokens have been used and how many are left."}}
+{{ eos_token }}
+{%- endif %}
+{%- endif %}
+{# ---------- Write out the message history one message at a time ---------- #}
+{%- for message in loop_messages %}
+{%- if message.role == "assistant"
+  and message.tool_calls is defined
+  and message.tool_calls is iterable
+  and message.tool_calls | length > 0 %}
+{{ bos_token + message.role }}
+{%- if message.reasoning_content is defined and message.reasoning_content is string and message.reasoning_content | trim | length > 0 %}
+{{ "\n" + think_begin_token + message.reasoning_content | trim + think_end_token }}
+{%- endif %}
+{%- if message.content is defined and message.content is string and message.content | trim | length > 0 %}
+{{ "\n" + message.content | trim + "\n" }}
+{%- endif %}
+{%- for tool_call in message.tool_calls %}
+{%- if tool_call.function is defined %}{% set tool_call = tool_call.function %}{% endif %}
+{{ "\n" + toolcall_begin_token + "\n<function=" + tool_call.name + ">\n" }}
+{%- if tool_call.arguments is defined %}
+{%- for arg_name, arg_value in tool_call.arguments | items %}
+{{ "<parameter=" + arg_name + ">" }}
+{%- set arg_value = arg_value if arg_value is string else arg_value | string %}
+{{ arg_value+"</parameter>\n" }}
+{%- endfor %}
+{%- endif %}
+{{ "</function>\n" + toolcall_end_token }}
+{%- endfor %}
+{{ eos_token }}
+{%- elif message.role in ["user", "system"] %}
+{{ bos_token + message.role + "\n" + message.content + eos_token }}
+{%- elif message.role == "assistant" %}
+{{ bos_token + message.role }}
+{%- if message.reasoning_content is defined and message.reasoning_content is string and message.reasoning_content | trim | length > 0 %}
+{{ "\n" + think_begin_token + message.reasoning_content | trim + think_end_token }}
+{%- endif %}
+{%- if message.content is defined and message.content is string and message.content | trim | length > 0 %}
+{{ "\n" + message.content | trim + eos_token }}
+{%- endif %}
+{# Other roles, including tool, are handled by this branch #}
+{%- else %}
+{{ bos_token + message.role + "\n" + message.content + eos_token }}
+{%- endif %}
+{%- endfor %}
+{# ---------- Prompt the model to start generating ---------- #}
+{%- if add_generation_prompt %}
+{{ bos_token+"assistant\n" }}
+{%- if thinking_budget == 0 %}
+{{ think_begin_token+budget_begin_token }}
+{%- endif %}
+{%- endif %}
\ No newline at end of file
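Reviewer note on `chat_template.jinja` above: the reflection-interval lookup picks the first tier that is greater than or equal to `thinking_budget` and falls back to the last tier for larger budgets. A minimal Python sketch of the same logic, for checking the mapping outside Jinja; `reflection_interval` is a hypothetical name, and the table is copied from `budget_reflections_v05`.

```python
# Tier table copied from budget_reflections_v05 in the template.
BUDGET_REFLECTIONS = {
    0: 0,
    512: 128,
    1024: 256,
    2048: 512,
    4096: 512,
    8192: 1024,
    16384: 1024,
}


def reflection_interval(thinking_budget: int = -1) -> int:
    """Mirror the template: first tier >= budget, else the largest tier."""
    for tier in sorted(BUDGET_REFLECTIONS):
        if thinking_budget <= tier:
            return BUDGET_REFLECTIONS[tier]
    return BUDGET_REFLECTIONS[16384]


for budget in (-1, 0, 512, 600, 4096, 20000):
    print(budget, reflection_interval(budget))
```

For the default `-1` (unlimited) the lookup yields 0, but that value is never rendered, because the template only emits the budget prompt when `thinking_budget` is neither 0 nor -1.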
diff --git a/config.json b/config.json
new file mode 100644
index 0000000..e094445
--- /dev/null
+++ b/config.json
@@ -0,0 +1,33 @@
+{
+  "architectures": [
+    "SeedOssForCausalLM"
+  ],
+  "attention_bias": true,
+  "attention_dropout": 0.1,
+  "attention_out_bias": false,
+  "bos_token_id": 0,
+  "pad_token_id": 1,
+  "eos_token_id": 2,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 5120,
+  "initializer_range": 0.02,
+  "intermediate_size": 27648,
+  "max_position_embeddings": 524288,
+  "mlp_bias": false,
+  "model_type": "seed_oss",
+  "num_attention_heads": 80,
+  "num_hidden_layers": 64,
+  "num_key_value_heads": 8,
+  "residual_dropout": 0.1,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": {
+    "rope_type": "default"
+  },
+  "rope_theta": 10000000.0,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.55.0",
+  "use_cache": true,
+  "vocab_size": 155136
+}
\ No newline at end of file
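Reviewer note on `config.json` above: the values can be sanity-checked against the README's "36B" and "80 / 8 / 8" claims with some arithmetic. This is a rough sketch that assumes a Llama-style layout (q/k/v/o projections plus gate/up/down MLP matrices) and ignores biases and norm weights; that layout is an assumption, not something the config states.

```python
# Values copied from config.json.
config = {
    "head_dim": 128,
    "hidden_size": 5120,
    "intermediate_size": 27648,
    "max_position_embeddings": 524288,
    "num_attention_heads": 80,
    "num_hidden_layers": 64,
    "num_key_value_heads": 8,
    "vocab_size": 155136,
}
BYTES_PER_VALUE = 2  # torch_dtype is bfloat16 (16 bits)

hidden = config["hidden_size"]
layers = config["num_hidden_layers"]
q_dim = config["num_attention_heads"] * config["head_dim"]
kv_dim = config["num_key_value_heads"] * config["head_dim"]
gqa_group_size = config["num_attention_heads"] // config["num_key_value_heads"]

# Assumed layout: q and o are hidden x q_dim, k and v are hidden x kv_dim.
attn_params = 2 * hidden * q_dim + 2 * hidden * kv_dim
# Assumed SwiGLU MLP with gate, up, and down matrices.
mlp_params = 3 * hidden * config["intermediate_size"]
# tie_word_embeddings is false, so input and output embeddings are separate.
embed_params = 2 * config["vocab_size"] * hidden
total_params = layers * (attn_params + mlp_params) + embed_params

# KV cache: one key and one value vector per layer per token.
kv_bytes_per_token = 2 * layers * kv_dim * BYTES_PER_VALUE
full_context_gib = kv_bytes_per_token * config["max_position_embeddings"] / 2**30

print(f"query heads per KV head: {gqa_group_size}")
print(f"approx. parameters: {total_params / 1e9:.1f}B")
print(f"KV cache per token: {kv_bytes_per_token // 1024} KiB")
print(f"KV cache at full context: {full_context_gib:.0f} GiB")
```

Under these assumptions the weight matrices total about 36.1B parameters, consistent with the model name, and a single sequence at the full 512K context would need roughly 128 GiB of bfloat16 KV cache.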
diff --git a/configuration.json b/configuration.json
new file mode 100644
index 0000000..bbeeda1
--- /dev/null
+++ b/configuration.json
@@ -0,0 +1 @@
+{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
\ No newline at end of file
diff --git a/generation_config.json b/generation_config.json
new file mode 100644
index 0000000..3a7b67b
--- /dev/null
+++ b/generation_config.json
@@ -0,0 +1,10 @@
+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "pad_token_id": 1,
+  "eos_token_id": 2,
+  "transformers_version": "4.55.0",
+  "temperature": 1.1,
+  "top_p": 0.95
+}
+ 
\ No newline at end of file
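Reviewer note on `generation_config.json` above: `temperature=1.1` flattens the next-token distribution slightly, and `top_p=0.95` then restricts sampling to the smallest set of tokens whose cumulative probability reaches 0.95. The sketch below illustrates that idea with the standard library only; it is not the Transformers or vLLM implementation, and the function names are hypothetical.

```python
import math
import random


def sampling_distribution(logits, temperature=1.1, top_p=0.95):
    """Return {token_index: probability} after temperature and top-p filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the kept tokens.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits, rng, temperature=1.1, top_p=0.95):
    dist = sampling_distribution(logits, temperature, top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


peaked = sampling_distribution([5.0, 1.0, 0.0, -3.0])
uniform = sampling_distribution([0.0, 0.0, 0.0, 0.0])
token = sample([0.0, 0.0, 0.0, 0.0], random.Random(0))
print(sorted(peaked), sorted(uniform), token)
```

With a strongly peaked distribution only the top token survives the filter, while a uniform one keeps every token.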
diff --git a/model-00001-of-00015.safetensors b/model-00001-of-00015.safetensors
new file mode 100644
index 0000000..b69abcc
--- /dev/null
+++ b/model-00001-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a6387b80f12db915254cbe82c26d393f0f5a10600ce7bda028e3ee90c256eecc
+size 135
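Reviewer note on the Git LFS pointer above: each `.safetensors` entry in this diff is a three-line pointer (`version`, `oid`, `size`) rather than the weights themselves. A minimal sketch of reading those fields, using a hypothetical helper `parse_lfs_pointer` and the pointer text from `model-00001-of-00015.safetensors`:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:a6387b80f12db915254cbe82c26d393f0f5a10600ce7bda028e3ee90c256eecc
size 135
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The `size` field records the byte size of the referenced object. 135 bytes is far smaller than a weight shard, so it may be worth confirming what was actually uploaded.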
diff --git a/model-00002-of-00015.safetensors b/model-00002-of-00015.safetensors
new file mode 100644
index 0000000..854a48a
--- /dev/null
+++ b/model-00002-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fe2d0b95a5d785f8e2a18329296773e042b8caa9a3f0a1d9e8ef2c9bb4a14eea
+size 135
diff --git a/model-00003-of-00015.safetensors b/model-00003-of-00015.safetensors
new file mode 100644
index 0000000..7c82696
--- /dev/null
+++ b/model-00003-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1a3e358505119541fa85625546348a60f39685fba7549bd94c8e982d407a0555
+size 135
diff --git a/model-00004-of-00015.safetensors b/model-00004-of-00015.safetensors
new file mode 100644
index 0000000..7adcec1
--- /dev/null
+++ b/model-00004-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0d6bbfb4ab754f2cb391caa40f67dd9d349b5381b402574a0440813606a348c5
+size 135
diff --git a/model-00005-of-00015.safetensors b/model-00005-of-00015.safetensors
new file mode 100644
index 0000000..bcb869e
--- /dev/null
+++ b/model-00005-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:107cce88b60faf9bad30769172dce01cd1764570f92cb0a80dece2e238167f23
+size 135
diff --git a/model-00006-of-00015.safetensors b/model-00006-of-00015.safetensors
new file mode 100644
index 0000000..ead20d5
--- /dev/null
+++ b/model-00006-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e71fa75e94020a23d9a15da86ed328bdc01462a0a3f09ecdd614f047a802301a
+size 135
diff --git a/model-00007-of-00015.safetensors b/model-00007-of-00015.safetensors
new file mode 100644
index 0000000..96a8710
--- /dev/null
+++ b/model-00007-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a04d657585986417b4957ae284b889c2b58083e39a90994a068ea4a25cfa27ae
+size 135
diff --git a/model-00008-of-00015.safetensors b/model-00008-of-00015.safetensors
new file mode 100644
index 0000000..017e25f
--- /dev/null
+++ b/model-00008-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1ef369c73695b6d4ea90e68154005d90a2733f67053b10211830a8d85e9263c4
+size 135
diff --git a/model-00009-of-00015.safetensors b/model-00009-of-00015.safetensors
new file mode 100644
index 0000000..5057972
--- /dev/null
+++ b/model-00009-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:63e354190fef1698af8cf2b2b6eb3ceb4627be4e15c886fcefae04c40046811e
+size 135
diff --git a/model-00010-of-00015.safetensors b/model-00010-of-00015.safetensors
new file mode 100644
index 0000000..d1d46d7
--- /dev/null
+++ b/model-00010-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a4781bad8d0e3bee0f1adda8017b951edd34a57638420cadaabf433e6bde8d0c
+size 135
diff --git a/model-00011-of-00015.safetensors b/model-00011-of-00015.safetensors
new file mode 100644
index 0000000..ced0535
--- /dev/null
+++ b/model-00011-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:223165c90a98f80f66a5f2dcb94e6f09e3454974473fe14c6822c0628ee55f56
+size 135
diff --git a/model-00012-of-00015.safetensors b/model-00012-of-00015.safetensors
new file mode 100644
index 0000000..7ffc30a
--- /dev/null
+++ b/model-00012-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8db709a2c461316819593bef8ae9e252cdf5da323f4361be62dd7f4d3c4c8f18
+size 135
diff --git a/model-00013-of-00015.safetensors b/model-00013-of-00015.safetensors
new file mode 100644
index 0000000..3c9b063
--- /dev/null
+++ b/model-00013-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4e6c7c009da0d562231304d6eef141a64f95a73e37b4d2576aa587a82b5713ec
+size 135
diff --git a/model-00014-of-00015.safetensors b/model-00014-of-00015.safetensors
new file mode 100644
index 0000000..6c95ff9
--- /dev/null
+++ b/model-00014-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6d233a72fe9dc4cbea98e275729541d9ebf06a7d0ecf4edd68e0f86d8b021339
+size 135
diff --git a/model-00015-of-00015.safetensors b/model-00015-of-00015.safetensors
new file mode 100644
index 0000000..72710eb
--- /dev/null
+++ b/model-00015-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:edabb4aa838885534911083fa9d7c00468f9e43103eb1bf61dc4a033af42d1c8
+size 135
diff --git a/model.safetensors.index.json b/model.safetensors.index.json
new file mode 100644
index 0000000..4757f05
--- /dev/null
+++ b/model.safetensors.index.json
@@ -0,0 +1,779 @@
+{
+  "metadata": {
+    "total_parameters": 36151104512,
+    "total_size": 72302209024
+  },
+  "weight_map": {
+    "lm_head.weight": "model-00015-of-00015.safetensors",
+    "model.embed_tokens.weight": "model-00001-of-00015.safetensors",
+    "model.layers.0.input_layernorm.weight": "model-00001-of-00015.safetensors",
+    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00015.safetensors",
+    "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
+    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
+    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
+    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.1.input_layernorm.weight": "model-00001-of-00015.safetensors",
+    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00015.safetensors",
+    "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
+    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
+    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
+    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.10.input_layernorm.weight": "model-00003-of-00015.safetensors",
+    "model.layers.10.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.10.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
+    "model.layers.10.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
+    "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.10.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
+    "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.10.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
+    "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.11.input_layernorm.weight": "model-00003-of-00015.safetensors",
+    "model.layers.11.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.11.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
+    "model.layers.11.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
+    "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.11.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
+    "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.11.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
+    "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.12.input_layernorm.weight": "model-00004-of-00015.safetensors",
+    "model.layers.12.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.12.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.12.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.12.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
+    "model.layers.12.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
+    "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.12.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
+    "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.12.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
+    "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.13.input_layernorm.weight": "model-00004-of-00015.safetensors",
+    "model.layers.13.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.13.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.13.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
+    "model.layers.13.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
+    "model.layers.13.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.13.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.13.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
+    "model.layers.13.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.13.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
+    "model.layers.13.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.14.input_layernorm.weight": "model-00004-of-00015.safetensors",
+    "model.layers.14.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.14.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
+    "model.layers.14.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
+    "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.14.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
+    "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.14.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
+    "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.15.input_layernorm.weight": "model-00004-of-00015.safetensors",
+    "model.layers.15.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.15.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
+    "model.layers.15.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
+    "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.15.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
+    "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.15.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
+    "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.16.input_layernorm.weight": "model-00005-of-00015.safetensors",
+    "model.layers.16.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.16.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.16.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
+    "model.layers.16.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
+    "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.16.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
+    "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.16.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
+    "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
+    "model.layers.17.input_layernorm.weight": "model-00005-of-00015.safetensors",
+    "model.layers.17.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.17.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.17.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.17.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
+    "model.layers.17.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
+    "model.layers.17.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.17.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.17.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
+    "model.layers.17.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.17.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
+    "model.layers.17.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.18.input_layernorm.weight": "model-00005-of-00015.safetensors",
+    "model.layers.18.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.18.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.18.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
+    "model.layers.18.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
+    "model.layers.18.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.18.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.18.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
+    "model.layers.18.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.18.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
+    "model.layers.18.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.19.input_layernorm.weight": "model-00005-of-00015.safetensors",
+    "model.layers.19.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.19.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
+    "model.layers.19.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
+    "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.19.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
+    "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.19.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
+    "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.2.input_layernorm.weight": "model-00001-of-00015.safetensors",
+    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00015.safetensors",
+    "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
+    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
+    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
+    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.20.input_layernorm.weight": "model-00005-of-00015.safetensors",
+    "model.layers.20.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.20.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
+    "model.layers.20.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
+    "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.20.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
+    "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.20.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
+    "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.21.input_layernorm.weight": "model-00006-of-00015.safetensors",
+    "model.layers.21.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.21.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.21.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.21.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
+    "model.layers.21.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
+    "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.21.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
+    "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.21.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
+    "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
+    "model.layers.22.input_layernorm.weight": "model-00006-of-00015.safetensors",
+    "model.layers.22.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.22.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.22.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.22.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
+    "model.layers.22.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
+    "model.layers.22.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.22.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.22.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
+    "model.layers.22.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.22.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
+    "model.layers.22.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.23.input_layernorm.weight": "model-00006-of-00015.safetensors",
+    "model.layers.23.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.23.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.23.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.23.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
+    "model.layers.23.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
+    "model.layers.23.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.23.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.23.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
+    "model.layers.23.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.23.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
+    "model.layers.23.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.24.input_layernorm.weight": "model-00006-of-00015.safetensors",
+    "model.layers.24.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.24.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.24.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.24.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
+    "model.layers.24.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
+    "model.layers.24.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.24.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
+    "model.layers.24.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.24.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
+    "model.layers.24.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.25.input_layernorm.weight": "model-00007-of-00015.safetensors",
+    "model.layers.25.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.25.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.25.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.25.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
+    "model.layers.25.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
+    "model.layers.25.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.25.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.25.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
+    "model.layers.25.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.25.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
+    "model.layers.25.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
+    "model.layers.26.input_layernorm.weight": "model-00007-of-00015.safetensors",
+    "model.layers.26.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.26.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.26.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.26.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
+    "model.layers.26.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
+    "model.layers.26.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.26.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.26.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
+    "model.layers.26.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.26.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
+    "model.layers.26.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.27.input_layernorm.weight": "model-00007-of-00015.safetensors",
+    "model.layers.27.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.27.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.27.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.27.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
+    "model.layers.27.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
+    "model.layers.27.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.27.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.27.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
+    "model.layers.27.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.27.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
+    "model.layers.27.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.28.input_layernorm.weight": "model-00007-of-00015.safetensors",
+    "model.layers.28.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.28.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.28.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.28.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
+    "model.layers.28.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
+    "model.layers.28.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.28.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.28.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
+    "model.layers.28.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.28.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
+    "model.layers.28.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.29.input_layernorm.weight": "model-00007-of-00015.safetensors",
+    "model.layers.29.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.29.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.29.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.29.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
+    "model.layers.29.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
+    "model.layers.29.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.29.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.29.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
+    "model.layers.29.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.29.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
+    "model.layers.29.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00002-of-00015.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
+    "model.layers.3.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.3.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.3.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
+    "model.layers.30.input_layernorm.weight": "model-00008-of-00015.safetensors",
+    "model.layers.30.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.30.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.30.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.30.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
+    "model.layers.30.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
+    "model.layers.30.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.30.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.30.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
+    "model.layers.30.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.30.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
+    "model.layers.30.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
+    "model.layers.31.input_layernorm.weight": "model-00008-of-00015.safetensors",
+    "model.layers.31.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.31.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.31.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.31.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
+    "model.layers.31.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
+    "model.layers.31.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.31.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.31.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
+    "model.layers.31.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.31.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
+    "model.layers.31.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.32.input_layernorm.weight": "model-00008-of-00015.safetensors",
+    "model.layers.32.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.32.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.32.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.32.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
+    "model.layers.32.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
+    "model.layers.32.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.32.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.32.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
+    "model.layers.32.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.32.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
+    "model.layers.32.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.33.input_layernorm.weight": "model-00008-of-00015.safetensors",
+    "model.layers.33.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.33.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.33.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.33.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
+    "model.layers.33.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
+    "model.layers.33.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.33.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.33.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
+    "model.layers.33.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.33.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
+    "model.layers.33.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.34.input_layernorm.weight": "model-00009-of-00015.safetensors",
+    "model.layers.34.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.34.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.34.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.34.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
+    "model.layers.34.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
+    "model.layers.34.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.34.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.34.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
+    "model.layers.34.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.34.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
+    "model.layers.34.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
+    "model.layers.35.input_layernorm.weight": "model-00009-of-00015.safetensors",
+    "model.layers.35.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.35.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.35.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.35.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
+    "model.layers.35.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
+    "model.layers.35.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.35.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.35.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
+    "model.layers.35.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.35.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
+    "model.layers.35.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.36.input_layernorm.weight": "model-00009-of-00015.safetensors",
+    "model.layers.36.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.36.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.36.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.36.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
+    "model.layers.36.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
+    "model.layers.36.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.36.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.36.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
+    "model.layers.36.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.36.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
+    "model.layers.36.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.37.input_layernorm.weight": "model-00009-of-00015.safetensors",
+    "model.layers.37.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.37.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.37.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.37.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
+    "model.layers.37.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
+    "model.layers.37.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.37.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.37.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
+    "model.layers.37.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.37.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
+    "model.layers.37.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.38.input_layernorm.weight": "model-00009-of-00015.safetensors",
+    "model.layers.38.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.38.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.38.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.38.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
+    "model.layers.38.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
+    "model.layers.38.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.38.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.38.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
+    "model.layers.38.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.38.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
+    "model.layers.38.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.39.input_layernorm.weight": "model-00010-of-00015.safetensors",
+    "model.layers.39.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.39.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.39.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.39.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
+    "model.layers.39.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
+    "model.layers.39.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.39.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.39.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
+    "model.layers.39.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.39.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
+    "model.layers.39.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00002-of-00015.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
+    "model.layers.4.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.4.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.4.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.40.input_layernorm.weight": "model-00010-of-00015.safetensors",
+    "model.layers.40.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.40.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.40.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.40.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
+    "model.layers.40.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
+    "model.layers.40.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.40.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.40.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
+    "model.layers.40.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.40.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
+    "model.layers.40.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.41.input_layernorm.weight": "model-00010-of-00015.safetensors",
+    "model.layers.41.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.41.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.41.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.41.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
+    "model.layers.41.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
+    "model.layers.41.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.41.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.41.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
+    "model.layers.41.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.41.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
+    "model.layers.41.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.42.input_layernorm.weight": "model-00010-of-00015.safetensors",
+    "model.layers.42.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.42.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.42.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.42.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
+    "model.layers.42.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
+    "model.layers.42.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.42.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.42.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
+    "model.layers.42.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.42.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
+    "model.layers.42.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.43.input_layernorm.weight": "model-00011-of-00015.safetensors",
+    "model.layers.43.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.43.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.43.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.43.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
+    "model.layers.43.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
+    "model.layers.43.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.43.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.43.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
+    "model.layers.43.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.43.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
+    "model.layers.43.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
+    "model.layers.44.input_layernorm.weight": "model-00011-of-00015.safetensors",
+    "model.layers.44.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.44.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.44.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.44.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
+    "model.layers.44.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
+    "model.layers.44.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.44.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.44.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
+    "model.layers.44.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.44.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
+    "model.layers.44.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.45.input_layernorm.weight": "model-00011-of-00015.safetensors",
+    "model.layers.45.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.45.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.45.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.45.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
+    "model.layers.45.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
+    "model.layers.45.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.45.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.45.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
+    "model.layers.45.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.45.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
+    "model.layers.45.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.46.input_layernorm.weight": "model-00011-of-00015.safetensors",
+    "model.layers.46.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.46.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.46.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.46.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
+    "model.layers.46.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
+    "model.layers.46.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.46.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.46.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
+    "model.layers.46.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.46.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
+    "model.layers.46.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.47.input_layernorm.weight": "model-00011-of-00015.safetensors",
+    "model.layers.47.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.47.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.47.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.47.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
+    "model.layers.47.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
+    "model.layers.47.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.47.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.47.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
+    "model.layers.47.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.47.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
+    "model.layers.47.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.48.input_layernorm.weight": "model-00012-of-00015.safetensors",
+    "model.layers.48.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.48.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.48.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.48.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
+    "model.layers.48.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
+    "model.layers.48.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.48.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.48.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
+    "model.layers.48.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.48.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
+    "model.layers.48.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
+    "model.layers.49.input_layernorm.weight": "model-00012-of-00015.safetensors",
+    "model.layers.49.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.49.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.49.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.49.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
+    "model.layers.49.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
+    "model.layers.49.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.49.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.49.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
+    "model.layers.49.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.49.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
+    "model.layers.49.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00002-of-00015.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
+    "model.layers.5.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.5.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.5.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.50.input_layernorm.weight": "model-00012-of-00015.safetensors",
+    "model.layers.50.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.50.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.50.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.50.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
+    "model.layers.50.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
+    "model.layers.50.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.50.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.50.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
+    "model.layers.50.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.50.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
+    "model.layers.50.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.51.input_layernorm.weight": "model-00012-of-00015.safetensors",
+    "model.layers.51.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.51.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.51.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.51.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
+    "model.layers.51.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
+    "model.layers.51.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.51.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.51.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
+    "model.layers.51.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.51.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
+    "model.layers.51.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.52.input_layernorm.weight": "model-00013-of-00015.safetensors",
+    "model.layers.52.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.52.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.52.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.52.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
+    "model.layers.52.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
+    "model.layers.52.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.52.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.52.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
+    "model.layers.52.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.52.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
+    "model.layers.52.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
+    "model.layers.53.input_layernorm.weight": "model-00013-of-00015.safetensors",
+    "model.layers.53.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.53.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.53.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.53.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
+    "model.layers.53.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
+    "model.layers.53.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.53.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.53.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
+    "model.layers.53.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.53.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
+    "model.layers.53.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.54.input_layernorm.weight": "model-00013-of-00015.safetensors",
+    "model.layers.54.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.54.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.54.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.54.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
+    "model.layers.54.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
+    "model.layers.54.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.54.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.54.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
+    "model.layers.54.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.54.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
+    "model.layers.54.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.55.input_layernorm.weight": "model-00013-of-00015.safetensors",
+    "model.layers.55.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.55.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.55.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.55.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
+    "model.layers.55.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
+    "model.layers.55.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.55.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.55.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
+    "model.layers.55.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.55.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
+    "model.layers.55.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.56.input_layernorm.weight": "model-00013-of-00015.safetensors",
+    "model.layers.56.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.56.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.56.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.56.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
+    "model.layers.56.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
+    "model.layers.56.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.56.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.56.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
+    "model.layers.56.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.56.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
+    "model.layers.56.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.57.input_layernorm.weight": "model-00014-of-00015.safetensors",
+    "model.layers.57.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.57.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.57.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.57.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
+    "model.layers.57.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
+    "model.layers.57.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.57.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.57.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
+    "model.layers.57.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.57.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
+    "model.layers.57.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
+    "model.layers.58.input_layernorm.weight": "model-00014-of-00015.safetensors",
+    "model.layers.58.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.58.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.58.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.58.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
+    "model.layers.58.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
+    "model.layers.58.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.58.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.58.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
+    "model.layers.58.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.58.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
+    "model.layers.58.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.59.input_layernorm.weight": "model-00014-of-00015.safetensors",
+    "model.layers.59.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.59.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.59.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.59.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
+    "model.layers.59.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
+    "model.layers.59.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.59.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.59.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
+    "model.layers.59.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.59.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
+    "model.layers.59.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00002-of-00015.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
+    "model.layers.6.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.6.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.6.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.60.input_layernorm.weight": "model-00014-of-00015.safetensors",
+    "model.layers.60.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.60.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.60.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.60.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
+    "model.layers.60.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
+    "model.layers.60.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.60.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.60.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
+    "model.layers.60.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.60.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
+    "model.layers.60.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.61.input_layernorm.weight": "model-00015-of-00015.safetensors",
+    "model.layers.61.mlp.down_proj.weight": "model-00015-of-00015.safetensors",
+    "model.layers.61.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.61.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.61.post_attention_layernorm.weight": "model-00015-of-00015.safetensors",
+    "model.layers.61.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
+    "model.layers.61.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.61.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.61.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
+    "model.layers.61.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.61.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
+    "model.layers.61.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
+    "model.layers.62.input_layernorm.weight": "model-00015-of-00015.safetensors",
+    "model.layers.62.mlp.down_proj.weight": "model-00015-of-00015.safetensors",
+    "model.layers.62.mlp.gate_proj.weight": "model-00015-of-00015.safetensors",
+    "model.layers.62.mlp.up_proj.weight": "model-00015-of-00015.safetensors",
+    "model.layers.62.post_attention_layernorm.weight": "model-00015-of-00015.safetensors",
+    "model.layers.62.self_attn.k_proj.bias": "model-00015-of-00015.safetensors",
+    "model.layers.62.self_attn.k_proj.weight": "model-00015-of-00015.safetensors",
+    "model.layers.62.self_attn.o_proj.weight": "model-00015-of-00015.safetensors",
+    "model.layers.62.self_attn.q_proj.bias": "model-00015-of-00015.safetensors",
+    "model.layers.62.self_attn.q_proj.weight": "model-00015-of-00015.safetensors",
+    "model.layers.62.self_attn.v_proj.bias": "model-00015-of-00015.safetensors",
+    "model.layers.62.self_attn.v_proj.weight": "model-00015-of-00015.safetensors",
+    "model.layers.63.input_layernorm.weight": "model-00015-of-00015.safetensors",
+    "model.layers.63.mlp.down_proj.weight": "model-00015-of-00015.safetensors",
+    "model.layers.63.mlp.gate_proj.weight": "model-00015-of-00015.safetensors",
+    "model.layers.63.mlp.up_proj.weight": "model-00015-of-00015.safetensors",
+    "model.layers.63.post_attention_layernorm.weight": "model-00015-of-00015.safetensors",
+    "model.layers.63.self_attn.k_proj.bias": "model-00015-of-00015.safetensors",
+    "model.layers.63.self_attn.k_proj.weight": "model-00015-of-00015.safetensors",
+    "model.layers.63.self_attn.o_proj.weight": "model-00015-of-00015.safetensors",
+    "model.layers.63.self_attn.q_proj.bias": "model-00015-of-00015.safetensors",
+    "model.layers.63.self_attn.q_proj.weight": "model-00015-of-00015.safetensors",
+    "model.layers.63.self_attn.v_proj.bias": "model-00015-of-00015.safetensors",
+    "model.layers.63.self_attn.v_proj.weight": "model-00015-of-00015.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00003-of-00015.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
+    "model.layers.7.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.7.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.7.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00003-of-00015.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
+    "model.layers.8.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.8.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.8.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00003-of-00015.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
+    "model.layers.9.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.9.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
+    "model.layers.9.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
+    "model.norm.weight": "model-00015-of-00015.safetensors"
+  }
+}
diff --git a/special_tokens_map.json b/special_tokens_map.json
new file mode 100644
index 0000000..7dd43a5
--- /dev/null
+++ b/special_tokens_map.json
@@ -0,0 +1,23 @@
+{
+  "bos_token": {
+    "content": "<seed:bos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<seed:eos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<seed:pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
diff --git a/thinking_budget.png b/thinking_budget.png
new file mode 100644
index 0000000..ab0237b
Binary files /dev/null and b/thinking_budget.png differ
diff --git a/tokenizer.json b/tokenizer.json
new file mode 100644
index 0000000..dc0d8c9
--- /dev/null
+++ b/tokenizer.json
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f6bd848f52451824a3033a9f1e67eea5b399a13c90f845a332d3a29537e05827
+size 11883696
diff --git a/tokenizer_config.json b/tokenizer_config.json
new file mode 100644
index 0000000..c72b8f0
--- /dev/null
+++ b/tokenizer_config.json
@@ -0,0 +1,1035 @@
+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<seed:bos>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<seed:pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<seed:eos>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<seed:think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "4": {
+      "content": "</seed:think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "5": {
+      "content": "<seed:cot_budget_reflect>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "6": {
+      "content": "</seed:cot_budget_reflect>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "7": {
+      "content": "<seed:tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "8": {
+      "content": "</seed:tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "9": {
+      "content": "<[PLHD9_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "10": {
+      "content": "<[PLHD10_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "11": {
+      "content": "<[PLHD11_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "12": {
+      "content": "<[PLHD12_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "13": {
+      "content": "<[PLHD13_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "14": {
+      "content": "<[PLHD14_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "15": {
+      "content": "<[PLHD15_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "16": {
+      "content": "<[PLHD16_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "17": {
+      "content": "<[PLHD17_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "18": {
+      "content": "<[PLHD18_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "19": {
+      "content": "<[PLHD19_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "20": {
+      "content": "<[PLHD20_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "21": {
+      "content": "<[PLHD21_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "22": {
+      "content": "<[PLHD22_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "23": {
+      "content": "<[PLHD23_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "24": {
+      "content": "<[PLHD24_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "25": {
+      "content": "<[PLHD25_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "26": {
+      "content": "<[PLHD26_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "27": {
+      "content": "<[PLHD27_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "28": {
+      "content": "<[PLHD28_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "29": {
+      "content": "<[PLHD29_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "30": {
+      "content": "<[PLHD30_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "31": {
+      "content": "<[PLHD31_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "32": {
+      "content": "<[PLHD32_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "33": {
+      "content": "<[PLHD33_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "34": {
+      "content": "<[PLHD34_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "35": {
+      "content": "<[PLHD35_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "36": {
+      "content": "<[PLHD36_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "37": {
+      "content": "<[PLHD37_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "38": {
+      "content": "<[PLHD38_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "39": {
+      "content": "<[PLHD39_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "40": {
+      "content": "<[PLHD40_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "41": {
+      "content": "<[PLHD41_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "42": {
+      "content": "<[PLHD42_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "43": {
+      "content": "<[PLHD43_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "44": {
+      "content": "<[PLHD44_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "45": {
+      "content": "<[PLHD45_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "46": {
+      "content": "<[PLHD46_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "47": {
+      "content": "<[PLHD47_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "48": {
+      "content": "<[PLHD48_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "49": {
+      "content": "<[PLHD49_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50": {
+      "content": "<[PLHD50_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "51": {
+      "content": "<[PLHD51_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "52": {
+      "content": "<[PLHD52_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "53": {
+      "content": "<[PLHD53_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "54": {
+      "content": "<[PLHD54_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "55": {
+      "content": "<[PLHD55_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "56": {
+      "content": "<[PLHD56_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "57": {
+      "content": "<[PLHD57_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "58": {
+      "content": "<[PLHD58_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "59": {
+      "content": "<[PLHD59_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "60": {
+      "content": "<[PLHD60_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "61": {
+      "content": "<[PLHD61_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "62": {
+      "content": "<[PLHD62_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "63": {
+      "content": "<[PLHD63_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "64": {
+      "content": "<[PLHD64_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "65": {
+      "content": "<[PLHD65_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "66": {
+      "content": "<[PLHD66_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "67": {
+      "content": "<[PLHD67_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "68": {
+      "content": "<[PLHD68_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "69": {
+      "content": "<[PLHD69_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "70": {
+      "content": "<[PLHD70_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "71": {
+      "content": "<[PLHD71_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "72": {
+      "content": "<[PLHD72_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "73": {
+      "content": "<[PLHD73_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "74": {
+      "content": "<[PLHD74_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "75": {
+      "content": "<[PLHD75_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "76": {
+      "content": "<[PLHD76_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "77": {
+      "content": "<[PLHD77_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "78": {
+      "content": "<[PLHD78_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "79": {
+      "content": "<[PLHD79_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "80": {
+      "content": "<[PLHD80_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "81": {
+      "content": "<[PLHD81_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "82": {
+      "content": "<[PLHD82_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "83": {
+      "content": "<[PLHD83_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "84": {
+      "content": "<[PLHD84_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "85": {
+      "content": "<[PLHD85_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "86": {
+      "content": "<[PLHD86_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "87": {
+      "content": "<[PLHD87_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "88": {
+      "content": "<[PLHD88_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "89": {
+      "content": "<[PLHD89_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "90": {
+      "content": "<[PLHD90_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "91": {
+      "content": "<[PLHD91_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "92": {
+      "content": "<[PLHD92_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "93": {
+      "content": "<[PLHD93_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "94": {
+      "content": "<[PLHD94_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "95": {
+      "content": "<[PLHD95_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96": {
+      "content": "<[PLHD96_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "97": {
+      "content": "<[PLHD97_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "98": {
+      "content": "<[PLHD98_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "99": {
+      "content": "<[PLHD99_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "<[PLHD100_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "<[PLHD101_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "<[PLHD102_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "<[PLHD103_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "104": {
+      "content": "<[PLHD104_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "105": {
+      "content": "<[PLHD105_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "106": {
+      "content": "<[PLHD106_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "107": {
+      "content": "<[PLHD107_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "108": {
+      "content": "<[PLHD108_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "109": {
+      "content": "<[PLHD109_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "110": {
+      "content": "<[PLHD110_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "111": {
+      "content": "<[PLHD111_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "112": {
+      "content": "<[PLHD112_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "113": {
+      "content": "<[PLHD113_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "114": {
+      "content": "<[PLHD114_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "115": {
+      "content": "<[PLHD115_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "116": {
+      "content": "<[PLHD116_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "117": {
+      "content": "<[PLHD117_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "118": {
+      "content": "<[PLHD118_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "119": {
+      "content": "<[PLHD119_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "120": {
+      "content": "<[PLHD120_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "121": {
+      "content": "<[PLHD121_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "122": {
+      "content": "<[PLHD122_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "123": {
+      "content": "<[PLHD123_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "124": {
+      "content": "<[PLHD124_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "125": {
+      "content": "<[PLHD125_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "126": {
+      "content": "<[PLHD126_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "127": {
+      "content": "<[PLHD127_never_used]>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<seed:bos>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<seed:eos>",
+  "extra_special_tokens": {},
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<seed:pad>",
+  "tokenizer_class": "PreTrainedTokenizerFast"
+}
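
The `added_tokens_decoder` entries above mix two kinds of tokens: control tokens and placeholders flagged `"special": true`, and markup tokens such as `<seed:think>` flagged `"special": false`. A minimal sketch of how the two groups can be separated with the standard library is shown here. The embedded JSON is a trimmed, hand-copied fragment of the file (only `content` and `special` are kept), and `split_by_special_flag` is a hypothetical helper name, not part of any library.

```python
import json

# Trimmed copy of a few "added_tokens_decoder" entries from tokenizer_config.json.
# Assumption: only "content" and "special" are kept; the other flags are omitted.
CONFIG_FRAGMENT = """
{
  "added_tokens_decoder": {
    "0": {"content": "<seed:bos>", "special": true},
    "1": {"content": "<seed:pad>", "special": true},
    "2": {"content": "<seed:eos>", "special": true},
    "3": {"content": "<seed:think>", "special": false},
    "4": {"content": "</seed:think>", "special": false},
    "7": {"content": "<seed:tool_call>", "special": false},
    "9": {"content": "<[PLHD9_never_used]>", "special": true}
  }
}
"""


def split_by_special_flag(config_text):
    """Return (special, non_special) dicts mapping token id -> token content."""
    decoder = json.loads(config_text)["added_tokens_decoder"]
    special, non_special = {}, {}
    for token_id, entry in decoder.items():
        # JSON object keys are strings, so convert the id to int for convenience.
        target = special if entry["special"] else non_special
        target[int(token_id)] = entry["content"]
    return special, non_special


special, non_special = split_by_special_flag(CONFIG_FRAGMENT)
print("special:", special)
print("non-special:", non_special)
```

In typical `transformers` usage, decoding with `skip_special_tokens=True` drops only tokens flagged as special, so the `<seed:think>` and `<seed:tool_call>` markers would be expected to stay in the decoded text. That behavior is an assumption worth checking against the installed library version.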

| Benchmark | Seed1.6-Base | Qwen3-30B-A3B-Base-2507* | Qwen2.5-32B-Base* | Seed-OSS-36B-Base (w/ syn.) | Seed-OSS-36B-Base-woSyn (w/o syn.) |
|---|---|---|---|---|---|
| **Knowledge** | | | | | |
| MMLU-Pro | 70 | 59.8 | 58.5 (55.1) | 65.1 | 60.4 |
| MMLU | 88.8 | 82.7 | 84 (83.3) | 84.9 | 84.8 |
| TriviaQA | 91 | 76.2 | 76 | 82.1 | 81.9 |
| GPQA-D | 43.4 | 37 | 29.3 | 31.7 | 35.2 |
| SimpleQA | 17.1 | 7.2 | 6.1 | 5.8 | 7.4 |
| **Reasoning** | | | | | |
| BBH | 92.1 | 81.4 | 79.1 (84.5) | 87.7 | 87.2 |
| AGIEval-en | 78 | 66.4 | 65.6 | 70.7 | 70.1 |
| **Math** | | | | | |
| GSM8K | 93.1 | 87 | 87.5 (92.9) | 90.8 | 90.3 |
| MATH | 72.9 | 61.1 | 63.5 (57.7) | 81.7 | 61.3 |
| **Coding** | | | | | |
| MBPP | 83.6 | 78.8 | 77.8 (84.5) | 80.6 | 74.6 |
| HumanEval | 78 | 70.7 | 47.6 (58.5) | 76.8 | 75.6 |


| Benchmark | Seed1.6-Thinking-0715 | OAI-OSS-20B* | Qwen3-30B-A3B-Thinking-2507* | Qwen3-32B* | Gemma3-27B | Seed-OSS-36B-Instruct |
|---|---|---|---|---|---|---|
| **Knowledge** | | | | | | |
| MMLU-Pro | 86.6 | 76.2 | 81.9 (80.9) | 81.8 | 67.5 | 82.7 |
| MMLU | 90.6 | 81.7 (85.3) | 86.9 | 86.2 | 76.9 | 87.4 |
| GPQA-D | 80.7 | 72.2 (71.5) | 71.4 (73.4) | 66.7 (68.4) | 42.4 | 71.4 |
| SuperGPQA | 63.4 | 50.1 | 57.3 (56.8) | 49.3 | - | 55.7 |
| SimpleQA | 23.7 | 6.7 | 23.6 | 8.6 | 10 | 9.7 |
| **Math** | | | | | | |
| AIME24 | 90.3 | 92.7 (92.1) | 87.7 | 82.7 (81.4) | - | 91.7 |
| AIME25 | 86 | 90.3 (91.7) | 81.3 (85) | 73.3 (72.9) | - | 84.7 |
| BeyondAIME | 60 | 69 | 56 | 29 | - | 65 |
| **Reasoning** | | | | | | |
| ArcAGI V2 | 50.3 | 41.7 | 37.8 | 14.4 | - | 40.6 |
| KORBench | 74.8 | 72.3 | 70.2 | 65.4 | - | 70.6 |
| **Coding** | | | | | | |
| LiveCodeBench v6 (02/2025-05/2025) | 66.8 | 63.8 | 60.3 (66) | 53.4 | - | 67.4 |
| HLE | 13.9 | 12.7 (10.9) | 8.7 | 6.9 | - | 10.1 |
| **Instruction Following** | | | | | | |
| IFEval | 86.3 | 92.8 | 88 (88.9) | 88.4 (85) | 90.4 | 85.8 |
| **Agent** | | | | | | |
| TAU1-Retail | 63 | (54.8) | 58.7 (67.8) | 40.9 | - | 70.4 |
| TAU1-Airline | 49 | (38) | 47 (48) | 38 | - | 46 |
| SWE-Bench Verified (OpenHands) | 41.8 | (60.7) | 31 | 23.4 | - | 56 |
| SWE-Bench Verified (AgentLess 4*10) | 48.4 | - | 33.5 | 39.7 | - | 47 |
| Multi-SWE-Bench | 17.7 | - | 9.5 | 7.7 | - | 17 |
| **Multilingualism** | | | | | | |
| MMMLU | 84.3 | 77.4 (75.7) | 79 | 79 (80.6) | - | 78.4 |
| **Long Context** | | | | | | |
| RULER (128K) | 94.5 | 78.7 | 94.5 | 77.5 | - | 94.6 |
| **Safety** | | | | | | |
| AIR-Bench | - | - | - | - | - | 75.6 |