Upload folder using ModelScope SDK

This commit is contained in:
Cherrytest 2025-08-20 16:47:52 +00:00
parent 250d85ddde
commit 16e5e3e3e0
26 changed files with 2723 additions and 41 deletions

4
.gitattributes vendored

@@ -44,4 +44,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

660
README.md

@@ -1,47 +1,627 @@
---
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- vllm
language:
- en
- zh
base_model:
- ByteDance-Seed/Seed-OSS-36B-Base
---
### Model Download
The model contributor has not provided a more detailed introduction here; the model files and weights are available on the "Files" page. You can download the model with the ModelScope SDK or via `git clone`.

SDK download
```bash
# Install the ModelScope SDK
pip install modelscope
```
```python
# Download the model with the ModelScope SDK
from modelscope import snapshot_download

model_dir = snapshot_download('ByteDance-Seed/Seed-OSS-36B-Instruct')
```
Git download
```bash
git clone https://www.modelscope.cn/ByteDance-Seed/Seed-OSS-36B-Instruct.git
```
<div align="center">
👋 Hi, everyone!
<br>
We are <b>ByteDance Seed Team.</b>
</div>
<p align="center">
You can get to know us better through the following channels👇
<br>
<a href="https://seed.bytedance.com/">
<img src="https://img.shields.io/badge/Website-%231e37ff?style=for-the-badge&logo=bytedance&logoColor=white"></a>
</p>
![seed logo](https://github.com/user-attachments/assets/c42e675e-497c-4508-8bb9-093ad4d1f216)
# Seed-OSS Open-Source Models
<p align="center">
<a href="https://github.com/ByteDance-Seed/seed-oss">
<img src="https://img.shields.io/badge/Seed-Project Page-yellow"></a>
<a href="https://github.com/ByteDance-Seed/seed-oss">
<img src="https://img.shields.io/badge/Seed-Tech Report Coming Soon-red"></a>
<a href="https://huggingface.co/ByteDance-Seed">
<img src="https://img.shields.io/badge/Seed-Hugging Face-orange"></a>
<br>
<a href="./LICENSE">
<img src="https://img.shields.io/badge/License-Apache2.0-blue"></a>
</p>
> [!NOTE]
> This model card is dedicated to the `Seed-OSS-36B-Instruct` model.
## News
- [2025/08/20]🔥We release `Seed-OSS-36B-Base` (both with and without synthetic data versions) and `Seed-OSS-36B-Instruct`.
## Introduction
Seed-OSS is a series of open-source large language models developed by ByteDance's Seed Team, designed for strong long-context, reasoning, and agentic capabilities, solid general performance, and versatile developer-friendly features. Although trained with only 12T tokens, Seed-OSS achieves excellent performance on several popular open benchmarks.
We release this series of models to the open-source community under the Apache-2.0 license.
> [!NOTE]
> Seed-OSS is primarily optimized for international (i18n) use cases.
### Key Features
- **Flexible Control of Thinking Budget**: Users can flexibly adjust the reasoning length as needed. Dynamically controlling the reasoning length improves inference efficiency in practical application scenarios.
- **Enhanced Reasoning Capability**: Specifically optimized for reasoning tasks while maintaining balanced and excellent general capabilities.
- **Agentic Intelligence**: Performs exceptionally well in agentic tasks such as tool use and issue resolution.
- **Research-Friendly**: Because including synthetic instruction data in pre-training may affect post-training research, we release pre-trained models both with and without instruction data, giving the research community more diverse options.
- **Native Long Context**: Natively trained with a context length of up to 512K tokens.
### Model Summary
Seed-OSS adopts the popular causal language model architecture with RoPE, GQA attention, RMSNorm and SwiGLU activation.
<div align="center">
| | **Seed-OSS-36B** |
|:---:|:---:|
| **Parameters** | 36B |
| **Attention** | GQA |
| **Activation Function** | SwiGLU |
| **Number of Layers** | 64 |
| **Number of QKV Heads** | 80 / 8 / 8 |
| **Head Size** | 128 |
| **Hidden Size** | 5120 |
| **Vocabulary Size** | 155K |
| **Context Length** | 512K |
| **RoPE Base Frequency** | 1e7 |
</div>
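As a quick sanity check on the table above, here is a minimal sketch (assuming the standard GQA projection layout, which the `config.json` shipped in this commit matches) of the per-layer attention shapes:
```python
# Sketch: attention projection shapes implied by the summary table,
# assuming the standard grouped-query attention (GQA) layout.
hidden_size = 5120
head_dim = 128
num_q_heads, num_kv_heads = 80, 8

q_proj = (hidden_size, num_q_heads * head_dim)   # (5120, 10240)
k_proj = (hidden_size, num_kv_heads * head_dim)  # (5120, 1024)
v_proj = (hidden_size, num_kv_heads * head_dim)  # (5120, 1024)
o_proj = (num_q_heads * head_dim, hidden_size)   # (10240, 5120)

# Every 80 / 8 = 10 query heads share a single K/V head.
print(q_proj, k_proj, v_proj, o_proj)
```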
## Evaluation Results
### Seed-OSS-36B-Base
Incorporating synthetic instruction data into pretraining leads to improved performance on most benchmarks. We adopt the version augmented with synthetic instruction data (i.e., *w/ syn.*) as `Seed-OSS-36B-Base`. We also release `Seed-OSS-36B-Base-woSyn` trained without such data (i.e., *w/o syn.*), offering the community a high-performance foundation model unaffected by synthetic instruction data.
<div align="center">
<table>
<thead>
<tr>
<th align="center">Benchmark</th>
<th align="center"><sup><a href="https://seed.bytedance.com/en/seed1_6">Seed1.6-Base</a></sup></th>
<th align="center"><sup>Qwen3-30B-A3B-Base-2507*</sup></th>
<th align="center"><sup>Qwen2.5-32B-Base*</sup></th>
<th align="center"><sup>Seed-OSS-36B-Base<br>(<i>w/ syn.</i>)</sup></th>
<th align="center"><sup>Seed-OSS-36B-Base-woSyn<br>(<i>w/o syn.</i>)</sup></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" colspan=6><strong>Knowledge</strong></td>
</tr>
<tr>
<td align="center">MMLU-Pro</td>
<td align="center">70</td>
<td align="center">59.8</td>
<td align="center">58.5 (55.1)</td>
<td align="center"><b>65.1</b></td>
<td align="center">60.4</td>
</tr>
<tr>
<td align="center">MMLU</td>
<td align="center">88.8</td>
<td align="center">82.7</td>
<td align="center">84 (83.3)</td>
<td align="center"><b>84.9</b></td>
<td align="center">84.8</td>
</tr>
<tr>
<td align="center">TriviaQA</td>
<td align="center">91</td>
<td align="center">76.2</td>
<td align="center">76</td>
<td align="center"><b>82.1</b></td>
<td align="center">81.9</td>
</tr>
<tr>
<td align="center">GPQA-D</td>
<td align="center">43.4</td>
<td align="center"><b>37</b></td>
<td align="center">29.3</td>
<td align="center">31.7</td>
<td align="center">35.2</td>
</tr>
<tr>
<td align="center">SimpleQA</td>
<td align="center">17.1</td>
<td align="center">7.2</td>
<td align="center">6.1</td>
<td align="center">5.8</td>
<td align="center"><b>7.4</b></td>
</tr>
<tr>
<td align="center" colspan=6><strong>Reasoning</strong></td>
</tr>
<tr>
<td align="center">BBH</td>
<td align="center">92.1</td>
<td align="center">81.4</td>
<td align="center">79.1 (84.5)</td>
<td align="center"><b>87.7</b></td>
<td align="center">87.2</td>
</tr>
<tr>
<td align="center">AGIEval-en</td>
<td align="center">78</td>
<td align="center">66.4</td>
<td align="center">65.6</td>
<td align="center"><b>70.7</b></td>
<td align="center">70.1</td>
</tr>
<tr>
<td align="center" colspan=6><strong>Math</strong></td>
</tr>
<tr>
<td align="center">GSM8K</td>
<td align="center">93.1</td>
<td align="center">87</td>
<td align="center">87.5 (92.9)</td>
<td align="center"><b>90.8</b></td>
<td align="center">90.3</td>
</tr>
<tr>
<td align="center">MATH</td>
<td align="center">72.9</td>
<td align="center">61.1</td>
<td align="center">63.5 (57.7)</td>
<td align="center"><b>81.7</b></td>
<td align="center">61.3</td>
</tr>
<tr>
<td align="center" colspan=6><strong>Coding</strong></td>
</tr>
<tr>
<td align="center">MBPP</td>
<td align="center">83.6</td>
<td align="center">78.8</td>
<td align="center">77.8 (84.5)</td>
<td align="center"><b>80.6</b></td>
<td align="center">74.6</td>
</tr>
<tr>
<td align="center">HumanEval</td>
<td align="center">78</td>
<td align="center">70.7</td>
<td align="center">47.6 (58.5)</td>
<td align="center"><b>76.8</b></td>
<td align="center">75.6</td>
</tr>
</tbody>
</table>
</div>
<sup>
- <b>Bold</b> denotes open-source SOTA.
</sup><br/><sup>
- "*" indicates that the results in this column are presented in the format of "reproduced_results (reported_results_if_any)".
</sup>
### Seed-OSS-36B-Instruct
<div align="center">
<table>
<thead>
<tr>
<th align="center">Benchmark</th>
<th align="center"><sup><a href="https://console.volcengine.com/ark/region:ark+cn-beijing/model/detail?Id=doubao-seed-1-6-thinking">Seed1.6-Thinking-0715</a></sup></th>
<th align="center"><sup>OAI-OSS-20B*</sup></th>
<th align="center"><sup>Qwen3-30B-A3B-Thinking-2507*</sup></th>
<th align="center"><sup>Qwen3-32B*</sup></th>
<th align="center"><sup>Gemma3-27B</sup></th>
<th align="center"><sup>Seed-OSS-36B-Instruct</sup></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" colspan=7><strong>Knowledge</strong></td>
</tr>
<tr>
<td align="center">MMLU-Pro</td>
<td align="center">86.6</td>
<td align="center">76.2</td>
<td align="center"><ins>81.9</ins> (80.9)</td>
<td align="center">81.8</td>
<td align="center">67.5</td>
<td align="center"><b>82.7</b></td>
</tr>
<tr>
<td align="center">MMLU</td>
<td align="center">90.6</td>
<td align="center">81.7 (85.3)</td>
<td align="center"><ins>86.9</ins></td>
<td align="center">86.2</td>
<td align="center">76.9</td>
<td align="center"><b>87.4</b></td>
</tr>
<tr>
<td align="center">GPQA-D</td>
<td align="center">80.7</td>
<td align="center"><b>72.2</b> (71.5)</td>
<td align="center"><ins>71.4</ins> (73.4)</td>
<td align="center">66.7 (68.4)</td>
<td align="center">42.4</td>
<td align="center"><ins>71.4</ins></td>
</tr>
<tr>
<td align="center">SuperGPQA</td>
<td align="center">63.4</td>
<td align="center">50.1</td>
<td align="center"><b>57.3</b> (56.8)</td>
<td align="center">49.3</td>
<td align="center">-</td>
<td align="center"><ins>55.7</ins></td>
</tr>
<tr>
<td align="center">SimpleQA</td>
<td align="center">23.7</td>
<td align="center">6.7</td>
<td align="center"><b>23.6</b></td>
<td align="center">8.6</td>
<td align="center"><ins>10</ins></td>
<td align="center">9.7</td>
</tr>
<tr>
<td align="center" colspan=7><strong>Math</strong></td>
</tr>
<tr>
<td align="center">AIME24</td>
<td align="center">90.3</td>
<td align="center"><b>92.7</b> (92.1)</td>
<td align="center">87.7</td>
<td align="center">82.7 (81.4)</td>
<td align="center">-</td>
<td align="center"><ins>91.7</ins></td>
</tr>
<tr>
<td align="center">AIME25</td>
<td align="center">86</td>
<td align="center"><b>90.3</b> (91.7)</td>
<td align="center">81.3 (85)</td>
<td align="center">73.3 (72.9)</td>
<td align="center">-</td>
<td align="center"><ins>84.7</ins></td>
</tr>
<tr>
<td align="center">BeyondAIME</td>
<td align="center">60</td>
<td align="center"><b>69</b></td>
<td align="center">56</td>
<td align="center">29</td>
<td align="center">-</td>
<td align="center"><ins>65</ins></td>
</tr>
<tr>
<td align="center" colspan=7><strong>Reasoning</strong></td>
</tr>
<tr>
<td align="center">ArcAGI V2</td>
<td align="center">50.3</td>
<td align="center"><b>41.7</b></td>
<td align="center">37.8</td>
<td align="center">14.4</td>
<td align="center">-</td>
<td align="center"><ins>40.6</ins></td>
</tr>
<tr>
<td align="center">KORBench</td>
<td align="center">74.8</td>
<td align="center"><b>72.3</b></td>
<td align="center">70.2</td>
<td align="center">65.4</td>
<td align="center">-</td>
<td align="center"><ins>70.6</ins></td>
</tr>
<tr>
<td align="center" colspan=7><strong>Coding</strong></td>
</tr>
<tr>
<td align="center">LiveCodeBench v6<br/><sup>(02/2025-05/2025)</sup></td>
<td align="center">66.8</td>
<td align="center"><ins>63.8</ins></td>
<td align="center">60.3 (66)</td>
<td align="center">53.4</td>
<td align="center">-</td>
<td align="center"><b>67.4</b></td>
</tr>
<tr>
<td align="center">HLE</td>
<td align="center">13.9</td>
<td align="center"><b>12.7</b> (10.9)</td>
<td align="center">8.7</td>
<td align="center">6.9</td>
<td align="center">-</td>
<td align="center"><ins>10.1</ins></td>
</tr>
<tr>
<td align="center" colspan=7><strong>Instruction Following</strong></td>
</tr>
<tr>
<td align="center">IFEval</td>
<td align="center">86.3</td>
<td align="center"><b>92.8</b></td>
<td align="center">88 (88.9)</td>
<td align="center">88.4 (85)</td>
<td align="center"><ins>90.4</ins></td>
<td align="center">85.8</td>
</tr>
<tr>
<td align="center" colspan=7><strong>Agent</strong></td>
</tr>
<tr>
<td align="center">TAU1-Retail</td>
<td align="center">63</td>
<td align="center">(54.8)</td>
<td align="center"><ins>58.7</ins> (67.8)</td>
<td align="center">40.9</td>
<td align="center">-</td>
<td align="center"><b>70.4</b></td>
</tr>
<tr>
<td align="center">TAU1-Airline</td>
<td align="center">49</td>
<td align="center">(38)</td>
<td align="center"><b>47</b> (48)</td>
<td align="center">38</td>
<td align="center">-</td>
<td align="center"><ins>46</ins></td>
</tr>
<tr>
<td align="center">SWE-Bench Verified<br/><sup>(OpenHands)</sup></td>
<td align="center">41.8</td>
<td align="center"><b>(60.7)</b></td>
<td align="center">31</td>
<td align="center">23.4</td>
<td align="center">-</td>
<td align="center"><ins>56</ins></td>
</tr>
<tr>
<td align="center">SWE-Bench Verified<br/><sup>(AgentLess 4*10)</sup></td>
<td align="center">48.4</td>
<td align="center">-</td>
<td align="center">33.5</td>
<td align="center"><ins>39.7</ins></td>
<td align="center">-</td>
<td align="center"><b>47</b></td>
</tr>
<tr>
<td align="center">Multi-SWE-Bench</td>
<td align="center">17.7</td>
<td align="center">-</td>
<td align="center"><ins>9.5</ins></td>
<td align="center">7.7</td>
<td align="center">-</td>
<td align="center"><b>17</b></td>
</tr>
<tr>
<td align="center" colspan=7><strong>Multilingualism</strong></td>
</tr>
<tr>
<td align="center">MMMLU</td>
<td align="center">84.3</td>
<td align="center">77.4 (75.7)</td>
<td align="center"><b>79</b></td>
<td align="center"><b>79</b> (80.6)</td>
<td align="center">-</td>
<td align="center"><ins>78.4</ins></td>
</tr>
<tr>
<td align="center" colspan=7><strong>Long Context</strong></td>
</tr>
<tr>
<td align="center">RULER<br/><sup>(128K)</sup></td>
<td align="center">94.5</td>
<td align="center">78.7</td>
<td align="center"><ins>94.5</ins></td>
<td align="center">77.5</td>
<td align="center">-</td>
<td align="center"><b>94.6</b></td>
</tr>
<tr>
<td align="center" colspan=7><strong>Safety</strong></td>
</tr>
<tr>
<td align="center">AIR-Bench</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">-</td>
<td align="center">75.6</td>
</tr>
</tbody>
</table>
</div>
<sup>
- <b>Bold</b> denotes open-source SOTA. <ins>Underlined</ins> indicates second place among open-source models.
</sup><br/><sup>
- "*" indicates that the results in this column are presented in the format of "reproduced_results (reported_results_if_any)". Some results have been omitted due to the failure of the evaluation run.
</sup><br/><sup>
- The results of Gemma3-27B are sourced directly from its technical report.
</sup><br/><sup>
- Generation configs for Seed-OSS-36B-Instruct: temperature=1.1, top_p=0.95. Specifically, for TAU-Bench, temperature=1, top_p=0.7.
</sup>
> [!NOTE]
> We recommend sampling with `temperature=1.1` and `top_p=0.95`.
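These values also ship in this commit's `generation_config.json`. To set them explicitly with Transformers, a minimal sketch (`do_sample=True` is an addition needed for the sampling parameters to take effect):
```python
# Sketch: the recommended sampling settings as an explicit GenerationConfig.
# temperature/top_p match generation_config.json in this commit.
from transformers import GenerationConfig

gen_config = GenerationConfig(
    do_sample=True,   # sampling must be enabled for temperature/top_p to apply
    temperature=1.1,
    top_p=0.95,
)
```
The config can then be passed to generation via `model.generate(..., generation_config=gen_config)`.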
### Thinking Budget
Users can flexibly specify the model's thinking budget. The figure below shows the performance curves across different tasks as the thinking budget varies. For simpler tasks (such as IFEval), the model's chain of thought (CoT) is shorter, and the score exhibits fluctuations as the thinking budget increases. For more challenging tasks (such as AIME and LiveCodeBench), the model's CoT is longer, and the score improves with an increase in the thinking budget.
![thinking_budget](./thinking_budget.png)
Here is an example with a thinking budget set to 512: during the reasoning process, the model periodically triggers self-reflection to estimate the consumed and remaining budget, and delivers the final response once the budget is exhausted or the reasoning concludes.
```
<seed:think>
Got it, let's try to solve this problem step by step. The problem says ... ...
<seed:cot_budget_reflect>I have used 129 tokens, and there are 383 tokens remaining for use.</seed:cot_budget_reflect>
Using the power rule, ... ...
<seed:cot_budget_reflect>I have used 258 tokens, and there are 254 tokens remaining for use.</seed:cot_budget_reflect>
Alternatively, remember that ... ...
<seed:cot_budget_reflect>I have used 393 tokens, and there are 119 tokens remaining for use.</seed:cot_budget_reflect>
Because if ... ...
<seed:cot_budget_reflect>I have exhausted my token budget, and now I will start answering the question.</seed:cot_budget_reflect>
</seed:think>
To solve the problem, we start by using the properties of logarithms to simplify the given equations: (full answer omitted).
```
If no thinking budget is set (default mode), Seed-OSS will initiate thinking with unlimited length. If a thinking budget is specified, users are advised to prioritize values that are integer multiples of 512 (e.g., 512, 1K, 2K, 4K, 8K, or 16K), as the model has been extensively trained on these intervals. Models are instructed to output a direct response when the thinking budget is 0, and we recommend setting any budget below 512 to this value.
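For example, to request a direct answer with no reasoning, set the budget to 0 when applying the chat template (a minimal sketch; `tokenizer` and `messages` are as in the Quick Start below, and `thinking_budget` is the chat-template argument shown there):
```python
# Sketch: thinking_budget=0 instructs the model to answer directly, without a CoT.
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    thinking_budget=0,  # otherwise prefer multiples of 512 (512, 1K, 2K, ...)
)
```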
## Quick Start
```shell
pip3 install -r requirements.txt
pip install git+ssh://git@github.com/Fazziekey/transformers.git@seed-oss
```
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name_or_path = "ByteDance-Seed/Seed-OSS-36B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto") # You may want to use bfloat16 and/or move to GPU here
messages = [
{"role": "user", "content": "How to make pasta?"},
]
tokenized_chat = tokenizer.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_tensors="pt",
thinking_budget=512 # control the thinking budget
)
outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=2048)
output_text = tokenizer.decode(outputs[0])
```
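The decoded text contains the reasoning wrapped in `<seed:think>...</seed:think>` tags (see the budget example above). A best-effort sketch for separating the reasoning from the final answer (`split_think` is a hypothetical helper, not part of this repo):
```python
import re

def split_think(text: str):
    """Best-effort split of Seed-OSS output into (reasoning, answer)."""
    m = re.search(r"<seed:think>(.*?)</seed:think>", text, flags=re.DOTALL)
    if m is None:
        return "", text.strip()
    return m.group(1).strip(), text[m.end():].strip()

reasoning, answer = split_think(output_text)  # output_text from the snippet above
print(answer)
```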
## Inference
### Download Model
Download the Seed-OSS checkpoint to `./Seed-OSS-36B-Instruct`.
### Transformers
The `generate.py` script provides a simple interface for model inference with configurable options.
#### Basic Usage
```shell
cd inference
python3 generate.py --model_path /path/to/model
```
#### Key Parameters
| Parameter | Description |
|-----------|-------------|
| `--model_path` | Path to the pretrained model directory (required) |
| `--prompts` | Input prompts (default: sample cooking/code questions) |
| `--max_new_tokens` | Maximum tokens to generate (default: 4096) |
| `--attn_implementation` | Attention mechanism: `flash_attention_2` (default) or `eager` |
| `--load_in_4bit/8bit` | Enable 4-bit/8-bit quantization (reduces memory usage) |
| `--thinking_budget` | Thinking budget in tokens (default: -1 for unlimited budget) |
#### Quantization Examples
```shell
# 8-bit quantization
python3 generate.py --model_path /path/to/model --load_in_8bit True
# 4-bit quantization
python3 generate.py --model_path /path/to/model --load_in_4bit True
```
#### Custom Prompts
```shell
python3 generate.py --model_path /path/to/model --prompts "['What is machine learning?', 'Explain quantum computing']"
```
### vLLM
Use vLLM 0.10.0 or higher for inference.
- First, install the vLLM build with Seed-OSS support:
```shell
VLLM_USE_PRECOMPILED=1 VLLM_TEST_USE_PRECOMPILED_NIGHTLY_WHEEL=1 pip install git+ssh://git@github.com/FoolPlayer/vllm.git@seed-oss
```
- Start vLLM API server:
```shell
python3 -m vllm.entrypoints.openai.api_server \
--host localhost \
--port 4321 \
--enable-auto-tool-choice \
--tool-call-parser seed_oss \
--trust-remote-code \
--model ./Seed-OSS-36B-Instruct \
--chat-template ./Seed-OSS-36B-Instruct/chat_template.jinja \
--tensor-parallel-size 8 \
--dtype bfloat16 \
--served-model-name seed_oss
```
- Test with OpenAI client:
Chat
```shell
python3 inference/vllm_chat.py
```
Tool Call
```shell
python3 inference/vllm_tool_call.py
```
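Both scripts live in the repo's `inference/` directory. A minimal chat request against the server above might look like the following sketch (assuming the standard `openai` Python client; the port and served model name come from the server command):
```python
# Sketch: chat request to the vLLM OpenAI-compatible server started above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4321/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="seed_oss",  # matches --served-model-name
    messages=[{"role": "user", "content": "How to make pasta?"}],
    temperature=1.1,
    top_p=0.95,
)
print(response.choices[0].message.content)
```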
## Model Card
See [MODEL_CARD](./MODEL_CARD.md).
## License
This project is licensed under Apache-2.0. See the [LICENSE](./LICENSE) file for details.
## Citation
```bibtex
@misc{seed2025seed-oss,
author={ByteDance Seed Team},
title={Seed-OSS Open-Source Models},
year={2025},
howpublished={\url{https://github.com/ByteDance-Seed/seed-oss}}
}
```
## About [ByteDance Seed Team](https://seed.bytedance.com/)
Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.

171
chat_template.jinja Normal file

@@ -0,0 +1,171 @@
{# ---------- special token variables ---------- #}
{%- set bos_token = '<seed:bos>' -%}
{%- set eos_token = '<seed:eos>' -%}
{%- set pad_token = '<seed:pad>' -%}
{%- set toolcall_begin_token = '<seed:tool_call>' -%}
{%- set toolcall_end_token = '</seed:tool_call>' -%}
{%- set think_begin_token = '<seed:think>' -%}
{%- set think_end_token = '</seed:think>' -%}
{%- set budget_begin_token = '<seed:cot_budget_reflect>'-%}
{%- set budget_end_token = '</seed:cot_budget_reflect>'-%}
{# -------------- reflection-interval lookup -------------- #}
{%- if not thinking_budget is defined %}
{%- set thinking_budget = -1 -%}
{%- endif -%}
{%- set budget_reflections_v05 = {
0: 0,
512: 128,
1024: 256,
2048: 512,
4096: 512,
8192: 1024,
16384: 1024
} -%}
{# Find the first tier that is >= thinking_budget #}
{%- set ns = namespace(interval = None) -%}
{%- for k, v in budget_reflections_v05 | dictsort -%}
{%- if ns.interval is none and thinking_budget <= k -%}
{%- set ns.interval = v -%}
{%- endif -%}
{%- endfor -%}
{# If the budget exceeds the largest tier, use the last tier's value #}
{%- if ns.interval is none -%}
{%- set ns.interval = budget_reflections_v05[16384] -%}
{%- endif -%}
{# ---------- preprocess the system message ---------- #}
{%- if messages[0]["role"] == "system" %}
{%- set system_message = messages[0]["content"] %}
{%- set loop_messages = messages[1:] %}
{%- else %}
{%- set loop_messages = messages %}
{%- endif %}
{# ---------- ensure tools is defined ---------- #}
{%- if not tools is defined or tools is none %}
{%- set tools = [] %}
{%- endif %}
{# tools2doc.jinja #}
{%- macro py_type(t) -%}
{%- if t == "string" -%}str
{%- elif t in ("number", "integer") -%}int
{%- elif t == "boolean" -%}bool
{%- elif t == "array" -%}list
{%- else -%}Any{%- endif -%}
{%- endmacro -%}
{# ---------- emit the system block ---------- #}
{%- if system_message is defined %}
{{ bos_token + "system\n" + system_message }}
{%- else %}
{%- if tools is iterable and tools | length > 0 %}
{{ bos_token + "system\nYou are Doubao, a helpful AI assistant. You may call one or more functions to assist with the user query." }}
{%- endif %}
{%- endif %}
{%- if use_json_tooldef is defined and use_json_tooldef %}
{{"Tool List:\nYou are authorized to use the following tools (described in JSON Schema format). Before performing any task, you must decide how to call them based on the descriptions and parameters of these tools."}}
{{ tools | tojson(ensure_ascii=False) }}
{%- else %}
{%- for item in tools if item.type == "function" %}
Function:
def {{ item.function.name }}(
{%- for name, spec in item.function.parameters.properties.items() %}
{{- name }}: {{ py_type(spec.type) }}{% if not loop.last %},{% endif %}
{%- endfor %}):
"""
{{ item.function.description | trim }}
{# ---------- Args ---------- #}
{%- if item.function.parameters.properties %}
Args:
{%- for name, spec in item.function.parameters.properties.items() %}
- {{ name }} ({{ py_type(spec.type) }})
{%- if name in item.function.parameters.required %} [必填]{% else %} [选填]{% endif %}:
{{- " " ~ (spec.description or "") }}
{%- endfor %}
{%- endif %}
{# ---------- Returns ---------- #}
{%- if item.function.returns is defined
and item.function.returns.properties is defined
and item.function.returns.properties %}
Returns:
{%- for name, spec in item.function.returns.properties.items() %}
- {{ name }} ({{ py_type(spec.type) }}):
{{- " " ~ (spec.description or "") }}
{%- endfor %}
{%- endif %}
"""
{%- endfor %}
{%- endif %}
{%- if tools is iterable and tools | length > 0 %}
{{"工具调用请遵循如下格式:\n<seed:tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>value_1</parameter>\n<parameter=example_parameter_2>This is the value for the second parameter\nthat can span\nmultiple lines</parameter>\n</function>\n</seed:tool_call>\n"}}
{%- endif %}
{# close the system block #}
{%- if system_message is defined or tools is iterable and tools | length > 0 %}
{{ eos_token }}
{%- endif %}
{# ---------- Thinking Budget ---------- #}
{%- if thinking_budget is defined %}
{%- if thinking_budget == 0 %}
{{ bos_token+"system" }}
{{ "You are an intelligent assistant that can answer questions in one step without the need for reasoning and thinking, that is, your thinking budget is 0. Next, please skip the thinking process and directly start answering the user's questions." }}
{{ eos_token }}
{%- elif not thinking_budget == -1 %}
{{ bos_token+"system" }}
{{ "You are an intelligent assistant with reflective ability. In the process of thinking and reasoning, you need to strictly follow the thinking budget, which is "}}{{thinking_budget}}{{". That is, you need to complete your thinking within "}}{{thinking_budget}}{{" tokens and start answering the user's questions. You will reflect on your thinking process every "}}{{ns.interval}}{{" tokens, stating how many tokens have been used and how many are left."}}
{{ eos_token }}
{%- endif %}
{%- endif %}
{# ---------- write out the message history one entry at a time ---------- #}
{%- for message in loop_messages %}
{%- if message.role == "assistant"
and message.tool_calls is defined
and message.tool_calls is iterable
and message.tool_calls | length > 0 %}
{{ bos_token + message.role }}
{%- if message.reasoning_content is defined and message.reasoning_content is string and message.reasoning_content | trim | length > 0 %}
{{ "\n" + think_begin_token + message.reasoning_content | trim + think_end_token }}
{%- endif %}
{%- if message.content is defined and message.content is string and message.content | trim | length > 0 %}
{{ "\n" + message.content | trim + "\n" }}
{%- endif %}
{%- for tool_call in message.tool_calls %}
{%- if tool_call.function is defined %}{% set tool_call = tool_call.function %}{% endif %}
{{ "\n" + toolcall_begin_token + "\n<function=" + tool_call.name + ">\n" }}
{%- if tool_call.arguments is defined %}
{%- for arg_name, arg_value in tool_call.arguments | items %}
{{ "<parameter=" + arg_name + ">" }}
{%- set arg_value = arg_value if arg_value is string else arg_value | string %}
{{ arg_value+"</parameter>\n" }}
{%- endfor %}
{%- endif %}
{{ "</function>\n" + toolcall_end_token }}
{%- endfor %}
{{ eos_token }}
{%- elif message.role in ["user", "system"] %}
{{ bos_token + message.role + "\n" + message.content + eos_token }}
{%- elif message.role == "assistant" %}
{{ bos_token + message.role }}
{%- if message.reasoning_content is defined and message.reasoning_content is string and message.reasoning_content | trim | length > 0 %}
{{ "\n" + think_begin_token + message.reasoning_content | trim + think_end_token }}
{%- endif %}
{%- if message.content is defined and message.content is string and message.content | trim | length > 0 %}
{{ "\n" + message.content | trim + eos_token }}
{%- endif %}
{# the tool role (and any other role) falls through to this branch #}
{%- else %}
{{ bos_token + message.role + "\n" + message.content + eos_token }}
{%- endif %}
{%- endfor %}
{# ---------- prompt the model to start the assistant turn ---------- #}
{%- if add_generation_prompt %}
{{ bos_token+"assistant\n" }}
{%- if thinking_budget == 0 %}
{{ think_begin_token+budget_begin_token }}
{%- endif %}
{%- endif %}

33
config.json Normal file

@@ -0,0 +1,33 @@
{
"architectures": [
"SeedOssForCausalLM"
],
"attention_bias": true,
"attention_dropout": 0.1,
"attention_out_bias": false,
"bos_token_id": 0,
"pad_token_id": 1,
"eos_token_id": 2,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 27648,
"max_position_embeddings": 524288,
"mlp_bias": false,
"model_type": "seed_oss",
"num_attention_heads": 80,
"num_hidden_layers": 64,
"num_key_value_heads": 8,
"residual_dropout": 0.1,
"rms_norm_eps": 1e-06,
"rope_scaling": {
"rope_type": "default"
},
"rope_theta": 10000000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.55.0",
"use_cache": true,
"vocab_size": 155136
}

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}

10
generation_config.json Normal file

@@ -0,0 +1,10 @@
{
"_from_model_config": true,
"bos_token_id": 0,
"pad_token_id": 1,
"eos_token_id": 2,
"transformers_version": "4.55.0",
"temperature": 1.1,
"top_p": 0.95
}


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a6387b80f12db915254cbe82c26d393f0f5a10600ce7bda028e3ee90c256eecc
size 135


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fe2d0b95a5d785f8e2a18329296773e042b8caa9a3f0a1d9e8ef2c9bb4a14eea
size 135


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1a3e358505119541fa85625546348a60f39685fba7549bd94c8e982d407a0555
size 135


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0d6bbfb4ab754f2cb391caa40f67dd9d349b5381b402574a0440813606a348c5
size 135


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:107cce88b60faf9bad30769172dce01cd1764570f92cb0a80dece2e238167f23
size 135


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e71fa75e94020a23d9a15da86ed328bdc01462a0a3f09ecdd614f047a802301a
size 135


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a04d657585986417b4957ae284b889c2b58083e39a90994a068ea4a25cfa27ae
size 135


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1ef369c73695b6d4ea90e68154005d90a2733f67053b10211830a8d85e9263c4
size 135


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:63e354190fef1698af8cf2b2b6eb3ceb4627be4e15c886fcefae04c40046811e
size 135


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a4781bad8d0e3bee0f1adda8017b951edd34a57638420cadaabf433e6bde8d0c
size 135


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:223165c90a98f80f66a5f2dcb94e6f09e3454974473fe14c6822c0628ee55f56
size 135


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8db709a2c461316819593bef8ae9e252cdf5da323f4361be62dd7f4d3c4c8f18
size 135


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4e6c7c009da0d562231304d6eef141a64f95a73e37b4d2576aa587a82b5713ec
size 135


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6d233a72fe9dc4cbea98e275729541d9ebf06a7d0ecf4edd68e0f86d8b021339
size 135


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:edabb4aa838885534911083fa9d7c00468f9e43103eb1bf61dc4a033af42d1c8
size 135


@@ -0,0 +1,779 @@
{
"metadata": {
"total_parameters": 36151104512,
"total_size": 72302209024
},
"weight_map": {
"lm_head.weight": "model-00015-of-00015.safetensors",
"model.embed_tokens.weight": "model-00001-of-00015.safetensors",
"model.layers.0.input_layernorm.weight": "model-00001-of-00015.safetensors",
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00015.safetensors",
"model.layers.0.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.0.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.0.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.1.input_layernorm.weight": "model-00001-of-00015.safetensors",
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00015.safetensors",
"model.layers.1.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.1.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.1.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.10.input_layernorm.weight": "model-00003-of-00015.safetensors",
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
"model.layers.10.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.10.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.10.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.11.input_layernorm.weight": "model-00003-of-00015.safetensors",
"model.layers.11.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.11.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
"model.layers.11.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.11.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.11.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.12.input_layernorm.weight": "model-00004-of-00015.safetensors",
"model.layers.12.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.12.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.12.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.12.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
"model.layers.12.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
"model.layers.12.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.12.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.12.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
"model.layers.12.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.12.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
"model.layers.12.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.13.input_layernorm.weight": "model-00004-of-00015.safetensors",
"model.layers.13.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.13.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.13.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.13.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
"model.layers.13.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
"model.layers.13.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.13.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.13.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
"model.layers.13.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.13.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
"model.layers.13.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.14.input_layernorm.weight": "model-00004-of-00015.safetensors",
"model.layers.14.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.14.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.14.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.14.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
"model.layers.14.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
"model.layers.14.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.14.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.14.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
"model.layers.14.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.14.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
"model.layers.14.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.15.input_layernorm.weight": "model-00004-of-00015.safetensors",
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.15.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.15.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
"model.layers.15.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
"model.layers.15.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.15.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.15.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
"model.layers.15.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.15.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
"model.layers.15.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.16.input_layernorm.weight": "model-00005-of-00015.safetensors",
"model.layers.16.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.16.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.16.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.16.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
"model.layers.16.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.16.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.16.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
"model.layers.17.input_layernorm.weight": "model-00005-of-00015.safetensors",
"model.layers.17.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.17.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.17.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.17.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
"model.layers.17.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
"model.layers.17.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.17.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.17.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
"model.layers.17.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.17.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
"model.layers.17.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.18.input_layernorm.weight": "model-00005-of-00015.safetensors",
"model.layers.18.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.18.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.18.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.18.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
"model.layers.18.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
"model.layers.18.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.18.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.18.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
"model.layers.18.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.18.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
"model.layers.18.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.19.input_layernorm.weight": "model-00005-of-00015.safetensors",
"model.layers.19.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.19.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.19.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.19.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
"model.layers.19.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
"model.layers.19.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.19.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.19.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
"model.layers.19.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.19.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
"model.layers.19.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.2.input_layernorm.weight": "model-00001-of-00015.safetensors",
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00015.safetensors",
"model.layers.2.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.2.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.2.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.20.input_layernorm.weight": "model-00005-of-00015.safetensors",
"model.layers.20.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.20.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.20.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.20.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
"model.layers.20.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
"model.layers.20.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.20.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.20.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
"model.layers.20.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.20.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
"model.layers.20.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.21.input_layernorm.weight": "model-00006-of-00015.safetensors",
"model.layers.21.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.21.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.21.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.21.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
"model.layers.21.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
"model.layers.21.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.21.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.21.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
"model.layers.21.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.21.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
"model.layers.21.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
"model.layers.22.input_layernorm.weight": "model-00006-of-00015.safetensors",
"model.layers.22.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.22.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.22.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.22.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
"model.layers.22.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
"model.layers.22.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.22.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.22.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
"model.layers.22.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.22.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
"model.layers.22.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.23.input_layernorm.weight": "model-00006-of-00015.safetensors",
"model.layers.23.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.23.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.23.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.23.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
"model.layers.23.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
"model.layers.23.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.23.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.23.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
"model.layers.23.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.23.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
"model.layers.23.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.24.input_layernorm.weight": "model-00006-of-00015.safetensors",
"model.layers.24.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.24.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.24.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.24.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
"model.layers.24.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
"model.layers.24.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.24.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.24.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
"model.layers.24.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.24.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
"model.layers.24.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.25.input_layernorm.weight": "model-00007-of-00015.safetensors",
"model.layers.25.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.25.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.25.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.25.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
"model.layers.25.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
"model.layers.25.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.25.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.25.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
"model.layers.25.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.25.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
"model.layers.25.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
"model.layers.26.input_layernorm.weight": "model-00007-of-00015.safetensors",
"model.layers.26.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.26.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.26.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.26.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
"model.layers.26.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
"model.layers.26.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.26.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.26.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
"model.layers.26.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.26.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
"model.layers.26.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.27.input_layernorm.weight": "model-00007-of-00015.safetensors",
"model.layers.27.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.27.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.27.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.27.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
"model.layers.27.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
"model.layers.27.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.27.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.27.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
"model.layers.27.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.27.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
"model.layers.27.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.28.input_layernorm.weight": "model-00007-of-00015.safetensors",
"model.layers.28.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.28.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.28.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.28.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
"model.layers.28.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
"model.layers.28.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.28.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.28.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
"model.layers.28.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.28.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
"model.layers.28.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.29.input_layernorm.weight": "model-00007-of-00015.safetensors",
"model.layers.29.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.29.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.29.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.29.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
"model.layers.29.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
"model.layers.29.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.29.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.29.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
"model.layers.29.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.29.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
"model.layers.29.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.3.input_layernorm.weight": "model-00002-of-00015.safetensors",
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
"model.layers.3.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.3.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.3.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.3.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
"model.layers.30.input_layernorm.weight": "model-00008-of-00015.safetensors",
"model.layers.30.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.30.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.30.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.30.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
"model.layers.30.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
"model.layers.30.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.30.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.30.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
"model.layers.30.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.30.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
"model.layers.30.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
"model.layers.31.input_layernorm.weight": "model-00008-of-00015.safetensors",
"model.layers.31.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.31.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.31.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.31.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
"model.layers.31.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
"model.layers.31.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.31.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.31.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
"model.layers.31.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.31.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
"model.layers.31.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.32.input_layernorm.weight": "model-00008-of-00015.safetensors",
"model.layers.32.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.32.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.32.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.32.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
"model.layers.32.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
"model.layers.32.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.32.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.32.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
"model.layers.32.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.32.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
"model.layers.32.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.33.input_layernorm.weight": "model-00008-of-00015.safetensors",
"model.layers.33.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.33.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.33.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.33.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
"model.layers.33.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
"model.layers.33.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.33.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.33.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
"model.layers.33.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.33.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
"model.layers.33.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.34.input_layernorm.weight": "model-00009-of-00015.safetensors",
"model.layers.34.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.34.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.34.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.34.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
"model.layers.34.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
"model.layers.34.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.34.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.34.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
"model.layers.34.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.34.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
"model.layers.34.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
"model.layers.35.input_layernorm.weight": "model-00009-of-00015.safetensors",
"model.layers.35.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.35.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.35.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.35.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
"model.layers.35.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
"model.layers.35.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.35.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.35.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
"model.layers.35.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.35.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
"model.layers.35.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.36.input_layernorm.weight": "model-00009-of-00015.safetensors",
"model.layers.36.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.36.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.36.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.36.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
"model.layers.36.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
"model.layers.36.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.36.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.36.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
"model.layers.36.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.36.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
"model.layers.36.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.37.input_layernorm.weight": "model-00009-of-00015.safetensors",
"model.layers.37.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.37.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.37.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.37.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
"model.layers.37.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
"model.layers.37.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.37.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.37.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
"model.layers.37.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.37.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
"model.layers.37.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.38.input_layernorm.weight": "model-00009-of-00015.safetensors",
"model.layers.38.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.38.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.38.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.38.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
"model.layers.38.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
"model.layers.38.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.38.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.38.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
"model.layers.38.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.38.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
"model.layers.38.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.39.input_layernorm.weight": "model-00010-of-00015.safetensors",
"model.layers.39.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.39.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.39.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.39.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
"model.layers.39.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
"model.layers.39.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.39.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.39.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
"model.layers.39.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.39.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
"model.layers.39.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
"model.layers.4.input_layernorm.weight": "model-00002-of-00015.safetensors",
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
"model.layers.4.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.4.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.4.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.40.input_layernorm.weight": "model-00010-of-00015.safetensors",
"model.layers.40.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.40.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.40.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.40.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
"model.layers.40.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
"model.layers.40.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.40.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.40.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
"model.layers.40.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.40.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
"model.layers.40.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.41.input_layernorm.weight": "model-00010-of-00015.safetensors",
"model.layers.41.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.41.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.41.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.41.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
"model.layers.41.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
"model.layers.41.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.41.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.41.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
"model.layers.41.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.41.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
"model.layers.41.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.42.input_layernorm.weight": "model-00010-of-00015.safetensors",
"model.layers.42.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.42.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.42.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.42.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
"model.layers.42.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
"model.layers.42.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.42.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.42.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
"model.layers.42.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.42.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
"model.layers.42.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.43.input_layernorm.weight": "model-00011-of-00015.safetensors",
"model.layers.43.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.43.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.43.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.43.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
"model.layers.43.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
"model.layers.43.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.43.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.43.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
"model.layers.43.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.43.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
"model.layers.43.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
"model.layers.44.input_layernorm.weight": "model-00011-of-00015.safetensors",
"model.layers.44.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.44.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.44.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.44.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
"model.layers.44.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
"model.layers.44.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.44.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.44.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
"model.layers.44.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.44.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
"model.layers.44.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.45.input_layernorm.weight": "model-00011-of-00015.safetensors",
"model.layers.45.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.45.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.45.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.45.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
"model.layers.45.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
"model.layers.45.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.45.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.45.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
"model.layers.45.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.45.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
"model.layers.45.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.46.input_layernorm.weight": "model-00011-of-00015.safetensors",
"model.layers.46.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.46.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.46.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.46.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
"model.layers.46.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
"model.layers.46.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.46.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.46.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
"model.layers.46.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.46.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
"model.layers.46.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.47.input_layernorm.weight": "model-00011-of-00015.safetensors",
"model.layers.47.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.47.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.47.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.47.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
"model.layers.47.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
"model.layers.47.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.47.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.47.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
"model.layers.47.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.47.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
"model.layers.47.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.48.input_layernorm.weight": "model-00012-of-00015.safetensors",
"model.layers.48.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.48.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.48.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.48.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
"model.layers.48.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
"model.layers.48.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.48.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.48.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
"model.layers.48.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.48.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
"model.layers.48.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
"model.layers.49.input_layernorm.weight": "model-00012-of-00015.safetensors",
"model.layers.49.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.49.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.49.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.49.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
"model.layers.49.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
"model.layers.49.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.49.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.49.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
"model.layers.49.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.49.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
"model.layers.49.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.5.input_layernorm.weight": "model-00002-of-00015.safetensors",
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
"model.layers.5.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.5.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.5.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.50.input_layernorm.weight": "model-00012-of-00015.safetensors",
"model.layers.50.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.50.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.50.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.50.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
"model.layers.50.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
"model.layers.50.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.50.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.50.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
"model.layers.50.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.50.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
"model.layers.50.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.51.input_layernorm.weight": "model-00012-of-00015.safetensors",
"model.layers.51.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.51.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.51.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.51.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
"model.layers.51.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
"model.layers.51.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.51.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.51.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
"model.layers.51.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.51.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
"model.layers.51.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.52.input_layernorm.weight": "model-00013-of-00015.safetensors",
"model.layers.52.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.52.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.52.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.52.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
"model.layers.52.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
"model.layers.52.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.52.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.52.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
"model.layers.52.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.52.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
"model.layers.52.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
"model.layers.53.input_layernorm.weight": "model-00013-of-00015.safetensors",
"model.layers.53.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.53.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.53.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.53.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
"model.layers.53.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
"model.layers.53.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.53.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.53.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
"model.layers.53.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.53.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
"model.layers.53.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.54.input_layernorm.weight": "model-00013-of-00015.safetensors",
"model.layers.54.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.54.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.54.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.54.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
"model.layers.54.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
"model.layers.54.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.54.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.54.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
"model.layers.54.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.54.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
"model.layers.54.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.55.input_layernorm.weight": "model-00013-of-00015.safetensors",
"model.layers.55.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.55.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.55.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.55.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
"model.layers.55.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
"model.layers.55.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.55.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.55.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
"model.layers.55.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.55.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
"model.layers.55.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.56.input_layernorm.weight": "model-00013-of-00015.safetensors",
"model.layers.56.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.56.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.56.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.56.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
"model.layers.56.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
"model.layers.56.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.56.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.56.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
"model.layers.56.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.56.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
"model.layers.56.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.57.input_layernorm.weight": "model-00014-of-00015.safetensors",
"model.layers.57.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.57.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.57.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.57.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
"model.layers.57.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
"model.layers.57.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.57.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.57.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
"model.layers.57.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.57.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
"model.layers.57.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
"model.layers.58.input_layernorm.weight": "model-00014-of-00015.safetensors",
"model.layers.58.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.58.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.58.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.58.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
"model.layers.58.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
"model.layers.58.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.58.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.58.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
"model.layers.58.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.58.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
"model.layers.58.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.59.input_layernorm.weight": "model-00014-of-00015.safetensors",
"model.layers.59.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.59.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.59.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.59.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
"model.layers.59.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
"model.layers.59.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.59.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.59.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
"model.layers.59.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.59.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
"model.layers.59.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.6.input_layernorm.weight": "model-00002-of-00015.safetensors",
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
"model.layers.6.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.6.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.6.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.60.input_layernorm.weight": "model-00014-of-00015.safetensors",
"model.layers.60.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.60.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.60.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.60.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
"model.layers.60.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
"model.layers.60.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.60.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.60.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
"model.layers.60.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.60.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
"model.layers.60.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.61.input_layernorm.weight": "model-00015-of-00015.safetensors",
"model.layers.61.mlp.down_proj.weight": "model-00015-of-00015.safetensors",
"model.layers.61.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.61.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.61.post_attention_layernorm.weight": "model-00015-of-00015.safetensors",
"model.layers.61.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
"model.layers.61.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.61.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.61.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
"model.layers.61.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.61.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
"model.layers.61.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
"model.layers.62.input_layernorm.weight": "model-00015-of-00015.safetensors",
"model.layers.62.mlp.down_proj.weight": "model-00015-of-00015.safetensors",
"model.layers.62.mlp.gate_proj.weight": "model-00015-of-00015.safetensors",
"model.layers.62.mlp.up_proj.weight": "model-00015-of-00015.safetensors",
"model.layers.62.post_attention_layernorm.weight": "model-00015-of-00015.safetensors",
"model.layers.62.self_attn.k_proj.bias": "model-00015-of-00015.safetensors",
"model.layers.62.self_attn.k_proj.weight": "model-00015-of-00015.safetensors",
"model.layers.62.self_attn.o_proj.weight": "model-00015-of-00015.safetensors",
"model.layers.62.self_attn.q_proj.bias": "model-00015-of-00015.safetensors",
"model.layers.62.self_attn.q_proj.weight": "model-00015-of-00015.safetensors",
"model.layers.62.self_attn.v_proj.bias": "model-00015-of-00015.safetensors",
"model.layers.62.self_attn.v_proj.weight": "model-00015-of-00015.safetensors",
"model.layers.63.input_layernorm.weight": "model-00015-of-00015.safetensors",
"model.layers.63.mlp.down_proj.weight": "model-00015-of-00015.safetensors",
"model.layers.63.mlp.gate_proj.weight": "model-00015-of-00015.safetensors",
"model.layers.63.mlp.up_proj.weight": "model-00015-of-00015.safetensors",
"model.layers.63.post_attention_layernorm.weight": "model-00015-of-00015.safetensors",
"model.layers.63.self_attn.k_proj.bias": "model-00015-of-00015.safetensors",
"model.layers.63.self_attn.k_proj.weight": "model-00015-of-00015.safetensors",
"model.layers.63.self_attn.o_proj.weight": "model-00015-of-00015.safetensors",
"model.layers.63.self_attn.q_proj.bias": "model-00015-of-00015.safetensors",
"model.layers.63.self_attn.q_proj.weight": "model-00015-of-00015.safetensors",
"model.layers.63.self_attn.v_proj.bias": "model-00015-of-00015.safetensors",
"model.layers.63.self_attn.v_proj.weight": "model-00015-of-00015.safetensors",
"model.layers.7.input_layernorm.weight": "model-00003-of-00015.safetensors",
"model.layers.7.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.7.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
"model.layers.7.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.7.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.7.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
"model.layers.8.input_layernorm.weight": "model-00003-of-00015.safetensors",
"model.layers.8.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.8.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.8.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.8.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
"model.layers.8.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
"model.layers.8.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.8.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.8.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
"model.layers.8.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.8.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
"model.layers.8.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.9.input_layernorm.weight": "model-00003-of-00015.safetensors",
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
"model.layers.9.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
"model.layers.9.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.9.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.9.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
"model.layers.9.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
"model.norm.weight": "model-00015-of-00015.safetensors"
}
}
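The `weight_map` above assigns every parameter tensor to one of the 15 safetensors shards. `transformers` consumes this index transparently in `AutoModelForCausalLM.from_pretrained`, so no manual handling is needed; purely for reference, a minimal manual sketch of resolving the index looks like the following (the local `ckpt_dir` path is a hypothetical placeholder for wherever the repo was downloaded):

```python
import json
from safetensors.torch import load_file

ckpt_dir = "./Seed-OSS-36B-Instruct"  # hypothetical local download path

# Map each parameter name to its shard file via the index shown above.
with open(f"{ckpt_dir}/model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

# Group parameters by shard so each of the 15 files is opened only once.
shards = {}
for name, shard in weight_map.items():
    shards.setdefault(shard, []).append(name)

state_dict = {}
for shard, names in shards.items():
    tensors = load_file(f"{ckpt_dir}/{shard}")
    for name in names:
        state_dict[name] = tensors[name]

print(len(state_dict), "tensors loaded")
```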

23
special_tokens_map.json Normal file
View File

@ -0,0 +1,23 @@
{
"bos_token": {
"content": "<seed:bos>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<seed:eos>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<seed:pad>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
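These entries define the `<seed:bos>`, `<seed:eos>`, and `<seed:pad>` special tokens with all stripping/normalization flags disabled. A quick sanity check that the tokenizer picks them up (the repo id is assumed from the model card; adjust if needed):

```python
from transformers import AutoTokenizer

# Repo id assumed from the model card, not confirmed by this diff.
tok = AutoTokenizer.from_pretrained("ByteDance-Seed/Seed-OSS-36B-Instruct")

assert tok.bos_token == "<seed:bos>"
assert tok.eos_token == "<seed:eos>"
assert tok.pad_token == "<seed:pad>"
print(tok.convert_tokens_to_ids([tok.bos_token, tok.eos_token, tok.pad_token]))
```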

BIN
thinking_budget.png Normal file

Binary file not shown.

Size: 186 KiB

BIN
tokenizer.json (Stored with Git LFS) Normal file

Binary file not shown.

1035
tokenizer_config.json Normal file

File diff suppressed because it is too large.
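Since the 1,035-line `tokenizer_config.json` diff is suppressed in this view, its contents are not shown here and none are asserted below; a hedged local inspection (assuming the repo has been cloned and its LFS files pulled) simply enumerates whatever top-level keys the file actually contains:

```python
import json

# No specific keys are assumed; this just lists what is present.
with open("tokenizer_config.json") as f:
    cfg = json.load(f)

for key in sorted(cfg):
    print(key, type(cfg[key]).__name__)
```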