Upload folder using ModelScope SDK

This commit is contained in:
parent 250d85ddde
commit 16e5e3e3e0

4  .gitattributes  vendored
@@ -44,4 +44,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
660  README.md

@@ -1,47 +1,627 @@
---
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- vllm
language:
- en
- zh
base_model:
- ByteDance-Seed/Seed-OSS-36B-Base
---
<div align="center">
👋 Hi, everyone!
<br>
We are <b>ByteDance Seed Team.</b>
</div>

<p align="center">
You can get to know us better through the following channels👇
<br>
<a href="https://seed.bytedance.com/">
<img src="https://img.shields.io/badge/Website-%231e37ff?style=for-the-badge&logo=bytedance&logoColor=white"></a>
</p>



# Seed-OSS Open-Source Models

<p align="center">
<a href="https://github.com/ByteDance-Seed/seed-oss">
<img src="https://img.shields.io/badge/Seed-Project Page-yellow"></a>
<a href="https://github.com/ByteDance-Seed/seed-oss">
<img src="https://img.shields.io/badge/Seed-Tech Report Coming Soon-red"></a>
<a href="https://huggingface.co/ByteDance-Seed">
<img src="https://img.shields.io/badge/Seed-Hugging Face-orange"></a>
<br>
<a href="./LICENSE">
<img src="https://img.shields.io/badge/License-Apache2.0-blue"></a>
</p>

> [!NOTE]
> This model card is dedicated to the `Seed-OSS-36B-Instruct` model.

## News
- [2025/08/20] 🔥 We release `Seed-OSS-36B-Base` (both with and without synthetic data versions) and `Seed-OSS-36B-Instruct`.

## Introduction
Seed-OSS is a series of open-source large language models developed by ByteDance's Seed Team, designed for strong long-context, reasoning, agentic, and general capabilities, with versatile developer-friendly features. Although trained with only 12T tokens, Seed-OSS achieves excellent performance on several popular open benchmarks.

We release this series of models to the open-source community under the Apache-2.0 license.

> [!NOTE]
> Seed-OSS is primarily optimized for international (i18n) use cases.

### Key Features
- **Flexible Control of Thinking Budget**: Users can flexibly adjust the reasoning length as needed. Dynamically controlling the reasoning length improves inference efficiency in practical application scenarios.
- **Enhanced Reasoning Capability**: Specifically optimized for reasoning tasks while maintaining balanced and excellent general capabilities.
- **Agentic Intelligence**: Performs exceptionally well in agentic tasks such as tool use and issue resolution.
- **Research-Friendly**: Since including synthetic instruction data in pre-training can affect post-training research, we release pre-trained models both with and without instruction data, giving the research community more diverse options.
- **Native Long Context**: Trained natively with up to 512K context length.

### Model Summary

Seed-OSS adopts the popular causal language model architecture with RoPE, GQA attention, RMSNorm, and SwiGLU activation.

|  | **Seed-OSS-36B** |
|:---:|:---:|
| **Parameters** | 36B |
| **Attention** | GQA |
| **Activation Function** | SwiGLU |
| **Number of Layers** | 64 |
| **Number of QKV Heads** | 80 / 8 / 8 |
| **Head Size** | 128 |
| **Hidden Size** | 5120 |
| **Vocabulary Size** | 155K |
| **Context Length** | 512K |
| **RoPE Base Frequency** | 1e7 |
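These figures line up with the `config.json` shipped in this commit. A minimal sketch of reading them back, assuming a transformers build that knows the `seed_oss` model type:

```python
# Minimal sketch: read the architecture figures above back from the shipped
# config.json (assumes a transformers build that supports model_type "seed_oss").
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ByteDance-Seed/Seed-OSS-36B-Instruct")
print(config.num_hidden_layers)        # 64 layers
print(config.num_attention_heads,      # 80 query heads
      config.num_key_value_heads)      # 8 KV heads (GQA)
print(config.hidden_size)              # 5120
print(config.head_dim)                 # 128
print(config.vocab_size)               # 155136 (~155K)
print(config.max_position_embeddings)  # 524288 (512K context)
print(config.rope_theta)               # 10000000.0 (1e7)
```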
## Evaluation Results

### Seed-OSS-36B-Base

Incorporating synthetic instruction data into pretraining leads to improved performance on most benchmarks. We adopt the version augmented with synthetic instruction data (i.e., *w/ syn.*) as `Seed-OSS-36B-Base`. We also release `Seed-OSS-36B-Base-woSyn`, trained without such data (i.e., *w/o syn.*), offering the community a high-performance foundation model unaffected by synthetic instruction data.

| Benchmark | [Seed1.6-Base](https://seed.bytedance.com/en/seed1_6) | Qwen3-30B-A3B-Base-2507* | Qwen2.5-32B-Base* | Seed-OSS-36B-Base (*w/ syn.*) | Seed-OSS-36B-Base-woSyn (*w/o syn.*) |
|:---:|:---:|:---:|:---:|:---:|:---:|
| **Knowledge** | | | | | |
| MMLU-Pro | 70 | 59.8 | 58.5 (55.1) | **65.1** | 60.4 |
| MMLU | 88.8 | 82.7 | 84 (83.3) | **84.9** | 84.8 |
| TriviaQA | 91 | 76.2 | 76 | **82.1** | 81.9 |
| GPQA-D | 43.4 | **37** | 29.3 | 31.7 | 35.2 |
| SimpleQA | 17.1 | 7.2 | 6.1 | 5.8 | **7.4** |
| **Reasoning** | | | | | |
| BBH | 92.1 | 81.4 | 79.1 (84.5) | **87.7** | 87.2 |
| AGIEval-en | 78 | 66.4 | 65.6 | **70.7** | 70.1 |
| **Math** | | | | | |
| GSM8K | 93.1 | 87 | 87.5 (92.9) | **90.8** | 90.3 |
| MATH | 72.9 | 61.1 | 63.5 (57.7) | **81.7** | 61.3 |
| **Coding** | | | | | |
| MBPP | 83.6 | 78.8 | 77.8 (84.5) | **80.6** | 74.6 |
| HumanEval | 78 | 70.7 | 47.6 (58.5) | **76.8** | 75.6 |

<sup>- <b>Bold</b> denotes open-source SOTA.</sup><br/><sup>- "*" indicates that the results in this column are presented in the format "reproduced_results (reported_results_if_any)".</sup>
### Seed-OSS-36B-Instruct

| Benchmark | [Seed1.6-Thinking-0715](https://console.volcengine.com/ark/region:ark+cn-beijing/model/detail?Id=doubao-seed-1-6-thinking) | OAI-OSS-20B* | Qwen3-30B-A3B-Thinking-2507* | Qwen3-32B* | Gemma3-27B | Seed-OSS-36B-Instruct |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **Knowledge** | | | | | | |
| MMLU-Pro | 86.6 | 76.2 | <ins>81.9</ins> (80.9) | 81.8 | 67.5 | **82.7** |
| MMLU | 90.6 | 81.7 (85.3) | <ins>86.9</ins> | 86.2 | 76.9 | **87.4** |
| GPQA-D | 80.7 | **72.2** (71.5) | <ins>71.4</ins> (73.4) | 66.7 (68.4) | 42.4 | <ins>71.4</ins> |
| SuperGPQA | 63.4 | 50.1 | **57.3** (56.8) | 49.3 | - | <ins>55.7</ins> |
| SimpleQA | 23.7 | 6.7 | **23.6** | 8.6 | <ins>10</ins> | 9.7 |
| **Math** | | | | | | |
| AIME24 | 90.3 | **92.7** (92.1) | 87.7 | 82.7 (81.4) | - | <ins>91.7</ins> |
| AIME25 | 86 | **90.3** (91.7) | 81.3 (85) | 73.3 (72.9) | - | <ins>84.7</ins> |
| BeyondAIME | 60 | **69** | 56 | 29 | - | <ins>65</ins> |
| **Reasoning** | | | | | | |
| ArcAGI V2 | 50.3 | **41.7** | 37.8 | 14.4 | - | <ins>40.6</ins> |
| KORBench | 74.8 | **72.3** | 70.2 | 65.4 | - | <ins>70.6</ins> |
| **Coding** | | | | | | |
| LiveCodeBench v6<br/><sup>(02/2025-05/2025)</sup> | 66.8 | <ins>63.8</ins> | 60.3 (66) | 53.4 | - | **67.4** |
| HLE | 13.9 | **12.7** (10.9) | 8.7 | 6.9 | - | <ins>10.1</ins> |
| **Instruction Following** | | | | | | |
| IFEval | 86.3 | **92.8** | 88 (88.9) | 88.4 (85) | <ins>90.4</ins> | 85.8 |
| **Agent** | | | | | | |
| TAU1-Retail | 63 | (54.8) | <ins>58.7</ins> (67.8) | 40.9 | - | **70.4** |
| TAU1-Airline | 49 | (38) | **47** (48) | 38 | - | <ins>46</ins> |
| SWE-Bench Verified<br/><sup>(OpenHands)</sup> | 41.8 | **(60.7)** | 31 | 23.4 | - | <ins>56</ins> |
| SWE-Bench Verified<br/><sup>(AgentLess 4*10)</sup> | 48.4 | - | 33.5 | <ins>39.7</ins> | - | **47** |
| Multi-SWE-Bench | 17.7 | - | <ins>9.5</ins> | 7.7 | - | **17** |
| **Multilingualism** | | | | | | |
| MMMLU | 84.3 | 77.4 (75.7) | **79** | **79** (80.6) | - | <ins>78.4</ins> |
| **Long Context** | | | | | | |
| RULER<br/><sup>(128K)</sup> | 94.5 | 78.7 | <ins>94.5</ins> | 77.5 | - | **94.6** |
| **Safety** | | | | | | |
| AIR-Bench | - | - | - | - | - | 75.6 |

<sup>- <b>Bold</b> denotes open-source SOTA. <ins>Underlined</ins> indicates second place among open-source models.</sup><br/><sup>- "*" indicates that the results in this column are presented in the format "reproduced_results (reported_results_if_any)". Some results have been omitted due to evaluation-run failures.</sup><br/><sup>- The results of Gemma3-27B are sourced directly from its technical report.</sup><br/><sup>- Generation configs for Seed-OSS-36B-Instruct: temperature=1.1, top_p=0.95. Specifically, for Taubench, temperature=1, top_p=0.7.</sup>
> [!NOTE]
> We recommend sampling with `temperature=1.1` and `top_p=0.95`.

### Thinking Budget

Users can flexibly specify the model's thinking budget. The figure below shows the performance curves across different tasks as the thinking budget varies. For simpler tasks (such as IFEval), the model's chain of thought (CoT) is shorter, and the score fluctuates as the thinking budget increases. For more challenging tasks (such as AIME and LiveCodeBench), the model's CoT is longer, and the score improves as the thinking budget increases.



Here is an example with the thinking budget set to 512: during the reasoning process, the model periodically triggers self-reflection to estimate the consumed and remaining budget, and delivers the final response once the budget is exhausted or the reasoning concludes.

```
<seed:think>
Got it, let's try to solve this problem step by step. The problem says ... ...
<seed:cot_budget_reflect>I have used 129 tokens, and there are 383 tokens remaining for use.</seed:cot_budget_reflect>
Using the power rule, ... ...
<seed:cot_budget_reflect>I have used 258 tokens, and there are 254 tokens remaining for use.</seed:cot_budget_reflect>
Alternatively, remember that ... ...
<seed:cot_budget_reflect>I have used 393 tokens, and there are 119 tokens remaining for use.</seed:cot_budget_reflect>
Because if ... ...
<seed:cot_budget_reflect>I have exhausted my token budget, and now I will start answering the question.</seed:cot_budget_reflect>
</seed:think>
To solve the problem, we start by using the properties of logarithms to simplify the given equations: (full answer omitted).
```

If no thinking budget is set (default mode), Seed-OSS will think with unlimited length. If a thinking budget is specified, users are advised to prefer values that are integer multiples of 512 (e.g., 512, 1K, 2K, 4K, 8K, or 16K), as the model has been extensively trained on these intervals. The model is instructed to output a direct response when the thinking budget is 0, and we recommend setting any budget below 512 to this value; a small sketch of this rounding rule follows.
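An illustrative helper (hypothetical, not part of the release) that applies the guidance above before passing a `thinking_budget` to `apply_chat_template`:

```python
# Illustrative helper (hypothetical, not shipped with the model): apply the
# budget guidance above when choosing a thinking_budget value.
def snap_thinking_budget(requested: int) -> int:
    if requested < 0:
        return -1                      # default mode: unlimited thinking
    if requested < 512:
        return 0                       # budgets below 512 work best as 0 (direct answer)
    return (requested // 512) * 512    # round down to a trained multiple of 512

assert snap_thinking_budget(-5) == -1
assert snap_thinking_budget(300) == 0
assert snap_thinking_budget(1500) == 1024
```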
## Quick Start

```shell
pip3 install -r requirements.txt
pip install git+ssh://git@github.com/Fazziekey/transformers.git@seed-oss
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import os
import re

model_name_or_path = "ByteDance-Seed/Seed-OSS-36B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")  # You may want to use bfloat16 and/or move to GPU here

messages = [
    {"role": "user", "content": "How to make pasta?"},
]
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    thinking_budget=512,  # control the thinking budget
)

outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=2048)

output_text = tokenizer.decode(outputs[0])
```
## Inference

### Download Model

Download the Seed-OSS checkpoint to `./Seed-OSS-36B-Instruct`, for example with the snippet below.
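Since this card is hosted on ModelScope, one way to fetch the checkpoint is the ModelScope SDK; a minimal sketch (assumes `pip install modelscope`):

```python
# Minimal download sketch via the ModelScope SDK (assumes: pip install modelscope).
from modelscope import snapshot_download

# Downloads the repo into the local cache and returns the checkpoint directory.
model_dir = snapshot_download('ByteDance-Seed/Seed-OSS-36B-Instruct')
print(model_dir)
```

Cloning over git works as well: `git clone https://www.modelscope.cn/ByteDance-Seed/Seed-OSS-36B-Instruct.git`.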
### Transformers

The `generate.py` script provides a simple interface for model inference with configurable options.

#### Basic Usage
```shell
cd inference
python3 generate.py --model_path /path/to/model
```

#### Key Parameters

| Parameter | Description |
|-----------|-------------|
| `--model_path` | Path to the pretrained model directory (required) |
| `--prompts` | Input prompts (default: sample cooking/code questions) |
| `--max_new_tokens` | Maximum tokens to generate (default: 4096) |
| `--attn_implementation` | Attention mechanism: `flash_attention_2` (default) or `eager` |
| `--load_in_4bit/8bit` | Enable 4-bit/8-bit quantization (reduces memory usage) |
| `--thinking_budget` | Thinking budget in tokens (default: -1 for an unlimited budget) |

#### Quantization Examples
```shell
# 8-bit quantization
python3 generate.py --model_path /path/to/model --load_in_8bit True

# 4-bit quantization
python3 generate.py --model_path /path/to/model --load_in_4bit True
```

#### Custom Prompts
```shell
python3 generate.py --model_path /path/to/model --prompts "['What is machine learning?', 'Explain quantum computing']"
```
### vLLM

Use vLLM 0.10.0 or higher for inference.

- First, install the vLLM build with Seed-OSS support:
```shell
VLLM_USE_PRECOMPILED=1 VLLM_TEST_USE_PRECOMPILED_NIGHTLY_WHEEL=1 pip install git+ssh://git@github.com/FoolPlayer/vllm.git@seed-oss
```

- Start the vLLM API server:
```shell
python3 -m vllm.entrypoints.openai.api_server \
    --host localhost \
    --port 4321 \
    --enable-auto-tool-choice \
    --tool-call-parser seed_oss \
    --trust-remote-code \
    --model ./Seed-OSS-36B-Instruct \
    --chat-template ./Seed-OSS-36B-Instruct/chat_template.jinja \
    --tensor-parallel-size 8 \
    --dtype bfloat16 \
    --served-model-name seed_oss
```

- Test with the OpenAI client:

Chat
```shell
python3 inference/vllm_chat.py
```

Tool Call
```shell
python3 inference/vllm_tool_call.py
```
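The two scripts above are the reference clients. For orientation, a hand-rolled chat request against the server looks roughly like this (a sketch, assuming the `openai` Python package; the API key is unused by a local vLLM server):

```python
# Minimal chat sketch against the vLLM server started above
# (assumes: pip install openai).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4321/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="seed_oss",  # matches --served-model-name above
    messages=[{"role": "user", "content": "How to make pasta?"}],
    temperature=1.1,   # recommended sampling settings
    top_p=0.95,
)
print(response.choices[0].message.content)
```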
## Model Card
See [MODEL_CARD](./MODEL_CARD.md).

## License
This project is licensed under Apache-2.0. See the [LICENSE](./LICENSE) file for details.

## Citation

```bibtex
@misc{seed2025seed-oss,
  author={ByteDance Seed Team},
  title={Seed-OSS Open-Source Models},
  year={2025},
  howpublished={\url{https://github.com/ByteDance-Seed/seed-oss}}
}
```

## About [ByteDance Seed Team](https://seed.bytedance.com/)

Founded in 2023, the ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and to make significant contributions to the advancement of science and society.
171  chat_template.jinja  Normal file

@@ -0,0 +1,171 @@
{# ---------- special token variables ---------- #}
{%- set bos_token = '<seed:bos>' -%}
{%- set eos_token = '<seed:eos>' -%}
{%- set pad_token = '<seed:pad>' -%}
{%- set toolcall_begin_token = '<seed:tool_call>' -%}
{%- set toolcall_end_token = '</seed:tool_call>' -%}
{%- set think_begin_token = '<seed:think>' -%}
{%- set think_end_token = '</seed:think>' -%}
{%- set budget_begin_token = '<seed:cot_budget_reflect>'-%}
{%- set budget_end_token = '</seed:cot_budget_reflect>'-%}
{# -------------- reflection-interval lookup -------------- #}
{%- if not thinking_budget is defined %}
{%- set thinking_budget = -1 -%}
{%- endif -%}
{%- set budget_reflections_v05 = {
    0: 0,
    512: 128,
    1024: 256,
    2048: 512,
    4096: 512,
    8192: 1024,
    16384: 1024
} -%}
{# Find the first tier that is >= thinking_budget #}
{%- set ns = namespace(interval = None) -%}
{%- for k, v in budget_reflections_v05 | dictsort -%}
{%- if ns.interval is none and thinking_budget <= k -%}
{%- set ns.interval = v -%}
{%- endif -%}
{%- endfor -%}
{# If the budget exceeds the largest tier, reuse the last tier's value #}
{%- if ns.interval is none -%}
{%- set ns.interval = budget_reflections_v05[16384] -%}
{%- endif -%}
{# ---------- preprocess the system message ---------- #}
{%- if messages[0]["role"] == "system" %}
{%- set system_message = messages[0]["content"] %}
{%- set loop_messages = messages[1:] %}
{%- else %}
{%- set loop_messages = messages %}
{%- endif %}
{# ---------- make sure tools is defined ---------- #}
{%- if not tools is defined or tools is none %}
{%- set tools = [] %}
{%- endif %}
{# tools2doc.jinja #}
{%- macro py_type(t) -%}
{%- if t == "string" -%}str
{%- elif t in ("number", "integer") -%}int
{%- elif t == "boolean" -%}bool
{%- elif t == "array" -%}list
{%- else -%}Any{%- endif -%}
{%- endmacro -%}
{# ---------- emit the system block ---------- #}
{%- if system_message is defined %}
{{ bos_token + "system\n" + system_message }}
{%- else %}
{%- if tools is iterable and tools | length > 0 %}
{{ bos_token + "system\nYou are Doubao, a helpful AI assistant. You may call one or more functions to assist with the user query." }}
{%- endif %}
{%- endif %}
{%- if use_json_tooldef is defined and use_json_tooldef %}

{{"Tool List:\nYou are authorized to use the following tools (described in JSON Schema format). Before performing any task, you must decide how to call them based on the descriptions and parameters of these tools."}}
{{ tools | tojson(ensure_ascii=False) }}
{%- else %}
{%- for item in tools if item.type == "function" %}


Function:
def {{ item.function.name }}(
{%- for name, spec in item.function.parameters.properties.items() %}
{{- name }}: {{ py_type(spec.type) }}{% if not loop.last %},{% endif %}
{%- endfor %}):
"""
{{ item.function.description | trim }}

{# ---------- Args ---------- #}
{%- if item.function.parameters.properties %}
Args:
{%- for name, spec in item.function.parameters.properties.items() %}

- {{ name }} ({{ py_type(spec.type) }})
{%- if name in item.function.parameters.required %} [必填]{% else %} [选填]{% endif %}:
{{- " " ~ (spec.description or "") }}
{%- endfor %}
{%- endif %}

{# ---------- Returns ---------- #}
{%- if item.function.returns is defined
      and item.function.returns.properties is defined
      and item.function.returns.properties %}
Returns:
{%- for name, spec in item.function.returns.properties.items() %}

- {{ name }} ({{ py_type(spec.type) }}):
{{- " " ~ (spec.description or "") }}
{%- endfor %}
{%- endif %}

"""
{%- endfor %}
{%- endif %}
{%- if tools is iterable and tools | length > 0 %}

{{"工具调用请遵循如下格式:\n<seed:tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>value_1</parameter>\n<parameter=example_parameter_2>This is the value for the second parameter\nthat can span\nmultiple lines</parameter>\n</function>\n</seed:tool_call>\n"}}
{%- endif %}
{# End of the system block #}
{%- if system_message is defined or tools is iterable and tools | length > 0 %}
{{ eos_token }}
{%- endif %}
{# ---------- Thinking Budget ---------- #}
{%- if thinking_budget is defined %}
{%- if thinking_budget == 0 %}
{{ bos_token+"system" }}
{{ "You are an intelligent assistant that can answer questions in one step without the need for reasoning and thinking, that is, your thinking budget is 0. Next, please skip the thinking process and directly start answering the user's questions." }}
{{ eos_token }}
{%- elif not thinking_budget == -1 %}
{{ bos_token+"system" }}
{{ "You are an intelligent assistant with reflective ability. In the process of thinking and reasoning, you need to strictly follow the thinking budget, which is "}}{{thinking_budget}}{{". That is, you need to complete your thinking within "}}{{thinking_budget}}{{" tokens and start answering the user's questions. You will reflect on your thinking process every "}}{{ns.interval}}{{" tokens, stating how many tokens have been used and how many are left."}}
{{ eos_token }}
{%- endif %}
{%- endif %}
{# ---------- write out the history messages one by one ---------- #}
{%- for message in loop_messages %}
{%- if message.role == "assistant"
    and message.tool_calls is defined
    and message.tool_calls is iterable
    and message.tool_calls | length > 0 %}
{{ bos_token + message.role }}
{%- if message.reasoning_content is defined and message.reasoning_content is string and message.reasoning_content | trim | length > 0 %}
{{ "\n" + think_begin_token + message.reasoning_content | trim + think_end_token }}
{%- endif %}
{%- if message.content is defined and message.content is string and message.content | trim | length > 0 %}
{{ "\n" + message.content | trim + "\n" }}
{%- endif %}
{%- for tool_call in message.tool_calls %}
{%- if tool_call.function is defined %}{% set tool_call = tool_call.function %}{% endif %}
{{ "\n" + toolcall_begin_token + "\n<function=" + tool_call.name + ">\n" }}
{%- if tool_call.arguments is defined %}
{%- for arg_name, arg_value in tool_call.arguments | items %}
{{ "<parameter=" + arg_name + ">" }}
{%- set arg_value = arg_value if arg_value is string else arg_value | string %}
{{ arg_value+"</parameter>\n" }}
{%- endfor %}
{%- endif %}
{{ "</function>\n" + toolcall_end_token }}
{%- endfor %}
{{ eos_token }}
{%- elif message.role in ["user", "system"] %}
{{ bos_token + message.role + "\n" + message.content + eos_token }}
{%- elif message.role == "assistant" %}
{{ bos_token + message.role }}
{%- if message.reasoning_content is defined and message.reasoning_content is string and message.reasoning_content | trim | length > 0 %}
{{ "\n" + think_begin_token + message.reasoning_content | trim + think_end_token }}
{%- endif %}
{%- if message.content is defined and message.content is string and message.content | trim | length > 0 %}
{{ "\n" + message.content | trim + eos_token }}
{%- endif %}
{# Other roles, including the tool role, fall through to this branch #}
{%- else %}
{{ bos_token + message.role + "\n" + message.content + eos_token }}
{%- endif %}
{%- endfor %}
{# ---------- prompt the model to start generating ---------- #}
{%- if add_generation_prompt %}
{{ bos_token+"assistant\n" }}
{%- if thinking_budget == 0 %}
{{ think_begin_token+budget_begin_token }}
{%- endif %}
{%- endif %}
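For readers skimming the template, the reflection-interval lookup near its top mirrors this Python logic (a sketch only; the Jinja above is authoritative):

```python
# Python mirror of the template's reflection-interval lookup (sketch only).
BUDGET_REFLECTIONS_V05 = {0: 0, 512: 128, 1024: 256, 2048: 512,
                          4096: 512, 8192: 1024, 16384: 1024}

def reflection_interval(thinking_budget: int) -> int:
    # The first tier whose key is >= thinking_budget wins (keys in sorted order).
    for budget, interval in sorted(BUDGET_REFLECTIONS_V05.items()):
        if thinking_budget <= budget:
            return interval
    return BUDGET_REFLECTIONS_V05[16384]  # past the largest tier, reuse its value

assert reflection_interval(512) == 128    # reflect every 128 tokens
assert reflection_interval(3000) == 512   # 3000 falls into the 4096 tier
```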
33  config.json  Normal file

@@ -0,0 +1,33 @@
{
  "architectures": [
    "SeedOssForCausalLM"
  ],
  "attention_bias": true,
  "attention_dropout": 0.1,
  "attention_out_bias": false,
  "bos_token_id": 0,
  "pad_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 27648,
  "max_position_embeddings": 524288,
  "mlp_bias": false,
  "model_type": "seed_oss",
  "num_attention_heads": 80,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "residual_dropout": 0.1,
  "rms_norm_eps": 1e-06,
  "rope_scaling": {
    "rope_type": "default"
  },
  "rope_theta": 10000000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.55.0",
  "use_cache": true,
  "vocab_size": 155136
}
1  configuration.json  Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
10  generation_config.json  Normal file

@@ -0,0 +1,10 @@
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "pad_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.55.0",
  "temperature": 1.1,
  "top_p": 0.95
}
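These defaults match the sampling recommendation in the README. A sketch of loading them with transformers (assumes the checkpoint is reachable locally or via the hub/mirror):

```python
# Sketch: the generation defaults above are what transformers loads by default.
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained("ByteDance-Seed/Seed-OSS-36B-Instruct")
print(gen_cfg.temperature, gen_cfg.top_p)  # 1.1 0.95
```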
3  model-00001-of-00015.safetensors  Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a6387b80f12db915254cbe82c26d393f0f5a10600ce7bda028e3ee90c256eecc
size 135

3  model-00002-of-00015.safetensors  Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fe2d0b95a5d785f8e2a18329296773e042b8caa9a3f0a1d9e8ef2c9bb4a14eea
size 135

3  model-00003-of-00015.safetensors  Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1a3e358505119541fa85625546348a60f39685fba7549bd94c8e982d407a0555
size 135

3  model-00004-of-00015.safetensors  Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0d6bbfb4ab754f2cb391caa40f67dd9d349b5381b402574a0440813606a348c5
size 135

3  model-00005-of-00015.safetensors  Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:107cce88b60faf9bad30769172dce01cd1764570f92cb0a80dece2e238167f23
size 135

3  model-00006-of-00015.safetensors  Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e71fa75e94020a23d9a15da86ed328bdc01462a0a3f09ecdd614f047a802301a
size 135

3  model-00007-of-00015.safetensors  Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a04d657585986417b4957ae284b889c2b58083e39a90994a068ea4a25cfa27ae
size 135

3  model-00008-of-00015.safetensors  Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1ef369c73695b6d4ea90e68154005d90a2733f67053b10211830a8d85e9263c4
size 135

3  model-00009-of-00015.safetensors  Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:63e354190fef1698af8cf2b2b6eb3ceb4627be4e15c886fcefae04c40046811e
size 135

3  model-00010-of-00015.safetensors  Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a4781bad8d0e3bee0f1adda8017b951edd34a57638420cadaabf433e6bde8d0c
size 135

3  model-00011-of-00015.safetensors  Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:223165c90a98f80f66a5f2dcb94e6f09e3454974473fe14c6822c0628ee55f56
size 135

3  model-00012-of-00015.safetensors  Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8db709a2c461316819593bef8ae9e252cdf5da323f4361be62dd7f4d3c4c8f18
size 135

3  model-00013-of-00015.safetensors  Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4e6c7c009da0d562231304d6eef141a64f95a73e37b4d2576aa587a82b5713ec
size 135

3  model-00014-of-00015.safetensors  Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6d233a72fe9dc4cbea98e275729541d9ebf06a7d0ecf4edd68e0f86d8b021339
size 135

3  model-00015-of-00015.safetensors  Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:edabb4aa838885534911083fa9d7c00468f9e43103eb1bf61dc4a033af42d1c8
size 135
779  model.safetensors.index.json  Normal file

@@ -0,0 +1,779 @@
{
  "metadata": {
    "total_parameters": 36151104512,
    "total_size": 72302209024
  },
  "weight_map": {
    "lm_head.weight": "model-00015-of-00015.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00003-of-00015.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
    "model.layers.10.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.10.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.10.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.11.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00004-of-00015.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
    "model.layers.12.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.12.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.12.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00004-of-00015.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
    "model.layers.13.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.13.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.13.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.14.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.15.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.16.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.16.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.16.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.17.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.17.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.17.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.18.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.19.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00001-of-00015.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00015.safetensors",
    "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
    "model.layers.20.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.20.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.20.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00006-of-00015.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
    "model.layers.21.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.21.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.21.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00006-of-00015.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
    "model.layers.22.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.22.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.22.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
    "model.layers.23.input_layernorm.weight": "model-00006-of-00015.safetensors"
|
||||||
|
"model.layers.23.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.23.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.23.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.23.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.23.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.23.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.23.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.23.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.23.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.23.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.23.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.24.input_layernorm.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.24.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.24.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.24.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.24.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.24.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.24.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.24.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.24.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.24.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.24.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.24.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.25.input_layernorm.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.25.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.25.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.25.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.25.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.25.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.25.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.25.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.25.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.25.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.25.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.25.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
|
||||||
|
"model.layers.26.input_layernorm.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.26.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.26.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.26.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.26.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.26.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.26.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.26.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.26.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.26.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.26.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.26.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.27.input_layernorm.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.27.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.27.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.27.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.27.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.27.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.27.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.27.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.27.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.27.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.27.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.27.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.28.input_layernorm.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.28.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.28.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.28.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.28.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.28.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.28.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.28.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.28.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.28.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.28.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.28.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.29.input_layernorm.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.29.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.29.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.29.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.29.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.29.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.29.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.29.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.29.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.29.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.29.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.29.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.3.input_layernorm.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.3.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.3.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.3.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.3.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.3.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
|
||||||
|
"model.layers.3.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
|
||||||
|
"model.layers.3.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.3.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
|
||||||
|
"model.layers.3.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
|
||||||
|
"model.layers.3.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
|
||||||
|
"model.layers.3.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
|
||||||
|
"model.layers.30.input_layernorm.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.30.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.30.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.30.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.30.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.30.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.30.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.30.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.30.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.30.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.30.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.30.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
|
||||||
|
"model.layers.31.input_layernorm.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.31.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.31.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.31.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.31.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.31.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.31.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.31.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.31.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.31.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.31.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.31.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.32.input_layernorm.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.32.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.32.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.32.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.32.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.32.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.32.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.32.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.32.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.32.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.32.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.32.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.33.input_layernorm.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.33.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.33.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.33.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.33.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.33.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.33.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.33.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.33.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.33.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.33.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.33.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.34.input_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.34.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.34.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.34.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.34.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.34.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.34.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.34.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.34.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.34.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.34.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.34.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
|
||||||
|
"model.layers.35.input_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.35.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.35.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.35.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.35.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.35.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.35.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.35.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.35.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.35.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.35.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.35.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.36.input_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.36.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.36.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.36.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.36.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.36.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.36.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.36.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.36.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.36.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.36.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.36.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.37.input_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.37.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.37.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.37.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.37.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.37.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.37.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.37.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.37.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.37.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.37.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.37.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.38.input_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.38.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.38.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.38.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.38.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.38.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.38.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.38.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.38.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.38.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.38.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.38.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.39.input_layernorm.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.39.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.39.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.39.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.39.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.39.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.39.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.39.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.39.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.39.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.39.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.39.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
|
||||||
|
"model.layers.4.input_layernorm.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.4.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.4.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.4.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.4.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.4.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.4.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.4.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.4.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.4.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.4.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.4.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.40.input_layernorm.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.40.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.40.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.40.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.40.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.40.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.40.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.40.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.40.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.40.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.40.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.40.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.41.input_layernorm.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.41.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.41.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.41.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.41.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.41.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.41.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.41.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.41.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.41.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.41.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.41.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.42.input_layernorm.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.42.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.42.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.42.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.42.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.42.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.42.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.42.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.42.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.42.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.42.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.42.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.43.input_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.43.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.43.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.43.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.43.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.43.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.43.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.43.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.43.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.43.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.43.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.43.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
|
||||||
|
"model.layers.44.input_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.44.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.44.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.44.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.44.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.44.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.44.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.44.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.44.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.44.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.44.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.44.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.45.input_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.45.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.45.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.45.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.45.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.45.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.45.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.45.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.45.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.45.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.45.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.45.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.46.input_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.46.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.46.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.46.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.46.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.46.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.46.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.46.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.46.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.46.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.46.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.46.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.47.input_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.47.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.47.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.47.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.47.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.47.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.47.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.47.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.47.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.47.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.47.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.47.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.48.input_layernorm.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.48.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.48.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.48.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.48.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.48.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.48.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.48.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.48.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.48.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.48.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.48.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
|
||||||
|
"model.layers.49.input_layernorm.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.49.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.49.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.49.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.49.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.49.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.49.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.49.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.49.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.49.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.49.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.49.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.5.input_layernorm.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.5.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.5.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.5.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.5.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.5.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.5.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.5.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.5.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.5.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.5.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.5.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.50.input_layernorm.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.50.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.50.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.50.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.50.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.50.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.50.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.50.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.50.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.50.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.50.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.50.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.51.input_layernorm.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.51.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.51.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.51.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.51.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.51.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.51.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.51.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.51.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.51.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.51.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.51.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.52.input_layernorm.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.52.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.52.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.52.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.52.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.52.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.52.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.52.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.52.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.52.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.52.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.52.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
|
||||||
|
"model.layers.53.input_layernorm.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.53.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.53.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.53.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.53.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.53.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.53.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.53.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.53.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.53.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.53.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.53.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.54.input_layernorm.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.54.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.54.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.54.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.54.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.54.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.54.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.54.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.54.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.54.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.54.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.54.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.55.input_layernorm.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.55.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.55.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.55.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.55.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.55.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.55.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.55.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.55.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.55.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.55.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.55.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.56.input_layernorm.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.56.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.56.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.56.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.56.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.56.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.56.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.56.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.56.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.56.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.56.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.56.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.57.input_layernorm.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.57.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.57.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.57.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.57.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.57.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.57.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.57.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.57.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.57.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.57.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.57.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
|
||||||
|
"model.layers.58.input_layernorm.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.58.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.58.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.58.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.58.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.58.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.58.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.58.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.58.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.58.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.58.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.58.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.59.input_layernorm.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.59.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.59.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.59.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.59.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.59.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.59.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.59.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.59.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.59.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.59.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.59.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.6.input_layernorm.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.6.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.6.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.6.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.6.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.6.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.6.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.6.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.6.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.6.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.6.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.6.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.60.input_layernorm.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.60.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.60.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.60.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.60.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.60.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.60.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.60.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.60.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.60.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.60.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.60.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.61.input_layernorm.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.61.mlp.down_proj.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.61.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.61.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.61.post_attention_layernorm.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.61.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.61.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.61.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.61.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.61.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.61.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.61.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
|
||||||
|
"model.layers.62.input_layernorm.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.62.mlp.down_proj.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.62.mlp.gate_proj.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.62.mlp.up_proj.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.62.post_attention_layernorm.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.62.self_attn.k_proj.bias": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.62.self_attn.k_proj.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.62.self_attn.o_proj.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.62.self_attn.q_proj.bias": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.62.self_attn.q_proj.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.62.self_attn.v_proj.bias": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.62.self_attn.v_proj.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.63.input_layernorm.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.63.mlp.down_proj.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.63.mlp.gate_proj.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.63.mlp.up_proj.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.63.post_attention_layernorm.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.63.self_attn.k_proj.bias": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.63.self_attn.k_proj.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.63.self_attn.o_proj.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.63.self_attn.q_proj.bias": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.63.self_attn.q_proj.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.63.self_attn.v_proj.bias": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.63.self_attn.v_proj.weight": "model-00015-of-00015.safetensors",
|
||||||
|
"model.layers.7.input_layernorm.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.7.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.7.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.7.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.7.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.7.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.7.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.7.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.7.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.7.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.7.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.7.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
|
||||||
|
"model.layers.8.input_layernorm.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.8.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.8.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.8.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.8.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.8.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.8.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.8.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.8.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.8.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.8.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.8.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.9.input_layernorm.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.9.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.9.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.9.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.9.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.9.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.9.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.9.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.9.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.9.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.9.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
|
||||||
|
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
|
||||||
|
"model.norm.weight": "model-00015-of-00015.safetensors"
|
||||||
|
}
|
||||||
|
}
|
||||||
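The `weight_map` above is the routing table a loader consults to find which of the 15 shards stores each tensor, so a single weight can be read without opening the other shard files. Below is a minimal sketch of using the index directly with the `safetensors` library; the local checkout path is an assumption (adjust it to wherever you cloned this repo), and the tensor name is taken from the map above. Reading with `framework="pt"` also assumes PyTorch is installed.

```python
import json
from pathlib import Path

from safetensors import safe_open  # pip install safetensors

ckpt_dir = Path("Seed-OSS-36B-Instruct")  # assumed local checkout of this repo

# The index maps every tensor name to the shard file that stores it.
with open(ckpt_dir / "model.safetensors.index.json") as f:
    weight_map = json.load(f)["weight_map"]

# Sanity check: every shard the map references should exist on disk.
missing = {shard for shard in weight_map.values() if not (ckpt_dir / shard).exists()}
assert not missing, f"missing shards: {sorted(missing)}"

# Load a single tensor without touching the other 14 shards.
name = "model.layers.17.self_attn.k_proj.weight"
with safe_open(str(ckpt_dir / weight_map[name]), framework="pt", device="cpu") as f:
    print(name, tuple(f.get_tensor(name).shape))
```

This is also a cheap way to verify a download completed: if the assertion passes, every shard the index expects is present.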
23
special_tokens_map.json
Normal file
@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<seed:bos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<seed:eos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<seed:pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
BIN
thinking_budget.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 186 KiB
BIN
tokenizer.json
(Stored with Git LFS)
Normal file
Binary file not shown.
1035
tokenizer_config.json
Normal file
File diff suppressed because it is too large
Load Diff
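`special_tokens_map.json`, `tokenizer.json`, and `tokenizer_config.json` together are what `AutoTokenizer` reads at load time. A quick sanity check that the loaded tokenizer matches the special-token map above, as a sketch: the hub repo id is assumed (a local directory containing these files works the same way), and older transformers versions without native Seed-OSS support may additionally need `trust_remote_code=True`.

```python
from transformers import AutoTokenizer  # pip install transformers

# Repo id assumed; a local path to this checkout also works.
tok = AutoTokenizer.from_pretrained("ByteDance-Seed/Seed-OSS-36B-Instruct")

# These should echo special_tokens_map.json exactly.
print(tok.bos_token)  # <seed:bos>
print(tok.eos_token)  # <seed:eos>
print(tok.pad_token)  # <seed:pad>

# Round-trip a short string through the tokenizer.
ids = tok("Hello from Seed-OSS!").input_ids
print(ids[:8], "->", tok.decode(ids))
```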