diff --git a/.gitattributes b/.gitattributes
index 53d7257..21b3632 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -44,4 +44,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
\ No newline at end of file
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
\ No newline at end of file
diff --git a/README.md b/README.md
index c35c9e1..c36def4 100644
--- a/README.md
+++ b/README.md
@@ -1,47 +1,627 @@
---
-license: Apache License 2.0
-
-#model-type:
-##如 gpt、phi、llama、chatglm、baichuan 等
-#- gpt
-
-#domain:
-##如 nlp、cv、audio、multi-modal
-#- nlp
-
-#language:
-##语言代码列表 https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
-#- cn
-
-#metrics:
-##如 CIDEr、Blue、ROUGE 等
-#- CIDEr
-
-#tags:
-##各种自定义,包括 pretrained、fine-tuned、instruction-tuned、RL-tuned 等训练方法和其他
-#- pretrained
-
-#tools:
-##如 vllm、fastchat、llamacpp、AdaSeq 等
-#- vllm
+license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- vllm
+language:
+- en
+- zh
+base_model:
+- ByteDance-Seed/Seed-OSS-36B-Base
---
-### 当前模型的贡献者未提供更加详细的模型介绍。模型文件和权重,可浏览“模型文件”页面获取。
-#### 您可以通过如下git clone命令,或者ModelScope SDK来下载模型
-SDK下载
-```bash
-#安装ModelScope
-pip install modelscope
+
+👋 Hi, everyone!
+
+We are the ByteDance Seed Team.
+
+You can get to know us better through the following channels 👇
+
+# Seed-OSS Open-Source Models
+
+
+> [!NOTE]
+> This model card is dedicated to the `Seed-OSS-36B-Instruct` model.
+
+## News
+- [2025/08/20]🔥We release `Seed-OSS-36B-Base` (both with and without synthetic data versions) and `Seed-OSS-36B-Instruct`.
+
+## Introduction
+Seed-OSS is a series of open-source large language models developed by ByteDance's Seed Team, designed for strong long-context, reasoning, agentic, and general capabilities, along with versatile, developer-friendly features. Although trained with only 12T tokens, Seed-OSS achieves excellent performance on several popular open benchmarks.
+
+We release this series of models to the open-source community under the Apache-2.0 license.
+
+> [!NOTE]
+> Seed-OSS is primarily optimized for international (i18n) use cases.
+
+### Key Features
+- **Flexible Control of Thinking Budget**: Users can flexibly adjust the reasoning length as needed; this dynamic control of reasoning length improves inference efficiency in practical application scenarios.
+- **Enhanced Reasoning Capability**: Specifically optimized for reasoning tasks while maintaining balanced and excellent general capabilities.
+- **Agentic Intelligence**: Performs exceptionally well in agentic tasks such as tool use and issue resolution.
+- **Research-Friendly**: Because including synthetic instruction data in pre-training can affect post-training research, we release pre-trained models both with and without instruction data, giving the research community more diverse options.
+- **Native Long Context**: Natively trained with context lengths of up to 512K tokens.
+
+### Model Summary
+
+Seed-OSS adopts the popular causal language model architecture with RoPE, GQA attention, RMSNorm and SwiGLU activation.
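The normalization and activation choices above can be sketched in a few lines of pure Python. This is an illustrative sketch only, not the model code; the `eps=1e-6` default matches `rms_norm_eps` in `config.json`:

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: scale each element by the inverse root-mean-square of the vector,
    # then apply a learned per-channel weight.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

def swiglu(gate, up):
    # SwiGLU: SiLU(gate) * up, elementwise; gate and up come from
    # separate linear projections in the MLP block.
    return [g / (1.0 + math.exp(-g)) * u for g, u in zip(gate, up)]

print(rms_norm([3.0, 4.0], [1.0, 1.0]))
print(swiglu([0.0, 1.0], [2.0, 2.0]))
```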
+
+
+
+| | **Seed-OSS-36B** |
+|:---|:---:|
+| **Parameters** | 36B |
+| **Attention** | GQA |
+| **Activation Function** | SwiGLU |
+| **Number of Layers** | 64 |
+| **Number of QKV Heads** | 80 / 8 / 8 |
+| **Head Size** | 128 |
+| **Hidden Size** | 5120 |
+| **Vocabulary Size** | 155K |
+| **Context Length** | 512K |
+| **RoPE Base Frequency** | 1e7 |
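The attention shape in the table has a practical consequence for serving: with GQA (8 KV heads of size 128) across 64 layers, the KV cache stays modest per token even at the 512K context length. A back-of-the-envelope calculation, for illustration only:

```python
# Per-token KV cache for Seed-OSS-36B in bf16 (2 bytes/value), from the table:
# 2 tensors (K and V) x 64 layers x 8 KV heads x head size 128.
layers, kv_heads, head_dim, dtype_bytes = 64, 8, 128, 2
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * dtype_bytes
print(kv_bytes_per_token // 1024, "KiB per token")

# At the full 512K context this grows to:
context = 512 * 1024
print(kv_bytes_per_token * context // 2**30, "GiB")
```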
+
+
+
+
+## Evaluation Results
+
+### Seed-OSS-36B-Base
+
+Incorporating synthetic instruction data into pretraining leads to improved performance on most benchmarks. We adopt the version augmented with synthetic instruction data (i.e., *w/ syn.*) as `Seed-OSS-36B-Base`. We also release `Seed-OSS-36B-Base-woSyn` trained without such data (i.e., *w/o syn.*), offering the community a high-performance foundation model unaffected by synthetic instruction data.
+
+
+
+
+
+| Benchmark | Seed1.6-Base | Qwen3-30B-A3B-Base-2507* | Qwen2.5-32B-Base* | Seed-OSS-36B-Base (w/ syn.) | Seed-OSS-36B-Base-woSyn (w/o syn.) |
+|:---|:---:|:---:|:---:|:---:|:---:|
+| **Knowledge** | | | | | |
+| MMLU-Pro | 70 | 59.8 | 58.5 (55.1) | 65.1 | 60.4 |
+| MMLU | 88.8 | 82.7 | 84 (83.3) | 84.9 | 84.8 |
+| TriviaQA | 91 | 76.2 | 76 | 82.1 | 81.9 |
+| GPQA-D | 43.4 | 37 | 29.3 | 31.7 | 35.2 |
+| SimpleQA | 17.1 | 7.2 | 6.1 | 5.8 | 7.4 |
+| **Reasoning** | | | | | |
+| BBH | 92.1 | 81.4 | 79.1 (84.5) | 87.7 | 87.2 |
+| AGIEval-en | 78 | 66.4 | 65.6 | 70.7 | 70.1 |
+| **Math** | | | | | |
+| GSM8K | 93.1 | 87 | 87.5 (92.9) | 90.8 | 90.3 |
+| MATH | 72.9 | 61.1 | 63.5 (57.7) | 81.7 | 61.3 |
+| **Coding** | | | | | |
+| MBPP | 83.6 | 78.8 | 77.8 (84.5) | 80.6 | 74.6 |
+| HumanEval | 78 | 70.7 | 47.6 (58.5) | 76.8 | 75.6 |
+
+- Bold denotes open-source SOTA.
+
+- "*" indicates that the results in this column are presented in the format of "reproduced_results (reported_results_if_any)".
+
+
+### Seed-OSS-36B-Instruct
+
+
+
+
+
+| Benchmark | Seed1.6-Thinking-0715 | OAI-OSS-20B* | Qwen3-30B-A3B-Thinking-2507* | Qwen3-32B* | Gemma3-27B | Seed-OSS-36B-Instruct |
+|:---|:---:|:---:|:---:|:---:|:---:|:---:|
+| **Knowledge** | | | | | | |
+| MMLU-Pro | 86.6 | 76.2 | 81.9 (80.9) | 81.8 | 67.5 | 82.7 |
+| MMLU | 90.6 | 81.7 (85.3) | 86.9 | 86.2 | 76.9 | 87.4 |
+| GPQA-D | 80.7 | 72.2 (71.5) | 71.4 (73.4) | 66.7 (68.4) | 42.4 | 71.4 |
+| SuperGPQA | 63.4 | 50.1 | 57.3 (56.8) | 49.3 | - | 55.7 |
+| SimpleQA | 23.7 | 6.7 | 23.6 | 8.6 | 10 | 9.7 |
+| **Math** | | | | | | |
+| AIME24 | 90.3 | 92.7 (92.1) | 87.7 | 82.7 (81.4) | - | 91.7 |
+| AIME25 | 86 | 90.3 (91.7) | 81.3 (85) | 73.3 (72.9) | - | 84.7 |
+| BeyondAIME | 60 | 69 | 56 | 29 | - | 65 |
+| **Reasoning** | | | | | | |
+| ArcAGI V2 | 50.3 | 41.7 | 37.8 | 14.4 | - | 40.6 |
+| KORBench | 74.8 | 72.3 | 70.2 | 65.4 | - | 70.6 |
+| **Coding** | | | | | | |
+| LiveCodeBench v6 (02/2025-05/2025) | 66.8 | 63.8 | 60.3 (66) | 53.4 | - | 67.4 |
+| HLE | 13.9 | 12.7 (10.9) | 8.7 | 6.9 | - | 10.1 |
+| **Instruction Following** | | | | | | |
+| IFEval | 86.3 | 92.8 | 88 (88.9) | 88.4 (85) | 90.4 | 85.8 |
+| **Agent** | | | | | | |
+| TAU1-Retail | 63 | (54.8) | 58.7 (67.8) | 40.9 | - | 70.4 |
+| TAU1-Airline | 49 | (38) | 47 (48) | 38 | - | 46 |
+| SWE-Bench Verified (OpenHands) | 41.8 | (60.7) | 31 | 23.4 | - | 56 |
+| SWE-Bench Verified (AgentLess 4*10) | 48.4 | - | 33.5 | 39.7 | - | 47 |
+| Multi-SWE-Bench | 17.7 | - | 9.5 | 7.7 | - | 17 |
+| **Multilingualism** | | | | | | |
+| MMMLU | 84.3 | 77.4 (75.7) | 79 | 79 (80.6) | - | 78.4 |
+| **Long Context** | | | | | | |
+| RULER (128K) | 94.5 | 78.7 | 94.5 | 77.5 | - | 94.6 |
+| **Safety** | | | | | | |
+| AIR-Bench | - | - | - | - | - | 75.6 |
+
+- Bold denotes open-source SOTA. Underlined indicates the second-best result among open-source models.
+
+- "*" indicates that the results in this column are presented in the format of "reproduced_results (reported_results_if_any)". Some results have been omitted due to the failure of the evaluation run.
+
+- The results of Gemma3-27B are sourced directly from its technical report.
+
+- Generation configs for Seed-OSS-36B-Instruct: temperature=1.1, top_p=0.95. Specifically, for TAU-bench, temperature=1, top_p=0.7.
+
+
+
+> [!NOTE]
+> We recommend sampling with `temperature=1.1` and `top_p=0.95`.
+
+### Thinking Budget
+
+Users can flexibly specify the model's thinking budget. The figure below shows the performance curves across different tasks as the thinking budget varies. For simpler tasks (such as IFEval), the model's chain of thought (CoT) is shorter, and the score exhibits fluctuations as the thinking budget increases. For more challenging tasks (such as AIME and LiveCodeBench), the model's CoT is longer, and the score improves with an increase in the thinking budget.
+
+
+
+Here is an example with a thinking budget set to 512: during the reasoning process, the model periodically triggers self-reflection to estimate the consumed and remaining budget, and delivers the final response once the budget is exhausted or the reasoning concludes.
```
+
+Got it, let's try to solve this problem step by step. The problem says ... ...
+I have used 129 tokens, and there are 383 tokens remaining for use.
+Using the power rule, ... ...
+I have used 258 tokens, and there are 254 tokens remaining for use.
+Alternatively, remember that ... ...
+I have used 393 tokens, and there are 119 tokens remaining for use.
+Because if ... ...
+I have exhausted my token budget, and now I will start answering the question.
+
+To solve the problem, we start by using the properties of logarithms to simplify the given equations: (full answer omitted).
+```
+
+If no thinking budget is set (default mode), Seed-OSS will initiate thinking with unlimited length. If a thinking budget is specified, users are advised to prioritize values that are integer multiples of 512 (e.g., 512, 1K, 2K, 4K, 8K, or 16K), as the model has been extensively trained on these intervals. Models are instructed to output a direct response when the thinking budget is 0, and we recommend setting any budget below 512 to this value.
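The reflection cadence seen in the example above follows a fixed schedule shipped in `chat_template.jinja` (thinking budget → reflection interval). Below is a sketch of that lookup, plus a hypothetical helper (not part of the release) that snaps a requested budget to the recommended values:

```python
# Schedule taken from chat_template.jinja: the model reports used/remaining
# tokens every `interval` thinking tokens for a given budget tier.
BUDGET_REFLECTION_INTERVALS = {0: 0, 512: 128, 1024: 256, 2048: 512, 4096: 512, 8192: 1024, 16384: 1024}

def reflection_interval(thinking_budget):
    # The first tier >= budget wins; beyond the largest tier, reuse its interval.
    for tier in sorted(BUDGET_REFLECTION_INTERVALS):
        if thinking_budget <= tier:
            return BUDGET_REFLECTION_INTERVALS[tier]
    return BUDGET_REFLECTION_INTERVALS[16384]

def normalize_thinking_budget(requested):
    # Hypothetical helper: budgets below 512 become 0 (direct answer),
    # everything else snaps to the nearest multiple of 512.
    if requested < 512:
        return 0
    return round(requested / 512) * 512

print(reflection_interval(512))        # interval for a 512-token budget
print(normalize_thinking_budget(600))  # snapped to a trained interval
```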
+
+## Quick Start
+```shell
+pip3 install -r requirements.txt
+pip install git+ssh://git@github.com/Fazziekey/transformers.git@seed-oss
+```
+
```python
-#SDK模型下载
-from modelscope import snapshot_download
-model_dir = snapshot_download('ByteDance-Seed/Seed-OSS-36B-Instruct')
-```
-Git下载
-```
-#Git模型下载
-git clone https://www.modelscope.cn/ByteDance-Seed/Seed-OSS-36B-Instruct.git
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_name_or_path = "ByteDance-Seed/Seed-OSS-36B-Instruct"
+
+tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
+model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto") # You may want to use bfloat16 and/or move to GPU here
+messages = [
+ {"role": "user", "content": "How to make pasta?"},
+]
+tokenized_chat = tokenizer.apply_chat_template(
+ messages,
+ tokenize=True,
+ add_generation_prompt=True,
+ return_tensors="pt",
+ thinking_budget=512 # control the thinking budget
+)
+
+outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=2048)
+
+output_text = tokenizer.decode(outputs[0])  # includes the prompt; use outputs[0][tokenized_chat.shape[-1]:] to decode only the response
```
-如果您是本模型的贡献者,我们邀请您根据模型贡献文档,及时完善模型卡片内容。
\ No newline at end of file
+## Inference
+
+### Download Model
+
+Download the Seed-OSS checkpoint to `./Seed-OSS-36B-Instruct`.
+
+### Transformers
+The `generate.py` script provides a simple interface for model inference with configurable options.
+
+#### Basic Usage
+```shell
+cd inference
+python3 generate.py --model_path /path/to/model
+```
+
+#### Key Parameters
+| Parameter | Description |
+|-----------|-------------|
+| `--model_path` | Path to the pretrained model directory (required) |
+| `--prompts` | Input prompts (default: sample cooking/code questions) |
+| `--max_new_tokens` | Maximum tokens to generate (default: 4096) |
+| `--attn_implementation` | Attention mechanism: `flash_attention_2` (default) or `eager` |
+| `--load_in_4bit/8bit` | Enable 4-bit/8-bit quantization (reduces memory usage) |
+| `--thinking_budget` | Thinking budget in tokens (default: -1 for unlimited budget) |
+
+#### Quantization Examples
+```shell
+# 8-bit quantization
+python3 generate.py --model_path /path/to/model --load_in_8bit True
+
+# 4-bit quantization
+python3 generate.py --model_path /path/to/model --load_in_4bit True
+```
+
+#### Custom Prompts
+```shell
+python3 generate.py --model_path /path/to/model --prompts "['What is machine learning?', 'Explain quantum computing']"
+```
+
+### vLLM
+Use vLLM >= 0.10.0 for inference.
+
+- First, install the vLLM version with Seed-OSS support:
+```shell
+VLLM_USE_PRECOMPILED=1 VLLM_TEST_USE_PRECOMPILED_NIGHTLY_WHEEL=1 pip install git+ssh://git@github.com/FoolPlayer/vllm.git@seed-oss
+```
+
+- Start vLLM API server:
+```shell
+python3 -m vllm.entrypoints.openai.api_server \
+ --host localhost \
+ --port 4321 \
+ --enable-auto-tool-choice \
+ --tool-call-parser seed_oss \
+ --trust-remote-code \
+ --model ./Seed-OSS-36B-Instruct \
+ --chat-template ./Seed-OSS-36B-Instruct/chat_template.jinja \
+ --tensor-parallel-size 8 \
+ --dtype bfloat16 \
+ --served-model-name seed_oss
+```
+
+- Test with OpenAI client:
+
+Chat
+
+```shell
+python3 inference/vllm_chat.py
+```
+
+Tool Call
+```shell
+python3 inference/vllm_tool_call.py
+```
+
+
+## Model Card
+See [MODEL_CARD](./MODEL_CARD.md).
+
+## License
+This project is licensed under Apache-2.0. See the [LICENSE](./LICENSE) file for details.
+
+## Citation
+
+```bibtex
+@misc{seed2025seed-oss,
+ author={ByteDance Seed Team},
+ title={Seed-OSS Open-Source Models},
+ year={2025},
+ howpublished={\url{https://github.com/ByteDance-Seed/seed-oss}}
+}
+```
+
+## About [ByteDance Seed Team](https://seed.bytedance.com/)
+
+Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.
\ No newline at end of file
diff --git a/chat_template.jinja b/chat_template.jinja
new file mode 100644
index 0000000..d92d003
--- /dev/null
+++ b/chat_template.jinja
@@ -0,0 +1,171 @@
+{# ---------- special token variables ---------- #}
+{%- set bos_token = '' -%}
+{%- set eos_token = '' -%}
+{%- set pad_token = '' -%}
+{%- set toolcall_begin_token = '' -%}
+{%- set toolcall_end_token = '' -%}
+{%- set think_begin_token = '' -%}
+{%- set think_end_token = '' -%}
+{%- set budget_begin_token = ''-%}
+{%- set budget_end_token = ''-%}
+{# -------------- reflection-interval lookup -------------- #}
+{%- if not thinking_budget is defined %}
+{%- set thinking_budget = -1 -%}
+{%- endif -%}
+{%- set budget_reflections_v05 = {
+ 0: 0,
+ 512: 128,
+ 1024: 256,
+ 2048: 512,
+ 4096: 512,
+ 8192: 1024,
+ 16384: 1024
+} -%}
+{# find the first tier >= thinking_budget #}
+{%- set ns = namespace(interval = None) -%}
+{%- for k, v in budget_reflections_v05 | dictsort -%}
+ {%- if ns.interval is none and thinking_budget <= k -%}
+ {%- set ns.interval = v -%}
+ {%- endif -%}
+{%- endfor -%}
+{# if it exceeds the largest tier, use the last tier's value #}
+{%- if ns.interval is none -%}
+ {%- set ns.interval = budget_reflections_v05[16384] -%}
+{%- endif -%}
+{# ---------- preprocess the system message ---------- #}
+{%- if messages[0]["role"] == "system" %}
+{%- set system_message = messages[0]["content"] %}
+{%- set loop_messages = messages[1:] %}
+{%- else %}
+{%- set loop_messages = messages %}
+{%- endif %}
+{# ---------- ensure tools is defined ---------- #}
+{%- if not tools is defined or tools is none %}
+{%- set tools = [] %}
+{%- endif %}
+{# tools2doc.jinja #}
+{%- macro py_type(t) -%}
+ {%- if t == "string" -%}str
+ {%- elif t in ("number", "integer") -%}int
+ {%- elif t == "boolean" -%}bool
+ {%- elif t == "array" -%}list
+ {%- else -%}Any{%- endif -%}
+{%- endmacro -%}
+{# ---------- emit the system block ---------- #}
+{%- if system_message is defined %}
+{{ bos_token + "system\n" + system_message }}
+{%- else %}
+{%- if tools is iterable and tools | length > 0 %}
+{{ bos_token + "system\nYou are Doubao, a helpful AI assistant. You may call one or more functions to assist with the user query." }}
+{%- endif %}
+{%- endif %}
+{%- if use_json_tooldef is defined and use_json_tooldef %}
+
+{{"Tool List:\nYou are authorized to use the following tools (described in JSON Schema format). Before performing any task, you must decide how to call them based on the descriptions and parameters of these tools."}}
+{{ tools | tojson(ensure_ascii=False) }}
+{%- else %}
+{%- for item in tools if item.type == "function" %}
+
+
+Function:
+def {{ item.function.name }}(
+{%- for name, spec in item.function.parameters.properties.items() %}
+ {{- name }}: {{ py_type(spec.type) }}{% if not loop.last %},{% endif %}
+{%- endfor %}):
+ """
+ {{ item.function.description | trim }}
+
+ {# ---------- Args ---------- #}
+ {%- if item.function.parameters.properties %}
+ Args:
+ {%- for name, spec in item.function.parameters.properties.items() %}
+
+ - {{ name }} ({{ py_type(spec.type) }})
+ {%- if name in item.function.parameters.required %} [必填]{% else %} [选填]{% endif %}:
+ {{- " " ~ (spec.description or "") }}
+ {%- endfor %}
+ {%- endif %}
+
+ {# ---------- Returns ---------- #}
+ {%- if item.function.returns is defined
+ and item.function.returns.properties is defined
+ and item.function.returns.properties %}
+ Returns:
+ {%- for name, spec in item.function.returns.properties.items() %}
+
+ - {{ name }} ({{ py_type(spec.type) }}):
+ {{- " " ~ (spec.description or "") }}
+ {%- endfor %}
+ {%- endif %}
+
+ """
+{%- endfor %}
+{%- endif %}
+{%- if tools is iterable and tools | length > 0 %}
+
+{{"工具调用请遵循如下格式:\n\n\nvalue_1\nThis is the value for the second parameter\nthat can span\nmultiple lines\n\n\n"}}
+{%- endif %}
+{# close the system block #}
+{%- if system_message is defined or tools is iterable and tools | length > 0 %}
+{{ eos_token }}
+{%- endif %}
+{# ---------- Thinking Budget ---------- #}
+{%- if thinking_budget is defined %}
+{%- if thinking_budget == 0 %}
+{{ bos_token+"system" }}
+{{ "You are an intelligent assistant that can answer questions in one step without the need for reasoning and thinking, that is, your thinking budget is 0. Next, please skip the thinking process and directly start answering the user's questions." }}
+{{ eos_token }}
+{%- elif not thinking_budget == -1 %}
+{{ bos_token+"system" }}
+{{ "You are an intelligent assistant with reflective ability. In the process of thinking and reasoning, you need to strictly follow the thinking budget, which is "}}{{thinking_budget}}{{". That is, you need to complete your thinking within "}}{{thinking_budget}}{{" tokens and start answering the user's questions. You will reflect on your thinking process every "}}{{ns.interval}}{{" tokens, stating how many tokens have been used and how many are left."}}
+{{ eos_token }}
+{%- endif %}
+{%- endif %}
+{# ---------- write out the message history one by one ---------- #}
+{%- for message in loop_messages %}
+{%- if message.role == "assistant"
+ and message.tool_calls is defined
+ and message.tool_calls is iterable
+ and message.tool_calls | length > 0 %}
+{{ bos_token + message.role }}
+{%- if message.reasoning_content is defined and message.reasoning_content is string and message.reasoning_content | trim | length > 0 %}
+{{ "\n" + think_begin_token + message.reasoning_content | trim + think_end_token }}
+{%- endif %}
+{%- if message.content is defined and message.content is string and message.content | trim | length > 0 %}
+{{ "\n" + message.content | trim + "\n" }}
+{%- endif %}
+{%- for tool_call in message.tool_calls %}
+{%- if tool_call.function is defined %}{% set tool_call = tool_call.function %}{% endif %}
+{{ "\n" + toolcall_begin_token + "\n\n" }}
+{%- if tool_call.arguments is defined %}
+{%- for arg_name, arg_value in tool_call.arguments | items %}
+{{ "" }}
+{%- set arg_value = arg_value if arg_value is string else arg_value | string %}
+{{ arg_value+"\n" }}
+{%- endfor %}
+{%- endif %}
+{{ "\n" + toolcall_end_token }}
+{%- endfor %}
+{{ eos_token }}
+{%- elif message.role in ["user", "system"] %}
+{{ bos_token + message.role + "\n" + message.content + eos_token }}
+{%- elif message.role == "assistant" %}
+{{ bos_token + message.role }}
+{%- if message.reasoning_content is defined and message.reasoning_content is string and message.reasoning_content | trim | length > 0 %}
+{{ "\n" + think_begin_token + message.reasoning_content | trim + think_end_token }}
+{%- endif %}
+{%- if message.content is defined and message.content is string and message.content | trim | length > 0 %}
+{{ "\n" + message.content | trim + eos_token }}
+{%- endif %}
+{# the tool role also falls through to this branch #}
+{%- else %}
+{{ bos_token + message.role + "\n" + message.content + eos_token }}
+{%- endif %}
+{%- endfor %}
+{# ---------- prompt the model to start generating ---------- #}
+{%- if add_generation_prompt %}
+{{ bos_token+"assistant\n" }}
+{%- if thinking_budget == 0 %}
+{{ think_begin_token+budget_begin_token }}
+{%- endif %}
+{%- endif %}
\ No newline at end of file
diff --git a/config.json b/config.json
new file mode 100644
index 0000000..e094445
--- /dev/null
+++ b/config.json
@@ -0,0 +1,33 @@
+{
+ "architectures": [
+ "SeedOssForCausalLM"
+ ],
+ "attention_bias": true,
+ "attention_dropout": 0.1,
+ "attention_out_bias": false,
+ "bos_token_id": 0,
+ "pad_token_id": 1,
+ "eos_token_id": 2,
+ "head_dim": 128,
+ "hidden_act": "silu",
+ "hidden_size": 5120,
+ "initializer_range": 0.02,
+ "intermediate_size": 27648,
+ "max_position_embeddings": 524288,
+ "mlp_bias": false,
+ "model_type": "seed_oss",
+ "num_attention_heads": 80,
+ "num_hidden_layers": 64,
+ "num_key_value_heads": 8,
+ "residual_dropout": 0.1,
+ "rms_norm_eps": 1e-06,
+ "rope_scaling": {
+ "rope_type": "default"
+ },
+ "rope_theta": 10000000.0,
+ "tie_word_embeddings": false,
+ "torch_dtype": "bfloat16",
+ "transformers_version": "4.55.0",
+ "use_cache": true,
+ "vocab_size": 155136
+}
\ No newline at end of file
diff --git a/configuration.json b/configuration.json
new file mode 100644
index 0000000..bbeeda1
--- /dev/null
+++ b/configuration.json
@@ -0,0 +1 @@
+{"framework": "pytorch", "task": "text-generation", "allow_remote": true}
\ No newline at end of file
diff --git a/generation_config.json b/generation_config.json
new file mode 100644
index 0000000..3a7b67b
--- /dev/null
+++ b/generation_config.json
@@ -0,0 +1,10 @@
+{
+ "_from_model_config": true,
+ "bos_token_id": 0,
+ "pad_token_id": 1,
+ "eos_token_id": 2,
+ "transformers_version": "4.55.0",
+ "temperature": 1.1,
+ "top_p": 0.95
+}
+
\ No newline at end of file
diff --git a/model-00001-of-00015.safetensors b/model-00001-of-00015.safetensors
new file mode 100644
index 0000000..b69abcc
--- /dev/null
+++ b/model-00001-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a6387b80f12db915254cbe82c26d393f0f5a10600ce7bda028e3ee90c256eecc
+size 135
diff --git a/model-00002-of-00015.safetensors b/model-00002-of-00015.safetensors
new file mode 100644
index 0000000..854a48a
--- /dev/null
+++ b/model-00002-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fe2d0b95a5d785f8e2a18329296773e042b8caa9a3f0a1d9e8ef2c9bb4a14eea
+size 135
diff --git a/model-00003-of-00015.safetensors b/model-00003-of-00015.safetensors
new file mode 100644
index 0000000..7c82696
--- /dev/null
+++ b/model-00003-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1a3e358505119541fa85625546348a60f39685fba7549bd94c8e982d407a0555
+size 135
diff --git a/model-00004-of-00015.safetensors b/model-00004-of-00015.safetensors
new file mode 100644
index 0000000..7adcec1
--- /dev/null
+++ b/model-00004-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0d6bbfb4ab754f2cb391caa40f67dd9d349b5381b402574a0440813606a348c5
+size 135
diff --git a/model-00005-of-00015.safetensors b/model-00005-of-00015.safetensors
new file mode 100644
index 0000000..bcb869e
--- /dev/null
+++ b/model-00005-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:107cce88b60faf9bad30769172dce01cd1764570f92cb0a80dece2e238167f23
+size 135
diff --git a/model-00006-of-00015.safetensors b/model-00006-of-00015.safetensors
new file mode 100644
index 0000000..ead20d5
--- /dev/null
+++ b/model-00006-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e71fa75e94020a23d9a15da86ed328bdc01462a0a3f09ecdd614f047a802301a
+size 135
diff --git a/model-00007-of-00015.safetensors b/model-00007-of-00015.safetensors
new file mode 100644
index 0000000..96a8710
--- /dev/null
+++ b/model-00007-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a04d657585986417b4957ae284b889c2b58083e39a90994a068ea4a25cfa27ae
+size 135
diff --git a/model-00008-of-00015.safetensors b/model-00008-of-00015.safetensors
new file mode 100644
index 0000000..017e25f
--- /dev/null
+++ b/model-00008-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1ef369c73695b6d4ea90e68154005d90a2733f67053b10211830a8d85e9263c4
+size 135
diff --git a/model-00009-of-00015.safetensors b/model-00009-of-00015.safetensors
new file mode 100644
index 0000000..5057972
--- /dev/null
+++ b/model-00009-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:63e354190fef1698af8cf2b2b6eb3ceb4627be4e15c886fcefae04c40046811e
+size 135
diff --git a/model-00010-of-00015.safetensors b/model-00010-of-00015.safetensors
new file mode 100644
index 0000000..d1d46d7
--- /dev/null
+++ b/model-00010-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a4781bad8d0e3bee0f1adda8017b951edd34a57638420cadaabf433e6bde8d0c
+size 135
diff --git a/model-00011-of-00015.safetensors b/model-00011-of-00015.safetensors
new file mode 100644
index 0000000..ced0535
--- /dev/null
+++ b/model-00011-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:223165c90a98f80f66a5f2dcb94e6f09e3454974473fe14c6822c0628ee55f56
+size 135
diff --git a/model-00012-of-00015.safetensors b/model-00012-of-00015.safetensors
new file mode 100644
index 0000000..7ffc30a
--- /dev/null
+++ b/model-00012-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8db709a2c461316819593bef8ae9e252cdf5da323f4361be62dd7f4d3c4c8f18
+size 135
diff --git a/model-00013-of-00015.safetensors b/model-00013-of-00015.safetensors
new file mode 100644
index 0000000..3c9b063
--- /dev/null
+++ b/model-00013-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4e6c7c009da0d562231304d6eef141a64f95a73e37b4d2576aa587a82b5713ec
+size 135
diff --git a/model-00014-of-00015.safetensors b/model-00014-of-00015.safetensors
new file mode 100644
index 0000000..6c95ff9
--- /dev/null
+++ b/model-00014-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6d233a72fe9dc4cbea98e275729541d9ebf06a7d0ecf4edd68e0f86d8b021339
+size 135
diff --git a/model-00015-of-00015.safetensors b/model-00015-of-00015.safetensors
new file mode 100644
index 0000000..72710eb
--- /dev/null
+++ b/model-00015-of-00015.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:edabb4aa838885534911083fa9d7c00468f9e43103eb1bf61dc4a033af42d1c8
+size 135
diff --git a/model.safetensors.index.json b/model.safetensors.index.json
new file mode 100644
index 0000000..4757f05
--- /dev/null
+++ b/model.safetensors.index.json
@@ -0,0 +1,779 @@
+{
+ "metadata": {
+ "total_parameters": 36151104512,
+ "total_size": 72302209024
+ },
+ "weight_map": {
+ "lm_head.weight": "model-00015-of-00015.safetensors",
+ "model.embed_tokens.weight": "model-00001-of-00015.safetensors",
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00015.safetensors",
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00015.safetensors",
+ "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00015.safetensors",
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00015.safetensors",
+ "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.10.input_layernorm.weight": "model-00003-of-00015.safetensors",
+ "model.layers.10.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.10.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
+ "model.layers.10.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
+ "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.10.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
+ "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.10.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
+ "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.11.input_layernorm.weight": "model-00003-of-00015.safetensors",
+ "model.layers.11.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.11.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
+ "model.layers.11.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
+ "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.11.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
+ "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.11.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
+ "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.12.input_layernorm.weight": "model-00004-of-00015.safetensors",
+ "model.layers.12.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.12.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.12.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.12.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
+ "model.layers.12.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
+ "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.12.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
+ "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.12.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
+ "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.13.input_layernorm.weight": "model-00004-of-00015.safetensors",
+ "model.layers.13.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.13.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.13.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
+ "model.layers.13.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
+ "model.layers.13.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.13.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.13.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
+ "model.layers.13.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.13.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
+ "model.layers.13.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.14.input_layernorm.weight": "model-00004-of-00015.safetensors",
+ "model.layers.14.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.14.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
+ "model.layers.14.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
+ "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.14.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
+ "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.14.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
+ "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.15.input_layernorm.weight": "model-00004-of-00015.safetensors",
+ "model.layers.15.mlp.down_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.15.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00015.safetensors",
+ "model.layers.15.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
+ "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.15.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
+ "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.15.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
+ "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.16.input_layernorm.weight": "model-00005-of-00015.safetensors",
+ "model.layers.16.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.16.mlp.up_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.16.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
+ "model.layers.16.self_attn.k_proj.bias": "model-00004-of-00015.safetensors",
+ "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.16.self_attn.q_proj.bias": "model-00004-of-00015.safetensors",
+ "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.16.self_attn.v_proj.bias": "model-00004-of-00015.safetensors",
+ "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00015.safetensors",
+ "model.layers.17.input_layernorm.weight": "model-00005-of-00015.safetensors",
+ "model.layers.17.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.17.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.17.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.17.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
+ "model.layers.17.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
+ "model.layers.17.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.17.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.17.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
+ "model.layers.17.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.17.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
+ "model.layers.17.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.18.input_layernorm.weight": "model-00005-of-00015.safetensors",
+ "model.layers.18.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.18.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.18.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
+ "model.layers.18.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
+ "model.layers.18.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.18.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.18.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
+ "model.layers.18.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.18.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
+ "model.layers.18.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.19.input_layernorm.weight": "model-00005-of-00015.safetensors",
+ "model.layers.19.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.19.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
+ "model.layers.19.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
+ "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.19.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
+ "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.19.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
+ "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00015.safetensors",
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00015.safetensors",
+ "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.20.input_layernorm.weight": "model-00005-of-00015.safetensors",
+ "model.layers.20.mlp.down_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.20.mlp.up_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00015.safetensors",
+ "model.layers.20.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
+ "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.20.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
+ "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.20.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
+ "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.21.input_layernorm.weight": "model-00006-of-00015.safetensors",
+ "model.layers.21.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.21.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.21.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.21.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
+ "model.layers.21.self_attn.k_proj.bias": "model-00005-of-00015.safetensors",
+ "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.21.self_attn.q_proj.bias": "model-00005-of-00015.safetensors",
+ "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.21.self_attn.v_proj.bias": "model-00005-of-00015.safetensors",
+ "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00015.safetensors",
+ "model.layers.22.input_layernorm.weight": "model-00006-of-00015.safetensors",
+ "model.layers.22.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.22.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.22.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.22.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
+ "model.layers.22.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
+ "model.layers.22.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.22.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.22.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
+ "model.layers.22.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.22.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
+ "model.layers.22.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.23.input_layernorm.weight": "model-00006-of-00015.safetensors",
+ "model.layers.23.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.23.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.23.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.23.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
+ "model.layers.23.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
+ "model.layers.23.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.23.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.23.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
+ "model.layers.23.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.23.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
+ "model.layers.23.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.24.input_layernorm.weight": "model-00006-of-00015.safetensors",
+ "model.layers.24.mlp.down_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.24.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.24.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.24.post_attention_layernorm.weight": "model-00006-of-00015.safetensors",
+ "model.layers.24.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
+ "model.layers.24.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.24.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
+ "model.layers.24.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.24.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
+ "model.layers.24.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.25.input_layernorm.weight": "model-00007-of-00015.safetensors",
+ "model.layers.25.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.25.mlp.gate_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.25.mlp.up_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.25.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
+ "model.layers.25.self_attn.k_proj.bias": "model-00006-of-00015.safetensors",
+ "model.layers.25.self_attn.k_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.25.self_attn.o_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.25.self_attn.q_proj.bias": "model-00006-of-00015.safetensors",
+ "model.layers.25.self_attn.q_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.25.self_attn.v_proj.bias": "model-00006-of-00015.safetensors",
+ "model.layers.25.self_attn.v_proj.weight": "model-00006-of-00015.safetensors",
+ "model.layers.26.input_layernorm.weight": "model-00007-of-00015.safetensors",
+ "model.layers.26.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.26.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.26.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.26.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
+ "model.layers.26.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
+ "model.layers.26.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.26.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.26.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
+ "model.layers.26.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.26.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
+ "model.layers.26.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.27.input_layernorm.weight": "model-00007-of-00015.safetensors",
+ "model.layers.27.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.27.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.27.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.27.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
+ "model.layers.27.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
+ "model.layers.27.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.27.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.27.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
+ "model.layers.27.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.27.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
+ "model.layers.27.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.28.input_layernorm.weight": "model-00007-of-00015.safetensors",
+ "model.layers.28.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.28.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.28.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.28.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
+ "model.layers.28.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
+ "model.layers.28.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.28.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.28.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
+ "model.layers.28.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.28.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
+ "model.layers.28.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.29.input_layernorm.weight": "model-00007-of-00015.safetensors",
+ "model.layers.29.mlp.down_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.29.mlp.gate_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.29.mlp.up_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.29.post_attention_layernorm.weight": "model-00007-of-00015.safetensors",
+ "model.layers.29.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
+ "model.layers.29.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.29.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.29.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
+ "model.layers.29.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.29.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
+ "model.layers.29.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.3.input_layernorm.weight": "model-00002-of-00015.safetensors",
+ "model.layers.3.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.3.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
+ "model.layers.3.self_attn.k_proj.bias": "model-00001-of-00015.safetensors",
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.3.self_attn.q_proj.bias": "model-00001-of-00015.safetensors",
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.3.self_attn.v_proj.bias": "model-00001-of-00015.safetensors",
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00015.safetensors",
+ "model.layers.30.input_layernorm.weight": "model-00008-of-00015.safetensors",
+ "model.layers.30.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.30.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.30.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.30.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
+ "model.layers.30.self_attn.k_proj.bias": "model-00007-of-00015.safetensors",
+ "model.layers.30.self_attn.k_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.30.self_attn.o_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.30.self_attn.q_proj.bias": "model-00007-of-00015.safetensors",
+ "model.layers.30.self_attn.q_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.30.self_attn.v_proj.bias": "model-00007-of-00015.safetensors",
+ "model.layers.30.self_attn.v_proj.weight": "model-00007-of-00015.safetensors",
+ "model.layers.31.input_layernorm.weight": "model-00008-of-00015.safetensors",
+ "model.layers.31.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.31.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.31.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.31.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
+ "model.layers.31.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
+ "model.layers.31.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.31.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.31.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
+ "model.layers.31.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.31.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
+ "model.layers.31.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.32.input_layernorm.weight": "model-00008-of-00015.safetensors",
+ "model.layers.32.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.32.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.32.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.32.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
+ "model.layers.32.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
+ "model.layers.32.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.32.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.32.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
+ "model.layers.32.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.32.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
+ "model.layers.32.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.33.input_layernorm.weight": "model-00008-of-00015.safetensors",
+ "model.layers.33.mlp.down_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.33.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.33.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.33.post_attention_layernorm.weight": "model-00008-of-00015.safetensors",
+ "model.layers.33.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
+ "model.layers.33.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.33.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.33.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
+ "model.layers.33.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.33.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
+ "model.layers.33.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.34.input_layernorm.weight": "model-00009-of-00015.safetensors",
+ "model.layers.34.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.34.mlp.gate_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.34.mlp.up_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.34.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
+ "model.layers.34.self_attn.k_proj.bias": "model-00008-of-00015.safetensors",
+ "model.layers.34.self_attn.k_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.34.self_attn.o_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.34.self_attn.q_proj.bias": "model-00008-of-00015.safetensors",
+ "model.layers.34.self_attn.q_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.34.self_attn.v_proj.bias": "model-00008-of-00015.safetensors",
+ "model.layers.34.self_attn.v_proj.weight": "model-00008-of-00015.safetensors",
+ "model.layers.35.input_layernorm.weight": "model-00009-of-00015.safetensors",
+ "model.layers.35.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.35.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.35.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.35.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
+ "model.layers.35.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
+ "model.layers.35.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.35.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.35.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
+ "model.layers.35.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.35.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
+ "model.layers.35.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.36.input_layernorm.weight": "model-00009-of-00015.safetensors",
+ "model.layers.36.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.36.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.36.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.36.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
+ "model.layers.36.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
+ "model.layers.36.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.36.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.36.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
+ "model.layers.36.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.36.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
+ "model.layers.36.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.37.input_layernorm.weight": "model-00009-of-00015.safetensors",
+ "model.layers.37.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.37.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.37.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.37.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
+ "model.layers.37.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
+ "model.layers.37.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.37.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.37.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
+ "model.layers.37.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.37.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
+ "model.layers.37.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.38.input_layernorm.weight": "model-00009-of-00015.safetensors",
+ "model.layers.38.mlp.down_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.38.mlp.gate_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.38.mlp.up_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.38.post_attention_layernorm.weight": "model-00009-of-00015.safetensors",
+ "model.layers.38.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
+ "model.layers.38.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.38.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.38.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
+ "model.layers.38.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.38.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
+ "model.layers.38.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.39.input_layernorm.weight": "model-00010-of-00015.safetensors",
+ "model.layers.39.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.39.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.39.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.39.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
+ "model.layers.39.self_attn.k_proj.bias": "model-00009-of-00015.safetensors",
+ "model.layers.39.self_attn.k_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.39.self_attn.o_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.39.self_attn.q_proj.bias": "model-00009-of-00015.safetensors",
+ "model.layers.39.self_attn.q_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.39.self_attn.v_proj.bias": "model-00009-of-00015.safetensors",
+ "model.layers.39.self_attn.v_proj.weight": "model-00009-of-00015.safetensors",
+ "model.layers.4.input_layernorm.weight": "model-00002-of-00015.safetensors",
+ "model.layers.4.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.4.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
+ "model.layers.4.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
+ "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.4.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
+ "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.4.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
+ "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.40.input_layernorm.weight": "model-00010-of-00015.safetensors",
+ "model.layers.40.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.40.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.40.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.40.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
+ "model.layers.40.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
+ "model.layers.40.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.40.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.40.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
+ "model.layers.40.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.40.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
+ "model.layers.40.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.41.input_layernorm.weight": "model-00010-of-00015.safetensors",
+ "model.layers.41.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.41.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.41.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.41.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
+ "model.layers.41.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
+ "model.layers.41.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.41.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.41.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
+ "model.layers.41.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.41.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
+ "model.layers.41.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.42.input_layernorm.weight": "model-00010-of-00015.safetensors",
+ "model.layers.42.mlp.down_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.42.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.42.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.42.post_attention_layernorm.weight": "model-00010-of-00015.safetensors",
+ "model.layers.42.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
+ "model.layers.42.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.42.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.42.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
+ "model.layers.42.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.42.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
+ "model.layers.42.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.43.input_layernorm.weight": "model-00011-of-00015.safetensors",
+ "model.layers.43.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.43.mlp.gate_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.43.mlp.up_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.43.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
+ "model.layers.43.self_attn.k_proj.bias": "model-00010-of-00015.safetensors",
+ "model.layers.43.self_attn.k_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.43.self_attn.o_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.43.self_attn.q_proj.bias": "model-00010-of-00015.safetensors",
+ "model.layers.43.self_attn.q_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.43.self_attn.v_proj.bias": "model-00010-of-00015.safetensors",
+ "model.layers.43.self_attn.v_proj.weight": "model-00010-of-00015.safetensors",
+ "model.layers.44.input_layernorm.weight": "model-00011-of-00015.safetensors",
+ "model.layers.44.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.44.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.44.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.44.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
+ "model.layers.44.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
+ "model.layers.44.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.44.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.44.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
+ "model.layers.44.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.44.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
+ "model.layers.44.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.45.input_layernorm.weight": "model-00011-of-00015.safetensors",
+ "model.layers.45.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.45.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.45.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.45.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
+ "model.layers.45.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
+ "model.layers.45.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.45.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.45.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
+ "model.layers.45.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.45.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
+ "model.layers.45.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.46.input_layernorm.weight": "model-00011-of-00015.safetensors",
+ "model.layers.46.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.46.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.46.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.46.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
+ "model.layers.46.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
+ "model.layers.46.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.46.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.46.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
+ "model.layers.46.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.46.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
+ "model.layers.46.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.47.input_layernorm.weight": "model-00011-of-00015.safetensors",
+ "model.layers.47.mlp.down_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.47.mlp.gate_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.47.mlp.up_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.47.post_attention_layernorm.weight": "model-00011-of-00015.safetensors",
+ "model.layers.47.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
+ "model.layers.47.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.47.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.47.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
+ "model.layers.47.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.47.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
+ "model.layers.47.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.48.input_layernorm.weight": "model-00012-of-00015.safetensors",
+ "model.layers.48.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.48.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.48.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.48.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
+ "model.layers.48.self_attn.k_proj.bias": "model-00011-of-00015.safetensors",
+ "model.layers.48.self_attn.k_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.48.self_attn.o_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.48.self_attn.q_proj.bias": "model-00011-of-00015.safetensors",
+ "model.layers.48.self_attn.q_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.48.self_attn.v_proj.bias": "model-00011-of-00015.safetensors",
+ "model.layers.48.self_attn.v_proj.weight": "model-00011-of-00015.safetensors",
+ "model.layers.49.input_layernorm.weight": "model-00012-of-00015.safetensors",
+ "model.layers.49.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.49.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.49.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.49.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
+ "model.layers.49.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
+ "model.layers.49.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.49.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.49.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
+ "model.layers.49.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.49.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
+ "model.layers.49.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.5.input_layernorm.weight": "model-00002-of-00015.safetensors",
+ "model.layers.5.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.5.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
+ "model.layers.5.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
+ "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.5.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
+ "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.5.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
+ "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.50.input_layernorm.weight": "model-00012-of-00015.safetensors",
+ "model.layers.50.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.50.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.50.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.50.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
+ "model.layers.50.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
+ "model.layers.50.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.50.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.50.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
+ "model.layers.50.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.50.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
+ "model.layers.50.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.51.input_layernorm.weight": "model-00012-of-00015.safetensors",
+ "model.layers.51.mlp.down_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.51.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.51.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.51.post_attention_layernorm.weight": "model-00012-of-00015.safetensors",
+ "model.layers.51.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
+ "model.layers.51.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.51.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.51.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
+ "model.layers.51.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.51.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
+ "model.layers.51.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.52.input_layernorm.weight": "model-00013-of-00015.safetensors",
+ "model.layers.52.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.52.mlp.gate_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.52.mlp.up_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.52.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
+ "model.layers.52.self_attn.k_proj.bias": "model-00012-of-00015.safetensors",
+ "model.layers.52.self_attn.k_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.52.self_attn.o_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.52.self_attn.q_proj.bias": "model-00012-of-00015.safetensors",
+ "model.layers.52.self_attn.q_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.52.self_attn.v_proj.bias": "model-00012-of-00015.safetensors",
+ "model.layers.52.self_attn.v_proj.weight": "model-00012-of-00015.safetensors",
+ "model.layers.53.input_layernorm.weight": "model-00013-of-00015.safetensors",
+ "model.layers.53.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.53.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.53.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.53.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
+ "model.layers.53.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
+ "model.layers.53.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.53.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.53.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
+ "model.layers.53.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.53.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
+ "model.layers.53.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.54.input_layernorm.weight": "model-00013-of-00015.safetensors",
+ "model.layers.54.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.54.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.54.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.54.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
+ "model.layers.54.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
+ "model.layers.54.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.54.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.54.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
+ "model.layers.54.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.54.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
+ "model.layers.54.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.55.input_layernorm.weight": "model-00013-of-00015.safetensors",
+ "model.layers.55.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.55.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.55.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.55.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
+ "model.layers.55.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
+ "model.layers.55.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.55.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.55.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
+ "model.layers.55.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.55.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
+ "model.layers.55.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.56.input_layernorm.weight": "model-00013-of-00015.safetensors",
+ "model.layers.56.mlp.down_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.56.mlp.gate_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.56.mlp.up_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.56.post_attention_layernorm.weight": "model-00013-of-00015.safetensors",
+ "model.layers.56.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
+ "model.layers.56.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.56.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.56.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
+ "model.layers.56.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.56.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
+ "model.layers.56.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.57.input_layernorm.weight": "model-00014-of-00015.safetensors",
+ "model.layers.57.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.57.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.57.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.57.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
+ "model.layers.57.self_attn.k_proj.bias": "model-00013-of-00015.safetensors",
+ "model.layers.57.self_attn.k_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.57.self_attn.o_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.57.self_attn.q_proj.bias": "model-00013-of-00015.safetensors",
+ "model.layers.57.self_attn.q_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.57.self_attn.v_proj.bias": "model-00013-of-00015.safetensors",
+ "model.layers.57.self_attn.v_proj.weight": "model-00013-of-00015.safetensors",
+ "model.layers.58.input_layernorm.weight": "model-00014-of-00015.safetensors",
+ "model.layers.58.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.58.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.58.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.58.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
+ "model.layers.58.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
+ "model.layers.58.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.58.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.58.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
+ "model.layers.58.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.58.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
+ "model.layers.58.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.59.input_layernorm.weight": "model-00014-of-00015.safetensors",
+ "model.layers.59.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.59.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.59.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.59.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
+ "model.layers.59.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
+ "model.layers.59.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.59.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.59.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
+ "model.layers.59.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.59.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
+ "model.layers.59.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.6.input_layernorm.weight": "model-00002-of-00015.safetensors",
+ "model.layers.6.mlp.down_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.6.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00015.safetensors",
+ "model.layers.6.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
+ "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.6.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
+ "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.6.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
+ "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.60.input_layernorm.weight": "model-00014-of-00015.safetensors",
+ "model.layers.60.mlp.down_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.60.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.60.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.60.post_attention_layernorm.weight": "model-00014-of-00015.safetensors",
+ "model.layers.60.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
+ "model.layers.60.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.60.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.60.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
+ "model.layers.60.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.60.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
+ "model.layers.60.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.61.input_layernorm.weight": "model-00015-of-00015.safetensors",
+ "model.layers.61.mlp.down_proj.weight": "model-00015-of-00015.safetensors",
+ "model.layers.61.mlp.gate_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.61.mlp.up_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.61.post_attention_layernorm.weight": "model-00015-of-00015.safetensors",
+ "model.layers.61.self_attn.k_proj.bias": "model-00014-of-00015.safetensors",
+ "model.layers.61.self_attn.k_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.61.self_attn.o_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.61.self_attn.q_proj.bias": "model-00014-of-00015.safetensors",
+ "model.layers.61.self_attn.q_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.61.self_attn.v_proj.bias": "model-00014-of-00015.safetensors",
+ "model.layers.61.self_attn.v_proj.weight": "model-00014-of-00015.safetensors",
+ "model.layers.62.input_layernorm.weight": "model-00015-of-00015.safetensors",
+ "model.layers.62.mlp.down_proj.weight": "model-00015-of-00015.safetensors",
+ "model.layers.62.mlp.gate_proj.weight": "model-00015-of-00015.safetensors",
+ "model.layers.62.mlp.up_proj.weight": "model-00015-of-00015.safetensors",
+ "model.layers.62.post_attention_layernorm.weight": "model-00015-of-00015.safetensors",
+ "model.layers.62.self_attn.k_proj.bias": "model-00015-of-00015.safetensors",
+ "model.layers.62.self_attn.k_proj.weight": "model-00015-of-00015.safetensors",
+ "model.layers.62.self_attn.o_proj.weight": "model-00015-of-00015.safetensors",
+ "model.layers.62.self_attn.q_proj.bias": "model-00015-of-00015.safetensors",
+ "model.layers.62.self_attn.q_proj.weight": "model-00015-of-00015.safetensors",
+ "model.layers.62.self_attn.v_proj.bias": "model-00015-of-00015.safetensors",
+ "model.layers.62.self_attn.v_proj.weight": "model-00015-of-00015.safetensors",
+ "model.layers.63.input_layernorm.weight": "model-00015-of-00015.safetensors",
+ "model.layers.63.mlp.down_proj.weight": "model-00015-of-00015.safetensors",
+ "model.layers.63.mlp.gate_proj.weight": "model-00015-of-00015.safetensors",
+ "model.layers.63.mlp.up_proj.weight": "model-00015-of-00015.safetensors",
+ "model.layers.63.post_attention_layernorm.weight": "model-00015-of-00015.safetensors",
+ "model.layers.63.self_attn.k_proj.bias": "model-00015-of-00015.safetensors",
+ "model.layers.63.self_attn.k_proj.weight": "model-00015-of-00015.safetensors",
+ "model.layers.63.self_attn.o_proj.weight": "model-00015-of-00015.safetensors",
+ "model.layers.63.self_attn.q_proj.bias": "model-00015-of-00015.safetensors",
+ "model.layers.63.self_attn.q_proj.weight": "model-00015-of-00015.safetensors",
+ "model.layers.63.self_attn.v_proj.bias": "model-00015-of-00015.safetensors",
+ "model.layers.63.self_attn.v_proj.weight": "model-00015-of-00015.safetensors",
+ "model.layers.7.input_layernorm.weight": "model-00003-of-00015.safetensors",
+ "model.layers.7.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.7.mlp.up_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.7.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
+ "model.layers.7.self_attn.k_proj.bias": "model-00002-of-00015.safetensors",
+ "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.7.self_attn.q_proj.bias": "model-00002-of-00015.safetensors",
+ "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.7.self_attn.v_proj.bias": "model-00002-of-00015.safetensors",
+ "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00015.safetensors",
+ "model.layers.8.input_layernorm.weight": "model-00003-of-00015.safetensors",
+ "model.layers.8.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.8.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.8.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
+ "model.layers.8.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
+ "model.layers.8.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.8.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.8.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
+ "model.layers.8.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.8.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
+ "model.layers.8.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.9.input_layernorm.weight": "model-00003-of-00015.safetensors",
+ "model.layers.9.mlp.down_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.9.mlp.up_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00015.safetensors",
+ "model.layers.9.self_attn.k_proj.bias": "model-00003-of-00015.safetensors",
+ "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.9.self_attn.q_proj.bias": "model-00003-of-00015.safetensors",
+ "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00015.safetensors",
+ "model.layers.9.self_attn.v_proj.bias": "model-00003-of-00015.safetensors",
+ "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00015.safetensors",
+ "model.norm.weight": "model-00015-of-00015.safetensors"
+ }
+}
diff --git a/special_tokens_map.json b/special_tokens_map.json
new file mode 100644
index 0000000..7dd43a5
--- /dev/null
+++ b/special_tokens_map.json
@@ -0,0 +1,23 @@
+{
+ "bos_token": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+}
diff --git a/thinking_budget.png b/thinking_budget.png
new file mode 100644
index 0000000..ab0237b
Binary files /dev/null and b/thinking_budget.png differ
diff --git a/tokenizer.json b/tokenizer.json
new file mode 100644
index 0000000..dc0d8c9
--- /dev/null
+++ b/tokenizer.json
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f6bd848f52451824a3033a9f1e67eea5b399a13c90f845a332d3a29537e05827
+size 11883696
diff --git a/tokenizer_config.json b/tokenizer_config.json
new file mode 100644
index 0000000..c72b8f0
--- /dev/null
+++ b/tokenizer_config.json
@@ -0,0 +1,1035 @@
+{
+ "added_tokens_decoder": {
+ "0": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "4": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "5": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "6": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "7": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "8": {
+ "content": "",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "9": {
+ "content": "<[PLHD9_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "10": {
+ "content": "<[PLHD10_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "11": {
+ "content": "<[PLHD11_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "12": {
+ "content": "<[PLHD12_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "13": {
+ "content": "<[PLHD13_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "14": {
+ "content": "<[PLHD14_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "15": {
+ "content": "<[PLHD15_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "16": {
+ "content": "<[PLHD16_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "17": {
+ "content": "<[PLHD17_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "18": {
+ "content": "<[PLHD18_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "19": {
+ "content": "<[PLHD19_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "20": {
+ "content": "<[PLHD20_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "21": {
+ "content": "<[PLHD21_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "22": {
+ "content": "<[PLHD22_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "23": {
+ "content": "<[PLHD23_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "24": {
+ "content": "<[PLHD24_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "25": {
+ "content": "<[PLHD25_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "26": {
+ "content": "<[PLHD26_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "27": {
+ "content": "<[PLHD27_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "28": {
+ "content": "<[PLHD28_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "29": {
+ "content": "<[PLHD29_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "30": {
+ "content": "<[PLHD30_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "31": {
+ "content": "<[PLHD31_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "32": {
+ "content": "<[PLHD32_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "33": {
+ "content": "<[PLHD33_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "34": {
+ "content": "<[PLHD34_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "35": {
+ "content": "<[PLHD35_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "36": {
+ "content": "<[PLHD36_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "37": {
+ "content": "<[PLHD37_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "38": {
+ "content": "<[PLHD38_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "39": {
+ "content": "<[PLHD39_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "40": {
+ "content": "<[PLHD40_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "41": {
+ "content": "<[PLHD41_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "42": {
+ "content": "<[PLHD42_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "43": {
+ "content": "<[PLHD43_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "44": {
+ "content": "<[PLHD44_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "45": {
+ "content": "<[PLHD45_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "46": {
+ "content": "<[PLHD46_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "47": {
+ "content": "<[PLHD47_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "48": {
+ "content": "<[PLHD48_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "49": {
+ "content": "<[PLHD49_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "50": {
+ "content": "<[PLHD50_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "51": {
+ "content": "<[PLHD51_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "52": {
+ "content": "<[PLHD52_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "53": {
+ "content": "<[PLHD53_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "54": {
+ "content": "<[PLHD54_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "55": {
+ "content": "<[PLHD55_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "56": {
+ "content": "<[PLHD56_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "57": {
+ "content": "<[PLHD57_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "58": {
+ "content": "<[PLHD58_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "59": {
+ "content": "<[PLHD59_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "60": {
+ "content": "<[PLHD60_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "61": {
+ "content": "<[PLHD61_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "62": {
+ "content": "<[PLHD62_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "63": {
+ "content": "<[PLHD63_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "64": {
+ "content": "<[PLHD64_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "65": {
+ "content": "<[PLHD65_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "66": {
+ "content": "<[PLHD66_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "67": {
+ "content": "<[PLHD67_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "68": {
+ "content": "<[PLHD68_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "69": {
+ "content": "<[PLHD69_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "70": {
+ "content": "<[PLHD70_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "71": {
+ "content": "<[PLHD71_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "72": {
+ "content": "<[PLHD72_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "73": {
+ "content": "<[PLHD73_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "74": {
+ "content": "<[PLHD74_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "75": {
+ "content": "<[PLHD75_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "76": {
+ "content": "<[PLHD76_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "77": {
+ "content": "<[PLHD77_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "78": {
+ "content": "<[PLHD78_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "79": {
+ "content": "<[PLHD79_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "80": {
+ "content": "<[PLHD80_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "81": {
+ "content": "<[PLHD81_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "82": {
+ "content": "<[PLHD82_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "83": {
+ "content": "<[PLHD83_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "84": {
+ "content": "<[PLHD84_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "85": {
+ "content": "<[PLHD85_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "86": {
+ "content": "<[PLHD86_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "87": {
+ "content": "<[PLHD87_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "88": {
+ "content": "<[PLHD88_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "89": {
+ "content": "<[PLHD89_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "90": {
+ "content": "<[PLHD90_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "91": {
+ "content": "<[PLHD91_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "92": {
+ "content": "<[PLHD92_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "93": {
+ "content": "<[PLHD93_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "94": {
+ "content": "<[PLHD94_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "95": {
+ "content": "<[PLHD95_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "96": {
+ "content": "<[PLHD96_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "97": {
+ "content": "<[PLHD97_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "98": {
+ "content": "<[PLHD98_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "99": {
+ "content": "<[PLHD99_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "<[PLHD100_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "<[PLHD101_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "<[PLHD102_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "<[PLHD103_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "104": {
+ "content": "<[PLHD104_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "105": {
+ "content": "<[PLHD105_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "106": {
+ "content": "<[PLHD106_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "107": {
+ "content": "<[PLHD107_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "108": {
+ "content": "<[PLHD108_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "109": {
+ "content": "<[PLHD109_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "110": {
+ "content": "<[PLHD110_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "111": {
+ "content": "<[PLHD111_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "112": {
+ "content": "<[PLHD112_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "113": {
+ "content": "<[PLHD113_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "114": {
+ "content": "<[PLHD114_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "115": {
+ "content": "<[PLHD115_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "116": {
+ "content": "<[PLHD116_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "117": {
+ "content": "<[PLHD117_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "118": {
+ "content": "<[PLHD118_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "119": {
+ "content": "<[PLHD119_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "120": {
+ "content": "<[PLHD120_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "121": {
+ "content": "<[PLHD121_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "122": {
+ "content": "<[PLHD122_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "123": {
+ "content": "<[PLHD123_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "124": {
+ "content": "<[PLHD124_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "125": {
+ "content": "<[PLHD125_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "126": {
+ "content": "<[PLHD126_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "127": {
+ "content": "<[PLHD127_never_used]>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<seed:bos>",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<seed:eos>",
+ "extra_special_tokens": {},
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<seed:pad>",
+ "tokenizer_class": "PreTrainedTokenizerFast"
+}