diff --git a/.gitattributes b/.gitattributes
index 53d7257..21b3632 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -44,4 +44,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
\ No newline at end of file
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
\ No newline at end of file
diff --git a/README.md b/README.md
index c35c9e1..c36def4 100644
--- a/README.md
+++ b/README.md
@@ -1,47 +1,627 @@
 ---
-license: Apache License 2.0
-
-#model-type:
-##e.g. gpt, phi, llama, chatglm, baichuan
-#- gpt
-
-#domain:
-##e.g. nlp, cv, audio, multi-modal
-#- nlp
-
-#language:
-##List of language codes: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
-#- cn
-
-#metrics:
-##e.g. CIDEr, BLEU, ROUGE
-#- CIDEr
-
-#tags:
-##Custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
-#- pretrained
-
-#tools:
-##e.g. vllm, fastchat, llamacpp, AdaSeq
-#- vllm
+license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- vllm
+language:
+- en
+- zh
+base_model:
+- ByteDance-Seed/Seed-OSS-36B-Base
 ---
-### The contributor of this model has not provided a more detailed model introduction. Model files and weights can be found on the "Model Files" page.
-#### You can download the model with the following git clone command or with the ModelScope SDK
-SDK download
-```bash
-#Install ModelScope
-pip install modelscope
+
+ 👋 Hi, everyone! +
+ We are ByteDance Seed Team. +
+ +

+ You can get to know us better through the following channels👇 +
+ + +

+ +![seed logo](https://github.com/user-attachments/assets/c42e675e-497c-4508-8bb9-093ad4d1f216) + + +# Seed-OSS Open-Source Models +

+ + + + + + +
+ + +

+ +> [!NOTE] +> This model card is dedicated to the `Seed-OSS-36B-Instruct` model. + +## News +- [2025/08/20]🔥We release `Seed-OSS-36B-Base` (both with and without synthetic data versions) and `Seed-OSS-36B-Instruct`. + +## Introduction +Seed-OSS is a series of open-source large language models developed by ByteDance's Seed Team, designed for powerful long-context, reasoning, agent and general capabilities, and versatile developer-friendly features. Although trained with only 12T tokens, Seed-OSS achieves excellent performance on several popular open benchmarks. + +We release this series of models to the open-source community under the Apache-2.0 license. + +> [!NOTE] +> Seed-OSS is primarily optimized for international (i18n) use cases. + +### Key Features +- **Flexible Control of Thinking Budget**: Allowing users to flexibly adjust the reasoning length as needed. This capability of dynamically controlling the reasoning length enhances inference efficiency in practical application scenarios. +- **Enhanced Reasoning Capability**: Specifically optimized for reasoning tasks while maintaining balanced and excellent general capabilities. +- **Agentic Intelligence**: Performs exceptionally well in agentic tasks such as tool-using and issue resolving. +- **Research-Friendly**: Given that the inclusion of synthetic instruction data in pre-training may affect the post-training research, we released pre-trained models both with and without instruction data, providing the research community with more diverse options. +- **Native Long Context**: Trained with up-to-512K long context natively. + +### Model Summary + +Seed-OSS adopts the popular causal language model architecture with RoPE, GQA attention, RMSNorm and SwiGLU activation. + +
+ +| | | +|:---:|:---:| +| | **Seed-OSS-36B** | +| **Parameters** | 36B | +| **Attention** | GQA | +| **Activation Function** | SwiGLU | +| **Number of Layers** | 64 | +| **Number of QKV Heads** | 80 / 8 / 8 | +| **Head Size** | 128 | +| **Hidden Size** | 5120 | +| **Vocabulary Size** | 155K | +| **Context Length** | 512K | +| **RoPE Base Frequency** | 1e7 | + +
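As a sanity check on the summary table, the parameter count can be recomputed from the shapes in this change's `config.json` (hidden size 5120, 64 layers, 80 query and 8 key/value heads of size 128, `intermediate_size` 27648, vocabulary 155136). The sketch below is a back-of-the-envelope calculation, not code from the repository. It assumes untied input and output embeddings, bias terms on the Q/K/V projections only, three MLP matrices per layer, two RMSNorm weight vectors per layer, and one final norm; the function name `seed_oss_param_count` is illustrative.

```python
def seed_oss_param_count(
    vocab_size: int = 155136,
    hidden_size: int = 5120,
    intermediate_size: int = 27648,
    num_layers: int = 64,
    num_q_heads: int = 80,
    num_kv_heads: int = 8,
    head_dim: int = 128,
) -> int:
    """Estimate the total parameter count from the config.json shapes."""
    q_dim = num_q_heads * head_dim    # 10240, wider than hidden_size
    kv_dim = num_kv_heads * head_dim  # 1024, shared across query groups (GQA)

    attention = (
        hidden_size * q_dim + q_dim            # q_proj weight + bias
        + 2 * (hidden_size * kv_dim + kv_dim)  # k_proj and v_proj weight + bias
        + q_dim * hidden_size                  # o_proj weight, no bias
    )
    mlp = 3 * hidden_size * intermediate_size  # gate, up and down projections
    norms = 2 * hidden_size                    # input and post-attention RMSNorm
    per_layer = attention + mlp + norms

    embeddings = 2 * vocab_size * hidden_size  # embed_tokens + lm_head (assumed untied)
    final_norm = hidden_size                   # assumed final RMSNorm
    return num_layers * per_layer + embeddings + final_norm


total = seed_oss_param_count()
print(f"{total:,} parameters, about {2 * total / 1e9:.1f} GB in bfloat16")
```

With the defaults this gives 36,151,104,512, which matches the `total_parameters` field in `model.safetensors.index.json`; at two bytes per bfloat16 weight that is the 72,302,209,024 bytes reported there as `total_size`.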
+ + +## Evaluation Results + +### Seed-OSS-36B-Base + +Incorporating synthetic instruction data into pretraining leads to improved performance on most benchmarks. We adopt the version augmented with synthetic instruction data (i.e., *w/ syn.*) as `Seed-OSS-36B-Base`. We also release `Seed-OSS-36B-Base-woSyn` trained without such data (i.e., *w/o syn.*), offering the community a high-performance foundation model unaffected by synthetic instruction data. + +
+| Benchmark | Seed1.6-Base | Qwen3-30B-A3B-Base-2507* | Qwen2.5-32B-Base* | Seed-OSS-36B-Base (w/ syn.) | Seed-OSS-36B-Base-woSyn (w/o syn.) |
+|:---:|:---:|:---:|:---:|:---:|:---:|
+| Knowledge | | | | | |
+| MMLU-Pro | 70 | 59.8 | 58.5 (55.1) | 65.1 | 60.4 |
+| MMLU | 88.8 | 82.7 | 84 (83.3) | 84.9 | 84.8 |
+| TriviaQA | 91 | 76.2 | 76 | 82.1 | 81.9 |
+| GPQA-D | 43.4 | 37 | 29.3 | 31.7 | 35.2 |
+| SimpleQA | 17.1 | 7.2 | 6.1 | 5.8 | 7.4 |
+| Reasoning | | | | | |
+| BBH | 92.1 | 81.4 | 79.1 (84.5) | 87.7 | 87.2 |
+| AGIEval-en | 78 | 66.4 | 65.6 | 70.7 | 70.1 |
+| Math | | | | | |
+| GSM8K | 93.1 | 87 | 87.5 (92.9) | 90.8 | 90.3 |
+| MATH | 72.9 | 61.1 | 63.5 (57.7) | 81.7 | 61.3 |
+| Coding | | | | | |
+| MBPP | 83.6 | 78.8 | 77.8 (84.5) | 80.6 | 74.6 |
+| HumanEval | 78 | 70.7 | 47.6 (58.5) | 76.8 | 75.6 |
+ + +- Bold denotes open-source SOTA. +
+- "*" indicates that the results in this column are presented in the format of "reproduced_results (reported_results_if_any)". + + +### Seed-OSS-36B-Instruct + +
+| Benchmark | Seed1.6-Thinking-0715 | OAI-OSS-20B* | Qwen3-30B-A3B-Thinking-2507* | Qwen3-32B* | Gemma3-27B | Seed-OSS-36B-Instruct |
+|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| Knowledge | | | | | | |
+| MMLU-Pro | 86.6 | 76.2 | 81.9 (80.9) | 81.8 | 67.5 | 82.7 |
+| MMLU | 90.6 | 81.7 (85.3) | 86.9 | 86.2 | 76.9 | 87.4 |
+| GPQA-D | 80.7 | 72.2 (71.5) | 71.4 (73.4) | 66.7 (68.4) | 42.4 | 71.4 |
+| SuperGPQA | 63.4 | 50.1 | 57.3 (56.8) | 49.3 | - | 55.7 |
+| SimpleQA | 23.7 | 6.7 | 23.6 | 8.6 | 10 | 9.7 |
+| Math | | | | | | |
+| AIME24 | 90.3 | 92.7 (92.1) | 87.7 | 82.7 (81.4) | - | 91.7 |
+| AIME25 | 86 | 90.3 (91.7) | 81.3 (85) | 73.3 (72.9) | - | 84.7 |
+| BeyondAIME | 60 | 69 | 56 | 29 | - | 65 |
+| Reasoning | | | | | | |
+| ArcAGI V2 | 50.3 | 41.7 | 37.8 | 14.4 | - | 40.6 |
+| KORBench | 74.8 | 72.3 | 70.2 | 65.4 | - | 70.6 |
+| Coding | | | | | | |
+| LiveCodeBench v6 (02/2025-05/2025) | 66.8 | 63.8 | 60.3 (66) | 53.4 | - | 67.4 |
+| HLE | 13.9 | 12.7 (10.9) | 8.7 | 6.9 | - | 10.1 |
+| Instruction Following | | | | | | |
+| IFEval | 86.3 | 92.8 | 88 (88.9) | 88.4 (85) | 90.4 | 85.8 |
+| Agent | | | | | | |
+| TAU1-Retail | 63 | (54.8) | 58.7 (67.8) | 40.9 | - | 70.4 |
+| TAU1-Airline | 49 | (38) | 47 (48) | 38 | - | 46 |
+| SWE-Bench Verified (OpenHands) | 41.8 | (60.7) | 31 | 23.4 | - | 56 |
+| SWE-Bench Verified (AgentLess 4*10) | 48.4 | - | 33.5 | 39.7 | - | 47 |
+| Multi-SWE-Bench | 17.7 | - | 9.5 | 7.7 | - | 17 |
+| Multilingualism | | | | | | |
+| MMMLU | 84.3 | 77.4 (75.7) | 79 | 79 (80.6) | - | 78.4 |
+| Long Context | | | | | | |
+| RULER (128K) | 94.5 | 78.7 | 94.5 | 77.5 | - | 94.6 |
+| Safety | | | | | | |
+| AIR-Bench | - | - | - | - | - | 75.6 |
+
+
+- Bold denotes the open-source SOTA; underline denotes the second-best open-source result.
+
+- "*" indicates that the results in this column are presented in the format "reproduced_results (reported_results_if_any)". Some results are omitted because the evaluation run failed.
+
+- The results of Gemma3-27B are sourced directly from its technical report.
+
+- Generation configs for Seed-OSS-36B-Instruct: temperature=1.1, top_p=0.95. For Taubench specifically, temperature=1, top_p=0.7.
+
+ + +> [!NOTE] +> We recommend sampling with `temperature=1.1` and `top_p=0.95`. + +### Thinking Budget + +Users can flexibly specify the model's thinking budget. The figure below shows the performance curves across different tasks as the thinking budget varies. For simpler tasks (such as IFEval), the model's chain of thought (CoT) is shorter, and the score exhibits fluctuations as the thinking budget increases. For more challenging tasks (such as AIME and LiveCodeBench), the model's CoT is longer, and the score improves with an increase in the thinking budget. + +![thinking_budget](./thinking_budget.png) + +Here is an example with a thinking budget set to 512: during the reasoning process, the model periodically triggers self-reflection to estimate the consumed and remaining budget, and delivers the final response once the budget is exhausted or the reasoning concludes. ``` + +Got it, let's try to solve this problem step by step. The problem says ... ... +I have used 129 tokens, and there are 383 tokens remaining for use. +Using the power rule, ... ... +I have used 258 tokens, and there are 254 tokens remaining for use. +Alternatively, remember that ... ... +I have used 393 tokens, and there are 119 tokens remaining for use. +Because if ... ... +I have exhausted my token budget, and now I will start answering the question. + +To solve the problem, we start by using the properties of logarithms to simplify the given equations: (full answer omitted). +``` + +If no thinking budget is set (default mode), Seed-OSS will initiate thinking with unlimited length. If a thinking budget is specified, users are advised to prioritize values that are integer multiples of 512 (e.g., 512, 1K, 2K, 4K, 8K, or 16K), as the model has been extensively trained on these intervals. Models are instructed to output a direct response when the thinking budget is 0, and we recommend setting any budget below 512 to this value. 
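The budget guidance above, together with the reflection intervals hard-coded in this change's `chat_template.jinja` (the `budget_reflections_v05` table), can be sketched in Python as follows. This is a rough illustration rather than the project's code: `normalize_thinking_budget` and `reflection_interval` are hypothetical helper names, and rounding a budget down to a multiple of 512 is only one possible way to follow the recommendation.

```python
from typing import Optional

# Tiers copied from chat_template.jinja: thinking budget -> reflection interval (tokens).
BUDGET_REFLECTIONS = {
    0: 0,
    512: 128,
    1024: 256,
    2048: 512,
    4096: 512,
    8192: 1024,
    16384: 1024,
}


def normalize_thinking_budget(budget: Optional[int]) -> int:
    """Map a requested budget to a value the README recommends."""
    if budget is None or budget < 0:
        return -1  # unlimited thinking, the template's default
    if budget < 512:
        return 0  # README: treat budgets below 512 as 0 (direct answer)
    return (budget // 512) * 512  # assumption: round down to a multiple of 512


def reflection_interval(budget: int) -> int:
    """Mirror the template: first tier >= budget, else the largest tier."""
    for tier in sorted(BUDGET_REFLECTIONS):
        if budget <= tier:
            return BUDGET_REFLECTIONS[tier]
    return BUDGET_REFLECTIONS[16384]


for requested in (None, 300, 512, 3000, 50000):
    budget = normalize_thinking_budget(requested)
    interval = reflection_interval(budget) if budget > 0 else "n/a"
    print(requested, budget, interval)
```

For a budget of 512 the interval is 128 tokens, which is consistent with the example above, where the model reports its usage at 129, 258 and 393 tokens.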
+
+## Quick Start
+```shell
+pip3 install -r requirements.txt
+pip install git+ssh://git@github.com/Fazziekey/transformers.git@seed-oss
+```
+
 ```python
-# Download the model via the SDK
-from modelscope import snapshot_download
-model_dir = snapshot_download('ByteDance-Seed/Seed-OSS-36B-Instruct')
-```
-Git download
-```
-# Download the model via Git
-git clone https://www.modelscope.cn/ByteDance-Seed/Seed-OSS-36B-Instruct.git
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import os
+import re
+
+model_name_or_path = "ByteDance-Seed/Seed-OSS-36B-Instruct"
+
+tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
+model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")  # You may want to use bfloat16 and/or move to GPU here
+messages = [
+    {"role": "user", "content": "How to make pasta?"},
+]
+tokenized_chat = tokenizer.apply_chat_template(
+    messages,
+    tokenize=True,
+    add_generation_prompt=True,
+    return_tensors="pt",
+    thinking_budget=512  # control the thinking budget
+)
+
+outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=2048)
+
+output_text = tokenizer.decode(outputs[0])
 ```
-If you are a contributor to this model, we invite you to complete the model card promptly, following the model contribution documentation.
\ No newline at end of file
+## Inference
+
+### Download Model
+
+Download Seed-OSS checkpoint to `./Seed-OSS-36B-Instruct`
+
+### Transformers
+The `generate.py` script provides a simple interface for model inference with configurable options.
+
+#### Basic Usage
+```shell
+cd inference
+python3 generate.py --model_path /path/to/model
+```
+
+#### Key Parameters
+| Parameter | Description |
+|-----------|-------------|
+| `--model_path` | Path to the pretrained model directory (required) |
+| `--prompts` | Input prompts (default: sample cooking/code questions) |
+| `--max_new_tokens` | Maximum tokens to generate (default: 4096) |
+| `--attn_implementation` | Attention mechanism: `flash_attention_2` (default) or `eager` |
+| `--load_in_4bit/8bit` | Enable 4-bit/8-bit quantization (reduces memory usage) |
+| `--thinking_budget` | Thinking budget in tokens (default: -1 for unlimited budget) |
+
+#### Quantization Examples
+```shell
+# 8-bit quantization
+python3 generate.py --model_path /path/to/model --load_in_8bit True
+
+# 4-bit quantization
+python3 generate.py --model_path /path/to/model --load_in_4bit True
+```
+
+#### Custom Prompts
+```shell
+python3 generate.py --model_path /path/to/model --prompts "['What is machine learning?', 'Explain quantum computing']"
+```
+
+### vLLM
+Use vllm >= 0.10.0 for inference.
+
+- First install the vLLM version with Seed-OSS support:
+```shell
+VLLM_USE_PRECOMPILED=1 VLLM_TEST_USE_PRECOMPILED_NIGHTLY_WHEEL=1 pip install git+ssh://git@github.com/FoolPlayer/vllm.git@seed-oss
+```
+
+- Start vLLM API server:
+```shell
+python3 -m vllm.entrypoints.openai.api_server \
+    --host localhost \
+    --port 4321 \
+    --enable-auto-tool-choice \
+    --tool-call-parser seed_oss \
+    --trust-remote-code \
+    --model ./Seed-OSS-36B-Instruct \
+    --chat-template ./Seed-OSS-36B-Instruct/chat_template.jinja \
+    --tensor-parallel-size 8 \
+    --dtype bfloat16 \
+    --served-model-name seed_oss
+```
+
+- Test with OpenAI client:
+
+Chat
+
+```shell
+python3 inference/vllm_chat.py
+```
+
+Tool Call
+```shell
+python3 inference/vllm_tool_call.py
+```
+
+
+## Model Card
+See [MODEL_CARD](./MODEL_CARD.md).
+
+## License
+This project is licensed under Apache-2.0. See the [LICENSE](./LICENSE) file for details.
+
+## Citation
+
+```bibtex
+@misc{seed2025seed-oss,
+  author={ByteDance Seed Team},
+  title={Seed-OSS Open-Source Models},
+  year={2025},
+  howpublished={\url{https://github.com/ByteDance-Seed/seed-oss}}
+}
+```
+
+## About [ByteDance Seed Team](https://seed.bytedance.com/)
+
+Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.
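Once the vLLM server from the Inference section is running, a chat request can also be assembled by hand. The sketch below uses only the Python standard library and assumes the server exposes the usual OpenAI-compatible `/v1/chat/completions` route on `localhost:4321` under the served name `seed_oss`, as in the launch command. Passing `thinking_budget` through `chat_template_kwargs` is an assumption based on vLLM's generic mechanism for forwarding extra arguments to the chat template; `build_chat_request` and `send` are illustrative names, and the repository's own `inference/vllm_chat.py` may do this differently.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4321/v1"  # host and port from the launch command


def build_chat_request(prompt: str, thinking_budget: int = 512) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the local server."""
    payload = {
        "model": "seed_oss",  # value of --served-model-name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.1,  # sampling settings recommended in the README
        "top_p": 0.95,
        "max_tokens": 2048,
        # Assumption: extra template arguments are forwarded via this field.
        "chat_template_kwargs": {"thinking_budget": thinking_budget},
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant message text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `send(build_chat_request("How to make pasta?"))` against a running server should return the assistant's reply as a string; nothing is sent until `send` is called.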
\ No newline at end of file diff --git a/chat_template.jinja b/chat_template.jinja new file mode 100644 index 0000000..d92d003 --- /dev/null +++ b/chat_template.jinja @@ -0,0 +1,171 @@ +{# ----------‑‑‑ special token variables ‑‑‑---------- #} +{%- set bos_token = '' -%} +{%- set eos_token = '' -%} +{%- set pad_token = '' -%} +{%- set toolcall_begin_token = '' -%} +{%- set toolcall_end_token = '' -%} +{%- set think_begin_token = '' -%} +{%- set think_end_token = '' -%} +{%- set budget_begin_token = ''-%} +{%- set budget_end_token = ''-%} +{# -------------- reflection-interval lookup -------------- #} +{%- if not thinking_budget is defined %} +{%- set thinking_budget = -1 -%} +{%- endif -%} +{%- set budget_reflections_v05 = { + 0: 0, + 512: 128, + 1024: 256, + 2048: 512, + 4096: 512, + 8192: 1024, + 16384: 1024 +} -%} +{# 找到 “大于等于 thinking_budget” 的第一个档位 #} +{%- set ns = namespace(interval = None) -%} +{%- for k, v in budget_reflections_v05 | dictsort -%} + {%- if ns.interval is none and thinking_budget <= k -%} + {%- set ns.interval = v -%} + {%- endif -%} +{%- endfor -%} +{# 若超过最大档位,则用最后一个档位的值 #} +{%- if ns.interval is none -%} + {%- set ns.interval = budget_reflections_v05[16384] -%} +{%- endif -%} +{# ---------- 预处理 system 消息 ---------- #} +{%- if messages[0]["role"] == "system" %} +{%- set system_message = messages[0]["content"] %} +{%- set loop_messages = messages[1:] %} +{%- else %} +{%- set loop_messages = messages %} +{%- endif %} +{# ---------- 确保 tools 存在 ---------- #} +{%- if not tools is defined or tools is none %} +{%- set tools = [] %} +{%- endif %} +{# tools2doc.jinja #} +{%- macro py_type(t) -%} + {%- if t == "string" -%}str + {%- elif t in ("number", "integer") -%}int + {%- elif t == "boolean" -%}bool + {%- elif t == "array" -%}list + {%- else -%}Any{%- endif -%} +{%- endmacro -%} +{# ---------- 输出 system 块 ---------- #} +{%- if system_message is defined %} +{{ bos_token + "system\n" + system_message }} +{%- else %} +{%- if tools is iterable and 
tools | length > 0 %} +{{ bos_token + "system\nYou are Doubao, a helpful AI assistant. You may call one or more functions to assist with the user query." }} +{%- endif %} +{%- endif %} +{%- if use_json_tooldef is defined and use_json_tooldef %} + +{{"Tool List:\nYou are authorized to use the following tools (described in JSON Schema format). Before performing any task, you must decide how to call them based on the descriptions and parameters of these tools."}} +{{ tools | tojson(ensure_ascii=False) }} +{%- else %} +{%- for item in tools if item.type == "function" %} + + +Function: +def {{ item.function.name }}( +{%- for name, spec in item.function.parameters.properties.items() %} + {{- name }}: {{ py_type(spec.type) }}{% if not loop.last %},{% endif %} +{%- endfor %}): + """ + {{ item.function.description | trim }} + + {# ---------- Args ---------- #} + {%- if item.function.parameters.properties %} + Args: + {%- for name, spec in item.function.parameters.properties.items() %} + + - {{ name }} ({{ py_type(spec.type) }}) + {%- if name in item.function.parameters.required %} [必填]{% else %} [选填]{% endif %}: + {{- " " ~ (spec.description or "") }} + {%- endfor %} + {%- endif %} + + {# ---------- Returns ---------- #} + {%- if item.function.returns is defined + and item.function.returns.properties is defined + and item.function.returns.properties %} + Returns: + {%- for name, spec in item.function.returns.properties.items() %} + + - {{ name }} ({{ py_type(spec.type) }}): + {{- " " ~ (spec.description or "") }} + {%- endfor %} + {%- endif %} + + """ +{%- endfor %} +{%- endif %} +{%- if tools is iterable and tools | length > 0 %} + +{{"工具调用请遵循如下格式:\n\n\nvalue_1\nThis is the value for the second parameter\nthat can span\nmultiple lines\n\n\n"}} +{%- endif %} +{# 结束 system 块行尾 #} +{%- if system_message is defined or tools is iterable and tools | length > 0 %} +{{ eos_token }} +{%- endif %} +{# ---------- Thinking Budget ---------- #} +{%- if thinking_budget is defined %} 
+{%- if thinking_budget == 0 %} +{{ bos_token+"system" }} +{{ "You are an intelligent assistant that can answer questions in one step without the need for reasoning and thinking, that is, your thinking budget is 0. Next, please skip the thinking process and directly start answering the user's questions." }} +{{ eos_token }} +{%- elif not thinking_budget == -1 %} +{{ bos_token+"system" }} +{{ "You are an intelligent assistant with reflective ability. In the process of thinking and reasoning, you need to strictly follow the thinking budget, which is "}}{{thinking_budget}}{{". That is, you need to complete your thinking within "}}{{thinking_budget}}{{" tokens and start answering the user's questions. You will reflect on your thinking process every "}}{{ns.interval}}{{" tokens, stating how many tokens have been used and how many are left."}} +{{ eos_token }} +{%- endif %} +{%- endif %} +{# ---------- 逐条写出历史消息 ---------- #} +{%- for message in loop_messages %} +{%- if message.role == "assistant" + and message.tool_calls is defined + and message.tool_calls is iterable + and message.tool_calls | length > 0 %} +{{ bos_token + message.role }} +{%- if message.reasoning_content is defined and message.reasoning_content is string and message.reasoning_content | trim | length > 0 %} +{{ "\n" + think_begin_token + message.reasoning_content | trim + think_end_token }} +{%- endif %} +{%- if message.content is defined and message.content is string and message.content | trim | length > 0 %} +{{ "\n" + message.content | trim + "\n" }} +{%- endif %} +{%- for tool_call in message.tool_calls %} +{%- if tool_call.function is defined %}{% set tool_call = tool_call.function %}{% endif %} +{{ "\n" + toolcall_begin_token + "\n\n" }} +{%- if tool_call.arguments is defined %} +{%- for arg_name, arg_value in tool_call.arguments | items %} +{{ "" }} +{%- set arg_value = arg_value if arg_value is string else arg_value | string %} +{{ arg_value+"\n" }} +{%- endfor %} +{%- endif %} +{{ "\n" + 
toolcall_end_token }} +{%- endfor %} +{{ eos_token }} +{%- elif message.role in ["user", "system"] %} +{{ bos_token + message.role + "\n" + message.content + eos_token }} +{%- elif message.role == "assistant" %} +{{ bos_token + message.role }} +{%- if message.reasoning_content is defined and message.reasoning_content is string and message.reasoning_content | trim | length > 0 %} +{{ "\n" + think_begin_token + message.reasoning_content | trim + think_end_token }} +{%- endif %} +{%- if message.content is defined and message.content is string and message.content | trim | length > 0 %} +{{ "\n" + message.content | trim + eos_token }} +{%- endif %} +{# 包括 tool 角色,在这个逻辑 #} +{%- else %} +{{ bos_token + message.role + "\n" + message.content + eos_token }} +{%- endif %} +{%- endfor %} +{# ---------- 控制模型开始续写 ---------- #} +{%- if add_generation_prompt %} +{{ bos_token+"assistant\n" }} +{%- if thinking_budget == 0 %} +{{ think_begin_token+budget_begin_token }} +{%- endif %} +{%- endif %} \ No newline at end of file diff --git a/config.json b/config.json new file mode 100644 index 0000000..e094445 --- /dev/null +++ b/config.json @@ -0,0 +1,33 @@ +{ + "architectures": [ + "SeedOssForCausalLM" + ], + "attention_bias": true, + "attention_dropout": 0.1, + "attention_out_bias": false, + "bos_token_id": 0, + "pad_token_id": 1, + "eos_token_id": 2, + "head_dim": 128, + "hidden_act": "silu", + "hidden_size": 5120, + "initializer_range": 0.02, + "intermediate_size": 27648, + "max_position_embeddings": 524288, + "mlp_bias": false, + "model_type": "seed_oss", + "num_attention_heads": 80, + "num_hidden_layers": 64, + "num_key_value_heads": 8, + "residual_dropout": 0.1, + "rms_norm_eps": 1e-06, + "rope_scaling": { + "rope_type": "default" + }, + "rope_theta": 10000000.0, + "tie_word_embeddings": false, + "torch_dtype": "bfloat16", + "transformers_version": "4.55.0", + "use_cache": true, + "vocab_size": 155136 +} \ No newline at end of file diff --git a/configuration.json 
b/configuration.json new file mode 100644 index 0000000..bbeeda1 --- /dev/null +++ b/configuration.json @@ -0,0 +1 @@ +{"framework": "pytorch", "task": "text-generation", "allow_remote": true} \ No newline at end of file diff --git a/generation_config.json b/generation_config.json new file mode 100644 index 0000000..3a7b67b --- /dev/null +++ b/generation_config.json @@ -0,0 +1,10 @@ +{ + "_from_model_config": true, + "bos_token_id": 0, + "pad_token_id": 1, + "eos_token_id": 2, + "transformers_version": "4.55.0", + "temperature": 1.1, + "top_p": 0.95 +} + \ No newline at end of file diff --git a/model-00001-of-00015.safetensors b/model-00001-of-00015.safetensors new file mode 100644 index 0000000..b69abcc --- /dev/null +++ b/model-00001-of-00015.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a6387b80f12db915254cbe82c26d393f0f5a10600ce7bda028e3ee90c256eecc +size 135 diff --git a/model-00002-of-00015.safetensors b/model-00002-of-00015.safetensors new file mode 100644 index 0000000..854a48a --- /dev/null +++ b/model-00002-of-00015.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fe2d0b95a5d785f8e2a18329296773e042b8caa9a3f0a1d9e8ef2c9bb4a14eea +size 135 diff --git a/model-00003-of-00015.safetensors b/model-00003-of-00015.safetensors new file mode 100644 index 0000000..7c82696 --- /dev/null +++ b/model-00003-of-00015.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a3e358505119541fa85625546348a60f39685fba7549bd94c8e982d407a0555 +size 135 diff --git a/model-00004-of-00015.safetensors b/model-00004-of-00015.safetensors new file mode 100644 index 0000000..7adcec1 --- /dev/null +++ b/model-00004-of-00015.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0d6bbfb4ab754f2cb391caa40f67dd9d349b5381b402574a0440813606a348c5 +size 135 diff --git a/model-00005-of-00015.safetensors b/model-00005-of-00015.safetensors new file mode 100644 index 
0000000..bcb869e --- /dev/null +++ b/model-00005-of-00015.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:107cce88b60faf9bad30769172dce01cd1764570f92cb0a80dece2e238167f23 +size 135 diff --git a/model-00006-of-00015.safetensors b/model-00006-of-00015.safetensors new file mode 100644 index 0000000..ead20d5 --- /dev/null +++ b/model-00006-of-00015.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e71fa75e94020a23d9a15da86ed328bdc01462a0a3f09ecdd614f047a802301a +size 135 diff --git a/model-00007-of-00015.safetensors b/model-00007-of-00015.safetensors new file mode 100644 index 0000000..96a8710 --- /dev/null +++ b/model-00007-of-00015.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a04d657585986417b4957ae284b889c2b58083e39a90994a068ea4a25cfa27ae +size 135 diff --git a/model-00008-of-00015.safetensors b/model-00008-of-00015.safetensors new file mode 100644 index 0000000..017e25f --- /dev/null +++ b/model-00008-of-00015.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1ef369c73695b6d4ea90e68154005d90a2733f67053b10211830a8d85e9263c4 +size 135 diff --git a/model-00009-of-00015.safetensors b/model-00009-of-00015.safetensors new file mode 100644 index 0000000..5057972 --- /dev/null +++ b/model-00009-of-00015.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:63e354190fef1698af8cf2b2b6eb3ceb4627be4e15c886fcefae04c40046811e +size 135 diff --git a/model-00010-of-00015.safetensors b/model-00010-of-00015.safetensors new file mode 100644 index 0000000..d1d46d7 --- /dev/null +++ b/model-00010-of-00015.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a4781bad8d0e3bee0f1adda8017b951edd34a57638420cadaabf433e6bde8d0c +size 135 diff --git a/model-00011-of-00015.safetensors b/model-00011-of-00015.safetensors new file mode 100644 index 0000000..ced0535 --- /dev/null +++ 
b/model-00011-of-00015.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:223165c90a98f80f66a5f2dcb94e6f09e3454974473fe14c6822c0628ee55f56 +size 135 diff --git a/model-00012-of-00015.safetensors b/model-00012-of-00015.safetensors new file mode 100644 index 0000000..7ffc30a --- /dev/null +++ b/model-00012-of-00015.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8db709a2c461316819593bef8ae9e252cdf5da323f4361be62dd7f4d3c4c8f18 +size 135 diff --git a/model-00013-of-00015.safetensors b/model-00013-of-00015.safetensors new file mode 100644 index 0000000..3c9b063 --- /dev/null +++ b/model-00013-of-00015.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4e6c7c009da0d562231304d6eef141a64f95a73e37b4d2576aa587a82b5713ec +size 135 diff --git a/model-00014-of-00015.safetensors b/model-00014-of-00015.safetensors new file mode 100644 index 0000000..6c95ff9 --- /dev/null +++ b/model-00014-of-00015.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6d233a72fe9dc4cbea98e275729541d9ebf06a7d0ecf4edd68e0f86d8b021339 +size 135 diff --git a/model-00015-of-00015.safetensors b/model-00015-of-00015.safetensors new file mode 100644 index 0000000..72710eb --- /dev/null +++ b/model-00015-of-00015.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:edabb4aa838885534911083fa9d7c00468f9e43103eb1bf61dc4a033af42d1c8 +size 135 diff --git a/model.safetensors.index.json b/model.safetensors.index.json new file mode 100644 index 0000000..4757f05 --- /dev/null +++ b/model.safetensors.index.json @@ -0,0 +1,779 @@ +{ + "metadata": { + "total_parameters": 36151104512, + "total_size": 72302209024 + }, + "weight_map": { + "lm_head.weight": "model-00015-of-00015.safetensors", + "model.embed_tokens.weight": "model-00001-of-00015.safetensors", + "model.layers.0.input_layernorm.weight": "model-00001-of-00015.safetensors", + 
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.0.mlp.up_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00015.safetensors", + "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00015.safetensors", + "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00015.safetensors", + "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00015.safetensors", + "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.1.input_layernorm.weight": "model-00001-of-00015.safetensors", + "model.layers.1.mlp.down_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.1.mlp.up_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00015.safetensors", + "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00015.safetensors", + "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00015.safetensors", + "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00015.safetensors", + "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.10.input_layernorm.weight": "model-00003-of-00015.safetensors", + "model.layers.10.mlp.down_proj.weight": "model-00003-of-00015.safetensors", + 
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.10.mlp.up_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00015.safetensors", + "model.layers.10.self_attn.k_proj.bias": "model-00003-of-00015.safetensors", + "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.10.self_attn.q_proj.bias": "model-00003-of-00015.safetensors", + "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.10.self_attn.v_proj.bias": "model-00003-of-00015.safetensors", + "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.11.input_layernorm.weight": "model-00003-of-00015.safetensors", + "model.layers.11.mlp.down_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.11.mlp.up_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00015.safetensors", + "model.layers.11.self_attn.k_proj.bias": "model-00003-of-00015.safetensors", + "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.11.self_attn.q_proj.bias": "model-00003-of-00015.safetensors", + "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.11.self_attn.v_proj.bias": "model-00003-of-00015.safetensors", + "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.12.input_layernorm.weight": "model-00004-of-00015.safetensors", + "model.layers.12.mlp.down_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.12.mlp.gate_proj.weight": "model-00004-of-00015.safetensors", 
+ "model.layers.12.mlp.up_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.12.post_attention_layernorm.weight": "model-00004-of-00015.safetensors", + "model.layers.12.self_attn.k_proj.bias": "model-00003-of-00015.safetensors", + "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.12.self_attn.q_proj.bias": "model-00003-of-00015.safetensors", + "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.12.self_attn.v_proj.bias": "model-00003-of-00015.safetensors", + "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.13.input_layernorm.weight": "model-00004-of-00015.safetensors", + "model.layers.13.mlp.down_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.13.mlp.gate_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.13.mlp.up_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00015.safetensors", + "model.layers.13.self_attn.k_proj.bias": "model-00004-of-00015.safetensors", + "model.layers.13.self_attn.k_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.13.self_attn.o_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.13.self_attn.q_proj.bias": "model-00004-of-00015.safetensors", + "model.layers.13.self_attn.q_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.13.self_attn.v_proj.bias": "model-00004-of-00015.safetensors", + "model.layers.13.self_attn.v_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.14.input_layernorm.weight": "model-00004-of-00015.safetensors", + "model.layers.14.mlp.down_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.14.mlp.up_proj.weight": "model-00004-of-00015.safetensors", 
+ "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00015.safetensors", + "model.layers.14.self_attn.k_proj.bias": "model-00004-of-00015.safetensors", + "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.14.self_attn.q_proj.bias": "model-00004-of-00015.safetensors", + "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.14.self_attn.v_proj.bias": "model-00004-of-00015.safetensors", + "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.15.input_layernorm.weight": "model-00004-of-00015.safetensors", + "model.layers.15.mlp.down_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.15.mlp.up_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00015.safetensors", + "model.layers.15.self_attn.k_proj.bias": "model-00004-of-00015.safetensors", + "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.15.self_attn.q_proj.bias": "model-00004-of-00015.safetensors", + "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.15.self_attn.v_proj.bias": "model-00004-of-00015.safetensors", + "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.16.input_layernorm.weight": "model-00005-of-00015.safetensors", + "model.layers.16.mlp.down_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.16.mlp.up_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.16.post_attention_layernorm.weight": 
"model-00005-of-00015.safetensors", + "model.layers.16.self_attn.k_proj.bias": "model-00004-of-00015.safetensors", + "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.16.self_attn.q_proj.bias": "model-00004-of-00015.safetensors", + "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.16.self_attn.v_proj.bias": "model-00004-of-00015.safetensors", + "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00015.safetensors", + "model.layers.17.input_layernorm.weight": "model-00005-of-00015.safetensors", + "model.layers.17.mlp.down_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.17.mlp.gate_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.17.mlp.up_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.17.post_attention_layernorm.weight": "model-00005-of-00015.safetensors", + "model.layers.17.self_attn.k_proj.bias": "model-00005-of-00015.safetensors", + "model.layers.17.self_attn.k_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.17.self_attn.o_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.17.self_attn.q_proj.bias": "model-00005-of-00015.safetensors", + "model.layers.17.self_attn.q_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.17.self_attn.v_proj.bias": "model-00005-of-00015.safetensors", + "model.layers.17.self_attn.v_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.18.input_layernorm.weight": "model-00005-of-00015.safetensors", + "model.layers.18.mlp.down_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.18.mlp.gate_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.18.mlp.up_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00015.safetensors", + 
"model.layers.18.self_attn.k_proj.bias": "model-00005-of-00015.safetensors", + "model.layers.18.self_attn.k_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.18.self_attn.o_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.18.self_attn.q_proj.bias": "model-00005-of-00015.safetensors", + "model.layers.18.self_attn.q_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.18.self_attn.v_proj.bias": "model-00005-of-00015.safetensors", + "model.layers.18.self_attn.v_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.19.input_layernorm.weight": "model-00005-of-00015.safetensors", + "model.layers.19.mlp.down_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.19.mlp.up_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00015.safetensors", + "model.layers.19.self_attn.k_proj.bias": "model-00005-of-00015.safetensors", + "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.19.self_attn.q_proj.bias": "model-00005-of-00015.safetensors", + "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.19.self_attn.v_proj.bias": "model-00005-of-00015.safetensors", + "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.2.input_layernorm.weight": "model-00001-of-00015.safetensors", + "model.layers.2.mlp.down_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.2.mlp.up_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00015.safetensors", + "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00015.safetensors", + 
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00015.safetensors", + "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00015.safetensors", + "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.20.input_layernorm.weight": "model-00005-of-00015.safetensors", + "model.layers.20.mlp.down_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.20.mlp.up_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00015.safetensors", + "model.layers.20.self_attn.k_proj.bias": "model-00005-of-00015.safetensors", + "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.20.self_attn.q_proj.bias": "model-00005-of-00015.safetensors", + "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.20.self_attn.v_proj.bias": "model-00005-of-00015.safetensors", + "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.21.input_layernorm.weight": "model-00006-of-00015.safetensors", + "model.layers.21.mlp.down_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.21.mlp.gate_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.21.mlp.up_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.21.post_attention_layernorm.weight": "model-00006-of-00015.safetensors", + "model.layers.21.self_attn.k_proj.bias": "model-00005-of-00015.safetensors", + "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00015.safetensors", + 
"model.layers.21.self_attn.o_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.21.self_attn.q_proj.bias": "model-00005-of-00015.safetensors", + "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.21.self_attn.v_proj.bias": "model-00005-of-00015.safetensors", + "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00015.safetensors", + "model.layers.22.input_layernorm.weight": "model-00006-of-00015.safetensors", + "model.layers.22.mlp.down_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.22.mlp.gate_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.22.mlp.up_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.22.post_attention_layernorm.weight": "model-00006-of-00015.safetensors", + "model.layers.22.self_attn.k_proj.bias": "model-00006-of-00015.safetensors", + "model.layers.22.self_attn.k_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.22.self_attn.o_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.22.self_attn.q_proj.bias": "model-00006-of-00015.safetensors", + "model.layers.22.self_attn.q_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.22.self_attn.v_proj.bias": "model-00006-of-00015.safetensors", + "model.layers.22.self_attn.v_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.23.input_layernorm.weight": "model-00006-of-00015.safetensors", + "model.layers.23.mlp.down_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.23.mlp.gate_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.23.mlp.up_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.23.post_attention_layernorm.weight": "model-00006-of-00015.safetensors", + "model.layers.23.self_attn.k_proj.bias": "model-00006-of-00015.safetensors", + "model.layers.23.self_attn.k_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.23.self_attn.o_proj.weight": 
"model-00006-of-00015.safetensors", + "model.layers.23.self_attn.q_proj.bias": "model-00006-of-00015.safetensors", + "model.layers.23.self_attn.q_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.23.self_attn.v_proj.bias": "model-00006-of-00015.safetensors", + "model.layers.23.self_attn.v_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.24.input_layernorm.weight": "model-00006-of-00015.safetensors", + "model.layers.24.mlp.down_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.24.mlp.gate_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.24.mlp.up_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.24.post_attention_layernorm.weight": "model-00006-of-00015.safetensors", + "model.layers.24.self_attn.k_proj.bias": "model-00006-of-00015.safetensors", + "model.layers.24.self_attn.k_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.24.self_attn.q_proj.bias": "model-00006-of-00015.safetensors", + "model.layers.24.self_attn.q_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.24.self_attn.v_proj.bias": "model-00006-of-00015.safetensors", + "model.layers.24.self_attn.v_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.25.input_layernorm.weight": "model-00007-of-00015.safetensors", + "model.layers.25.mlp.down_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.25.mlp.gate_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.25.mlp.up_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.25.post_attention_layernorm.weight": "model-00007-of-00015.safetensors", + "model.layers.25.self_attn.k_proj.bias": "model-00006-of-00015.safetensors", + "model.layers.25.self_attn.k_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.25.self_attn.o_proj.weight": "model-00006-of-00015.safetensors", + 
"model.layers.25.self_attn.q_proj.bias": "model-00006-of-00015.safetensors", + "model.layers.25.self_attn.q_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.25.self_attn.v_proj.bias": "model-00006-of-00015.safetensors", + "model.layers.25.self_attn.v_proj.weight": "model-00006-of-00015.safetensors", + "model.layers.26.input_layernorm.weight": "model-00007-of-00015.safetensors", + "model.layers.26.mlp.down_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.26.mlp.gate_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.26.mlp.up_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.26.post_attention_layernorm.weight": "model-00007-of-00015.safetensors", + "model.layers.26.self_attn.k_proj.bias": "model-00007-of-00015.safetensors", + "model.layers.26.self_attn.k_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.26.self_attn.o_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.26.self_attn.q_proj.bias": "model-00007-of-00015.safetensors", + "model.layers.26.self_attn.q_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.26.self_attn.v_proj.bias": "model-00007-of-00015.safetensors", + "model.layers.26.self_attn.v_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.27.input_layernorm.weight": "model-00007-of-00015.safetensors", + "model.layers.27.mlp.down_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.27.mlp.gate_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.27.mlp.up_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.27.post_attention_layernorm.weight": "model-00007-of-00015.safetensors", + "model.layers.27.self_attn.k_proj.bias": "model-00007-of-00015.safetensors", + "model.layers.27.self_attn.k_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.27.self_attn.o_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.27.self_attn.q_proj.bias": 
"model-00007-of-00015.safetensors", + "model.layers.27.self_attn.q_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.27.self_attn.v_proj.bias": "model-00007-of-00015.safetensors", + "model.layers.27.self_attn.v_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.28.input_layernorm.weight": "model-00007-of-00015.safetensors", + "model.layers.28.mlp.down_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.28.mlp.gate_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.28.mlp.up_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.28.post_attention_layernorm.weight": "model-00007-of-00015.safetensors", + "model.layers.28.self_attn.k_proj.bias": "model-00007-of-00015.safetensors", + "model.layers.28.self_attn.k_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.28.self_attn.o_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.28.self_attn.q_proj.bias": "model-00007-of-00015.safetensors", + "model.layers.28.self_attn.q_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.28.self_attn.v_proj.bias": "model-00007-of-00015.safetensors", + "model.layers.28.self_attn.v_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.29.input_layernorm.weight": "model-00007-of-00015.safetensors", + "model.layers.29.mlp.down_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.29.mlp.gate_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.29.mlp.up_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.29.post_attention_layernorm.weight": "model-00007-of-00015.safetensors", + "model.layers.29.self_attn.k_proj.bias": "model-00007-of-00015.safetensors", + "model.layers.29.self_attn.k_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.29.self_attn.o_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.29.self_attn.q_proj.bias": "model-00007-of-00015.safetensors", + 
"model.layers.29.self_attn.q_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.29.self_attn.v_proj.bias": "model-00007-of-00015.safetensors", + "model.layers.29.self_attn.v_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.3.input_layernorm.weight": "model-00002-of-00015.safetensors", + "model.layers.3.mlp.down_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.3.mlp.up_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00015.safetensors", + "model.layers.3.self_attn.k_proj.bias": "model-00001-of-00015.safetensors", + "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.3.self_attn.q_proj.bias": "model-00001-of-00015.safetensors", + "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.3.self_attn.v_proj.bias": "model-00001-of-00015.safetensors", + "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00015.safetensors", + "model.layers.30.input_layernorm.weight": "model-00008-of-00015.safetensors", + "model.layers.30.mlp.down_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.30.mlp.gate_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.30.mlp.up_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.30.post_attention_layernorm.weight": "model-00008-of-00015.safetensors", + "model.layers.30.self_attn.k_proj.bias": "model-00007-of-00015.safetensors", + "model.layers.30.self_attn.k_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.30.self_attn.o_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.30.self_attn.q_proj.bias": "model-00007-of-00015.safetensors", + "model.layers.30.self_attn.q_proj.weight": "model-00007-of-00015.safetensors", + 
"model.layers.30.self_attn.v_proj.bias": "model-00007-of-00015.safetensors", + "model.layers.30.self_attn.v_proj.weight": "model-00007-of-00015.safetensors", + "model.layers.31.input_layernorm.weight": "model-00008-of-00015.safetensors", + "model.layers.31.mlp.down_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.31.mlp.gate_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.31.mlp.up_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.31.post_attention_layernorm.weight": "model-00008-of-00015.safetensors", + "model.layers.31.self_attn.k_proj.bias": "model-00008-of-00015.safetensors", + "model.layers.31.self_attn.k_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.31.self_attn.o_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.31.self_attn.q_proj.bias": "model-00008-of-00015.safetensors", + "model.layers.31.self_attn.q_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.31.self_attn.v_proj.bias": "model-00008-of-00015.safetensors", + "model.layers.31.self_attn.v_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.32.input_layernorm.weight": "model-00008-of-00015.safetensors", + "model.layers.32.mlp.down_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.32.mlp.gate_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.32.mlp.up_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.32.post_attention_layernorm.weight": "model-00008-of-00015.safetensors", + "model.layers.32.self_attn.k_proj.bias": "model-00008-of-00015.safetensors", + "model.layers.32.self_attn.k_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.32.self_attn.o_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.32.self_attn.q_proj.bias": "model-00008-of-00015.safetensors", + "model.layers.32.self_attn.q_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.32.self_attn.v_proj.bias": 
"model-00008-of-00015.safetensors", + "model.layers.32.self_attn.v_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.33.input_layernorm.weight": "model-00008-of-00015.safetensors", + "model.layers.33.mlp.down_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.33.mlp.gate_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.33.mlp.up_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.33.post_attention_layernorm.weight": "model-00008-of-00015.safetensors", + "model.layers.33.self_attn.k_proj.bias": "model-00008-of-00015.safetensors", + "model.layers.33.self_attn.k_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.33.self_attn.o_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.33.self_attn.q_proj.bias": "model-00008-of-00015.safetensors", + "model.layers.33.self_attn.q_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.33.self_attn.v_proj.bias": "model-00008-of-00015.safetensors", + "model.layers.33.self_attn.v_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.34.input_layernorm.weight": "model-00009-of-00015.safetensors", + "model.layers.34.mlp.down_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.34.mlp.gate_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.34.mlp.up_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.34.post_attention_layernorm.weight": "model-00009-of-00015.safetensors", + "model.layers.34.self_attn.k_proj.bias": "model-00008-of-00015.safetensors", + "model.layers.34.self_attn.k_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.34.self_attn.o_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.34.self_attn.q_proj.bias": "model-00008-of-00015.safetensors", + "model.layers.34.self_attn.q_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.34.self_attn.v_proj.bias": "model-00008-of-00015.safetensors", + 
"model.layers.34.self_attn.v_proj.weight": "model-00008-of-00015.safetensors", + "model.layers.35.input_layernorm.weight": "model-00009-of-00015.safetensors", + "model.layers.35.mlp.down_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.35.mlp.gate_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.35.mlp.up_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.35.post_attention_layernorm.weight": "model-00009-of-00015.safetensors", + "model.layers.35.self_attn.k_proj.bias": "model-00009-of-00015.safetensors", + "model.layers.35.self_attn.k_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.35.self_attn.o_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.35.self_attn.q_proj.bias": "model-00009-of-00015.safetensors", + "model.layers.35.self_attn.q_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.35.self_attn.v_proj.bias": "model-00009-of-00015.safetensors", + "model.layers.35.self_attn.v_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.36.input_layernorm.weight": "model-00009-of-00015.safetensors", + "model.layers.36.mlp.down_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.36.mlp.gate_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.36.mlp.up_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.36.post_attention_layernorm.weight": "model-00009-of-00015.safetensors", + "model.layers.36.self_attn.k_proj.bias": "model-00009-of-00015.safetensors", + "model.layers.36.self_attn.k_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.36.self_attn.o_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.36.self_attn.q_proj.bias": "model-00009-of-00015.safetensors", + "model.layers.36.self_attn.q_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.36.self_attn.v_proj.bias": "model-00009-of-00015.safetensors", + "model.layers.36.self_attn.v_proj.weight": 
"model-00009-of-00015.safetensors", + "model.layers.37.input_layernorm.weight": "model-00009-of-00015.safetensors", + "model.layers.37.mlp.down_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.37.mlp.gate_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.37.mlp.up_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.37.post_attention_layernorm.weight": "model-00009-of-00015.safetensors", + "model.layers.37.self_attn.k_proj.bias": "model-00009-of-00015.safetensors", + "model.layers.37.self_attn.k_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.37.self_attn.o_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.37.self_attn.q_proj.bias": "model-00009-of-00015.safetensors", + "model.layers.37.self_attn.q_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.37.self_attn.v_proj.bias": "model-00009-of-00015.safetensors", + "model.layers.37.self_attn.v_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.38.input_layernorm.weight": "model-00009-of-00015.safetensors", + "model.layers.38.mlp.down_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.38.mlp.gate_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.38.mlp.up_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.38.post_attention_layernorm.weight": "model-00009-of-00015.safetensors", + "model.layers.38.self_attn.k_proj.bias": "model-00009-of-00015.safetensors", + "model.layers.38.self_attn.k_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.38.self_attn.o_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.38.self_attn.q_proj.bias": "model-00009-of-00015.safetensors", + "model.layers.38.self_attn.q_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.38.self_attn.v_proj.bias": "model-00009-of-00015.safetensors", + "model.layers.38.self_attn.v_proj.weight": "model-00009-of-00015.safetensors", + 
"model.layers.39.input_layernorm.weight": "model-00010-of-00015.safetensors", + "model.layers.39.mlp.down_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.39.mlp.gate_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.39.mlp.up_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.39.post_attention_layernorm.weight": "model-00010-of-00015.safetensors", + "model.layers.39.self_attn.k_proj.bias": "model-00009-of-00015.safetensors", + "model.layers.39.self_attn.k_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.39.self_attn.o_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.39.self_attn.q_proj.bias": "model-00009-of-00015.safetensors", + "model.layers.39.self_attn.q_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.39.self_attn.v_proj.bias": "model-00009-of-00015.safetensors", + "model.layers.39.self_attn.v_proj.weight": "model-00009-of-00015.safetensors", + "model.layers.4.input_layernorm.weight": "model-00002-of-00015.safetensors", + "model.layers.4.mlp.down_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.4.mlp.up_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00015.safetensors", + "model.layers.4.self_attn.k_proj.bias": "model-00002-of-00015.safetensors", + "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.4.self_attn.q_proj.bias": "model-00002-of-00015.safetensors", + "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.4.self_attn.v_proj.bias": "model-00002-of-00015.safetensors", + "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.40.input_layernorm.weight": "model-00010-of-00015.safetensors", + 
"model.layers.40.mlp.down_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.40.mlp.gate_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.40.mlp.up_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.40.post_attention_layernorm.weight": "model-00010-of-00015.safetensors", + "model.layers.40.self_attn.k_proj.bias": "model-00010-of-00015.safetensors", + "model.layers.40.self_attn.k_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.40.self_attn.o_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.40.self_attn.q_proj.bias": "model-00010-of-00015.safetensors", + "model.layers.40.self_attn.q_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.40.self_attn.v_proj.bias": "model-00010-of-00015.safetensors", + "model.layers.40.self_attn.v_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.41.input_layernorm.weight": "model-00010-of-00015.safetensors", + "model.layers.41.mlp.down_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.41.mlp.gate_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.41.mlp.up_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.41.post_attention_layernorm.weight": "model-00010-of-00015.safetensors", + "model.layers.41.self_attn.k_proj.bias": "model-00010-of-00015.safetensors", + "model.layers.41.self_attn.k_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.41.self_attn.o_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.41.self_attn.q_proj.bias": "model-00010-of-00015.safetensors", + "model.layers.41.self_attn.q_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.41.self_attn.v_proj.bias": "model-00010-of-00015.safetensors", + "model.layers.41.self_attn.v_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.42.input_layernorm.weight": "model-00010-of-00015.safetensors", + "model.layers.42.mlp.down_proj.weight": "model-00010-of-00015.safetensors", 
+ "model.layers.42.mlp.gate_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.42.mlp.up_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.42.post_attention_layernorm.weight": "model-00010-of-00015.safetensors", + "model.layers.42.self_attn.k_proj.bias": "model-00010-of-00015.safetensors", + "model.layers.42.self_attn.k_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.42.self_attn.o_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.42.self_attn.q_proj.bias": "model-00010-of-00015.safetensors", + "model.layers.42.self_attn.q_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.42.self_attn.v_proj.bias": "model-00010-of-00015.safetensors", + "model.layers.42.self_attn.v_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.43.input_layernorm.weight": "model-00011-of-00015.safetensors", + "model.layers.43.mlp.down_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.43.mlp.gate_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.43.mlp.up_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.43.post_attention_layernorm.weight": "model-00011-of-00015.safetensors", + "model.layers.43.self_attn.k_proj.bias": "model-00010-of-00015.safetensors", + "model.layers.43.self_attn.k_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.43.self_attn.o_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.43.self_attn.q_proj.bias": "model-00010-of-00015.safetensors", + "model.layers.43.self_attn.q_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.43.self_attn.v_proj.bias": "model-00010-of-00015.safetensors", + "model.layers.43.self_attn.v_proj.weight": "model-00010-of-00015.safetensors", + "model.layers.44.input_layernorm.weight": "model-00011-of-00015.safetensors", + "model.layers.44.mlp.down_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.44.mlp.gate_proj.weight": 
"model-00011-of-00015.safetensors", + "model.layers.44.mlp.up_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.44.post_attention_layernorm.weight": "model-00011-of-00015.safetensors", + "model.layers.44.self_attn.k_proj.bias": "model-00011-of-00015.safetensors", + "model.layers.44.self_attn.k_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.44.self_attn.o_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.44.self_attn.q_proj.bias": "model-00011-of-00015.safetensors", + "model.layers.44.self_attn.q_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.44.self_attn.v_proj.bias": "model-00011-of-00015.safetensors", + "model.layers.44.self_attn.v_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.45.input_layernorm.weight": "model-00011-of-00015.safetensors", + "model.layers.45.mlp.down_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.45.mlp.gate_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.45.mlp.up_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.45.post_attention_layernorm.weight": "model-00011-of-00015.safetensors", + "model.layers.45.self_attn.k_proj.bias": "model-00011-of-00015.safetensors", + "model.layers.45.self_attn.k_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.45.self_attn.o_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.45.self_attn.q_proj.bias": "model-00011-of-00015.safetensors", + "model.layers.45.self_attn.q_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.45.self_attn.v_proj.bias": "model-00011-of-00015.safetensors", + "model.layers.45.self_attn.v_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.46.input_layernorm.weight": "model-00011-of-00015.safetensors", + "model.layers.46.mlp.down_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.46.mlp.gate_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.46.mlp.up_proj.weight": 
"model-00011-of-00015.safetensors", + "model.layers.46.post_attention_layernorm.weight": "model-00011-of-00015.safetensors", + "model.layers.46.self_attn.k_proj.bias": "model-00011-of-00015.safetensors", + "model.layers.46.self_attn.k_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.46.self_attn.o_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.46.self_attn.q_proj.bias": "model-00011-of-00015.safetensors", + "model.layers.46.self_attn.q_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.46.self_attn.v_proj.bias": "model-00011-of-00015.safetensors", + "model.layers.46.self_attn.v_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.47.input_layernorm.weight": "model-00011-of-00015.safetensors", + "model.layers.47.mlp.down_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.47.mlp.gate_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.47.mlp.up_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.47.post_attention_layernorm.weight": "model-00011-of-00015.safetensors", + "model.layers.47.self_attn.k_proj.bias": "model-00011-of-00015.safetensors", + "model.layers.47.self_attn.k_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.47.self_attn.o_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.47.self_attn.q_proj.bias": "model-00011-of-00015.safetensors", + "model.layers.47.self_attn.q_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.47.self_attn.v_proj.bias": "model-00011-of-00015.safetensors", + "model.layers.47.self_attn.v_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.48.input_layernorm.weight": "model-00012-of-00015.safetensors", + "model.layers.48.mlp.down_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.48.mlp.gate_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.48.mlp.up_proj.weight": "model-00012-of-00015.safetensors", + 
"model.layers.48.post_attention_layernorm.weight": "model-00012-of-00015.safetensors", + "model.layers.48.self_attn.k_proj.bias": "model-00011-of-00015.safetensors", + "model.layers.48.self_attn.k_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.48.self_attn.o_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.48.self_attn.q_proj.bias": "model-00011-of-00015.safetensors", + "model.layers.48.self_attn.q_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.48.self_attn.v_proj.bias": "model-00011-of-00015.safetensors", + "model.layers.48.self_attn.v_proj.weight": "model-00011-of-00015.safetensors", + "model.layers.49.input_layernorm.weight": "model-00012-of-00015.safetensors", + "model.layers.49.mlp.down_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.49.mlp.gate_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.49.mlp.up_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.49.post_attention_layernorm.weight": "model-00012-of-00015.safetensors", + "model.layers.49.self_attn.k_proj.bias": "model-00012-of-00015.safetensors", + "model.layers.49.self_attn.k_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.49.self_attn.o_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.49.self_attn.q_proj.bias": "model-00012-of-00015.safetensors", + "model.layers.49.self_attn.q_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.49.self_attn.v_proj.bias": "model-00012-of-00015.safetensors", + "model.layers.49.self_attn.v_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.5.input_layernorm.weight": "model-00002-of-00015.safetensors", + "model.layers.5.mlp.down_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.5.mlp.up_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.5.post_attention_layernorm.weight": 
"model-00002-of-00015.safetensors", + "model.layers.5.self_attn.k_proj.bias": "model-00002-of-00015.safetensors", + "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.5.self_attn.q_proj.bias": "model-00002-of-00015.safetensors", + "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.5.self_attn.v_proj.bias": "model-00002-of-00015.safetensors", + "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.50.input_layernorm.weight": "model-00012-of-00015.safetensors", + "model.layers.50.mlp.down_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.50.mlp.gate_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.50.mlp.up_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.50.post_attention_layernorm.weight": "model-00012-of-00015.safetensors", + "model.layers.50.self_attn.k_proj.bias": "model-00012-of-00015.safetensors", + "model.layers.50.self_attn.k_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.50.self_attn.o_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.50.self_attn.q_proj.bias": "model-00012-of-00015.safetensors", + "model.layers.50.self_attn.q_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.50.self_attn.v_proj.bias": "model-00012-of-00015.safetensors", + "model.layers.50.self_attn.v_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.51.input_layernorm.weight": "model-00012-of-00015.safetensors", + "model.layers.51.mlp.down_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.51.mlp.gate_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.51.mlp.up_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.51.post_attention_layernorm.weight": "model-00012-of-00015.safetensors", + "model.layers.51.self_attn.k_proj.bias": 
"model-00012-of-00015.safetensors", + "model.layers.51.self_attn.k_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.51.self_attn.o_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.51.self_attn.q_proj.bias": "model-00012-of-00015.safetensors", + "model.layers.51.self_attn.q_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.51.self_attn.v_proj.bias": "model-00012-of-00015.safetensors", + "model.layers.51.self_attn.v_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.52.input_layernorm.weight": "model-00013-of-00015.safetensors", + "model.layers.52.mlp.down_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.52.mlp.gate_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.52.mlp.up_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.52.post_attention_layernorm.weight": "model-00013-of-00015.safetensors", + "model.layers.52.self_attn.k_proj.bias": "model-00012-of-00015.safetensors", + "model.layers.52.self_attn.k_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.52.self_attn.o_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.52.self_attn.q_proj.bias": "model-00012-of-00015.safetensors", + "model.layers.52.self_attn.q_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.52.self_attn.v_proj.bias": "model-00012-of-00015.safetensors", + "model.layers.52.self_attn.v_proj.weight": "model-00012-of-00015.safetensors", + "model.layers.53.input_layernorm.weight": "model-00013-of-00015.safetensors", + "model.layers.53.mlp.down_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.53.mlp.gate_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.53.mlp.up_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.53.post_attention_layernorm.weight": "model-00013-of-00015.safetensors", + "model.layers.53.self_attn.k_proj.bias": "model-00013-of-00015.safetensors", + 
"model.layers.53.self_attn.k_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.53.self_attn.o_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.53.self_attn.q_proj.bias": "model-00013-of-00015.safetensors", + "model.layers.53.self_attn.q_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.53.self_attn.v_proj.bias": "model-00013-of-00015.safetensors", + "model.layers.53.self_attn.v_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.54.input_layernorm.weight": "model-00013-of-00015.safetensors", + "model.layers.54.mlp.down_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.54.mlp.gate_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.54.mlp.up_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.54.post_attention_layernorm.weight": "model-00013-of-00015.safetensors", + "model.layers.54.self_attn.k_proj.bias": "model-00013-of-00015.safetensors", + "model.layers.54.self_attn.k_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.54.self_attn.o_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.54.self_attn.q_proj.bias": "model-00013-of-00015.safetensors", + "model.layers.54.self_attn.q_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.54.self_attn.v_proj.bias": "model-00013-of-00015.safetensors", + "model.layers.54.self_attn.v_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.55.input_layernorm.weight": "model-00013-of-00015.safetensors", + "model.layers.55.mlp.down_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.55.mlp.gate_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.55.mlp.up_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.55.post_attention_layernorm.weight": "model-00013-of-00015.safetensors", + "model.layers.55.self_attn.k_proj.bias": "model-00013-of-00015.safetensors", + "model.layers.55.self_attn.k_proj.weight": 
"model-00013-of-00015.safetensors", + "model.layers.55.self_attn.o_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.55.self_attn.q_proj.bias": "model-00013-of-00015.safetensors", + "model.layers.55.self_attn.q_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.55.self_attn.v_proj.bias": "model-00013-of-00015.safetensors", + "model.layers.55.self_attn.v_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.56.input_layernorm.weight": "model-00013-of-00015.safetensors", + "model.layers.56.mlp.down_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.56.mlp.gate_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.56.mlp.up_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.56.post_attention_layernorm.weight": "model-00013-of-00015.safetensors", + "model.layers.56.self_attn.k_proj.bias": "model-00013-of-00015.safetensors", + "model.layers.56.self_attn.k_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.56.self_attn.o_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.56.self_attn.q_proj.bias": "model-00013-of-00015.safetensors", + "model.layers.56.self_attn.q_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.56.self_attn.v_proj.bias": "model-00013-of-00015.safetensors", + "model.layers.56.self_attn.v_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.57.input_layernorm.weight": "model-00014-of-00015.safetensors", + "model.layers.57.mlp.down_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.57.mlp.gate_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.57.mlp.up_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.57.post_attention_layernorm.weight": "model-00014-of-00015.safetensors", + "model.layers.57.self_attn.k_proj.bias": "model-00013-of-00015.safetensors", + "model.layers.57.self_attn.k_proj.weight": "model-00013-of-00015.safetensors", + 
"model.layers.57.self_attn.o_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.57.self_attn.q_proj.bias": "model-00013-of-00015.safetensors", + "model.layers.57.self_attn.q_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.57.self_attn.v_proj.bias": "model-00013-of-00015.safetensors", + "model.layers.57.self_attn.v_proj.weight": "model-00013-of-00015.safetensors", + "model.layers.58.input_layernorm.weight": "model-00014-of-00015.safetensors", + "model.layers.58.mlp.down_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.58.mlp.gate_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.58.mlp.up_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.58.post_attention_layernorm.weight": "model-00014-of-00015.safetensors", + "model.layers.58.self_attn.k_proj.bias": "model-00014-of-00015.safetensors", + "model.layers.58.self_attn.k_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.58.self_attn.o_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.58.self_attn.q_proj.bias": "model-00014-of-00015.safetensors", + "model.layers.58.self_attn.q_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.58.self_attn.v_proj.bias": "model-00014-of-00015.safetensors", + "model.layers.58.self_attn.v_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.59.input_layernorm.weight": "model-00014-of-00015.safetensors", + "model.layers.59.mlp.down_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.59.mlp.gate_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.59.mlp.up_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.59.post_attention_layernorm.weight": "model-00014-of-00015.safetensors", + "model.layers.59.self_attn.k_proj.bias": "model-00014-of-00015.safetensors", + "model.layers.59.self_attn.k_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.59.self_attn.o_proj.weight": 
"model-00014-of-00015.safetensors", + "model.layers.59.self_attn.q_proj.bias": "model-00014-of-00015.safetensors", + "model.layers.59.self_attn.q_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.59.self_attn.v_proj.bias": "model-00014-of-00015.safetensors", + "model.layers.59.self_attn.v_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.6.input_layernorm.weight": "model-00002-of-00015.safetensors", + "model.layers.6.mlp.down_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.6.mlp.up_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00015.safetensors", + "model.layers.6.self_attn.k_proj.bias": "model-00002-of-00015.safetensors", + "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.6.self_attn.q_proj.bias": "model-00002-of-00015.safetensors", + "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.6.self_attn.v_proj.bias": "model-00002-of-00015.safetensors", + "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.60.input_layernorm.weight": "model-00014-of-00015.safetensors", + "model.layers.60.mlp.down_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.60.mlp.gate_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.60.mlp.up_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.60.post_attention_layernorm.weight": "model-00014-of-00015.safetensors", + "model.layers.60.self_attn.k_proj.bias": "model-00014-of-00015.safetensors", + "model.layers.60.self_attn.k_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.60.self_attn.o_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.60.self_attn.q_proj.bias": 
"model-00014-of-00015.safetensors", + "model.layers.60.self_attn.q_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.60.self_attn.v_proj.bias": "model-00014-of-00015.safetensors", + "model.layers.60.self_attn.v_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.61.input_layernorm.weight": "model-00015-of-00015.safetensors", + "model.layers.61.mlp.down_proj.weight": "model-00015-of-00015.safetensors", + "model.layers.61.mlp.gate_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.61.mlp.up_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.61.post_attention_layernorm.weight": "model-00015-of-00015.safetensors", + "model.layers.61.self_attn.k_proj.bias": "model-00014-of-00015.safetensors", + "model.layers.61.self_attn.k_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.61.self_attn.o_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.61.self_attn.q_proj.bias": "model-00014-of-00015.safetensors", + "model.layers.61.self_attn.q_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.61.self_attn.v_proj.bias": "model-00014-of-00015.safetensors", + "model.layers.61.self_attn.v_proj.weight": "model-00014-of-00015.safetensors", + "model.layers.62.input_layernorm.weight": "model-00015-of-00015.safetensors", + "model.layers.62.mlp.down_proj.weight": "model-00015-of-00015.safetensors", + "model.layers.62.mlp.gate_proj.weight": "model-00015-of-00015.safetensors", + "model.layers.62.mlp.up_proj.weight": "model-00015-of-00015.safetensors", + "model.layers.62.post_attention_layernorm.weight": "model-00015-of-00015.safetensors", + "model.layers.62.self_attn.k_proj.bias": "model-00015-of-00015.safetensors", + "model.layers.62.self_attn.k_proj.weight": "model-00015-of-00015.safetensors", + "model.layers.62.self_attn.o_proj.weight": "model-00015-of-00015.safetensors", + "model.layers.62.self_attn.q_proj.bias": "model-00015-of-00015.safetensors", + 
"model.layers.62.self_attn.q_proj.weight": "model-00015-of-00015.safetensors", + "model.layers.62.self_attn.v_proj.bias": "model-00015-of-00015.safetensors", + "model.layers.62.self_attn.v_proj.weight": "model-00015-of-00015.safetensors", + "model.layers.63.input_layernorm.weight": "model-00015-of-00015.safetensors", + "model.layers.63.mlp.down_proj.weight": "model-00015-of-00015.safetensors", + "model.layers.63.mlp.gate_proj.weight": "model-00015-of-00015.safetensors", + "model.layers.63.mlp.up_proj.weight": "model-00015-of-00015.safetensors", + "model.layers.63.post_attention_layernorm.weight": "model-00015-of-00015.safetensors", + "model.layers.63.self_attn.k_proj.bias": "model-00015-of-00015.safetensors", + "model.layers.63.self_attn.k_proj.weight": "model-00015-of-00015.safetensors", + "model.layers.63.self_attn.o_proj.weight": "model-00015-of-00015.safetensors", + "model.layers.63.self_attn.q_proj.bias": "model-00015-of-00015.safetensors", + "model.layers.63.self_attn.q_proj.weight": "model-00015-of-00015.safetensors", + "model.layers.63.self_attn.v_proj.bias": "model-00015-of-00015.safetensors", + "model.layers.63.self_attn.v_proj.weight": "model-00015-of-00015.safetensors", + "model.layers.7.input_layernorm.weight": "model-00003-of-00015.safetensors", + "model.layers.7.mlp.down_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.7.mlp.up_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.7.post_attention_layernorm.weight": "model-00003-of-00015.safetensors", + "model.layers.7.self_attn.k_proj.bias": "model-00002-of-00015.safetensors", + "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.7.self_attn.q_proj.bias": "model-00002-of-00015.safetensors", + "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00015.safetensors", + 
"model.layers.7.self_attn.v_proj.bias": "model-00002-of-00015.safetensors", + "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00015.safetensors", + "model.layers.8.input_layernorm.weight": "model-00003-of-00015.safetensors", + "model.layers.8.mlp.down_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.8.mlp.gate_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.8.mlp.up_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00015.safetensors", + "model.layers.8.self_attn.k_proj.bias": "model-00003-of-00015.safetensors", + "model.layers.8.self_attn.k_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.8.self_attn.o_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.8.self_attn.q_proj.bias": "model-00003-of-00015.safetensors", + "model.layers.8.self_attn.q_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.8.self_attn.v_proj.bias": "model-00003-of-00015.safetensors", + "model.layers.8.self_attn.v_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.9.input_layernorm.weight": "model-00003-of-00015.safetensors", + "model.layers.9.mlp.down_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.9.mlp.up_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00015.safetensors", + "model.layers.9.self_attn.k_proj.bias": "model-00003-of-00015.safetensors", + "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.9.self_attn.q_proj.bias": "model-00003-of-00015.safetensors", + "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00015.safetensors", + "model.layers.9.self_attn.v_proj.bias": "model-00003-of-00015.safetensors", + 
"model.layers.9.self_attn.v_proj.weight": "model-00003-of-00015.safetensors", + "model.norm.weight": "model-00015-of-00015.safetensors" + } +} diff --git a/special_tokens_map.json b/special_tokens_map.json new file mode 100644 index 0000000..7dd43a5 --- /dev/null +++ b/special_tokens_map.json @@ -0,0 +1,23 @@ +{ + "bos_token": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false + }, + "eos_token": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false + }, + "pad_token": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false + } +} diff --git a/thinking_budget.png b/thinking_budget.png new file mode 100644 index 0000000..ab0237b Binary files /dev/null and b/thinking_budget.png differ diff --git a/tokenizer.json b/tokenizer.json new file mode 100644 index 0000000..dc0d8c9 --- /dev/null +++ b/tokenizer.json @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f6bd848f52451824a3033a9f1e67eea5b399a13c90f845a332d3a29537e05827 +size 11883696 diff --git a/tokenizer_config.json b/tokenizer_config.json new file mode 100644 index 0000000..c72b8f0 --- /dev/null +++ b/tokenizer_config.json @@ -0,0 +1,1035 @@ +{ + "added_tokens_decoder": { + "0": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "1": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "2": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "3": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "4": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "5": { + 
"content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "6": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "7": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "8": { + "content": "", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": false + }, + "9": { + "content": "<[PLHD9_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "10": { + "content": "<[PLHD10_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "11": { + "content": "<[PLHD11_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "12": { + "content": "<[PLHD12_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "13": { + "content": "<[PLHD13_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "14": { + "content": "<[PLHD14_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "15": { + "content": "<[PLHD15_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "16": { + "content": "<[PLHD16_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "17": { + "content": "<[PLHD17_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "18": { + "content": "<[PLHD18_never_used]>", + "lstrip": false, + 
"normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "19": { + "content": "<[PLHD19_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "20": { + "content": "<[PLHD20_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "21": { + "content": "<[PLHD21_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "22": { + "content": "<[PLHD22_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "23": { + "content": "<[PLHD23_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "24": { + "content": "<[PLHD24_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "25": { + "content": "<[PLHD25_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "26": { + "content": "<[PLHD26_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "27": { + "content": "<[PLHD27_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "28": { + "content": "<[PLHD28_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "29": { + "content": "<[PLHD29_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "30": { + "content": "<[PLHD30_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "31": { + "content": 
"<[PLHD31_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "32": { + "content": "<[PLHD32_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "33": { + "content": "<[PLHD33_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "34": { + "content": "<[PLHD34_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "35": { + "content": "<[PLHD35_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "36": { + "content": "<[PLHD36_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "37": { + "content": "<[PLHD37_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "38": { + "content": "<[PLHD38_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "39": { + "content": "<[PLHD39_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "40": { + "content": "<[PLHD40_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "41": { + "content": "<[PLHD41_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "42": { + "content": "<[PLHD42_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "43": { + "content": "<[PLHD43_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true 
+ }, + "44": { + "content": "<[PLHD44_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "45": { + "content": "<[PLHD45_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "46": { + "content": "<[PLHD46_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "47": { + "content": "<[PLHD47_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "48": { + "content": "<[PLHD48_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "49": { + "content": "<[PLHD49_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "50": { + "content": "<[PLHD50_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "51": { + "content": "<[PLHD51_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "52": { + "content": "<[PLHD52_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "53": { + "content": "<[PLHD53_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "54": { + "content": "<[PLHD54_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "55": { + "content": "<[PLHD55_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "56": { + "content": "<[PLHD56_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + 
"single_word": false, + "special": true + }, + "57": { + "content": "<[PLHD57_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "58": { + "content": "<[PLHD58_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "59": { + "content": "<[PLHD59_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "60": { + "content": "<[PLHD60_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "61": { + "content": "<[PLHD61_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "62": { + "content": "<[PLHD62_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "63": { + "content": "<[PLHD63_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "64": { + "content": "<[PLHD64_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "65": { + "content": "<[PLHD65_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "66": { + "content": "<[PLHD66_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "67": { + "content": "<[PLHD67_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "68": { + "content": "<[PLHD68_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "69": { + "content": "<[PLHD69_never_used]>", + "lstrip": false, + "normalized": 
false, + "rstrip": false, + "single_word": false, + "special": true + }, + "70": { + "content": "<[PLHD70_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "71": { + "content": "<[PLHD71_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "72": { + "content": "<[PLHD72_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "73": { + "content": "<[PLHD73_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "74": { + "content": "<[PLHD74_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "75": { + "content": "<[PLHD75_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "76": { + "content": "<[PLHD76_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "77": { + "content": "<[PLHD77_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "78": { + "content": "<[PLHD78_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "79": { + "content": "<[PLHD79_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "80": { + "content": "<[PLHD80_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "81": { + "content": "<[PLHD81_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "82": { + "content": "<[PLHD82_never_used]>", + 
"lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "83": { + "content": "<[PLHD83_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "84": { + "content": "<[PLHD84_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "85": { + "content": "<[PLHD85_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "86": { + "content": "<[PLHD86_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "87": { + "content": "<[PLHD87_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "88": { + "content": "<[PLHD88_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "89": { + "content": "<[PLHD89_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "90": { + "content": "<[PLHD90_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "91": { + "content": "<[PLHD91_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "92": { + "content": "<[PLHD92_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "93": { + "content": "<[PLHD93_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "94": { + "content": "<[PLHD94_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "95": { + 
"content": "<[PLHD95_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "96": { + "content": "<[PLHD96_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "97": { + "content": "<[PLHD97_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "98": { + "content": "<[PLHD98_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "99": { + "content": "<[PLHD99_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "100": { + "content": "<[PLHD100_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "101": { + "content": "<[PLHD101_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "102": { + "content": "<[PLHD102_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "103": { + "content": "<[PLHD103_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "104": { + "content": "<[PLHD104_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "105": { + "content": "<[PLHD105_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "106": { + "content": "<[PLHD106_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "107": { + "content": "<[PLHD107_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + 
"single_word": false, + "special": true + }, + "108": { + "content": "<[PLHD108_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "109": { + "content": "<[PLHD109_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "110": { + "content": "<[PLHD110_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "111": { + "content": "<[PLHD111_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "112": { + "content": "<[PLHD112_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "113": { + "content": "<[PLHD113_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "114": { + "content": "<[PLHD114_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "115": { + "content": "<[PLHD115_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "116": { + "content": "<[PLHD116_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "117": { + "content": "<[PLHD117_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "118": { + "content": "<[PLHD118_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "119": { + "content": "<[PLHD119_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "120": { + "content": "<[PLHD120_never_used]>", + 
"lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "121": { + "content": "<[PLHD121_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "122": { + "content": "<[PLHD122_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "123": { + "content": "<[PLHD123_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "124": { + "content": "<[PLHD124_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "125": { + "content": "<[PLHD125_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "126": { + "content": "<[PLHD126_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + }, + "127": { + "content": "<[PLHD127_never_used]>", + "lstrip": false, + "normalized": false, + "rstrip": false, + "single_word": false, + "special": true + } + }, + "bos_token": "", + "clean_up_tokenization_spaces": false, + "eos_token": "", + "extra_special_tokens": {}, + "model_max_length": 1000000000000000019884624838656, + "pad_token": "", + "tokenizer_class": "PreTrainedTokenizerFast" +}