commit file to repo
commit e61972010a
1
.gitattributes
vendored
Normal file
@@ -0,0 +1 @@
*.json filter=lfs diff=lfs merge=lfs -text
0
.gitignore
vendored
Normal file
1347
configs/20260121_112135.py
Normal file
File diff suppressed because it is too large
@@ -0,0 +1,7 @@
[RISE-CORE Msg(16245:139918883195904:libvgpu.c:900)]: Initializing.....
[RISE-CORE ERROR (pid:16245 thread=139918883195904 libvgpu.c:958)]: cuInit failed:100
01/21 12:02:35 - OpenCompass - INFO - Task [public/qwen3-0-6b@v1.0.4/GaokaoBench_2010-2013_English_MCQs]: {'score': 31.428571428571427}
01/21 12:02:35 - OpenCompass - INFO - time elapsed: 2.37s
/opt/conda/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
[RISE-CORE Msg(16245:139918883195904:multiprocess_memory_limit.c:504)]: Calling exit handler 16245
7
logs/eval/public/qwen3-0-6b@v1.0.4/lambada.out
Normal file
@@ -0,0 +1,7 @@
[RISE-CORE Msg(16057:139978682436608:libvgpu.c:900)]: Initializing.....
[RISE-CORE ERROR (pid:16057 thread=139978682436608 libvgpu.c:958)]: cuInit failed:100
01/21 12:02:24 - OpenCompass - INFO - Task [public/qwen3-0-6b@v1.0.4/lambada]: {'accuracy': 0.038812342324859306}
01/21 12:02:24 - OpenCompass - INFO - time elapsed: 2.13s
/opt/conda/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
[RISE-CORE Msg(16057:139978682436608:multiprocess_memory_limit.c:504)]: Calling exit handler 16057
7
logs/eval/public/qwen3-0-6b@v1.0.4/triviaqa.out
Normal file
@@ -0,0 +1,7 @@
[RISE-CORE Msg(16242:140509361744896:libvgpu.c:900)]: Initializing.....
[RISE-CORE ERROR (pid:16242 thread=140509361744896 libvgpu.c:958)]: cuInit failed:100
01/21 12:02:37 - OpenCompass - INFO - Task [public/qwen3-0-6b@v1.0.4/triviaqa]: {'score': 0.011316057485572028}
01/21 12:02:37 - OpenCompass - INFO - time elapsed: 3.78s
/opt/conda/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
[RISE-CORE Msg(16242:140509361744896:multiprocess_memory_limit.c:504)]: Calling exit handler 16242
File diff suppressed because one or more lines are too long
10
logs/infer/public/qwen3-0-6b@v1.0.4/lambada_0.out
Normal file
File diff suppressed because one or more lines are too long
10
logs/infer/public/qwen3-0-6b@v1.0.4/lambada_1.out
Normal file
File diff suppressed because one or more lines are too long
10
logs/infer/public/qwen3-0-6b@v1.0.4/lambada_2.out
Normal file
File diff suppressed because one or more lines are too long
10
logs/infer/public/qwen3-0-6b@v1.0.4/triviaqa_0.out
Normal file
File diff suppressed because one or more lines are too long
10
logs/infer/public/qwen3-0-6b@v1.0.4/triviaqa_1.out
Normal file
File diff suppressed because one or more lines are too long
10
logs/infer/public/qwen3-0-6b@v1.0.4/triviaqa_2.out
Normal file
File diff suppressed because one or more lines are too long
10
logs/infer/public/qwen3-0-6b@v1.0.4/triviaqa_3.out
Normal file
File diff suppressed because one or more lines are too long
10
logs/infer/public/qwen3-0-6b@v1.0.4/triviaqa_4.out
Normal file
File diff suppressed because one or more lines are too long
BIN
predictions/public/qwen3-0-6b@v1.0.4/GaokaoBench_2010-2013_English_MCQs.json
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
predictions/public/qwen3-0-6b@v1.0.4/lambada_0.json
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
predictions/public/qwen3-0-6b@v1.0.4/lambada_1.json
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
predictions/public/qwen3-0-6b@v1.0.4/lambada_2.json
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
predictions/public/qwen3-0-6b@v1.0.4/triviaqa_0.json
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
predictions/public/qwen3-0-6b@v1.0.4/triviaqa_1.json
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
predictions/public/qwen3-0-6b@v1.0.4/triviaqa_2.json
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
predictions/public/qwen3-0-6b@v1.0.4/triviaqa_3.json
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
predictions/public/qwen3-0-6b@v1.0.4/triviaqa_4.json
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
results/public/qwen3-0-6b@v1.0.4/GaokaoBench_2010-2013_English_MCQs.json
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
results/public/qwen3-0-6b@v1.0.4/lambada.json
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
results/public/qwen3-0-6b@v1.0.4/triviaqa.json
(Stored with Git LFS)
Normal file
Binary file not shown.
87
summary/summary_20260121_112135.csv
Normal file
@@ -0,0 +1,87 @@
dataset,version,metric,mode,public/qwen3-0-6b@v1.0.4
--------- 考试 Exam ---------,-,-,-,-
ceval,-,-,-,-
agieval,-,-,-,-
mmlu,-,-,-,-
GaokaoBench,-,-,-,-
ARC-c,-,-,-,-
--------- 语言 Language ---------,-,-,-,-
WiC,-,-,-,-
summedits,-,-,-,-
chid-dev,-,-,-,-
afqmc-dev,-,-,-,-
bustm-dev,-,-,-,-
cluewsc-dev,-,-,-,-
WSC,-,-,-,-
winogrande,-,-,-,-
flores_100,-,-,-,-
--------- 知识 Knowledge ---------,-,-,-,-
BoolQ,-,-,-,-
commonsense_qa,-,-,-,-
nq,-,-,-,-
triviaqa,2121ce,score,gen,0.01
--------- 推理 Reasoning ---------,-,-,-,-
cmnli,-,-,-,-
ocnli,-,-,-,-
ocnli_fc-dev,-,-,-,-
AX_b,-,-,-,-
AX_g,-,-,-,-
CB,-,-,-,-
RTE,-,-,-,-
story_cloze,-,-,-,-
COPA,-,-,-,-
ReCoRD,-,-,-,-
hellaswag,-,-,-,-
piqa,-,-,-,-
siqa,-,-,-,-
strategyqa,-,-,-,-
math,-,-,-,-
gsm8k,-,-,-,-
TheoremQA,-,-,-,-
openai_humaneval,-,-,-,-
mbpp,-,-,-,-
cmmlu,-,-,-,-
bbh,-,-,-,-
--------- 理解 Understanding ---------,-,-,-,-
C3,-,-,-,-
CMRC_dev,-,-,-,-
DRCD_dev,-,-,-,-
MultiRC,-,-,-,-
race-middle,-,-,-,-
race-high,-,-,-,-
openbookqa_fact,-,-,-,-
csl_dev,-,-,-,-
lcsts,-,-,-,-
Xsum,-,-,-,-
eprstmt-dev,-,-,-,-
lambada,217e11,accuracy,gen,0.04
tnews-dev,-,-,-,-
--------- 安全 Safety ---------,-,-,-,-
crows_pairs,-,-,-,-
--------- LEval Exact Match (Acc) ---------,-,-,-,-
LEval_coursera,-,-,-,-
LEval_gsm100,-,-,-,-
LEval_quality,-,-,-,-
LEval_tpo,-,-,-,-
LEval_topic_retrieval,-,-,-,-
--------- LEval Gen (ROUGE) ---------,-,-,-,-
LEval_financialqa,-,-,-,-
LEval_gov_report_summ,-,-,-,-
LEval_legal_contract_qa,-,-,-,-
LEval_meeting_summ,-,-,-,-
LEval_multidocqa,-,-,-,-
LEval_narrativeqa,-,-,-,-
LEval_nq,-,-,-,-
LEval_news_summ,-,-,-,-
LEval_paper_assistant,-,-,-,-
LEval_patent_summ,-,-,-,-
LEval_review_summ,-,-,-,-
LEval_scientificqa,-,-,-,-
LEval_tvshow_summ--------- 长文本 LongBench ---------,-,-,-,-
longbench_lsht,-,-,-,-
longbench_vcsum,-,-,-,-
longbench_dureader,-,-,-,-
longbench_multifieldqa_zh,-,-,-,-
longbench_passage_retrieval_zh,-,-,-,-
--------- 单选 自定义数据 ---------,-,-,-,-
SageBench-exam,-,-,-,-
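Most rows in this summary CSV are placeholders (`-`) for datasets that were not part of the run; only `triviaqa` and `lambada` carry scores. As a minimal sketch, assuming the file keeps the five-column layout shown above, the scored rows could be picked out like this. The `scored_rows` helper and the inline `SAMPLE` excerpt are illustrative and not part of the commit.

```python
import csv
import io

# Illustrative excerpt of summary/summary_20260121_112135.csv (not the full file).
SAMPLE = """dataset,version,metric,mode,public/qwen3-0-6b@v1.0.4
--------- 知识 Knowledge ---------,-,-,-,-
nq,-,-,-,-
triviaqa,2121ce,score,gen,0.01
lambada,217e11,accuracy,gen,0.04
"""


def scored_rows(text):
    """Return (dataset, metric, value) for rows whose last column parses as a number."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for row in reader:
        if len(row) != len(header):
            continue  # skip malformed lines
        try:
            value = float(row[-1])
        except ValueError:
            continue  # "-" placeholders and section divider rows
        rows.append((row[0], row[2], value))
    return rows


print(scored_rows(SAMPLE))
# → [('triviaqa', 'score', 0.01), ('lambada', 'accuracy', 0.04)]
```

To run it against the committed file, read the file contents with `open(path, encoding="utf-8").read()` and pass the text to `scored_rows`.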
197
summary/summary_20260121_112135.txt
Normal file
@@ -0,0 +1,197 @@
20260121_112135
tabulate format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataset version metric mode public/qwen3-0-6b@v1.0.4
----------------------------------------------------- --------- -------- ------ --------------------------
--------- 考试 Exam --------- - - - -
ceval - - - -
agieval - - - -
mmlu - - - -
GaokaoBench - - - -
ARC-c - - - -
--------- 语言 Language --------- - - - -
WiC - - - -
summedits - - - -
chid-dev - - - -
afqmc-dev - - - -
bustm-dev - - - -
cluewsc-dev - - - -
WSC - - - -
winogrande - - - -
flores_100 - - - -
--------- 知识 Knowledge --------- - - - -
BoolQ - - - -
commonsense_qa - - - -
nq - - - -
triviaqa 2121ce score gen 0.01
--------- 推理 Reasoning --------- - - - -
cmnli - - - -
ocnli - - - -
ocnli_fc-dev - - - -
AX_b - - - -
AX_g - - - -
CB - - - -
RTE - - - -
story_cloze - - - -
COPA - - - -
ReCoRD - - - -
hellaswag - - - -
piqa - - - -
siqa - - - -
strategyqa - - - -
math - - - -
gsm8k - - - -
TheoremQA - - - -
openai_humaneval - - - -
mbpp - - - -
cmmlu - - - -
bbh - - - -
--------- 理解 Understanding --------- - - - -
C3 - - - -
CMRC_dev - - - -
DRCD_dev - - - -
MultiRC - - - -
race-middle - - - -
race-high - - - -
openbookqa_fact - - - -
csl_dev - - - -
lcsts - - - -
Xsum - - - -
eprstmt-dev - - - -
lambada 217e11 accuracy gen 0.04
tnews-dev - - - -
--------- 安全 Safety --------- - - - -
crows_pairs - - - -
--------- LEval Exact Match (Acc) --------- - - - -
LEval_coursera - - - -
LEval_gsm100 - - - -
LEval_quality - - - -
LEval_tpo - - - -
LEval_topic_retrieval - - - -
--------- LEval Gen (ROUGE) --------- - - - -
LEval_financialqa - - - -
LEval_gov_report_summ - - - -
LEval_legal_contract_qa - - - -
LEval_meeting_summ - - - -
LEval_multidocqa - - - -
LEval_narrativeqa - - - -
LEval_nq - - - -
LEval_news_summ - - - -
LEval_paper_assistant - - - -
LEval_patent_summ - - - -
LEval_review_summ - - - -
LEval_scientificqa - - - -
LEval_tvshow_summ--------- 长文本 LongBench --------- - - - -
longbench_lsht - - - -
longbench_vcsum - - - -
longbench_dureader - - - -
longbench_multifieldqa_zh - - - -
longbench_passage_retrieval_zh - - - -
--------- 单选 自定义数据 --------- - - - -
SageBench-exam - - - -
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

-------------------------------------------------------------------------------------------------------------------------------- THIS IS A DIVIDER --------------------------------------------------------------------------------------------------------------------------------

csv format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataset,version,metric,mode,public/qwen3-0-6b@v1.0.4
--------- 考试 Exam ---------,-,-,-,-
ceval,-,-,-,-
agieval,-,-,-,-
mmlu,-,-,-,-
GaokaoBench,-,-,-,-
ARC-c,-,-,-,-
--------- 语言 Language ---------,-,-,-,-
WiC,-,-,-,-
summedits,-,-,-,-
chid-dev,-,-,-,-
afqmc-dev,-,-,-,-
bustm-dev,-,-,-,-
cluewsc-dev,-,-,-,-
WSC,-,-,-,-
winogrande,-,-,-,-
flores_100,-,-,-,-
--------- 知识 Knowledge ---------,-,-,-,-
BoolQ,-,-,-,-
commonsense_qa,-,-,-,-
nq,-,-,-,-
triviaqa,2121ce,score,gen,0.01
--------- 推理 Reasoning ---------,-,-,-,-
cmnli,-,-,-,-
ocnli,-,-,-,-
ocnli_fc-dev,-,-,-,-
AX_b,-,-,-,-
AX_g,-,-,-,-
CB,-,-,-,-
RTE,-,-,-,-
story_cloze,-,-,-,-
COPA,-,-,-,-
ReCoRD,-,-,-,-
hellaswag,-,-,-,-
piqa,-,-,-,-
siqa,-,-,-,-
strategyqa,-,-,-,-
math,-,-,-,-
gsm8k,-,-,-,-
TheoremQA,-,-,-,-
openai_humaneval,-,-,-,-
mbpp,-,-,-,-
cmmlu,-,-,-,-
bbh,-,-,-,-
--------- 理解 Understanding ---------,-,-,-,-
C3,-,-,-,-
CMRC_dev,-,-,-,-
DRCD_dev,-,-,-,-
MultiRC,-,-,-,-
race-middle,-,-,-,-
race-high,-,-,-,-
openbookqa_fact,-,-,-,-
csl_dev,-,-,-,-
lcsts,-,-,-,-
Xsum,-,-,-,-
eprstmt-dev,-,-,-,-
lambada,217e11,accuracy,gen,0.04
tnews-dev,-,-,-,-
--------- 安全 Safety ---------,-,-,-,-
crows_pairs,-,-,-,-
--------- LEval Exact Match (Acc) ---------,-,-,-,-
LEval_coursera,-,-,-,-
LEval_gsm100,-,-,-,-
LEval_quality,-,-,-,-
LEval_tpo,-,-,-,-
LEval_topic_retrieval,-,-,-,-
--------- LEval Gen (ROUGE) ---------,-,-,-,-
LEval_financialqa,-,-,-,-
LEval_gov_report_summ,-,-,-,-
LEval_legal_contract_qa,-,-,-,-
LEval_meeting_summ,-,-,-,-
LEval_multidocqa,-,-,-,-
LEval_narrativeqa,-,-,-,-
LEval_nq,-,-,-,-
LEval_news_summ,-,-,-,-
LEval_paper_assistant,-,-,-,-
LEval_patent_summ,-,-,-,-
LEval_review_summ,-,-,-,-
LEval_scientificqa,-,-,-,-
LEval_tvshow_summ--------- 长文本 LongBench ---------,-,-,-,-
longbench_lsht,-,-,-,-
longbench_vcsum,-,-,-,-
longbench_dureader,-,-,-,-
longbench_multifieldqa_zh,-,-,-,-
longbench_passage_retrieval_zh,-,-,-,-
--------- 单选 自定义数据 ---------,-,-,-,-
SageBench-exam,-,-,-,-
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

-------------------------------------------------------------------------------------------------------------------------------- THIS IS A DIVIDER --------------------------------------------------------------------------------------------------------------------------------

raw format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-------------------------------
Model: public/qwen3-0-6b@v1.0.4
GaokaoBench_2010-2013_English_MCQs: {'score': 31.428571428571427}
lambada: {'accuracy': 0.038812342324859306}
triviaqa: {'score': 0.011316057485572028}
GaokaoBench: {'error': "missing datasets: {'GaokaoBench_2010-2022_Math_II_MCQs', 'GaokaoBench_2010-2022_Chemistry_MCQs', 'GaokaoBench_2010-2022_Chinese_Modern_Lit', 'GaokaoBench_2010-2022_Geography_MCQs', 'GaokaoBench_2012-2022_English_Cloze_Test', 'GaokaoBench_2010-2022_History_MCQs', 'GaokaoBench_2010-2022_Biology_MCQs', 'GaokaoBench_2010-2022_English_Fill_in_Blanks', 'GaokaoBench_2010-2022_Chinese_Lang_and_Usage_MCQs', 'GaokaoBench_2010-2022_English_Reading_Comp', 'GaokaoBench_2010-2022_Physics_MCQs', 'GaokaoBench_2010-2022_Political_Science_MCQs', 'GaokaoBench_2010-2022_Math_I_MCQs'}"}
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
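The raw-format block keeps the unrounded scores and explains why the `GaokaoBench` row in the tables is empty: the aggregate reports an `error` entry for missing sub-datasets instead of a score. A minimal sketch for reading such lines follows, assuming each result line has the form `name: {python dict literal}` as shown above. The `parse_raw_block` helper is hypothetical, and the error entry in `RAW_SAMPLE` is shortened to a single dataset name for illustration.

```python
import ast

# Shortened, illustrative copy of the raw-format result lines.
RAW_SAMPLE = """\
GaokaoBench_2010-2013_English_MCQs: {'score': 31.428571428571427}
lambada: {'accuracy': 0.038812342324859306}
triviaqa: {'score': 0.011316057485572028}
GaokaoBench: {'error': "missing datasets: {'GaokaoBench_2010-2022_Math_II_MCQs'}"}
"""


def parse_raw_block(text):
    """Map each dataset name to the dict literal printed after it."""
    results = {}
    for line in text.splitlines():
        name, sep, payload = line.partition(": ")
        if not sep or not payload.startswith("{"):
            continue  # headings such as "Model: ..." and ruler lines
        results[name] = ast.literal_eval(payload)
    return results


parsed = parse_raw_block(RAW_SAMPLE)
failed = sorted(name for name, value in parsed.items() if "error" in value)
print(failed)
# → ['GaokaoBench']
```

Rounding the parsed values to two decimals reproduces the table entries: `0.0388...` becomes `0.04` for `lambada` and `0.0113...` becomes `0.01` for `triviaqa`.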