Compare commits
No commits in common. "v1.0.4-260121-112021" and "main" have entirely different histories.
v1.0.4-260
...
main
1
.gitattributes
vendored
1
.gitattributes
vendored
@ -1 +0,0 @@
|
||||
*.json filter=lfs diff=lfs merge=lfs -text
|
||||
0
.gitignore
vendored
0
.gitignore
vendored
File diff suppressed because it is too large
Load Diff
@ -1,7 +0,0 @@
|
||||
[RISE-CORE Msg(16245:139918883195904:libvgpu.c:900)]: Initializing.....
|
||||
[RISE-CORE ERROR (pid:16245 thread=139918883195904 libvgpu.c:958)]: cuInit failed:100
|
||||
01/21 12:02:35 - OpenCompass - INFO - Task [public/qwen3-0-6b@v1.0.4/GaokaoBench_2010-2013_English_MCQs]: {'score': 31.428571428571427}
|
||||
01/21 12:02:35 - OpenCompass - INFO - time elapsed: 2.37s
|
||||
/opt/conda/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
|
||||
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
|
||||
[RISE-CORE Msg(16245:139918883195904:multiprocess_memory_limit.c:504)]: Calling exit handler 16245
|
||||
@ -1,7 +0,0 @@
|
||||
[RISE-CORE Msg(16057:139978682436608:libvgpu.c:900)]: Initializing.....
|
||||
[RISE-CORE ERROR (pid:16057 thread=139978682436608 libvgpu.c:958)]: cuInit failed:100
|
||||
01/21 12:02:24 - OpenCompass - INFO - Task [public/qwen3-0-6b@v1.0.4/lambada]: {'accuracy': 0.038812342324859306}
|
||||
01/21 12:02:24 - OpenCompass - INFO - time elapsed: 2.13s
|
||||
/opt/conda/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
|
||||
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
|
||||
[RISE-CORE Msg(16057:139978682436608:multiprocess_memory_limit.c:504)]: Calling exit handler 16057
|
||||
@ -1,7 +0,0 @@
|
||||
[RISE-CORE Msg(16242:140509361744896:libvgpu.c:900)]: Initializing.....
|
||||
[RISE-CORE ERROR (pid:16242 thread=140509361744896 libvgpu.c:958)]: cuInit failed:100
|
||||
01/21 12:02:37 - OpenCompass - INFO - Task [public/qwen3-0-6b@v1.0.4/triviaqa]: {'score': 0.011316057485572028}
|
||||
01/21 12:02:37 - OpenCompass - INFO - time elapsed: 3.78s
|
||||
/opt/conda/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
|
||||
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
|
||||
[RISE-CORE Msg(16242:140509361744896:multiprocess_memory_limit.c:504)]: Calling exit handler 16242
|
||||
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
BIN
predictions/public/qwen3-0-6b@v1.0.4/GaokaoBench_2010-2013_English_MCQs.json
(Stored with Git LFS)
BIN
predictions/public/qwen3-0-6b@v1.0.4/GaokaoBench_2010-2013_English_MCQs.json
(Stored with Git LFS)
Binary file not shown.
BIN
predictions/public/qwen3-0-6b@v1.0.4/lambada_0.json
(Stored with Git LFS)
BIN
predictions/public/qwen3-0-6b@v1.0.4/lambada_0.json
(Stored with Git LFS)
Binary file not shown.
BIN
predictions/public/qwen3-0-6b@v1.0.4/lambada_1.json
(Stored with Git LFS)
BIN
predictions/public/qwen3-0-6b@v1.0.4/lambada_1.json
(Stored with Git LFS)
Binary file not shown.
BIN
predictions/public/qwen3-0-6b@v1.0.4/lambada_2.json
(Stored with Git LFS)
BIN
predictions/public/qwen3-0-6b@v1.0.4/lambada_2.json
(Stored with Git LFS)
Binary file not shown.
BIN
predictions/public/qwen3-0-6b@v1.0.4/triviaqa_0.json
(Stored with Git LFS)
BIN
predictions/public/qwen3-0-6b@v1.0.4/triviaqa_0.json
(Stored with Git LFS)
Binary file not shown.
BIN
predictions/public/qwen3-0-6b@v1.0.4/triviaqa_1.json
(Stored with Git LFS)
BIN
predictions/public/qwen3-0-6b@v1.0.4/triviaqa_1.json
(Stored with Git LFS)
Binary file not shown.
BIN
predictions/public/qwen3-0-6b@v1.0.4/triviaqa_2.json
(Stored with Git LFS)
BIN
predictions/public/qwen3-0-6b@v1.0.4/triviaqa_2.json
(Stored with Git LFS)
Binary file not shown.
BIN
predictions/public/qwen3-0-6b@v1.0.4/triviaqa_3.json
(Stored with Git LFS)
BIN
predictions/public/qwen3-0-6b@v1.0.4/triviaqa_3.json
(Stored with Git LFS)
Binary file not shown.
BIN
predictions/public/qwen3-0-6b@v1.0.4/triviaqa_4.json
(Stored with Git LFS)
BIN
predictions/public/qwen3-0-6b@v1.0.4/triviaqa_4.json
(Stored with Git LFS)
Binary file not shown.
BIN
results/public/qwen3-0-6b@v1.0.4/GaokaoBench_2010-2013_English_MCQs.json
(Stored with Git LFS)
BIN
results/public/qwen3-0-6b@v1.0.4/GaokaoBench_2010-2013_English_MCQs.json
(Stored with Git LFS)
Binary file not shown.
BIN
results/public/qwen3-0-6b@v1.0.4/lambada.json
(Stored with Git LFS)
BIN
results/public/qwen3-0-6b@v1.0.4/lambada.json
(Stored with Git LFS)
Binary file not shown.
BIN
results/public/qwen3-0-6b@v1.0.4/triviaqa.json
(Stored with Git LFS)
BIN
results/public/qwen3-0-6b@v1.0.4/triviaqa.json
(Stored with Git LFS)
Binary file not shown.
@ -1,87 +0,0 @@
|
||||
dataset,version,metric,mode,public/qwen3-0-6b@v1.0.4
|
||||
--------- 考试 Exam ---------,-,-,-,-
|
||||
ceval,-,-,-,-
|
||||
agieval,-,-,-,-
|
||||
mmlu,-,-,-,-
|
||||
GaokaoBench,-,-,-,-
|
||||
ARC-c,-,-,-,-
|
||||
--------- 语言 Language ---------,-,-,-,-
|
||||
WiC,-,-,-,-
|
||||
summedits,-,-,-,-
|
||||
chid-dev,-,-,-,-
|
||||
afqmc-dev,-,-,-,-
|
||||
bustm-dev,-,-,-,-
|
||||
cluewsc-dev,-,-,-,-
|
||||
WSC,-,-,-,-
|
||||
winogrande,-,-,-,-
|
||||
flores_100,-,-,-,-
|
||||
--------- 知识 Knowledge ---------,-,-,-,-
|
||||
BoolQ,-,-,-,-
|
||||
commonsense_qa,-,-,-,-
|
||||
nq,-,-,-,-
|
||||
triviaqa,2121ce,score,gen,0.01
|
||||
--------- 推理 Reasoning ---------,-,-,-,-
|
||||
cmnli,-,-,-,-
|
||||
ocnli,-,-,-,-
|
||||
ocnli_fc-dev,-,-,-,-
|
||||
AX_b,-,-,-,-
|
||||
AX_g,-,-,-,-
|
||||
CB,-,-,-,-
|
||||
RTE,-,-,-,-
|
||||
story_cloze,-,-,-,-
|
||||
COPA,-,-,-,-
|
||||
ReCoRD,-,-,-,-
|
||||
hellaswag,-,-,-,-
|
||||
piqa,-,-,-,-
|
||||
siqa,-,-,-,-
|
||||
strategyqa,-,-,-,-
|
||||
math,-,-,-,-
|
||||
gsm8k,-,-,-,-
|
||||
TheoremQA,-,-,-,-
|
||||
openai_humaneval,-,-,-,-
|
||||
mbpp,-,-,-,-
|
||||
cmmlu,-,-,-,-
|
||||
bbh,-,-,-,-
|
||||
--------- 理解 Understanding ---------,-,-,-,-
|
||||
C3,-,-,-,-
|
||||
CMRC_dev,-,-,-,-
|
||||
DRCD_dev,-,-,-,-
|
||||
MultiRC,-,-,-,-
|
||||
race-middle,-,-,-,-
|
||||
race-high,-,-,-,-
|
||||
openbookqa_fact,-,-,-,-
|
||||
csl_dev,-,-,-,-
|
||||
lcsts,-,-,-,-
|
||||
Xsum,-,-,-,-
|
||||
eprstmt-dev,-,-,-,-
|
||||
lambada,217e11,accuracy,gen,0.04
|
||||
tnews-dev,-,-,-,-
|
||||
--------- 安全 Safety ---------,-,-,-,-
|
||||
crows_pairs,-,-,-,-
|
||||
--------- LEval Exact Match (Acc) ---------,-,-,-,-
|
||||
LEval_coursera,-,-,-,-
|
||||
LEval_gsm100,-,-,-,-
|
||||
LEval_quality,-,-,-,-
|
||||
LEval_tpo,-,-,-,-
|
||||
LEval_topic_retrieval,-,-,-,-
|
||||
--------- LEval Gen (ROUGE) ---------,-,-,-,-
|
||||
LEval_financialqa,-,-,-,-
|
||||
LEval_gov_report_summ,-,-,-,-
|
||||
LEval_legal_contract_qa,-,-,-,-
|
||||
LEval_meeting_summ,-,-,-,-
|
||||
LEval_multidocqa,-,-,-,-
|
||||
LEval_narrativeqa,-,-,-,-
|
||||
LEval_nq,-,-,-,-
|
||||
LEval_news_summ,-,-,-,-
|
||||
LEval_paper_assistant,-,-,-,-
|
||||
LEval_patent_summ,-,-,-,-
|
||||
LEval_review_summ,-,-,-,-
|
||||
LEval_scientificqa,-,-,-,-
|
||||
LEval_tvshow_summ--------- 长文本 LongBench ---------,-,-,-,-
|
||||
longbench_lsht,-,-,-,-
|
||||
longbench_vcsum,-,-,-,-
|
||||
longbench_dureader,-,-,-,-
|
||||
longbench_multifieldqa_zh,-,-,-,-
|
||||
longbench_passage_retrieval_zh,-,-,-,-
|
||||
--------- 单选 自定义数据 ---------,-,-,-,-
|
||||
SageBench-exam,-,-,-,-
|
||||
|
@ -1,197 +0,0 @@
|
||||
20260121_112135
|
||||
tabulate format
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
dataset version metric mode public/qwen3-0-6b@v1.0.4
|
||||
----------------------------------------------------- --------- -------- ------ --------------------------
|
||||
--------- 考试 Exam --------- - - - -
|
||||
ceval - - - -
|
||||
agieval - - - -
|
||||
mmlu - - - -
|
||||
GaokaoBench - - - -
|
||||
ARC-c - - - -
|
||||
--------- 语言 Language --------- - - - -
|
||||
WiC - - - -
|
||||
summedits - - - -
|
||||
chid-dev - - - -
|
||||
afqmc-dev - - - -
|
||||
bustm-dev - - - -
|
||||
cluewsc-dev - - - -
|
||||
WSC - - - -
|
||||
winogrande - - - -
|
||||
flores_100 - - - -
|
||||
--------- 知识 Knowledge --------- - - - -
|
||||
BoolQ - - - -
|
||||
commonsense_qa - - - -
|
||||
nq - - - -
|
||||
triviaqa 2121ce score gen 0.01
|
||||
--------- 推理 Reasoning --------- - - - -
|
||||
cmnli - - - -
|
||||
ocnli - - - -
|
||||
ocnli_fc-dev - - - -
|
||||
AX_b - - - -
|
||||
AX_g - - - -
|
||||
CB - - - -
|
||||
RTE - - - -
|
||||
story_cloze - - - -
|
||||
COPA - - - -
|
||||
ReCoRD - - - -
|
||||
hellaswag - - - -
|
||||
piqa - - - -
|
||||
siqa - - - -
|
||||
strategyqa - - - -
|
||||
math - - - -
|
||||
gsm8k - - - -
|
||||
TheoremQA - - - -
|
||||
openai_humaneval - - - -
|
||||
mbpp - - - -
|
||||
cmmlu - - - -
|
||||
bbh - - - -
|
||||
--------- 理解 Understanding --------- - - - -
|
||||
C3 - - - -
|
||||
CMRC_dev - - - -
|
||||
DRCD_dev - - - -
|
||||
MultiRC - - - -
|
||||
race-middle - - - -
|
||||
race-high - - - -
|
||||
openbookqa_fact - - - -
|
||||
csl_dev - - - -
|
||||
lcsts - - - -
|
||||
Xsum - - - -
|
||||
eprstmt-dev - - - -
|
||||
lambada 217e11 accuracy gen 0.04
|
||||
tnews-dev - - - -
|
||||
--------- 安全 Safety --------- - - - -
|
||||
crows_pairs - - - -
|
||||
--------- LEval Exact Match (Acc) --------- - - - -
|
||||
LEval_coursera - - - -
|
||||
LEval_gsm100 - - - -
|
||||
LEval_quality - - - -
|
||||
LEval_tpo - - - -
|
||||
LEval_topic_retrieval - - - -
|
||||
--------- LEval Gen (ROUGE) --------- - - - -
|
||||
LEval_financialqa - - - -
|
||||
LEval_gov_report_summ - - - -
|
||||
LEval_legal_contract_qa - - - -
|
||||
LEval_meeting_summ - - - -
|
||||
LEval_multidocqa - - - -
|
||||
LEval_narrativeqa - - - -
|
||||
LEval_nq - - - -
|
||||
LEval_news_summ - - - -
|
||||
LEval_paper_assistant - - - -
|
||||
LEval_patent_summ - - - -
|
||||
LEval_review_summ - - - -
|
||||
LEval_scientificqa - - - -
|
||||
LEval_tvshow_summ--------- 长文本 LongBench --------- - - - -
|
||||
longbench_lsht - - - -
|
||||
longbench_vcsum - - - -
|
||||
longbench_dureader - - - -
|
||||
longbench_multifieldqa_zh - - - -
|
||||
longbench_passage_retrieval_zh - - - -
|
||||
--------- 单选 自定义数据 --------- - - - -
|
||||
SageBench-exam - - - -
|
||||
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
|
||||
|
||||
-------------------------------------------------------------------------------------------------------------------------------- THIS IS A DIVIDER --------------------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
csv format
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
dataset,version,metric,mode,public/qwen3-0-6b@v1.0.4
|
||||
--------- 考试 Exam ---------,-,-,-,-
|
||||
ceval,-,-,-,-
|
||||
agieval,-,-,-,-
|
||||
mmlu,-,-,-,-
|
||||
GaokaoBench,-,-,-,-
|
||||
ARC-c,-,-,-,-
|
||||
--------- 语言 Language ---------,-,-,-,-
|
||||
WiC,-,-,-,-
|
||||
summedits,-,-,-,-
|
||||
chid-dev,-,-,-,-
|
||||
afqmc-dev,-,-,-,-
|
||||
bustm-dev,-,-,-,-
|
||||
cluewsc-dev,-,-,-,-
|
||||
WSC,-,-,-,-
|
||||
winogrande,-,-,-,-
|
||||
flores_100,-,-,-,-
|
||||
--------- 知识 Knowledge ---------,-,-,-,-
|
||||
BoolQ,-,-,-,-
|
||||
commonsense_qa,-,-,-,-
|
||||
nq,-,-,-,-
|
||||
triviaqa,2121ce,score,gen,0.01
|
||||
--------- 推理 Reasoning ---------,-,-,-,-
|
||||
cmnli,-,-,-,-
|
||||
ocnli,-,-,-,-
|
||||
ocnli_fc-dev,-,-,-,-
|
||||
AX_b,-,-,-,-
|
||||
AX_g,-,-,-,-
|
||||
CB,-,-,-,-
|
||||
RTE,-,-,-,-
|
||||
story_cloze,-,-,-,-
|
||||
COPA,-,-,-,-
|
||||
ReCoRD,-,-,-,-
|
||||
hellaswag,-,-,-,-
|
||||
piqa,-,-,-,-
|
||||
siqa,-,-,-,-
|
||||
strategyqa,-,-,-,-
|
||||
math,-,-,-,-
|
||||
gsm8k,-,-,-,-
|
||||
TheoremQA,-,-,-,-
|
||||
openai_humaneval,-,-,-,-
|
||||
mbpp,-,-,-,-
|
||||
cmmlu,-,-,-,-
|
||||
bbh,-,-,-,-
|
||||
--------- 理解 Understanding ---------,-,-,-,-
|
||||
C3,-,-,-,-
|
||||
CMRC_dev,-,-,-,-
|
||||
DRCD_dev,-,-,-,-
|
||||
MultiRC,-,-,-,-
|
||||
race-middle,-,-,-,-
|
||||
race-high,-,-,-,-
|
||||
openbookqa_fact,-,-,-,-
|
||||
csl_dev,-,-,-,-
|
||||
lcsts,-,-,-,-
|
||||
Xsum,-,-,-,-
|
||||
eprstmt-dev,-,-,-,-
|
||||
lambada,217e11,accuracy,gen,0.04
|
||||
tnews-dev,-,-,-,-
|
||||
--------- 安全 Safety ---------,-,-,-,-
|
||||
crows_pairs,-,-,-,-
|
||||
--------- LEval Exact Match (Acc) ---------,-,-,-,-
|
||||
LEval_coursera,-,-,-,-
|
||||
LEval_gsm100,-,-,-,-
|
||||
LEval_quality,-,-,-,-
|
||||
LEval_tpo,-,-,-,-
|
||||
LEval_topic_retrieval,-,-,-,-
|
||||
--------- LEval Gen (ROUGE) ---------,-,-,-,-
|
||||
LEval_financialqa,-,-,-,-
|
||||
LEval_gov_report_summ,-,-,-,-
|
||||
LEval_legal_contract_qa,-,-,-,-
|
||||
LEval_meeting_summ,-,-,-,-
|
||||
LEval_multidocqa,-,-,-,-
|
||||
LEval_narrativeqa,-,-,-,-
|
||||
LEval_nq,-,-,-,-
|
||||
LEval_news_summ,-,-,-,-
|
||||
LEval_paper_assistant,-,-,-,-
|
||||
LEval_patent_summ,-,-,-,-
|
||||
LEval_review_summ,-,-,-,-
|
||||
LEval_scientificqa,-,-,-,-
|
||||
LEval_tvshow_summ--------- 长文本 LongBench ---------,-,-,-,-
|
||||
longbench_lsht,-,-,-,-
|
||||
longbench_vcsum,-,-,-,-
|
||||
longbench_dureader,-,-,-,-
|
||||
longbench_multifieldqa_zh,-,-,-,-
|
||||
longbench_passage_retrieval_zh,-,-,-,-
|
||||
--------- 单选 自定义数据 ---------,-,-,-,-
|
||||
SageBench-exam,-,-,-,-
|
||||
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
|
||||
|
||||
-------------------------------------------------------------------------------------------------------------------------------- THIS IS A DIVIDER --------------------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
raw format
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
-------------------------------
|
||||
Model: public/qwen3-0-6b@v1.0.4
|
||||
GaokaoBench_2010-2013_English_MCQs: {'score': 31.428571428571427}
|
||||
lambada: {'accuracy': 0.038812342324859306}
|
||||
triviaqa: {'score': 0.011316057485572028}
|
||||
GaokaoBench: {'error': "missing datasets: {'GaokaoBench_2010-2022_Math_II_MCQs', 'GaokaoBench_2010-2022_Chemistry_MCQs', 'GaokaoBench_2010-2022_Chinese_Modern_Lit', 'GaokaoBench_2010-2022_Geography_MCQs', 'GaokaoBench_2012-2022_English_Cloze_Test', 'GaokaoBench_2010-2022_History_MCQs', 'GaokaoBench_2010-2022_Biology_MCQs', 'GaokaoBench_2010-2022_English_Fill_in_Blanks', 'GaokaoBench_2010-2022_Chinese_Lang_and_Usage_MCQs', 'GaokaoBench_2010-2022_English_Reading_Comp', 'GaokaoBench_2010-2022_Physics_MCQs', 'GaokaoBench_2010-2022_Political_Science_MCQs', 'GaokaoBench_2010-2022_Math_I_MCQs'}"}
|
||||
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
|
||||
Loading…
Reference in New Issue
Block a user