init dataset
parent
5498471bbd
commit
9cbeacecf1
44
README.md
@@ -1,11 +1,39 @@
---
license: Apache License 2.0
displayName: SAMSum Corpus
labelTypes:
- Classification
license:
- CC BY-NC-ND 4.0
mediaTypes:
- Text
paperUrl: https://arxiv.org/pdf/1911.12237v2.pdf
publishDate: "2019"
publishUrl: https://github.com/huggingface/datasets/tree/master/datasets/samsum
publisher:
- Samsung R&D Institute Poland
tags:
- Text
taskTypes:
- Text Summarization/Simplication
- Federated Learning
- Abstractive Text Summarization

---
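For dataset file metadata and the data files themselves, see the "Dataset Files" page.

This dataset card uses the default template, and the dataset contributor has not provided a more detailed description. You can still download the dataset with the Git clone command below or with the ModelScope SDK.

#### How to download
:modelscope-code[]{type="sdk"}
:modelscope-code[]{type="git"}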
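# Dataset Introduction
## Overview
The SAMSum dataset contains about 16k messenger-like conversations with summaries. The conversations were created and written down by linguists fluent in English. The linguists were asked to write conversations similar to those they write every day, reflecting the proportion of topics in their real-life messenger conversations. Style and register vary: conversations can be informal, semi-formal, or formal, and they may contain slang, emoticons, and typos. The conversations were then annotated with summaries. Each summary was meant to be a concise, third-person account of what people talked about in the conversation. The SAMSum dataset was prepared by Samsung R&D Institute Poland and is distributed for research purposes (non-commercial license: CC BY-NC-ND 4.0).
## Citation
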
```
"@article{gliwa2019samsum,
  title={SAMSum corpus: A human-annotated dialogue dataset for abstractive summarization},
  author={Gliwa, Bogdan and Mochol, Iwona and Biesek, Maciej and Wawer, Aleksander},
  journal={arXiv preprint arXiv:1911.12237},
  year={2019}
}"
```


## Download dataset
:modelscope-code[]{type="git"}
18
metafile.yaml
Normal file
@@ -0,0 +1,18 @@
displayName: SAMSum Corpus
labelTypes:
- Classification
license:
- CC BY-NC-ND 4.0
mediaTypes:
- Text
paperUrl: https://arxiv.org/pdf/1911.12237v2.pdf
publishDate: "2019"
publishUrl: https://github.com/huggingface/datasets/tree/master/datasets/samsum
publisher:
- Samsung R&D Institute Poland
tags:
- Text
taskTypes:
- Text Summarization/Simplication
- Federated Learning
- Abstractive Text Summarization
9
quickstart.md
Normal file
@@ -0,0 +1,9 @@

## SDK usage
```python
from modelscope.msdatasets import MsDataset

dataset = MsDataset.load("OpenDataLab/SAMSum_Corpus")

# Note: If the SDK is not available, please use git to download the dataset.
```
BIN
raw/corpus.7z
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
sample/other/test.json
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
sample/other/train.json
(Stored with Git LFS)
Normal file
Binary file not shown.
BIN
sample/other/val.json
(Stored with Git LFS)
Normal file
Binary file not shown.
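The sample files listed above are stored with Git LFS, so their contents are not visible in this commit. As a rough sketch only, the snippet below shows how one of them could be inspected once fetched. It assumes each file is a JSON array of objects with `id`, `dialogue`, and `summary` keys, which is how SAMSum is commonly distributed but is not confirmed by this commit. The helper names `load_records` and `turn_count` and the demo record are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Assumed schema (not confirmed by this commit): a JSON array of objects
# that each carry "id", "dialogue" and "summary" keys.
REQUIRED_KEYS = {"id", "dialogue", "summary"}


def load_records(path):
    """Read a JSON file holding a list of dialogue/summary records."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    for record in records:
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"record {record.get('id')!r} lacks {sorted(missing)}")
    return records


def turn_count(record):
    """Count non-empty dialogue lines, treating each line as one message."""
    return sum(1 for line in record["dialogue"].splitlines() if line.strip())


# Hypothetical stand-in for sample/other/train.json, written to a temp file
# so the sketch runs without the real data.
example = [
    {
        "id": "demo-1",
        "dialogue": "Anna: are we still on for lunch?\r\nBen: yes, 12:30 :)",
        "summary": "Anna and Ben will have lunch at 12:30.",
    }
]
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "train.json"
    sample_path.write_text(json.dumps(example), encoding="utf-8")
    records = load_records(sample_path)

for record in records:
    print(record["id"], turn_count(record), len(record["summary"].split()))
```

To try this on the real data, point `load_records` at a fetched copy of `sample/other/train.json`. If the actual files use a different layout, the schema check raises a `ValueError` rather than failing silently.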