HuggingFace: ConnectionError when loading a dataset (no Google Colab needed)
weixin_37988564:
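Hi, a question: after creating new folders and sorting the data into training, test, and validation sets, how do I load a specific split inside a custom dataset class? `load_dataset(path, split)` accepts a split, but `load_from_disk(path)` can only load whatever is at the given path. Do I really need to write three dataset classes?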
[code=python]
import torch
from datasets import load_from_disk

class Dataset(torch.utils.data.Dataset):
    def __init__(self, split):
        # Raises TypeError: load_from_disk() has no `split` parameter
        self.dataset = load_from_disk('D:/Program Files (x86)/anaconda/envs/MyEnv/MyJupt/datasets/ChnSentiCorp', split=split)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, i):
        text = self.dataset[i]['text']
        label = self.dataset[i]['label']
        return text, label
[/code]
Error:
[code=python]
TypeError                                 Traceback (most recent call last)
Input In [3], in ()
----> 1 dataset = Dataset('train')
      2 len(dataset),dataset[0]

Input In [2], in Dataset.__init__(self, split)
      5 def __init__(self,split):
----> 6 self.dataset = load_from_disk('D:/Program Files (x86)/anaconda/envs/MyEnv/MyJupt/datasets/ChnSentiCorp',split=split)
TypeErr
[/code]
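A possible fix, sketched under a few assumptions: `load_from_disk` has no `split` parameter, which is what the TypeError is about. If the three splits were saved together as a `DatasetDict`, `load_from_disk(path)` returns that dict and a split is picked with `[split]`. If each split was saved to its own sub-folder, you can pass that sub-folder's path instead. Either way, one class that takes `split` as an argument is enough, so there is no need for three classes. The helper `load_split`, the class name `SentimentDataset`, and the `<root>/<split>` folder layout are hypothetical. The column names `text` and `label` are assumed to match ChnSentiCorp. To avoid a torch dependency, the class is a plain map-style dataset (`__len__` + `__getitem__`). Subclassing `torch.utils.data.Dataset` as in the question should work the same way.

[code=python]
import os

from datasets import DatasetDict, load_from_disk


def load_split(root, split):
    """Return one split ('train', 'validation', 'test', ...) saved under `root`."""
    split_dir = os.path.join(root, split)
    if os.path.isdir(split_dir):
        # A sub-folder named after the split exists (e.g. <root>/train):
        # load that folder on its own. DatasetDict.save_to_disk also writes
        # one sub-folder per split, so a saved DatasetDict works here too.
        return load_from_disk(split_dir)
    # Otherwise load what was saved at `root`. load_from_disk() has no
    # `split` argument; for a saved DatasetDict it returns the whole dict,
    # and the split is picked by indexing with its name.
    loaded = load_from_disk(root)
    if not isinstance(loaded, DatasetDict):
        raise ValueError(f"{root!r} holds a single Dataset, not split {split!r}")
    return loaded[split]


class SentimentDataset:
    """Map-style dataset (__len__ + __getitem__) for one split (hypothetical name)."""

    def __init__(self, root, split):
        self.dataset = load_split(root, split)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, i):
        row = self.dataset[i]  # a dict such as {'text': ..., 'label': ...}
        return row["text"], row["label"]
[/code]

With this, `SentimentDataset(root, 'train')`, `SentimentDataset(root, 'validation')`, and `SentimentDataset(root, 'test')` all come from the same class. Here `root` would be the folder from the question, e.g. `'D:/Program Files (x86)/anaconda/envs/MyEnv/MyJupt/datasets/ChnSentiCorp'`. Each instance should be accepted by `torch.utils.data.DataLoader` as a map-style dataset.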