CUDA error (3): initialization error (multiprocessing) · Issue #2517 · pytorch/pytorch · GitHub

Hello, I ran into the same issue using Python 3.7.1 and PyTorch 1.0.0. Error: RuntimeError: CUDA error: initialization error.

Here is my setup:

1. I created a class that inherits from torch.utils.data.Dataset to build my own dataset. Inside this class, I do some preprocessing on the samples on the GPU. The preprocessing is performed by a small neural network that is instantiated inside the class and sent to the GPU. torch.cuda.is_available() is called inside the class. The class gets the device, self.DEVICE = torch.device(device), and keeps it for future use (to send the samples to be processed to the GPU). The class was tested on its own and works fine. The issue starts when using this class with torch.utils.data.DataLoader (see (2)).

2. My dataset class is instantiated, gets the device, creates the model that does the preprocessing, and does some preprocessing on the validation set samples. It works fine. Then the PyTorch data loader is created. No issue until now. The error is raised when I start looping over the samples: for i, (data, labels) in enumerate(train_loader): pass
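For context, here is a minimal sketch of the setup described above; the names Preprocessor and GPUPreprocessDataset and the random data are placeholders of mine, not the original code:

```python
import torch
from torch.utils.data import Dataset, DataLoader


class Preprocessor(torch.nn.Module):
    """Stand-in for the small network that preprocesses samples on the GPU."""
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)


class GPUPreprocessDataset(Dataset):
    """Dataset that preprocesses each sample on the GPU inside __getitem__."""
    def __init__(self, samples, labels, device="cuda"):
        assert torch.cuda.is_available()           # CUDA is initialized here, in the parent process
        self.DEVICE = torch.device(device)         # device kept for future use
        self.net = Preprocessor().to(self.DEVICE)  # preprocessing network lives on the GPU
        self.samples, self.labels = samples, labels

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, i):
        x = self.samples[i].to(self.DEVICE)        # <-- "CUDA error: initialization error" is raised here
        with torch.no_grad():
            x = self.net(x.unsqueeze(0)).squeeze(0)
        return x.cpu(), self.labels[i]


if __name__ == "__main__":
    train_set = GPUPreprocessDataset(torch.randn(64, 3, 8, 8).unbind(0),
                                     torch.zeros(64, dtype=torch.long))
    # With the default fork start method, the forked workers inherit the parent's CUDA state.
    train_loader = DataLoader(train_set, batch_size=8, num_workers=2)
    for i, (data, labels) in enumerate(train_loader):
        pass
```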

--> It raises the error at the moment my dataset class tries to send the sample to its device:

x.to(self.DEVICE)
RuntimeError: CUDA error: initialization error

I have read all the above comments, and other forums (1, 2, 3). I tried removing torch.cuda.is_available() from my dataset class to avoid CUDA initialization, and using torch.multiprocessing.set_start_method("spawn"), but it didn't help (though I may be missing something). However, I think CUDA needs to be initialized in that class, because it is the class that starts using it.

My dataset class is the first class to use CUDA. Somehow, the data loader may have re-initialized CUDA and messed it up for the dataset class. I am not sure whether the dataset class should get the device a second time in case CUDA has been reinitialized.

Note: I do not explicitly use torch.multiprocessing anywhere in the code. I do not modify torch.manual_seed for now. I have only one GPU on the computer, so it is the same device whenever the GPU is used.

Not really an expert in CUDA. Any suggestions? Thank you!

Updates:

(P.S.) The code is split into many files: one file is the main entry point, and the dataset class is in a different file than the main.

1. After adding torch.multiprocessing.set_start_method('spawn', force=True) to the top of the main file, wrapping the main code in if __name__ == "__main__": (otherwise the main entry calls itself, i.e. the main code executes twice, which is not expected), and setting num_workers=1 for the data loader (i.e., no multiprocessing in the data loader), things seem to work as expected (see the sketch after this list).

2. However, once num_workers > 1, things go south when reaching for i, (data, labels) in enumerate(train_loader): pass. At that point, nvidia-smi starts showing new processes (I suppose each one corresponds to one worker, the result of the forking in the data loader), and GPU memory use increases (each process takes up to 450 MiB). On a GPU with small memory, it runs out of memory quickly. On a GPU with large memory, after a while (creating the subprocesses is extremely slow), things work fine, but the run ends with the warning: python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 1 leaked semaphores to clean up at shutdown.

3. I am not really sure what happens when the data loader starts forking (here). I assume the whole process is duplicated, including the dataset class (which uses CUDA) and all its belongings, including the network that performs the preprocessing. I am not sure if this is a dead end (3).

4. One option is to switch to the CPU. Forking seems way faster.

5. There is some work that uses the GPU for data preprocessing (here).
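To make (1) concrete, here is a rough sketch of the layout that works; main.py and my_dataset are placeholder names tied to the earlier snippet, not my actual files:

```python
# main.py -- illustrative sketch of the configuration described in (1) above
import torch
import torch.multiprocessing
from torch.utils.data import DataLoader

from my_dataset import GPUPreprocessDataset   # hypothetical module holding the Dataset class

# At the top of the main entry file; force=True so the spawned children, which
# re-import this module, do not complain that the start method is already set.
torch.multiprocessing.set_start_method("spawn", force=True)


def main():
    train_set = GPUPreprocessDataset(torch.randn(64, 3, 8, 8).unbind(0),
                                     torch.zeros(64, dtype=torch.long))
    # A single spawned worker keeps startup time and per-process GPU memory manageable.
    train_loader = DataLoader(train_set, batch_size=8, num_workers=1)
    for i, (data, labels) in enumerate(train_loader):
        pass


if __name__ == "__main__":   # required with "spawn"; without the guard the main module runs itself twice
    main()
```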

Still looking for a solution.

Updates:

Currently, I use only one worker in the data loader (which reduces both the time spent creating the worker and the GPU memory usage). This seems practical. The worker does the preprocessing on the GPU.

**How to hide the warning `python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 1 leaked semaphores to clean up at shutdown`?** See here.
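I am not sure this is exactly what the linked answer suggests, but one workaround that can silence it is to install the warning filter through the PYTHONWARNINGS environment variable before any multiprocessing starts, so the child processes inherit it:

```python
import os

# Set this before the DataLoader workers (and the multiprocessing semaphore
# tracker process, which actually emits the warning) are started, so that the
# child processes inherit the filter from their environment.
os.environ["PYTHONWARNINGS"] = "ignore:semaphore_tracker:UserWarning"
```

Equivalently, the variable can be set when launching the script, e.g. `PYTHONWARNINGS="ignore:semaphore_tracker:UserWarning" python main.py`.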


