
An Obscure RuntimeError for CUDA error: out of memory

Problem

Today when I was running PyTorch scripts, I met a strange problem:

a = torch.rand(2, 2).to('cuda:1')
......
torch.cuda.synchronize()

but it resulted in the following error:

  File "....../test.py", line 67, in <module>
    torch.cuda.synchronize()
  File "....../miniconda3/envs/py39/lib/python3.9/site-packages/torch/cuda/__init__.py", line 495, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: out of memory

But it’s clear that GPU1 has enough free memory (we only need to allocate 16 bytes!):

|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:1A:00.0 Off | N/A |
| 75% 73C P2 303W / 350W | 24222MiB / 24268MiB | 64% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:1B:00.0 Off | N/A |
| 90% 80C P2 328W / 350W | 15838MiB / 24268MiB | 92% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
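As an aside, the 16-byte figure quoted above is easy to verify: `torch.rand(2, 2)` produces a float32 tensor by default, and each float32 element occupies 4 bytes (this quick arithmetic sketch assumes the default dtype has not been changed):

```python
# torch.rand(2, 2) yields a float32 tensor by default:
# 2 x 2 = 4 elements, 4 bytes per float32 element.
num_elements = 2 * 2
bytes_per_element = 4  # size of one float32
print(num_elements * bytes_per_element)  # 16
```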

Normally, when tensor allocation fails, the error message looks like this:

CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 6.00 GiB total capacity; 4.54 GiB already allocated; 14.94 MiB free; 4.64 GiB reserved in total by PyTorch)

But our error message is much “simpler”. So what happened?

Possible Answer

This confused me for some time. According to this website:

When you initially do a CUDA call, it’ll create a cuda context and a THC context on the primary GPU (GPU0), and for that i think it needs 200 MB or so. That’s right at the edge of how much memory you have left.

Sure enough, in my case GPU0 had 24222MiB / 24268MiB of its memory occupied, so there was no room left for the context. This also explains why our error message is RuntimeError: CUDA error: out of memory, rather than the detailed message produced when tensor allocation fails.

Possible Solution

Set the CUDA_VISIBLE_DEVICES environment variable, so that the primary GPU becomes one with free memory instead of GPU0.

Method 1

In the starting python file:

# Do this before `import torch`
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # set to what you like, e.g., '1,2,3,4,5,6,7'

Method 2

In the shell:

# Do this before running python
export CUDA_VISIBLE_DEVICES=1  # set to what you like, e.g., '1,2,3,4,5,6,7'

And then, our program is ready to go.
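One subtlety worth remembering: CUDA_VISIBLE_DEVICES renumbers the visible devices starting from zero, so after setting it to '1', physical GPU1 is addressed as cuda:0 inside PyTorch. A small sketch of the remapping (visible_index is a hypothetical helper for illustration, not a PyTorch API):

```python
def visible_index(physical_gpu: int, cuda_visible_devices: str) -> int:
    """Return the logical index a physical GPU gets under the given
    CUDA_VISIBLE_DEVICES setting, or -1 if the GPU is hidden."""
    visible = [int(d) for d in cuda_visible_devices.split(',')]
    return visible.index(physical_gpu) if physical_gpu in visible else -1

# With CUDA_VISIBLE_DEVICES=1, physical GPU1 becomes cuda:0:
print(visible_index(1, '1'))  # 0
# ...and physical GPU0 is not visible to the program at all:
print(visible_index(0, '1'))  # -1
```

So after either method above, the tensor in our script should be moved to 'cuda:0', not 'cuda:1'.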