
PyTorch C++ Packages: CUDA Installation Mismatch Problems

When installing packages that cannot be installed through pip, especially in deep learning, we may need to build them locally with python setup.py build install. Some typical problems can occur at this stage. For example, a CUDA version mismatch error looks like this:

[Screenshot: the CUDA version mismatch error message]

This error can have many causes. Here I only describe my own situation and how I solved it.

Why does this happen?

Some packages must be compiled with the local CUDA compiler and installed locally. These packages then work together with the PyTorch installed in the conda environment, so they must be compiled with the same CUDA version that PyTorch was compiled with (or at least the same major version, e.g., CUDA 11.x).
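The rule of thumb above can be written as a tiny check. This is a minimal sketch assuming version strings of the form "MAJOR.MINOR" (the shape returned by torch.version.cuda); the helper name cuda_major_matches is hypothetical, and it encodes this post's observation (same major version is enough), not an official compatibility guarantee.

```python
def cuda_major_matches(torch_cuda: str, nvcc_cuda: str) -> bool:
    """Return True if both CUDA version strings share the same major version.

    Hypothetical helper encoding the rule of thumb from this post:
    the compiler building the extension and the CUDA that PyTorch was
    built with should at least agree on the major version.
    """
    # "11.3" -> 11, "10.2" -> 10
    torch_major = int(torch_cuda.split(".")[0])
    nvcc_major = int(nvcc_cuda.split(".")[0])
    return torch_major == nvcc_major


if __name__ == "__main__":
    print(cuda_major_matches("11.3", "10.2"))  # False: the failing setup in this post
    print(cuda_major_matches("11.3", "11.1"))  # True: the setup that worked
```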

  • First, inspect the CUDA version that the conda environment's PyTorch was built with:
>>> import torch
>>> torch.version.cuda
'11.3'

This means that our PyTorch was compiled with CUDA 11.3 (the same version as in the error message above).

  • Then, inspect the version of the system's CUDA compiler (nvcc):
nvcc -V
# OUTPUT
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

This means that the system's current CUDA compiler is version 10.2 (also the same as in the error message above).

Therefore, the compiler that is about to build the package (CUDA 10.2) does not match the compiler that built PyTorch (CUDA 11.3), and the error is reported.
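To compare the two versions in a script rather than by eye, the nvcc -V text can be parsed. This is a minimal sketch assuming the output contains a "release X.Y" line like the ones shown in this post; parse_nvcc_release is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches the "release X.Y" part of a line such as
# "Cuda compilation tools, release 10.2, V10.2.89"
_RELEASE_RE = re.compile(r"release (\d+)\.(\d+)")


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract 'MAJOR.MINOR' from the text printed by `nvcc -V`, or None."""
    match = _RELEASE_RE.search(nvcc_output)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


if __name__ == "__main__":
    # The nvcc -V output from the system in this post.
    sample = (
        "nvcc: NVIDIA (R) Cuda compiler driver\n"
        "Copyright (c) 2005-2019 NVIDIA Corporation\n"
        "Built on Wed_Oct_23_19:24:38_PDT_2019\n"
        "Cuda compilation tools, release 10.2, V10.2.89\n"
    )
    print(parse_nvcc_release(sample))  # 10.2
```

On a real machine the text can come from subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout, and the result can then be compared with torch.version.cuda.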

How to solve it?

The easiest way to solve this problem is to install a CUDA toolkit whose version corresponds to PyTorch's. (In my test, I did not need exactly 11.3; version 11.1 was enough.)

  1. Install the CUDA toolkit of the required version. Many installation tutorials can be found online, so this step is skipped here.
  2. Export the new paths in ~/.bashrc by adding the following lines at the end of the file:
export PATH=/usr/local/cuda-<YOUR VERSION>/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-<YOUR VERSION>/lib64:${LD_LIBRARY_PATH}

(Remember to replace <YOUR VERSION> above with your CUDA version!)

  3. Open a new terminal and run:
nvcc -V
# OUTPUT
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

  4. It should now work! Go to the package's source directory, activate the target conda environment, and install.
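If you would rather not edit ~/.bashrc, the same two exports can be applied to a single build process instead. This is a minimal sketch assuming the toolkit lives under /usr/local/cuda-<version> as in step 2; cuda_build_env and build_with_cuda are hypothetical helper names. It also sets CUDA_HOME, which torch.utils.cpp_extension checks first when locating the CUDA toolkit.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def cuda_build_env(version: str,
                   base_env: Optional[Mapping[str, str]] = None,
                   root: str = "/usr/local") -> Dict[str, str]:
    """Return a copy of base_env with the given CUDA toolkit put first.

    Mirrors the two ~/.bashrc exports from step 2, but only for one process.
    Assumes the toolkit is installed at <root>/cuda-<version>.
    """
    env = dict(os.environ if base_env is None else base_env)
    cuda_home = os.path.join(root, f"cuda-{version}")
    bin_dir = os.path.join(cuda_home, "bin")
    lib_dir = os.path.join(cuda_home, "lib64")

    # Prepend so this nvcc and these libraries win over any older toolkit;
    # skip empty parts to avoid a stray separator in the variable.
    env["PATH"] = os.pathsep.join(p for p in (bin_dir, env.get("PATH", "")) if p)
    env["LD_LIBRARY_PATH"] = os.pathsep.join(
        p for p in (lib_dir, env.get("LD_LIBRARY_PATH", "")) if p)
    # torch.utils.cpp_extension looks at CUDA_HOME first to find the toolkit.
    env["CUDA_HOME"] = cuda_home
    return env


def build_with_cuda(version: str, source_dir: str) -> None:
    """Run `python setup.py build install` in source_dir with the chosen CUDA.

    sys.executable is the Python of the currently active (conda) environment.
    """
    subprocess.run(
        [sys.executable, "setup.py", "build", "install"],
        cwd=source_dir,
        env=cuda_build_env(version),
        check=True,
    )
```

For example, build_with_cuda("11.1", ".") run from the package directory, inside the target conda environment, builds against /usr/local/cuda-11.1 without touching the shell configuration.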

Common techniques for debugging

  • Inspecting PyTorch's CUDA version:
>>> import torch
>>> torch.version.cuda
'11.3'
  • Inspecting the system’s CUDA compiler version:
nvcc -V

or check the CUDA path that PyTorch's extension builder (torch.utils.cpp_extension) will use:

>>> from torch.utils.cpp_extension import CUDA_HOME
>>> CUDA_HOME
'/usr/local/cuda-11.1'
  • Changing the $PATH variable so that the new CUDA can be found:

Add the following lines at the end of ~/.bashrc:

export PATH=/usr/local/cuda-11.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64:$LD_LIBRARY_PATH

REMEMBER to change the CUDA version to your own version.

  • Deleting the cached build data:

In some situations, for example after switching to a different compiler, the package must be rebuilt from scratch.

Remove any build, cache, dist, or temp directories, e.g., the build, DCNv3.egg-info, and dist directories shown below.

[Screenshot: the package directory, containing the build, DCNv3.egg-info, and dist directories]

(But be careful not to remove the source code!)
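The cleanup step can also be scripted. This is a minimal sketch, not part of any package's tooling: clean_build_artifacts is a hypothetical helper, and its patterns (build, dist, *.egg-info) are assumptions based on the directory names mentioned above, so add your package's own cache or temp directories if it has any. It defaults to a dry run, so you can review the list before anything is deleted.

```python
import glob
import os
import shutil
import tempfile
from typing import List

# Build output commonly left behind by `python setup.py build install`.
# "*.egg-info" matches directories such as DCNv3.egg-info.
ARTIFACT_PATTERNS = ("build", "dist", "*.egg-info")


def clean_build_artifacts(package_dir: str, dry_run: bool = True) -> List[str]:
    """Delete (or, with dry_run=True, only list) cached build directories.

    Only top-level directories whose names match ARTIFACT_PATTERNS are
    touched; everything else in package_dir, including the source code,
    is left alone.
    """
    matched = []
    for pattern in ARTIFACT_PATTERNS:
        for path in sorted(glob.glob(os.path.join(package_dir, pattern))):
            if os.path.isdir(path):
                matched.append(path)
                if not dry_run:
                    shutil.rmtree(path)
    return matched


if __name__ == "__main__":
    # Demo on a throwaway directory; "src" stands in for the source code.
    with tempfile.TemporaryDirectory() as pkg:
        for name in ("build", "dist", "DCNv3.egg-info", "src"):
            os.makedirs(os.path.join(pkg, name))
        print([os.path.basename(p) for p in clean_build_artifacts(pkg)])
        # ['build', 'dist', 'DCNv3.egg-info']
```

Once the listed paths look right, call it again with dry_run=False to actually remove them.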