PyTorch Geometric and CUDA

PyTorch Geometric (PyG) is an add-on library for developing graph neural networks using Python. It supports CUDA but you’ll have to make sure to install it correctly. Below is one error message I got after installing PyG:

from torch_geometric.data import Data
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
...

OSError: /anaconda3/lib/python3.7/site-packages/torch_sparse/_version_cuda.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSs

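An undefined-symbol error like this usually means torch_sparse was compiled against a different PyTorch build than the one installed, and a mismatched CUDA version is a common culprit. So, I checked it: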

import torch
print(torch.version.cuda, torch.__version__)
10.2 1.9.0
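
The same check can be wrapped in a small script that also covers CPU-only builds, where torch.version.cuda is None. This is only a sketch: the helper names cuda_tag and report_torch_build are made up for illustration, and the import is guarded so the script still runs when PyTorch is missing.

```python
def cuda_tag(cuda_version):
    """Turn a CUDA version such as "10.2" into a wheel tag such as "cu102".

    None (what torch.version.cuda holds on CPU-only builds) maps to "cpu".
    """
    if cuda_version is None:
        return "cpu"
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def report_torch_build():
    """Print the installed PyTorch version and the CUDA tag it was built for."""
    try:
        import torch  # imported lazily so a missing install is reported, not fatal
    except ImportError:
        print("PyTorch is not installed")
        return None
    tag = cuda_tag(torch.version.cuda)
    print(f"torch {torch.__version__} built for {tag}")
    return tag


if __name__ == "__main__":
    report_torch_build()
```

The resulting tag (for example cu102) is the same suffix that appears in the wheel URLs, which makes it easy to compare against the extensions you are about to install.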

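Running nvidia-smi reported CUDA version 11.2, which is the highest CUDA version the driver supports rather than the version PyTorch was built with. So my system had somehow ended up with mixed versions of CUDA. To fix the mess and get PyG working, I did the following: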

$ pip uninstall torch
$ pip install torch==1.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
$ pip install torch-scatter -f https://data.pyg.org/whl/torch-1.9.1+cu111.html
$ pip install torch-sparse -f https://data.pyg.org/whl/torch-1.9.1+cu111.html
$ pip install torch-geometric
$ apt-get install nvidia-modprobe

Note that no wheel is built for CUDA 11.2 (cu112), so I used the closest available version (cu111). Now PyG works! The nvidia-modprobe package, a utility that loads the NVIDIA kernel module and creates its device files, fixes “RuntimeError: CUDA unknown error – this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero,” which I got when two Python sessions were running and both tried to use CUDA.
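
Picking the closest wheel can be written down as a rule: take the newest CUDA tag that does not exceed what the driver reports. A rough sketch follows. closest_cuda_tag and wheel_index_url are hypothetical helpers, the list of available tags is an assumption the caller must supply, and the URL pattern is simply copied from the commands above.

```python
def tag_to_version(tag):
    """Convert a wheel tag like "cu111" to a comparable tuple like (11, 1)."""
    digits = tag[len("cu"):]
    # The last digit is the minor version; everything before it is the major.
    return int(digits[:-1]), int(digits[-1])


def closest_cuda_tag(driver_cuda, available_tags):
    """Return the newest tag in available_tags that is <= the driver's CUDA version.

    driver_cuda is the version string reported by nvidia-smi, e.g. "11.2".
    available_tags is supplied by the caller (an assumption, not looked up online)
    and must contain CUDA tags only, not "cpu".
    """
    major, minor = driver_cuda.split(".")[:2]
    limit = (int(major), int(minor))
    usable = [t for t in available_tags if tag_to_version(t) <= limit]
    if not usable:
        raise ValueError(f"no wheel tag is compatible with CUDA {driver_cuda}")
    return max(usable, key=tag_to_version)


def wheel_index_url(torch_version, tag):
    """Build the PyG wheel index URL in the form used by the pip commands above."""
    return f"https://data.pyg.org/whl/torch-{torch_version}+{tag}.html"


if __name__ == "__main__":
    tag = closest_cuda_tag("11.2", ["cu102", "cu111"])
    print(tag)
    print(wheel_index_url("1.9.1", tag))
```

Passing the tag list in explicitly keeps the helper honest: it never guesses which wheels exist, it only ranks the ones you have already confirmed.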

Update: further testing produced these errors:

RuntimeError: Detected that PyTorch and torch_cluster were compiled with different CUDA versions. PyTorch has CUDA version 11.1 and torch_cluster has CUDA version 10.2. Please reinstall the torch_cluster that matches your PyTorch install.

RuntimeError: Detected that PyTorch and torch_spline_conv were compiled with different CUDA versions. PyTorch has CUDA version 11.1 and torch_spline_conv has CUDA version 10.2. Please reinstall the torch_spline_conv that matches your PyTorch install.

The following commands fixed it:

$ pip install --upgrade pip
$ CUDA=cu111
$ TORCH=1.9.1
$ pip install torch-cluster==1.5.9 -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
$ pip install torch-spline-conv -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
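
Because the mismatch messages name the extension and both CUDA versions, the matching install command can be derived mechanically. The sketch below is illustrative only: parse_mismatch and reinstall_command are invented names, the regular expression is fitted to the two messages quoted above, and it assumes the pip package name is the module name with underscores replaced by hyphens (which holds for torch-cluster and torch-spline-conv).

```python
import re

# Fitted to the RuntimeError text quoted above; other wordings will not match.
MISMATCH = re.compile(
    r"PyTorch has CUDA version (?P<torch_cuda>\d+\.\d+) and "
    r"(?P<module>\w+) has CUDA version (?P<ext_cuda>\d+\.\d+)"
)


def parse_mismatch(message):
    """Return (module, torch_cuda, extension_cuda), or None if nothing matches."""
    match = MISMATCH.search(message)
    if match is None:
        return None
    return match.group("module"), match.group("torch_cuda"), match.group("ext_cuda")


def reinstall_command(message, torch_version):
    """Build a pip command that installs the extension for PyTorch's CUDA version."""
    parsed = parse_mismatch(message)
    if parsed is None:
        return None
    module, torch_cuda, _ = parsed
    package = module.replace("_", "-")        # assumption: torch_cluster -> torch-cluster
    tag = "cu" + torch_cuda.replace(".", "")  # "11.1" -> "cu111"
    url = f"https://data.pyg.org/whl/torch-{torch_version}+{tag}.html"
    return f"pip install {package} -f {url}"


if __name__ == "__main__":
    error = (
        "RuntimeError: Detected that PyTorch and torch_cluster were compiled with "
        "different CUDA versions. PyTorch has CUDA version 11.1 and torch_cluster "
        "has CUDA version 10.2."
    )
    print(reinstall_command(error, "1.9.1"))
```

The PyTorch release (1.9.1 here) is passed in by hand because the error message does not contain it.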