
Sharing tensors

Jan 30, 2025

In PyTorch’s documentation, a particular tutorial page mentions the following:

PyTorch creates a tensor of the same shape and containing the same data as the NumPy array, going so far as to keep NumPy’s default 64-bit float data type.

(…)

It is important to know that these converted objects are using the same underlying memory as their source objects, meaning that changes to one are reflected in the other.

Using the “same underlying memory” sounds pretty neat, but how does it actually work?
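The data type part of that claim is easy to check before we dig into the memory sharing itself (a minimal sketch; the values are arbitrary):

from numpy import ones
from torch import from_numpy

numpy_array = ones(3)                  # NumPy defaults to 64-bit floats
pytorch_tensor = from_numpy(numpy_array)

print(numpy_array.dtype)               # float64
print(pytorch_tensor.dtype)            # torch.float64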

PyArrayObject a.k.a. numpy.ndarray

Before we demonstrate PyTorch sharing its memory with NumPy, let’s take a brief look at the underlying C struct that backs NumPy’s ndarray:

typedef struct {
    PyObject_HEAD;
    char* data;
    int nd;
    npy_intp* dimensions;
    npy_intp* strides;
    PyObject* base;
    PyArray_Descr* descr;
    int flags;
    PyObject* weakreflist;
} PyArrayObject;

The most important members of the struct are:

data: a raw char pointer to the first byte of the array’s buffer
nd: the number of dimensions
dimensions: the size of the array along each dimension
strides: how many bytes to skip to reach the next element along each dimension
descr: a pointer to the data-type descriptor, i.e. the dtype
base: the object this array borrows its buffer from, when it is a view

I love how data is declared as a plain char* (a char is exactly one byte, no matter the system) and not some custom typedef.
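Most of these members surface on the Python side as ndarray attributes, which makes them easy to inspect. Here’s a minimal sketch mapping struct members to their Python counterparts:

from numpy import array

numpy_array = array([[1, 2, 3], [4, 5, 6]], dtype='int32')

print(hex(numpy_array.ctypes.data))  # data: the raw buffer address
print(numpy_array.ndim)              # nd: 2
print(numpy_array.shape)             # dimensions: (2, 3)
print(numpy_array.strides)           # strides: (12, 4), in bytes
print(numpy_array.dtype)             # descr: int32
print(numpy_array.base)              # base: None, the array owns its buffer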

Sharing a tensor from NumPy to PyTorch

Sharing memory here means allocating the tensor once and manipulating it both as a PyTorch tensor and as a NumPy array, within the same program, without reallocating or copying anything.

Here’s an example of having both ndarray and tensor pointing to the same block of memory:

from numpy import array
from torch import from_numpy, set_default_device

set_default_device('cpu')

numpy_array = array([1, 2, 3], dtype='int32')
pytorch_tensor = from_numpy(numpy_array)

print('NumPy array:', numpy_array)
print('PyTorch tensor:', pytorch_tensor)
print()

numpy_address = numpy_array.__array_interface__['data'][0]
pytorch_address = pytorch_tensor.data_ptr()

print('Memory address of NumPy array:', hex(numpy_address))
print('Memory address of PyTorch tensor:', hex(pytorch_address))
print()

numpy_array[0] = 4
numpy_array[1] = 5
numpy_array[2] = 6

print('Updated NumPy array:', numpy_array)
print('Updated PyTorch tensor (reflecting NumPy change):', pytorch_tensor)
print()

pytorch_tensor[0] = 7
pytorch_tensor[1] = 8
pytorch_tensor[2] = 9

print('Updated PyTorch tensor:', pytorch_tensor)
print('Updated NumPy array (reflecting PyTorch change):', numpy_array)
print()

Output:

NumPy array: [1 2 3]
PyTorch tensor: tensor([1, 2, 3], dtype=torch.int32)

Memory address of NumPy array: 0x10d27290
Memory address of PyTorch tensor: 0x10d27290

Updated NumPy array: [4 5 6]
Updated PyTorch tensor (reflecting NumPy change): tensor([4, 5, 6], dtype=torch.int32)

Updated PyTorch tensor: tensor([7, 8, 9], dtype=torch.int32)
Updated NumPy array (reflecting PyTorch change): [7 8 9]

In this example we are sharing from NumPy to PyTorch with torch.from_numpy(). You can share the other way around using the tensor.numpy() method, which works similarly.
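Here’s the opposite direction, a minimal sketch starting from a PyTorch tensor and exposing it as a NumPy array:

from torch import int32, tensor

pytorch_tensor = tensor([1, 2, 3], dtype=int32)
numpy_array = pytorch_tensor.numpy()  # same buffer, no copy

pytorch_tensor[0] = 7
print(numpy_array)                    # [7 2 3], the change shows up in NumPy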

What enables this is surprisingly simple. No fancy IPC (Inter-Process Communication) is required: since we are within the same process, you can pretty much pass the char* data pointer around and cast it as needed. In other words, both libraries’ custom objects point to the same data in memory.
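You can see the same idea with a plain Python buffer: anything exposing the buffer protocol can be wrapped by NumPy without a copy, and the resulting array handed to PyTorch. A minimal sketch, where the bytearray stands in for “some memory we already own”:

from numpy import frombuffer, int32
from torch import from_numpy

raw = bytearray(3 * 4)                      # 12 writable bytes we own
numpy_array = frombuffer(raw, dtype=int32)  # wraps raw, no copy
pytorch_tensor = from_numpy(numpy_array)    # wraps numpy_array, no copy

pytorch_tensor[0] = 42
print(raw)                                  # the raw bytes now contain 42 (b'*' on little-endian)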

Sharing our own array from C code

Since the way the data is laid out in memory is so simple, we can run a little experiment: create the array ourselves and then pass it forward to NumPy or PyTorch.

Let’s create our array in C and expose it as a shared memory object within the system (we’ll use it in different processes!):

#include <fcntl.h>    // O_CREAT, O_RDWR
#include <stdio.h>    // getchar, perror, printf
#include <sys/mman.h> // mmap, munmap, shm_open, shm_unlink
#include <unistd.h>   // ftruncate

int main() {
    // the memory object can be found at /dev/shm/shared_array
    const char* name = "shared_array";

    // enough memory to hold three integers
    size_t array_size = 3 * sizeof(int);

    // 0666 is the equivalent of the "rw-rw-rw-" unix permissions
    int file_descriptor = shm_open(name, O_CREAT | O_RDWR, 0666);
    if (file_descriptor == -1) {
        perror("shm_open");
        return 1;
    }

    // truncates the object to exactly the array size
    if (ftruncate(file_descriptor, array_size) == -1) {
        perror("ftruncate");
        return 1;
    }

    // map the shared memory object into a local address space
    void* shared_memory = mmap(0, array_size, PROT_READ | PROT_WRITE, MAP_SHARED, file_descriptor, 0);
    if (shared_memory == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    // cast the shared memory, then assign 1, 2, 3 as initial values
    int* array = (int*)shared_memory;
    array[0] = 1;
    array[1] = 2;
    array[2] = 3;

    printf("Shared memory initial values: %i %i %i\n\n", array[0], array[1], array[2]);
    printf("Press enter to exit and release shared memory...\n");
    getchar();
    printf("Shared memory current values: %i %i %i\n", array[0], array[1], array[2]);

    // cleanup
    munmap(shared_memory, array_size);
    shm_unlink(name);

    return 0;
}

Compile and run it, and you’ll get the following (don’t hit enter just yet):

$ gcc shared_by_c.c -o shared_by_c
$ ./shared_by_c
Shared memory initial values: 1 2 3

Press enter to exit and release shared memory...

Now, while the C program is paused, run the following Python script, which changes the object live from a different process (using both NumPy and PyTorch):

from multiprocessing.resource_tracker import unregister
from multiprocessing.shared_memory import SharedMemory

from numpy import int32, ndarray
from torch import from_numpy, set_default_device

set_default_device('cpu')

# opening the shared memory object
shared_memory = SharedMemory(name='shared_array', create=False)

# prevent the resource tracker from deleting the shared memory object on exit
unregister(shared_memory._name, 'shared_memory')

# loading as numpy array and changing the values
numpy_array = ndarray((3,), dtype=int32, buffer=shared_memory.buf)
print('NumPy array from shared memory:', numpy_array)
numpy_array[0] = 4
numpy_array[1] = 5
numpy_array[2] = 6
print('NumPy array from shared memory changed to:', numpy_array)

print()

# loading as pytorch tensor and changing the values
pytorch_tensor = from_numpy(numpy_array)
print('PyTorch tensor from shared memory:', pytorch_tensor)
pytorch_tensor[0] = 7
pytorch_tensor[1] = 8
pytorch_tensor[2] = 9
print('Updated PyTorch tensor:', pytorch_tensor)

shared_memory.close()

The output:

$ python shared_by_c.py
NumPy array from shared memory: [1 2 3]
NumPy array from shared memory changed to: [4 5 6]

PyTorch tensor from shared memory: tensor([4, 5, 6], dtype=torch.int32)
Updated PyTorch tensor: tensor([7, 8, 9], dtype=torch.int32)

Back in the other terminal, where the C program is still paused, hit enter to see that the values have changed:

Shared memory initial values: 1 2 3

Press enter to exit and release shared memory...

Shared memory current values: 7 8 9

RAM, not VRAM

Note that everything we did in this article is stored in RAM. NumPy does not have GPU support.

I believe the same can be achieved in VRAM by using PyTorch with CUDA support and replacing NumPy with CuPy, but I’ll defer confirming this to a future article.

Sources