
Ray Docker doesn't recognise cuda gpu within container #45408

Open
stephano41 opened this issue May 17, 2024 · 1 comment
Labels: bug (Something that is supposed to be working; but isn't), core (Issues that should be addressed in Ray Core), P1 (Issue that should be fixed within a few weeks)

What happened + What you expected to happen

I am trying to run Ray with GPU support inside a custom Docker image, started via Docker Compose.

The Dockerfile I am using:

FROM rayproject/ray:latest-py39-cu121

WORKDIR /opt/project

USER root

RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential gcc git wget

CMD ["bash"]

The Docker Compose file I am using:

services:
    app:
        build:
            context: .
            dockerfile: Dockerfile
        
        container_name: test_ray
        image: test_ray
        volumes:
            - ./:/opt/project/
        tty: true
        stdin_open: true
        shm_size: 12gb
        runtime: nvidia
        environment:
          NVIDIA_VISIBLE_DEVICES: all
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: 1
                  capabilities: [gpu]

Within the container, nvidia-smi recognises the GPU. However, running ray.get_gpu_ids() returns an empty list.
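
A minimal sketch of this check inside the container (the ray.cluster_resources() print is an extra sanity check on top of the commands listed below, not part of my original session):

import ray

ray.init()

# Resources Ray detected on this node; a working setup should include a 'GPU' entry.
print(ray.cluster_resources())

# GPU IDs assigned to the calling process (the driver here).
print(ray.get_gpu_ids())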

I have tried the following base images with no luck:

rayproject/ray:latest
rayproject/ray:2.20.0.5708e7-py310-cu121
rayproject/ray-ml:latest

The commands I use:

docker compose build
docker compose run app bash
nvidia-smi    # the GPU is visible and the reported CUDA version is 12.2
python
import ray
print(ray.get_gpu_ids())

Versions / Dependencies

Docker
Ubuntu
NVIDIA CUDA 12.4

Reproduction script

docker compose build
docker compose run app bash
nvidia-smi    # the GPU is visible and the reported CUDA version is 12.2
python
import ray
print(ray.get_gpu_ids())
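
The same check can also be run as a single script inside the container (a sketch: the file name repro.py and the @ray.remote(num_gpus=1) task are illustrative additions, not part of my original report; the task form is where get_gpu_ids() would normally return a non-empty list):

# repro.py (illustrative name) -- run inside the container started by `docker compose run app bash`
import ray

ray.init()

# GPU IDs assigned to the driver process itself.
print("driver GPU ids:", ray.get_gpu_ids())

# A task that explicitly requests one GPU; Ray normally exports
# CUDA_VISIBLE_DEVICES for it, and get_gpu_ids() reports the assigned device.
@ray.remote(num_gpus=1)
def gpu_ids_in_task():
    return ray.get_gpu_ids()

# Note: if Ray detects no GPUs on the node, this task cannot be scheduled
# and the ray.get() call below will block.
print("task GPU ids:", ray.get(gpu_ids_in_task.remote()))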

Issue Severity

High: It blocks me from completing my task.

@stephano41 added the bug and triage labels on May 17, 2024
@anyscalesam added the core label on May 20, 2024
@rynewang (Contributor)

@kevin85421 would you mind taking a look at their yaml?

@rynewang added the P1 label and removed the triage label on May 20, 2024