
Ray Docker doesn't recognise cuda gpu within container #45408

Open
stephano41 opened this issue May 17, 2024 · 1 comment
Labels: bug (Something that is supposed to be working; but isn't), core (Issues that should be addressed in Ray Core), P1 (Issue that should be fixed within a few weeks)

What happened + What you expected to happen

I am trying to run Ray with GPU support inside a custom Docker image, started via Docker Compose.

The Dockerfile I am using:

FROM rayproject/ray:latest-py39-cu121

WORKDIR /opt/project

USER root

RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential gcc git wget

CMD ["bash"]

The Docker Compose file I am using:

services:
    app:
        build:
            context: .
            dockerfile: Dockerfile
        
        container_name: test_ray
        image: test_ray
        volumes:
            - ./:/opt/project/
        tty: true
        stdin_open: true
        shm_size: 12gb
        runtime: nvidia
        environment:
          NVIDIA_VISIBLE_DEVICES: all
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: 1
                  capabilities: [gpu]

Within the container, nvidia-smi recognises the GPU. However, running ray.get_gpu_ids() returns an empty list.
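
A minimal sketch of this check inside the container (the ray.cluster_resources() print is an extra sanity check on top of the commands listed below, not part of my original session):

import ray

ray.init()

# Resources Ray detected on this node; a working setup should include a 'GPU' entry.
print(ray.cluster_resources())

# GPU IDs assigned to the calling process (the driver here).
print(ray.get_gpu_ids())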

I have tried the following base images with no luck:

rayproject/ray:latest
rayproject/ray:2.20.0.5708e7-py310-cu121
rayproject/ray-ml:latest

The commands I use:

docker compose build
docker compose run app bash
nvidia-smi    # the GPU is visible and the reported CUDA version is 12.2
python
import ray
print(ray.get_gpu_ids())

Versions / Dependencies

Docker
Ubuntu
NVIDIA CUDA 12.4

Reproduction script

docker compose build
docker compose run app bash
nvidia-smi    # the GPU is visible and the reported CUDA version is 12.2
python
import ray
print(ray.get_gpu_ids())
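
The same check can also be run as a single script inside the container (a sketch: the file name repro.py and the @ray.remote(num_gpus=1) task are illustrative additions, not part of my original report; the task form is where get_gpu_ids() would normally return a non-empty list):

# repro.py (illustrative name) -- run inside the container started by `docker compose run app bash`
import ray

ray.init()

# GPU IDs assigned to the driver process itself.
print("driver GPU ids:", ray.get_gpu_ids())

# A task that explicitly requests one GPU; Ray normally exports
# CUDA_VISIBLE_DEVICES for it, and get_gpu_ids() reports the assigned device.
@ray.remote(num_gpus=1)
def gpu_ids_in_task():
    return ray.get_gpu_ids()

# Note: if Ray detects no GPUs on the node, this task cannot be scheduled
# and the ray.get() call below will block.
print("task GPU ids:", ray.get(gpu_ids_in_task.remote()))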

Issue Severity

High: It blocks me from completing my task.

@stephano41 added the bug and triage labels on May 17, 2024
@anyscalesam added the core label on May 20, 2024
@rynewang (Contributor)

@kevin85421 would you mind taking a look at their yaml?

@rynewang added the P1 label and removed the triage label on May 20, 2024