Is the GPU working? #4531
@15731807423 what's the output of ollama ps?
@pdevine
@15731807423 looks like 70b is being partially offloaded, and 8b is fully running on the GPU. Here's my ollama ps output on the 4090:
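(A hypothetical ollama ps listing for this situation; the IDs and sizes are illustrative, not the actual output. The PROCESSOR column is what shows how much of the model is offloaded to the GPU:)

NAME             ID      SIZE      PROCESSOR          UNTIL
llama3:70b       …       42 GB     42%/58% CPU/GPU    4 minutes from now
llama3:latest    …       5.4 GB    100% GPU           4 minutes from now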
@pdevine

(base) PS C:\Windows\System32> ollama run llama3
total duration: 5.8423836s

(base) PS C:\Windows\System32> ollama run llama3:70b
total duration: 13.0373642s
I think this might be related to #1651? It doesn't look like …
It is using the GPU, but it's not particularly efficient at using it, because the model is split across the CPU and GPU and because of the limitations of the computer (like slow memory). You can turn the GPU off entirely in the repl with:
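(A sketch, assuming the repl's /set parameter syntax, where num_gpu is the number of layers offloaded to the GPU and 0 disables GPU offload:)

/set parameter num_gpu 0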
Which should show you the difference in performance. You can also load a lower number of layers (i.e. a small non-zero num_gpu value). Back to CPU only (num_gpu 0), I get roughly half the speed of the GPU.
To expand on what Patrick mentioned: the 42% of the model loaded in system memory and doing its inference calculations on the CPU is significantly slower than the portion on the GPU, so the GPU quickly finishes its calculations for each step of the inference and then sits idle waiting for the CPU to catch up. The closer you can get to 100% on GPU, the better the performance will be. If you have further questions, let us know.
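(One way to see this concretely: ollama run --verbose prints timing stats after each response. The rates below are illustrative, not measured:)

(base) PS C:\Windows\System32> ollama run llama3 --verbose
eval rate:    60.00 tokens/s    # illustrative: fully on GPU

(base) PS C:\Windows\System32> ollama run llama3:70b --verbose
eval rate:    2.50 tokens/s     # illustrative: 42% on CPU

The eval rate line reflects steady-state generation speed, which is where the partial offload hurts most.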
After running 'ollama run llama3:70b', CPU and GPU utilization rose to 100% while the model was being loaded into RAM and VRAM, then dropped to 0%. After I sent a message, the model began to answer. The GPU only spiked to 100% at the beginning and then immediately dropped to 0%, and only the CPU kept working. Is this normal?
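(To watch that pattern directly, standard nvidia-smi flags can poll GPU utilization and memory once a second; this command is a sketch, not from the thread:)

nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1

With a partial offload like the 42%/58% split described above, short GPU bursts followed by idle stretches are expected: the GPU finishes its share of each token quickly and then waits for the CPU layers.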