Pull requests: vllm-project/vllm
- [Bugfix] fix lora_dtype value type in arg_utils.py - part 2 (#5428) - opened Jun 11, 2024 by c3-ali
- [Frontend] Add "input speed" to tqdm postfix alongside output speed (#5425) - opened Jun 11, 2024 by mgoin
- [Hardware][AMD][CI/Build][Doc][Kernel] Upgrade to ROCm 6.1, Dockerfile improvements, Paged Attention tuning (#5422, label: rocm) - opened Jun 11, 2024 by mawong-amd
- [WIP] [Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414) - opened Jun 11, 2024 by wooyeonlee0 (Draft, 1 of 4 tasks)
- [Bugfix] We have fixed the bug that occurred when using FlashInfer as the backend in vLLM Speculative Decoding. (#5412) - opened Jun 11, 2024 by bong-furiosa
- [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) - opened Jun 11, 2024 by stephanie-wang (2 tasks done)
- [Kernel] Suppress mma.sp warning on CUDA 12.5 and later (#5401) - opened Jun 11, 2024 by tlrmchlsmth
- [Experimental] Testing validity of the baseline AMD CI (#5394) - opened Jun 10, 2024 by Alexei-V-Ivanov-AMD
- [Kernel] Factor out epilogues from cutlass kernels (#5391) - opened Jun 10, 2024 by tlrmchlsmth
- [Kernel] Adding fused bias add to cutlass_scaled_mm_dq kernel (#5390) - opened Jun 10, 2024 by cyang49
- [Model][Hardware][NV] Add support for ModelOpt static scaling checkpoints (#5387) - opened Jun 10, 2024 by pavanimajety (Draft)
- [WIP][Core] Support tensor parallel division with remainder of attention heads (#5367) - opened Jun 9, 2024 by NadavShmayo
- [Kernel][RFC] Initial commit containing new Triton kernels for multi lora serving. (#5356) - opened Jun 8, 2024 by FurtherAI (1 task)