This is a quick cheat sheet for setting up a vLLM environment in WSL (Windows Subsystem for Linux) or on Linux.
Full documentation can be found at https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#requirements
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv --python 3.12 --seed
source .venv/bin/activate
```
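A quick way to confirm the new environment is active (assuming the default `.venv` directory created above):

```bash
which python      # should point into .venv/bin
python --version  # should report Python 3.12.x
```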
<aside> 💡
To be performant, vLLM has to compile many CUDA kernels. The compilation unfortunately introduces binary incompatibility with other CUDA and PyTorch versions, even for the same PyTorch version with different build configurations.
Therefore, it is recommended to install vLLM in a fresh environment. If you have a different CUDA version or want to use an existing PyTorch installation, you need to build vLLM from source. See below for more details.
</aside>
```bash
uv pip install vllm --torch-backend=auto
```
`--torch-backend=auto` lets uv automatically select the appropriate PyTorch index based on the CUDA driver version it detects at install time.
If this doesn’t work, try running `uv self update` to update uv first.
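Once the install finishes, a minimal smoke test is to import the package and, optionally, start the OpenAI-compatible server. The model name below is only an illustrative choice; substitute any model you actually want to serve:

```bash
# Verify the package imports and report its version.
python -c "import vllm; print(vllm.__version__)"

# Optional: start the OpenAI-compatible API server with a small example model.
vllm serve Qwen/Qwen2.5-0.5B-Instruct
```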
If you want to build from source, first install ccache (it speeds up repeated kernel compilation):

```bash
sudo apt install ccache
```

Then install the CUDA Toolkit (12.4) and make sure `nvcc` is on your PATH.
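A minimal sketch of the build itself, following the clone-and-editable-install flow from the vLLM docs (the `MAX_JOBS` value is an assumption; tune it for your machine's RAM):

```bash
# Check that nvcc reports the toolkit version you intend to build against.
nvcc --version

git clone https://github.com/vllm-project/vllm.git
cd vllm

# Optional: cap parallel compile jobs to avoid running out of memory.
export MAX_JOBS=6

# Editable install; compiles the CUDA kernels locally (ccache speeds up reruns).
uv pip install -e .
```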