Running a Large Language Model in a Template

I have two GPUs: an integrated Intel and a discrete NVIDIA.

I would like to run an open-source large language model. Most of the open-source programs look like they run on Windows. I would like to dedicate the NVIDIA GPU to that program, since NVIDIA GPUs run best under Windows with native Windows drivers.
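If a Qubes-based route exists, I assume it would involve hiding the NVIDIA card from dom0 and attaching it to an HVM qube via PCI passthrough. A rough sketch of what I imagine the dom0 steps would look like (the VM name `llm-hvm` and the PCI address `01:00.0` are placeholders for my setup, not known-good values):

```shell
# In dom0: find the PCI address (BDF) of the NVIDIA GPU.
lspci | grep -i nvidia

# List devices as Qubes sees them, to confirm the backend:BDF identifier.
qvm-pci list

# Attach the GPU to a (stopped) HVM qube; BDF uses an underscore in qvm-pci.
# "llm-hvm" is a hypothetical standalone HVM created beforehand.
qvm-pci attach --persistent llm-hvm dom0:01_00.0
```

I understand GPU passthrough on Qubes may also require hiding the device from dom0 via kernel/Xen boot parameters and possibly the `permissive` option, so the above is only the outline as I understand it. Corrections welcome.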

What is the most efficient way to accomplish this, and how should I implement it? If there is no Qubes-based solution, I could run Windows without Qubes on an air-gapped machine, but that would be an unfortunate alternative.