If you are willing to sacrifice the x16 PCIe slot, you can set it to x4/x4/x4/x4 bifurcation and run four PCIe 5.0 drives in RAID 0. In theory that gives up to four times the sequential throughput of a single drive. I personally think it would be a waste of money, and you will not be able to use a GPU as a display device.
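For a rough sense of the numbers, here is a back-of-the-envelope sketch of the theoretical link ceiling. It assumes the standard PCIe 5.0 figures (32 GT/s per lane, 128b/130b encoding) and ignores protocol overhead, controller limits and software RAID cost, so real drives will come in below this. The function names are just illustrative.

```python
# Theoretical PCIe 5.0 link ceiling; real drives deliver less than this.
GT_PER_S_PER_LANE = 32.0         # PCIe 5.0 raw signalling rate per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def link_bandwidth_gb_s(lanes: int) -> float:
    """Upper bound on one-direction link bandwidth in GB/s."""
    return GT_PER_S_PER_LANE * ENCODING_EFFICIENCY * lanes / 8


def raid0_upper_bound_gb_s(drives: int, lanes_per_drive: int = 4) -> float:
    """Best case for striping: every drive saturates its own x4 link."""
    return drives * link_bandwidth_gb_s(lanes_per_drive)


single = link_bandwidth_gb_s(4)
striped = raid0_upper_bound_gb_s(4)
print(f"one x4 drive:  {single:.2f} GB/s")
print(f"four in RAID 0: {striped:.2f} GB/s")
```

The 4x factor only applies to large sequential transfers; small random IO usually does not scale anywhere near that.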
With AMD mainboards, you have 24 CPU-connected PCIe lanes: 16 are used for the GPU and 8 are general purpose. Those last 8 lanes are used either for two 5.0 NVMe drives, or for one NVMe drive and one extra x4 PCIe slot.
If you want to use two GPUs, the second one takes that x4 slot, so you can only have a single 5.0 drive, which limits how much storage you can run at 5.0 speed.
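The lane budget above can be written out as a small sketch. The lane counts are the ones quoted in this post, not read from any particular board, and the function name is made up for illustration. It assumes every extra card takes a full x4 link from the general-purpose lanes.

```python
CPU_LANES = 24           # figure quoted in the post above
GPU_LANES = 16           # primary x16 slot
LANES_PER_X4_DEVICE = 4  # one NVMe drive or one x4 slot


def cpu_attached_nvme_drives(extra_x4_cards: int) -> int:
    """How many x4 NVMe drives still fit on the general-purpose lanes."""
    general = CPU_LANES - GPU_LANES
    left = general - extra_x4_cards * LANES_PER_X4_DEVICE
    if left < 0:
        raise ValueError("not enough CPU lanes for that many x4 cards")
    return left // LANES_PER_X4_DEVICE


print(cpu_attached_nvme_drives(0))  # single GPU
print(cpu_attached_nvme_drives(1))  # second GPU in the x4 slot
```

Chipset-attached M.2 slots are not counted here, since they share the chipset uplink rather than getting their own CPU lanes.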
That’s interesting. I observed that qvm-clone duration is roughly proportional to the size of the VM or template being cloned, and assumed on-disk size was the main cause of the wait. It sounds like some or much of that extra time is instead because larger VMs/templates tend to have more exposed apps → more qvm-appmenus activity.
A waste because in the general case storage IO tends not to be a substantive performance bottleneck?