So, I have been testing the exact same thing: qTox on 2 separate Whonix-17-based VMs.
Result: No underruns in journal. Sound is fine. I have not touched quantum.
While sys-whonix is the netvm, 360p is achievable but 240p is more stable.
After switching the netvm to sys-firewall, even 720p ran smoothly. I had the camera pointed at a TV showing quite dynamic pictures.
Overall, it seems Tox chat always needs about 30-40 seconds to “stabilize” the conference initially; only then do video and sound run smoothly. Until then, there are crackles and smears.
My conclusion so far:
It is not hardware
It is not Qubes OS
It seems partially related to qTox and network speed
Some recent update may have fixed the underruns (no idea how to check)
I have not tried a newer version of qTox, only the one in the official repo provided by Whonix 17.
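On the "no idea how to check" part: one rough way to at least confirm whether underruns are still being logged is to scan the journal of the VM for lines that mention them. This is only a sketch, assuming the audio stack writes the word "underrun" into the systemd journal; the function names `find_underruns` and `read_journal` are made up for illustration, and it does not tell you which update changed the behaviour.

```python
import subprocess


def find_underruns(lines):
    """Return the lines that mention an underrun (case-insensitive match)."""
    return [line for line in lines if "underrun" in line.lower()]


def read_journal():
    """Return the journal of the current boot as a list of lines.

    Assumption: journalctl is available in the VM. If it is not,
    an empty list is returned instead of raising an error.
    """
    try:
        result = subprocess.run(
            ["journalctl", "--no-pager", "-b"],  # -b: current boot only
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return []
    return result.stdout.splitlines()


if __name__ == "__main__":
    hits = find_underruns(read_journal())
    print(f"{len(hits)} underrun line(s) in the current boot")
    # Show only the most recent matches to keep the output short.
    for line in hits[-10:]:
        print(line)
```

Running it once before and once after a call should show whether new underrun lines appear during the call.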
For the sake of more complete testing, I also downloaded and installed the much-praised SimpleX Chat.
Result: Only chat seems to work, regardless of network settings. Any attempt at an audio or video call doesn’t even notify the other party.