Most of the suggestions out there for this problem help only slightly.
The code just doesn’t work that well for this. KVM and VirtualBox do not have delays anywhere near this level.
There also doesn’t seem to be a way to improve this by accepting more latency in exchange for a larger audio buffer.