Huge read/write speed difference with USB 3.2 Gen 2x2 disk via sys-usb + qvm-block in sys-backup (1.4 GB/s write vs ~220 MB/s read)

Hi everyone,
I hope someone with deeper Xen/Qubes internals knowledge can help me understand what’s going on here. I’m seeing a very large asymmetry between sequential write and read performance on an external USB 3.2 Gen 2x2 SSD that’s passed through via sys-usb and attached as a block device to my sys-backup qube.
The disk is LUKS2-encrypted (Argon2id, 1 GiB memory), mounted in sys-backup, and I’m doing very basic dd tests with direct I/O and cache drop to make sure we’re measuring the real device throughput.

Write test (64 GiB):

user@sys-backup:~$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches' && sudo dd if=/dev/zero of=/mnt/test/bigfile bs=1G count=64 oflag=direct status=progress
68719476736 bytes (69 GB, 64 GiB) copied, 51 s, 1.4 GB/s
64+0 records in
64+0 records out
68719476736 bytes (69 GB, 64 GiB) copied, 50.6449 s, 1.4 GB/s

Read test (same file, same conditions):

user@sys-backup:~$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches' && sudo dd if=/mnt/test/bigfile of=/dev/null bs=1G iflag=direct status=progress
68518150144 bytes (69 GB, 64 GiB) copied, 312 s, 220 MB/s
1024+0 records in
1024+0 records out
68719476736 bytes (69 GB, 64 GiB) copied, 312.819 s, 220 MB/s

So writes reach ~1.4 GB/s (which seems realistic for USB 3.2 Gen 2x2 with encryption overhead), but reads are stuck at only ~220 MB/s, roughly 6× slower.
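For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes both rates and their ratio from the byte counts and timings in the dd summaries above. The helper name mbps is made up for illustration, and MB/s is taken in decimal units (10^6 bytes), which is how dd reports it.

```shell
# Throughput in decimal MB/s from a byte count and an elapsed time in seconds.
# "mbps" is a hypothetical helper name, not part of any tool.
mbps() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1e6 }'
}

write_mbps=$(mbps 68719476736 50.6449)   # figures from the write run
read_mbps=$(mbps 68719476736 312.819)    # figures from the read run

# Print both rates and the write/read ratio.
awk -v w="$write_mbps" -v r="$read_mbps" \
    'BEGIN { printf "write %.1f MB/s, read %.1f MB/s, ratio %.2f\n", w, r, w / r }'
```

With these inputs the ratio comes out slightly above 6, which matches the "roughly 6×" figure.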
What I already checked/tried:

USB link negotiates SuperSpeedPlus Gen 2x2 (20 Gbps), confirmed via lsusb -t in sys-usb
sys-usb has plenty of resources now: 8 GB RAM (actually used), 4 vCPUs
Read test uses iflag=direct + cache drop, same bs as write
No thermal throttling visible, sustained test
CPU usage during read is not maxed out in sys-backup or sys-usb
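One check that is not in the list above is reading at each layer separately, to see where the rate drops. The following is only a rough sketch of how that could be done: seq_read is a hypothetical helper, and the device names (/dev/sdX in sys-usb, /dev/xvdi and /dev/mapper/NAME in sys-backup) are placeholders that have to be replaced with the real ones. It only reads, so it should be safe to run against a device that is in use.

```shell
# Sequential read of a block device or file, printing only dd's summary line.
# Usage: seq_read TARGET [BS] [COUNT] [DD_FLAGS]
# "seq_read" is a hypothetical helper name; it assumes GNU dd.
seq_read() {
    target=$1
    bs=${2:-64M}
    count=${3:-64}
    flags=${4-iflag=direct}   # pass "" as 4th argument to disable direct I/O
    # $flags is intentionally unquoted so an empty value expands to nothing.
    dd if="$target" of=/dev/null bs="$bs" count="$count" $flags 2>&1 | tail -n 1
}

# Hypothetical invocations (run as root; all device names are placeholders):
#   in sys-usb:     seq_read /dev/sdX          # USB controller + SSD only
#   in sys-backup:  seq_read /dev/xvdi         # adds the Xen block path
#   in sys-backup:  seq_read /dev/mapper/NAME  # adds dm-crypt on top
```

If the raw device already reads slowly inside sys-usb, the Xen block path is not the cause. If the rate only drops at /dev/xvdi, the frontend/backend path is the likely suspect, and a drop only at the mapper device would point at dm-crypt.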

I’ve seen a few older threads mentioning that Xen block passthrough (xen-blkfront/xen-blkback) sometimes has very asymmetric performance, with reads being much slower than writes due to grant table / ring buffer limitations. Is that still the case in Qubes 4.3? Or am I missing some tunable/quirk?
I would really appreciate it if any of the more experienced folks could explain what causes this read bottleneck in the current Xen/Qubes block proxy design, and whether there is any realistic way to improve the read speed without giving up the security benefits of sys-usb + block passthrough.
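In case it helps with answering, this is a small sketch for dumping the frontend settings in sys-backup so they can be posted alongside the numbers. dump_params is a hypothetical helper, xvdi is an assumed device name, and which files show up under these sysfs paths depends on the kernel version, so nothing here implies that any particular parameter is the culprit.

```shell
# Print every readable regular file in a sysfs-style directory as name=value.
# "dump_params" is a hypothetical helper name.
dump_params() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "$dir: not present"
        return 0
    fi
    for f in "$dir"/*; do
        if [ -f "$f" ] && [ -r "$f" ]; then
            printf '%s=%s\n' "$(basename "$f")" "$(cat "$f" 2>/dev/null)"
        fi
    done
}

# Module parameters of the Xen block frontend in the qube that mounts the disk.
dump_params /sys/module/xen_blkfront/parameters
# Request-queue limits of the attached disk; xvdi is an assumed device name.
dump_params /sys/block/xvdi/queue
```

Running the same helper on the corresponding backend directory in sys-usb would show whether the two sides agree.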
Thanks a lot in advance for any insights!