I am afraid that's going to be a tough one. The way I understand screen readers, they interrogate the UI and then provide some form of output, be it audio or haptic.
In Qubes a lot of energy went into preventing exactly this scenario. My high-level understanding is that each VM renders its windows onto a virtual screen. The contents of these windows are then sent securely as bitmaps to dom0, where they are composited onto the physical screen.
The entire point is that no VM can see or interrogate the output of any other VM, and dom0 is hard to compromise because it only ever deals with bitmaps.
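To make the point concrete, here is a toy sketch (purely illustrative, not the actual Qubes GUI protocol; all names are invented) of why dom0 has nothing to offer a screen reader: all it ever receives is flat pixel buffers.

```python
# Toy model only -- NOT the real Qubes GUI protocol. It illustrates
# that a compositor which only sees bitmaps has no widget tree,
# labels, roles, or focus information to hand to a screen reader.

from dataclasses import dataclass

@dataclass
class Bitmap:
    """A flat pixel buffer: width, height, raw RGBA bytes."""
    width: int
    height: int
    pixels: bytes  # pixels only -- no semantic information survives

def vm_render_window() -> Bitmap:
    """Inside a VM: widgets exist here, but only pixels leave the VM.
    Imagine a button labelled "OK" being rasterized."""
    return Bitmap(width=2, height=1, pixels=b"\xff\xff\xff\xff" * 2)

def dom0_composite(frames: list[Bitmap]) -> int:
    """dom0 side: it can blit bitmaps to the screen, nothing more.
    Returns total bytes blitted -- the only thing dom0 can 'know'."""
    return sum(len(f.pixels) for f in frames)

frames = [vm_render_window()]
print(dom0_composite(frames))  # 8 -- bytes of pixels, zero semantics
```

The asymmetry is the whole design: the semantic UI lives inside each VM, and by the time anything reaches dom0 it has been reduced to pixels.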
There might be a way to get Orca or similar tools to work on a per-VM basis, which might be OK for audio output -- I don't know. But things like system-wide focus management would require a Qubes-specific screen reader implementation in dom0 with stubs in each VM.
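Purely as a thought experiment, such a stub might forward accessibility events from each VM over a constrained channel (qrexec-style) to a reader in dom0. Nothing like this exists in Qubes today; every name and field below is invented for illustration.

```python
# Hypothetical sketch only: what a per-VM accessibility "stub" might
# send to a dom0-side screen reader. All names/fields are invented.

import json
from dataclasses import dataclass, asdict

@dataclass
class A11yEvent:
    vm: str          # which qube the event came from
    window_id: int   # matches the window dom0 already tracks
    role: str        # e.g. "button", "text-entry"
    label: str       # accessible name of the focused widget
    event: str       # e.g. "focus", "value-changed"

def serialize(ev: A11yEvent) -> str:
    """Stub side (inside the VM): flatten the event into one text
    line that could travel over a simple, auditable channel."""
    return json.dumps(asdict(ev))

def announce(line: str) -> str:
    """dom0 side: parse the line and build a spoken announcement,
    prefixed with the VM name so the user knows the trust domain."""
    ev = json.loads(line)
    return f'{ev["vm"]}: {ev["role"]} "{ev["label"]}" {ev["event"]}'

msg = serialize(A11yEvent("work", 42, "button", "OK", "focus"))
print(announce(msg))  # work: button "OK" focus
```

The security question would be validating these events in dom0, since a compromised VM could send misleading ones -- which is exactly the kind of parsing attack surface the bitmap-only design avoids.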
With the upcoming version of Qubes, GUI management will be removed from dom0 and will instead happen in a dedicated GUI VM, which is even more secure and is also meant to make it easier to support different environments like GNOME, KDE, etc. My hope would be that this new architecture offers an easier path to supporting this.