Streaming OpenGL/Vulkan calls to sys-gui-gpu

Getting accelerated 3D on Qubes is probably one of the biggest pain points. Even if everyone could forward a second GPU to a single domain using PCI passthrough (leaving aside the difficulties involved), we would not be able to do that for 2 domains simultaneously, and would have to face security implications even when dealing with 2 VMs non-simultaneously.

But there is another approach that (hopefully) would allow us to overcome all of this: streaming the OpenGL calls from individual domains to a “GPU server” in sys-gui-gpu.

I played with the idea, happy that prior art existed - although trying that original codebase (written for rendering on Android, and itself based on a codebase for rendering on a Raspberry Pi, no less) requires some amount of motivation.

I’m not quite to the point of getting this to run in sys-gui-gpu (partly because I don’t have one running yet :wink: ), but mostly because the original code is incomplete, only works when compiled as 32bit code (and thus only with 32bit apps), uses an insecure network protocol that sends and accepts raw pointers, and other fun stuff.

Nevertheless, I’ve started to poke at it as an experiment, to the point that I’ve been able to run es2gears - that is, a 32bit version of it, with the “GLES server” running in the same domain with Mesa software rendering. And the small window that runs at 1000fps with plain Mesa runs at an anemic 70fps. That may not sound promising, but given the state of the stack there is quite some room for improvement (a first tweak already brought it to around 220fps, which is finally not so bad), and I’m submitting this to your thoughts.
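The core mechanism can be sketched in a few lines: the client packs each GL call into a small command record, ships it over a socket, and the server unpacks it and replays it against the real driver. A toy illustration of that pattern (hypothetical opcode values and layout, not gl-streaming’s actual wire format):

```python
import struct

# Hypothetical opcode table -- not the actual gl-streaming protocol.
GLS_CLEAR_COLOR = 0x01

def encode_clear_color(r, g, b, a):
    """Client side: pack a glClearColor call as u32 opcode + 4 floats."""
    return struct.pack("<I4f", GLS_CLEAR_COLOR, r, g, b, a)

def decode_command(buf):
    """Server side: unpack the opcode, then dispatch on it."""
    (opcode,) = struct.unpack_from("<I", buf, 0)
    if opcode == GLS_CLEAR_COLOR:
        r, g, b, a = struct.unpack_from("<4f", buf, 4)
        # A real server would call glClearColor(r, g, b, a) here.
        return ("glClearColor", r, g, b, a)
    raise ValueError(f"unknown opcode {opcode:#x}")

cmd = encode_clear_color(0.0, 0.5, 1.0, 1.0)
print(decode_command(cmd))  # ('glClearColor', 0.0, 0.5, 1.0, 1.0)
```

The real protocol of course covers hundreds of entry points and must also ship buffers and return values back, but every call follows this encode/ship/decode/replay shape.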

Have fun :slight_smile:

Edit: a simple removal of apparently-arbitrary throttling already turned the original 15x slowdown into a mere 5x slowdown. Stay tuned for hopefully better stuff :slight_smile:


progress report

2 weeks after this heads-up, things have progressed nicely (given the amount of time I’ve been able to put into this). Progress highlights:

  • now testable with 64bit apps (while temporarily losing the ability to work with 32bit ones)
  • texture support now working
  • a number of fixes were made to the original forwarding of several APIs, and more APIs were implemented

Much remains to be done for real-life usage, but more and more glmark2 benchmarks now run (textures, shading, bump, build:use-vbo=true, pulsar, function, conditionals).

High priority short term work:

  • provide at least a stub for all GLES2 core functions (apps out there carelessly assume the full API is implemented, and currently happily proceed to call NULL)
  • get an app’s graphic resources freed in the server when a client disconnects (we sometimes have fun interactions, not to mention saturating memory)
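The stub idea from the first bullet can be illustrated with a dispatch table whose missing entries resolve to a warning no-op instead of NULL. This is only a sketch of the principle (the real code would build a C function table; the function names here are just examples):

```python
import sys

def make_stub(name):
    """Build a do-nothing replacement for an unimplemented entry point."""
    def stub(*args):
        print(f"WARNING: {name} not implemented, ignoring call", file=sys.stderr)
        return 0
    return stub

class Dispatch(dict):
    """Resolve unimplemented entry points to a stub instead of NULL."""
    def __missing__(self, name):
        return make_stub(name)

gl = Dispatch()
gl["glClear"] = lambda mask: f"cleared {mask:#x}"

print(gl["glClear"](0x4000))           # implemented: runs normally
print(gl["glBlendColor"](1, 1, 1, 1))  # unimplemented: warns, returns 0
```

The point is that an app probing the API never dereferences NULL; at worst it gets a harmless default return value and a log line telling us which function to implement next.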

progress report

I’ve been postponing a status update for a few days as I kept stumbling on problems preventing me from seeing the actual progress, but here are the highlights.

  • near-complete rewrite of the limited UDP networking, with a TCP implementation. No more limitation on the size of textures and other buffers to be uploaded to the GPU.
  • lots of efforts into code cleanup and internals documentation (and still lots on my todo list, and not everything listed in the TODO file :wink: )
  • many bugs addressed
  • integrated and finalized useful work from all public upstream branches on github
  • last but not least, more coverage of both EGL and GLES2 APIs:
    • glmark2 can now be launched for its default set of benchmarks; the still-not-implemented functions don’t prevent it from running everything it can (it does skip all tests needing an extension, 2 tests miss a few crucial APIs, and only 1 test clearly lacks something without any warning to give a clue)
    • Bobby Volley 2 can be launched and played (far from an AAA game, but hey, that’s still a real game, and it is playable despite a few missing APIs)
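Moving from fixed-size UDP datagrams to TCP removes the size cap because TCP is a byte stream: each command just needs a length prefix so the receiver knows where it ends. A minimal sketch of such framing (my own illustration, not the project’s actual framing code):

```python
import io
import struct

def send_msg(stream, payload: bytes):
    """Write a 4-byte little-endian length prefix, then the payload."""
    stream.write(struct.pack("<I", len(payload)) + payload)

def recv_msg(stream) -> bytes:
    """Read the length prefix, then exactly that many payload bytes."""
    (length,) = struct.unpack("<I", stream.read(4))
    return stream.read(length)

# Simulate the TCP stream with an in-memory buffer: a texture of any
# size goes through, where a fixed UDP datagram would have truncated it.
buf = io.BytesIO()
texture = bytes(1024 * 1024)  # 1 MiB of texel data
send_msg(buf, texture)
buf.seek(0)
assert recv_msg(buf) == texture
print("1 MiB payload framed and recovered intact")
```

A production version additionally has to loop on short reads from a real socket, but the framing itself is just this prefix.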

Next focus:

  • more coverage and some benchmark numbers
  • finally deal with the graphic-resource isolation and freeing already mentioned; some crucial infrastructure work is already in place
  • finally rework the protocol to get rid of those pointers-on-the-wire (yuck)
  • EGL/GLES extension handling
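The pointers-on-the-wire problem mentioned above is classically solved with a handle table: the client only ever sees small opaque integers, and the server validates them and maps them to its own objects. A sketch of the idea (illustrative names, not gl-streaming code):

```python
class HandleTable:
    """Map small opaque handles to server-side objects, so no raw
    pointer ever crosses the wire and bogus handles are rejected."""
    def __init__(self):
        self._objects = {}
        self._next = 1  # 0 stays reserved as the invalid handle

    def create(self, obj) -> int:
        handle = self._next
        self._next += 1
        self._objects[handle] = obj
        return handle

    def lookup(self, handle: int):
        try:
            return self._objects[handle]
        except KeyError:
            raise ValueError(f"invalid handle {handle}") from None

    def destroy(self, handle: int):
        self._objects.pop(handle, None)

table = HandleTable()
h = table.create({"kind": "texture", "size": 64 * 64 * 4})
print(h, table.lookup(h)["kind"])  # 1 texture
table.destroy(h)
```

A malicious or buggy client can then at worst name its own objects, never an arbitrary server address.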

This seems like a really good idea, maybe we’ll even see a gaming qube in the future? Anyways, nice work!


I made a quick FPS comparison, using today’s 0b6cbf7265314 revision from git master, with the glmark2 default test (windowed, 800x600), comparing:

  • running natively on an AMD Stoney APU
  • running through gl-streaming over the loopback network interface
  • running glmark2 on a higher-end (QubesOS) laptop, streaming to the AMD Stoney APU through gigabit ethernet

Keep in mind that this is still preliminary code, with no profiling/optimisation done yet (I even removed most upstream optimisations, which were slowing down feature progress).

And all of this, obviously, is just statistics :slight_smile:

Potentially interesting information:

  • with GLS alone, most tests suffer about a 60% drop in FPS; some perform noticeably better, but one (the famous ideas scene, which does not render properly to start with) suffers a nearly 90% FPS drop
  • the tests that prove to be the most demanding in the native case usually don’t suffer as much from being run through GLS – that could be a rather good sign for real-life loads
  • if we add a real network to the picture, most tests show a 5-10% additional loss, except:
    • one test uses client-side data instead of a VBO (we likely transfer the data several times; no surprise that’s a great handicap when it happens over the network)
    • several tests show a better score than with GLS alone, which seems to hint that in those particular cases the more powerful CPU can indeed compensate for part of the loss in GPU throughput (could those be CPU-bound benchmarks of the GPU? that would not sound good)
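For reference, the percentages above are just relative drops, and the earlier slowdown factors are the same information in another form; e.g. the 1000fps-to-70fps case from the first post is a 93% drop, i.e. roughly the original 15x slowdown. A trivial helper to convert between the two:

```python
def fps_drop(native_fps: float, streamed_fps: float) -> float:
    """Relative FPS loss: 0.0 means no loss, 0.9 means a 90% drop."""
    return 1.0 - streamed_fps / native_fps

def slowdown(native_fps: float, streamed_fps: float) -> float:
    """Same information expressed as an Nx slowdown factor."""
    return native_fps / streamed_fps

print(f"{fps_drop(1000, 70):.0%} drop, {slowdown(1000, 70):.1f}x slowdown")
# 93% drop, 14.3x slowdown
```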

This is cool. Please keep us up to date on your progress!

I’m currently working on proper support for EGL and GLES extensions, with the former pretty much working and the latter still getting its details wired up. In parallel, I’ve found a set of examples from a GLES2 book which put the spotlight on a nasty issue (for which I pushed a wrong fix to master, stay tuned) that may turn out to be the root cause for glmark2 -d ideas not rendering everything it should – I’m still investigating this one.

I’ll post a more formal “progress report” once those 2 items are fully dealt with.

It would be really great to have more real GLES2 applications making use of 3D, as almost everything out there on the desktop uses desktop GL and GLX. I have a plan to take some godot3 3D games, which mostly use GLES3, though it is quite easy to ask godot3 to use GLES2 instead – but that won’t be as good as a game designed for GLES2; we’ll just have very few graphical effects available…


progress report

It took some time since the last update, and accordingly this is a pretty big one:

  • a process is forked on the server side for each connecting client, providing both resource isolation (and freeing on client termination) and support for several simultaneous clients
  • server-side window is now created at WindowSurface-creation time, and has the expected size (server window does not get resized afterwards, but the displayed size can usually be reduced by making the input window smaller)
  • support for extensions, both for EGL and GLES2, with a couple of them implemented, as required/useful for the test programs at hand
  • successfully tested more apps (prboom, weston)
  • new support status page for tested apps
  • only EGL 1.0 is advertised, now that glmark2 landed a conformance fix allowing it to run
  • code cleanups, doc improvements, and too many bugfixes to list here
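The fork-per-client design from the first bullet gives isolation almost for free: when the child handling a client exits, the kernel reclaims everything that process allocated (GPU-side objects still need explicit cleanup through the driver, of course). The overall shape of such a server, sketched on POSIX with a socketpair standing in for a real accept()ed connection:

```python
import os
import socket

def serve_client(conn):
    """Child process: serve one client (here: echo one message), then
    exit; the OS then frees every resource this process held."""
    with conn:
        data = conn.recv(4096)
        conn.sendall(b"echo: " + data)

# Stand-in for accept(): a connected pair of sockets (POSIX only).
parent_sock, child_sock = socket.socketpair()
pid = os.fork()
if pid == 0:  # child: handle exactly one client, then die
    parent_sock.close()
    serve_client(child_sock)
    os._exit(0)

# Parent: plays the role of the connecting client here.
child_sock.close()
parent_sock.sendall(b"hello")
reply = parent_sock.recv(4096)
print(reply.decode())  # echo: hello
os.waitpid(pid, 0)  # reap the per-client process
parent_sock.close()
```

In the real server the parent loops on accept(), forking one such child per incoming connection, which is also what makes several simultaneous clients straightforward.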

As “running weston over X11” implies, that seems to bring some Wayland support, but well… the compositor having access to GPU acceleration does not necessarily mean that Wayland apps produce GPU-accelerated streams to be composited, or that meaningful Wayland apps can run yet (e.g. glmark2-es2-wayland does not yet).

what next?

This looks like a good point in the prototype’s life to start looking at rendering the window in the GPU domain. I’ll first look at running the server in the GPU domain. There are 2 complementary paths there:

  • a traditional GPU-in-dom0 setup: with no networking in dom0, this will require a virtio-based transport
  • sys-gui-gpu setup, where possibly the current TCP transport can be used as-is (can yield sooner results, but I’ll have to get that sys-gui-gpu to work first on my machine)

With the display being done in the same domain as the server, it will then be possible to look at using a single X11 window for both input and display.