What @alimirjamali means by “volumes and metadata” is essentially (but not limited to):
- The partitions inside the qube (volumes): `/dev/xvda`, `/dev/xvdb`, `/dev/xvdc`, etc.
- The metadata
  - The output of `qvm-prefs <qube>`, more or less
  - These values are contained in the XML files corresponding to each qube
  - They specify things about the environment that Xen needs to create to boot the qube successfully
This, with a few caveats, is a qube, at its barebones.
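If you want to see those two halves for yourself, you can poke at them from dom0. A quick look, assuming a hypothetical qube named `work`:

```
# In dom0: list the volumes backing the qube (root, private, volatile, kernel)
qvm-volume list work

# In dom0: dump the qube's metadata (the qvm-prefs output mentioned above)
qvm-prefs work
```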
This is mostly all you need to be able to:
- Run qubes remotely on someone else’s hardware
  - Homelab server
  - Remote computer
  - Cloud Provider’s hardware
- Back up qubes, and move them around between machines (see the sketch after this list)
  - Either when they’re shut down (“at rest”), or potentially while they’re still running
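As a rough sketch of the “move them around between machines” part, using the existing backup tooling. The qube name and paths are just examples, and you’ll be prompted for a backup passphrase along the way:

```
# On the source machine, in dom0: back up the shut-down qube "work"
# to a directory on an attached drive
qvm-backup /run/media/user/backupdrive/ work

# On the destination machine, in dom0: restore it from that same drive
qvm-backup-restore /run/media/user/backupdrive/<backup-file>
```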
Context for Qubes Air, Data Centers, and “Live qube Transfer”
@alimirjamali is also describing the way gigantic data centers are structured in terms of hardware and networking.
They consist of tens of thousands of individual computers/servers (and those computers are usually pretending to be tens of thousands more), and gigantic storage arrays (imagine a warehouse that is full of nothing but hard drives with YouTube or Netflix media on them).
Imagine how many people per second are asking that data center for files. Imagine how many of those are asking for the same file.
In your regular home or office network, you generally have the following:
- One (or two) ways to get out of the network (WAN endpoints)
- One (or two) big devices that everything is plugged into, which sort, coordinate and forward everyone’s data packets (a switch)
- Usually a big difference in speed when accessing devices inside the network compared to devices outside the network (because most ISPs believe it’s morally right to extract value by artificially throttling/limiting the capabilities of physical hardware)
- Devices are generally connected to each other using the “Hub and Spoke” method.
Data center networking is a whole different ballgame:
- Hundreds, if not thousands, of ways to get in and out of the network (endpoints)
- Almost every server needs to be able to talk to the internet just as fast as other servers inside the data center
- Parts of the network need to be kept separate from each other, even though they likely use the same cables
- Usually the hard drives will generate enough traffic that they need to be on their own dedicated network
- The backups of the backups have backups, in case the backups of the backups fail
- Take a shot every time someone says the word “redundancy”
This is why data center networks are usually built the other way, using what’s called the “Spine and Leaf” method.
There are some exceptions (some people do have quite a lot of money to spend on their home networks), but in general, Data Centers have a bottomless pit of money to be able to buy “the best of the best” in computer hardware and infrastructure.
They also change/rotate/upgrade that hardware faster than most of us change/upgrade our wardrobe/clothes!
(No joke, there are Data Centers that employ people whose ONLY job is to spend 8 hours a day going around to all the servers with a trolley of brand new SSDs, and swapping out the old ones that have died with fresh new ones, because the old ones have been written to so much that they’re now read-only. And I can guarantee you that it would be 3 x 8-hour shifts for round-the-clock drive-swapping)
Most of them have also promised their customers that they will “always be able to access their stuff, no matter what, 24/7/365.25”. They have also usually never told their customers what actually happens to their stuff behind the scenes (and to be fair, most customers couldn’t care less, as long as they can access their stuff when they want to, and nothing bad happens to their stuff).
This is important in the case of VMs. If you were asking a Data Center to run a VM for you containing your company network LDAP database (so your employees can log into their workstations/portals), you’d definitely want that to be always running.
You probably wouldn’t be too happy if the Data Center shut that down for an hour during the day to transfer it to a different server (and none of your employees could log in and just sat there getting paid to do nothing…).
This is why the Data Centers have spent a lot of time, effort and money developing ways to be able to move customers’ VMs around while they’re still running, internet-connected, and performing their tasks.
We know how some of those methods work (FOSS), and some we don’t (proprietary).
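For a flavour of the FOSS side: Xen (the hypervisor underneath Qubes OS) ships its own live migration in the `xl` toolstack, although vanilla Qubes OS doesn’t expose it. A minimal sketch, with made-up domain and host names:

```
# On the source Xen host: stream the running domain "webvm" to another
# Xen host (reachable over SSH as "destination-host") without shutting it down
xl migrate webvm destination-host
```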
How does this relate to Qubes Air?
There will definitely be people out there who will happily put (some of) their qubes in a Data Center, where they will want to be able to access them at lightning-fast speed, and not have to worry about where they actually are, what hardware they’re running on, etc.
These people would probably also be ok with the Data Centers running their qubes for them (outsourced execution), even if it means allowing the Data Centers to see/understand everything that goes on inside those qubes.
There will also be people out there who would be ok with Data Centers storing their qubes “at rest” for them, but running them on hardware that they have complete control over (local execution). These people will need one of the following:
- A constant, uninterrupted stream connection to their storage
  - Every read/write will require a data packet leaving/entering their Qubes OS machine
- To copy over their entire qube’s contents in one go, execute it, and then copy it back upon qube shutdown (see the sketch after this list)
  - Think `git` branches/merges but with a qube (could be cool)
- A hybrid combination of the two
  - With a healthy dose of encryption so you can hand it to someone and say “hold my beer”
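A very rough sketch of that “copy over, execute, copy back” idea, reusing the existing backup tooling. Everything here is an assumption about your setup: the qube name `work`, a hypothetical networked qube `sys-storage` that can reach the remote store, and the paths (which are interpreted inside that qube when `--dest-vm` is used):

```
# "Pull": restore the qube from a backup file held by the networked qube,
# then run it locally (local execution)
qvm-backup-restore --dest-vm sys-storage /home/user/remote-store/work.backup
qvm-start work

# ...use the qube locally...

# "Push": once it has shut down, back it up through the same qube so the
# result lands back on the remote store
qvm-shutdown --wait work
qvm-backup --dest-vm sys-storage /home/user/remote-store/ work
```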
This means that online/offline migration isn’t something that can feasibly be implemented any time soon, so they’ve chosen to keep it on their wishlist and focus on what they think they can achieve given their current resources.
I can also see “migration” (as opposed to straight copying of the 0s and 1s) of a qube having this uncanny ability to produce unforeseen circumstances, forcing you to “go back to square one” multiple times; honestly, in fairness, it’s more work than it’s worth at this point in time.
There will be a way (in fact, there will likely be many ways…), but at the moment, it’s too under-resourced to investigate…
…and `qubes-backup` currently meets the functional needs of those who wish to move qubes from one machine to another at this point in time.
But still, being able to remotely control another Qubes OS machine using this protocol is exciting. Being able to cluster and orchestrate Qubes OS machines together is the first step to making the other things a reality.
Very exciting times, indeed.