Ever since I let the battery on my Qubes laptop run down until it died, I've had problems starting individual qubes. The qubes start, but they won't run programs; they just sit there doing nothing. If I restart and kill them repeatedly, they eventually start working properly, but this happens every time I start my system. I'm not even sure the power loss caused it, although it started around that time and is ongoing. It is also unpredictable: sometimes it affects only some qubes, but most of the time it affects all of them. It doesn't affect disposables, and I believe not templates either, although I haven't tested thoroughly. It also isn't limited to qubes that were running when the machine lost power; some qubes that were off at the time are affected too. Anecdotally, the problem seems less frequent if I wait a while after X starts before launching programs, but again I'm not sure of this. If anyone has ideas on next steps to solve this, please let me know.
Edit:
Also, if I kill a qube while it is misbehaving, it often starts again immediately on its own. I suspect this is because the requests to start the qube are queued up and still waiting to be executed. This also sometimes happens later, when I shut the qubes down once they are working properly.
Run this command in a qube that had the issue and then started working after a while:
sudo systemd-analyze plot > startup_order.svg
Then open the startup_order.svg file in a browser.
It shows the order in which systemd services were started and how long each one took, so you can see which service it was stuck on.
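If the SVG is hard to read, systemd-analyze can print similar timing information as text. A minimal sketch, assuming a systemd-based template (for example Fedora or Debian) and run inside the affected qube once it has become responsive; the function name boot_timing_report is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise boot timing inside a qube.
# Assumes the qube's template uses systemd.
boot_timing_report() {
    # Total time spent in kernel, initrd and userspace.
    systemd-analyze time
    # The ten units that took longest to start.
    systemd-analyze blame | head -n 10
    # The chain of units that gated reaching the default target.
    systemd-analyze critical-chain
}
```

Usage: `boot_timing_report > boot_timing.txt`, then compare the output from a boot where the qube hung with one where it did not.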
Yes, those qubes had the issue on this boot. I checked the qubes' logs and found nothing wrong. I also looked carefully through dom0's log; it had a few errors, but nothing that seemed relevant to this issue. Is there anything in particular you think I should look for, or do you have any other ideas? Should I post a snippet of dom0's log? Thank you!
Maybe there is a problem with your disk and its read speed is just low. Do you have an SSD or an HDD?
Check the disk's read/write speed.
You could also check the filesystem for errors with fsck.
You can also check the drive's SMART data.
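The SMART and read-speed checks could be run from dom0 roughly as follows. This is a minimal sketch, assuming smartmontools is installed in dom0 and the disk is /dev/nvme0n1 (an assumption; check `lsblk` for your actual device). The function name disk_report is hypothetical. fsck is deliberately left out, because it should only be run on an unmounted filesystem.

```bash
#!/usr/bin/env bash
# Hypothetical helper: quick disk health and read-speed check, run in dom0.
# DISK is an assumption; list your devices with `lsblk` and adjust it.
DISK="${DISK:-/dev/nvme0n1}"

disk_report() {
    local disk="$1"
    # SMART overall health verdict (needs the smartmontools package).
    sudo smartctl -H "$disk"
    # Full SMART attributes and error log, for a closer look.
    sudo smartctl -a "$disk"
    # Sequential read test: read 1 GiB directly from the device, bypassing
    # the page cache. This only reads; it never writes to the disk.
    sudo dd if="$disk" of=/dev/null bs=1M count=1024 iflag=direct status=progress
}
```

Usage: `disk_report "$DISK"`. dd reports the measured throughput when it finishes.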
I used smartctl and it shows no issues. The disk shouldn't be the problem, since it's an SSD. According to the specs online, its read speed should be 2.7 GB/s and its write speed 1.9 GB/s. It's not a hardware issue.
Do the affected qubes use the same template?
Maybe it's a problem with the template's system storage: either it's corrupted, or you increased the template's system storage size but never started the template to apply the change. In that case, every time you start a qube based on this template, it runs fsck to fix errors in the root filesystem or resizes the system storage, but these changes are not persistent, so it repeats on every start.
Try starting the template of the affected qubes so it can fix the errors or finish the resize, shut the template down once it has finished, and then start the affected qubes again.
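The suggestion above (find the template, boot it once, shut it down cleanly, then retry the qube) could be scripted in dom0 roughly like this. A minimal sketch: the function name refresh_template_then_start and the WAIT_SECONDS variable are hypothetical, and the 60-second default is an arbitrary guess at how long the template needs to finish its boot-time work.

```bash
#!/usr/bin/env bash
# Hypothetical helper, run in dom0: boot a qube's template once so it can
# finish any pending fsck or resize, shut it down cleanly, then retry the qube.
refresh_template_then_start() {
    local qube="$1"
    local template
    # Which template is this qube based on?
    template="$(qvm-prefs "$qube" template)" || return 1
    echo "Starting template: $template"
    qvm-start "$template" || return 1
    # Give it time to finish its boot-time work (arbitrary default).
    sleep "${WAIT_SECONDS:-60}"
    # --wait blocks until the template has fully halted.
    qvm-shutdown --wait "$template" || return 1
    qvm-start "$qube"
}
```

Usage: `refresh_template_then_start work`, where `work` stands for one of the affected qubes.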
After testing for a few days, I believe that fixed it! Thank you for your help and patience.
However, I don't recall increasing the system storage size of any template. Is corruption something I still need to worry about?
Now that some time has passed, I've found that the problem persists after all. Moreover, the delay seems to happen when I start apps with qvm-run. If I "run a command" using the Qube Manager instead, it runs correctly and instantly. This is strange to me, but I'm not well versed in how qrexec works or why this could be happening. Could someone shed some light on this, please? Thanks, everyone.
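To put numbers on the qvm-run delay, one could time a trivial command from dom0 and compare runs right after login with runs later on. A minimal sketch; the function name time_qvm_run is hypothetical, and it assumes GNU date (for `%N`) as found in dom0:

```bash
#!/usr/bin/env bash
# Hypothetical helper, run in dom0: measure how long a trivial qvm-run call
# takes to complete in the given qube.
time_qvm_run() {
    local qube="$1"
    local start end
    start=$(date +%s.%N)
    # --pass-io makes qvm-run wait for the command and relay its output.
    qvm-run --pass-io "$qube" 'echo ok'
    end=$(date +%s.%N)
    # awk does the floating-point subtraction that bash cannot.
    awk -v s="$start" -v e="$end" 'BEGIN { printf "qvm-run took %.2f s\n", e - s }'
}
```

Usage: `time_qvm_run work`. A call that normally finishes in well under a second but sometimes takes much longer points at something the qube is waiting on rather than at the command itself.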
I figured out the issue: it was caused by sys-audio. Starting qubes that have sys-audio set as their audiovm seems to cause the delay. If sys-audio is started first, there is no issue.
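The workaround (start the audio qube before the qubes that use it) could be wrapped in a small dom0 helper. A minimal sketch: the function name start_with_audio_first is hypothetical, and it relies on the per-qube audiovm property, which may be empty for qubes without audio.

```bash
#!/usr/bin/env bash
# Hypothetical helper, run in dom0: make sure each qube's audio qube is
# running before the qube itself is started.
start_with_audio_first() {
    local audiovm qube
    for qube in "$@"; do
        # audiovm names the qube's audio qube; it may be empty.
        audiovm="$(qvm-prefs "$qube" audiovm)"
        if [[ -n "$audiovm" ]] && ! qvm-check --running "$audiovm" >/dev/null 2>&1; then
            qvm-start "$audiovm" || return 1
        fi
        qvm-start "$qube" || return 1
    done
}
```

Usage: `start_with_audio_first work personal`. Alternatively, `qvm-prefs sys-audio autostart True` makes sys-audio start at boot, so it is already running by the time other qubes are launched.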