Bug Report: System Jobs Hanging for Over 5 Hours During Boot

Summary
During boot of Qubes OS, some systemd start jobs (e.g., qubes-vm@... services) remain pending for more than 5 hours without completing. After a manual restart, the system boots and works correctly, and journalctl -b shows no errors. Setting start timeouts in /etc/systemd/system.conf has not resolved the problem. CPU and hardware activity appear normal during the hang, with no excessive heat or fan noise.


Steps to Reproduce

  1. Boot the system normally.
  2. Observe jobs like qubes-vm@sys-firewall, qubes-vm@sys-usb, and others starting during the boot process.
  3. Wait for the system to finish starting up.
  4. Some of these jobs remain pending for over 5 hours without completing.

Expected Behavior

All boot-time jobs complete, or fail with a timeout, within a reasonable time.


Observed Behavior

  • Specific jobs remain in the “starting” state for hours (e.g., qubes-vm@... jobs).
  • After a manual restart, the system boots successfully, and all services work as expected.
  • Logs (journalctl -b) show no apparent errors or issues.
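
To identify which jobs are actually stuck, it may help to capture the job queue while the boot is still hanging. The following is a minimal sketch, assuming a root shell in dom0 is reachable during the hang (for example on another virtual terminal). running_jobs is a hypothetical helper name, and it relies on the four-column JOB UNIT TYPE STATE layout printed by systemctl list-jobs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Qubes OS or systemd): read the output of
# `systemctl list-jobs --no-legend` on stdin and print the units whose job
# is currently running, i.e. the ones the boot is waiting on.
running_jobs() {
    # Columns are: JOB UNIT TYPE STATE
    awk '$4 == "running" { print $2 }'
}

# Usage during the hang (as root in dom0):
#   systemctl list-jobs --no-legend | running_jobs
#
# After restarting, `journalctl -b` covers only the new boot. Assuming the
# journal is stored persistently, the hung boot is the previous one:
#   journalctl -b -1 -u 'qubes-vm@*'
```

Jobs in the "waiting" state are queued behind the running ones, so the units printed here are the most likely place to start looking.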

Troubleshooting Steps Taken

  1. Checked Logs:

    • No errors or warnings found in journalctl -b.
    • All jobs appear to have completed successfully after a restart.
  2. Set Job Timeout:

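    • Modified /etc/systemd/system.conf:
      DefaultTimeoutStartSec=120s
      JobTimeoutSec=120s
    • Result: Jobs still hang indefinitely during boot.
    • Note: JobTimeoutSec= is a per-unit [Unit] option, not a system.conf setting, so it has no effect there. DefaultTimeoutStartSec= also does not apply to Type=oneshot services, whose start timeout is disabled by default unless the unit sets TimeoutStartSec= itself.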
  3. Monitored Hardware:

    • No significant CPU or memory usage observed.
    • Fans remain inactive, suggesting no unusual workload.
  4. Disabled Autostart VMs:

    • Disabled VM autostart (qvm-prefs <VM_NAME> autostart false) for problematic qubes.
    • Result: Reduced the number of jobs but did not eliminate the hanging behavior.
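
Regarding step 2: systemd disables the start timeout by default for Type=oneshot services, so a manager-wide default would not limit them. If the qubes-vm@ template is such a unit (an assumption that systemctl cat qubes-vm@.service can confirm), a per-unit drop-in may be worth trying. The sketch below shows one possible way to do this. The helper name write_timeout_dropin, the file name timeout.conf, and the 120s value are illustrative, not taken from Qubes OS.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that sets an explicit start timeout
# for every qubes-vm@<name>.service instance.
# $1 = drop-in directory (defaults to the standard location for the template)
write_timeout_dropin() {
    dir="${1:-/etc/systemd/system/qubes-vm@.service.d}"
    mkdir -p "$dir" || return 1
    # 120s mirrors the value tried in system.conf; adjust as needed.
    cat > "$dir/timeout.conf" <<'EOF'
[Service]
TimeoutStartSec=120s
EOF
}

# Usage (as root in dom0), then reload systemd so the drop-in is picked up:
#   write_timeout_dropin
#   systemctl daemon-reload
```

With a drop-in like this, a stuck start should fail with a timeout instead of blocking indefinitely, which would at least make the failure visible in the journal. It does not address the underlying cause.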

Additional Notes

  • Hardware Behavior: Normal, with no signs of overheating or high resource usage.
  • Workarounds:
    • Restarting the system resolves the issue temporarily.
    • Disabling VM autostart reduces the number of hanging jobs but does not prevent them entirely.
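
To check which qubes are still started at boot after applying the autostart workaround, something like the following could be used. This is a sketch only: list_autostart_qubes is a hypothetical helper, and it assumes qvm-prefs prints True or False for the autostart property.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every qube whose autostart
# property is True, using only qvm-ls and qvm-prefs from dom0.
list_autostart_qubes() {
    for vm in $(qvm-ls --raw-list); do
        # Errors are discarded because some entries (such as dom0) may not
        # have an autostart property at all.
        if [ "$(qvm-prefs "$vm" autostart 2>/dev/null)" = "True" ]; then
            printf '%s\n' "$vm"
        fi
    done
}

# Usage (in dom0):
#   list_autostart_qubes
```

Comparing this list with the jobs that hang may show whether the remaining hangs come from qubes that are started indirectly, for example as the netvm of another qube.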

Conclusion

This issue may be related to:

  • A bug in how systemd handles the qubes-vm@... start jobs on Qubes OS.
  • A race condition or dependency issue between qubes-vm@... services.
  • Hardware-related quirks that are not reported in logs.
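
To help narrow down the dependency hypothesis, the ordering and requirement relationships of an affected unit can be printed with standard systemctl queries. The sketch below assumes the unit naming qubes-vm@<name>.service used in this report. show_qube_ordering is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: show what a qubes-vm@ instance is ordered after and
# what it requires or wants. $1 = qube name, e.g. sys-firewall
show_qube_ordering() {
    unit="qubes-vm@$1.service"
    # Raw ordering and requirement properties of the unit
    systemctl show -p After -p Requires -p Wants "$unit"
    # Tree of units that must be started before this one
    systemctl list-dependencies --after "$unit"
}

# Usage (in dom0):
#   show_qube_ordering sys-firewall
```

If two hanging units turn out to be ordered after each other, directly or through a shared dependency, that would support the race or dependency explanation over a hardware one.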