[qubes-users] Systemd terminating qubesd during backup?

I seem to have an intermittent problem when my backup scripts are running late at night.

My qubesd is apparently being shut down (sent a SIGTERM) by systemd during my long-running backup sessions. This causes an EOF/pipe-close exception, and qvm-backup then receives a socket exception, immediately followed by a second exception while still handling the first, so the qvm-backup process gets forcibly terminated mid-stream. Just prior to the qubesd shutdown I can clearly see that systemd had also shut down and restarted the Qubes memory manager (qubes-qmemman).

Q: What kind of background maintenance processing would actually require qubesd or the memory manager to be restarted?

Q: Could this processing be put on hold during backups?

Q: Or, how could I at least know when this maintenance is scheduled to happen so I could defer my own processing?

My scripts can certainly trap this error, erase the incomplete backup file, loop until qubesd completes its restart, and then restart my own backup processing, but why should this even be necessary?

When this happens it's almost always during the backup of my largest VM, which can take well over 90 minutes to complete. If I could somehow block or defer this kind of system maintenance until after my backups are complete, that would be better than having to deal with trapping random restarts.
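
For what it's worth, the kind of retry wrapper I mean would look roughly like this in my dom0 script (just a sketch; run_backup, the destination path, and the VM name are placeholders for my real backup command, and passphrase handling is omitted):

    #!/bin/bash
    # Hypothetical dom0 wrapper: retry the backup if qubesd gets restarted mid-run.

    run_backup() {
        # Placeholder for the real qvm-backup invocation and its options
        # (passphrase handling omitted here).
        qvm-backup /mnt/backups my-big-vm
    }

    until run_backup; do
        echo "backup failed; assuming qubesd was restarted, retrying" >&2
        # Remove the incomplete backup file here (exact path depends on naming).
        until systemctl is-active --quiet qubesd; do
            sleep 5    # wait for qubesd to come back up
        done
    done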

thanks,

Steve

Sorry for the amount of text below, but Google Groups would not let me send this as *.txt attachments:

qvm-backup error:

Making a backup... 56.28%
Connection to qubesd terminated, reconnecting in 1.0 seconds

Backup error: Got empty response from qubesd. See journalctl in dom0 for details.

Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/qubesadmin/app.py", line 580, in qubesd_call
    client_socket.connect(qubesadmin.config.QUBESD_SOCKET)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/qvm-run", line 5, in <module>
    sys.exit(main())
  File "/usr/lib/python3.5/site-packages/qubesadmin/tools/qvm_run.py", line 199, in main
    args = parser.parse_args(args, app=app)
  File "/usr/lib/python3.5/site-packages/qubesadmin/tools/__init__.py", line 384, in parse_args
    action.parse_qubes_app(self, namespace)
  File "/usr/lib/python3.5/site-packages/qubesadmin/tools/__init__.py", line 170, in parse_qubes_app
    namespace.domains += [app.domains[vm_name]]
  File "/usr/lib/python3.5/site-packages/qubesadmin/app.py", line 91, in __getitem__
    if not self.app.blind_mode and item not in self:
  File "/usr/lib/python3.5/site-packages/qubesadmin/app.py", line 114, in __contains__
    self.refresh_cache()
  File "/usr/lib/python3.5/site-packages/qubesadmin/app.py", line 63, in refresh_cache
    'admin.vm.List'
  File "/usr/lib/python3.5/site-packages/qubesadmin/app.py", line 583, in qubesd_call
    'Failed to connect to qubesd service: %s', str(e))
qubesadmin.exc.QubesDaemonCommunicationError: Failed to connect to qubesd service: [Errno 2] No such file or directory

journalctl output:
Oct 11 03:15:02 dom0 systemd[1]: Stopping Qubes memory management daemon...
Oct 11 03:15:02 dom0 systemd[1]: Stopped Qubes memory management daemon.
Oct 11 03:15:02 dom0 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-qmemman comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 11 03:15:02 dom0 kernel: audit: type=1131 audit(1633936502.556:250): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-qmemman comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 11 03:15:02 dom0 systemd[1]: Starting Qubes memory management daemon...
Oct 11 03:15:07 dom0 qmemman.daemon[18836]: MIN_PREFMEM=199229440 DOM0_MEM_BOOST=349175808 CACHE_FACTOR=1.3
Oct 11 03:15:07 dom0 systemd[1]: Started Qubes memory management daemon.
Oct 11 03:15:07 dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-qmemman comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 11 03:15:07 dom0 kernel: audit: type=1130 audit(1633936507.784:251): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-qmemman comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 11 03:15:07 dom0 qmemman.daemon.algo[18836]: balance_when_enough_memory(xen_free_memory=53353416961, total_mem_pref=0, total_available_memory=53353416961)
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat: xenfree=53405845761 memset_reqs=[]
Oct 11 03:15:07 dom0 qmemman.daemon.algo[18836]: balance_when_enough_memory(xen_free_memory=53353416961, total_mem_pref=0, total_available_memory=53353416961)
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat: xenfree=53405845761 memset_reqs=[]
Oct 11 03:15:07 dom0 qmemman.daemon.algo[18836]: balance_when_enough_memory(xen_free_memory=53353416961, total_mem_pref=2416060825.6000004, total_available_memory=55215546215.4)
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat: dom '0' act=4278190080 pref=2416060825.6000004 last_target=4278190080
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat: xenfree=53405845761 memset_reqs=[('0', 4294967296)]
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: mem-set domain 0 to 4294967296
Oct 11 03:15:07 dom0 qmemman.daemon.algo[18836]: balance_when_enough_memory(xen_free_memory=53336639745, total_mem_pref=2854994739.2000003, total_available_memory=58954139085.8)
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat: dom '0' act=4294967296 pref=2416060825.6000004 last_target=4294967296
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat: dom '3' act=4177526784 pref=438933913.6 last_target=4177526784
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat: xenfree=53389068545 memset_reqs=[('3', 4194304000), ('0', 4294967296)]
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: mem-set domain 3 to 4194304000
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: mem-set domain 0 to 4294967296
Oct 11 03:15:07 dom0 systemd[1]: Stopping Qubes OS daemon...
Oct 11 03:15:07 dom0 qmemman.daemon.algo[18836]: balance_when_enough_memory(xen_free_memory=53319862529, total_mem_pref=2769393254.4, total_available_memory=59039740570.6)
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat: dom '0' act=4294967296 pref=2330459340.8 last_target=4294967296
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat: dom '3' act=4194304000 pref=438933913.6 last_target=4194304000
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: stat: xenfree=53372291329 memset_reqs=[('3', 4194304000), ('0', 4294967296)]
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: mem-set domain 3 to 4194304000
Oct 11 03:15:07 dom0 qmemman.systemstate[18836]: mem-set domain 0 to 4294967296
Oct 11 03:15:07 dom0 qubesd[3180]: caught SIGTERM, exiting
Oct 11 03:15:08 dom0 widget-wrapper[7630]: Connection to qubesd terminated, reconnecting in 1.0 seconds
Oct 11 03:15:09 dom0 kernel: audit: type=1131 audit(1633936508.097:252): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubesd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 11 03:15:08 dom0 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubesd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 11 03:15:09 dom0 widget-wrapper[7625]: Connection to qubesd terminated, reconnecting in 1.0 seconds
Oct 11 03:15:09 dom0 widget-wrapper[7625]: Connection to qubesd terminated, reconnecting in 1.0 seconds
Oct 11 03:15:08 dom0 qmemman.daemon.algo[18836]: balance_when_enough_memory(xen_free_memory=53319862529, total_mem_pref=2620735488.0, total_available_memory=59188398337.0)
Oct 11 03:15:09 dom0 qubesd[3180]: Traceback (most recent call last):
Oct 11 03:15:09 dom0 qubesd[3180]: File "/usr/lib64/python3.5/runpy.py", line 193, in _run_module_as_main
Oct 11 03:15:09 dom0 qubesd[3180]: "__main__", mod_spec)
Oct 11 03:15:09 dom0 qubesd[3180]: File "/usr/lib64/python3.5/runpy.py", line 85, in _run_code
Oct 11 03:15:09 dom0 qubesd[3180]: exec(code, run_globals)
Oct 11 03:15:09 dom0 qubesd[3180]: File "/usr/lib/python3.5/site-packages/qubes/tarwriter.py", line 179, in <module>
Oct 11 03:15:09 dom0 qubesd[3180]: main()
Oct 11 03:15:09 dom0 qubesd[3180]: File "/usr/lib/python3.5/site-packages/qubes/tarwriter.py", line 169, in main
Oct 11 03:15:09 dom0 qubesd[3180]: copy_sparse_data(input_file, output, sparse_map)
Oct 11 03:15:09 dom0 qubesd[3180]: File "/usr/lib/python3.5/site-packages/qubes/tarwriter.py", line 126, in copy_sparse_data
Oct 11 03:15:09 dom0 qubesd[3180]: output_stream.write(buf[:read])
Oct 11 03:15:09 dom0 qubesd[3180]: BrokenPipeError: [Errno 32] Broken pipe
Oct 11 03:15:09 dom0 widget-wrapper[7637]: Connection to qubesd terminated, reconnecting in 1.0 seconds
Oct 11 03:15:08 dom0 systemd[1]: Stopped Qubes OS daemon.
Oct 11 03:15:09 dom0 libvirtd[3204]: 2021-10-11 07:15:08.211+0000: 3204: error : virNetSocketReadWire:1808 : End of file while reading data: Input/output error
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat: dom '0' act=4294967296 pref=2181801574.4 last_target=4294967296
Oct 11 03:15:08 dom0 systemd[1]: Starting Qubes OS daemon...
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat: dom '3' act=4194304000 pref=438933913.6 last_target=4194304000
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat: xenfree=53372291329 memset_reqs=[('3', 4194304000), ('0', 4294967296)]
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: mem-set domain 3 to 4194304000
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: mem-set domain 0 to 4294967296
Oct 11 03:15:08 dom0 qmemman.daemon.algo[18836]: balance_when_enough_memory(xen_free_memory=53319862529, total_mem_pref=2570187161.6, total_available_memory=59238946663.4)
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat: dom '0' act=4294967296 pref=2131253248.0 last_target=4294967296
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat: dom '3' act=4194304000 pref=438933913.6 last_target=4194304000
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat: xenfree=53372291329 memset_reqs=[('3', 4194304000), ('0', 4294967296)]
Oct 11 03:15:09 dom0 widget-wrapper[7630]: Connection to qubesd terminated, reconnecting in 1.0 seconds
Oct 11 03:15:09 dom0 widget-wrapper[7637]: Connection to qubesd terminated, reconnecting in 1.0 seconds
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: mem-set domain 3 to 4194304000
Oct 11 03:15:09 dom0 widget-wrapper[7625]: Connection to qubesd terminated, reconnecting in 1.0 seconds
Oct 11 03:15:09 dom0 widget-wrapper[7625]: Connection to qubesd terminated, reconnecting in 1.0 seconds
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: mem-set domain 0 to 4294967296
Oct 11 03:15:08 dom0 qmemman.daemon.algo[18836]: balance_when_enough_memory(xen_free_memory=53319862529, total_mem_pref=2436657152.0, total_available_memory=59372476673.0)
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat: dom '0' act=4294967296 pref=1997723238.4 last_target=4294967296
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat: dom '3' act=4194304000 pref=438933913.6 last_target=4194304000
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: stat: xenfree=53372291329 memset_reqs=[('3', 4194304000), ('0', 4294967296)]
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: mem-set domain 3 to 4194304000
Oct 11 03:15:08 dom0 qmemman.systemstate[18836]: mem-set domain 0 to 4294967296
Oct 11 03:15:10 dom0 widget-wrapper[7630]: Connection to qubesd terminated, reconnecting in 1.0 seconds
Oct 11 03:15:10 dom0 widget-wrapper[7637]: Connection to qubesd terminated, reconnecting in 1.0 seconds
Oct 11 03:15:10 dom0 widget-wrapper[7625]: Connection to qubesd terminated, reconnecting in 1.0 seconds
Oct 11 03:15:10 dom0 widget-wrapper[7625]: Connection to qubesd terminated, reconnecting in 1.0 seconds
Oct 11 03:15:11 dom0 widget-wrapper[7630]: Connection to qubesd terminated, reconnecting in 1.0 seconds
Oct 11 03:15:11 dom0 widget-wrapper[7637]: Connection to qubesd terminated, reconnecting in 1.0 seconds
Oct 11 03:15:11 dom0 widget-wrapper[7625]: Connection to qubesd terminated, reconnecting in 1.0 seconds
Oct 11 03:15:11 dom0 widget-wrapper[7625]: Connection to qubesd terminated, reconnecting in 1.0 seconds
Oct 11 03:15:11 dom0 systemd[1]: Started Qubes OS daemon.
Oct 11 03:15:11 dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubesd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 11 03:15:11 dom0 kernel: audit: type=1130 audit(1633936511.288:253): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubesd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'

Steve Coleman:

> I seem to have an intermittent problem when my backup scripts are running
> late at night.
>
> My qubesd is apparently being shut down (sent a SIGTERM) by systemd during
> my long-running backup sessions. This causes an EOF/pipe-close exception,
> and qvm-backup then receives a socket exception, immediately followed by a
> second exception while still handling the first, so the qvm-backup process
> gets forcibly terminated mid-stream. Just prior to the qubesd shutdown I can
> clearly see that systemd had also shut down and restarted the Qubes memory
> manager (qubes-qmemman).
>
> Q: What kind of background maintenance processing would actually require
> qubesd or the memory manager to be restarted?

It's crond running logrotate. Fixed in R4.1 but not yet in R4.0:
https://github.com/QubesOS/qubes-issues/issues/5004

> Q: Could this processing be put on hold during backups?

I always sandwich backup runs between 'systemctl stop crond' and
'systemctl start crond'.
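
Schematically, the wrapper is just something like this (a sketch; the qvm-backup arguments are placeholders and error handling is left out):

    #!/bin/bash
    # Keep cron (and thus logrotate) from restarting qubesd mid-backup.
    sudo systemctl stop crond

    qvm-backup /path/to/backups work personal    # placeholder backup command

    sudo systemctl start crond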

Rusty

> Steve Coleman:
>
>> I seem to have an intermittent problem when my backup scripts are running
>> late at night.
>>
>> My qubesd is apparently being shut down (sent a SIGTERM) by systemd during
>> my long-running backup sessions. This causes an EOF/pipe-close exception,
>> and qvm-backup then receives a socket exception, immediately followed by a
>> second exception while still handling the first, so the qvm-backup process
>> gets forcibly terminated mid-stream. Just prior to the qubesd shutdown I can
>> clearly see that systemd had also shut down and restarted the Qubes memory
>> manager (qubes-qmemman).
>>
>> Q: What kind of background maintenance processing would actually require
>> qubesd or the memory manager to be restarted?
>
> It's crond running logrotate. Fixed in R4.1 but not yet in R4.0:
> https://github.com/QubesOS/qubes-issues/issues/5004
>
>> Q: Could this processing be put on hold during backups?
>
> I always sandwich backup runs between 'systemctl stop crond' and
> 'systemctl start crond'.

Great!

I was hoping for an answer like that. In my case the system will be shutting down after the unattended backup completes, so this is actually twice as easy!
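
So the tail end of my nightly script can simply become something like this (a sketch; run_my_backups is a placeholder for my actual qvm-backup calls):

    sudo systemctl stop crond    # no logrotate-triggered qubesd restart during the backup
    run_my_backups               # placeholder for the actual qvm-backup invocations
    sudo poweroff                # the machine shuts down anyway, so crond never needs restarting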

thanks!