Putting this in cron to run every 5 min in dom0 can warn you before the system grinds to a halt because of dom0 memory. (Adjust the “4” to whatever you want your memory threshold to be in gigs, and adjust the expire time (in milliseconds).)
# "available" column of free -g, i.e. usable dom0 memory in gigs
FREE_MEMORY=$(free -g | grep '^Mem' | awk '{print $7}')
if (( $FREE_MEMORY < 4 )); then
    # expire-time is in milliseconds; 360000 = 6 min, one tick longer than the cron interval
    notify-send --expire-time=360000 --urgency=critical 'RUNNING OUT OF DOM0 MEMORY!!!!!!' "DOM0 memory is down to $FREE_MEMORY Gigs... DO SOMETHING!!!!! :) "
fi
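For reference, a minimal cron sketch (assuming the script is saved as /bin/dom0-Memory-Notification and made executable; the DISPLAY and D-Bus values are assumptions, since notify-send usually needs them when launched from cron):
# hypothetical dom0 crontab entry (crontab -e), running the check every 5 min as the desktop user
*/5 * * * * env DISPLAY=:0 DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus /bin/dom0-Memory-Notification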
Similarly, putting this in cron to run every 5 min in dom0 can warn you before the system runs out of Xen memory for new qubes. (Adjust the “8” to whatever you want your memory threshold to be in gigs, and adjust the expire time (in milliseconds).) (Note: this situation was a bigger problem in Qubes 4.1 than in Qubes 4.2.)
# xl info reports free_memory in MiB, so /1000 gives rough gigs
FREE_MEMORY=$(( $(xl info | grep free_memory | sed 's/^.*:\([0-9]*\)/\1/') / 1000 ))
if (( $FREE_MEMORY < 8 )); then
    # expire-time is in milliseconds; 360000 = 6 min
    notify-send --expire-time=360000 --urgency=critical 'RUNNING OUT OF XEN MEMORY!' "Xen memory is down to $FREE_MEMORY Gigs... Kill some VM's!"
fi
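For context, the line being parsed out of xl info looks roughly like this (hypothetical value, reported in MiB, which is why the script divides by 1000 to get rough gigs):
$ xl info | grep free_memory
free_memory            : 2048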
Quite misleading either way, unfortunately. Both values are down to ridiculously low amounts after long uptime – and the system still has some caches to free when it needs memory… I have 393 megs free as reported by xl info and 1 gig free in dom0 – and the system runs smoothly for weeks.
How many qubes do you run at a time? (And how often do you shut down/launch new ones?)
My system used to suddenly slow down to the point of having to do a hard power off. I found it was a dom0 memory problem. My dom0 memory will drop like crazy, but I run a lot of qubes (I checked, and at this moment I have 26 qubes running). I believe the issue is the compositor storing all the redraw information. To compensate for this, I changed the trigger from the 8 gigs I use on my own system to 4 gigs when posting it. But maybe I missed the mark by an order of magnitude.
If you are stable at 1000 megs in dom0, we could change the free -g part to free -m, and change the 4 to 500. The objective is just to not let it get to 0. That might be more applicable to the average user.
Are you running 4.1 or 4.2? It's critical in 4.1 not to let the xl memory get down to 0, because the memory balloon doesn't seem to actually work in 4.1, and it will cause a newly launched VM to just be killed off, and possibly other undesirable outcomes. The memory balloon does seem to actually work in 4.2, though, so avoiding getting down to 0 memory in 4.2 is not so critical. While I still hesitate to leave my system at 0 gigs of xl memory for more than a minute or two, it's possible it's actually fine to leave it there in 4.2. In that case we would need to come up with a way to determine the free memory that's available to the memory balloon inside each of the running qubes, sum it all together, and add it to the xl free memory.
Due to the recent discussion, in “/bin/dom0-Memory-Notification”, can you change free -g to free -m
and $FREE_MEMORY < 4 to $FREE_MEMORY < 500
and "Remaining: $FREE_MEMORY " to "Remaining: $FREE_MEMORY Megs"?
Also, maybe add something to emphasize that it just warns when under the threshold and that they need to set the threshold to meet their needs? (I'm fine if you want to turn the 500 into a named variable; a sketch along those lines is below.)
Also, for /bin/Xen-Memory-Notification can you change "Remaining: $FREE_MEMORY" to "Remaining: $FREE_MEMORY Gigs"?
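Something like this, maybe (a sketch of the megabyte version with the threshold pulled out into a named variable; it only warns when free dom0 memory drops below the threshold, so set THRESHOLD_MB to whatever fits your system):
# warn when the "available" column of free -m (dom0, in megs) drops below the threshold
THRESHOLD_MB=500   # adjust to your needs; the script only warns, it does not free anything
FREE_MEMORY=$(free -m | grep '^Mem' | awk '{print $7}')
if (( $FREE_MEMORY < $THRESHOLD_MB )); then
    notify-send --expire-time=360000 --urgency=critical 'RUNNING OUT OF DOM0 MEMORY!!!!!!' "DOM0 memory is down to $FREE_MEMORY Megs... DO SOMETHING!!!!! :) "
fi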
I run about 10-12 desktop qubes and 5-7 service qubes, starting and shutting down often.
It is 4.2, 342M free now in xl info, uptime is 52 days and the system is perfectly healthy.
If you are running 4.2, then try this for xen memory:
# free Xen memory (xl info reports MiB, so /1000 gives rough gigs)
FREE_MEMORY=$(( $(xl info | grep free_memory | sed 's/^.*:\([0-9]*\)/\1/') / 1000 ))
# sum the "available" column of free -m inside every running qube (tail skips the header line and dom0)
POSSIBLY_RECLAIMABLE_MEMORY=$(( $(xl vm-list | tail -n +3 | awk '{print $3}' | xargs -I {} qvm-run --pass-io {} "free -m | grep 'Mem:'" | awk '{print $7}' | paste -sd+ | bc) / 1000 ))
if (( $FREE_MEMORY + $POSSIBLY_RECLAIMABLE_MEMORY < 8 )); then
    notify-send --expire-time=360000 --urgency=critical 'RUNNING OUT OF XEN MEMORY!' "Xen memory is down to $FREE_MEMORY Gigs... with $POSSIBLY_RECLAIMABLE_MEMORY Gigs in possibly reclaimable memory... Kill some VM's!"
    # setting expire-time to 6 min (cron check is every 5 min) (360000 = 6 min)
fi
It includes what I believe would be the reclaimable memory in the running qubes, and so it should be good for 4.2. Of course it's more complicated than the old version, meaning it's harder to understand and audit, which means people should (would?) be more reluctant to use it. Also, it's substantially more processing on dom0 now.
I’ve done substantially more aggressive testing with 4.2, to where I overcommit the memory now and leave the “free xen memory” at 0 during standard operation.
Recently I got a “not enough memory” error when trying to start a VM. I checked, and there was more than 8 gigs available when looking at the free memory of each qube and adding it to the “free xen memory”.
There must be something else going on. Maybe it reserves a certain amount of memory per qube that it refuses to reclaim even though the memory is not in use, or something like that. If anyone has ideas, please speak up.
Obviously I could just change the script to trigger at 9 gigs instead of 8, but I would prefer to figure out what's really going on and come up with a better way of estimating.
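In case it helps with the digging, a rough diagnostic sketch (reusing the xl vm-list pipeline from above; the xl list column positions and the qvm-prefs maxmem lookup are how I would check it, so treat them as assumptions) to compare each running qube's current allocation with its configured maxmem:
# show current allocation (xl list Mem column, in MiB) next to configured maxmem for each running qube
for vm in $(xl vm-list | tail -n +3 | awk '{print $3}'); do
    cur=$(xl list "$vm" | tail -n 1 | awk '{print $3}')
    max=$(qvm-prefs "$vm" maxmem)
    echo "$vm: current=${cur}M maxmem=${max}M"
done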
For the dom0 memory, I have found that the hard-to-explain memory consumption culprit seems to be “slab memory”.
Doing echo 2 > /proc/sys/vm/drop_caches
in dom0 (as root) can reclaim a tiny bit of memory.
(There is also an echo 3 > /proc/sys/vm/drop_caches option, but that does not seem to help any more than “echo 2” does.)
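A quick way to watch the slab numbers before and after (the /proc/meminfo fields are in kB; treat this as a sketch, and run it as root in dom0):
grep -E 'Slab|SReclaimable|SUnreclaim' /proc/meminfo
sync
echo 2 > /proc/sys/vm/drop_caches   # drops reclaimable slab objects (dentries and inodes)
grep -E 'Slab|SReclaimable|SUnreclaim' /proc/meminfo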
Also of note, the dom0 slab memory seems to occasionally grow at a crazy rate (like a gigabyte every 5 min or so) when running libreoffice in a qube. I don’t know why this would be, but it seems very correlated.