CPU temperature sensors in Qubes 4.1

I wanted to monitor the thermal info to see if my CPU cooling works well or if there is a problem and my CPU is overheating.
But sensors can’t find the coretemp device and don’t show CPU temperature.
I searched for this problem and found that in Xen 4.8 it was possible for the Linux driver to get the thermal information, but with this commit:
https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=72e038450d3d5de1a39f0cfa2d2b0f9b3d43c6c6
the thermal/performance leaf of CPUID was hidden from guests.
See related discussions:
https://xen.markmail.org/message/zmm4ug2ivp6y3uxo
https://xen.markmail.org/message/lufcl76bpbdnewpb
https://xen.markmail.org/thread/7dsviqxnchiyt456

Even without the Linux driver, it is still possible to get the thermal information by reading the MSRs directly in Xen up to version 4.14, like this:
0x1a2 is the address of MSR_TEMPERATURE_TARGET; bits 23:16 hold the target temperature. For my CPU it’s 100.
0x1b1 is the address of IA32_PACKAGE_THERM_STATUS; bits 22:16 hold the target temperature minus the current temperature.
So this command gives me the current temperature value:
echo $(( $(rdmsr -f 23:16 -u 0x1a2) - $(rdmsr -p 0 -f 22:16 -u 0x1b1) ))
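
To make the arithmetic concrete, here is a small sketch of what the -f hi:lo option of rdmsr does: it cuts a bit field out of the raw register value. The helper name bitfield and the two raw values are made up for illustration; they are not readings from a real CPU.

```shell
#!/bin/bash
# Hypothetical helper: extract bits hi..lo (inclusive) from a raw MSR value.
bitfield() {
    local raw=$1 hi=$2 lo=$3
    local width=$(( hi - lo + 1 ))
    echo $(( (raw >> lo) & ((1 << width) - 1) ))
}

# Illustrative raw values (assumed, not real readings):
target_raw=0x640000   # bits 23:16 = 0x64 = 100
status_raw=0x2d0000   # bits 22:16 = 0x2d = 45

target=$(bitfield "$target_raw" 23 16)   # target temperature
delta=$(bitfield "$status_raw" 22 16)    # target minus current temperature
echo "current temperature: $(( target - delta )) C"
```

With these sample values the target is 100 and the delta is 45, so the script reports 55.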

But in Xen 4.15 this was restricted as well:
https://xen.markmail.org/message/7rjl55ag5bosadvg
So until the Xen developers provide some interface for reading thermal info from guests, it will be impossible to see the CPU temperature starting with Xen 4.15.

I just wanted to share this info for anyone wondering why they can’t see the CPU thermal info with sensors.


Thanks for sharing!

Here is another quick method:

cat /proc/acpi/<something>/thermal

For me, the something is ibm, for you it might be something else.

You can find the path with

sudo find /proc -name "thermal"

I don’t have anything in /proc, but I have an ACPI sensor in /sys and I can see it with sensors as well:

[root@dom0 user]# cat /sys/class/thermal/thermal_zone0/hwmon0/name 
acpitz
[root@dom0 user]# cat /sys/class/thermal/thermal_zone0/hwmon0/temp1_input 
27800
[root@dom0 user]# sensors
acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C  (crit = +105.0°C)

nvme-pci-0200
Adapter: PCI adapter
Composite:    +25.9°C  (low  = -20.1°C, high = +89.8°C)
                       (crit = +94.8°C)

But this is not the CPU temperature; it is a temperature sensor on the motherboard near the CPU socket.
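
If there are several thermal zones, a short loop can list them all at once. This is only a sketch: it assumes each zone directory has the usual type and temp files, and the function name list_thermal_zones is made up. The base directory is a parameter so the function can also be pointed at a copy of the tree.

```shell
#!/bin/bash
# Print "<zone> <type> <temperature in C>" for each thermal zone under a base directory.
list_thermal_zones() {
    local base=${1:-/sys/class/thermal}
    local zone type milli
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        type=$(cat "$zone/type" 2>/dev/null)
        # Some zones refuse to be read; skip them instead of failing.
        milli=$(cat "$zone/temp" 2>/dev/null) || continue
        # sysfs reports millidegrees Celsius, e.g. 27800 means 27.8 C.
        # This simple formatting assumes a non-negative reading.
        printf '%s %s %d.%d\n' "${zone##*/}" "${type:-unknown}" \
            $(( milli / 1000 )) $(( (milli % 1000) / 100 ))
    done
}

list_thermal_zones
```

On a machine with only the acpitz zone shown above, this would print a single line for thermal_zone0.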


Running sensors-detect might help

I have an MSI MAG Z690 Tomahawk motherboard and an Intel i9-12900K.
I’ve run sensors-detect again and this time I noticed that I do have some unknown sensors in the output.
Found unknown chip with ID 0xd592
So after searching a bit I found out that there is an NCT6686D hardware monitoring IC on the motherboard and that its driver is missing.
I’ve found the driver:

And tried to install it:

sudo qubes-dom0-update make automake gcc gcc-c++ kernel-latest-devel dkms
make dkms/install

But I got this error because of some extra quotes:

make -C /lib/modules/5.17.5-1.fc32.qubes.x86_64/build M=/home/user/Downloads/nct6687d-main2/5.17.5-1.fc32.qubes.x86_64 modules
make[1]: Entering directory '/usr/src/kernels/5.17.5-1.fc32.qubes.x86_64'
/bin/sh: -c: line 0: syntax error near unexpected token `('
/bin/sh: -c: line 0: `if [ "gcc (GCC) 10.3.1 20210422 (Red Hat 10.3.1-1)" != ""gcc (GCC) 10.3.1 20210422 (Red Hat 10.3.1-1)"" ]; then \'
make[1]: *** [Makefile:1717: prepare] Error 1
make[1]: Leaving directory '/usr/src/kernels/5.17.5-1.fc32.qubes.x86_64'
make: *** [Makefile:11: build] Error 2

And I worked around it for now by patching the kernel headers:

sudo sed -i "s/CONFIG_CC_VERSION_TEXT=\"gcc (GCC) 10.3.1 20210422 (Red Hat 10.3.1-1)\"/CONFIG_CC_VERSION_TEXT='gcc (GCC) 10.3.1 20210422 (Red Hat 10.3.1-1)'/g" /usr/src/kernels/5.17.5-1.fc32.qubes.x86_64/include/config/auto.conf
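
The same workaround can be written without hard-coding the compiler version string. This is a sketch, not a tested fix for every kernel: the function name requote_cc_version_text is made up, and it simply swaps the outer double quotes of CONFIG_CC_VERSION_TEXT for single quotes while keeping a .bak copy of the original file.

```shell
#!/bin/bash
# Hypothetical helper: rewrite CONFIG_CC_VERSION_TEXT="..." as CONFIG_CC_VERSION_TEXT='...'
# in the given auto.conf, leaving the original in <file>.bak.
requote_cc_version_text() {
    local conf=$1
    sed -E -i.bak \
        "s/^CONFIG_CC_VERSION_TEXT=\"(.*)\"\$/CONFIG_CC_VERSION_TEXT='\1'/" \
        "$conf"
}

# Example call (run as root; the path assumes the headers of the running kernel):
# requote_cc_version_text "/usr/src/kernels/$(uname -r)/include/config/auto.conf"
```

Because the pattern matches whatever text sits between the quotes, it should not need editing after a gcc or kernel update, and the .bak file makes it easy to revert.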

Then the kernel module installed successfully.
Now I can see some hardware monitoring info from motherboard:

nct6687-isa-0a20
Adapter: ISA adapter
+12V:           12.22 V  (min = +12.22 V, max = +12.22 V)
+5V:             5.01 V  (min =  +5.01 V, max =  +5.01 V)
+3.3V:           3.36 V  (min =  +0.00 V, max =  +3.36 V)
CPU Soc:         1.29 V  (min =  +1.29 V, max =  +1.35 V)
CPU Vcore:     678.00 mV (min =  +0.68 V, max =  +0.68 V)
CPU 1P8:         0.00 V  (min =  +0.00 V, max =  +0.00 V)
CPU VDDP:        0.00 V  (min =  +0.00 V, max =  +0.00 V)
DRAM:            2.55 V  (min =  +2.55 V, max =  +2.72 V)
Chipset:         1.35 V  (min =  +1.35 V, max =  +1.35 V)
CPU Fan:          0 RPM  (min =    0 RPM, max =    0 RPM)
Pump Fan:         0 RPM  (min =    0 RPM, max =    0 RPM)
System Fan #1:    0 RPM  (min =    0 RPM, max =    0 RPM)
System Fan #2:    0 RPM  (min =    0 RPM, max =    0 RPM)
System Fan #3:    0 RPM  (min =    0 RPM, max =    0 RPM)
System Fan #4:    0 RPM  (min =    0 RPM, max =    0 RPM)
System Fan #5:    0 RPM  (min =    0 RPM, max =    0 RPM)
System Fan #6:    0 RPM  (min =    0 RPM, max =    0 RPM)
CPU:            +33.0°C  (low  = +33.0°C, high = +42.0°C)
System:         +34.0°C  (low  = +34.0°C, high = +35.0°C)
VRM MOS:        +42.0°C  (low  = +42.0°C, high = +42.0°C)
PCH:            +51.0°C  (low  = +51.0°C, high = +51.0°C)
CPU Socket:     +34.0°C  (low  = +34.0°C, high = +34.0°C)
PCIe x1:         +8.0°C  (low  =  +8.0°C, high =  +8.0°C)
M2_1:           +31.0°C  (low  = +31.0°C, high = +31.0°C)

I guess the CPU temperature here is the CPU package temperature.

I also have an unknown SMBus device in the sensors-detect output:
Found unknown SMBus adapter 8086:7aa3 at 0000:00:1f.4.
It seems to be the Alder Lake-S PCH SMBus Controller and I can see that its driver is loaded:

[user@dom0 ~]$ lsmod | grep i2c_i801
i2c_i801               36864  0
i2c_smbus              20480  1 i2c_i801

But the coretemp driver can’t be loaded as it can’t find the device:

[user@dom0 ~]$ lsmod | grep coretemp
[user@dom0 ~]$ sudo modprobe coretemp
modprobe: ERROR: could not insert 'coretemp': No such device

I’ve found this page where it says:

For my system coretemp and i2c-i801 driver need to loaded in order to see sensors data.

https://www.cyberciti.biz/faq/howto-linux-get-sensors-information/

So it seems to me that coretemp can’t find the device because Xen hid it.
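
One way to check this is to look at the CPU flags the kernel sees. As far as I know, coretemp only binds when the dtherm flag (digital thermal sensor) is advertised through CPUID, so its absence in dom0 would be consistent with Xen hiding the leaf. The helper name has_cpu_flag below is made up; the optional second argument is only there so it can be tried against a saved copy of cpuinfo.

```shell
#!/bin/bash
# Return success if the given flag appears in the "flags" line of a cpuinfo file.
has_cpu_flag() {
    local flag=$1 cpuinfo=${2:-/proc/cpuinfo}
    grep -m1 '^flags' "$cpuinfo" 2>/dev/null | tr ' \t' '\n\n' | grep -qx "$flag"
}

if has_cpu_flag dtherm; then
    echo "dtherm is visible: the digital thermal sensor is advertised"
else
    echo "dtherm is not visible: coretemp has nothing to bind to"
fi
```

Running the same check on the same hardware booted into a plain Linux (without Xen) would be a useful comparison.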

Also, the acpitz sensor seems to be a dummy one, as its temperature is not changing.
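
A quick way to confirm that a sensor is stuck is to sample it repeatedly while the machine is under load and count how many different values come back. This is a rough sketch; the function name count_distinct_readings and its default count and interval are made up.

```shell
#!/bin/bash
# Hypothetical helper: read a sensor file `count` times, `interval` seconds apart,
# and print how many distinct values were seen.
count_distinct_readings() {
    local file=$1 count=${2:-10} interval=${3:-1}
    local i
    for i in $(seq "$count"); do
        cat "$file"
        sleep "$interval"
    done | sort -u | wc -l
}

# Example: 30 samples, 2 seconds apart, from the ACPI zone shown earlier:
# count_distinct_readings /sys/class/thermal/thermal_zone0/temp 30 2
```

A result of 1 over a minute of heavy load would suggest the value is static rather than a live reading.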
