NovaCustom NV54 Laptop no dGPU

FinBob · January 17, 2025, 5:44pm

I’ve switched from an NV54 with dGPU (see this thread) to one without the dGPU, since I don’t really need it and it’s expensive.
Thanks to @novacustom who facilitated that switch and was friendly and very responsive!

This thread is a continuation of my efforts to get the laptop working flawlessly with Qubes.

Hardware:

Intel® Core™ Ultra 7 processor 155H
Display: 2880x1800 @ 120 Hz
Intel Corporation Wi-Fi 7(802.11be) AX1775*/AX1790*/BE20*/BE401/BE1750* 2x2 (rev 1a)
96 GB DDR5 RAM

BIOS Settings:

Secure Boot disabled
Intel ME disabled (HPA)
Everything else stock. Coreboot doesn’t have many settings though. Latest version 0.9.0 07/17/2024

Working out of the box:

Suspend / Resume with systemctl suspend (awesome!). However sometimes the fan keeps spinning while in suspend. Will investigate later.
LAN
Sound output & Mic
Keyboard, Touchpad
External 5k Monitor via USB-C (including charging the laptop, monitor USB hub, 2.5 Gbps LAN etc)
Webcam

Note: You need to run kernel-latest (6.12.5 currently) for the integrated graphics to work.

Problems yet unfixed:

~~Only 16 of the 22 CPU threads are detected~~
Solved: Qubes disables SMT for security reasons and 16 is indeed the real number of cores
Lots of errors in the kernel boot log (see next post)
The internal screen is too small to read comfortably at its native resolution. DPI workarounds don’t suit me, since I mostly work docked to an external monitor with normal DPI. Running the internal panel at 1680x1050@60Hz for now.
Battery: When plugged in the battery oscillates between 93 and 98% (the latter is the max charge setting in the BIOS). I got a feeling that when connected to the power supply it still goes through the battery, which I guess is bad. Haven’t investigated too much yet.

FinBob · January 17, 2025, 5:47pm

I see quite a few errors in the dom0 kernel log:

Error #1:

[    1.030499] CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 20 != 10
[    1.030500] CPU topo: [Firmware Bug]: APIC enumeration order not specification compliant
[    1.030517] CPU topo: CPU limit of 16 reached. Ignoring further CPUs
...
[    1.030658] CPU topo: Max. logical packages:   1
[    1.030659] CPU topo: Max. logical dies:       1
[    1.030659] CPU topo: Max. dies per package:   1
[    1.030664] CPU topo: Max. threads per core:   2
[    1.030665] CPU topo: Num. cores per package:    10
[    1.030666] CPU topo: Num. threads per package:  16
[    1.030666] CPU topo: Allowing 16 present CPUs plus 0 hotplug CPUs
[    1.030667] CPU topo: Rejected CPUs 6

The CPU has 22 threads but in dom0 I only see 16 and 6 have explicitly been rejected.
xl info output seems to confirm the problem:

nr_cpus                : 16
max_cpu_id             : 21
nr_nodes               : 1
cores_per_socket       : 1
threads_per_core       : 16

Error #2:

[    1.531804] ACPI BIOS Warning (bug): Incorrect checksum in table [BGRT] - 0x5D, should be 0x11 (20240827/utcksum-58)

I guess that’s harmless

Error #3:

[    1.531982] [Firmware Bug]: CPU   0: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0020
[    0.008760] [Firmware Bug]: CPU   1: APIC ID mismatch. CPUID: 0x0001 APIC: 0x0021
[    0.008760] [Firmware Bug]: CPU   1: APIC ID mismatch. Firmware: 0x0010 APIC: 0x0021
[    1.558163] cpu 1 spinlock event irq 171
[    0.008760] [Firmware Bug]: CPU   3: APIC ID mismatch. CPUID: 0x0003 APIC: 0x0029
[    0.008760] [Firmware Bug]: CPU   3: APIC ID mismatch. Firmware: 0x0018 APIC: 0x0029
(lots more of these; the timestamps really are not in chronological order)
[    0.008760] [Firmware Bug]: CPU  11: APIC ID mismatch. CPUID: 0x000b APIC: 0x0006
[    0.008760] [Firmware Bug]: CPU  11: APIC ID mismatch. Firmware: 0x0039 APIC: 0x0006
[    1.563025] cpu 11 spinlock event irq 215
[    1.563945] smp: Brought up 1 node, 16 CPUs

Maybe related to the kernel not recognizing all CPUs?

Error #4: Missing vpu Firmware

[    4.605400] intel_vpu 0000:00:0b.0: [drm] *ERROR* ivpu_fw_request(): Failed to request firmware: -2
[    4.606833] intel_vpu 0000:00:0b.0: [drm] ivpu_hw_power_down(): NPU not idle during power down
[    4.608200] intel_vpu 0000:00:0b.0: probe with driver intel_vpu failed with error -2

The VPU (Intel’s “Versatile Processing Unit”, i.e. the NPU) on the CPU needs firmware that seems to exist but isn’t available in Fedora yet. I guess there is no downside since I don’t need it.

Error #5: NVMe failed to allocate host memory buffer

[    4.617487] nvme 0000:01:00.0: platform quirk: setting simple suspend
[    4.617500] nvme 0000:02:00.0: platform quirk: setting simple suspend
[    4.618617] nvme nvme0: pci function 0000:01:00.0
[    4.621010] nvme nvme1: pci function 0000:02:00.0
[    4.628239] nvme nvme0: D3 entry latency set to 10 seconds
[    4.634880] nvme nvme0: 16/0/0 default/read/poll queues
[    4.641186] nvme nvme1: failed to allocate host memory buffer.
[    4.651220]  nvme0n1: p1 p2 p3 p4 p5 p6
[    4.716788] nvme nvme1: 16/0/0 default/read/poll queues

This happens only on one of the 2 NVMes. It’s the brand new one that came with the laptop and hasn’t even been formatted yet. Qubes is running on nvme0 which is an old Samsung 990 Pro I transferred from another machine and doesn’t have this error.
Error seems harmless and at worst performance-degrading.

Error #6:

[   14.876440] proc_thermal_pci 0000:00:04.0: error: proc_thermal_add, will continue

The device in question seems to be the Meteor Lake-P Dynamic Tuning Technology and should be supported from kernel 5.19 onwards:

00:04.0 Signal processing controller: Intel Corporation Device 7d03 (rev 04)

Error #7: DDR5 temperature sensor fail after resuming from S3

[ 1621.796873] spd5118 16-0050: Failed to write b = 0: -6
[ 1621.796889] spd5118 16-0050: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
[ 1621.796962] spd5118 16-0050: PM: failed to resume async: error -6
[ 1621.797139] spd5118 16-0052: Failed to write b = 0: -6
[ 1621.797148] spd5118 16-0052: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
[ 1621.797170] spd5118 16-0052: PM: failed to resume async: error -6

Coreboot Issue Tracker:
I’ve commented with these errors on this coreboot issue.

FinBob · January 17, 2025, 8:38pm

Oh and that one is annoying:

When working undocked (on the internal panel) with 1680x1050@60Hz every ~30 minutes the whole system freezes for 10 seconds.

[ 8159.750028] i915 0000:00:02.0: [drm] *ERROR* [CRTC:82:pipe A] flip_done timed out
[ 8169.990070] i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
[ 8169.990082] i915 0000:00:02.0: [drm] *ERROR* [CRTC:82:pipe A] commit wait timed out

I’m not sure if it happens on the native resolution as well.

marmarek · January 18, 2025, 2:29am

[On Error #1] This is intended, due to disabled hyper-threading.

[On Error #2] BGRT table is not supported yet, and this is intentional.

[On Error #3] This I think is a Xen issue that is fixed in a newer version, but should be harmless.

[On Error #4] Correct. And even with the firmware, it wouldn’t be very useful in dom0 anyway…

[On Error #5] This I haven’t seen before, can you open a separate issue on GitHub about it?

FinBob:

Error #7: DDR5 temperature sensor fail after resuming from S3

[ 1621.796873] spd5118 16-0050: Failed to write b = 0: -6
[ 1621.796889] spd5118 16-0050: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
[ 1621.796962] spd5118 16-0050: PM: failed to resume async: error -6
[ 1621.797139] spd5118 16-0052: Failed to write b = 0: -6
[ 1621.797148] spd5118 16-0052: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
[ 1621.797170] spd5118 16-0052: PM: failed to resume async: error -6

This is worth separate tracking. At this point I don’t know if it’s a Xen issue or maybe a Coreboot (Dasharo) one.

FinBob:

[ 8159.750028] i915 0000:00:02.0: [drm] *ERROR* [CRTC:82:pipe A] flip_done timed out
[ 8169.990070] i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
[ 8169.990082] i915 0000:00:02.0: [drm] *ERROR* [CRTC:82:pipe A] commit wait timed out

This is likely related to the lower resolution. I’ve seen similar (but not identical) messages with much lower resolution, but without freezes. And the issue is gone with native resolution.
You can also try disabling the PageFlip feature by adding an Xorg conf snippet in /etc/X11/xorg.conf.d/10-page-flip.conf:

Section "OutputClass"
  Identifier "GPU"
  Driver "modesetting"
  Option "PageFlip" "off"
EndSection

Let me know if that helps (and whether you observe any side effects).

FinBob · January 18, 2025, 8:25am

Appreciate your replies!

Yeah, seems like the CPU is not yet fully “supported”, see this message in xl dmesg:

(XEN) Unrecognised CPU model 0xaa - assuming vulnerable to LazyFPU

github.com/QubesOS/qubes-issues

nvme1 failed to allocate host memory buffer (NovaCustom NV54 laptop)

opened 07:56AM - 18 Jan 25 UTC

lfinbob

P: default

### Qubes OS release

Qubes OS 4.2 (latest stable release)

### Brief summary

Device: NV54 Laptop from NovaCustom
nvme0: Samsung SSD 990 4TB - added by me in the free M.2 slot
nvme1: SSDPR-PX770-01T-80 1TB - came pre-installed with the laptop in the M.2 slot

nvme1 has this error in the kernel log. It happens once on every boot and on every resume from S3.

[ 4.617487] nvme 0000:01:00.0: platform quirk: setting simple suspend
[ 4.617500] nvme 0000:02:00.0: platform quirk: setting simple suspend
[ 4.618617] nvme nvme0: pci function 0000:01:00.0
[ 4.621010] nvme nvme1: pci function 0000:02:00.0
[ 4.628239] nvme nvme0: D3 entry latency set to 10 seconds
[ 4.634880] nvme nvme0: 16/0/0 default/read/poll queues
**[ 4.641186] nvme nvme1: failed to allocate host memory buffer.**
[ 4.651220] nvme0n1: p1 p2 p3 p4 p5 p6
[ 4.716788] nvme nvme1: 16/0/0 default/read/poll queues

Qubes is running off nvme1 and nvme0 is yet unused.
I experience no actual problems, only saw the error in the kernel log. dd indicates it's working fine, at reasonable speed:

```
sudo dd if=/dev/zero of=/dev/nvme1n1 bs=1G count=4 oflag=direct
4+0 records in
4+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 1.90272 s, 2.3 GB/s

sudo dd if=/dev/nvme1n1 of=/dev/null bs=1G count=4
4+0 records in
4+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 2.91503 s, 1.5 GB/s
```

Qubes Forum post about that laptop: https://forum.qubes-os.org/t/novacustom-nv54-laptop-no-dgpu/31574/2

### Steps to reproduce

1. Boot Qubes
2. journalctl -b | grep nvme

### Additional information

* `Linux version 6.12.5-2.qubes.fc37.x86_64` (from kernel-latest package)

While writing the issue I figured out that the DDR5 temperature sensors do work, even after these errors.

github.com/QubesOS/qubes-issues

DDR5 temperature sensor (spd5118) fails after resuming from S3 (NovaCustom NV54 laptop)

opened 08:14AM - 18 Jan 25 UTC

lfinbob

P: default

### Qubes OS release

Qubes OS 4.2 (latest stable release)

### Brief summary

Device: NV54 Laptop from NovaCustom
Firmware: `coreboot: 0.9.0 07/17/2024` (the latest for this laptop)
kernel: `6.12.5-2.qubes.fc37.x86_64`

Error in kernel log:

```
[ 18.573184] spd5118 16-0050: DDR5 temperature sensor: vendor 0x0e:0x8f revision 1.2
[ 18.583174] spd5118 16-0052: DDR5 temperature sensor: vendor 0x0e:0x8f revision 1.2
... (supend and resume) ...
[ 1621.796873] spd5118 16-0050: Failed to write b = 0: -6
[ 1621.796889] spd5118 16-0050: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
[ 1621.796962] spd5118 16-0050: PM: failed to resume async: error -6
[ 1621.797139] spd5118 16-0052: Failed to write b = 0: -6
[ 1621.797148] spd5118 16-0052: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
[ 1621.797170] spd5118 16-0052: PM: failed to resume async: error -6
```

It happens only after resuming from S3 (so far at least).
Reading the temperature still seems to work after resume. I see values in `/sys/class/hwmon/hwmon{4,5}/temp1_input` that change very slightly every few seconds and seem sensible (around 48 degree Celsius).
So very low-prio/minor issue.

Detailed sensor readings:

```
# ll /sys/class/hwmon/hwmon4/
lrwxrwxrwx 1 root root 0 Jan 18 09:02 device -> ../../../16-0050
-r--r--r-- 1 root root 4096 Jan 17 18:32 name
drwxr-xr-x 2 root root 0 Jan 18 09:02 power
lrwxrwxrwx 1 root root 0 Jan 18 09:02 subsystem -> ../../../../../../../class/hwmon
-rw-r--r-- 1 root root 4096 Jan 18 09:02 temp1_crit
-r--r--r-- 1 root root 4096 Jan 18 09:02 temp1_crit_alarm
-rw-r--r-- 1 root root 4096 Jan 18 09:02 temp1_enable
-r--r--r-- 1 root root 4096 Jan 18 09:02 temp1_input
-rw-r--r-- 1 root root 4096 Jan 18 09:02 temp1_lcrit
-r--r--r-- 1 root root 4096 Jan 18 09:02 temp1_lcrit_alarm
-rw-r--r-- 1 root root 4096 Jan 18 09:02 temp1_max
-r--r--r-- 1 root root 4096 Jan 18 09:02 temp1_max_alarm
-rw-r--r-- 1 root root 4096 Jan 18 09:02 temp1_min
-r--r--r-- 1 root root 4096 Jan 18 09:02 temp1_min_alarm
-rw-r--r-- 1 root root 4096 Jan 18 09:02 uevent

# cat /sys/class/hwmon/hwmon4/*
cat: /sys/class/hwmon/hwmon4/device: Is a directory
spd5118
cat: /sys/class/hwmon/hwmon4/power: Is a directory
cat: /sys/class/hwmon/hwmon4/subsystem: Is a directory
85000
0
1
47500
0
0
55000
0
0
0

# cat /sys/class/hwmon/hwmon5/*
cat: /sys/class/hwmon/hwmon5/device: Is a directory
spd5118
cat: /sys/class/hwmon/hwmon5/power: Is a directory
cat: /sys/class/hwmon/hwmon5/subsystem: Is a directory
85000
0
1
48000
0
0
55000
0
0
0
```

### Steps to reproduce

1. Boot
2. Suspend to S3 with `systemctl suspend`
3. Resume
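For anyone who wants to repeat the sensor check after a resume, here is a minimal sketch that prints every spd5118 temperature in degrees Celsius. It relies on the hwmon convention that `temp1_input` is reported in millidegrees Celsius, so the 47500 quoted in the issue is 47.5 °C. The helper name `millideg_to_c` is made up for this example, and the hwmon numbering (hwmon4, hwmon5) may differ on other machines, which is why the loop matches on the driver name instead.

```shell
# Sketch: convert a hwmon millidegree reading to degrees Celsius.
millideg_to_c() {
    awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1000 }'
}

# Walk all hwmon devices and print the ones driven by spd5118 (DDR5 SPD hub).
# Directories without a readable "name" file are skipped silently.
for dir in /sys/class/hwmon/hwmon*; do
    [ -r "$dir/name" ] || continue
    [ "$(cat "$dir/name")" = "spd5118" ] || continue
    printf '%s: %s °C\n' "$dir" "$(millideg_to_c "$(cat "$dir/temp1_input")")"
done
```

If the readings still change a little between runs after a suspend/resume cycle, the sensors are alive despite the resume errors in the log.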

marmarek:

This is likely related to the lower resolution. I’ve seen similar (but not identical) messages with much lower resolution, but without freezes. And the issue is gone with native resolution.
You can also try disabling the PageFlip feature by adding an Xorg conf snippet in /etc/X11/xorg.conf.d/10-page-flip.conf:
Section "OutputClass"
  Identifier "GPU"
  Driver "modesetting"
  Option "PageFlip" "off"
EndSection
Let me know if that helps (and whether you observe any side effects).

Testing, will report back later.

marmarek · January 18, 2025, 9:24am

This one is misleading. Fixed model doesn’t exist yet…

absent · January 18, 2025, 10:38am

Thank you for the report! How is the performance, browsing, video? Fan noise under load?

FinBob · January 18, 2025, 5:09pm

Performance:

Feels smooth and fast. It’s an awesome machine for me, running 8-14 Qubes in parallel with browsers, emails and various office-type apps, plus occasionally compiling, copying, imaging external drives, etc.

Browsing photo + video collections on a CIFS share from the NAS with thunar is smooth as well. Playing an mp4 in a maximized vlc window at 1680x1050 occasionally has a tiny short tear/hang that is barely noticeable, but would annoy me when watching a full length video. I’ve set the vlc output driver to x11 and didn’t experiment further.
I really love the 96GB of RAM and got rid of swapping (also I reduced vm.swappiness from 60 to 1 in all qubes).

Only remaining issue: websites with heavy animations and video still feel a little sluggish.
Playing YouTube in theater mode on the internal display is okay-ish, but fast moving scenes feel a little sluggish (slightly more than VLC). It would be annoying to watch for longer. YouTube stats for nerds shows 1280x720@50. The qube has 6 vCPUs and 16GB RAM assigned statically.
Even switching to Mullvad Browser and disabling hardware acceleration as suggested in this thread doesn’t improve the experience.
Haven’t watched a movie on an external 4k screen, but guess it’s bad.

Fan:
Noise at full speed is quite good, it’s clearly audible but not noisy.
I use the silent profile which ramps up to 100% fan speed only at higher temps and I rarely hit that level in daily use.
At low speed (20%) you can hear the fan in a silent room.

Performance:
I have a qube that CPU mines Monero whenever I have solar surplus. I get ~4200H/s sustained and ~4500H/s peak using all 16 cores and can still use the system.

CPU frequencies:
I have yet to figure out how to watch CPU frequencies of the 6 performance cores, 8 efficient cores and 2 ultra-efficient cores live. Curious how they clock with and without thermal headroom.
Tools like htop in dom0 show a static frequency of 2995 MHz.

The output of xenpm get-cpufreq-para shows that xen seems to know the various cores’ base and turbo frequency correctly, AFAICS:

The output of xenpm get-cpufreq-states says all cores are spending all their time at P0 at 1401 MHz (highest state) - which can’t be true?!

Happy for any tips on how to check this.

Onto the next issue:

marmarek:

You can also try disabling the PageFlip feature by adding an Xorg conf snippet in /etc/X11/xorg.conf.d/10-page-flip.conf
Section "Device"
  Identifier "GPU"
  Driver "modesetting"
  Option "PageFlip" "off"
EndSection
Let me know if that helps (and whether you observe any side effects).

No more hangs, awesome, ty! No side effects (like tearing) so far.
For future readers, note that I fixed a syntax error in the quoted text by changing the Section from “OutputClass” to “Device”. So copy this quote, not the original.

FinBob · January 18, 2025, 5:22pm

Battery time with web surfing seems around 3.5 hours. As all Qubes users know this depends heavily on which tabs are open, as the software rendering can consume a lot of CPU. This was with ~20 tabs and 75% screen brightness.

JocularMarrow · January 18, 2025, 5:43pm

Interesting, Mullvad fixed all my issues. Had to try a 4k external display with a 4k YouTube video. And I gotta say it runs better than expected: it looks damn good, with some choppiness (micro stutters) sometimes, but very watchable at 3840x2160@60.

tanky0u · January 19, 2025, 12:59pm

That’s cool.

marmarek · January 20, 2025, 1:21pm

@Demi see above about the “PageFlip” option. That’s the second instance (I know of) where setting this option helps when using non-native resolution. What downsides can it have? Any idea why it is necessary? Maybe we should consider setting it by default?

solene · January 20, 2025, 1:41pm

It works fine for me on an i7-1260P in Firefox with 4 vCPU displaying it fullscreen in 1920x1080

Did you enable SMT?

FinBob · January 22, 2025, 9:41pm

Update: I experienced hangs again on the non-native resolution.
X log confirms that the PageFlip option is indeed off. The error message changed slightly:

# grep PageFlip /var/log/Xorg.0.log
[    35.380] (**) modeset(0): Option "PageFlip" "false"

# dmesg | grep drm
[ 3297.023326] i915 0000:00:02.0: [drm] *ERROR* The master control interrupt lied (DE PIPE)!
[ 3307.342322] i915 0000:00:02.0: [drm] *ERROR* [CRTC:82:pipe A] flip_done timed out
[ 3318.606342] i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
[ 3318.606361] i915 0000:00:02.0: [drm] *ERROR* [CRTC:82:pipe A] commit wait timed out

The freezes are much less frequent, so turning off PageFlip reduced the problem but did not eliminate it.

FinBob · January 22, 2025, 10:14pm

I did just now:

nr_cpus                : 22
max_cpu_id             : 21
nr_nodes               : 1
cores_per_socket       : 11
threads_per_core       : 2

And WOW it helps! I’ve just played the British Grand Prix 2024 highlights and watched it in one go. It was pretty awesome, only a few microglitches.
Laptop on battery consumed 13% battery at 1920x1200@60Hz.
Thanks for the great tip!

solene · January 22, 2025, 10:17pm

You have to be aware that it opens up new attack surface.

I guess that with twice as many performance-core threads available (only the performance cores have hyper-threading), the system can load-balance better between P and E cores, while without SMT you have more E cores than P cores, so the system is slowed down.

It’s possible to pin P cores to a qube, but it’s not super practical… Unfortunately, Xen is dumb and can’t tell the difference between E and P cores when a qube needs CPU; it just assigns the task to a random one.

FinBob · January 22, 2025, 10:23pm

Makes a lot of sense! It’s been a noticeable improvement in the ~30 minutes undocked on battery, feeling smoother under heavy load such as rendering a YouTube home page with lots of thumbnail videos.

It makes total sense that smt=on skews towards the performance cores, which explains what I feel.

@novacustom you guys could make this config (smt=on in /etc/default/grub) an option in your awesome configurator.
It improves performance a lot.
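For readers who want to try the same change, a minimal sketch of the edit in dom0. It assumes a stock install where the Xen command line in /etc/default/grub already contains `smt=off`; the function name `enable_smt` is made up for this example, and the grub.cfg path in the comments is an assumption that may differ between releases and boot modes. Keep the security trade-off discussed above in mind before using it.

```shell
# Sketch: switch smt=off to smt=on in a grub defaults file (path given as $1).
# Assumption: the Xen command line in that file already contains "smt=off".
# A backup of the original file is kept next to it with a .bak suffix.
enable_smt() {
    sed -i.bak 's/smt=off/smt=on/g' "$1"
}

# In dom0, as root (paths are assumptions, check them on your system first):
#   enable_smt /etc/default/grub
#   grub2-mkconfig -o /boot/grub2/grub.cfg
#   reboot, then confirm with: xl info | grep threads_per_core
```

After the reboot, `threads_per_core` should read 2 and `nr_cpus` 22 on this CPU, as in the `xl info` output earlier in the thread.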

While I have you here, is it normal that docked with USB-C to a monitor with 90W PD there is a toast every ~15 minutes saying the battery is charging, done charging, discharging, charging … ?
Does it improve or worsen battery use (life expectancy)?

Battery is within the limits set in the firmware. I think it’s 92…98% or something, probably the built-in power management?

FinBob · January 22, 2025, 10:24pm

I’m aware but I concluded it’s basically a non-factor in my attack surface

solene · January 22, 2025, 10:24pm

Thinking about this, Qubes OS may reset smt=on sometimes by forcing smt=off in the command line, this is something to keep an eye on. There is a fix to make it absolutely permanent but I can’t remember where it was explained

novacustom · January 23, 2025, 2:45pm

Qubes OS certification requires us to not do any modifications, I hope you can understand. I recommend raising an issue on GitHub to discuss the performance difference further with Qubes developers.

I just raised a bug report (Dasharo), it would be helpful if you could confirm the issue there.