NovaCustom NV54 Laptop no dGPU

I’ve switched from an NV54 with dGPU (see this thread) to one without the dGPU, since I don’t really need it and it’s expensive.
Thanks to @novacustom who facilitated that switch and was friendly and very responsive!

This thread is a continuation of my efforts to get the laptop working flawlessly with Qubes OS.

Hardware:

  • Intel® Core™ Ultra 7 processor 155H
  • Display: 2880x1800 @ 120 Hz
  • Intel Corporation Wi-Fi 7(802.11be) AX1775*/AX1790*/BE20*/BE401/BE1750* 2x2 (rev 1a)
  • 96 GB DDR5 RAM

BIOS Settings:

  • Secure Boot disabled
  • Intel ME disabled (HPA)
  • Everything else stock. Coreboot doesn’t have many settings though. Latest version 0.9.0 07/17/2024

Working out of the box:

  • Suspend / Resume with systemctl suspend (awesome!). However sometimes the fan keeps spinning while in suspend. Will investigate later.
  • LAN
  • Sound output & Mic
  • Keyboard, Touchpad
  • External 5k Monitor via USB-C (including charging the laptop, monitor USB hub, 2.5 Gbps LAN etc)
  • Webcam

Note: You need to run kernel-latest (6.12.5 currently) for the integrated graphics to work.

Problems yet unfixed:

  • Only 16 of the 22 CPU threads are detected
    Solved: Qubes disables SMT for security reasons and 16 is indeed the real number of cores

  • Lots of errors in the kernel boot log (see next post)

  • The internal screen is too small to read comfortably at its native resolution. DPI workarounds don’t suit me since I mostly work docked to an external monitor with normal DPI. Running the internal panel at 1680x1050@60Hz for now.

  • Battery: When plugged in, the battery oscillates between 93 and 98% (the latter is the max charge setting in the BIOS). I have a feeling that even when connected to the power supply, power still goes through the battery, which I guess is bad. Haven’t investigated much yet.
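
To look into the charge oscillation, the battery state could be logged over time with something like the sketch below. It assumes the battery shows up in dom0 as /sys/class/power_supply/BAT0 (the name may well differ on this machine), and battery_state is just a helper name made up for this example.

```shell
# battery_state [DIR]: print the current time, charge percentage and status.
# DIR defaults to /sys/class/power_supply/BAT0, which is an assumption;
# check `ls /sys/class/power_supply/` for the real name.
battery_state() {
  dir=${1:-/sys/class/power_supply/BAT0}
  printf '%s %s%% %s\n' "$(date +%T)" "$(cat "$dir/capacity")" "$(cat "$dir/status")"
}

# Example: log one line per minute while docked (stop with Ctrl-C).
# while true; do battery_state; sleep 60; done >> battery.log
```

Comparing the logged status changes with the time spent on the charger should show whether the battery really cycles between the two limits.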

I see quite a few errors in the dom0 kernel log:

Error #1:

[    1.030499] CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 20 != 10
[    1.030500] CPU topo: [Firmware Bug]: APIC enumeration order not specification compliant
[    1.030517] CPU topo: CPU limit of 16 reached. Ignoring further CPUs
...
[    1.030658] CPU topo: Max. logical packages:   1
[    1.030659] CPU topo: Max. logical dies:       1
[    1.030659] CPU topo: Max. dies per package:   1
[    1.030664] CPU topo: Max. threads per core:   2
[    1.030665] CPU topo: Num. cores per package:    10
[    1.030666] CPU topo: Num. threads per package:  16
[    1.030666] CPU topo: Allowing 16 present CPUs plus 0 hotplug CPUs
[    1.030667] CPU topo: Rejected CPUs 6

The CPU has 22 threads but in dom0 I only see 16 and 6 have explicitly been rejected.
The xl info output seems to confirm the problem:

nr_cpus                : 16
max_cpu_id             : 21
nr_nodes               : 1
cores_per_socket       : 1
threads_per_core       : 16

Error #2:

[    1.531804] ACPI BIOS Warning (bug): Incorrect checksum in table [BGRT] - 0x5D, should be 0x11 (20240827/utcksum-58)

I guess that’s harmless

Error #3:

[    1.531982] [Firmware Bug]: CPU   0: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0020
[    0.008760] [Firmware Bug]: CPU   1: APIC ID mismatch. CPUID: 0x0001 APIC: 0x0021
[    0.008760] [Firmware Bug]: CPU   1: APIC ID mismatch. Firmware: 0x0010 APIC: 0x0021
[    1.558163] cpu 1 spinlock event irq 171
[    0.008760] [Firmware Bug]: CPU   3: APIC ID mismatch. CPUID: 0x0003 APIC: 0x0029
[    0.008760] [Firmware Bug]: CPU   3: APIC ID mismatch. Firmware: 0x0018 APIC: 0x0029
(lots more of these; the timestamps really are not in chronological order)
[    0.008760] [Firmware Bug]: CPU  11: APIC ID mismatch. CPUID: 0x000b APIC: 0x0006
[    0.008760] [Firmware Bug]: CPU  11: APIC ID mismatch. Firmware: 0x0039 APIC: 0x0006
[    1.563025] cpu 11 spinlock event irq 215
[    1.563945] smp: Brought up 1 node, 16 CPUs

Maybe related to the kernel not recognizing all CPUs?

Error #4: Missing VPU firmware

[    4.605400] intel_vpu 0000:00:0b.0: [drm] *ERROR* ivpu_fw_request(): Failed to request firmware: -2
[    4.606833] intel_vpu 0000:00:0b.0: [drm] ivpu_hw_power_down(): NPU not idle during power down
[    4.608200] intel_vpu 0000:00:0b.0: probe with driver intel_vpu failed with error -2

The VPU (Versatile Processing Unit, i.e. the NPU) on the CPU needs firmware that seems to exist but isn’t available in Fedora yet. I guess there is no downside to missing it since I don’t need it.

Error #5: NVMe failed to allocate host memory buffer

[    4.617487] nvme 0000:01:00.0: platform quirk: setting simple suspend
[    4.617500] nvme 0000:02:00.0: platform quirk: setting simple suspend
[    4.618617] nvme nvme0: pci function 0000:01:00.0
[    4.621010] nvme nvme1: pci function 0000:02:00.0
[    4.628239] nvme nvme0: D3 entry latency set to 10 seconds
[    4.634880] nvme nvme0: 16/0/0 default/read/poll queues
[    4.641186] nvme nvme1: failed to allocate host memory buffer.
[    4.651220]  nvme0n1: p1 p2 p3 p4 p5 p6
[    4.716788] nvme nvme1: 16/0/0 default/read/poll queues

This happens on only one of the 2 NVMe drives. It’s the brand new one that came with the laptop and hasn’t even been formatted yet. Qubes is running on nvme0, which is an old Samsung 990 Pro I transferred from another machine and which doesn’t show this error.
The error seems harmless and at worst performance-degrading.

Error #6:

[   14.876440] proc_thermal_pci 0000:00:04.0: error: proc_thermal_add, will continue

The device in question seems to be the Meteor Lake-P Dynamic Tuning Technology and should be supported from kernel 5.19 onwards:

00:04.0 Signal processing controller: Intel Corporation Device 7d03 (rev 04)

Error #7: DDR5 temperature sensor fails after resuming from S3

[ 1621.796873] spd5118 16-0050: Failed to write b = 0: -6
[ 1621.796889] spd5118 16-0050: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
[ 1621.796962] spd5118 16-0050: PM: failed to resume async: error -6
[ 1621.797139] spd5118 16-0052: Failed to write b = 0: -6
[ 1621.797148] spd5118 16-0052: PM: dpm_run_callback(): spd5118_resume [spd5118] returns -6
[ 1621.797170] spd5118 16-0052: PM: failed to resume async: error -6

Coreboot Issue Tracker:
I’ve commented with these errors on this coreboot issue.

Oh and that one is annoying:

When working undocked (on the internal panel) at 1680x1050@60Hz, the whole system freezes for 10 seconds every ~30 minutes.

[ 8159.750028] i915 0000:00:02.0: [drm] *ERROR* [CRTC:82:pipe A] flip_done timed out
[ 8169.990070] i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
[ 8169.990082] i915 0000:00:02.0: [drm] *ERROR* [CRTC:82:pipe A] commit wait timed out

I’m not sure if it happens on the native resolution as well.

This is intended, due to disabled hyper-threading.
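
The arithmetic behind the 22 versus 16 can be sketched as below. It assumes the core layout mentioned elsewhere in this thread (6 performance, 8 efficient and 2 ultra-efficient cores) and that only the performance cores run two threads each; logical_cpus is a name made up for this example.

```shell
# logical_cpus P_CORES E_CORES LP_E_CORES SMT
# Prints the number of logical CPUs, assuming only P cores have 2 threads.
logical_cpus() {
  p=$1 e=$2 lpe=$3 smt=$4
  if [ "$smt" = "on" ]; then
    echo $(( p * 2 + e + lpe ))
  else
    echo $(( p + e + lpe ))
  fi
}

logical_cpus 6 8 2 on    # prints 22
logical_cpus 6 8 2 off   # prints 16
```

So the 6 "rejected" CPUs are the second threads of the 6 performance cores.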

BGRT table is not supported yet, and this is intentional.

This I think is a Xen issue that is fixed in a newer version, but it should be harmless.

Correct. And even with the firmware, it wouldn’t be very useful in dom0 anyway…

This I haven’t seen before, can you open a separate issue on GitHub about it?

This is worth tracking separately. At this point I don’t know if it’s a Xen issue or maybe a Coreboot (Dasharo) one.

This is likely related to the lower resolution. I’ve seen similar (but not identical) messages with a much lower resolution, but without freezes. And the issue is gone at native resolution.
You can also try disabling the PageFlip feature by adding an Xorg config snippet in /etc/X11/xorg.conf.d/10-page-flip.conf:

Section "OutputClass"
  Identifier "GPU"
  Driver "modesetting"
  Option "PageFlip" "off"
EndSection

Let me know if that helps (and whether you observe any side effects).

Appreciate your replies!

Yeah, seems like the CPU is not yet fully “supported”, see this message in xl dmesg:

(XEN) Unrecognised CPU model 0xaa - assuming vulnerable to LazyFPU

While writing the issue I figured out that the DDR5 temperature sensors do work, even after these errors.

Testing, will report back later.

This one is misleading. A fixed model doesn’t exist yet…

Thank you for the report! How is the performance, browsing, video? Fan noise under load?

Performance:

Feels smooth and fast. It’s an awesome machine for me, running 8-14 Qubes in parallel with browsers, emails and various office-type apps, plus occasionally compiling, copying, imaging external drives, etc.

Browsing photo + video collections on a CIFS share from the NAS with Thunar is smooth as well. Playing an mp4 in a maximized VLC window at 1680x1050 occasionally shows a tiny, short tear/hang that is barely noticeable but would annoy me when watching a full-length video. I’ve set the VLC output driver to x11 and didn’t experiment further.
I really love the 96GB of RAM and got rid of swapping (also I reduced vm.swappiness from 60 to 1 in all qubes).
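
For anyone wanting to do the same, a persistent swappiness setting could look roughly like this. It is only a sketch: the drop-in file name 99-swappiness.conf and the helper write_swappiness_conf are made up for this example, and in Qubes the file would presumably have to go into the templates to reach all qubes based on them.

```shell
# write_swappiness_conf FILE VALUE: write a sysctl drop-in setting vm.swappiness.
write_swappiness_conf() {
  printf 'vm.swappiness = %s\n' "$2" > "$1"
}

# Example, as root inside a template (file name is an assumption):
# write_swappiness_conf /etc/sysctl.d/99-swappiness.conf 1
# sysctl --system          # reload all sysctl configuration files
# sysctl vm.swappiness     # check the value now in effect
```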

Only remaining issue: websites with heavy animations and video still feel a little sluggish.
Playing YouTube in theater mode on the internal display is okay-ish, but fast-moving scenes feel a little sluggish (slightly more than in VLC). It would be annoying to watch for longer. YouTube’s stats for nerds shows 1280x720@50. The qube has 6 vCPUs and 16GB RAM assigned statically.
Even switching to Mullvad Browser and disabling hardware acceleration as suggested in this thread doesn’t improve the experience.
I haven’t watched a movie on an external 4k screen, but I guess it’s bad.

Fan:
Noise at full speed is quite good, it’s clearly audible but not noisy.
I use the silent profile which ramps up to 100% fan speed only at higher temps and I rarely hit that level in daily use.
At low speed (20%) you can hear the fan in a silent room.

Performance:
I have a qube that CPU mines Monero whenever I have solar surplus. I get ~4200H/s sustained and ~4500H/s peak using all 16 cores and can still use the system.

CPU frequencies:
I have yet to figure out how to watch CPU frequencies of the 6 performance cores, 8 efficient cores and 2 ultra-efficient cores live. Curious how they clock with and without thermal headroom.
Tools like htop in dom0 show a static frequency of 2995 MHz.

The output of xenpm get-cpufreq-para shows that xen seems to know the various cores’ base and turbo frequency correctly, AFAICS:

The output of xenpm get-cpufreq-states says all cores are spending all their time at P0 at 1401 MHz (highest state) - which can’t be true?!

Happy for any tips on how to check this.
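
One thing that might be worth trying is sampling xenpm get-cpufreq-average repeatedly in dom0, since it reports the average frequency since the previous call. The sketch below is just a generic polling helper (poll_cmd is a made-up name); whether the values Xen reports are meaningful on this CPU is not something this thread confirms.

```shell
# poll_cmd INTERVAL COUNT CMD [ARGS...]
# Run CMD COUNT times, sleeping INTERVAL seconds after each run.
poll_cmd() {
  interval=$1 count=$2
  shift 2
  i=0
  while [ "$i" -lt "$count" ]; do
    "$@"
    i=$(( i + 1 ))
    sleep "$interval"
  done
}

# Example, as root in dom0: 30 samples, 2 seconds apart.
# poll_cmd 2 30 xenpm get-cpufreq-average
```

Running a load in a qube while this is going should show whether the numbers move at all.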

On to the next issue:

No more hangs, awesome, ty! No side effects (like tearing) so far.
For future readers, note that I fixed a syntax error in the quoted text by changing the Section from “OutputClass” to “Device”. So copy this quote, not the original.
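
For reference, the snippet with the section changed to "Device" as described above can be written out like this. The helper name write_pageflip_conf is made up for this example; the section contents are the suggested snippet with only the section type changed.

```shell
# write_pageflip_conf FILE: write the PageFlip-off snippet, using a
# "Device" section instead of "OutputClass", to FILE.
write_pageflip_conf() {
  cat > "$1" <<'EOF'
Section "Device"
  Identifier "GPU"
  Driver "modesetting"
  Option "PageFlip" "off"
EndSection
EOF
}

# Example, as root in dom0, then log out and back in to restart X:
# write_pageflip_conf /etc/X11/xorg.conf.d/10-page-flip.conf
```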

Battery time with web surfing seems around 3.5 hours. As all Qubes users know this depends heavily on which tabs are open, as the software rendering can consume a lot of CPU. This was with ~20 tabs and 75% screen brightness.

Interesting, Mullvad fixed all my issues. I had to try a 4k external display with a 4k YouTube video. And I gotta say it runs better than expected: it looks damn good, with some occasional choppiness (micro stutters), but very watchable at 3840x2160@60.

That’s cool.

@Demi see above about the “PageFlip” option. That’s the second instance (that I know of) where setting this option helps when using a non-native resolution. What downsides can it have? Any idea why it is necessary? Maybe we should consider setting it by default?

It works fine for me on an i7-1260P in Firefox with 4 vCPU displaying it fullscreen in 1920x1080

Did you enable SMT?

Update: I experienced hangs again on the non-native resolution.
X log confirms that the PageFlip option is indeed off. The error message changed slightly:

# grep PageFlip /var/log/Xorg.0.log
[    35.380] (**) modeset(0): Option "PageFlip" "false"

# dmesg | grep drm
[ 3297.023326] i915 0000:00:02.0: [drm] *ERROR* The master control interrupt lied (DE PIPE)!
[ 3307.342322] i915 0000:00:02.0: [drm] *ERROR* [CRTC:82:pipe A] flip_done timed out
[ 3318.606342] i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
[ 3318.606361] i915 0000:00:02.0: [drm] *ERROR* [CRTC:82:pipe A] commit wait timed out

The freezes are much less frequent, so turning off PageFlip reduced the problem but did not eliminate it.
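
To put a number on "less frequent", the freeze timestamps can be pulled out of the kernel log with a small filter like the one below. It is a sketch that only matches the per-CRTC "flip_done timed out" line shown above, so each freeze is counted once; flip_timeouts is a made-up name.

```shell
# flip_timeouts: read kernel log lines on stdin and print the timestamp
# (seconds since boot) of every "[CRTC:...] flip_done timed out" message.
flip_timeouts() {
  sed -n 's/^\[ *\([0-9.]*\)].*\[CRTC:[0-9]*:pipe [A-Z]] flip_done timed out.*/\1/p'
}

# Example, in dom0:
# dmesg | flip_timeouts            # one timestamp per freeze
# dmesg | flip_timeouts | wc -l    # number of freezes since boot
```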

I did just now:

nr_cpus                : 22
max_cpu_id             : 21
nr_nodes               : 1
cores_per_socket       : 11
threads_per_core       : 2

And WOW it helps! I’ve just played the British Grand Prix 2024 highlights and watched it in one go. It was pretty awesome, only a few microglitches.
Laptop on battery consumed 13% battery at 1920x1200@60Hz.
Thanks for the great tip!

You have to be aware that it opens up new attack surface.

I guess that with twice as many performance-core threads available (only the performance cores have hyper-threading), the system can load-balance better between P and E cores, whereas without SMT there are more E cores than P cores, so the system is slowed down.

It’s possible to pin P cores to a qube, but it’s not super practical… Unfortunately, Xen is dumb and can’t tell the difference between E and P cores when a qube needs CPU time; it just assigns the task to a random one.
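
As a rough idea of what pinning looks like: xl vcpu-pin can restrict all vCPUs of a running qube to a set of physical CPUs. In the sketch below the qube name "work" and the CPU range "0-5" are placeholders, since this thread does not establish which physical CPU numbers belong to the P cores on this machine. pin_qube is a made-up helper, and the pinning has to be reapplied every time the qube starts.

```shell
# pin_qube QUBE CPULIST: pin all vCPUs of QUBE to the physical CPUs in CPULIST.
# With DRY_RUN set, only print the command instead of running it.
pin_qube() {
  if [ -n "$DRY_RUN" ]; then
    echo "xl vcpu-pin $1 all $2"
  else
    xl vcpu-pin "$1" all "$2"
  fi
}

# Example, as root in dom0 (names and CPU numbers are placeholders):
# pin_qube work 0-5
# xl vcpu-list work     # verify the resulting affinity
```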

Makes a lot of sense! It’s been a noticeable improvement in the ~30 minutes undocked on battery, feeling smoother under heavy load such as rendering a YouTube home page with lots of thumbnail videos.

It makes total sense that smt=on skews towards the performance cores, which explains what I feel.

@novacustom you guys could make this config (smt=on in /etc/default/grub) an option in your awesome configurator.
It improves performance a lot.
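
For readers who want to try it, the change could be scripted roughly as below. This is a sketch under assumptions: it expects the Xen options to live in a GRUB_CMDLINE_XEN_DEFAULT line in /etc/default/grub, and the grub.cfg path in the usage comment may differ between Qubes releases. set_xen_smt is a made-up helper name.

```shell
# set_xen_smt FILE VALUE: set smt=VALUE (on|off) in the
# GRUB_CMDLINE_XEN_DEFAULT line of FILE, keeping a FILE.bak backup.
set_xen_smt() {
  file=$1 value=$2
  if grep -q '^GRUB_CMDLINE_XEN_DEFAULT=.*smt=' "$file"; then
    # An smt= option is already present: replace its value.
    sed -i.bak "/^GRUB_CMDLINE_XEN_DEFAULT=/ s/smt=[a-z]*/smt=$value/" "$file"
  else
    # No smt= option yet: append one before the closing quote.
    sed -i.bak "/^GRUB_CMDLINE_XEN_DEFAULT=/ s/\"\$/ smt=$value\"/" "$file"
  fi
}

# Example, as root in dom0 (the grub.cfg path is an assumption), then reboot:
# set_xen_smt /etc/default/grub on
# grub2-mkconfig -o /boot/grub2/grub.cfg
# xl info | grep threads_per_core    # after the reboot
```

Keep in mind the security trade-off discussed above before enabling this.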

While I have you here, is it normal that docked with USB-C to a monitor with 90W PD there is a toast every ~15 minutes saying the battery is charging, done charging, discharging, charging … ?
Does it improve or worsen battery use (life expectancy)?

The battery is within the limits set in the firmware. I think it’s 92…98% or something, probably the built-in power management?

I’m aware but I concluded it’s basically a non-factor in my attack surface

Thinking about this, Qubes OS may sometimes reset smt=on by forcing smt=off on the command line, so this is something to keep an eye on. There is a fix to make it absolutely permanent, but I can’t remember where it was explained :(

Qubes OS certification requires us not to make any modifications; I hope you can understand. I recommend raising an issue on GitHub to discuss the performance difference further with the Qubes developers.

I just raised a bug report (Dasharo), it would be helpful if you could confirm the issue there.
