[qubes-users] Help me test fixes for Intel IGD graphical artifacts on Qubes R4.0

When using some Intel integrated graphic cards on Qubes R4.0, screen
glitches may manifest after switching VTs or entering suspend mode.

A known workaround does exist for this bug, which is to add a
configuration file with the following contents within
/etc/X11/xorg.conf.d:

Section "Device"
  Identifier "Intel Graphics"
  Driver "modesetting"
  Option "AccelMethod" "glamor"
  Option "DRI" "3"
EndSection

However, the X11 modesetting driver version in Fedora 25 has its own
drawbacks:

* It freezes briefly when re-configuring monitors (e.g. plugging in an
  external monitor or changing screen resolution)
* XRandR keystone support is buggy

To remediate this, I've patched the Linux i915 driver and it has been
working fine for months. Only the patch for Linux 4.19 has been tested.

If anyone is affected by the issue, please feel free to test the
follow-up patches and give some feedback here.

See also:

* https://github.com/QubesOS/qubes-issues/issues/5244
* https://github.com/QubesOS/qubes-issues/issues/5377
* https://github.com/QubesOS/qubes-issues/issues/5460

Cheers,
Jinoh Kang

If GUP-ineligible pages are passed to a GEM userptr object, -EFAULT is
returned only when the object is actually bound.

The xf86-video-intel userspace driver cannot differentiate this
condition, and marks the GPU as wedged. This disables graphics
acceleration and may cripple other functionalities such as VT switch.

Solve this by "prefaulting" user pages on GEM object creation, testing
whether all pages are eligible for get_user_pages() in the process.
On failure, return -EFAULT so that userspace can fallback to software
bliting.

This behavior can be controlled by using a new modparam
"gem_userptr_prefault", which is true by default.

The only known use for this patch is Qubes OS's GUI virtualization.
Userspace is expected to resort to DMA-BUF whenever possible.

Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>

If GUP-ineligible pages are passed to a GEM userptr object, -EFAULT is
returned only when the object is actually bound.

The xf86-video-intel userspace driver cannot differentiate this
condition, and marks the GPU as wedged. This disables graphics
acceleration and may cripple other functionalities such as VT switch.

Solve this by "prefaulting" user pages on GEM object creation, testing
whether all pages are eligible for get_user_pages() in the process.
On failure, return -EFAULT so that userspace can fallback to software
bliting.

This behavior can be controlled by using a new modparam
"gem_userptr_prefault", which is true by default.

The only known use for this patch is Qubes OS's GUI virtualization.
Userspace is expected to resort to DMA-BUF whenever possible.

Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>

If GUP-ineligible pages are passed to a GEM userptr object, -EFAULT is
returned only when the object is actually bound.

The xf86-video-intel userspace driver cannot differentiate this
condition, and marks the GPU as wedged. This disables graphics
acceleration and may cripple other functionalities such as VT switch.

Solve this by "prefaulting" user pages on GEM object creation, testing
whether all pages are eligible for get_user_pages() in the process.
On failure, return -EFAULT so that userspace can fallback to software
bliting.

This behavior can be controlled by using a new modparam
"gem_userptr_prefault", which is true by default.

The only known use for this patch is Qubes OS's GUI virtualization.
Userspace is expected to resort to DMA-BUF whenever possible.

Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>

If GUP-ineligible pages are passed to a GEM userptr object, -EFAULT is
returned only when the object is actually bound.

The xf86-video-intel userspace driver cannot differentiate this
condition, and marks the GPU as wedged. This disables graphics
acceleration and may cripple other functionalities such as VT switch.

Solve this by "prefaulting" user pages on GEM object creation, testing
whether all pages are eligible for get_user_pages() in the process.
On failure, return -EFAULT so that userspace can fallback to software
bliting.

This behavior can be controlled by using a new modparam
"gem_userptr_prefault", which is true by default.

The only known use for this patch is Qubes OS's GUI virtualization.
Userspace is expected to resort to DMA-BUF whenever possible.

Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>

Thanks for the path. I've tried this one on 5.10.5, and it needs a
couple fixes to build, see below.

But then, it seems to fix the issue with VT switch and disabling
acceleration.

Independently, starting with 5.9.x (without the patch) the system freezes
with specific monitor configuration (two monitors side by side, but one
scaled x1.5). It happens immediately after configuring the second
display (and I still got the message about it in Xorg.0.log). Not
setting scaling on the monitor workarounds the issue.
Switching to modesetting driver fixes it. This issue wasn't here on
5.8.x or earlier and looks to be unrelated to this patch. I'm not sure
whether it's purely kernel driver issue, or some kernel<->Xorg
incompatibility.

---
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 36 +++++++++++++++++++++
drivers/gpu/drm/i915/i915_params.c | 3 ++
drivers/gpu/drm/i915/i915_params.h | 1 +
3 files changed, 40 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index f2eaed6aca3d..65596d2b284f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -712,6 +712,34 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
   .release = i915_gem_userptr_release,
};

+static int i915_gem_userptr_prefault(unsigned long start,
+ unsigned long nr_pages,
+ bool readonly)
+{
+ struct mm_struct *mm = current->mm;
+ unsigned int gup_flags = (readonly ? 0 : FOLL_WRITE) | FOLL_NOWAIT;
+ int err = 0;
+
+ down_read(&mm->mmap_sem);

mmap_read_lock(mm);

+ while (nr_pages) {
+ long ret;
+
+ ret = get_user_pages(start, nr_pages, gup_flags, NULL, NULL);
+ if (ret < 0) {
+ err = (int)ret;
+ break;
+ }
+ if (ret == 0)
+ ret = 1; /* skip this page */
+
+ start += ret << PAGE_SHIFT;
+ nr_pages -= ret;
+ }
+ up_read(&mm->mmap_sem);

mmap_read_unlock(mm);

+
+ return err;
+}
+
/*
  * Creates a new mm object that wraps some normal memory from the process
  * context - user memory.
@@ -796,6 +824,14 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
   if (!access_ok((char __user *)(unsigned long)args->user_ptr, args->user_size))
     return -EFAULT;

+ if (READ_ONCE(i915_modparams.gem_userptr_prefault)) {
+ ret = i915_gem_userptr_prefault((unsigned long)args->user_ptr,
+ args->user_size >> PAGE_SHIFT,
+ args->flags & I915_USERPTR_READ_ONLY);
+ if (ret)
+ return ret;
+ }
+
   if (args->flags & I915_USERPTR_READ_ONLY) {
     /*
      * On almost all of the older hw, we cannot tell the GPU that
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 7f139ea4a90b..c39766adeda2 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -197,6 +197,9 @@ i915_param_named_unsafe(fake_lmem_start, ulong, 0400,
   "Fake LMEM start offset (default: 0)");
#endif

+i915_param_named(gem_userptr_prefault, bool, 0600,
+ "Prefault pages when userptr GEM object is created (default:true)");
+
static __always_inline void _print_param(struct drm_printer *p,
            const char *name,
            const char *type,
diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
index 330c03e2b4f7..1169a610a73c 100644
--- a/drivers/gpu/drm/i915/i915_params.h
+++ b/drivers/gpu/drm/i915/i915_params.h
@@ -79,6 +79,7 @@ struct drm_printer;
   param(bool, disable_display, false, 0400) \
   param(bool, verbose_state_checks, true, 0) \
   param(bool, nuclear_pageflip, false, 0400) \
+ param(bool, gem_userptr_prefault, true) \

param(bool, gem_userptr_prefault, true, 0600)

   param(bool, enable_dp_mst, true, 0600) \
   param(bool, enable_gvt, false, 0400)

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Q: Why is top-posting such a bad thing?

When using some Intel integrated graphic cards on Qubes R4.0, screen
glitches may manifest after switching VTs or entering suspend mode.

A known workaround does exist for this bug, which is to add a
configuration file with the following contents within
/etc/X11/xorg.conf.d:

> Section "Device"
> Identifier "Intel Graphics"
> Driver "modesetting"
> Option "AccelMethod" "glamor"
> Option "DRI" "3"
> EndSection

However, the X11 modesetting driver version in Fedora 25 has its own
drawbacks:

* It freezes briefly when re-configuring monitors (e.g. plugging in an
  external monitor or changing screen resolution)
* XRandR keystone support is buggy

To remediate this, I've patched the Linux i915 driver and it has been
working fine for months. Only the patch for Linux 4.19 has been tested.

If anyone is affected by the issue, please feel free to test the
follow-up patches and give some feedback here.

So, I can confirm the (fixed) 5.10 patch also improves the situation.
Have you sent it upstream? I do consider including it in our standard
kernel package, but I'd like to see i915 driver maintainer opinion
first.

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Q: Why is top-posting such a bad thing?

So, I can confirm the (fixed) 5.10 patch also improves the situation.

Sounds good. Thanks for testing!

Have you sent it upstream?

No, qubes-users and qubes-devel are the only mailing list where I
posted this.

I guess chances of these patches being merged upstream would not be
that great.

After all, we're not going to need it with Qubes R4.1.

I do consider including it in our standard
kernel package, but I'd like to see i915 driver maintainer opinion
first.

If you mean you'd prefer to have it upstreamed, I'd appreciate some
Tested-by: and/or Reviewed-by: lines for the trailer from you.
I'm fine as well if you'd rather just submit it yourself.

Otherwise I suppose I shall only CC' the maintainers and not the list?

Cheers,
Jinoh Kang

> So, I can confirm the (fixed) 5.10 patch also improves the situation.

Sounds good. Thanks for testing!

> Have you sent it upstream?

No, qubes-users and qubes-devel are the only mailing list where I
posted this.

I guess chances of these patches being merged upstream would not be
that great.

If that bug indeed affects only Qubes OS, there is a greater chance to
accept the patch, if the option defaults to false.

After all, we're not going to need it with Qubes R4.1.

Are you sure? The issue affects dom0 windows, which suggests it still
may be necessary. On the other hand, your patch description suggests
it's just any VM-mapped window triggers the faulty path in the
xf86-video-intel driver, that later affects all of the output.

> I do consider including it in our standard
> kernel package, but I'd like to see i915 driver maintainer opinion
> first.

If you mean you'd prefer to have it upstreamed, I'd appreciate some
Tested-by: and/or Reviewed-by: lines for the trailer from you.

Can you send a fixed patch (that builds), rebased on top of recent Linux
(5.11-rc3, or recent 5.10)? I'll re-test and add my Tested-by:.

I'm fine as well if you'd rather just submit it yourself.

Otherwise I suppose I shall only CC' the maintainers and not the list?

Generally, Linux patches should be sent to whoever MAINTAINERS file
lists, which do include some mailing lists. I highly recommend using
scripts/get_maintainer.pl script for that purpose (if you use git
send-email, that's as easy as --cc-cmd=scripts/get_maintainer.pl).

PS The other (independent) issue I mentioned seems to be
https://bugzilla.suse.com/show_bug.cgi?id=1180543, which is supposed to
be already fixed in >=5.10.6. I've already uploaded 5.10.7, but haven't
tested it on this particular machine yet.

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Q: Why is top-posting such a bad thing?

So, I can confirm the (fixed) 5.10 patch also improves the situation.

Sounds good. Thanks for testing!

Have you sent it upstream?

No, qubes-users and qubes-devel are the only mailing list where I
posted this.

I guess chances of these patches being merged upstream would not be
that great.

If that bug indeed affects only Qubes OS, there is a greater chance to
accept the patch, if the option defaults to false.

After all, we're not going to need it with Qubes R4.1.

Are you sure?

Now that I think about it, I just realised that we are still going to
this patch in Qubes R4.1, while migrating old VMs still using MFNDUMP.

But then, Fedora has moved away from xf86-video-intel to modesetting, so
we aren't going to need this patch either way (unless we are going to
switch to Wayland tomorrow)?

I believe everything else has been covered in qubes-issues#5909 [1].

The issue affects dom0 windows, which suggests it still
may be necessary.

Does it? I personally haven't experienced this case yet.

On the other hand, your patch description suggests
it's just any VM-mapped window triggers the faulty path in the
xf86-video-intel driver, that later affects all of the output.

Yes, but for privcmd-backed pages only. gntdev pages are unaffected.

I do consider including it in our standard
kernel package, but I'd like to see i915 driver maintainer opinion
first.

If you mean you'd prefer to have it upstreamed, I'd appreciate some
Tested-by: and/or Reviewed-by: lines for the trailer from you.

Can you send a fixed patch (that builds), rebased on top of recent Linux
(5.11-rc3, or recent 5.10)? I'll re-test and add my Tested-by:.

OK, I'll send it upstream.

I'm fine as well if you'd rather just submit it yourself.

Otherwise I suppose I shall only CC' the maintainers and not the list?

Generally, Linux patches should be sent to whoever MAINTAINERS file
lists, which do include some mailing lists. I highly recommend using
scripts/get_maintainer.pl script for that purpose (if you use git
send-email, that's as easy as --cc-cmd=scripts/get_maintainer.pl).

PS: due to Google SMTP OAuth2 issues, I have this weird workflow where I
first stage my changes in my local IMAP daemon's mbox and bounce it via
Thunderbird. This is admittedly very annoying, and perhaps I should
tinker with (neo)mutt + gmail-oauth2-tools some time. Or maybe switch
to a "real" mail account. This will also explain my past embedded PGP
signing mistake. Sorry about that...;(

PS The other (independent) issue I mentioned seems to be
https://bugzilla.suse.com/show_bug.cgi?id=1180543, which is supposed to
be already fixed in >=5.10.6. I've already uploaded 5.10.7, but haven't
tested it on this particular machine yet.

Thanks, good to know.

- --

[1] qubes-gui: backport window pixmap grant references to R4.0 · Issue #5909 · QubesOS/qubes-issues · GitHub

- --
Sincerely,
Jinoh Kang

Is qubes-xorg-x11-drv-intel an option? Upstream hasn't released for years after all...

Is qubes-xorg-x11-drv-intel an option? Upstream hasn't released for years after all...

Something like this. In fact the current (Fedora) package is already
built from git snapshot.
We do backport this package from newer Fedora already:

But I would prefer to get it upstream anyway (and then possibly build
xorg-x11-drv-intel from newer git snapshot).

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Q: Why is top-posting such a bad thing?

Is qubes-xorg-x11-drv-intel an option? Upstream hasn't released for years after all...

Something like this. In fact the current (Fedora) package is already
built from git snapshot.

Here's the catch: Fedora hasn't been bumping gitdate for almost a year,
as seen in Pagure [1].

We do backport this package from newer Fedora already:
GitHub - QubesOS/qubes-linux-dom0-updates: Qubes component: linux-dom0-updates

That one from Fedora 28 is a bit behind, too.

But I would prefer to get it upstream anyway (and then possibly build
xorg-x11-drv-intel from newer git snapshot).

Something like this? (haven't built it yet, will fix later)

diff --git a/src/sna/kgem.c b/src/sna/kgem.c
index 6a35067c..8a7af809 100644
- --- a/src/sna/kgem.c
+++ b/src/sna/kgem.c
@@ -7023,6 +7023,8 @@ struct kgem_bo *kgem_create_map(struct kgem *kgem,
   struct kgem_bo *bo;
   uintptr_t first_page, last_page;
   uint32_t handle;
+ struct drm_i915_gem_set_domain set_domain;
+ bool move_to_gtt = false;

   assert(MAP(ptr) == ptr);

@@ -7043,20 +7045,10 @@ struct kgem_bo *kgem_create_map(struct kgem *kgem,
            read_only);
   if (handle == 0) {
     if (read_only && kgem->has_wc_mmap) {
- - struct drm_i915_gem_set_domain set_domain;
- -
       handle = gem_userptr(kgem->fd,
                (void *)first_page, last_page-first_page,
                false);
- -
- - VG_CLEAR(set_domain);
- - set_domain.handle = handle;
- - set_domain.read_domains = I915_GEM_DOMAIN_GTT;
- - set_domain.write_domain = 0;
- - if (do_ioctl(kgem->fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &set_domain)) {
- - gem_close(kgem->fd, handle);
- - handle = 0;
- - }
+ move_to_gtt = true;
     }
     if (handle == 0) {
       DBG(("%s: import failed, errno=%d\n", __FUNCTION__, errno));
@@ -7064,6 +7056,21 @@ struct kgem_bo *kgem_create_map(struct kgem *kgem,
     }
   }

+ VG_CLEAR(set_domain);
+ set_domain.handle = handle;
+ if (move_to_gtt) {
+ set_domain.read_domains = I915_GEM_DOMAIN_GTT;
+ set_domain.write_domain = 0;
+ } else {
+ set_domain.read_domains = I915_GEM_DOMAIN_CPU;
+ set_domain.write_domain = I915_GEM_DOMAIN_CPU;
+ }
+ if (do_ioctl(kgem->fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &set_domain)) {
+ gem_close(kgem->fd, handle);
+ DBG(("%s: set_domain in import failed, errno=%d\n", __FUNCTION__, errno));
+ return NULL;
+ }

>> Is qubes-xorg-x11-drv-intel an option? Upstream hasn't released for years after all...
>
> Something like this. In fact the current (Fedora) package is already
> built from git snapshot.

Here's the catch: Fedora hasn't been bumping gitdate for almost a year,
as seen in Pagure [1].

> We do backport this package from newer Fedora already:
> GitHub - QubesOS/qubes-linux-dom0-updates: Qubes component: linux-dom0-updates

That one from Fedora 28 is a bit behind, too.

>
> But I would prefer to get it upstream anyway (and then possibly build
> xorg-x11-drv-intel from newer git snapshot).

Something like this? (haven't built it yet, will fix later)

I guess, yes.

diff --git a/src/sna/kgem.c b/src/sna/kgem.c
index 6a35067c..8a7af809 100644
--- a/src/sna/kgem.c
+++ b/src/sna/kgem.c
@@ -7023,6 +7023,8 @@ struct kgem_bo *kgem_create_map(struct kgem *kgem,
   struct kgem_bo *bo;
   uintptr_t first_page, last_page;
   uint32_t handle;
+ struct drm_i915_gem_set_domain set_domain;
+ bool move_to_gtt = false;

   assert(MAP(ptr) == ptr);

@@ -7043,20 +7045,10 @@ struct kgem_bo *kgem_create_map(struct kgem *kgem,
            read_only);
   if (handle == 0) {
     if (read_only && kgem->has_wc_mmap) {
- struct drm_i915_gem_set_domain set_domain;
-
       handle = gem_userptr(kgem->fd,
                (void *)first_page, last_page-first_page,
                false);
-
- VG_CLEAR(set_domain);
- set_domain.handle = handle;
- set_domain.read_domains = I915_GEM_DOMAIN_GTT;
- set_domain.write_domain = 0;
- if (do_ioctl(kgem->fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &set_domain)) {
- gem_close(kgem->fd, handle);
- handle = 0;
- }
+ move_to_gtt = true;
     }
     if (handle == 0) {
       DBG(("%s: import failed, errno=%d\n", __FUNCTION__, errno));
@@ -7064,6 +7056,21 @@ struct kgem_bo *kgem_create_map(struct kgem *kgem,
     }
   }

+ VG_CLEAR(set_domain);
+ set_domain.handle = handle;
+ if (move_to_gtt) {
+ set_domain.read_domains = I915_GEM_DOMAIN_GTT;
+ set_domain.write_domain = 0;
+ } else {
+ set_domain.read_domains = I915_GEM_DOMAIN_CPU;
+ set_domain.write_domain = I915_GEM_DOMAIN_CPU;
+ }
+ if (do_ioctl(kgem->fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &set_domain)) {
+ gem_close(kgem->fd, handle);
+ DBG(("%s: set_domain in import failed, errno=%d\n", __FUNCTION__, errno));
+ return NULL;
+ }
+
   bo = __kgem_bo_alloc(handle, (last_page - first_page) / PAGE_SIZE);
   if (bo == NULL) {
     gem_close(kgem->fd, handle);

---

[1] Tree - rpms/xorg-x11-drv-intel - src.fedoraproject.org

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
Q: Why is top-posting such a bad thing?

Hi,

Is this patch included in last 5.10 kernel release? Last time I tried to
boot it I had a lot of graphic artifacts. If the patch is included I
could try again.

Regards.

Hi,

Is this patch included in last 5.10 kernel release? Last time I tried to
boot it I had a lot of graphic artifacts. If the patch is included I
could try again.

No, but this patch has been superseded by the latest intel xorg driver.
To try it, just enable the current-testing repo and upgrade
xorg-x11-drv-intel to at least v2.99.917-49.20210126.

    sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing --action=upgrade xorg-x11-drv-intel

Discussion: Screen only refreshed once mouse is moved · Issue #6356 · QubesOS/qubes-issues · GitHub

Hi,

It seems that I had it already installed with standard dom0 updates.

Since I was on doubt, I started running 5.4.91 and I felt some
strange visual effects compared to 5.4.83 which I was using
previously. Also I had a crash when waking up from suspend (I did not
have one in months or maybe never on this laptop). After the crash I did
three or four suspend/resume cycles without problems.

Today I am testing 5.10.8 and it seems working smooth, better than
5.4.83. Sadly I tried to suspend/resume before sending this email and it
crashed. Then I did it again 4 o 5 five times without problems.

There are some kernel warnings that maybe help. First does not seem
related with graphics, it only appears on 5.10.x versions and I think
that there are one per each processor:
https://share.riseup.net/#100I-6k6jsi4spX0BhGkzA

The other appears both on 5.4/5.10 with pretty different call trace:
https://share.riseup.net/#X55SR3mgJPqOndr8Be_IQQ

Regards.

No, but this patch has been superseded by the latest intel xorg driver.
To try it, just enable the current-testing repo and upgrade
xorg-x11-drv-intel to at least v2.99.917-49.20210126.

    sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing --action=upgrade xorg-x11-drv-intel

Discussion: Screen only refreshed once mouse is moved · Issue #6356 · QubesOS/qubes-issues · GitHub

Hi,

It seems that I had it already installed with standard dom0 updates.

Since I was on doubt, I started running 5.4.91 and I felt some
strange visual effects compared to 5.4.83 which I was using
previously.

FWIW this patch only fixes the issue related to the "failed to submit
rendering commands (Bad address)" message and nothing more.

If the issue starts to manifest without such message in the Xorg server
log, then it is highly likely to be platform-specific.

Also I had a crash when waking up from suspend (I did not
have one in months or maybe never on this laptop). After the crash I did
three or four suspend/resume cycles without problems.

This is a news to me. Has this issue been submitted to the tracker?
If not, can you post it there?

Today I am testing 5.10.8 and it seems working smooth, better than
5.4.83. Sadly I tried to suspend/resume before sending this email and it
crashed. Then I did it again 4 o 5 five times without problems.

You can collect panic and oops logs with a kernel that has
CONFIG_EFI_VARS_PSTORE enabled. If you don't already have one, you may
build qubes-linux-kernel via qubes-builder with the following line
added to `config-qubes`:

    CONFIG_EFI_VARS_PSTORE=y

After booting the kernel, subsequent panic and oops will be recorded
in EFI variables, which will then appear as /sys/fs/pstore/dmesg-efi-*
at the next boot. Also note that the logs may be split into multiple
parts that are numbered in a reversed order. For more information, see
https://www.kernel.org/doc/html/latest/admin-guide/pstore-blk.html

There are some kernel warnings that maybe help. First does not seem
related with graphics, it only appears on 5.10.x versions and I think
that there are one per each processor:
share.riseup.net

The other appears both on 5.4/5.10 with pretty different call trace:
share.riseup.net

Thanks for sharing.

Regards.

- --
Sincerely,
Jinoh Kang

FWIW this patch only fixes the issue related to the "failed to submit
rendering commands (Bad address)" message and nothing more.

If the issue starts to manifest without such message in the Xorg server
log, then it is highly likely to be platform-specific.

I am not really sure if experimented that problem before the patch.

Also I had a crash when waking up from suspend (I did not
have one in months or maybe never on this laptop). After the crash I did
three or four suspend/resume cycles without problems.

This is a news to me. Has this issue been submitted to the tracker?
If not, can you post it there?

As I saw in the IRC channel last kernels seem pretty unstable, better I
will wait when I have some reliable way for reproducing and some useful log.

Today I am testing 5.10.8 and it seems working smooth, better than
5.4.83. Sadly I tried to suspend/resume before sending this email and it
crashed. Then I did it again 4 o 5 five times without problems.

You can collect panic and oops logs with a kernel that has
CONFIG_EFI_VARS_PSTORE enabled. If you don't already have one, you may
build qubes-linux-kernel via qubes-builder with the following line
added to `config-qubes`:

    CONFIG_EFI_VARS_PSTORE=y

After booting the kernel, subsequent panic and oops will be recorded
in EFI variables, which will then appear as /sys/fs/pstore/dmesg-efi-*
at the next boot. Also note that the logs may be split into multiple
parts that are numbered in a reversed order. For more information, see
pstore block oops/panic logger — The Linux Kernel documentation

"CONFIG_EFI_VARS_PSTORE=y" seems set by default for kernels 5.10.X, now
I am running 5.4.91 (which I guess that has it disabled) and I can see
an empty '/sys/fs/pstore/' directory.

I will try to run the latest kernel again and look for when a crash occurs.

Hi, after booting again 5.10.8 kernel (with 5.10.11 I can't start any VM
but I think Marek is already aware) I can see the dmesg-efi "files".

Here is the concatenation of all files (probably in reverse or wrong order):
blob:https://share.riseup.net/3360675c-292f-4114-a109-c410e2518295

That's a wrong URL (blob:). Maybe copy it again?

- --
Sincerely,
Jinoh Kang