I have an AMD Raven Ridge (-like) APU with a Radeon Vega 8 (V1605B) as one of my R4.1 systems to play around with.
Since R4.1 moved to kernel branches newer than 5.4, I have been experiencing random freezes and crashes.
I was not always able to retrieve logs, but in the cases where I did, amdgpu was involved.
Whenever a new release of stable-5.10 or latest becomes available in the repositories, I try it out, with mostly similar results.
If I do not want the system to freeze or crash, I boot it with a self-compiled stable-5.4 kernel, which runs rock solid.
The issues are mainly Xorg crashes and freezes of the whole system. In the first case I was able to obtain some kernel buffer messages; in the latter I was not able to get any logs.
Xorg crash example
[59968.665419] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32770, for process Xorg pid 2412 thread X:cs0 pid 2512)
[59968.665421] amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x0000800107a4a000 from client 27
[59968.665422] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
[59968.665423] amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[59968.665424] amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
[59968.665425] amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
[59968.665426] amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[59968.665427] amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
[59968.665428] amdgpu 0000:05:00.0: amdgpu: RW: 0x0
[59968.911907] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
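For anyone comparing their own logs against this one: the fields printed after VM_L2_PROTECTION_FAULT_STATUS look like plain bit fields of the raw status value. The following is a minimal Python sketch for decoding that value. The bit layout in FIELDS is an assumption, not something stated in the log; it was only checked against the one status value above (0x00101031), where it reproduces the printed fields. The function names are hypothetical helpers.

```python
import re

# ASSUMED bit layout (shift, width) of VM_L2_PROTECTION_FAULT_STATUS.
# Not taken from the log; only checked against the single value 0x00101031.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),  # printed in the log as "Faulty UTCL2 client ID"
    "RW": (18, 1),
}


def decode_fault_status(status):
    """Split a raw status value into the named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


def find_fault_status(log_text):
    """Return the first raw status value found in dmesg-style text, or None."""
    match = re.search(r"VM_L2_PROTECTION_FAULT_STATUS:\s*(0x[0-9a-fA-F]+)", log_text)
    return int(match.group(1), 16) if match else None


if __name__ == "__main__":
    for name, value in decode_fault_status(0x00101031).items():
        print(f"{name}: {value:#x}")
```

Run against the status value above, this yields the same MORE_FAULTS, WALKER_ERROR, PERMISSION_FAULTS, MAPPING_ERROR, RW and client ID values that the driver printed, which makes it easier to see whether another report shows the same kind of fault.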
This is most likely a driver, kernel, or Xen issue, but since this is the most likely place to find people who run Xen on Raven Ridge, I wanted to ask here whether other Raven Ridge users are seeing this problem, too.