Conversation

Jarkko Sakkinen

I get something like this constantly with Steam:

[38438.072899] x86/split lock detection: #AC: ChaosGate.exe/11646 took a split_lock trap at address: 0x6fffe5f51242
[38441.268772] x86/split lock detection: #AC: ChaosGate.exe/11852 took a split_lock trap at address: 0x6ffff6801001
[38446.628277] x86/split lock detection: #AC: Loading.Preload/11762 took a split_lock trap at address: 0x6ffff6a3aee0
[38494.104580] i915 0000:03:00.0: [drm] GPU HANG: ecode 12:1:84dfd7f7, in ChaosGate.exe [11646]
[38494.104588] i915 0000:03:00.0: [drm] ChaosGate.exe[11646] context reset due to GPU hang

Any ideas? This is now on KDE 6/X11, but a similar hang also happens with Wayland.

The CPU is an i9-13900K, the GPU is an Arc A770 and the OS is Tumbleweed. The kernel is the latest mainline, but this has also happened with a few previous kernel versions.

Also, this seems to happen with any game, but all of them are Windows games running under Proton.
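The split-lock and GPU-hang lines have a fixed shape, so they are easy to tally when comparing runs across kernels or display servers. A minimal sketch, assuming POSIX `awk` and `sort`; `summarize_gpu_log` is a hypothetical helper, not an existing tool, and it only counts the two message patterns seen in this log (`took a split_lock trap` and `GPU HANG`).

```shell
# Hypothetical helper: tally split-lock traps per command name and
# count i915 "GPU HANG" lines from kernel log text on stdin.
summarize_gpu_log() {
    awk '
    /split_lock trap/ {
        # Keep only the "comm/pid" token after "#AC: ", then drop the /pid
        s = $0
        sub(/.*#AC: /, "", s)
        sub(/ took a split_lock trap.*/, "", s)
        sub(/\/[0-9]+$/, "", s)
        traps[s]++
    }
    # Count upper-case "GPU HANG" lines; the "context reset due to GPU hang" lines do not match
    /GPU HANG/ { hangs++ }
    END {
        for (c in traps) printf "split_lock %s %d\n", c, traps[c]
        printf "gpu_hangs %d\n", hangs + 0
    }' | LC_ALL=C sort
}
```

It matches on message text rather than awk fields because dmesg pads short timestamps (e.g. `[   48.070785]`), which would shift whitespace-separated fields. Usage: `sudo dmesg | summarize_gpu_log`.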

Jarkko Sakkinen

@jani do you know how to fix or work around this? :-) It never happened on the same machine when it had an NVIDIA RTX card instead of the Intel one. It has been an ongoing issue for half a year, but I have had neither the time nor the interest to play much, so I have ignored it so far.

The irony is that I bought the Intel card precisely for a better "out-of-the-box" experience after installing the OS ;-)

@jarkko
I wonder if it's the general instability of the latest Intel CPUs, which requires limiting power usage or slight underclocking: https://www.io-tech.fi/uutinen/intelin-13-ja-14-sukupolven-core-i9-prosessoreissa-on-ilmennyt-vakausongelmia/

@timojyrinki just a reminder that this happens neither with the RTX card nor with CPU-heavy tasks such as compilation.

Another, different kind of workload where this does not happen is Bitwig Studio. So this is directly connected to this graphics card one way or another.

I disabled upscaling (i.e. intel_pstate) but have not yet tried disabling downscaling as well, which could perhaps be a good test.

Jarkko Sakkinen

@timojyrinki I'm a happy Intel alumnus but an unhappy Intel customer :-) This does not match my quality expectations, tbh.

@jarkko
I switched to team AMD (on the CPU side; on graphics since ~forever) in the autumn. The grass is so much greener over there. I do have good stories about how "easy" it is to buy 2x32GB DDR5-6000 memory modules out of the "recommended" lists... otherwise happy.

I appreciate Intel's efforts on discrete graphics, but it wouldn't be a surprise if they weren't rock stable yet. In theory Intel has a long history with open drivers; in practice there has been a persistent stream of hw/sw bugs that are hard to work around.

@timojyrinki yeah, switching to AMD is not an option ;-) And as I said, the CPU has caused zero issues so far. It's not about ideology; I just do not want to spend moneyz.

Still a major letdown, i.e. owning a GPU that cannot do graphics.

@jarkko
Yes yes, it would be a total overhaul to switch. Rather than the new discrete graphics competitor, I'd have bought an AMD GPU for the open drivers to pair with an Intel CPU (I have a Radeon 6600XT, earlier with an i7-7700T, now with a Ryzen 7900), but even when it's "only" the GPU, it'd be best to get the existing one working.

On the integrated graphics side (my laptops have always been all-Intel) there used to be GPU options to try that helped, but my Tiger Lake laptop has been stable from the beginning.

@timojyrinki yeah, I'll start grepping the kernel tree to see where the messages are emitted.

@timojyrinki unrelated side note: I don't understand why people put this cruft into new kernel code:

 * Authors:
 *    Eric Anholt <eric@anholt.net>
 *    Keith Packard <keithp@keithp.com>
 *    Mika Kuoppala <mika.kuoppala@intel.com>

Nobody cares, as Git has an author field. Totally useless information.

@timojyrinki

GPU HANG is emitted by error_msg. It would be better off named i915_error_msg or i915_error for easier grepping and inserting probes, and this supports it:

$ git grep error_msg|wc -l
244

@jarkko
https://drm.pages.freedesktop.org/intel-docs/how-to-file-i915-bugs.html

That said, what is worth trying at some point as a workaround is the totally new Xe kernel driver that was merged in 6.8. https://www.phoronix.com/review/intel-xe-benchmark

It may be both a workaround and more fruitful for debugging than beating the old i915 horse. For the origin of that driver's name, see https://en.m.wikipedia.org/wiki/Intel_GMA#Gen3

@timojyrinki I only use LKML if I decide to post anything. I hate bug trackers, sorry :-) It is a kernel bug, so LKML is the way to go. I'm going to ignore web forms for sure.

@jarkko
Yes, I see your point of view; you're not a very mundane _Linux_ user :D But check out the xe kernel driver and how to enable it if you get tired of i915.
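For reference, a sketch of how trying xe on the A770 might look. As far as I know, xe in 6.8 only binds DG2 cards when force-probed, so the usual approach is to block i915 for the card's PCI device ID and force-probe xe for it on the kernel command line. This assumes the A770 reports device ID `56a0` (check the `[8086:xxxx]` pair in `lspci -nn -s 03:00.0` first); how the parameters get onto the command line depends on the bootloader setup.

```
# Kernel command line fragment (Linux 6.8+), assuming PCI device ID 56a0.
# The "!" prefix tells i915 not to probe that device.
i915.force_probe=!56a0 xe.force_probe=56a0
```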

@timojyrinki yeah, I'll look into the kernel tree now, dig up all the info I possibly can, and then post to LKML :-) And thanks for the Xe driver tip! I'll try it, but first I'll still check whether I can find something nasty in the i915 driver.

If the Xe driver is a fix, then this is also a Tumbleweed bug, isn't it @vbabka?

@jani OK, fair enough: since both you and Timo are pushing me to fill out the web form, I will do it, although it feels like torture :-)

I'll still look through the kernel tree first to see what I can find...

@jani fair enough :-) I'll write as good a report as I possibly can and submit it...

@jani the breaking point for doing something about this was watching the Fallout pilot and wanting to play Fallout 3 once again (with the latest graphics mods) ;-) Fallout made me do it

@jarkko

@vbabka

He's probably busy at the SUSE Labs conference this week, but Bugzilla is always open, even before he's back 😁

I'm also interested in graphics bugs myself when I have the hardware combination at hand, but right now I'm typing from bed with a fever, so I can only post feverish debugging ideas 😁 It could even be nice to develop openQA gfx driver test cases for Tumbleweed, but since execution is largely done in QEMU and bare-metal testing in general is hard, there are some limitations.

@jarkko That's very little information to go on from that log. Do you mind trying drm-tip? Mainline might be a bit behind.

Jarkko Sakkinen

@Andi I can try it once I have bandwidth. Thanks for the tip!

@timojyrinki @vbabka there is some problem with sign-on on the SUSE sites. I've never been able to log in to that Bugzilla. I've emailed the admins, but they have never answered.

There was some other bug that I even fixed in my openSUSE installation, but I cannot remember what it was :-) I only reported it here on Mastodon. I need to look that one up too if I ever get access to that Bugzilla.

@jarkko

@vbabka

Darn, it seems LKML really is so much better than these web form things... I have no other idea than trying https://idp-portal.suse.com/univention/self-service/#page=passwordreset; I haven't had a problem myself, and I use Firefox with relatively strict settings.

@jani @timojyrinki yeah, sorry for bringing that up. I admit I was irritated; it just caught my attention and of course I went to complain about it here on social media :-) Apologies!

Jarkko Sakkinen

@jani @timojyrinki that said, prefixes do have measurable benefits for a developer! :-)

E.g. if such a prefix is missing from some function that I maintain and I get a patch with that rationale, I will most likely ack it despite it being a somewhat cosmetic change.

@Andi Luckily I've recently tested the SGX cgroups patches on Tumbleweed, i.e. I know how to build a distro-equivalent kernel (configured the way openSUSE builds it) from any kernel tree :-) So I can easily try this out once I have the bandwidth.

@timojyrinki @vbabka I've tried this a million times with both of the major browsers, no luck. Unfortunately, the openSUSE Bugzilla is inaccessible to me.

@Andi Hey, running drm-tip really does seem to fix the issues. I only played for about 20 minutes, so it is not a very comprehensive test, but in the past things have failed within 2-5 minutes, so it is at least a step in the right direction.

@Andi OK, so it still trips, but at least the dump is longer now:

[   48.070785] x86/split lock detection: #AC: CJobMgr::m_Work/4188 took a split_lock trap at address: 0xe768347f
[   48.151575] x86/split lock detection: #AC: CJobMgr::m_Work/4200 took a split_lock trap at address: 0xe768347f
[   48.830151] x86/split lock detection: #AC: CJobMgr::m_Work/4274 took a split_lock trap at address: 0xe768347f
[   50.154695] x86/split lock detection: #AC: CJobMgr::m_Work/4392 took a split_lock trap at address: 0xe768347f
[   62.952187] x86/split lock detection: #AC: IPC:CSteamEngin/4183 took a split_lock trap at address: 0xe76834ba
[   80.611973] umip: ChaosGate.exe[5397] ip:6ffff686aa76 sp:6357f9d0: SGDT instruction cannot be used by applications.
[   80.611981] umip: ChaosGate.exe[5397] ip:6ffff686aa76 sp:6357f9d0: For now, expensive software emulation returns the result.
[   80.616544] umip: ChaosGate.exe[5397] ip:6fffeb42bb50 sp:6357f9d0: SGDT instruction cannot be used by applications.
[   80.616548] umip: ChaosGate.exe[5397] ip:6fffeb42bb50 sp:6357f9d0: For now, expensive software emulation returns the result.
[   81.334429] umip: ChaosGate.exe[5361] ip:6fffe874c11e sp:10f6c8: SGDT instruction cannot be used by applications.
[   81.357831] x86/split lock detection: #AC: ChaosGate.exe/5361 took a split_lock trap at address: 0x6fffe5f51242
[   84.552845] x86/split lock detection: #AC: ChaosGate.exe/5554 took a split_lock trap at address: 0x6ffff6801001
[   89.917136] x86/split lock detection: #AC: Loading.Preload/5471 took a split_lock trap at address: 0x6ffff6a3aee0
[  673.720113] BTRFS info (device dm-2): qgroup scan completed (inconsistency flag cleared)
[  907.829729] umip_printk: 51 callbacks suppressed
[  907.829732] umip: ChaosGate.exe[5361] ip:6ffff4fc53a0 sp:10e0c8: SGDT instruction cannot be used by applications.
[  907.829737] umip: ChaosGate.exe[5361] ip:6ffff4fc53a0 sp:10e0c8: For now, expensive software emulation returns the result.
[ 7778.421556] umip: ChaosGate.exe[13366] ip:6ffff686aa76 sp:6357f9d0: SGDT instruction cannot be used by applications.
[ 7778.421561] umip: ChaosGate.exe[13366] ip:6ffff686aa76 sp:6357f9d0: For now, expensive software emulation returns the result.
[ 7778.425809] umip: ChaosGate.exe[13366] ip:6fffeb42bb50 sp:6357f9d0: SGDT instruction cannot be used by applications.
[ 7778.425811] umip: ChaosGate.exe[13366] ip:6fffeb42bb50 sp:6357f9d0: For now, expensive software emulation returns the result.
[ 7778.499061] umip: ChaosGate.exe[13330] ip:6fffe874c11e sp:10f6c8: SGDT instruction cannot be used by applications.
[ 7778.516548] x86/split lock detection: #AC: ChaosGate.exe/13330 took a split_lock trap at address: 0x6fffe5f51242
[ 7781.596951] x86/split lock detection: #AC: ChaosGate.exe/13562 took a split_lock trap at address: 0x6ffff6801001
[ 7786.871080] x86/split lock detection: #AC: Loading.Preload/13448 took a split_lock trap at address: 0x6ffff6a3aee0
[ 7911.623629] i915 0000:03:00.0: [drm] GPU HANG: ecode 12:1:84dfd7f7, in ChaosGate.exe [13330]
[ 7911.623637] i915 0000:03:00.0: [drm] ChaosGate.exe[13330] context reset due to GPU hang
[ 7922.254173] umip_printk: 41 callbacks suppressed
[ 7922.254176] umip: ChaosGate.exe[13330] ip:6ffff4fc53a0 sp:10d0c8: SGDT instruction cannot be used by applications.
[ 7922.254182] umip: ChaosGate.exe[13330] ip:6ffff4fc53a0 sp:10d0c8: For now, expensive software emulation returns the result.

Modules loaded:

$ lsmod|grep i915
i915                 4284416  115
i2c_algo_bit           24576  2 xe,i915
drm_buddy              20480  2 xe,i915
ttm                   110592  3 drm_ttm_helper,xe,i915
drm_display_helper    282624  2 xe,i915
cec                    94208  3 drm_display_helper,xe,i915
video                  77824  4 asus_wmi,asus_nb_wmi,xe,i915

@Andi For what it's worth, this was on X11. I used it because I thought it might be more stable with Steam. I can switch back to Wayland and see if that makes any difference.
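The `lsmod` output above lists both `xe` and `i915`, and a loaded module is not necessarily the one driving the card. A minimal sketch for checking which driver is bound to the A770 at `0000:03:00.0` (the address from the i915 log lines) by reading the device's `driver` symlink in sysfs; `lspci -k -s 03:00.0` reports the same thing as "Kernel driver in use". `bound_gpu_driver` and the `SYSFS_ROOT` override (only there so it can be tried against a fake tree) are hypothetical names.

```shell
# Hypothetical helper: print the kernel driver bound to a PCI device,
# or "none" if no driver is bound. SYSFS_ROOT defaults to /sys.
bound_gpu_driver() {
    sysfs=${SYSFS_ROOT:-/sys}
    link="$sysfs/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink points into .../bus/pci/drivers/<name>
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# Usage on the machine from this thread:
# bound_gpu_driver 0000:03:00.0
```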