Opened 4 months ago

Last modified 6 weeks ago

#1775 assigned defect

fuse tests hang

Reported by: Mario Domenech Goulart Owned by: evhan
Priority: minor Milestone: someday
Component: extensions Version:
Keywords: fuse Cc:
Estimated difficulty:

Description (last modified by Mario Domenech Goulart)

The test suite of fuse hangs sometimes. I don't know a way to reproduce that, but they hang from time to time.

Hangs have been observed with:

  • CHICKEN 4 jobs on the salmonella-linux-x86-64 machine (Debian 9)
  • CHICKEN 5 jobs on the salmonella-linux-x86 machine (Debian 10)

Below is some info:

$ cat /etc/debian_version 
9.13

$ clang --version
clang version 3.8.1-24 (tags/RELEASE_381/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

$ strace -p 23651
strace: Process 23651 attached
restart_syscall(<... resuming interrupted poll ...>

Change History (3)

comment:1 Changed 6 weeks ago by Mario Domenech Goulart

Description: modified (diff)
Version: 4.13.0

Just caught fuse tests hanging on a machine running C5 too.

comment:2 Changed 6 weeks ago by Mario Domenech Goulart

Could also reproduce the issue on my laptop (Debian 11, CHICKEN 5.3.0rc4 built with tcc 0.9.27):

-- testing fuse (smoke test) -------------------------------------------------
(filesystem-start! path fs) .......................................... [ PASS]

The process seems to get stuck in a CPU-intensive loop:

getrusage(RUSAGE_SELF, {ru_utime={tv_sec=294, tv_usec=768248}, ru_stime={tv_sec=5, tv_usec=462431}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={tv_sec=294, tv_usec=768257}, ru_stime={tv_sec=5, tv_usec=462431}, ...}) = 0
poll([{fd=8, events=POLLIN}, {fd=34, events=POLLIN}, {fd=10, events=POLLIN}, {fd=56, events=POLLIN}, {fd=58, events=POLLIN}, {fd=54, events=POLLIN}, {fd=52, events=POLLIN}, {fd=50, events=POLLIN}, {fd=48, events=POLLIN}, {fd=46, events=POLLIN}, {fd=44, events=POLLIN}, {fd=42, events=POLLIN}, {fd=40, events=POLLIN}, {fd=38, events=POLLIN}, {fd=36, events=POLLIN}, {fd=32, events=POLLIN}, {fd=30, events=POLLIN}, {fd=28, events=POLLIN}, {fd=26, events=POLLIN}, {fd=24, events=POLLIN}, {fd=22, events=POLLIN}, {fd=20, events=POLLIN}, {fd=18, events=POLLIN}, {fd=16, events=POLLIN}, {fd=14, events=POLLIN}, {fd=12, events=POLLIN}], 26, 0) = 0 (Timeout)
getrusage(RUSAGE_SELF, {ru_utime={tv_sec=294, tv_usec=768467}, ru_stime={tv_sec=5, tv_usec=462431}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={tv_sec=294, tv_usec=768477}, ru_stime={tv_sec=5, tv_usec=462431}, ...}) = 0
poll([{fd=8, events=POLLIN}, {fd=34, events=POLLIN}, {fd=10, events=POLLIN}, {fd=56, events=POLLIN}, {fd=58, events=POLLIN}, {fd=54, events=POLLIN}, {fd=52, events=POLLIN}, {fd=50, events=POLLIN}, {fd=48, events=POLLIN}, {fd=46, events=POLLIN}, {fd=44, events=POLLIN}, {fd=42, events=POLLIN}, {fd=40, events=POLLIN}, {fd=38, events=POLLIN}, {fd=36, events=POLLIN}, {fd=32, events=POLLIN}, {fd=30, events=POLLIN}, {fd=28, events=POLLIN}, {fd=26, events=POLLIN}, {fd=24, events=POLLIN}, {fd=22, events=POLLIN}, {fd=20, events=POLLIN}, {fd=18, events=POLLIN}, {fd=16, events=POLLIN}, {fd=14, events=POLLIN}, {fd=12, events=POLLIN}], 26, 0) = 0 (Timeout)
getrusage(RUSAGE_SELF, {ru_utime={tv_sec=294, tv_usec=768687}, ru_stime={tv_sec=5, tv_usec=462431}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={tv_sec=294, tv_usec=768696}, ru_stime={tv_sec=5, tv_usec=462431}, ...}) = 0
poll([{fd=8, events=POLLIN}, {fd=34, events=POLLIN}, {fd=10, events=POLLIN}, {fd=56, events=POLLIN}, {fd=58, events=POLLIN}, {fd=54, events=POLLIN}, {fd=52, events=POLLIN}, {fd=50, events=POLLIN}, {fd=48, events=POLLIN}, {fd=46, events=POLLIN}, {fd=44, events=POLLIN}, {fd=42, events=POLLIN}, {fd=40, events=POLLIN}, {fd=38, events=POLLIN}, {fd=36, events=POLLIN}, {fd=32, events=POLLIN}, {fd=30, events=POLLIN}, {fd=28, events=POLLIN}, {fd=26, events=POLLIN}, {fd=24, events=POLLIN}, {fd=22, events=POLLIN}, {fd=20, events=POLLIN}, {fd=18, events=POLLIN}, {fd=16, events=POLLIN}, {fd=14, events=POLLIN}, {fd=12, events=POLLIN}], 26, 0) = 0 (Timeout)
getrusage(RUSAGE_SELF, {ru_utime={tv_sec=294, tv_usec=768915}, ru_stime={tv_sec=5, tv_usec=462431}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={tv_sec=294, tv_usec=768928}, ru_stime={tv_sec=5, tv_usec=462431}, ...}) = 0
poll([{fd=8, events=POLLIN}, {fd=34, events=POLLIN}, {fd=10, events=POLLIN}, {fd=56, events=POLLIN}, {fd=58, events=POLLIN}, {fd=54, events=POLLIN}, {fd=52, events=POLLIN}, {fd=50, events=POLLIN}, {fd=48, events=POLLIN}, {fd=46, events=POLLIN}, {fd=44, events=POLLIN}, {fd=42, events=POLLIN}, {fd=40, events=POLLIN}, {fd=38, events=POLLIN}, {fd=36, events=POLLIN}, {fd=32, events=POLLIN}, {fd=30, events=POLLIN}, {fd=28, events=POLLIN}, {fd=26, events=POLLIN}, {fd=24, events=POLLIN}, {fd=22, events=POLLIN}, {fd=20, events=POLLIN}, {fd=18, events=POLLIN}, {fd=16, events=POLLIN}, {fd=14, events=POLLIN}, {fd=12, events=POLLIN}], 26, 0) = 0 (Timeout)

After some attempts to kill that process it went to the "uninterruptible sleep" state (doesn't respond to signals) and I could not even kill it.

comment:3 Changed 6 weeks ago by Mario Domenech Goulart

Attempting to unload the fuse kernel module causes:

[Wed Oct 27 20:28:46 2021] Disabling lock debugging due to kernel taint
[Wed Oct 27 20:28:46 2021] =============================================================================
[Wed Oct 27 20:28:46 2021] BUG fuse_inode (Tainted: G  R  U           ): Objects remaining in fuse_inode on __kmem_cache_shutdown()
[Wed Oct 27 20:28:46 2021] -----------------------------------------------------------------------------

[Wed Oct 27 20:28:46 2021] INFO: Slab 0x000000001779a0bd objects=19 used=18 fp=0x0000000095932986 flags=0xfffffc0010200
[Wed Oct 27 20:28:46 2021] CPU: 0 PID: 42390 Comm: rmmod Tainted: G  R BU            5.10.0-8-amd64 #1 Debian 5.10.46-5
[Wed Oct 27 20:28:46 2021] Hardware name: LENOVO 20GG001KUS/20GG001KUS, BIOS N1LET90W (1.90 ) 11/11/2020
[Wed Oct 27 20:28:46 2021] Call Trace:
[Wed Oct 27 20:28:46 2021]  dump_stack+0x6b/0x83
[Wed Oct 27 20:28:46 2021]  slab_err+0xb7/0xdc
[Wed Oct 27 20:28:46 2021]  ? kernel_poison_pages+0xc0/0xc0
[Wed Oct 27 20:28:46 2021]  ? slub_cpu_dead+0xa0/0xa0
[Wed Oct 27 20:28:46 2021]  __kmem_cache_shutdown.cold+0x31/0xfe
[Wed Oct 27 20:28:46 2021]  kmem_cache_destroy+0x53/0x110
[Wed Oct 27 20:28:46 2021]  fuse_exit+0x29/0x2e [fuse]
[Wed Oct 27 20:28:46 2021]  __do_sys_delete_module+0x190/0x300
[Wed Oct 27 20:28:46 2021]  do_syscall_64+0x33/0x80
[Wed Oct 27 20:28:46 2021]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Wed Oct 27 20:28:46 2021] RIP: 0033:0x7f5d534887d7
[Wed Oct 27 20:28:46 2021] Code: 73 01 c3 48 8b 0d b9 f6 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 89 f6 0b 00 f7 d8 64 89 01 48
[Wed Oct 27 20:28:46 2021] RSP: 002b:00007ffecd151eb8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[Wed Oct 27 20:28:46 2021] RAX: ffffffffffffffda RBX: 000055ca8f2b0760 RCX: 00007f5d534887d7
[Wed Oct 27 20:28:46 2021] RDX: 000055ca8f2b0860 RSI: 0000000000000a00 RDI: 000055ca8f2b07c8
[Wed Oct 27 20:28:46 2021] RBP: 0000000000000200 R08: 000055ca8f2b07e0 R09: 00007f5d53548be0
[Wed Oct 27 20:28:46 2021] R10: 000055ca8f2b1760 R11: 0000000000000206 R12: 00007ffecd1520e8
[Wed Oct 27 20:28:46 2021] R13: 00007ffecd1528d9 R14: 000055ca8f2b02a0 R15: 000055ca8f2b0760
[Wed Oct 27 20:28:46 2021] INFO: Object 0x00000000a4069606 @offset=0
[Wed Oct 27 20:28:46 2021] INFO: Object 0x00000000db760b9f @offset=832
[Wed Oct 27 20:28:46 2021] INFO: Object 0x000000004fbcb2d4 @offset=1664
[Wed Oct 27 20:28:46 2021] INFO: Object 0x0000000054e6106f @offset=2496
[Wed Oct 27 20:28:46 2021] INFO: Object 0x000000003ecd50ed @offset=3328
[Wed Oct 27 20:28:46 2021] INFO: Object 0x000000004d60ea7b @offset=4160
[Wed Oct 27 20:28:46 2021] INFO: Object 0x000000006e39884a @offset=5824
[Wed Oct 27 20:28:46 2021] INFO: Object 0x000000006e5e8a15 @offset=6656
[Wed Oct 27 20:28:46 2021] INFO: Object 0x00000000738d55e2 @offset=7488
[Wed Oct 27 20:28:46 2021] INFO: Object 0x0000000042644689 @offset=8320
[Wed Oct 27 20:28:46 2021] INFO: Object 0x0000000020afe7b0 @offset=9152
[Wed Oct 27 20:28:46 2021] INFO: Object 0x00000000db14db06 @offset=9984
[Wed Oct 27 20:28:46 2021] INFO: Object 0x00000000f5e6ea8a @offset=10816
[Wed Oct 27 20:28:46 2021] INFO: Object 0x000000000db8fe50 @offset=11648
[Wed Oct 27 20:28:46 2021] INFO: Object 0x000000000542a9e5 @offset=12480
[Wed Oct 27 20:28:46 2021] INFO: Object 0x0000000037060d99 @offset=13312
[Wed Oct 27 20:28:46 2021] INFO: Object 0x000000007503a134 @offset=14144
[Wed Oct 27 20:28:46 2021] INFO: Object 0x00000000dbc4ef19 @offset=14976

After that the system is mostly unusable:

$ sudo sync ; echo $?
Killed
137

Not sure if it is directly related to the fuse egg (probably not), but apparently the egg causes processes that use it to go into a weird state. I suppose trying to remove the kernel module when such processes are running triggers some edge case in the kernel code, which leads to that kernel backtrace.

Just in case, I've added the fuse egg to the skip list of salmonella tests for they time being, as once processes hang they cannot be killed.

Last edited 6 weeks ago by Mario Domenech Goulart (previous) (diff)
Note: See TracTickets for help on using tickets.