# Introduction
The purpose of this document is to introduce the reader to vkernel debugging.
The vkernel architecture allows us to run DragonFly kernels in userland. These virtual
kernels can be panicked or otherwise abused without affecting the host operating system.

To make things a bit more interesting, we will use a real-life example.

# Once upon a time
... I wrote a simple program that used the AIO interface. As it turned out, we don't support
this feature, but I didn't know that at the time.

    $ gcc t_aio.c -o t_aio -Wall -ansi -pedantic
    $ ./t_aio
    aio_read: Function not implemented
    $

Ktrace'ing the process to see with my own eyes what was going on seemed like a good idea.
Here comes the fun part. I misread the [ktrace(1)](http://leaf.dragonflybsd.org/cgi/web-man?command=ktrace&section=1) man page and typed:

    $ ktrace -c ./t_aio

And the system hung.

My intention was to trace the system calls of t_aio, but what I typed would actually disable all tracing, for all processes, to ktrace.out, the default trace file. By pure luck, a bug had been discovered.

# Set up a vkernel
To set up a vkernel, please consult [this man page](http://leaf.dragonflybsd.org/cgi/web-man?command=vkernel&section=ANY).
It's very straightforward.

# Reproduce the problem
We boot into our vkernel:

    # cd /var/kernel
    # ./boot/kernel -m 64m -r rootimg.01 -I auto:bridge0
    [...]
    login: root
    #

And then try to reproduce the system freeze:

    # ktrace -c ./t_aio

    Fatal trap 12: page fault while in kernel mode
    mp_lock = 00000001; cpuid = 1
    fault virtual address   = 0x0
    fault code              = supervisor read, page not present
    instruction pointer     = 0x1f:0x80aca52
    stack pointer           = 0x10:0x5709d914
    frame pointer           = 0x10:0x5709dbe0
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 692 (ktrace)
    current thread          = pri 6
     <- SMP: XXX
    kernel: type 12 trap, code=4

    CPU1 stopping CPUs: 0x00000001
     stopped
    Stopped at      0x80aca52:      movl    0(%eax),%eax
    db>

This db> prompt comes from [ddb(4)](http://leaf.dragonflybsd.org/cgi/web-man?command=ddb&section=4), the interactive kernel debugger.
The field

    fault virtual address   = 0x0

indicates a NULL pointer dereference inside the kernel.

Let's get a trace of what went wrong:

    db> trace
    ktrdestroy(57082700,5709dc5c,0,57082700,5709dca0) at 0x80aca52
    allproc_scan(80aca14,5709dc5c,be,2,0) at 0x80b2e91
    sys_ktrace(5709dca0,6,0,0,57082700) at 0x80acffe
    syscall2(5709dd40,6,57082700,0,0) at 0x8214b6d
    user_trap(5709dd40,570940e8,8214185,0,8215462) at 0x8214d9c
    go_user(5709dd38,0,0,7b,0) at 0x82151ac
    db>

Here sys_ktrace, allproc_scan, etc. are function names. Functions are listed in the _reverse_ order in which they were called, so in this particular example the most recently called function is ktrdestroy(). The hex values in parentheses are the first five items on the stack; since ddb doesn't really know how many arguments a function takes, it always prints five. The hex value after "at" is the [instruction address](http://en.wikipedia.org/wiki/Program_counter).

# Gdb

Quoting from [vkernel(7)](http://leaf.dragonflybsd.org/cgi/web-man?command=vkernel&section=7):

It is possible to directly gdb the virtual kernel's process.  It is recommended that you do a `handle SIGSEGV noprint` to ignore page faults processed by the virtual kernel itself and `handle SIGUSR1 noprint` to ignore signals used for simulating inter-processor interrupts (SMP build only).

You can add these two commands to your ~/.gdbinit to save yourself from typing them again and again.

    $ cat ~/.gdbinit
    handle SIGSEGV noprint
    handle SIGUSR1 noprint

So we are going to attach to the vkernel process:

    # ps aux | grep kernel
    root  25408  0.0  2.3 1053376 17772  p0  IL+   8:32PM   0:06.51 ./boot/kernel -m 64m -r rootimg.01 -I auto:bridge0
    # gdb kernel 25408
    GNU gdb 6.7.1
    [...]

Let's get a trace from inside gdb:

    (gdb) bt
    #0  0x282d60d0 in read () from /usr/lib/libc.so.6
    #1  0x2828389f in read () from /usr/lib/libthread_xu.so.2
    #2  0x0821cd86 in vconsgetc (private=0x56758168) at /usr/src/sys/platform/vkernel/platform/console.c:373
    #3  0x080e431d in cngetc () at /usr/src/sys/kern/tty_cons.c:482
    #4  0x080813d0 in db_readline (lstart=0x82806a0 "", lsize=120) at /usr/src/sys/ddb/db_input.c:314
    #5  0x08081c43 in db_read_line () at /usr/src/sys/ddb/db_lex.c:55
    #6  0x080804ff in db_command_loop () at /usr/src/sys/ddb/db_command.c:467
    #7  0x08082ef8 in db_trap (type=12, code=4) at /usr/src/sys/ddb/db_trap.c:71
    #8  0x082125aa in kdb_trap (type=12, code=4, regs=0x5746c8cc) at /usr/src/sys/platform/vkernel/i386/db_interface.c:151
    #9  0x082143e1 in trap_fatal (frame=0x5746c8cc, usermode=<value optimized out>, eva=0)
        at /usr/src/sys/platform/vkernel/i386/trap.c:1031
    #10 0x0821453e in trap_pfault (frame=0x5746c8cc, usermode=0, eva=0) at /usr/src/sys/platform/vkernel/i386/trap.c:948
    #11 0x0821468d in kern_trap (frame=0x5746c8cc) at /usr/src/sys/platform/vkernel/i386/trap.c:709
    #12 0x0821528c in exc_segfault (signo=11, info=0x5746cb98, ctxp=0x5746c8b8)
        at /usr/src/sys/platform/vkernel/i386/exception.c:181
    #13 <signal handler called>
    #14 0x080aca52 in ktrace_clear_callback (p=0x567480c0, data=0x5746cc5c) at /usr/src/sys/kern/kern_ktrace.c:347
    #15 0x080b2e91 in allproc_scan (callback=0x80aca14 <ktrace_clear_callback>, data=0x5746cc5c)
        at /usr/src/sys/kern/kern_proc.c:533
    #16 0x080acffe in sys_ktrace (uap=0x5746cca0) at /usr/src/sys/kern/kern_ktrace.c:276
    #17 0x08214b6d in syscall2 (frame=0x5746cd40) at /usr/src/sys/platform/vkernel/i386/trap.c:1273
    #18 0x08214d9c in user_trap (frame=0x5746cd40) at /usr/src/sys/platform/vkernel/i386/trap.c:413
    #19 0x082151ac in go_user (frame=0x5746cd38) at /usr/src/sys/platform/vkernel/i386/trap.c:1473
    #20 0x08215462 in pmsg4 () at /usr/src/sys/platform/vkernel/i386/fork_tramp.s:103
    (gdb)

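Note that frame #14 sits at the same address, 0x80aca52, that ddb attributed to ktrdestroy(); gdb, working from the kernel's debugging information, resolves it to ktrace_clear_callback().

At this point we can examine the contents of various variables. Keep in mind that a bare address must be cast to the appropriate pointer type before it can be dereferenced. For example:

    (gdb) print ((struct proc *)0x567480c0)->p_pid
    $6 = 690
    (gdb)
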
This time, let's try to break into the kernel _before_ it crashes. sys_ktrace() seems like a good place for a breakpoint.
We stop the old vkernel and fire up a new one. Once we are logged in, we attach to it as before:

    # gdb kernel 25532
    GNU gdb 6.7.1
    [...]
    (gdb) break sys_ktrace
    Breakpoint 1 at 0x80acf43: file ./machine/thread.h, line 83.
    (gdb)

Next we type 'c' at the gdb prompt to resume execution of the vkernel:

    (gdb) c
    Continuing.

We now switch to our vkernel and type in the offending command:

    # ktrace -c

Gdb stops the execution of the vkernel and a message pops up in the gdb buffer:

    Breakpoint 1, sys_ktrace (uap=0x573e2ca0) at ./machine/thread.h:83
    83          __asm ("movl %%fs:globaldata,%0" : "=r" (gd) : "m"(__mycpu__dummy));
    (gdb)

At this point the kernel hasn't panicked yet, because we have only just entered sys_ktrace().
We navigate through the source code with the 'step' and 'next' gdb commands.
They behave identically, except that 'step' descends into function calls, whereas 'next' steps over them. When we meet this call:

    276                     allproc_scan(ktrace_clear_callback, &info);

we 'step' inside it. The allproc_scan() function iterates through the process list and calls ktrace_clear_callback() for each process. Later we see this:

    347                     if (p->p_tracenode->kn_vp == info->tracenode->kn_vp) {

Here p is a pointer to the process currently being visited by the scan:

    (gdb) print p
    $1 = (struct proc *) 0x57098c00

Let's see whether this process is being traced (if it is, p->p_tracenode->kn_vp points to the vnode to which the trace records are written):

    (gdb) print p->p_tracenode
    $2 = (struct ktrace_node *) 0x0
    (gdb)

Oops. p->p_tracenode is NULL, so this process is not being traced to any vnode. The code will nevertheless try to read p->p_tracenode->kn_vp and is bound to crash. This is the _zero virtual address_ we saw before. It seems that the kernel tries to disable tracing of all processes indiscriminately, even of those that aren't being traced. Now that we know the root of the problem, we write a [patch](http://gitweb.dragonflybsd.org/dragonfly.git/commit/a4a639859f6bc14f9f55142b4bd2289b2a56d7f2) and poke someone to review/commit it.

# Possible places of confusion
A backtrace taken while the vkernel is sitting at a db> prompt may look like this:

    (gdb) bt
    #0  0x282d4c10 in sigsuspend () from /usr/lib/libc.so.6
    #1  0x28287eb2 in sigsuspend () from /usr/lib/libthread_xu.so.2
    #2  0x0821530a in stopsig (nada=24, info=0x40407d2c, ctxp=0x40407a4c) at /usr/src/sys/platform/vkernel/i386/exception.c:112
    #3  <signal handler called>
    #4  0x282d4690 in umtx_sleep () from /usr/lib/libc.so.6
    #5  0x08213bde in cpu_idle () at /usr/src/sys/platform/vkernel/i386/cpu_regs.c:722
    #6  0x00000000 in ?? ()
    (gdb)

In that state, all vkernel threads representing virtual CPUs, except the one handling the db> prompt itself,
are suspended in stopsig(). A plain backtrace shows only one of the N threads, which may be one of the suspended ones, as above. gdb's `info threads` and `thread apply all bt` commands can be used to list the threads and print a backtrace for each of them.

# Additional notes
## Accessing vkernel memory
For those using HEAD, libkvm has been changed so that a vkernel's memory can now be accessed directly through /proc/$pid/mem.

Among other things, you can have a look at the vkernel's process list using ps:

    # ps axl -M /proc/829/mem -N /var/vkernel/boot/kernel
    UID   PID  PPID CPU PRI  NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
     0     0    -1   1 152   0     0 3068 nowork DL    ??    0:00.00  (swapper)
     0     1     0   0 152   0   760 3068 wait   IL    ??    0:00.00  (init)
    77   212     1   0 152   0   788 3068 poll   S     ??    0:00.00  (dhclient)
     0   323     1   0 152   0  1288 3068 select S     ??    0:00.00  (syslogd)
     0   627     1 115 222   0  3332 3068 select I     ??    0:00.00  (sshd)
     0   641     1   0 152   0  3772 3068 select S     ??    0:00.00  (sendmail)
    25   645     1  22 165   0  3668 3068 pause  I     ??    0:00.00  (sendmail)
     0     0     0   0   0 -52     0    0 -      ?    con-   0:00.00  ()
     0     0     0   0   0 -52     0    0 -      ?    con-   0:00.00  ()
     0     0     0   0   0 -52     0    0 -      ?    con-   0:00.00  ()
     0   188     1   2 153   0   788 3068 poll   I     v0-   0:00.00  (dhclient)

## Gdb + vkernel issues
gdb and vkernel (SMP or not) don't play well together anymore.  It is possible to get into
a state where the vkernel is in state "stop" and gdb is in "wait", and nothing moves on.
The only remedy is to kill gdb, which either makes the vkernel run again or kills it as well.

See also [this bug report](http://bugs.dragonflybsd.org/issue1301).
 226
 227 See also [this bug report](http://bugs.dragonflybsd.org/issue1301).