Fix typo
[ikiwiki.git] / docs / howtos / HowToDebugVKernels / index.mdwn
CommitLineData
49413a25
B
1# Introduction
2The purpose of this document is to introduce the reader with vkernel debugging.
3The vkernel architecture allows us to run DragonFly kernels in userland. These virtual
4kernels can be paniced or otherwise abused, without affecting the host operating system.
5
6To make things a bit more interesting, we will use a real life example.
7
8# Once upon a time
9... I wrote a simple program that used the AIO interface. As it turned out we don't support
10this feature, but at that point I didn't know.
11
12 [beket@sadness ~]$ gcc t_aio.c -o t_aio -Wall -ansi -pedantic
13 [beket@sadness ~]$ ./t_aio
14 aio_read: Function not implemented
15 [beket@sadness ~]$
16
17Ktrace'ing the process and seeing with my own eyes what was going on, seemed like a good idea.
18Here comes the fun. I misread the ktrace(1) man page and typed:
19
20 [beket@sadness ~]$ ktrace -c ./t_aio
21
22And the system hang.
783fb99b
B
23
24(My intention was to track the system calls of t_aio, but what I typed would actually disable all traces from all process to the t_aio file.)
49413a25
B
25
26# Setup a vkernel
27To setup a vkernel, please consult this [man page](http://leaf.dragonflybsd.org/cgi/web-man?command=vkernel&section=ANY).
28It's very straightforward.
29
30# Reproduce the problem
31We boot into our vkernel:
32
33 # cd /var/kernel
34 # ./boot/kernel -m 64m -r rootimg.01 -I auto:bridge0
35 [...]
36 login: root
37 #
38And then try to reproduce the system freeze:
39
40 # ktrace -c ./t_aio
41
42 Fatal trap 12: page fault while in kernel mode
43 mp_lock = 00000001; cpuid = 1
44 fault virtual address = 0x0
45 fault code = supervisor read, page not present
46 instruction pointer = 0x1f:0x80aca52
47 stack pointer = 0x10:0x5709d914
48 frame pointer = 0x10:0x5709dbe0
49 processor eflags = interrupt enabled, resume, IOPL = 0
50 current process = 692 (ktrace)
51 current thread = pri 6
52 <- SMP: XXX
53 kernel: type 12 trap, code=4
54
55 CPU1 stopping CPUs: 0x00000001
56 stopped
57 Stopped at 0x80aca52: movl 0(%eax),%eax
58 db>
59
60This db> prompt is from ddb(4), the interactive kernel debugger.
61The
62
63 fault virtual address = 0x0
64
65field is indicative of a NULL pointer dereference inside the kernel.
66
67Let's get a trace of what went wrong:
68
69 db> trace
70 ktrdestroy(57082700,5709dc5c,0,57082700,5709dca0) at 0x80aca52
71 allproc_scan(80aca14,5709dc5c,be,2,0) at 0x80b2e91
72 sys_ktrace(5709dca0,6,0,0,57082700) at 0x80acffe
73 syscall2(5709dd40,6,57082700,0,0) at 0x8214b6d
74 user_trap(5709dd40,570940e8,8214185,0,8215462) at 0x8214d9c
75 go_user(5709dd38,0,0,7b,0) at 0x82151ac
76 db>
77
78# Gdb
180d4fc3
B
79Quoting from vkernel(7):
80
81It is possible to directly gdb the virtual kernel's process. It is recommended that you do a `handle SIGSEGV noprint' to ignore page faults processed by the virtual kernel itself and `handle SIGUSR1 noprint' to ignore signals used for simulating inter-processor interrupts (SMP build only).
82
53877b8d 83You can add these two commands in your ~/.gdbinit to save yourself from typing them again and again.
eb531287 84
180d4fc3
B
85 [beket@sadness ~]$ cat ~/.gdbinit
86 handle SIGSEGV noprint
87 handle SIGUSR1 noprint
88
89So we are going to attach to the vkernel process:
49413a25
B
90
91 # ps aux | grep kernel
92 root 25408 0.0 2.3 1053376 17772 p0 IL+ 8:32PM 0:06.51 ./boot/kernel -m 64m -r rootimg.01 -I auto:bridge0
93 # gdb kernel 25408
94 GNU gdb 6.7.1
95 [...]
96
97Let's get a trace from inside gdb:
98
99 (gdb) bt
100 #0 0x282d4c10 in sigsuspend () from /usr/lib/libc.so.6
101 #1 0x28287eb2 in sigsuspend () from /usr/lib/libthread_xu.so.2
102 #2 0x0821530a in stopsig (nada=24, info=0x40407d2c, ctxp=0x40407a4c) at /usr/src/sys/platform/vkernel/i386/exception.c:112
103 #3 <signal handler called>
104 #4 0x282d4690 in umtx_sleep () from /usr/lib/libc.so.6
105 #5 0x08213bde in cpu_idle () at /usr/src/sys/platform/vkernel/i386/cpu_regs.c:722
106 #6 0x00000000 in ?? ()
107 (gdb)
108
109Why does it differ from the ddb's trace ?
110Well, when the vkernel is sitting at a db> prompt all vkernel threads representing virtual cpu's except the one handling the db> prompt itself will be suspended in stopsig(). The backtrace only sees one of the N threads.
111
112We need to do better this time. Let's break into the kernel _before_ it crashes. sys_ktrace() seems like a good candidate.
113
114 # gdb kernel 25532
115 GNU gdb 6.7.1
116 [...]
117 (gdb) break sys_ktrace
118 Breakpoint 1 at 0x80acf43: file ./machine/thread.h, line 83.
119 (gdb)
120
41b49ce5 121Next we type 'c' in the gdb prompt to resume vkernel execution:
49413a25
B
122
123 (gdb) c
124 Continuing.
125
126Now we go to our vkernel and type the offending command:
127
128 # ktrace -c
129
130Gdb stops the execution of vkernel and a message pops up in gdb buffer:
131
132 Breakpoint 1, sys_ktrace (uap=0x573e2ca0) at ./machine/thread.h:83
133 83 __asm ("movl %%fs:globaldata,%0" : "=r" (gd) : "m"(__mycpu__dummy));
134 (gdb)
135
41b49ce5 136We navigate through source code with the 'step' and 'next' gdb commands. They are identical, except that 'step' follows function calls. When we meet this call:
49413a25 137
41b49ce5
B
138 276 allproc_scan(ktrace_clear_callback, &info);
139
140we 'step' inside it. alloproc_scan() iterates through the process list and applies the ktrace_clear_callback() to each one of them.
141Later we see this:
142
143 347 if (p->p_tracenode->kn_vp == info->tracenode->kn_vp) {
144
145Here p is a pointer to the current process:
264538dc 146
41b49ce5
B
147 (gdb) print p
148 $1 = (struct proc *) 0x57098c00
149
150Let's see if this process is traced:
264538dc 151
41b49ce5
B
152 (gdb) print p->p_tracenode
153 $2 = (struct ktrace_node *) 0x0
154 (gdb)
155
264538dc 156Oops. There is no trace to a vnode for this process. The code will try to access p->p_tracenode and crash. This is the zero virtual address we saw before.