395 points wrycoder | 1 comments
synergy20 ◴[] No.41084355[source]
qemu is a good way to experiment with kernel hacking

Hopefully someone can update the LDD (Linux Device Drivers) and Linux kernel books. In fact, the Linux Foundation should sponsor such efforts, since technical books like these are hard to make any profit from.

replies(5): >>41084982 #>>41085046 #>>41085320 #>>41085893 #>>41087046 #
1. iam-TJ ◴[] No.41087046[source]
I use qemu extensively, especially for early-stage kernel debugging when no console is available. One such case was just this week with v6.8 where, on arm64, any kernel command-line parameter of 146 or more characters hangs the kernel instantly and silently.

Here's how I used qemu + gdb (on a Debian 12 Bookworm amd64 host) to run the arm64 kernel build under emulation and single-step the problematic code to identify the cause.

1. In a prepared kernel build system (i.e., with all build dependencies and cross-compile tools installed), build the kernel image. I do this in an unprivileged systemd-nspawn amd64 container to avoid messy -dev package installs on the host. Nspawn bind-mounts the host's source-code tree, which includes a separate build directory:

  cd "${SRC_DIR}"
  # copy/install/configure a suitable ${BUILD_DIR}/.config; review/edit with:
  make V=1 ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- O=${BUILD_DIR} -j 4 menuconfig
  # build the kernel
  export KBUILD_BUILD_USER=linux KBUILD_BUILD_HOST=iam.tj
  time make V=1 LOCALVERSION="" ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- O="${BUILD_DIR}" -j 12 Image
  # build gdb helper (Python) scripts
  time make V=1 LOCALVERSION="" ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- O="${BUILD_DIR}" scripts_gdb

This will create the debug symbols needed by gdb in ${BUILD_DIR}/vmlinux and the executable kernel in ${BUILD_DIR}/arch/arm64/boot/Image.
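
The vmlinux only carries usable symbols when the configuration enables debug info, and the kernel's gdb documentation also asks for CONFIG_GDB_SCRIPTS. A minimal sketch of a check that could be run against the .config; the function name check_debug_config is hypothetical, and it assumes the .config sits in ${BUILD_DIR} as in the build commands:

```shell
# Hypothetical helper (not part of the kernel tree): report which of the
# debug-related options are enabled in a given .config file.
check_debug_config() {
  cdc_file="$1"
  cdc_status=0
  for cdc_opt in CONFIG_DEBUG_INFO CONFIG_GDB_SCRIPTS; do
    if grep -q "^${cdc_opt}=y\$" "$cdc_file"; then
      echo "ok: ${cdc_opt}"
    else
      echo "missing: ${cdc_opt}"
      cdc_status=1
    fi
  done
  # Non-zero exit status if any option was missing
  return "$cdc_status"
}

# Usage (BUILD_DIR as in the build commands):
#   check_debug_config "${BUILD_DIR}/.config"
```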

2. Install "gdb" (plus "gdb-multiarch" if doing foreign-architecture debugging) and "qemu-system-arm" on the host.
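
A quick way to see whether the host already has the tools is sketched below. The missing_tools function is a hypothetical helper; the apt line in the comment assumes a Debian host like the one described, where the qemu-system-arm package provides the qemu-system-aarch64 binary.

```shell
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
  for mt_tool in "$@"; do
    command -v "$mt_tool" >/dev/null 2>&1 || echo "$mt_tool"
  done
}

# Commands used in the following steps
missing_tools gdb gdb-multiarch qemu-system-aarch64
# On Debian, anything listed could then be installed with, for example:
#   sudo apt install gdb gdb-multiarch qemu-system-arm
```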

3. Execute the kernel but -S[uspend] it and have QEMU listen for a connection from gdb:

  qemu-system-aarch64 -machine virt,gic-version=3 -cpu max,pauth-impdef=on -smp 2 -m 4096 -nographic -kernel ${BUILD_DIR}/arch/arm64/boot/Image -append "debug $( for l in {144..157}; do echo -n param$l=$(pwgen $((l-9)) 1)' '; done )" -initrd rootfs/boot/initrd.img-6.8.12-arm64-debug -S -gdb tcp::1234

The -append and -initrd shown here are optional; in my case no -initrd is actually needed since the (silent) panic occurs in the first few instructions the kernel executes. If debugging loadable modules, however, they would be in the initrd and loaded in the usual way. If the problem being diagnosed occurs after the root file-system and userspace proper are active, then one would need to add the appropriate qemu options for the emulated storage device where the root file-system lives.
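
In the -append loop of the qemu command, "param144=" is 9 characters, so a pwgen value of l-9 characters makes each parameter exactly l characters long. A rough sketch of the same idea without pwgen, using a fixed filler character; make_param and build_append are hypothetical names, and whether the hang depends on the value's content rather than its length is not something this assumes to know:

```shell
# Hypothetical helper: emit "param<N>=<filler>" exactly N characters long.
make_param() {
  mp_total="$1"
  mp_prefix="param${mp_total}="
  mp_pad=$(( mp_total - ${#mp_prefix} ))
  # printf '%*s' produces mp_pad spaces; tr turns them into 'x' filler
  printf '%s%s' "$mp_prefix" "$(printf '%*s' "$mp_pad" '' | tr ' ' 'x')"
}

# Hypothetical helper: build "debug param<first> ... param<last>".
build_append() {
  ba_l="$1"
  ba_last="$2"
  ba_out="debug"
  while [ "$ba_l" -le "$ba_last" ]; do
    ba_out="$ba_out $(make_param "$ba_l")"
    ba_l=$(( ba_l + 1 ))
  done
  printf '%s' "$ba_out"
}

# Usage: pass the result to qemu in place of the pwgen loop, e.g.
#   -append "$(build_append 144 157)"
```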

4. In another terminal shell (I use "tmux" and create a new tmux window) start the debugger:

  cd ${BUILD_DIR}
  # this cd is important - gdb needs to be in the base of the BUILD directory
  gdb-multiarch ./vmlinux

5. In the gdb shell:

  target remote :1234
  break __parse_cmdline
  continue

At this point the usual gdb functionality is available to examine memory and variables, single-step, view the stack and so on.
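
The same gdb commands can be kept in a command file so the session is repeatable. This is only a sketch: the write_gdb_commands function and the file name are hypothetical, and the trailing backtrace and info registers commands are illustrative additions to the three commands shown in step 5.

```shell
# Hypothetical helper: write a gdb command file mirroring step 5.
write_gdb_commands() {
  cat > "$1" <<'EOF'
target remote :1234
break __parse_cmdline
continue
backtrace
info registers
EOF
}

# Usage, from ${BUILD_DIR} with qemu already waiting on tcp::1234:
#   write_gdb_commands /tmp/parse-cmdline.gdb
#   gdb-multiarch -x /tmp/parse-cmdline.gdb ./vmlinux
```

gdb runs the commands from the file given with -x and then stays at its prompt, so single-stepping can continue interactively from the breakpoint.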

For more details on debugging the kernel using gdb and the lx-* gdb scripts, see

https://www.kernel.org/doc/html/latest/dev-tools/gdb-kernel-...

Edit: Forgot to note that for gdb to be able to use the lx-* Python scripts, their path usually needs authorising:

  # append with >> so an existing ~/.gdbinit is not overwritten
  echo "add-auto-load-safe-path ${SRC_DIR}/scripts/gdb/vmlinux-gdb.py" >> ~/.gdbinit
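
To leave an existing ~/.gdbinit undisturbed, the line can be added only when it is not already there. A small sketch, with add_safe_path as a hypothetical helper name:

```shell
# Hypothetical helper: append an add-auto-load-safe-path line to a gdbinit
# file only if that exact line is not already present.
add_safe_path() {
  asp_file="$1"
  asp_line="add-auto-load-safe-path $2"
  touch "$asp_file"
  # -x matches the whole line, -F treats the pattern as a fixed string
  grep -qxF -- "$asp_line" "$asp_file" || echo "$asp_line" >> "$asp_file"
}

# Usage (SRC_DIR as in step 1):
#   add_safe_path ~/.gdbinit "${SRC_DIR}/scripts/gdb/vmlinux-gdb.py"
```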