OpenWrt KGDB

As shown in the Kernel debugging with qemu's GDB server documentation, it is easy to debug the Linux kernel in an emulated system. But some problems might only be reproducible on actual hardware (connected to the emulation setup). It is therefore sometimes necessary to debug a whole system.

In the best case, the system can be debugged using JTAG. But this is often not possible and an in-kernel gdb remote stub like KGDB has to be used instead. The only requirement it places on the actual board is a simple serial console with poll_{get,put}_char() support.

Preparing OpenWrt

Turning off watchdog

Most CPUs have some kind of integrated watchdog. It can often be turned off and is often inactive as long as the watchdog driver is not loaded. For example, ath79 can be built without the internal watchdog support by changing target/linux/ath79/config-*:

-CONFIG_ATH79_WDT=y
+# CONFIG_ATH79_WDT is not set

Unfortunately, there are also external watchdog chips which cannot be turned off. They have to be triggered manually at regular intervals during the debugging process to prevent a sudden reboot. The details depend on the actual hardware, but it often ends up in writing to a specific (GPIO control/set/clear) register. An example of how to manually trigger a GPIO-connected watchdog can be found in GDB Linux snippets.

It is also possible to stop the watchdog service at runtime without disabling the driver. This should work for many optional watchdogs in SoCs:

ubus call system watchdog '{"magicclose":true}'
ubus call system watchdog '{"stop":true}'

Enabling KGDB in kernel

OpenWrt must be modified slightly to expose the kernel gdbstub (CONFIG_KERNEL_KGDB):

From: Sven Eckelmann <sven@narfation.org>
Date: Thu, 13 Oct 2022 16:40:21 +0200
Subject: openwrt: Add support for easily selectable kernel debugger support

When enabling this KERNEL_KGDB, make sure to clean some packages to make
sure that they are compiled with the correct settings:

  make toolchain/gdb/clean
  make toolchain/gdb/compile -j$(nproc || echo 1)
  make target/linux/clean
  make -j$(nproc || echo 1)

The session can be started (after initializing agent-proxy) on serial via:

  ubus call system watchdog '{"magicclose":true}'
  ubus call system watchdog '{"stop":true}'

  echo ttyMSM0,115200 > /sys/module/kgdboc/parameters/kgdboc
  echo g > /proc/sysrq-trigger

The rest has then to be done with the gdb(-remote) instance on the host.
But the system must not be stopped too long because the external (GPIO)
watchdog will otherwise kill the system.

Important here is that OpenWrt 22.03 on some targets doesn't provide
the correctly mapped vmlinux in the source directory. So it is necessary to
run it like this:

  $ cd "${LINUX_DIR}"
  $ cp -r vmlinux-gdb.py vmlinux.debug-gdb.py
  $ cp ../vmlinux.debug vmlinux.debug
  $ "${GDB}" -iex "set auto-load safe-path `pwd`/scripts/gdb/" -iex "target remote localhost:5551" vmlinux.debug
  (gdb) lx-symbols ..

Signed-off-by: Sven Eckelmann <sven@narfation.org>

diff --git a/config/Config-kernel.in b/config/Config-kernel.in
index 21a56e864098b8f652f06e319ce795a9456d5dcb..d0bc5e5d8b45cf6a0c63d86f5a2140980605373b 100644
--- a/config/Config-kernel.in
+++ b/config/Config-kernel.in
@@ -11,6 +11,43 @@ config KERNEL_IPQ_MEM_PROFILE
       This option select memory profile to be used,which defines
       the reserved memory configuration used in device tree.

+config KERNEL_VT
+    bool
+
+config KERNEL_GDB_SCRIPTS
+    bool
+
+config KERNEL_HW_CONSOLE
+    bool
+
+config KERNEL_CONSOLE_POLL
+    bool
+
+config KERNEL_MAGIC_SYSRQ
+    bool
+
+config KERNEL_MAGIC_SYSRQ_SERIAL
+    bool
+
+config KERNEL_KGDB_SERIAL_CONSOLE
+    bool
+
+config KERNEL_KGDB_HONOUR_BLOCKLIST
+    bool
+
+config KERNEL_KGDB
+    select KERNEL_VT
+    select KERNEL_GDB_SCRIPTS
+    select KERNEL_HW_CONSOLE
+    select KERNEL_CONSOLE_POLL
+    select KERNEL_MAGIC_SYSRQ
+    select KERNEL_MAGIC_SYSRQ_SERIAL
+    select KERNEL_KGDB_SERIAL_CONSOLE
+    select KERNEL_KGDB_HONOUR_BLOCKLIST
+    select GDB_PYTHON
+    bool "Enable kernel debugger over serial" 
+
+
 config KERNEL_BUILD_USER
     string "Custom Kernel Build User Name" 
     default "builder" if BUILDBOT
diff --git a/include/kernel-build.mk b/include/kernel-build.mk
index 80da4455bc04fccd1c7834fe8b94c29399289bd2..4cbb8a861ed01a48d65a28f2d0b6e34837284cf2 100644
--- a/include/kernel-build.mk
+++ b/include/kernel-build.mk
@@ -143,6 +143,7 @@ define BuildKernel
   $(LINUX_DIR)/.image: $(STAMP_CONFIGURED) $(if $(CONFIG_STRIP_KERNEL_EXPORTS),$(KERNEL_BUILD_DIR)/symtab.h) FORCE
     $(Kernel/CompileImage)
     $(Kernel/CollectDebug)
+    +[ -z "$(CONFIG_KERNEL_GDB_SCRIPTS)" ] || $(KERNEL_MAKE) scripts_gdb
     touch $$@

   mostlyclean: FORCE
diff --git a/target/linux/generic/config-5.10 b/target/linux/generic/config-5.10
index 4a6efc88012580691b52493685992a2af7fa1c65..00238982863b98c2f340c7e6a76a19652c26d2c1 100644
--- a/target/linux/generic/config-5.10
+++ b/target/linux/generic/config-5.10
@@ -7184,3 +7184,12 @@ CONFIG_ZONE_DMA=y
 # CONFIG_ZRAM_MEMORY_TRACKING is not set
 # CONFIG_ZSMALLOC is not set
 # CONFIG_ZX_TDM is not set
+
+# KGDB specific "disabled" options
+# CONFIG_CONSOLE_TRANSLATIONS is not set
+# CONFIG_VT_CONSOLE is not set
+# CONFIG_VT_HW_CONSOLE_BINDING is not set
+# CONFIG_SERIAL_KGDB_NMI is not set
+# CONFIG_KGDB_TESTS is not set
+# CONFIG_KGDB_KDB is not set
+# CONFIG_KGDB_LOW_LEVEL_TRAP is not set

Start debugging session

Turning off kASLR

Kernel address space layout randomization (kASLR) makes it harder to resolve the addresses of symbols. It is highly recommended to boot the kernel with the parameter "nokaslr", for example by adding it to CONFIG_CMDLINE or by adjusting the bootargs in the bootloader. Check /proc/cmdline afterwards to confirm that the kernel was really booted with this parameter.
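
The check of /proc/cmdline can be scripted. The following is only a minimal sketch; the function name has_nokaslr is made up for this page, and the optional file argument exists only so that the check can be tried against a regular file:

```shell
# Sketch: succeed only if the kernel command line contains "nokaslr"
# as a separate parameter. has_nokaslr is a hypothetical helper name.
has_nokaslr() {
    cmdline_file="${1:-/proc/cmdline}"
    # Put every parameter on its own line and look for an exact match,
    # so that something like "foo=nokaslrx" is not accepted by accident.
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'nokaslr'
}
```

On the target, it could be used as: has_nokaslr || echo "kernel was booted without nokaslr"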

Configure KGDB serial

kgdb needs a serial device to work. It has to be set via the kgdboc module parameter. We assume here that the serial console on our device is ttyS0 with a baud rate of 115200:

echo ttyS0,115200 > /sys/module/kgdboc/parameters/kgdboc
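
When this step is scripted, it can help to read the parameter back, because a missing parameter file usually means that the kernel was built without KGDB support. The following is only a sketch under that assumption; the helper name set_kgdboc and its optional second argument (the parameter path) are made up for this page:

```shell
# Sketch: write the kgdboc parameter and read it back to confirm it.
# set_kgdboc is a hypothetical helper; the second argument only exists
# so that the function can be tried against a regular file.
set_kgdboc() {
    value="$1"
    param="${2:-/sys/module/kgdboc/parameters/kgdboc}"
    # The parameter file is missing when kgdboc is not part of the kernel.
    [ -w "$param" ] || { echo "kgdboc parameter not writable: $param" >&2; return 1; }
    echo "$value" > "$param" || return 1
    # Compare the stored value with the requested one.
    [ "$(cat "$param")" = "$value" ]
}
```

On the target, it could be used as: set_kgdboc ttyS0,115200 || echo "kgdboc setup failed"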

Switch to kgdb

The gdb frontend cannot directly talk to the kernel over serial and create breakpoints. The sysrq mechanism has to be used to switch from Linux to kgdb before gdb can be used. Under OpenWrt, this can be done using:

echo g > /proc/sysrq-trigger

Connecting gdb

I would use the following folders in my x86-64 build environment, but they will be different for other architectures or OpenWrt versions:

  • LINUX_DIR=${OPENWRT_DIR}/build_dir/target-x86_64_musl/linux-x86_64/linux-5.10.146/
  • GDB=${OPENWRT_DIR}/staging_dir/toolchain-x86_64_gcc-11.2.0_musl/bin/x86_64-openwrt-linux-gdb
  • BATADV_DIR=${OPENWRT_DIR}/build_dir/target-x86_64_musl/linux-x86_64/batman-adv-2022.0/

Once kgdb has been activated using sysrq, we can set up gdb. It has to connect via a serial adapter to the target device. We must change to LINUX_DIR first and can then start our target-specific GDB with the uncompressed kernel image before connecting to the remote device.

cd "${LINUX_DIR}" 
cp ../vmlinux.debug vmlinux
"${GDB}" -iex "set auto-load safe-path scripts/gdb/" -iex "set serial baud 115200" -iex "target remote /dev/ttyUSB0" ./vmlinux

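In this example, we are using a USB TTL converter (/dev/ttyUSB0), which gdb was told to use via the "set serial baud" and "target remote" options above. Inside gdb, load the symbols and continue the kernel: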

lx-symbols ..

continue

You should make sure that it doesn't load any *.ko files from ipkg-* directories. These files are stripped and don't contain the necessary symbol information. When necessary, just delete these folders or specify the folders with the unstripped kernel modules:

lx-symbols ../batman-adv-2022.0/.pkgdir/ ../backports-5.15.58-1/.pkgdir/ ../button-hotplug/.pkgdir/
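
Typing all these directories by hand gets tedious with many packages. As a rough sketch (lx_symbols_cmd is a made-up helper name, and it assumes that the unstripped modules are located below .pkgdir directories as in the example above), the command line for gdb could be generated on the build host like this:

```shell
# Sketch: print an lx-symbols command which lists every .pkgdir directory
# below a base directory that contains at least one kernel module (*.ko).
# lx_symbols_cmd is a hypothetical helper; paths with spaces are not handled.
lx_symbols_cmd() {
    base="${1:-..}"
    printf 'lx-symbols'
    # Find modules inside .pkgdir trees and cut each path back to ".pkgdir/".
    find "$base" -type f -name '*.ko' -path '*/.pkgdir/*' 2>/dev/null |
        sed 's|\(/\.pkgdir\)/.*|\1/|' | sort -u |
        while IFS= read -r dir; do
            printf ' %s' "$dir"
        done
    printf '\n'
}
```

Running lx_symbols_cmd .. inside LINUX_DIR prints a single line which can be pasted at the (gdb) prompt. The stripped copies in ipkg-* directories are skipped because only paths containing .pkgdir are accepted.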

The rest of the process works similarly to debugging using gdbserver. Just set some additional breakpoints and let the kernel run again. kgdb will then inform gdb whenever a breakpoint was hit. Just keep in mind that it is not possible to interrupt the kernel from gdb (without an Oops or an already existing breakpoint). Use the sysrq mechanism again from Linux to switch back to kgdb.

Some other ideas are documented in GDB Linux snippets.

The Kernel hacking Debian image page should also be checked to increase the chance of getting debuggable modules which didn't have all information optimized away. The relevant flags could be set directly in the routing feed like this:

diff --git a/batman-adv/Makefile b/batman-adv/Makefile
index 967965e..0abd42f 100644
--- a/batman-adv/Makefile
+++ b/batman-adv/Makefile
@@ -17,6 +17,9 @@ PKG_LICENSE_FILES:=LICENSES/preferred/GPL-2.0 LICENSES/preferred/MIT

 STAMP_CONFIGURED_DEPENDS := $(STAGING_DIR)/usr/include/mac80211-backport/backport/autoconf.h

+RSTRIP:=:
+STRIP:=:
+
 include $(INCLUDE_DIR)/kernel.mk
 include $(INCLUDE_DIR)/package.mk

@@ -77,7 +80,7 @@ define Build/Compile
         $(KERNEL_MAKE_FLAGS) \
         M="$(PKG_BUILD_DIR)/net/batman-adv" \
         $(PKG_EXTRA_KCONFIG) \
-        EXTRA_CFLAGS="$(PKG_EXTRA_CFLAGS)" \
+        EXTRA_CFLAGS="$(PKG_EXTRA_CFLAGS) -fno-inline -Og -fno-optimize-sibling-calls" \
         NOSTDINC_FLAGS="$(NOSTDINC_FLAGS)" \
         modules
 endef

Agent-Proxy

Instead of switching all the time between gdb and the terminal emulator (via UART/TTL), it can be rather helpful to use a splitter which can multiplex the kgdb and the normal terminal. So instead of using screen/minicom/... + gdb against the tty device, the different sessions are just started against a TCP port.

Installation

$ git clone https://git.kernel.org/pub/scm/utils/kernel/kgdb/agent-proxy.git/
$ make -C agent-proxy

Starting up session

$ ./agent-proxy/agent-proxy '127.0.0.1:5550^127.0.0.1:5551' 0 /dev/ttyUSB0,115200

To connect to the terminal session, a simple telnet or telnet-like tool is enough:

$ screen //telnet localhost 5550

The setup of kgdboc must happen exactly as described before, including the switch to the debugging mode via sysrq.

gdb then has to be attached as to any other remote gdb session:

$ cd "${LINUX_DIR}" 
$ "${GDB}" -iex "set auto-load safe-path scripts/gdb/" -iex "target remote localhost:5551" ./vmlinux

Enable KGDB on panic

Usually, a debugger catches problems like segfaults and allows a user to debug the problem further. On modern setups with kgdb, this is not the case because the system will automatically reboot n seconds after a panic.

This can be avoided by changing the sysctl setting kernel.panic to 0, either in /etc/sysctl.d/ or by manually issuing:

sysctl -w kernel.panic=0
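
For the persistent variant in /etc/sysctl.d/, a drop-in file is enough. The following is only a sketch; the helper name persist_no_panic_reboot and the file name 99-kgdb-panic.conf are made up for this page, the latter chosen to sort late so that it is read after other drop-ins:

```shell
# Sketch: persist kernel.panic=0 in a sysctl.d drop-in file.
# persist_no_panic_reboot and 99-kgdb-panic.conf are hypothetical names;
# the directory argument only exists to allow a dry run elsewhere.
persist_no_panic_reboot() {
    dir="${1:-/etc/sysctl.d}"
    file="$dir/99-kgdb-panic.conf"
    mkdir -p "$dir" || return 1
    # 0 disables the automatic reboot after a panic.
    printf 'kernel.panic=0\n' > "$file"
}
```

After calling persist_no_panic_reboot on the target, the setting can be applied without a reboot via sysctl -p /etc/sysctl.d/99-kgdb-panic.conf.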

If kgdb(oc) is attached, it should automatically receive a message when the Oops is noticed, and the problem can then be debugged further.