Commit 32a3a9db authored by Lukas Bulwahn's avatar Lukas Bulwahn Committed by Jonathan Corbet
Browse files

docs: admin-guide: for kernel bugs refer to other kernel documentation



The current section 'If something goes wrong' makes a number of suggestions
for debugging, bug hunting and reporting issues, which are quite briefly
described in that section.

However, the suggestions are also well covered in other kernel
documentation or sometimes simply outdated. Here, each suggestion in that
section is summarized, and then followed with its assessment, and the
derived action for each suggestion:

  - use MAINTAINERS and mailing list: covered in 'Reporting issues',
    summarized in the short guide, detailed in its further section.
    Reporting issues even provides some specific examples that guides
    readers well through the needed steps. Refer to 'Reporting issues'.

  - contact Linus Torvalds: probably outdated as currently described.
    nevertheless covered in 'Reporting issues'. Reporting issues points out
    to contact the relevant kernel maintainers first, and after some
    patience and failed attempts with those maintainers, contacting Linus
    Torvalds might be okay. Refer to 'Reporting issues'.

  - tell what kernel, how to duplicate, the setup, if the problem is new
    or old and when did you notice: covered in 'Reporting issues',
    especially in Step-by-step guide how to report issues to the kernel
    maintainers. Refer to 'Reporting issues'.

  - duplicate kernel bug reports exactly: covered in 'Reporting issues',
    especially in Write and send the report. Refer to 'Reporting issues'.

  - read 'Bug hunting': keep this reference. Refer to 'Bug hunting'.

  - compile the kernel with CONFIG_KALLSYMS: covered in 'Reporting issues',
    especially in Decode failure messages. Refer to 'Reporting issues'.

  - alternatively, use ksymoops: ksymoops at the mentioned URL seems not to
    be maintained anymore. It was released roughly once a year until
    version 2.4.11 in 2005, but has not seen a new release since then. The
    information in ./scripts/ksymoops/README is from 1999, and does not
    give more insight on its actual maintenance state either. Ksymoops is
    mentioned as system utility in changes.rst, but also not recommended
    there. Drop the explanation on using ksymoops.

  - alternatively, lookup dump manually with the EIP and nm to determine
    the function in which the kernel crashes: this method seems already a
    quite advanced and low-level debugging method. Even all the further
    references on bug hunting and debugging do not mention it. Drop this
    alternative method and limit mentioning methods explained in the other
    existing kernel documentation.

  - read 'Reporting issues': keep this reference.
    Refer to 'Reporting issues'.

  - use gdb for debugging: some specific details, e.g., edit
    arch/x86/Makefile, are probably outdated or limited to one (historic
    important) setup. Using gdb is covered in 'Bug hunting', 'Debugging
    kernel and modules via gdb' and 'Using kgdb, kdb and the kernel
    debugger internals'. Refer to those three documents.

Overall, it is sufficient to refer to reporting-issues.rst,
bug-hunting.rst, gdb-kernel-debugging.rst and kgdb.rst and this way cover
the existing suggestions.

'Reporting issues' is quite new and probably up to date. 'Bug hunting',
'Debugging kernel and modules via gdb' and 'Using kgdb, kdb and the kernel
debugger internals' might need some revisit and update, but they are
generally in an acceptable state for referring to them.

Replace the existing suggestions by reference to other existing kernel
documentation covering those suggestions---partly even nicely summarized
and then explained in greater detail.

Signed-off-by: default avatarLukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20220720041325.15693-3-lukas.bulwahn@gmail.com


Signed-off-by: default avatarJonathan Corbet <corbet@lwn.net>
parent 3f10b508
Loading
Loading
Loading
Loading
+7 −82
Original line number Diff line number Diff line
@@ -330,85 +330,10 @@ Compiling the kernel
If something goes wrong
-----------------------

 - If you have problems that seem to be due to kernel bugs, please check
   the file MAINTAINERS to see if there is a particular person associated
   with the part of the kernel that you are having trouble with. If there
   isn't anyone listed there, then the second best thing is to mail
   them to me (torvalds@linux-foundation.org), and possibly to any other
   relevant mailing-list or to the newsgroup.

 - In all bug-reports, *please* tell what kernel you are talking about,
   how to duplicate the problem, and what your setup is (use your common
   sense).  If the problem is new, tell me so, and if the problem is
   old, please try to tell me when you first noticed it.

 - If the bug results in a message like::

     unable to handle kernel paging request at address C0000010
     Oops: 0002
     EIP:   0010:XXXXXXXX
     eax: xxxxxxxx   ebx: xxxxxxxx   ecx: xxxxxxxx   edx: xxxxxxxx
     esi: xxxxxxxx   edi: xxxxxxxx   ebp: xxxxxxxx
     ds: xxxx  es: xxxx  fs: xxxx  gs: xxxx
     Pid: xx, process nr: xx
     xx xx xx xx xx xx xx xx xx xx

   or similar kernel debugging information on your screen or in your
   system log, please duplicate it *exactly*.  The dump may look
   incomprehensible to you, but it does contain information that may
   help debugging the problem.  The text above the dump is also
   important: it tells something about why the kernel dumped code (in
   the above example, it's due to a bad kernel pointer). More information
   on making sense of the dump is in Documentation/admin-guide/bug-hunting.rst

 - If you compiled the kernel with CONFIG_KALLSYMS you can send the dump
   as is, otherwise you will have to use the ``ksymoops`` program to make
   sense of the dump (but compiling with CONFIG_KALLSYMS is usually preferred).
   This utility can be downloaded from
   https://www.kernel.org/pub/linux/utils/kernel/ksymoops/ .
   Alternatively, you can do the dump lookup by hand:

 - In debugging dumps like the above, it helps enormously if you can
   look up what the EIP value means.  The hex value as such doesn't help
   me or anybody else very much: it will depend on your particular
   kernel setup.  What you should do is take the hex value from the EIP
   line (ignore the ``0010:``), and look it up in the kernel namelist to
   see which kernel function contains the offending address.

   To find out the kernel function name, you'll need to find the system
   binary associated with the kernel that exhibited the symptom.  This is
   the file 'linux/vmlinux'.  To extract the namelist and match it against
   the EIP from the kernel crash, do::

     nm vmlinux | sort | less

   This will give you a list of kernel addresses sorted in ascending
   order, from which it is simple to find the function that contains the
   offending address.  Note that the address given by the kernel
   debugging messages will not necessarily match exactly with the
   function addresses (in fact, that is very unlikely), so you can't
   just 'grep' the list: the list will, however, give you the starting
   point of each kernel function, so by looking for the function that
   has a starting address lower than the one you are searching for but
   is followed by a function with a higher address you will find the one
   you want.  In fact, it may be a good idea to include a bit of
   "context" in your problem report, giving a few lines around the
   interesting one.

   If you for some reason cannot do the above (you have a pre-compiled
   kernel image or similar), telling me as much about your setup as
   possible will help.  Please read
   'Documentation/admin-guide/reporting-issues.rst' for details.

 - Alternatively, you can use gdb on a running kernel. (read-only; i.e. you
   cannot change values or set break points.) To do this, first compile the
   kernel with -g; edit arch/x86/Makefile appropriately, then do a ``make
   clean``. You'll also need to enable CONFIG_PROC_FS (via ``make config``).

   After you've rebooted with the new kernel, do ``gdb vmlinux /proc/kcore``.
   You can now use all the usual gdb commands. The command to look up the
   point where your system crashed is ``l *0xXXXXXXXX``. (Replace the XXXes
   with the EIP value.)

   gdb'ing a non-running kernel currently fails because ``gdb`` (wrongly)
   disregards the starting offset for which the kernel is compiled.
If you have problems that seem to be due to kernel bugs, please follow the
instructions at 'Documentation/admin-guide/reporting-issues.rst'.

Hints on understanding kernel bug reports are in
'Documentation/admin-guide/bug-hunting.rst'. More on debugging the kernel
with gdb is in 'Documentation/dev-tools/gdb-kernel-debugging.rst' and
'Documentation/dev-tools/kgdb.rst'.