Received: by volition.pa.dec.com; id AA24906; Mon, 30 Jul 90 10:48:29 PDT
Received: by jove.pa.dec.com; id AA23843; Mon, 30 Jul 90 10:47:19 -0700
Received: by gildor.dco.dec.com (5.57/ULTRIX-prepak-072790)
	id AA03142; Mon, 30 Jul 90 13:47:04 EDT
Message-Id: <9007301747.AA03142@gildor.dco.dec.com>
From: Frederick M. Avolio <avolio@gildor.dco.dec.com>
Organization: Digital Equipment Corp., Washington ULTRIX Resource Center
Phone: +1 301 306 2247, DTN: 341-2247
To: vixie@wrl.dec.com
Subject: Kernel debugging on ULTRIX/RISC
Date: Mon, 30 Jul 90 13:47:02 EDT
Sender: avolio@gildor.dco.dec.com

Paul:

Could this be put out on gatekeeper and then "announced" in comp.unix.ultrix?
Al Delorey says it is okay to distribute outside.

Thanks

Fred

------- Forwarded Message


 
 
 
 
 
			Kernel debugging on ULTRIX/RISC
 
			     Copyright (c) 1989 by
		  Digital Equipment Corporation, Maynard, MA

							    RISC debug - afd
				Address Space
 
    The system is always in virtual address mode (no physical address mode)
 
    Address spaces
 
	KSEG0	- not mapped, cached; for kernel text
		  virtual: 8000 0000 -> 9fff ffff (512 MB)
 
	KSEG1	- not mapped, not cached; for I/O space
		  virtual: a000 0000 -> bfff ffff (512 MB)
 
	KSEG2	- mapped, cached; for stacks, kernel mallocs
		  virtual: c000 0000 -> ffff ffff (1 GB)
 
	KUSEG	- mapped, cached; for user space
		  virtual: 0 -> 7fff ffff (2 GB)
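
As a rough illustration of the table above, the following Python sketch
classifies a 32-bit virtual address by segment and, for the two unmapped
segments, derives the physical address by subtracting the segment base.
The function names are invented for this example.  The base subtraction
is the standard MIPS R2000/R3000 rule for kseg0/kseg1; mapped segments
(kuseg, kseg2) go through the TLB and cannot be translated this way.

```python
# Segment boundaries taken from the table above (32-bit virtual addresses).
# Each entry: (name, lowest address, highest address, mapped through TLB?)
SEGMENTS = [
    ("KUSEG", 0x00000000, 0x7FFFFFFF, True),
    ("KSEG0", 0x80000000, 0x9FFFFFFF, False),   # unmapped, cached
    ("KSEG1", 0xA0000000, 0xBFFFFFFF, False),   # unmapped, uncached
    ("KSEG2", 0xC0000000, 0xFFFFFFFF, True),
]


def segment_of(vaddr):
    """Return the name of the segment that contains vaddr."""
    for name, lo, hi, _mapped in SEGMENTS:
        if lo <= vaddr <= hi:
            return name
    raise ValueError("not a 32-bit address: %#x" % vaddr)


def unmapped_to_physical(vaddr):
    """Physical address for a KSEG0/KSEG1 address, or None if mapped."""
    for _name, lo, hi, mapped in SEGMENTS:
        if lo <= vaddr <= hi:
            return None if mapped else vaddr - lo
    raise ValueError("not a 32-bit address: %#x" % vaddr)


if __name__ == "__main__":
    for addr in (0x80030000, 0xA0030000, 0xFFFFE000, 0x7FFFF000):
        print("%#010x  %s  phys=%s" % (addr, segment_of(addr),
                                       unmapped_to_physical(addr)))
```

Both 0x80030000 (kseg0) and 0xa0030000 (kseg1) resolve to physical
0x30000; only the caching behaviour differs.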
 
 
 
				    Stacks
    
    No interrupt stack, only Kernel and User stacks [idle stack added in 4.0]
    
    Startup Stack 	- at 0x8001f7ff (0x8002ffff in 3.0/RISC) -
		          grows downward and is used during system startup,
		          until a kernel stack is available.
 
    Kernel Stack	- starts at 0xffffe000 (kseg2 space)
 
    User Struct		- starts at 0xffffc000 (kseg2 space)
 
    Per cpu data base	- starts at 0xffff8000 (kseg2 space) [as of 4.0 - smp]
 
    User Stack		- starts at 0x7ffff000 (kuseg space)
			  (one guard page 0x7ffff000 to 0x7fffffff)

 
			    Using nm
 
    For a system crash that gives an Exception PC (EPC) on the console,
    you can use nm(1) to determine what routine was executing.
 
	    nm -n /vmunix
 
    This command will display the name list (symbol table), in numerical
    order, of the vmunix image.  Find the address that is closest to (but
    less than) the given EPC from the crash.  That address is the starting
    address of the routine that was executing.
 
    You can then subtract the start address of the routine from the
    faulting PC to determine the offset from the beginning of the
    routine where the error occurred.  The offending instruction can
    then be found using dbx.
 
    Sample nm output
    ----------------
    First Kernel text address: 8003,0000 (192k bytes above 8000,0000)
	    80030000 T start
	    80030000 T eprol
	    800300ac T putstr
	    80030148 T lputc
	    8003018c T cn_reset
 
	    ...
 
    First Kernel data address: approximately 8011,0000
	    80112030 D Sysmap
	    8011c830 D Usrptmap
	    8011f920 D camap
	    8011f930 D kmempt
	    8011f930 D ecamap
	    80123930 D Forkmap
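
The lookup described above (nearest symbol at or below the EPC, then the
offset) can be sketched in Python as follows.  This assumes the
three-column "address type name" layout shown in the sample; the helper
names and the example PC are invented for illustration.

```python
import bisect

# A few lines copied from the sample nm output above.
SAMPLE_NM = """\
80030000 T start
80030000 T eprol
800300ac T putstr
80030148 T lputc
8003018c T cn_reset
"""


def parse_nm(text):
    """Parse 'address type name' lines into a list sorted by address."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue                      # skip blank or unexpected lines
        addr, _kind, name = fields
        symbols.append((int(addr, 16), name))
    symbols.sort(key=lambda sym: sym[0])
    return symbols


def lookup(symbols, pc):
    """Return (name, offset) for the last symbol at or below pc."""
    addrs = [addr for addr, _name in symbols]
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return None                       # pc is below every symbol
    addr, name = symbols[i]
    return name, pc - addr


if __name__ == "__main__":
    syms = parse_nm(SAMPLE_NM)
    name, offset = lookup(syms, 0x80030150)   # hypothetical EPC
    print("%s+%#x" % (name, offset))          # prints lputc+0x8
```

In practice you would feed it the full output of "nm -n /vmunix" rather
than the embedded sample.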

Overall system memory map (in the sys area, see mips/entrypt.h)
 
physical	kseg1			use
--------	-----			---
0x00030000	0xa0030000 upward	Ultrix kernel text, data, and bss
 
0x0002ffff	0xa002ffff 
to		 			additional Prom Space (64K)
0x00020000	0xa0020000
 
0x0001ffff	0xa001ffff 		
to					1K netblock (host/client net boot info)
0x0001fc00	0xa001fc00 	
 
0x0001fbff	0xa001fbff 		
to					1K Ultrix Save State area (NEW in 3.1)
0x0001f800	0xa001f800
 
0x0001f7ff	0xa001f7ff downward	1K Ultrix temporary startup stack
		|			(at 0x2ffff in 3.0; here in 3.1)
		v
0x0001f400	0xa001f400 
 
0x0001f3ff	0xa001f3ff downward	dbgmon stack (a few K less than 64K)
		|
		V
		^
		|
0x00010000	0xa0010000 upward	dbgmon text, data, and bss
 
0x0000ffff	0xa000ffff downward	prom monitor stack
		|
		V
		^
		|
0x00000500	0xa0000500 upward	prom monitor bss
 
0x000004ff	0xa00004ff
to					restart block
0x00000400	0xa0000400
 
0x000003ff	0xa00003ff
to					general exception code
0x00000080	0xa0000080		(note cpu addresses as 0x80000080)
 
0x0000007f	0xa000007f
to					utlbmiss exception code
0x00000000	0xa0000000		(note cpu addresses as 0x80000000)
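
When reading a console dump it can help to ask "what lives at this
address?".  The sketch below encodes the map above as a table and looks
an address up in it.  The table entries come from the map (3.1 layout);
the function names, and the decision to lump everything from 0x30000
upward under the kernel entry, are choices made for this example.

```python
# (lowest, highest, use) in physical addresses, copied from the map above.
REGIONS = [
    (0x00000000, 0x0000007F, "utlbmiss exception code"),
    (0x00000080, 0x000003FF, "general exception code"),
    (0x00000400, 0x000004FF, "restart block"),
    (0x00000500, 0x0000FFFF, "prom monitor bss and stack"),
    (0x00010000, 0x0001F3FF, "dbgmon text, data, bss and stack"),
    (0x0001F400, 0x0001F7FF, "Ultrix temporary startup stack"),
    (0x0001F800, 0x0001FBFF, "Ultrix Save State area"),
    (0x0001FC00, 0x0001FFFF, "netblock"),
    (0x00020000, 0x0002FFFF, "additional Prom Space"),
]
KERNEL_BASE = 0x00030000


def to_physical(addr):
    """Accept a physical, kseg0 or kseg1 address; return the physical one."""
    if addr >= 0xC0000000:
        raise ValueError("kseg2 addresses are mapped: %#x" % addr)
    if addr >= 0x80000000:
        return addr & 0x1FFFFFFF          # strip the kseg0/kseg1 base
    return addr


def region_of(addr):
    """Name the region of the memory map that contains addr."""
    phys = to_physical(addr)
    for lo, hi, use in REGIONS:
        if lo <= phys <= hi:
            return use
    if phys >= KERNEL_BASE:
        return "Ultrix kernel text, data, and bss (upward)"
    raise ValueError("address not covered by the map: %#x" % addr)
```

For example, region_of(0xa001f400) names the startup stack, and
region_of(0x80000080) names the general exception code.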
 

			Some Useful dbx Commands
 
Command (alias)
---------------
alias [name[(args)]cmd]	show all aliases or define an alias
assign (a) var=value	assign a value to a program variable
stop at (b)		set a breakpoint at a given line
cont (c)		continue after breakpoint
delete (d)		delete the given item from the status list
down			move down an activation level in the stack
dump			print variable info for current routine
dump .			print global variable info for all routines
file			what is the current file
file filename		set the current file to 'filename'
func (f)		set context to specified func name (selects the file)
history (h)		print history list
status (j)		show status list, shows breakpoints (journal)
list (l)		list the next 10 lines of source code
l line:range		list 'range' lines of code, starting at 'line'
next (n or S)		step specified # of lines (don't stop in calls)
nexti (ni)		step specified # of assembly lines (don't stop in calls)
print (p)		print the value of the specified expr or var
pd 			print the value of the specified expr or var in decimal
po 			print the value of the specified expr or var in octal
px 			print the value of the specified expr or var in hex
			note: Can't print register variables.
pr			print values of all registers
quit (q)		exit dbx
run [args]		run the program with specified cmd line args
rerun (r)		rerun the program with same arguments
return			finish executing the function and stop back in caller
set			show setting of dbx variables
set $var=value		set a value to a dbx var (can define a new variable)
step (s)		step specified # of lines (stopping in calls)
stepi (si)		step specified # of assembly lines (stopping in calls)
stopi at <addr>		set a breakpoint at a given assembly instruction addr
u			list the previous 10 lines
unset $var		unset a dbx variable
up			move up an activation level in the stack
w			list 5 lines before and after current line
W			list 10 lines before and after current line
where (t)		where are we & how did we get here (stack trace)
			this can also be done when stopped at a breakpoint
			(this will show where/how a system crashed)
whatis <var>		show the variable/symbol definition
whereis <var>		show all versions of the specified variable
which <var>		print the default (current) version of the variable
/<regex>		search ahead in the source code for the regular expr
?<regex>		search back in the source code for the regular expr
!<history-item>		specify a cmd from the history list (by number or str)
line edit		there is an emacs like line edit capability.  To enable
			it you must set the shell environment variable LINEEDIT
			(setenv LINEEDIT)
    ^A				Move cursor to start of line
    ^B				Move cursor back one char
    ^D				Delete char at the cursor
    ^E				Move cursor to end of line
    ^F				Move cursor ahead one char
    ^H,delete			Delete char preceding cursor
    ^N				Move ahead one cmd line in hist list
    ^P				Move back one cmd line in hist list

 
signals
-------
catch			list signals that dbx will catch
catch SIGNAL		add a signal to the catch list
ignore			list signals that dbx will ignore (not catch)
ignore SIGNAL		add a signal to the ignore list
 
History
-------
Ultrix/RISC dbx is (based on) MIPS dbx.
    Both Ultrix/VAX dbx and MIPS dbx are based on BSD dbx.
    MIPS did a lot of work on dbx, enhancing it to work on the kernel.
 
There is no adb for Ultrix/RISC

			Using dbx to debug the kernel
 
dbx -k vmunix.n vmcore.n
 
t			get a back trace (to show where/how the system crashed)
 
routine-name/<n>i	dump out n instructions from given routine
 
&<symbol>/<fmt>		print address and contents of a symbol
 
<address>/<cnt><mode>	print the contents of image at the given address
			valid modes are:
			  d,D	short, long decimal
			  o,O	short, long octal
			  x,X	short, long hex
			  c	a byte as a char
			  s	a null-terminated string
			  f	single precision real
			  g	double precision real
			  i	machine instructions
  example:
    If the system crashes and reports an EPC of 0x8000dead, then dbx can be
    used to determine where in the kernel that PC is located.
 
    0x8000dead/9i	decode 9 instructions (& show line #s) @ 0x8000dead
			Beware: code that is ifdef'ed out will not count
				in dbx's line numbering
 
p gnode[n]              print the gnode struct n in the gnode table
 
p text[n]               print the text struct n in the text table
 
set $pid=n		set process context to given pid
			can then do trace, p *up, p *up.u_procp, etc. on proc
 
p *up			print the u_area (of current proc)
 
p *up.u_procp		print the proc struct of the current pid ("$pid")
 
 
Using dbx on running vmunix
---------------------------
dbx -k /vmunix
 
&<symbol>/<fmt>		print address and contents of symbol
 
a <symbol>=<value>	to change the value of a symbol (must be root)

			Examining the Exception Frame
 
All error traps & interrupts (except cache parity errors) generate
an "exception condition".
 
Exception conditions trap to VECTOR(exception) in locore.s.
The exception routine saves state in the exception frame (on the stack).
 
For interrupts, VECTOR(VEC_int) is called, which saves additional
state on the exception frame, & calls intr() (in trap.c).
intr() calls the specific interrupt handler thru "c0vec_tbl".
 
For traps, the individual trap routine is called thru the "causevec".
These routines (VEC_addrerr, VEC_ibe, VEC_dbe) in turn call
VECTOR(VEC_trap), which saves additional state on the exception frame
and calls trap() (in trap.c).
 
A pointer to the exception frame (ep) is passed as an argument
to the following routines:
 
	trap, intr, tlbmod, tlbmiss, syscall
 
Thus by using dbx to get a trace, you can find the address of the
exception frame (the ep argument).  You can then dump out the exception
frame with a dbx command like:
 
	dbx> 0xffffnnnn/41X
 
 
The offsets within the exception frame are defined as follows (see mips/reg.h):
 
#define	EF_ARGSAVE0	0		/* arg save for c calling seq */
#define	EF_ARGSAVE1	1		/* arg save for c calling seq */
#define	EF_ARGSAVE2	2		/* arg save for c calling seq */
#define	EF_ARGSAVE3	3		/* arg save for c calling seq */
#define	EF_AT		4		/* r1:  assembler temporary */
#define	EF_V0		5		/* r2:  return value 0 */
#define	EF_V1		6		/* r3:  return value 1 */
#define	EF_A0		7		/* r4:  argument 0 */
#define	EF_A1		8		/* r5:  argument 1 */
#define	EF_A2		9		/* r6:  argument 2 */
#define	EF_A3		10		/* r7:  argument 3 */
#define	EF_T0		11		/* r8:  caller saved 0 */
#define	EF_T1		12		/* r9:  caller saved 1 */
#define	EF_T2		13		/* r10: caller saved 2 */
#define	EF_T3		14		/* r11: caller saved 3 */
#define	EF_T4		15		/* r12: caller saved 4 */
#define	EF_T5		16		/* r13: caller saved 5 */
#define	EF_T6		17		/* r14: caller saved 6 */
#define	EF_T7		18		/* r15: caller saved 7 */
#define	EF_S0		19		/* r16: callee saved 0 */
#define	EF_S1		20		/* r17: callee saved 1 */
#define	EF_S2		21		/* r18: callee saved 2 */
#define	EF_S3		22		/* r19: callee saved 3 */
#define	EF_S4		23		/* r20: callee saved 4 */
#define	EF_S5		24		/* r21: callee saved 5 */
#define	EF_S6		25		/* r22: callee saved 6 */
#define	EF_S7		26		/* r23: callee saved 7 */
#define	EF_T8		27		/* r24: code generator 0 */
#define	EF_T9		28		/* r25: code generator 1 */
#define	EF_K0		29		/* r26: kernel temporary 0 */
#define	EF_K1		30		/* r27: kernel temporary 1 */
#define	EF_GP		31		/* r28: global pointer */
#define	EF_SP		32		/* r29: stack pointer */
#define	EF_S8		33		/* r30: callee saved 8 */
#define	EF_RA		34		/* r31: return address */
#define	EF_SR		35		/* status register */
#define	EF_MDLO		36		/* low mult result */
#define	EF_MDHI		37		/* high mult result */
#define	EF_BADVADDR	38		/* bad virtual address */
#define	EF_CAUSE	39		/* cause register */
#define	EF_EPC		40		/* program counter */
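
Once the 41 longwords of a frame have been dumped, labelling them by
hand is tedious.  The sketch below pairs each longword with its slot
name, using the EF_* order above, and computes the address of a single
slot from the frame pointer.  It assumes the dumped values have already
been collected into a list of integers (parsing dbx output is not
attempted here); the function names and the ep value in the usage note
are invented for illustration.

```python
# Slot names in index order, following the EF_* definitions above.
EF_NAMES = (
    "ARGSAVE0 ARGSAVE1 ARGSAVE2 ARGSAVE3 AT V0 V1 A0 A1 A2 A3 "
    "T0 T1 T2 T3 T4 T5 T6 T7 S0 S1 S2 S3 S4 S5 S6 S7 T8 T9 K0 K1 "
    "GP SP S8 RA SR MDLO MDHI BADVADDR CAUSE EPC"
).split()

EF_SIZE = len(EF_NAMES)     # 41 longwords, matching the "/41X" dump count
LONGWORD = 4                # bytes per exception frame slot


def decode_frame(words):
    """Map slot name -> value for a full exception frame dump."""
    if len(words) != EF_SIZE:
        raise ValueError("expected %d longwords, got %d"
                         % (EF_SIZE, len(words)))
    return dict(zip(EF_NAMES, words))


def slot_address(ep, name):
    """Address of one slot, given the exception frame pointer ep."""
    return ep + LONGWORD * EF_NAMES.index(name)
```

With a hypothetical ep of 0xffffdf00, slot_address(ep, "EPC") gives
0xffffdfa0, so "0xffffdfa0/X" in dbx would show just the saved EPC.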

		    Examining any Process in the System
 
    ps -axlk vmunix.n vmcore.n	- Flags (see ps(1))
				  -a All processes (not just your own)
				  -x Even processes w/ no tty
				  -l Long format (more info given)
				  -k Kernel files given
				- get <pid> of process that you want to look at
 
    Back in dbx...
    set $pid=n		set process context to given pid (in dbx)
 
    Can then do t (trace), p *up, p *up.u_procp, etc. on the process
 
    The process' stored registers in the u_area are in "exception frame format"
    and can be obtained as follows:
 
    	px up.u_ar0[n]

			Forcing a Panic (not hung)
 
    dbx -k /vmunix /dev/mem	- run as root to write
 
    a ln_softc=0
    				- will panic on next network interrupt
				  (even works in single user mode)
				  (Don't do this if system is diskless or
				   it won't dump)
 
    a gnodeops=0
				- will also panic the system
 
    Note: don't bash the proc struct, or dbx won't be able to work on the image
    Note: don't bash the console structs or you won't see the panic messages

	    Forcing a Memory Dump on DS3100/2100 when hung
 
    If you set the bootmode to 'r' (restart), then when the restart
    button (on a DS3100) is pressed, the system will do a memory dump,
    and then a reboot, as opposed to halting and clearing memory.
 
    Note that the dump may be silent, so be patient.
 
    To set the bootmode to restart use the console command:
    
	>>> setenv bootmode r
 
 
 
    If the system "hangs" or drops into console mode without doing
    a memory dump, the memory dump routine can be started manually.
 
    If a DS3100 "hangs", you can press the reset button to enter console
    mode.  The default action on the DS3100 is for the reset to re-initialize
    memory.  To prevent this (preserve memory), set the bootmode to debug
    by typing the following command in console mode (prior to debugging a
    "hang" situation):
    
	>>> setenv bootmode d
 
    With bootmode set this way, if the system is "hung" you can press the
    reset button to enter console mode (with memory contents preserved).
    The crash dump code can then be run by typing the "go" command with
    a special address (the kernel start address + 8) that will call the
    memory dump routine.  In Ultrix V3.0/3.1 the kernel start address is
    0x80030000, so the dump routine is started by:
    
	>>> go 0x80030008
 
    If the system was in multi-user mode when the reset button was
    pressed, the dump will occur silently; no messages will be
    printed.  The memory dump will take several minutes, then the
    console prompt will re-appear.  After the dump is completed, you
    can re-initialize the system and reboot as follows:
 
	>>> init
	>>> auto
    
    Note: When bootmode is set to 'd' it is important to type "init" before
	  "boot" or "auto" when the system has been shut down to console mode,
	  or reset to console mode.  Failure to use the init command may
	  cause the system boot to fail.

			Forcing a Memory Dump on DS5400
 
    Set the break enable switch up (the dot in the circle).
 
    Press the break key to get the console prompt.
    
    The crash dump code can then be run by typing the "go" command with
    a special address that will call the memory dump routine:
    
	>>> go 0x80030008

			Forcing a Memory Dump on DS5800
 
    Set the break enable switch up (the dot in the circle).
 
    Press the break key to get the console prompt.
    
    The crash dump code can then be run by typing the "go" command with
    a special address that will call the memory dump routine:
    
	>>> go 0x80030008

		      Debugging "hung" systems
 
		    (Finding the real kernel stack)
 
When you force a dump from a "hung" system, the standard back trace done
by dbx will not be useful for the currently active process.
Dbx will get the process context out of the u_area, which is old.
That is, the u_area will have the process context for the last time
that the process was context switched out.
 
The kernel stack for each process in the system is located at virtual
address 0xffff,e000 in KSEG2 space.  The system has an array of NPROC
u_areas that are 8k bytes each.  Even though each user process has its
u_area and kernel stack at the same virtual address in KSEG2 address
space, each uarea/kstack maps to a unique physical address.
 
On context switches the first 2 entries in the TLB ("safe entries")
are set up to map the u_area and kernel stack for that user process.
 
 
Kernel Stack: 0xffff,e000	+-------+ higher addresses
				|   .	|
				|   .	|
				|   .	|	8 K bytes for kernel stack
				|   v	|	    and u area
				|	|	In Kseg2 space (see param.h)
				|_______|
				|   ^	|
				|   |	|
				|   |	|
U area: 0xffff,c000		+-------+ lower addresses
 
Within dbx, you can dump out the kernel stack with a command such as:
 
	0xffffd000/1028X
 
This will dump the kernel stack from low to high memory (most recent events
to oldest events).
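
The count in such a dump command is in longwords (X is long hex, 4 bytes
each).  A small sketch of the arithmetic, assuming you want to dump from
some start address up to the top of the kernel stack at 0xffffe000; the
helper names are invented for this example.

```python
KSTACK_TOP = 0xFFFFE000     # kernel stack starts here and grows downward
UAREA_BASE = 0xFFFFC000     # u area starts here and grows upward
LONGWORD = 4                # bytes per 'X' item


def longwords_to_top(start):
    """Number of longwords from start up to the top of the kernel stack."""
    if not UAREA_BASE <= start < KSTACK_TOP:
        raise ValueError("start is outside the u area / kernel stack")
    if start % LONGWORD:
        raise ValueError("start is not longword aligned")
    return (KSTACK_TOP - start) // LONGWORD


def dbx_dump_command(start):
    """Build the dbx 'address/countX' command for that range."""
    return "%#x/%dX" % (start, longwords_to_top(start))


if __name__ == "__main__":
    print(dbx_dump_command(0xFFFFD000))   # prints 0xffffd000/1024X
```

By this arithmetic the upper 4 KB is 1024 longwords, so a count of 1028
from 0xffffd000 reads 16 bytes past 0xffffe000.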

			Examining stack frames with dbx
 
odump(1)
    The utility odump(1) can be used to get a symbol table dump of vmunix.n
 
	odump -P vmunix.n > vmunix.syms
    
    See /usr/include/sym.h (struct runtime_pdr) for the format of the
    runtime procedure descriptor created by the loader.
 
    The "fpoff" field as shown by odump is the frame size for the particular
    procedure entry.
 
The general format of the stack (stack frames) is:
 
	high memory	+---------------+
			|    arg n	|	Space for all args, even though
			|      .	|	first 4 args passed in regs
			|      .	|
			|      .	|
	virtual frame	|    arg 1	|
	ptr ->	       /|---------------|\
		frame <	|  local vars	| \
		offset \|---------------|  \
			|saved R31 (ret)|   \
			|...............|    \
			|more saved regs|     > framesize
			|   16-23, 30	|    /
			|---------------|   /
			|  arg passing	|  /
			|     area 	| /
	stack ptr ->	|---------------|/
	(framereg)	|	.	|
			|	.	|
			|	.	|
			|	 	|
	low memory	+---------------+
 
Using this information, you should be able to work your way back up
the call history on the stack.
 
Examples of usage are in libexc: unwind.c, exception.c, exception.h
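
A minimal sketch of walking back up the stack using the layout above.
It assumes the stack pointer is the frame register, that no routine
allocates stack space dynamically, and that you have already looked up
each routine's frame size (the "fpoff" value from odump).  Under those
assumptions a routine's virtual frame pointer is its stack pointer plus
its frame size, which is also the caller's stack pointer at the time of
the call.  The starting address and frame sizes in the usage note are
invented for illustration.

```python
def walk_frames(sp, frame_sizes):
    """Return a list of (stack ptr, virtual frame ptr), innermost first.

    sp          -- stack pointer of the innermost routine
    frame_sizes -- frame size of each routine, innermost first
    """
    frames = []
    for size in frame_sizes:
        vfp = sp + size           # top of this routine's frame
        frames.append((sp, vfp))
        sp = vfp                  # caller's stack pointer at the call
    return frames


if __name__ == "__main__":
    # Hypothetical values: two nested routines with 0x28 and 0x40 byte frames.
    for sp, vfp in walk_frames(0xFFFFDF00, [0x28, 0x40]):
        print("sp=%#x  virtual frame ptr=%#x" % (sp, vfp))
```

The saved return address sits inside each frame below the local
variables, as in the diagram; its exact offset depends on the routine,
so the sketch leaves that lookup to the reader.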

		    More on Examining stack frames with dbx
 
You may find it equally productive to start at the top (high memory) end
of the kernel stack and look for the return address into VEC_syscall.
VEC_syscall calls "syscall", so the stack frame for "syscall" has the
return address into VEC_syscall saved on the stack.  The following dbx
command shows the instructions in VEC_syscall, and in particular where
"syscall" is called, so you can see the return address that will be on
the stack.
 
    (dbx) VEC_syscall/30i
      [VEC_syscall, 0x800c3868]     	    ori     r5,r16,0x1
      [VEC_syscall:590, 0x800c386c]         mtc0    r5,sr
      [VEC_syscall:591, 0x800c3870]         sw      r2,20(sp)
      [VEC_syscall:592, 0x800c3874]         sw      r3,24(sp)
      [VEC_syscall:593, 0x800c3878]         move    r5,r2
      [VEC_syscall:594, 0x800c387c]         move    r6,r16
      [VEC_syscall:595, 0x800c3880]         jal     syscall
      [VEC_syscall:595, 0x800c3884]         nop
**>   [VEC_syscall:596, 0x800c3888]         bne     r2,r0,0x800c3810
      [VEC_syscall:596, 0x800c388c]         nop
 
The return address will be: 0x800c3888
 
Using dbx in this way and the dump of the kernel stack, you can pick and
guess your way down the stack to find where the system went.
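
The "pick and guess" step can be partly mechanised.  The sketch below
scans a list of stack longwords for a known return address, and also
flags every word that looks like a kernel text address.  It assumes the
stack dump has already been collected into a list of integers.  The text
range used (0x80030000 up to roughly 0x80110000) comes from the sample
nm output earlier and will differ per kernel; the function names and the
stack contents in the test data are invented for illustration.

```python
LONGWORD = 4


def find_word(stack_words, base, target):
    """Stack addresses whose longword equals target.

    stack_words -- longwords dumped from low to high memory
    base        -- address of the first longword in the list
    """
    return [base + LONGWORD * i
            for i, word in enumerate(stack_words) if word == target]


def text_candidates(stack_words, base, text_lo, text_hi):
    """(stack address, value) pairs whose value could be a return address.

    A candidate is a word-aligned value in the range [text_lo, text_hi).
    """
    return [(base + LONGWORD * i, word)
            for i, word in enumerate(stack_words)
            if text_lo <= word < text_hi and word % LONGWORD == 0]


if __name__ == "__main__":
    words = [0x00000000, 0x800C3888, 0x7FFF0000, 0x80031234]  # made up
    for addr in find_word(words, 0xFFFFDFF0, 0x800C3888):
        print("return into VEC_syscall saved at %#x" % addr)
```

Not every candidate is a real return address (stale values and data
that happens to fall in the text range will also match), so each hit
still needs checking against the disassembly.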

			Disassemble Utility
 
    dis -p routine image-file	- will disassemble a routine in the image file
				  see dis(1)

			Some Useful Console Commands
    
    Dump
	dump -w -x ADDR#CNT		- dump the contents of memory, starting
					  at given ADDR & dumping CNT locations
					  (long words in hex format)
 
	dump -w -x ADDR:ADDR		- dump the contents of memory, starting
					  and ending at given ADDRs
					  (long words in hex format)
 
	dump -w -x 0x8001f400:0x8001f800	- dump the startup stack
 
    Examine
	e [-(b|h|w)] ADDR		- examine byte, halfword, word;
					  ADDR is a virt addr; to examine
					  physical loc 0 use 0x80000000
	
    Go
	go [pc]				- transfer control to given entry point
    
    Help
	help [cmd]
	? [cmd]				- if no cmd given, display cmd menu
 
    Printenv
        printenv [evar]			- display current value of specified
					  environment variable
    Setenv
	setenv EVAR STRING		- set the specified environment
					  variable to the given string
    Unsetenv
	unsetenv EVAR 			- remove the environment variable
					  from the environment variable table
    Test
	t a				- test all components and subsystems
    
    Booting
	auto				- use environment variable "bootpath"
					  to boot to multiuser: DS3100/2100 only
 
	boot				- use environment variable "bootpath"
					  boots to singleuser on DS3100/2100
					  boots to multiuser on other systems
 
	boot -s				- boot to single user (this cmd option
					  not on DS3100/2100)
 
	boot -f rz(CTRL,UNIT,PART)vmunix- boot the specified image to singleuser
	boot -f mop()			- boot from the network to singleuser
 
	boot ... memlimit=<#bytes of mem> - to artificially reduce memory size
 

		    References For Further Info
 
    Header files:
	/sys/h/
	    proc.h
	    user.h
 
	/sys/machine/mips
	    entrypt.h
	    frame.h
	    pcb.h
	    pte.h
	    reg.h
 
    Crash(1M)
	Sys V "crash" program that is only partially converted to
	understand the ULTRIX kernel data structures.
    
    

------- End of Forwarded Message

