
Received: from MIT.MIT.EDU by e40-po.MIT.EDU (5.61/4.7) id AA01194; Fri, 3 Dec 93 14:21:09 EST
Received: from OOBLECK.MIT.EDU by MIT.EDU with SMTP
	id AA11207; Fri, 3 Dec 93 14:21:00 EST
Received: by oobleck.mit.edu (5.65/DEC-Ultrix/4.3)
	id AA16092; Fri, 3 Dec 1993 14:20:59 -0500
Date: Fri, 3 Dec 1993 14:20:59 -0500
From: jh@oobleck.mit.edu (Joe Harrington)
Message-Id: <9312031920.AA16092@oobleck.mit.edu>
To: epeisach@MIT.EDU
In-Reply-To: epeisach@MIT.EDU's message of Fri, 3 Dec 93 13:56:22 -0500 <9312031856.AA02901@paris>
Subject: Make DECstations (and alpha's) core dump on unaligned access
Cc: jh@MIT.EDU
Reply-To: jh@MIT.EDU

I got the following from DEC explaining what to do and what the errors
mean.  Feel free to forward it to watchmakers if you feel they'd care.

--jh--

From haley@iguana.alf.dec.com Fri Nov 20 13:24:31 1992
Date: Fri, 20 Nov 92 13:18:03 -0500
From: David Haley <haley@iguana.alf.dec.com>
To: jh@MIT.EDU
Subject: unaligned access errors


unaligned and fixed up access errors

The matrix below attempts to illustrate the various access violation
error messages and signals that a user process can receive under the
ULTRIX MIPS operating system.

There are three possible error messages that a user can receive when
a read from or write to an address is attempted.   These messages are:

    1)  Fixed up unaligned data access for pid ### (image) at pc 0x########
    2)  pid ### (image) was killed on an unaligned access, at pc 0x########
    3)  pid ### (image) was killed on a kernel access, at pc 0x########


            kernel address    valid user address     invalid user address
          +-----------------+--------------------+--------------------+
  aligned |A.   SIGBUS      |B.      ok          |C.    SIGSEGV       |
          | "kernel access" |                    |    no message      |
          +-----------------+--------------------+--------------------+
unaligned |D.   SIGBUS      |E.                  |F.    SIGBUS        |
          | "kernel access" |     "Fixed up"     | "unaligned access" |
          +-----------------+--------------------+--------------------+


DEFINITIONS:
    A "valid user address" implies a virtual address in the "kuseg" region
    which has been mapped by the process.  The "kuseg" region includes
    virtual addresses between 0x00000000 and 0x7ffffffc.
        note: 0x7ffffffc = 0x80000000 - sizeof(int) = 0x80000000 - 4

    An "invalid user address" implies a virtual address in the "kuseg" region
    that has NOT been mapped by the process.

    A "kernel address" implies virtual addresses 0x7ffffffd to 0xffffffff.

    The term "aligned" implies an address that has an appropriate boundary
    for the data type associated with the address.  See the diagram below
    for the appropriate boundaries per data type.

            +--------+---------+
            |boundary|data type|
            +--------+---------+
            |   4    | int     |     note: boundary info taken from
            |   4    | long    |           "Table B-1, Appendix B,
            |   2    | short   |           Languages and Programming,
            |   1    | char    |           Volume I, Software Development"
            |   4    | float   |
            |   8    | double  |
            |   4    | pointer |
            +--------+---------+


MATRIX INTERPRETATION:
type A:
    If a read from or write to an aligned kernel address is attempted:
        a) the kernel will write to the user's screen the message:
               pid ### (image) was killed on a kernel access, at pc 0x########
        b) the kernel will send a SIGBUS (signal #10) to the process
        c) if this process can write to the current directory, a core is dumped

type B:
    If a read from or write to an aligned valid user address is attempted,
    there is no problem and the read/write will take place.


type C:
    If a read from or write to an aligned, but invalid, user address is
    attempted:
        a) the kernel will send a SIGSEGV (signal #11) to the process
        b) if this process can write to the current directory, a core is dumped


type D:
    If a read from or write to an unaligned kernel address is attempted:
        a) the kernel will write to the user's screen the message:
              pid ### (image) was killed on a kernel access, at pc 0x########
        b) the kernel will send a SIGBUS (signal #10) to the process
        c) if this process can write to the current directory, a core is dumped


type E:
    If a read from or write to an unaligned, but valid, user address is
    attempted:
        a) the kernel adjusts the address as appropriate and performs the
           read or write
        b) the kernel will write to the user's screen AND to the system
           error log the message:
             Fixed up unaligned data access for pid ### (image) at pc 0x########
           note: this message can be disabled via use of the uac(1) command
        c) the process continues to the next instruction

    NOTE:  A process which receives this error message should be debugged
           to correct the offending instruction(s) because the fixup routine
           in the kernel (fixade()) is expensive and the performance of the
           application will be significantly reduced.


type F:
    If a read from or write to an unaligned and invalid user address is
    attempted:
        a) the kernel will write to the user's screen the message:
             pid ### (image) was killed on an unaligned access, at pc 0x########
        b) the kernel will send a SIGBUS (signal #10) to the process
        c) if this process can write to the current directory, a core is dumped


DEBUGGING THE PROBLEM:
    To determine where in your source code this problem is originating:
        1) record the pc address from the error message
        2) run dbx(1) on your executable image:
               $ dbx image
        3) at the dbx(1) prompt, enter the recorded pc address followed
           by the instruction mode code to determine the line number of
           the offending instruction.  For example, if your pc address was
           0x40019c, you would enter the command:
               (dbx) 0x40019c/i
                 [main:13, 0x40019c]   sw      r15,1(r0)
           In this case, dbx(1) points to line #13 in main().


EXAMPLES:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    $ cat -n typeE.c
     1 main ()
     2     {
     3    char *ptr;
     4    int   i = 10;
     5
     6    /*
     7    ** malloc returns a word-aligned chunk of storage
     8    */
     9    ptr = (char *)malloc(4);
    10
    11    /*
    12    ** works fine since address is aligned
    13    */
    14    *(int *)ptr = i;
    15
    16    ptr++;
    17
    18    /*
    19    ** breaks - address is unaligned
    20    */
    21    *(int *)ptr = i;
    22    }

    $ make typeE
    cc -O  typeE.c -o typeE

    $ typeE
    Fixed up unaligned data access for pid 4798 (typeE) at pc 0x4001ac

    $ dbx typeE
    dbx version 2.10.1
    Type 'help' for help.
    reading symbolic information ...
    main:   9  ptr = (char *)malloc(4);
    (dbx) 0x4001ac/5i                           # use the pc address from above
      [main:22, 0x4001ac]   lw      r31,20(sp)  # indicates line #22 is culprit
      [main:22, 0x4001b0]   addiu   sp,sp,24
      [main:22, 0x4001b4]   jr      r31
      [main:22, 0x4001b8]   move    r2,r0
    (dbx) quit


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: This article is based on an analysis of the source file
          "../sys/machine/mips/trap.c"
