MH Format Strings


NOTE: for users of the online version of this book: This chapter has a lot of examples followed by long explanations. To avoid jumping between the example and its explanation, it's a good idea to open a new browser window to show an example. (Check your browser's menu for a command like New Web Browser or Open in New Window.) Then, use the original browser to read the explanation while you view the example in the second browser window.

The MH 6.8.3 mh-format(5) manual page says: "Format strings are designed to be efficiently parsed by MH which means they are not necessarily simple to write and understand. This means that... users of MH should not have to deal with them."

The MH 6.6 page said just the opposite: format strings "...represent an integral part of MH. This means that... users of MH should deal with them."

I tend to agree with the MH 6.6 wording. Unless you're doing something very complex, MH format strings really aren't that tough to figure out. And they're very useful.

You can use format strings to build message headers, or entire messages, from other messages. That's how the replcomps file works, by the way. The scan command can also use format strings to customize its output. And format strings are great for programming in Perl, the shells, C, etc. -- you can use them to parse message headers, a real time saver. For more information, see the Chapter Introduction to UNIX Programming with MH.

Until recently, the mh-format(5) manual page was fairly brief; it didn't document all of mh-format. The most recent version of the manual page, released with MH 6.7, has quite a bit of information. If your online version isn't up to date, the Section Online Manual Pages explains how to get a newer one.

One term you'll need to know is escape. An escape is a lot like a variable in programming or mathematics: it stands for (and is usually replaced with) something else. There are three kinds of escapes in MH format strings.

The easiest escapes to define are component escapes. These are replaced with the fields' values from your message header. (Remember, MH calls a header field a "component.") Here's an example. To get the subject of a message into your MH format string, you use the subject component escape. Write it this way: %{subject}

There are two other kinds of escapes: function and control. You'll see examples of those below, and the mh-format(5) manual page defines them.

In fact, this is a good time to spend a few minutes with your online manual page. You don't need to read it word for word, but you should see what sections are there and what topics they cover.

The following sections will take you through MH format strings by example, like the mhl sections did. An easy way to get started with MH format strings is the scan command.

scan Format Strings

A scan format string is an mh-format string. It tells scan how to format the output for each message it scans.

It's time for a few examples. I have a folder with two messages in it. In the Example below, I'll use show to display the header of the first message for reference. Then I'll scan both messages with the normal scan command. Because there's no -form or -format string, scan uses its default format.

Example: Sample folder with two messages

% show 1
(Message scantest:1)
Forwarded: Fri, 13 Jan 1995 03:41:35 -0500
Forwarded: alicia
Replied: Mon, 09 Jan 1995 10:25:45 -0500
Replied: Joe Doe <>
Date: Thu 14-Dec-89 17:31:21 est 
Received: by (5.54/PHL)
        id AA29237; Thu, 14 Dec 89 17:31:21 EST
Message-Id: <>
From:  Al Bok <>
Reply-to: Joe Doe <>
Subject:  Query about "repl -query"

I have a question about repl -query...
% scan
   1+-12/14 Al Bok             Query about "repl -query"<<I have
   2  01/09 To:Joe Doe         Re: Query about "repl -query"<<Jo
Now let's give scan a format string. Either you can put format strings in a format file and use scan's -form switch or you can type them on the command line with the -format switch. I'll start with -format.

A simple format string that prints a hash mark followed by the message number and a colon, then the subject, works like this:

% scan -format "#%(msg): %{subject}"
#1: Query about "repl -query"
#2: Re: Query about "repl -query"
This is a good place to compare component escapes with function escapes. A component escape gets the contents of a component. A function escape performs some sort of calculation, operation, or other function. For example, the component escape {to} gets the contents of the To: field from a message. The (size) function escape counts the number of characters in a message.

If you don't use the percent sign (%) characters, MH won't treat what comes next as an escape. Look what happens without the % characters:

% scan -format "#(msg): {subject}"
#(msg): {subject}
#(msg): {subject}
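As a rough illustration of why the percent sign matters, here is a toy expander written in Python. It is only a sketch and not how MH parses format strings: the expand function and its regular expression are made up for this example, and it knows just the %(msg) function escape and %{name} component escapes. Anything without a leading % passes through as literal text.

```python
import re

def expand(fmt, msgnum, headers):
    """Toy expansion of %(msg) and %{component}; everything else is literal."""
    def component(match):
        # In this sketch a missing header field expands to an empty string.
        return headers.get(match.group(1).lower(), "")

    # Function escape: only %(msg) is recognized here.
    out = fmt.replace("%(msg)", str(msgnum))
    # Component escapes: %{name} is replaced by that header field's value.
    return re.sub(r"%\{([^}]+)\}", component, out)

headers = {"subject": 'Query about "repl -query"'}
print(expand("#%(msg): %{subject}", 1, headers))  # escapes are expanded
print(expand("#(msg): {subject}", 1, headers))    # no %, so nothing is expanded
```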
You've already seen examples of two of the three types of escapes: component and function escapes. The third type, a control escape, does an if-else_if-else-endif operation. The parts are:
%< = if      %? = else_if      %| = else      %> = endif

NOTE: MH 6.7.2 added the else-if operator, %?, to that list. To keep things simple at the start, I won't cover %? until the Section The Default scan Format File.

Let's add a control escape to this example. It will test to see who each message is from. If a message was sent by me, this control escape will display the words FROM ME. Otherwise, it'll display the sender's address by printing the %{from} component escape. The control escape looks like this:

%<(mymbox{from})FROM ME%|%{from}%>
That's not as hard as it might look -- we'll dissect it in a minute. Let's try it first, then explain.
% scan -format "#%(msg): %<(mymbox{from})FROM ME%|%{from}%> %{subject}"
#1: Al Bok <> Query about "repl -query"
#2: FROM ME Re: Query about "repl -query"
The first message is from someone else, so scan prints his address. The second message is from me, so FROM ME is printed instead.

Let's dig into that control escape. Here's a diagram of the if-then-else parts:

%<   (mymbox{from})   FROM ME   %|     %{from}   %>
if                    then      else
     this is true     do this          do this
Actually, that's a nested set of all three kinds of escapes -- control, function, and component.

The %< is the start of the control escape. It tests the return value of (in other words, the "answer" from) (mymbox{from}). The (mymbox) function escape tests whether an address belongs to the person who's running the MH command. The {from} component escape is the address to test.

Look back at the result of running that command. When the first message was scanned, it was not from me, the test failed, and the From: address was printed. When the second message was scanned, it was from me, the test was true and FROM ME was printed.
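If it helps to see the same decision in a general-purpose language, here is a minimal Python sketch of that control escape. The from_column function, the ME set, and the example addresses are all hypothetical. The (mymbox) test is modelled as a plain case-insensitive comparison, which is much cruder than MH's real address matching.

```python
def from_column(from_addr, my_addresses):
    """Mimic %<(mymbox{from})FROM ME%|%{from}%> for one message."""
    # (mymbox{from}): is the From: address one of mine?  (simplified test)
    if from_addr.lower() in my_addresses:   # %<  if
        return "FROM ME"                    #     then
    return from_addr                        # %|  else ... %>

ME = {"me@example.com"}                     # hypothetical address of the MH user

print(from_column("al@example.org", ME))    # someone else: address is printed
print(from_column("Me@Example.com", ME))    # me: FROM ME is printed
```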

An escape returns one of two kinds of values, either numeric (integer) or string. The return values of escapes are put into registers (holding places) named num and str, respectively.

For simple format files, you don't need to know about registers. That's because the return value of an escape is always printed, unless the escape is nested in another escape. The outermost escape should always start with a percent sign (%); inner (nested) escapes shouldn't.

For instance, in the previous format string, the %(msg) and %{subject} escapes are not nested in others -- so their values are just printed. But the nested set of escapes (mymbox{from}) is itself nested in a control escape. There the return value of {from} is passed to (mymbox), and the return value of (mymbox) is passed to the control escape. What's printed is the value of the control escape (which starts with a percent sign (%); that's a clue that it'll be printed).

It's a good idea to test yourself as you look at the other mh-format strings in this section. Experiment to be sure how they work, what will be printed, and so on. The mh-format(5) manual page has more precise information.

NOTE: Most address-parsing function escapes won't work if your MH is configured with [BERK]. scan -help lists your configuration.

The Table below summarizes the four kinds of escapes.

Table: MH Format Escapes

If you're still not exactly sure how this works, this is a good time to practice. To help you get started if you haven't done much programming before, you might want to lure a computer guru from down the hall somewhere. (Hint: all computer gurus like pizza.)

scan Format Files

Because the format strings in the example are getting pretty long to type, I'll start using format files in the examples.

A format file has the same syntax as the format strings we used above, but you type the format string into the file without quotes around it. (Use a text editor like vi or emacs.) You give the filename to scan with its -form switch -- if the file is in your MH directory, you don't need to type a pathname.

For example, here's what the above format string would look like in a format file named scan.from in your MH directory. I've left in the backslash at the end of the short first line, so you can see how to continue lines if you need to. (You can also get this little file from the book's online archive. See download/split/mh/Mail/scan.from.)

% cat scan.from
#%(msg): \
%<(mymbox{from})FROM ME%|%{from}%> %{subject}
% scan -form scan.from
#1: Al Bok <> Query about "repl -query"
#2: FROM ME Re: Query about "repl -query"
Another note about these example format files: if you don't want to type them in yourself, you can get them electronically. For instructions, see the Section Obtaining Example Files From This Book.

The scan.answer Format File

Let's turn the simple scan.from format file into one that's more useful. Here's the output and the format file. (You can also get this file from the book's online archive. It's in download/split/mh/Mail/scan.answer.)
% scan -form scan.answer
   1R Al Bok < Query about "repl -query"
   2  ****** FROM ME ***** Re: Query about "repl -query"
% cat scan.answer
%4(msg)%<{replied}R%| %> \
%<(mymbox{from})****** FROM ME *****%|\
%<{reply-to}%20{reply-to}%|%20{from}%>%> \
Okay; let's take this step by step again:
  1. The %4(msg) prints the message number in a field that's four characters wide.
  2. %<{replied}R%| %> tests the value of the {replied} component escape. If there is a Replied: field in this message header, the test is true and the R is printed. If the message header doesn't have a Replied: field, the test will fail and a space is printed instead (to keep the columns neat).
  3. %<(mymbox{from})****** FROM ME ***** starts the same test as in the scan.from format file: if the message is from me, it prints FROM ME. There are enough asterisks to make the output exactly 20 characters wide.
  4. %|%<{reply-to}%20{reply-to}%|%20{from}%> is the else part of the previous if (the (mymbox{from}) escape). This else is actually made up of another complete if-then-else: if the header has a Reply-to: field, 20 characters of it are printed; otherwise, 20 characters of the From: field are printed. If you use MH 6.7.2 or later, that test can be shortened by using the %? else-if operator. The Section The Default scan Format File introduces %?.
  5. And finally, the rest of the file:

The Default scan Format File

When you use scan without a format file or format string, you get the default format. Here's an example of the default format:
 436+-06/28 Al Bok             <<I have a very complicated ques
 441  06/29 Jerry Peek         That complicated message Al sent
 443  06/30*To:ehuser,emmab    More about lunch<<The meeting is

NOTE: In earlier versions of MH, message 441 showed a problem in the default format. It would scan this way:

441  06/29 To:                That complicated message Al sent
If a message was from you and its header didn't have a To: field, scan would show To: followed by an empty field. That happened when a particular message was a reply sent with repl -query, where the reply wasn't sent to the person who wrote the original message.

The default scan format is not read from a file each time scan runs. It's built into the scan command. The compiled-in definition is in the file h/scansbr.h in the MH source tree. There are two versions. If your MH is configured without the [UK] option (see the Section The -help Switches to find out), look at the first Example below (or the book's online archive, download/split/mh/Mail/scan.default). I've added line numbers (like 8>) for reference; those aren't part of the file. In the [UK] configuration, the day of the month is printed before the month. That file is shown in the second Example below; it's also in the book's online archive at download/split/mh/Mail/

nmh provides a copy of its default format file in the file scan.default. But, like MH, nmh doesn't use that actual file; it uses an internal version. There's an important difference in the nmh default format, though: it decodes MIME characters in the message header. For details, see the end of this section.

Example: Default scan format file

1> %; NOTE: This file is supplied for reference only; it shows the default
2> %;  format string (for non-UK sites) which was compiled into "scan".
3> %;  See the source file "h/scansbr.h" for details.
4> %4(msg)%<(cur)+%| %>%<{replied}-%?{encrypted}E%| %>\
5> %02(mon{date})/%02(mday{date})%<{date} %|*%>\
6> %<(mymbox{from})%<{to}To:%14(friendly{to})%>%>%<(zero)%17(friendly{from})%>  \
7> %{subject}%<{body}<<%{body}>>%>

Example: Default UK scan format file

1> %4(msg)%<(cur)+%| %>%<{replied}-%?{encrypted}E%| %>\
2> %02(mday{date})/%02(mon{date})%<{date} %|*%>\
3> %<(mymbox{from})%<{to}To:%14(friendly{to})%>%>%<(zero)%17(friendly{from})%>  \
4> %{subject}%<{body}<<%{body}>>%>
The non-UK scan format (in the Example Default scan format file) is also available, in MH 6.8 and above, as the file scan.default in the MH library directory. I made the UK version by swapping the day and month entries from scan.default.

Let's take a walk through the non-UK Example, Default scan format file. As we work through this example and the ones after it, keep the mh-format(5) manual page close by and refer to it as we go. To help with the explanation, here are two scan output lines with each character (column position) numbered:

 436+-06/28 Al Bok             <<I have a very complicated ques
 443  06/30*To:ehuser,emmab    More about lunch<<The meeting is
0        1         2         3         4         5         6         7
  1. Lines 1-3 are comments. They start with the comment escape %;, which was added in MH 6.8.

    Before MH 6.8, an ugly way to add a comment looked like this:

    %<{-comment-}This is a comment%>
    If the message header doesn't have a field named -comment-: (with a dash at the start and end of its name), the comment This is a comment won't be printed. I don't recommend making comments that way.
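  2. Line 4 prints the first six characters of each line, columns 1-6: the message number in a field four characters wide; a plus sign (+) if this is the current message, or a space if it isn't; and then a dash (-) if the header has a Replied: field, an E if it has an Encrypted: field, or a space if it has neither.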
  3. Line 5 prints the next six characters, columns 7-12: the date and possibly an asterisk (*). The Date: header field is parsed twice, by the (mon) and (mday) functions, to get the numeric month and day. Those two numbers are printed with the format specifier %02 and a slash (/) between. The leading 0 in %02 means that if the number has fewer digits than the field width (in this case, if it has just one digit), it's printed with a leading zero. So, days like March 5 would be printed as 03/05.

    If the message doesn't have a Date: header field, {date} gives the date that the message file itself was last modified. In that case, column 12 will have an asterisk (*) instead of a blank. This is handy for scanning draft folders, where messages usually don't have Date: fields. The note after Table MH-format Special Component and Function Escapes has more information.

  4. Line 6 is fairly complex in MH 6.8 and above. (Earlier versions were simpler, but they had the bug explained in the NOTE earlier in this section.) Line 6 prints 19 characters: up to 17 characters of text with two spaces at the end. That's columns 13-31.

    Line 6 starts with a nested control escape. It ends with a control escape that tests a register set by the first control escape. Let's take it in steps.

    Finally, the two spaces before the backslash (\) at the end of line 6 print two blank columns of output: characters 30 and 31.

    That wasn't so bad, was it? :-) Line 6 shows a good example of the num register: holding the result of a test to be used later.

  5. Line 7 starts by printing the Subject: field. If there's any room left, the first part of the message body is printed. (The Section scan Widths explains how scan decides whether there's room left.) {body} is a special component escape set by scan. It holds the first part of the message body, "compressed": newlines and multiple space characters are replaced by a single space.

    If the message doesn't have a body, {body} evaluates false and nothing is printed after the subject. If the message has a body, << is printed, followed by as much of the body as will fit.
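To make the date columns from line 5 concrete, here is a small Python sketch of the same arithmetic. The date_column function is a made-up name for illustration. It assumes the month and day have already been parsed from the header, or taken from the file's modification time when there's no Date: field.

```python
def date_column(month, day, has_date_field):
    """Columns 7-12 of the default scan line: MM/DD plus a one-character marker."""
    # %02(mon{date})/%02(mday{date}): each number is padded to two digits
    # with a leading zero.
    stamp = f"{month:02d}/{day:02d}"
    # %<{date} %|*%>: a space if the header has a Date: field, else an asterisk.
    return stamp + (" " if has_date_field else "*")

print(repr(date_column(3, 5, True)))     # March 5, message has a Date: field
print(repr(date_column(6, 30, False)))   # June 30, taken from the file's timestamp
```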

Recent versions of nmh have some changes to the default scan format. Here are the last three lines of that file with the changes boldfaced:

%<(zero)%17(decode(friendly{from}))%>  \
The To:, From: and Subject: header fields use the decode function escape. This decodes any RFC 2047 encoding in those fields. For example, this changes a Subject: field encoded as Un =?iso-8859-1?Q?d=EDa_dif=EDcil?= into Un día difícil.
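To see what RFC 2047 decoding does to that Subject: field, you can try it outside of nmh. This Python sketch uses the standard library's email.header.decode_header function. It shows the decoding itself, not how nmh implements its decode function escape, and decode_field is a name made up for this example.

```python
from email.header import decode_header

def decode_field(value):
    """Decode RFC 2047 encoded-words in a header field value."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-words carry their own charset; plain text is ASCII.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # A field with no encoded-words comes back as an ordinary string.
            parts.append(chunk)
    return "".join(parts)

print(decode_field("Un =?iso-8859-1?Q?d=EDa_dif=EDcil?="))
```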

scan only decodes these fields if your terminal can natively display the character set used in the encoding. You should set the MM_CHARSET environment variable to your native character set if it is not US-ASCII.

More Header Information: scan.hdr

Next, let's try a small change to the default (non-UK) format file. This new format file, scan.hdr, shows more information about the message header. The file, or one that you adapt from it, might be useful for you. Its output looks like this:
 435+          05/20 root               <<The job you submitted to a
 436 C       R 06/28 Al Bok             <<I have a very complicated
 441 C         06/29 Jerry Peek         That complicated message Al
 443  DF       06/30*To:ehuser,emmab    More about lunch<<The meetin
This new version has five "field letters" between the message number and date:
C  The message header has a cc: field in it. This is useful for figuring out messages like number 441, which doesn't have a To: address (but, as you can tell, does have a cc: field).
D  The message has been distributed (either distributed from someone else to you or sent by you with the dist -annotate command). So message 443 has at least one Resent-To:, Resent-cc:, or Resent: field.
F  The message has been forwarded to someone. Message 443 has been forwarded with forw -annotate, and it has a Forwarded: field.
M  The message has a MIME MIME-Version: field.
R  The message has been replied to with repl -annotate. (The default format file uses a dash (-), instead of an R, for this.)
The next Example shows the format file. (You can also get this file from the book's online archive in download/split/mh/Mail/scan.hdr.)

Example: scan.hdr format file

1> %4(msg)%<(cur)+%| %>%<{cc}C%| %>\
2> %<{resent-to}D%?{resent-cc}D%?{resent}D%| %>\
3> %<{forwarded}F%| %>%<{mime-version}M%| %>%<{replied}R%| %>\
4>  %02(mon{date})/%02(mday{date})\
5> %<{date} %|*%>\
6> %<(mymbox{from})%<{to}To:%14(friendly{to})%>%>%<(zero)%17(friendly{from})%>  \
7> %{subject}%<{body}<<%{body}>>%>
The differences between scan.default and scan.hdr are in the first four lines of the Example above. Compare those to lines 4-7 of the Example Default scan format file.

Most of the changes are new control escapes to make the field letters. For example:

%<{cc}C%| %>
tests for a cc: header. If there is one, it prints a C; otherwise it prints a space.

The three-part control escape on the second line prints a D or a space. It uses the %? else-if operator. Here is the same line for versions of MH before 6.7.2 which don't have %?:

%<{resent-to}D%|%<{resent-cc}D%|%<{resent}D%| %>%>%>
You might try adding another column for, say, a Sender: field. To test your new format file, use a text editor to add a Sender: field to a couple of mail messages. In MH 6.7 and later, you can also use a command like the following to add a dummy Sender: field. (In MH 6.6 and before, anno doesn't have a -nodate switch.)
% anno -nodate -component Sender -text someone@somewhere
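The two ways of writing the D test in this section (the %? else-if chain and the older nested form) should always give the same answer. Here is a Python sketch of both shapes. The function names are invented for illustration, and a header is modelled as a plain dictionary of field names.

```python
from itertools import product

def d_flag_elseif(h):
    """Shape of %<{resent-to}D%?{resent-cc}D%?{resent}D%| %>"""
    if h.get("resent-to"):
        return "D"
    elif h.get("resent-cc"):
        return "D"
    elif h.get("resent"):
        return "D"
    else:
        return " "

def d_flag_nested(h):
    """Shape of %<{resent-to}D%|%<{resent-cc}D%|%<{resent}D%| %>%>%>"""
    if h.get("resent-to"):
        return "D"
    else:
        if h.get("resent-cc"):
            return "D"
        else:
            if h.get("resent"):
                return "D"
            else:
                return " "

# Try every combination of the three fields being present or absent.
FIELDS = ("resent-to", "resent-cc", "resent")
for present in product([False, True], repeat=3):
    h = {name: "x" for name, on in zip(FIELDS, present) if on}
    assert d_flag_elseif(h) == d_flag_nested(h)
```

The nested version needs one %> for every %< it opens, which is why the old form ends with three of them.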

scan Widths

When scan writes to your screen, it tries to determine the width and fill it (if your format gives it that much text). For instance, the standard format string (stored internally in scan) will fill an 80-column screen to column 79. The same format string will fill a 40-column screen to column 39; the right-hand end will be cut off. For instance, here's the output of the same standard format string at three different screen widths:
  18+ 02/13 To:omderose@mvus   Lunch<<Let's eat now. OK? >>
  18+ 02/13 To:omderose@mvus   Lunch<<Let's
  18+ 02/13 To:omderose@m
As another example, notice that adding the five status letters in the Section More Header Information: scan.hdr didn't make the scan.hdr output any wider than the scan.default output.

As an output line is printed, you can get the amount of space left by using the function escape (charleft). The (width) function escape gives the total output width.

The scan.dateparse Format File

Let's try another example: the format file scan.dateparse. It uses date parsing functions to show the dates of messages. The output changes to fit the width available.

scan.dateparse isn't a format file you'd want to use every day, but it's a good demonstration of some important things:

Let's see what the file does and then dig into a line-by-line explanation. First, here's a normal scan of a folder with four messages. The messages were sent from different systems in different time zones. Message 3 has an illegal Date:.
% scan
   1+-12/14 Al Bok             Query about "repl -query"<<I hav
   2  01/09 To:Joe Doe         Re: Query about "repl -query"<<J
   3  01/00 randy@atlantic.or  Meeting is on!<<Be sure to get y
   4  08/16 randy@atlantic.or  Meeting is on!<<Be sure to get y
The scan.dateparse format file makes about 330 characters of output for each message. The amount depends on the length of the Date: field in the message. Here's an example of scanning the same folder with scan.dateparse:
% scan -form scan.dateparse -width 330

MESSAGE 1: Thu 14-Dec-89 17:31:21 est (STANDARD time)
 Official: Thu, 14 Dec 89 17:31:21 -0500
 "Pretty": Thu, 14 Dec 89 17:31:21 EST
629677881 seconds since UNIX, 160540976 seconds before now
Thu|Thursday |   4|yes |Dec  |December | 12|1989|  17| 31| 21

MESSAGE 2: Mon, 09 Jan 1995 10:25:45 -0500 (STANDARD time)
 Official: Mon, 09 Jan 1995 10:25:45 -0500
 "Pretty": Mon, 09 Jan 1995 10:25:45 EST
789665145 seconds since UNIX, 553712 seconds before now
Mon|Monday   |   1|yes |Jan  |January  |  1|1995|  10| 25| 45

MESSAGE 3: -0400 16 Aug 89 16:54:59 CAN'T PARSE DATE

MESSAGE 4: 16 Aug 89 16:54:59 -0400 (DAYLIGHT time)
 Official: 16 Aug 89 16:54:59 -0400
 "Pretty": 16 Aug 89 16:54:59 EDT
619304099 seconds since UNIX, 170914758 seconds before now
Wed|Wednesday|   3|no  |Aug  |August   |  8|  89|  16| 54| 59
Notice (on the first line of the listings) that each message has a different date format, but scan can parse all of them -- except the one in message 3.
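The "seconds since UNIX" figures can be cross-checked outside of MH. This Python sketch parses the "Official" form of two of the dates with the standard library and converts them to seconds since the epoch. The clock and rclock functions here are stand-ins named after the MH escapes, not MH code.

```python
from email.utils import parsedate_to_datetime

def clock(date_field):
    """Seconds since the UNIX epoch for an RFC 822-style date, like (clock{date})."""
    return int(parsedate_to_datetime(date_field).timestamp())

def rclock(date_field, now):
    """Seconds between the date and 'now', like (rclock{date})."""
    return now - clock(date_field)

print(clock("Thu, 14 Dec 89 17:31:21 -0500"))    # message 1
print(clock("Mon, 09 Jan 1995 10:25:45 -0500"))  # message 2
```

Adding each message's "seconds before now" to its "seconds since UNIX" gives the same moment for messages 1, 2, and 4 in the listing above, which is the time that scan was run.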

Format files you've seen up to now just let scan truncate their output when the width limit is reached. But scan.dateparse checks the available width. It prints the last two lines that show the parsed date only if there is enough room for all of both lines. In this next example, the width isn't quite enough, so the last two lines for each message aren't displayed:

% scan -form scan.dateparse -width 300
	...These lines omitted...
789665145 seconds since UNIX, 553712 seconds before now

MESSAGE 3: -0400 16 Aug 89 16:54:59 CAN'T PARSE DATE

MESSAGE 4: 16 Aug 89 16:54:59 -0400 (DAYLIGHT time)
 Official: 16 Aug 89 16:54:59 -0400
 "Pretty": 16 Aug 89 16:54:59 EDT
619304099 seconds since UNIX, 170914758 seconds before now
If you were going to use a format file like that a lot, you'd probably want to make a new version of scan called something like scandp. When you make the new version, you'd put this entry in your MH profile:
scandp: -form scan.dateparse -width 330
Then, you could just type scandp to use scan.dateparse without having to remember the width.

The next Example shows scan.dateparse. (It's also in the book's online archive at download/split/mh/Mail/scan.dateparse.)

Example: Date parsing demonstration: scan.dateparse

 1> MESSAGE %(msg): %{date} \
 2> %<(nodate{date})CAN'T PARSE DATE%|\
 3> (%<(dst{date})DAYLIGHT%|STANDARD%> time)\n\
 4>  Official: %(tws{date})\n\
 5>  "Pretty": %(pretty{date})\n\
 6> %(clock{date}) seconds since UNIX, %(rclock{date}) seconds before now\
 7> %(void(charleft))%<(gt 125)\n\
 9> %(day{date})|%9(weekday{date})|%4(wday{date})|%4(sday{date})|\
10> %5(month{date})|%9(lmonth{date})|%3(mon{date})|%4(year{date})|\
11> %4(hour{date})|%3(min{date})|%3(sec{date})%>%>\n
If your version of MH uses four-digit years that you need to convert to two digits -- for example, to make an old MH format string work the same way in a newer version of MH -- here's how. Replace the old %(year{date}) with this:
%(void(year{date}))%02(modulo 100)
The mhl.prodsumry format file uses that technique. It starts by writing the year into the num register. Next, the (modulo) function computes the value of num modulo 100 -- in other words, it divides the year by 100 and gives the remainder.
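The modulo step is ordinary remainder arithmetic. As a quick check in Python (two_digit_year is just an illustrative name):

```python
def two_digit_year(year):
    """Same arithmetic as %(void(year{date}))%02(modulo 100)."""
    # year % 100 is the remainder after dividing by 100; :02d pads the
    # result to two digits with a leading zero, as %02 does.
    return f"{year % 100:02d}"

print(two_digit_year(1989))
print(two_digit_year(2005))
```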

To make format files that are portable to both the two- and four-digit versions of MH, try this string that I found in the MH packmbox script:

%(void(year{date}))%<(gt 100)%4(putnum)%|19%02(putnum)%>
If the output of (year) is over 100, the string outputs the four-digit year. Otherwise, it outputs 19 and the two-digit year.
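Here is the same decision as a Python sketch. The function name is invented, and like the format string it assumes that any two-digit year belongs to the 1900s.

```python
def four_digit_year(year):
    """Mimic %(void(year{date}))%<(gt 100)%4(putnum)%|19%02(putnum)%>"""
    if year > 100:                 # (gt 100): already a four-digit year
        return f"{year:4d}"        # %4(putnum)
    return f"19{year:02d}"         # 19 followed by %02(putnum)

print(four_digit_year(1995))
print(four_digit_year(89))
```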

The scan.more Format File

The scan.more format file is a "do-it-all" format file that gives you a lot of information about messages in a short space. The output changes depending on which header fields the message has. For example, here are four messages scanned with scan.more (by the way, if you were going to use this file a lot, you'd probably store the -form and -width switches in your MH profile):
% scan 435-443 -form scan.more -width 230
 435  SENT: 20 May  CHARS: 383
      FROM: root (Super User)
    APP-TO: jdpeek
    <<BODY: The job you submitted to at, "/u3/acs/jdpeek/.l
 436  SENT: Thursday   CHARS: 29387  REPLIED: Friday
      FROM: Al Bok <>
    <<BODY: I have a very complicated question about the ph
 441+ SENT: Friday   CHARS: 499
        CC:, jdpeek
      SUBJ: That complicated message Al Bok sent us
 443 FILED: 16:44  CHARS: 52
        TO: ehuser, emmab
      SUBJ: More about lunch
If you compare the four messages, you'll see how the output changes:
  1. Message 435 was sent more than seven days ago, so its SENT: field shows the date and month that the message was sent. (scan.more uses the same date-formatting as the standard scan.timely format file.) This message has 383 characters. It doesn't have a TO: field, but it does have an Apparently-to: field (shown as APP-TO:). There's no Subject: (SUBJ:), so the first part of the message body is shown.
  2. Message 436 was sent within the last week, so the day name is shown. I replied the next day (Friday) with repl -annotate.
  3. Message 441 is the current message -- the plus sign (+) shows that. I sent it on Friday. There's no TO: field (this can happen when you use the repl -query command and don't send your reply to the person who sent you the original message). Here, scan.more shows the CC: addresses instead. Finally, when a message is one that I sent (like this one), scan.more saves space by not showing a FROM: me line.
  4. Message 443 is a draft that was refiled from the What now? prompt. It doesn't have a Date: field, so scan.more shows the time that the message file itself was last modified.
The scan.more format file is also used with the version of scan called cur. To save lines on the screen when you scan several messages, the format file hangs the message numbers into the left margin instead of putting blank lines between messages.

The next Example shows the format file. (You can also get it from the book's online archive in download/split/mh/Mail/scan.more.)

Example: Lots of information: The scan.more format file

 1> %; $Id: scan.more 1.3 1994/11/26 19:36:21 jerry book3 $
 2> %4(msg)%<(cur)+%| %>\
 3> %<{date} SENT%|FILED%>: \
 4> %(void(rclock{date}))\
 5> %<(gt 15768000)%03(month{date})%(void(year{date}))%02(modulo 100)\
 6> %?(gt 604800)%02(mday{date}) %03(month{date})\
 7> %?(gt 86400)%(weekday{date}) \
 8> %|%02(hour{date}):%02(min{date})%>  \
 9> CHARS: %(size) \
10> %<{forwarded} (FORWARDED)%>\
11> %<{resent} (RESENT)%>\
12> %<{mime-version} (MIME)%>\
13> %<{replied} REPLIED: \
14> %(void(rclock{replied}))\
15> %<(gt 15768000)%03(month{replied})%(void(year{replied}))%02(modulo 100)\
16> %?(gt 604800)%02(mday{replied}) %03(month{replied})\
17> %?(gt 86400)%(weekday{replied}) \
18> %|%02(hour{replied}):%02(min{replied})%>%>\n\
19> %<{apparently-from}  APP-FROM: %{apparently-from}\n%|\
20> %<(mymbox{from})%|      FROM: %{from}\n%>%>\
21> %<{to}        TO: %{to}%|\
22> %<{apparently-to}    APP-TO: %{apparently-to}%|\
23>         CC: %{cc}%>%>\
24> %<{subject}\n      SUBJ: %60{subject}%|\
25> %<{body}\n    <<BODY: %60{body}%>%>
Most of scan.more uses the same techniques and escapes as other format files in this chapter. The parts of scan.more that print the SENT:/FILED: and REPLIED: fields are new, though. They were adapted from the MH scan.timely format file. Here's a look at one of the "date" sections: lines 4-8. (The REPLIED: section, lines 14-18, is almost identical.) There are two other things worth mentioning on line 5:
%<(gt 15768000)%03(month{date})%(void(year{date}))%02(modulo 100)\
This format file needs an output width of about 230 characters. The exact amount depends on how wide each field is. scan.more limits the width of the subject and body to 60 characters each. But if the text of the address field (like TO:) is long, it can "steal" width from the subject or body. That almost never happens to me -- if it's a problem for you, you should be able to fix it by now...
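The date sections of scan.more (lines 4-8 and 14-18) pick one of four displays depending on how old the message is. This Python sketch follows the same thresholds. The timely function and the constant names are made up for illustration, the datetime values stand in for the parsed Date: field and the current time, and the exact spacing that scan.more prints is ignored.

```python
from datetime import datetime, timedelta

HALF_YEAR = 15768000   # seconds; first threshold in scan.more (182.5 days)
WEEK = 604800          # seconds in seven days
DAY = 86400            # seconds in one day

def timely(sent, now):
    """Choose a date display by message age, like lines 4-8 of scan.more."""
    age = (now - sent).total_seconds()   # plays the role of (rclock{date})
    if age > HALF_YEAR:
        return sent.strftime("%b%y")     # month name and two-digit year
    elif age > WEEK:
        return sent.strftime("%d %b")    # day of month and month name
    elif age > DAY:
        return sent.strftime("%A")       # weekday name
    return sent.strftime("%H:%M")        # hour and minute

sent = datetime(1989, 12, 14, 17, 31, 21)
for days in (365, 10, 2, 0):
    print(timely(sent, sent + timedelta(days=days, hours=1)))
```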

The replcomps.addrfix Format File

This section, and the rest of the sections in this chapter, show format files used by programs other than scan.

The Example below shows a replcomps-like format file for the repl command. (You can also get this file from the book's online archive. It's in download/split/mh/Mail/replcomps.addrfix.) This file handles an addressing problem I have with some of the email I get. I can't reply directly to the From: addresses on those messages; I have to edit the To: address in my reply before I send it. Like replcomps, the replcomps.addrfix format file gets the best reply address from the message header. Then it uses a series of (match) escapes to decide whether the address is one I can't reply to. If a bad address matches, the file outputs To: good-address.

To make the series of tests, I used the "else-if" operator %?. If you have MH 6.7.1a or before, use the nested tests shown in the Section The scan.answer Format File.

Example: The replcomps.addrfix format file

 1> %(lit)\
 2> %(formataddr %<{reply-to}%?{from}%?{sender}%?{return-path}%>)\
 3> %<(nonnull)\
 4> %<(match isla!tim)To: tim\
 5> %?(match djkortz@apl23r)To:\
 6> %?(match !sparc2gx!vanes@uunet)To:\
 7> %|%(void(width))%(putaddr To: )%>\n%>\
 8> %(lit)%(formataddr{to})%(formataddr{cc})%(formataddr(me))\
 9> %<(nonnull)%(void(width))%(putaddr cc: )\n%>\
10> %<{fcc}Fcc: %{fcc}\n%>\
11> %<{subject}Subject: Re: %{subject}\n%>\
12> In-reply-to: Message from (%<{from}%{from}\
13> %?{sender}%{sender}%|%{apparently-from}%>)\n\
14>    of "%<(nodate{date})%{date}%|%(tws{date})%>."%<{message-id} %{message-id}%>\n\
15> --------
After lines 1-3 store an address in the str register and test for it, lines 4-6 see if the address is one of the three that need to be rewritten.

For instance, if the original message was From: isla!tim, line 4 would match it. The string To: tim would be output. The else-if operator %?, at the start of line 5, would see that the previous test succeeded; control would go to the matching end-if, which is the first %> on line 7.

Here's another example. If the message had a Return-Path: field with the address ...!frobozz!sparc2gx!, it wouldn't match at line 4 or line 5. The %? operator would keep executing tests until the matching test in line 6 was found. You could add many more of these else-if tests.

If none of the %? operators match, the final else (after the %| operator) is executed. Here, the address is printed with no changes.
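The same decision chain can be sketched in Python, which may make the flow easier to follow. This is only an illustration of the logic in lines 3-7, not something MH runs. The function name and the second entry in the table are hypothetical; only the isla!tim pair comes from the example.

```python
# Hypothetical rewrite table: (substring to look for, replacement To: address).
# Only the first pair comes from line 4 of the example; the second is made up.
REWRITES = [
    ("isla!tim", "tim"),
    ("bad-host!someone", "someone@good.example"),
]


def reply_to_line(reply_address):
    """Return the To: line, rewriting addresses that can't be replied to directly."""
    if not reply_address:            # like %<(nonnull) on line 3: no address, no output
        return ""
    for bad, good in REWRITES:       # like the %<(match ...) / %?(match ...) chain
        if bad in reply_address:     # (match) succeeds when str contains the argument
            return "To: " + good + "\n"
    return "To: " + reply_address + "\n"   # like the final %| branch: no changes


print(reply_to_line("uunet!isla!tim"), end="")
```

Because the first successful test wins, the order of the entries matters in the same way that the order of the %? tests matters in the format file.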

There's one more %? operator used. It picks an address for the In-reply-to: field in lines 12-13.

The rcvtty.format File

rcvtty will read an MH format file, as explained in the Section Changing the Output Format. My rcvtty.format file is in the Example below -- and also in the book's online archive, at download/split/mh/Mail/rcvtty.format.

Example: The rcvtty.format file

1> ^[[7m\
2> * MAIL: %(size)ch @ %(hour{dtimenow}):%02(min{dtimenow}) *\n\r\
3> ^[[m\
4> %<(mymbox{from})To:%14(friendly{to})%|%17(friendly{from})%>\n\r\
5>   %{subject}%<{body}<<%{body}>>%>
That file uses a few tricks worth explaining:

The rcvdistcomps File

When a message is redistributed with the rcvdist command, the formatting of the Resent-xxx: header fields is controlled by an rcvdistcomps file. The default file is shown in the Example below. (You can also get it from the book's online archive in download/split/mh/Mail/rcvdistcomps.)

Example: The rcvdistcomps file

%<(nonnull)%(void(width))%(putaddr Resent-To: )\n%>\
Resent-Fcc: outbox
Addresses you use on the rcvdist command line are available in the {addresses} component escape. By default, rcvdistcomps puts a copy of every message into your outbox folder. You can change all of this by making your own rcvdistcomps file in your MH directory.

Summary of MH Format Strings

This summary was adapted from the MH 6.8.3 mh-format(5) manual page. It gives a complete, detailed and fast-paced overview of MH format strings. Earlier versions of MH may not have all of these features.

A format string consists of ordinary text and special multi-character escape sequences which begin with % (percent sign). You can use C backslash characters in a format string: \b (backspace), \f (formfeed), \n (newline), \r (carriage return), and \t (tab). Continuation lines in format files end with a backslash (\) followed by the newline character. To put a literal % or \ in a format string, use two of them: %% and \\. There are three types of escape sequences: header fields (called components by MH format), built-in functions, and flow control.

The following two subsections explain control and function escapes. Next, after an explanation of Return values, are three tables of function escapes. The following table lists special escapes that are defined only for certain commands. Then comes a subsection that shows how to nest escapes. The last subsection explains field width and output width.

Control-flow Escapes

A control escape is one of: %<, %?, %|, or %>. These are combined into the conditional execution construct:

        %<condition
                format text 1
        %?condition2
                format text 2
        %?condition3
                format text 3
        ...
        %|
                format text N
        %>
Extra white space is shown here only for clarity. These constructs may be nested without ambiguity. They form a general if-elseif-else-endif block where only one of the format text segments is interpreted.

The %< and %? control escapes cause a condition to be evaluated. This condition may be either a component or a function. The four constructs have the following syntax:

        %<{component}
        %<(function)
        %?{component}
        %?(function)

These control escapes test whether the function or component value is non-zero (for integer-valued escapes), or non-empty (for string-valued escapes). The %? control escape and its following format text is optional, and may be included zero or more times. The %| control escape and its following format text is also optional, and may be included zero or one times.
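Here's a minimal Python sketch of how that if-elseif-else-endif block picks its output. It only models the rules stated above (non-zero integers and non-empty strings are true; at most one format text is used). The function names, and the choice to treat a missing component as None, are assumptions for this sketch.

```python
def is_true(value):
    """Control-escape test: integers are true when non-zero, strings when non-empty."""
    if value is None:                # a component that isn't in the message (assumption)
        return False
    if isinstance(value, int):
        return value != 0
    return value != ""


def conditional(branches, else_text=""):
    """Pick one format text.

    branches is a list of (condition value, format text) pairs: the first
    pair stands for %<, any others for %?.  else_text stands for the
    optional %| branch; leaving it out gives empty output.
    """
    for value, text in branches:
        if is_true(value):
            return text              # only the first true branch is interpreted
    return else_text


print(conditional([("", "text 1"), (0, "text 2"), ("x", "text 3")], "text N"))
```

Note that the string "0" counts as true here, because the string test is about emptiness, not numeric value.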

Function Escapes

Most functions expect an argument of a particular type, as shown in the Table below:

Table: Argument Types for MH-format Functions

The types date and addr have the same syntax as comp, but require that the header field be a date string (such as Date:), or address string (such as From:), respectively.

All arguments except those of type expr are required. For the expr argument type, the leading % must be omitted for component and function escape arguments, and must be present (with a leading space) for control escape arguments.

The evaluation of format strings is based on a simple machine with an integer register num, and a text string register str. When a function escape is processed, if it accepts an optional expr argument which is not present, it reads the current value of either num or str as appropriate.
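To make the "simple machine" idea concrete, here's a toy Python model of the two registers. It's a sketch, not MH's implementation: the class and method names are invented, and treating an "erased" num as 0 is an assumption.

```python
class Registers:
    """Toy model of the num and str registers used during format evaluation."""

    def __init__(self):
        self.num = 0
        self.str = ""

    def lit(self, text=""):
        """(lit): store text in str; with no argument, clear str."""
        self.str = text
        return self.str

    def set_num(self, value=0):
        """(num): store value in num; with no argument, clear num (0 is assumed)."""
        self.num = value
        return self.num

    def strlen(self):
        """(strlen): takes no argument, so it always reads the current str."""
        self.num = len(self.str)
        return self.num

    def nonnull(self, expr=None):
        """(nonnull): use the optional expr if present, otherwise the current str."""
        if expr is not None:
            self.str = expr          # a string-valued argument lands in str first
        return self.str != ""


r = Registers()
r.lit("isla!tim")
print(r.nonnull(), r.strlen())
```

The last method shows the rule from the paragraph above: when the optional expr argument is left out, the function falls back to whatever is already in the register.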

Return Values

Component escapes write the value of their message field in str. Function escapes write their return value in num for functions returning integer or boolean values, and in str for functions returning string values. (The boolean type is a subset of integers with usual values 0=false and 1=true.) Control escapes return a boolean value, and set num.

All component escapes, and those function escapes which return an integer or string value, pass this value back to their caller in addition to setting str or num. These escapes will print out this value unless called as part of an argument to another escape sequence. (To prevent printing, use the (void) function escape.) Escapes which return a boolean value do pass this value back to their caller in num, but will never print out the value.

Tables of Function Escapes

The next three tables list MH-format function escapes.

Table: MH-format Function Escapes (1 of 3)

msg (argument: none) (return: integer)
Message number
msg (argument: none) (return: integer)
In forw -digest: issue number
cur (argument: none) (return: integer)
Message is current
cur (argument: none) (return: integer)
In forw -digest: volume number
size (argument: none) (return: integer)
Size of message
strlen (argument: none) (return: integer)
Length of str
width (argument: none) (return: integer)
Output buffer size in bytes
charleft (argument: none) (return: integer)
Bytes left in output buffer
timenow (argument: none) (return: integer)
Seconds since the UNIX epoch
me (argument: none) (return: string)
The user's mailbox
eq (argument: literal) (return: boolean)
True if argument equals value in num register
ne (argument: literal) (return: boolean)
True if argument doesn't equal value in num register
gt (argument: literal) (return: boolean)
True if value in num register is greater than argument
match (argument: literal) (return: boolean)
True if value in str register contains the argument
amatch (argument: literal) (return: boolean)
True if value in str register starts with the argument
plus (argument: literal) (return: integer)
Add value in num register to argument
minus (argument: literal) (return: integer)
Subtract value in num register from argument
divide (argument: literal) (return: integer)
Divide value in num register by argument
modulo (argument: literal) (return: integer)
Value in num register modulo the argument (divide value in num by the argument, give the remainder)
num (argument: literal) (return: integer)
Store argument in num register; if no argument, erase num
lit (argument: literal) (return: string)
Store argument in str register; if no argument, erase str
getenv (argument: literal) (return: string)
Store value of environment variable named by argument into str register
profile (argument: literal) (return: string)
Set str register to value of MH profile or context entry named by argument
nonzero (argument: expr) (return: boolean)
True if value in num register is non-zero
zero (argument: expr) (return: boolean)
True if value in num register is zero
null (argument: expr) (return: boolean)
True if str register is empty
nonnull (argument: expr) (return: boolean)
True if str register is not empty
void (argument: expr) (return: none)
Set str or num registers
comp (argument: comp) (return: string)
Set str register to value of field comp
compval (argument: comp) (return: integer)
Set num register to numeric value (from UNIX atoi() function) of field comp
decode (argument: expr) (return: string)
Decode any RFC-2047 encoding in str register (nmh only)
trim (argument: expr) (return: none)
Trim trailing whitespace from str register
putstr (argument: expr) (return: none)
Print str
putstrf (argument: expr) (return: none)
Print str in a fixed width
putnum (argument: expr) (return: none)
Print num
putnumf (argument: expr) (return: none)
Print num in a fixed width
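The operand order of the arithmetic functions is easy to get backwards, so here's a small Python sketch that follows the wording of the table entries. In each function, num is the register value and arg is the literal argument. These are plain illustrations with made-up signatures; integer division and non-negative values are assumed, and the string tests compare characters exactly (how MH treats upper and lower case isn't covered here).

```python
def plus(num, arg):
    """(plus): add value in num to the argument."""
    return num + arg


def minus(num, arg):
    """(minus): subtract value in num FROM the argument."""
    return arg - num


def divide(num, arg):
    """(divide): divide value in num BY the argument (integer division assumed)."""
    return num // arg


def modulo(num, arg):
    """(modulo): remainder of num divided by the argument."""
    return num % arg


def match(s, arg):
    """(match): true if the str value contains the argument."""
    return arg in s


def amatch(s, arg):
    """(amatch): true if the str value starts with the argument."""
    return s.startswith(arg)


print(minus(3, 10), divide(7, 2), modulo(1994, 100))
```

Notice that (minus) and (divide) treat the register differently: the register is what gets subtracted in one case and what gets divided in the other.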
The functions in the next Table require a date field as an argument:

Table: MH-format Function Escapes (2 of 3)

sec (argument: date) (return: integer)
Seconds of the minute
min (argument: date) (return: integer)
Minutes of the hour
hour (argument: date) (return: integer)
Hours of the day (0-23)
wday (argument: date) (return: integer)
Day of the week (Sun=0)
day (argument: date) (return: string)
Day of the week (abbrev.)
weekday (argument: date) (return: string)
Day of the week
sday (argument: date) (return: integer)
Day of the week known? (0=implicit,-1=unknown)
mday (argument: date) (return: integer)
Day of the month
yday (argument: date) (return: integer)
Day of the year
mon (argument: date) (return: integer)
Month of the year
month (argument: date) (return: string)
Month of the year (abbrev.)
lmonth (argument: date) (return: string)
Month of the year
year (argument: date) (return: integer)
Year (may be greater than 100)
zone (argument: date) (return: integer)
Timezone in hours
tzone (argument: date) (return: string)
Timezone string
szone (argument: date) (return: integer)
Timezone explicit? (0=implicit,-1=unknown)
date2local (argument: date) (return: none)
Coerce date to local timezone
date2gmt (argument: date) (return: none)
Coerce date to GMT
dst (argument: date) (return: integer)
Daylight savings in effect?
clock (argument: date) (return: integer)
Seconds since the UNIX epoch
rclock (argument: date) (return: integer)
Seconds prior to current time
tws (argument: date) (return: string)
Official 822 rendering
pretty (argument: date) (return: string)
User-friendly rendering
nodate (argument: date) (return: integer)
str not a date string
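For a feel of what several of these functions return, here's a Python sketch that pulls similar values out of a Date: string with the standard email.utils module. It's an approximation for illustration, not MH's date parser. The dictionary keys borrow the escape names, and the numbering conventions (for example, mon starting at 1) are assumptions.

```python
from email.utils import parsedate_to_datetime

DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def date_fields(date_header):
    """Return a dict of values roughly matching some date function escapes."""
    d = parsedate_to_datetime(date_header)
    wday = d.isoweekday() % 7          # isoweekday() has Sunday=7; the table wants Sun=0
    return {
        "sec": d.second,
        "min": d.minute,
        "hour": d.hour,                # 0-23
        "wday": wday,
        "day": DAYS[wday],             # abbreviated day of the week
        "mday": d.day,
        "mon": d.month,                # 1-12 assumed
        "month": MONTHS[d.month - 1],  # abbreviated month name
        "year": d.year,
        "clock": int(d.timestamp()),   # seconds since the UNIX epoch
    }


fields = date_fields("Thu, 01 Dec 1994 18:02:42 -0800")
print(fields["day"], fields["mday"], fields["month"], fields["year"])
```

The clock value takes the -0800 offset into account, so two Date: fields that name the same instant in different timezones give the same number.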
The functions listed in the next Table require an address field as an argument. The return values of functions marked with * pertain only to the first address present in the header field.

Table: MH-format Function Escapes (3 of 3)

proper (argument: addr) (return: string)
Official RFC 822 rendering
friendly (argument: addr) (return: string)
User-friendly rendering
addr (argument: addr) (return: string)
Host or host!mbox rendering*
pers (argument: addr) (return: string)
The personal name*
note (argument: addr) (return: string)
Commentary text*
mbox (argument: addr) (return: string)
The local mailbox*
mymbox (argument: addr) (return: integer)
The user's addresses? (0=no, 1=yes) (see note after table)
host (argument: addr) (return: string)
The host domain*
nohost (argument: addr) (return: integer)
No host was present*
type (argument: addr) (return: integer)
Host type* (0=local, 1=network, -1=uucp, 2=unknown)
path (argument: addr) (return: string)
Any leading host route*
ingrp (argument: addr) (return: integer)
Address was inside a group*
gname (argument: addr) (return: string)
Name of group*
formataddr (argument: expr) (return: none)
Append arg to str as a (comma-separated) address list. Works with repl -query to select addresses.
putaddr (argument: literal) (return: none)
Print str address list with arg as optional label; get line width from num
A note about the previous Table: In general, (mymbox{component}) checks each of the addresses in the header field component: against the user's mailbox name and any Alternate-Mailboxes:. It returns true if any address matches. However, it also returns true if the component: header field is not present in the message. If needed, the (null) function can be used to explicitly test for this condition.
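The missing-field behavior of (mymbox) is the part that surprises people, so here's a short Python sketch of the rule as described in that note. It's not MH's code: the function signature is invented, None stands for "field not present," and comparing addresses without regard to case is an assumption.

```python
from email.utils import getaddresses


def mymbox(header_value, my_addresses):
    """Return 1 if any address in the field is mine, or if the field is missing."""
    if header_value is None:           # field absent: (mymbox) still reports true
        return 1
    mine = {a.lower() for a in my_addresses}
    for _name, addr in getaddresses([header_value]):
        if addr.lower() in mine:       # any single match is enough
            return 1
    return 0


ME = ["jerry@example.com"]             # stands in for my mailbox and Alternate-Mailboxes:
print(mymbox("Jerry <jerry@example.com>, al@example.org", ME))
print(mymbox(None, ME))
```

A format file that needs to tell "my address" apart from "no such field" has to test for the empty field separately, which is what the (null) suggestion above is for.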

Special Escapes

Some MH commands give different interpretations to some escapes. The next Table gives a summary. For details, see the command's manual page.

Table: MH-format Special Component and Function Escapes

{error} in ap(8) (return: string)
A diagnostic if the parse failed
(cur) in forw -digest (return: integer)
Volume number
{digest} in forw -digest (return: string)
Digest name
(msg) in forw -digest (return: integer)
Issue number
{addresses} in rcvdist (return: string)
Addresses from command line
{body} in rcvtty (return: string)
First part of the body, compressed
{dtimenow} in rcvtty (return: date)
Current date. Example:
Thu, 01 Dec 1994 18:02:42 -0800
{fcc} in repl (return: string)
Any folders specified with -fcc folder
{subject} in repl (return: string)
Subject: field without any leading Re: and spaces
{body} in scan (return: string)
First part of the body, compressed
{date} in scan (return: string)
Returns file modification date if Date: field is missing.
{dtimenow} in scan (return: date)
Current date (as in rcvtty).
A note about the previous Table: If no Date: field is present in the message header, the function escapes which operate on {date} will return values for the date of last modification of the message file itself. Therefore, if scan encounters a message without a Date: field, the column that usually holds the date gets the last write date of the message instead. This is particularly handy for scanning a draft folder, as message drafts usually aren't allowed to have dates in them. Because control escapes evaluate false when they test for a field that doesn't exist, the default scan format prints a * when the Date: field is missing.

Nesting Escapes

When escapes are nested, evaluation is done from inner-most to outer-most. The outer-most escape must begin with %; the inner escapes must not. For example,

%<(mymbox{from}) To: %{to}%>
writes the value of the header field From: to str; then (mymbox) reads str and writes its result to num; then the control escape evaluates num. If num is non-zero, the string "To: " (with a trailing space) is printed followed by the value of the header field To:.

Field Width and Output Width

When a function or component escape is interpreted and the result will be immediately printed, an optional field width can be specified to print the field in exactly a given number of characters. For example, a numeric escape like %4(size) will print at most 4 digits of the message size; overflow will be indicated by a ? in the first position (like ?234). A string escape like %4(me) will print the first 4 characters and truncate at the end. Short fields are padded at the right with the fill character (normally, a blank). If the field width argument begins with a leading zero, then the fill character is set to a zero.

As above, the functions (putnumf) and (putstrf) print their result in exactly the number of characters specified by their leading field width argument. For example, %06(putnumf(size)) will print the message size in a field six characters wide filled with leading zeros; %14(putstrf{from}) will print the From: header field in fourteen characters with trailing spaces added as needed. For (putstrf), using a negative value for the field width causes right-justification of the string within the field, with padding on the left up to the field width. The functions (putnum) and (putstr) print their result in the minimum number of characters required, and ignore any leading field width argument.
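The field width rules can be sketched in a few lines of Python. This is my reading of the two paragraphs above, not MH's output code: the function names are made up, numbers are assumed to be non-negative and filled on the left (as the "leading zeros" example suggests), and on overflow the sketch assumes the low-order digits are the ones kept after the ? marker.

```python
def put_num(value, width, zero_fill=False):
    """Format a non-negative integer in exactly `width` characters (width >= 1)."""
    digits = str(value)
    if len(digits) > width:
        # Overflow: flag the first position with ? and keep the low-order digits.
        return "?" + digits[len(digits) - (width - 1):]
    return digits.rjust(width, "0" if zero_fill else " ")


def put_str(text, width):
    """Format a string in exactly abs(width) characters.

    A positive width truncates at the end and pads with trailing blanks;
    a negative width right-justifies, padding on the left.
    """
    if width < 0:
        return text[:-width].rjust(-width)
    return text[:width].ljust(width)


print(repr(put_num(51234, 4)))
print(repr(put_num(42, 6, zero_fill=True)))
print(repr(put_str("ab", -4)))
```

The zero_fill flag plays the role of the leading zero in a width like %06; everything else about the width is the same with or without it.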

The available output width is kept in an internal register; any output past this width will be truncated. The functions (width) and (charleft) are useful here; there are examples in the Sections scan Widths and The scan.dateparse Format File.


Revised by Jerry Peek. Last change $Date: 1999/10/10 05:14:05 $

This file is from the third edition of the book MH & xmh: Email for Users & Programmers, ISBN 1-56592-093-7, by Jerry Peek. Copyright © 1991, 1992, 1995 by O'Reilly & Associates, Inc. This file is freely available; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation. For more information, see the file copying.htm.

Suggestions are welcome: Jerry Peek