Subject: Guide to moderating the *.answers newsgroups

Version: $Id: guide,v 1.80 2001/05/09 17:29:56 ngb Exp $

======== 0. Introduction

This document describes the policies and procedures involved in being a 
moderator of the *.answers newsgroups.  It is intended to be a detailed 
step-by-step guide to help a new *.answers moderator come up to speed 
without needing additional guidance.  Other documents which a new 
*.answers moderator should be familiar with are listed below.

  * Posting: "Introduction to the *.answers newsgroups" 
  * Posting: "*.answers submission guidelines" 
  * Posting: "So you're interested in helping moderate *.answers...." 
  * Posting: "Questions Frequently Asked by *.answers moderators"

Although this document is not meant to be top-secret, if you are not 
already a *.answers moderator, you probably should be reading one of the 
first three documents listed above instead of this one.

The core of this document is organized into two sections: the first 
describes much of the administrative apparatus which exists to help *.answers 
moderators perform their jobs, and the second explains how to perform the 
various moderation tasks.

======== 1. Administrative apparatus

======== 1.1. How *.answers and rtfm.mit.edu really work

All the *.answers newsgroups are marked as moderated newsgroups.  As such, 
any attempted posting to an RFC-compliant news server is not propagated as 
a news article but is instead emailed directly to the "moderator" address 
for the newsgroup(s) involved.  Authors of articles which have 
been submitted to the moderators and properly approved are authorized by 
us to post their articles with the Approved line that allows the articles 
to propagate normally -- this is different from how most moderated 
newsgroups work, since the moderator usually is the one to post the 
articles with Approved lines.

Such articles are processed by nightly scripts which run on two machines 
at MIT: rtfm.mit.edu (currently aliased to bloom-picayune.mit.edu) and 
penguin-lust.mit.edu.  These scripts look at all the articles in *.answers 
which arrived that day and attempt to archive them in the places indicated 
by their Archive-name lines, for FTP and mail server availability.  These 
scripts report any problems with this automatic process by email to all 
the moderators and to the moderation queue.

An important part of the approval process is keeping various databases up 
to date.  These databases are special ASCII files which contain the 
information used for the "List of Periodic Informational Postings" (LoPIP) 
and by the automatic scripts, the subscriber list for the 
faq-maintainers-announce mailing list, and data for other internal 
moderator use.

======== 1.2. Notes about logging in

Although your Athena account will allow you to log into a number of 
different Athena workstations which allow public access, much of the work 
for *.answers moderation can only be performed on penguin-lust.mit.edu.  
(The canonical "rtfm" is actually bloom-picayune.mit.edu, but that's
a much older, overworked machine, and nearly all its functions have
been moved to penguin-lust.)

The other public Athena workstations can be reached by telnetting to 
athena.dialup.mit.edu.

You shouldn't use a .rhosts file or anything like that for your
account, because even if you are able to log in without a password,
you'll still have to run "kinit" and "aklog" to get Kerberos tickets
and AFS tokens.  To log in to penguin-lust, you will need to use
either a version of telnet which supports Kerberos IV, or ssh
(http://www.cs.hut.fi/ssh/), a secure shell.  Otherwise, telnet to
athena.dialup.mit.edu.

If your Athena account is a guest account for *.answers moderating, then 
your home directory is in the SIPB (Student Information Processing Board, 
the student group at MIT that sponsors moderation accounts and maintains 
the hardware) AFS cell, and therefore you have AFS tokens for the SIPB AFS 
cell automatically when you log in.  Otherwise, you should run "aklog 
sipb" when you log in to get tokens in the SIPB AFS cell, since many of 
the files you will need to access and modify are in the SIPB AFS cell.  
Also, if you're logged in for more than 10 hours, your Kerberos tickets 
and AFS tokens may expire.  If you have problems accessing *.answers stuff 
and you have been logged in on that session for quite a long time, try 
running "kinit" and type your username and password, and then run "aklog 
athena sipb".

If you encounter problems with using rtfm.mit.edu or penguin-lust.mit.edu, 
first consider sending email to the rest of the moderators, who might be 
able to help you with the problems.  However, as a last resort, or for 
problems which you are certain are not specific to *.answers moderation 
(and even then, you should probably send the email to the moderators 
first), you may wish to send email explaining your problem to the address 
rtfm-maintainers@mit.edu.  Be aware that the people responding to that 
address may not know anything at all about the *.answers moderation 
process.

======== 1.3. Where are relevant files to be found?

In all of the path references later in this document, $PPDIR represents 
the directory /afs/sipb/project/periodic-postings, and $NADIR represents 
$PPDIR/news.answers.  You may wish to add $PPDIR/src and $NADIR to your 
executable search path and create links to them in your home directory.

Most of the files related to recordkeeping and correspondence are in 
$NADIR.  Many of the other archive maintenance scripts are in $PPDIR/src/, 
and the LoPIP and related data files are in $PPDIR/data/.

On rtfm.mit.edu, the command "cd ~ftp" will place you in the same "base" 
directory that anonymous FTP users are placed in when they connect.  You 
may need to manipulate files and directories in that directory, so you 
should familiarize yourself with how it is set up, and in particular how 
the same file is stored in multiple places in the directory structure.  
For more information, see the *.answers submission guidelines.

======== 1.4. Notes about correspondence

======== 1.4.1. Queuing/sending/archiving *.answers correspondence

All mail sent to news-answers, news-answers-request, faq-maintainers-
announce, faq-maintainers-request, and faq-maintainers-announce-request is 
saved in the MH folder $NADIR/Mail/archive, as well as sent to each member 
of the faqs mailing list (i.e., all the moderators).  Mail to any of these 
addresses except news-answers is also stored in the moderation queue, 
$NADIR/Mail/inbox.  Mail to news-answers is first processed by a procmail 
recipe which attempts to weed out most of the test and "junk mail" 
messages (see Section 2.7.1), and messages which pass that filter are 
stored in the queue.

The $NADIR/Mail/inbox folder is the moderation queue, used for keeping 
track of current outstanding correspondence.  $NADIR/Mail/archive is an 
archive of all correspondence, and shouldn't be modified (except that its 
contents will occasionally be packed up into a single file when the 
directory gets too large; these files will be found in 
$NADIR/Mail/archive/packed).

When a *.answers message is sent to you personally, rather than to one of 
the *.answers admin lists, and it concerns something that needs to be 
dealt with, you should resend it to news-answers-request@mit.edu, so that 
it is fed into the archive and queued properly.  That way, other 
moderators will have a chance to take care of business if you do not have 
the time right then to do so.  Even if you are willing to deal with it 
immediately, you should resend a copy directly to the archives, at news-
answers-archive@rtfm.mit.edu, so they are as complete as possible.  (If 
you prefer, you can queue it by sending a copy to news-answers-
request@mit.edu, then immediately lock the message.) You may wish to 
experiment with your mail reader to explore what functionality for 
resending is available -- for example, emacs19 rmail has a resend function 
which preserves the original headers in a more useful manner than simply 
forwarding the email would.

If, on the other hand, a message sent to you personally doesn't require 
any further action by any of the *.answers moderators, you should just 
resend it to news-answers-archive@rtfm.mit.edu (note "rtfm.mit.edu", not 
"mit.edu"), which will cause it to be filed into $NADIR/Mail/archive but 
not filed in $NADIR/Mail/inbox or sent to the other *.answers moderators.  
Of course, if for some reason you think the message *should* be seen by 
the other *.answers moderators, you can send it to news-answers-request, 
but you might then want to immediately remove it from the queue (lock it), 
as described below.

In addition to incoming messages sent to you personally, all outgoing 
(i.e., sent by you) *.answers correspondence should be BCC'd to news-
answers-archive@rtfm.mit.edu if it isn't already being BCC'd to news-
answers-request.  In this way, all *.answers correspondence from all 
moderators will be archived automatically.  Furthermore, you should keep a 
personal archive of all outgoing correspondence, so that it will be 
archived somewhere in case you forget to BCC it to news-answers-
archive@rtfm.mit.edu or news-answers-request.

There are two reasons why you might BCC outgoing correspondence to news-
answers-request rather than news-answers-archive@rtfm.mit.edu: (1) You are 
a new moderator.  Until you feel comfortable with the moderation process, 
all of your outgoing correspondence should be BCC'd to news-answers-
request (or directly to one of the more experienced moderators), so that 
the more experienced moderators can keep a closer eye on what you're doing 
and provide help and advice where it's needed; (2) You think that the 
message you're sending is of interest to the other moderators, i.e., not 
routine.

You should try to put "Reply-To: news-answers-request@mit.edu" in the 
headers of your outgoing *.answers correspondence, so that recipients who 
reply using RFC-compliant mail readers will reply in a manner that 
automatically queues and archives their email, rather than replying to an 
individual moderator who then has to queue and archive it by hand.
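The routing rules above boil down to a small decision procedure.  The 
following is only a sketch of that logic: the helper names 
(resend_target, outgoing_bcc, build_reply) are hypothetical and are not 
part of any existing *.answers tool; only the two addresses come from 
this guide.

```python
from email.message import EmailMessage

# Addresses taken from this guide.
REQUEST = "news-answers-request@mit.edu"       # queued, archived, sent to all moderators
ARCHIVE = "news-answers-archive@rtfm.mit.edu"  # archived only

def resend_target(needs_action: bool, others_should_see: bool = False) -> str:
    """Where to resend *.answers mail that was sent to you personally."""
    # Mail needing action (or worth the other moderators' attention) goes
    # through the queue; everything else is only filed in the archive.
    return REQUEST if (needs_action or others_should_see) else ARCHIVE

def outgoing_bcc(new_moderator: bool, routine: bool) -> str:
    """Which address to BCC on outgoing *.answers correspondence."""
    return REQUEST if (new_moderator or not routine) else ARCHIVE

def build_reply(to_addr: str, subject: str, body: str,
                new_moderator: bool = False, routine: bool = True) -> EmailMessage:
    """Build an outgoing letter with the recommended Reply-To and BCC headers."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg["Reply-To"] = REQUEST  # replies land in the queue, not a personal mailbox
    msg["Bcc"] = outgoing_bcc(new_moderator, routine)
    msg.set_content(body)
    return msg
```

Your mail reader will handle the actual sending; the point is only that 
Reply-To always names the queue, while the BCC target depends on whether 
you are new and whether the letter is routine.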

When replying to email messages which were generated by news software, be 
sure you reply to the person who initially composed the email instead of 
replying to the news software.  Often the user's address will be in a 
secondary header section instead of in the main header section; this is 
neither a bug nor a feature.  Now that the incoming *.answers E-mail stream 
has been front-ended by a Procmail-based filtering system (see Section 
2.7.1 for more information), this problem has been almost completely 
fixed.  However, misconfigured or buggy news software may still leave no 
trace of which real user originally attempted to post in a *.answers 
newsgroup, in which case it is appropriate to complain to the news site 
administrators (the mail alias usenet@site should reach the appropriate 
people).

Various moderators have written customizations for their mail readers 
(notably emacs rmail and procmail) to automatically insert appropriate 
headers and form letter text to help streamline this process.  You may 
wish to inquire about this if you want some help coming up with your own 
customizations for the way you work.

======== 1.4.2. faq-maintainers and faq-maintainers-announce archives

The faq-maintainers mailing list is archived in the directory 
$NADIR/Mail/faq-maintainers.  Similarly, faq-maintainers-announce is 
archived in $NADIR/Mail/faq-maintainers-announce.  Only actual list 
messages are stored, not administrative correspondence, subscription 
requests, and so on.  Both of these directories have "packed" 
subdirectories containing older files.

The mailing list archives are also available by FTP; see $NADIR/mailing-
lists-policy, which documents where they are archived, and which is mailed 
to subscribers of the mailing lists periodically as a reminder.

======== 1.5. Processing messages in the queue

To remove a message from the queue so you will exclusively be the one to 
handle it, "lock" it by executing this command:

athena% $NADIR/lock [ options to MH's "pick" command ]

Since all messages (in queue, archived, etc.) are stored as MH messages 
(until they get packed), the lock command is simply an interface to the MH 
"pick" command that looks in $NADIR/Mail/inbox, where queued, unprocessed 
messages are stored.  You can read the man pages for "pick" for some of 
the more esoteric options, but the easiest way to specify which queued 
submission you want to handle is by the username of the submitter: 
"$NADIR/lock -from joeuser" or "$NADIR/lock -search joeuser" will look for 
messages from or mentioning joeuser.  If the "pick" options you specify 
match more than one message, you will be given a list of them and asked 
which you want to lock; type their number(s) to select.  (When the queue 
is long, "pick" (and therefore "lock") can be rather slow.  Under those 
circumstances, it can be helpful to keep a "scan" of the queue ("scan 
+$NADIR/Mail/inbox > filename") in a file and read or grep it to find the 
message number(s) you wish to lock.)
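If you keep a saved scan listing as described above, picking out the 
message numbers is a simple pattern search.  The sketch below assumes 
each scan line begins with the message number, optionally followed by 
the "+" that MH uses to mark the current message; the function name 
matching_message_numbers is hypothetical.

```python
import re

def matching_message_numbers(scan_lines, pattern):
    """Return the MH message numbers of saved scan lines matching pattern.

    Assumes each line starts with the message number (possibly followed
    by "+" for the current message); matching is case-insensitive.
    """
    wanted = re.compile(pattern, re.IGNORECASE)
    numbers = []
    for line in scan_lines:
        number = re.match(r"\s*(\d+)", line)
        if number and wanted.search(line):
            numbers.append(int(number.group(1)))
    return numbers
```

Since "pick" accepts plain message numbers, the numbers found this way 
can presumably be handed straight to $NADIR/lock.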

When you lock a message, it will be refiled in $NADIR/Mail/$USER, where 
$USER is your username.  If you use MH, you might want to put a symbolic 
link from your Mail directory to $NADIR/Mail/$USER (e.g., "ln -s 
$NADIR/Mail/$USER ~/Mail/news.answers"), in which case you can manipulate 
it just like any of your other MH folders.  Note that MH will renumber 
messages in this refiling process.

In addition to refiling the message in your folder in $NADIR/Mail, 
$NADIR/lock will send a mail message to faqs@mit.edu, announcing that 
you've locked the message.  One message is sent for each message you lock.  
This allows all moderators to have an idea how much activity is going on 
with processing queued messages (for example, a moderator might note the 
absence of lock announcements for the last several days and schedule some 
time in order to remedy this slack).  The script also logs who locked what 
when in $NADIR/lock-log.

To unlock a message, use "$NADIR/requeue [ `pick' options ]".  If you know 
the message number in your $NADIR/Mail/$USER folder that you want to 
unlock, you can just specify that, since "pick" will also take message 
numbers as arguments.  Alternatively, you can specify patterns just as you 
would for $NADIR/lock.  Requeueing a message moves it from your folder 
back into the inbox.

Once you're completely done with a message, you can just delete it from 
your folder in $NADIR/Mail/.  

When you are working the queue, you should prioritize dealing with 
error/warning/reminder messages from our scripts (currently this includes 
the nightly rkive run, the nightly LoPIP run, and the weekly archive 
cleanup reminder) and submissions which have been checked by the automated 
FAQ-checker (with Subject: [checked]); otherwise, queue entries should be 
dealt with in chronological order as much as possible (but related entries 
should just all be dealt with at once when you lock the first entry).
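The ordering rule above can be expressed as a two-part sort key: urgent 
items first, then oldest first.  This is a sketch only.  How a message is 
recognized as coming from one of our scripts is an assumption here (a 
caller-supplied set of sender addresses), and QueueEntry and work_order 
are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class QueueEntry:
    number: int        # MH message number in $NADIR/Mail/inbox
    sender: str
    subject: str
    received: datetime

def work_order(entries, script_senders):
    """Order queue entries for processing.

    Reports from our own scripts (identified by sender address, which is
    an assumption) and "[checked]" submissions come first; everything
    else follows in chronological order.
    """
    def key(entry):
        urgent = entry.sender in script_senders or "[checked]" in entry.subject
        return (0 if urgent else 1, entry.received)
    return sorted(entries, key=key)
```

Grouping related entries is left to your judgment; a sort like this only 
tells you where to start.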

======== 2. How to perform your duties

======== 2.1. Duties: an overview

As a *.answers moderator, you will be called upon to perform some subset 
of the following tasks (explained in more detail in later sections):

* Examining submissions to determine if they are appropriate for 
  *.answers, and rejecting inappropriate submissions.  
* For postings that are appropriate but not formatted correctly for 
  *.answers, sending back a letter explaining what needs to be corrected 
  before the postings can be accepted.  
* Dealing with the necessary administrivia when you approve a new posting, 
  or when you approve changes to an existing posting.  (Includes 
  processing faq-maintainers-announce administrative requests and 
  forwarding faq-maintainers requests to the new list admins.) 
* Dealing with unapproved or badly formatted postings in *.answers.  
* Performing routine maintenance on the FAQ archive on rtfm.mit.edu.  
* Keeping the "List of Periodic Informational Postings" up-to-date.  
* Performing routine maintenance on the *.answers correspondence archive 
  and on the archives of the faq-maintainers and faq-maintainers-announce 
  mailing lists.  
* Maintaining the *.answers FAQ server, which is used by FAQ maintainers 
  to post their FAQs if they are having trouble cross-posting to *.answers 
  directly.

These tasks would be very time-consuming if performed only by one person, 
but with multiple moderators, the time requirement is not so onerous.  In 
addition, experience will help speed up the rate at which you can perform 
these tasks, so do not be discouraged if you don't make much of a dent in 
the incoming queue the first few times.

======== 2.2. Examining submissions

======== 2.2.1. Checking for appropriateness

When you see a submission and want to process it, first lock that 
submission to extract it from the queue.

Then you should decide if the posting belongs in *.answers, regardless of 
whether or not it is correctly formatted for the groups.  The 
"Introduction to the *.answers newsgroups" posting talks about what kinds 
of postings are appropriate for news.answers.  If you are unsure of 
whether or not a submission belongs, feel free to ask the other moderators 
to clarify.

If the posting is not appropriate for *.answers, decide whether you want 
to return it to the submitter with a note indicating this.  Each submitter 
whose message is in the queue will already have received a copy of 
$NADIR/autoreplies/autoreply-text, which explains what kinds of postings 
are appropriate for *.answers.  If the submission is close, e.g. if it 
would be appropriate if it weren't MIME encoded or if its commercial 
content were toned down just a bit, send a note explaining the problem and 
asking for a resubmission.

Most "junk mail" and other inappropriate messages should be filtered out 
by the procmail recipe, but that filter is intentionally loose, so some do 
slip through.  If you wish, you can send one of the $NADIR/inappr-* form 
letters to the sender, though in practice most moderators find that the 
increased workload isn't worth the effort.  The most general form letter 
is $NADIR/inappr-flame.  The file $NADIR/inappr-flame-clue adds 
information about more appropriate places to ask questions, for people 
who look like they could use a clue.  If the submission appears to be due 
to a buggy news reader that ignored the Followup-To line in an FAQ posting 
when the submitter posted a followup, you might want to use 
$NADIR/inappr-flame-soft instead.  Finally, the file 
$NADIR/inappr-flame-test should be used to discourage test messages posted 
in *.answers, and $NADIR/pyramid-flame can be sent in response to "make 
money fast" messages and other pyramid schemes.
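The choice of form letter is essentially a lookup from situation to file.  
In the sketch below the file names come from this section, but the 
situation keys and the function name form_letter_path are hypothetical 
labels chosen for illustration.

```python
import posixpath

# Situation -> form letter in $NADIR.  File names are from this guide;
# the situation keys are hypothetical labels.
FORM_LETTERS = {
    "general": "inappr-flame",
    "needs-clue": "inappr-flame-clue",
    "ignored-followup-to": "inappr-flame-soft",
    "test-message": "inappr-flame-test",
    "pyramid-scheme": "pyramid-flame",
}

def form_letter_path(nadir: str, situation: str) -> str:
    """Path of the form letter for a situation; unknown situations get the
    most general letter."""
    return posixpath.join(nadir, FORM_LETTERS.get(situation, "inappr-flame"))
```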

If the submission is indeed appropriate for news.answers, proceed to the 
next section.

======== 2.2.2. Checking for conformity to guidelines

[Postings which make it this far should be given an entry in the LoPIP 
database.  Please read section 2.3.]

Your next step is to determine if the submission conforms to the 
guidelines.  In summary, that means:

 Headers

* Both an article header and an auxiliary header are included in the 
  submission.  The auxiliary header is separated from the article header by 
  one or more *completely* blank lines (i.e., no spaces or tabs), and 
  nothing more, and likewise separated from the body of the text by one or 
  more completely blank lines.

 Main Header Lines

* The article header has a Newsgroups line.  
* The Newsgroups line contains news.answers and all of the other relevant 
  *.answers newsgroups, and no inappropriate ones.
* All of the *.answers newsgroups are at the end of the Newsgroups line.  
* If the Newsgroups line contains other moderated newsgroups, the article 
  has the other moderators' approval.  
* The article header has a Subject line.  
* The Subject line gives a good idea of what's in the posting, even to 
  people who might not be familiar with the topic it addresses.  
* Important information (e.g., the Newsgroup name or topic) is near the 
  beginning of the Subject line.  
* The article header has a Followup-To line.  
* The Followup-To line is correctly formatted.  This means it contains 
  either a list of newsgroups, comma-separated (no spaces), or the word 
  "poster" (not "Poster").  It should *not* contain an E-mail address.  
* If the Followup-To line contains a list of newsgroups, then none of the 
  *.answers newsgroups are listed.

 Auxiliary Header Lines

* The auxiliary header has an Archive-name line.  
* The archive name selected is appropriate, uses an established directory 
  if an appropriate one exists, and gives a good idea of what the posting 
  is about.  
* The archive name is in the correct format (see the description in the 
  guidelines) and is not already in use.
* If the archive name is specified in a News-answers-archive-name line 
  rather than an Archive-name line, there are other *-answers-archive-name 
  lines for the other *.answers newsgroups to which the article will be 
  posted.  
* There are not different archive names specified for different *.answers 
  newsgroups.

 Other Administrative Information

* The submitter has told you how often the article will be posted, either 
  by mentioning it in the submission or sending separate mail along with 
  the submission.  
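Several of the mechanical checks in this list can be automated.  The 
sketch below is not save-faq and does not reproduce its behavior; it is 
a hypothetical illustration (split_submission, parse_header and 
check_submission are made-up names) covering the header separation, 
Newsgroups, Subject, Followup-To and archive-name checks.  It cannot 
judge whether a Subject is informative, whether an archive name is well 
chosen or already in use, whether other moderators have approved, or how 
often the article will be posted.

```python
def split_submission(text):
    """Split a submission into article header, auxiliary header and body.

    Headers are separated by one or more *completely* empty lines; a line
    holding only spaces or tabs does not separate and is reported.
    """
    lines = text.split("\n")
    parts, current, problems = [], [], []
    n = 0
    while n < len(lines) and len(parts) < 2:
        line = lines[n]
        if line == "":
            if current:                      # end of a header block
                parts.append(current)
                current = []
        elif not line.strip():
            problems.append(f"line {n + 1}: 'blank' line contains spaces or tabs")
        else:
            current.append(line)
        n += 1
    if current:                              # header block ran to end of text
        parts.append(current)
    while n < len(lines) and lines[n] == "":
        n += 1
    if len(parts) < 2:
        problems.append("missing article header or auxiliary header")
    parts += [[]] * (2 - len(parts))
    return parts[0], parts[1], lines[n:], problems

def parse_header(lines):
    """Parse 'Name: value' lines (indented lines continue the previous
    field) into a dict keyed by lower-cased field name."""
    fields, last = {}, None
    for line in lines:
        if line[:1] in (" ", "\t") and last:
            fields[last] += " " + line.strip()
        elif ":" in line:
            name, value = line.split(":", 1)
            last = name.strip().lower()
            fields[last] = value.strip()
    return fields

def check_submission(text):
    """Return a list of problems found (empty if none were detected)."""
    art_lines, aux_lines, _body, problems = split_submission(text)
    art, aux = parse_header(art_lines), parse_header(aux_lines)

    groups = [g.strip() for g in art.get("newsgroups", "").split(",") if g.strip()]
    answers = [g for g in groups if g.endswith(".answers")]
    if not groups:
        problems.append("no Newsgroups line")
    else:
        if "news.answers" not in answers:
            problems.append("news.answers is missing from Newsgroups")
        if answers and groups[-len(answers):] != answers:
            problems.append("*.answers groups are not at the end of Newsgroups")

    if not art.get("subject"):
        problems.append("no Subject line")

    followup = art.get("followup-to")
    if followup is None:
        problems.append("no Followup-To line")
    elif followup.lower() == "poster":
        if followup != "poster":
            problems.append('Followup-To should say "poster", not "%s"' % followup)
    elif "@" in followup or " " in followup:
        problems.append("Followup-To must be 'poster' or comma-separated groups")
    elif any(g.endswith(".answers") for g in followup.split(",")):
        problems.append("Followup-To lists a *.answers group")

    names = {k: v for k, v in aux.items()
             if k == "archive-name" or k.endswith("-answers-archive-name")}
    if not names:
        problems.append("no Archive-name line in the auxiliary header")
    elif len(set(names.values())) > 1:
        problems.append("different archive names for different *.answers groups")
    elif "archive-name" not in names:
        # News-answers-archive-name style: every *.answers group needs its own line.
        for g in answers:
            if g.replace(".", "-") + "-archive-name" not in names:
                problems.append("no %s-archive-name line" % g.replace(".", "-"))
    return problems
```

Like save-faq, the output is only a starting point for your letter; 
anything it reports still needs a human look.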

You can check for many form errors, including the check for other 
moderated groups, by running the command

save-faq -quiet -submitted

on the buffer which contains the submission, starting from the beginning 
of the text of the submission.  (Or, in emacs, use C-u M-x shell-command 
in a reply buffer and redirect the submission file into that command.) The 
command output will tell you of some (but not all) possible errors, in a 
form suitable for mailing back to the user.

If the posting does not conform to the guidelines, you can do one of two 
things (when in doubt, do the latter):

* If it *almost* conforms, you can say, "Well, you can go ahead and post 
  it if you make the following changes.  If these changes are a problem, let 
  me know and we'll work them out." In particular, if the posting itself is 
  conformant but you don't know about the posting frequency, you can approve 
  it but ask the poster again for the information.

* If it needs significant changes, write back and explain what's wrong 
  with it, including copies of the guidelines and of the original 
  submission's headers, asking for a resubmission.

Most moderators use a form letter which looks something like 
$NADIR/changes-needed.  It's important to tailor the middle section of 
this file by deleting the lines which are not relevant, and adding 
specific explanatory comments.

Most newly submitted FAQ's go through at least one iteration of this 
process.  (Note that only on the very first iteration should you email the 
guidelines to the submitter.) Once the submission is completely 
conformant, you should approve it by telling the poster that he or she can 
post when ready according to the determined frequency, and that an 
"Approved: news-answers-request@mit.edu" line should be added to the main 
header so that it will be propagated normally rather than emailed to us 
again.  This is also the time to add the posting's Archive-name to the 
$NADIR/index file.  Since there are multiple moderators, it will 
frequently be the case that various moderators will be involved in 
different iterations.  It is important therefore that you keep a good 
record of comments on your transaction in each iteration in the LoPIP 
database, described in the next section.
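For reference, the change the poster is asked to make is just one extra 
line at the end of the main article header, before the first completely 
blank line.  A minimal sketch of that edit follows; add_approved_line is 
a hypothetical helper, not something the submitter is expected to run.

```python
APPROVED = "Approved: news-answers-request@mit.edu"

def add_approved_line(article: str) -> str:
    """Insert the Approved line at the end of the main article header,
    i.e. just before the first completely empty line."""
    lines = article.split("\n")
    end = lines.index("") if "" in lines else len(lines)
    if any(line.lower().startswith("approved:") for line in lines[:end]):
        return article  # already has an Approved line; leave it alone
    lines.insert(end, APPROVED)
    return "\n".join(lines)
```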

======== 2.3. Changing the *.answers database and FTP archives

While this is going on, you're going to need to modify various *.answers 
database files and sometimes update the FTP archives by hand for certain 
kinds of changes.

======== 2.3.1. $NADIR/index and other index files

The file $NADIR/index contains a list of approved *.answers archive names 
and their maintainers.  The maintainer addresses aren't necessarily 
up-to-date, but each one was correct when it was added.  It is a good 
idea, though not critical, to try to keep the addresses up-to-date (e.g., 
if a maintainer changes addresses or a new maintainer takes over an FAQ).

The approved archive name list should be used to quickly verify that an 
archive name is not already in use before approving a new posting.  
Furthermore, when you first start being a moderator, you should glance 
over it to get an idea of the archive names that are currently in use, so 
you can decide if a proposed archive name is reasonable.  (In particular, 
you should look to see if there are any existing directories which would 
be particularly appropriate for a posting to reside in.) Finally, whenever 
an archive name is changed for some reason, you should update it in this 
file, in addition to doing the other necessary tasks described below.

This list is RCS-controlled (read the RCS man pages if necessary, and
remember that we're using the GNU versions).  To edit it, you should
cd into $NADIR and do "gco -l index".  If you get an error because
it's already locked, then check the modification time and owner of the
file to find out if it's being actively edited.  If not, see
$NADIR/moderation-faq for instructions on how to safely "break" a lock
on a file that isn't in use.  If it is being actively edited, then
you'll have to wait until the current locker finishes and checks it in
before you can modify it.

Once you check it out, you can use the editor of your choice to modify it.  
The list is in alphabetical (case-insensitive) order, so you should keep 
that order when editing it.  When done editing it, use "gci -u index" to 
check it back in.  The RCS log message you specify isn't all that 
important, since usually the diff stored in the RCS file will be 
sufficient to tell what you did, but if there's something unusual about 
what you did, note it there.
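The two checks you make against $NADIR/index (is the name taken, and is 
the file still in order) are easy to script.  This sketch assumes each 
non-empty line begins with the archive name followed by whitespace; that 
format, and the function names, are assumptions rather than a 
description of the real file.

```python
def archive_names(index_lines):
    """Archive names from index lines, assuming each non-empty line starts
    with the archive name followed by whitespace (an assumed format)."""
    return [line.split()[0] for line in index_lines if line.strip()]

def name_in_use(index_lines, candidate):
    """True if candidate already appears as an approved archive name."""
    return candidate in archive_names(index_lines)

def out_of_order(index_lines):
    """Adjacent name pairs that break case-insensitive alphabetical order."""
    names = archive_names(index_lines)
    return [(a, b) for a, b in zip(names, names[1:]) if a.lower() > b.lower()]
```

Running out_of_order on the file before "gci -u index" is a cheap way 
to catch an entry dropped in the wrong place.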

In addition to $NADIR/index, there are also index files in the FTP 
archives, in ~ftp/pub/usenet-by-group/*.answers/index and correspondingly 
in usenet-by-hierarchy/*/answers/index.  These are in slightly different 
format from $NADIR/index, containing the archive names of FAQ's and their 
subjects on one line and the groups they are posted to on the next line.  
You do not have to update these files when approving a new FAQ or new 
parts -- that is done automatically by the automatic save-faq script, 
which can be found in /usr/spool/FAQ_archiver on rtfm.

======== 2.3.2. The LoPIP database ($PPDIR/data/{list,list.long})

======== 2.3.2.1. What/where is it?

The "List of Periodic Informational Postings" database of FAQ postings 
lives in $PPDIR/data/{list,list.long}.  All *.answers postings (plus many 
other postings not approved for *.answers) should be listed in this 
database.  The utilities for manipulating it are located in $PPDIR/src.  
Never simply start your editor to directly edit the list or list.long 
files.

This database actually serves three different purposes:

* It is the basis for the "List of Periodic Informational Postings" posted 
  monthly to news.answers and other newsgroups.  
* It contains records, on a per-posting basis, of the status of 
  submissions to *.answers.  
* It is used to generate the rkive.cf file which controls which postings 
  get archived in the rtfm.mit.edu FAQ archive.

Because it serves several purposes, it's more complicated than it would be 
if it were only used to keep track of *.answers submissions, and lives in 
a different directory tree ($PPDIR) than the rest of the *.answers files 
($NADIR).

Some of the data (e.g., the "From", "Newsgroups", "Subject", and 
"Frequency" fields for each entry in the database) in this database are to 
be maintained by human moderators, and can be found in the file "list" in 
the data directory.  Other data (e.g., the "Date", "Summary", and 
sometimes the "Archive-name" fields) is pulled automatically from the 
actual postings in the FAQ archive, and is stored in the file "list.aux".  
The reason the database is split in this way is so that the RCS log 
doesn't have to contain all of the automatically generated data, which 
changes frequently.  For this reason, the "list" file is RCS-controlled, 
but the "list.aux" file is not.

[Note: the checker.pl script, when run interactively by a moderator as 
part of the archives clean-up process, will now offer to correct the 
Frequency LoPIP field using the auxiliary header Posting-Frequency field 
in actual postings, as well as offer to correct From, Newsgroups, and 
Subject lines.]

======== 2.3.2.2. How to edit the LoPIP database

To edit the database, you should "cd $PPDIR/src" and type "./ledit" (or 
just type "ledit" if it's in your search path).  It will RCS check-out the 
"list" file and start up $EDITOR (defaults to emacs, but can be changed by 
putting a setenv command in a file called .environment in your home 
directory) on it.

When finished, if you used ledit, it will run grcsdiff to show you what has 
changed, and then ask you to enter a log message for the change.  
Otherwise, check the file back in using "gci $PPDIR/data/list".  Please 
enter a meaningful log message if you've made significant changes, 
especially if you are a relatively new moderator.

When editing the long version of the database, the main data for each 
entry is separated from the auxiliary data by "--" on a line by itself; 
make sure to put any main data above the "--", and any auxiliary data 
(e.g., the Archive-name field) below it.  If you want to edit the main and 
auxiliary data together, then you should use "./ledit -l" instead of 
"./ledit".  The "-l" option stands for "long", both because it allows you 
to edit a longer file and because it takes longer (since it has to merge 
the database files before running the editor).
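To make the "--" convention concrete, here is a sketch that splits one 
entry from the long view into its main and auxiliary parts and flags 
auxiliary-only fields typed above the separator.  It assumes one 
"Name: value" field per line; the function names are hypothetical, and 
the set of auxiliary fields is taken from the Date, Summary and 
Archive-name examples given earlier in this section.

```python
# Fields this guide describes as pulled from the archived postings
# ("list.aux") rather than maintained by hand ("list").
AUX_FIELDS = {"Archive-name", "Date", "Summary"}

def split_long_entry(entry):
    """Split one long-view entry into (main, auxiliary) field dicts.

    Assumes one 'Name: value' field per line and a '--' line between the
    main and auxiliary data.  A leading '&' (special field) is kept.
    """
    main, aux = {}, {}
    target = main
    for line in entry.splitlines():
        if line == "--":
            target = aux
        elif ":" in line:
            name, value = line.split(":", 1)
            target[name.strip()] = value.strip()
    return main, aux

def misplaced_fields(main):
    """Auxiliary-only fields that were typed above the '--' line."""
    return sorted(name for name in main if name.lstrip("&") in AUX_FIELDS)
```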

For example, if you want to enter the "Archive-name" field, which goes in 
the auxiliary section of the database rather than the main section, you 
need to have started your edit with "./ledit -l" instead of just "./ledit".  (It 
is a good idea to add the Archive-name field of a posting to the database 
when you first approve it, because if you don't, and the author posts with 
the wrong Subject line, the LoPIP-checking script won't see the posting.)

Also, you can use the "-c" option to tell the script to check the format 
of the database after you're done editing it.  This is useful if you are 
new to this and unsure of the exact format or of where everything goes.  
If there's something wrong with the data, you'll get an error message 
which should give *some* indication of the problem, and you'll be asked if 
you want to edit the file again to fix the problem.

Note that the "ledit" script is pretty smart about restarting if you abort 
an edit in the middle, i.e., if you rerun it, it will probably do the 
right thing.

Of course, if you're not using ledit, you won't be able to look at the 
long version of the file or have the script check your changes.  Be sure 
to remember to check the data file back in when you're done with it.

======== 2.3.2.3. Editing as part of the submission process

You should edit this database at each step in the submission/approval 
process for a submission.  A complete "paper trail" is important so that 
another moderator can step in to handle future correspondence and have 
some understanding of the transactions that went before.  If you ask the 
author to resubmit the posting, you should edit the database to indicate 
this, and to indicate each subsequent piece of correspondence to or from 
the submitter.  Each comment, starting with the posting's first 
submission, should be listed with the date it occurred, not necessarily 
the date you're recording it.

Keep in mind that when someone submits a posting, it may already be listed 
in the database even if it is just now being submitted to news.answers, 
since the database lists many postings that are not currently cross-posted 
to news.answers.  Therefore, before adding a new entry to the database for 
a submission, check to see if it's already there (by searching for the 
newsgroup name or author or a keyword on the Subject line or all of the 
above -- just convince yourself that it's not there before adding it); if 
it is already there, you should modify the current entry rather than 
adding a new one.

When adding new entries to the database, it doesn't matter where you put 
them as long as you leave the single blank lines separating entries.  
However, I prefer that you put them at the end of the file.  The list is 
sorted automatically once a day or so, based on the Newsgroups and Subject 
lines in the entries, and if there's some problem with the file's format, 
the new entries will be left at the end where they can be easily examined.  

When adding an entry to the database in response to a submission that has 
not yet been approved for *.answers, make sure to remove the *.answers 
newsgroups from the Newsgroups line, even if they were present in the 
submission.  When you approve the posting, be sure to add them back to the 
Newsgroups line.

If a submission comes in with only *.answers newsgroups in its Newsgroups 
line (or with no Newsgroups line at all) and you can't tell from the 
submission what its home newsgroup(s) are going to be, then add it to the 
LoPIP with "Newsgroups: unknown".  In that case, you should be sure to 
mark the Subject field special ("&Subject:", as described below), so that 
the entry does not show up in the posted LoPIP.  Never add an entry to the 
LoPIP with only *.answers newsgroups in its Newsgroups line, unless that's 
how it's actually going to be posted (i.e., one of the *.answers 
administrative postings).

In general, whenever you add an entry to the database for record-keeping 
purposes but the entry is incomplete enough that it shouldn't show up in 
the posted LoPIP, remember to mark its Subject field special.

When finally approving a posting, if it took several rounds of moderator 
comments and resubmissions, take a second to make sure that the submitter 
hasn't made minor changes since the initial submission (when the LoPIP 
entry was created).  It is not uncommon for the submitter to revise the 
Subject lines or Archive-names very slightly while revising the posting 
during the submission process, and it's easy to miss those changes 
when you're working quickly.

======== 2.3.2.4. Database format (details)

The "list" file (LoPIP) is composed of entries separated by blank lines.  
The entry format is a bit complicated, and is perhaps best explained with 
an example.  Here's a sample entry (each line is prefixed with `>', 
although those characters aren't actually in the database):

> *Newsgroups: alt.foo,alt.answers,news.answers 
> !Subject: alt.foo Frequently Asked Questions (FAQ), * of * 
> From: Susie_Sample@domain.com 
> Frequency: weekly in alt.foo, monthly in *.answers 
> +ID: 282687.709794033 
> +Single-Subject: alt.foo Frequently Asked Questions (FAQ), 1 of * 
> +Single-Subject: alt.foo Frequently Asked Questions (FAQ), 2 of * 
> +N: The Newsgroups line is marked valid because sometimes the author 
>     posts just to alt.foo.
> +C: 4/6/93: jik: Posted in two parts, with both parts with same 
>             archive name, with new Subject format, without prior 
>             approval.  Wrote and complained, but approved (with 
>             new archive names for multi-part).  
>        4/8/93: jik: Author ACKed.  

It looks a lot like the header of a mail message or Usenet posting, and 
indeed the format is similar.  Each field in an entry is composed of zero 
or more special tag characters which I will discuss below, then the name 
of the field, then a colon, then one or more spaces, then the field value.  
Fields can be continued just as in mail messages, by putting whitespace at 
the beginning of the continuation lines.  Fields taken directly from the 
header, such as Newsgroups and Subject, should not be split across lines 
-- it confuses the archive checker software.
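
To make the layout concrete, here is a minimal Python parser for the 
format just described.  It is a sketch, not the code ledit or checker.pl 
actually uses: parse_lopip() and its (tags, name, value) tuples are 
hypothetical, continuation lines are joined to their field with a newline, 
and only the short version of the file is handled (a "--" section 
separator line would be reported as malformed).

```python
import re

# Zero or more tag characters, the field name, a colon, one or more
# spaces, then the field value.
FIELD_RE = re.compile(r"^([*+&!]*)([A-Za-z][A-Za-z0-9-]*):[ \t]+(.*)$")


def parse_lopip(text):
    """Split LoPIP text into entries.

    Each entry is a list of (tags, name, value) tuples.  Continuation lines
    (lines starting with whitespace) are appended to the previous field's
    value, separated by a newline.
    """
    entries = []
    # Entries are separated by one or more blank lines.
    for block in re.split(r"\n(?:[ \t]*\n)+", text.strip()):
        fields = []
        for line in block.splitlines():
            if line[:1] in (" ", "\t"):
                if not fields:
                    raise ValueError("continuation before any field: %r" % line)
                tags, name, value = fields[-1]
                fields[-1] = (tags, name, value + "\n" + line.strip())
                continue
            match = FIELD_RE.match(line)
            if match is None:
                raise ValueError("malformed field line: %r" % line)
            tags, name, value = match.groups()
            fields.append((tags, name, value.rstrip()))
        entries.append(fields)
    return entries
```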

The tag characters are as follows:

`*' Indicates that a field is "valid", even if the contents of a posting 
in the FAQ archive disagree with it.  For example, if an FAQ posting is 
posted to three different newsgroups, but is for some reason posted 
separately to the three groups rather than cross-posted, then all three 
newsgroups would be listed on the Newsgroups line, but it would be marked 
valid with `*' to tell the automatic FAQ archives consistency checker 
($PPDIR/src/checker.pl) to ignore the fact that the Newsgroups lines in 
the actual postings don't agree with the Newsgroups field in the database.

`+' Indicates that a field is "hidden", meaning that it should not be 
displayed in the "List of Periodic Informational Postings".  This is 
primarily used to hide the "ID" field in each entry, and to hide the 
comment field (usually named "C") that is used to record private moderator 
comments about the status of the posting.

`&' Indicates that a field is "special".  If a "Subject" line is marked 
special, that means that the LoPIP entry should not appear in the published 
LoPIP, but should still be used when generating the rkive.cf file (so it 
gets archived if it shows up in the news feed).  If a "Single-Subject" 
line is marked special, the checker.pl script will handle multiple matches 
to a single Single-Subject line differently (for more details, see 
$PPDIR/src/cleanup-steps and the Single-Subject documentation below).

`!' Indicates that a field is "important".  This means that when the LoPIP 
diff posting is generated, any difference between this field and its 
last-posted value will show up.  You shouldn't mark a change as important 
if it is only a new From address for the same author, or only a 
reordering of the Newsgroups field that doesn't change which newsgroups 
are actually being posted to; most other changes should probably be 
marked important.  If you're not sure whether a change is 
considered important, ask.

Note that for the purposes of the diff posting, all visible parts of new 
LoPIP entries are considered "important".
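
How the `+', `&' and `!' tags affect the posted LoPIP and its diff can be 
sketched as follows.  This illustrates the rules above rather than 
reproducing the posting scripts; the function names are hypothetical, 
entries are assumed to be lists of (tags, name, value) tuples, and an 
entry is treated as suppressed if any of its Subject fields is marked 
special.

```python
from collections import defaultdict


def published_fields(fields):
    """Return the (name, value) pairs shown in the posted LoPIP, or None if
    the entry is left out because a Subject field is marked special."""
    if any("&" in tags and name == "Subject" for tags, name, _ in fields):
        return None
    # Hidden ("+") fields such as ID and C are never shown.
    return [(name, value) for tags, name, value in fields if "+" not in tags]


def _values_by_name(fields):
    by_name = defaultdict(list)
    for _, name, value in fields:
        by_name[name].append(value)
    return by_name


def diff_fields(old_fields, new_fields):
    """Names of fields whose changes belong in the LoPIP diff posting.

    old_fields is None for a new entry, in which case every visible field
    counts as important; otherwise only "!"-marked fields whose values
    differ from the last-posted version are reported.
    """
    if old_fields is None:
        return sorted({name for tags, name, _ in new_fields if "+" not in tags})
    old, new = _values_by_name(old_fields), _values_by_name(new_fields)
    return sorted({name for tags, name, _ in new_fields
                   if "!" in tags and old.get(name) != new[name]})
```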

The "ID" field is unique to each entry in the database.  You don't assign 
a new ID if you add a new entry -- the database utilities will assign it 
automatically.  You should never change or delete the ID of an existing 
entry.  The ID is used primarily so that the script that generates the 
LoPIP diff posting has a static bit of data to link entries in the last-
posted version of the LoPIP with entries in the new List.  The ID field 
should always be hidden.

The "Single-Subject" and "Dont-Match" (not shown) fields are used by the 
integrity-checking script which compares the entries in the database with 
the corresponding postings in the FAQ archive.  The purpose of Single-
Subject is to allow the script to notice when there are old versions of a 
posting that should be deleted -- when more than one posting in the FAQ 
archive matches one of the Single-Subject lines during rtfm.mit.edu 
archive cleanup, the script will automatically delete all but the newest 
file, unless the Single-Subject line was marked special using "&", in 
which case the script will ask the moderator which one(s) to keep.  You 
should mark a Single-Subject special only if you expect multiple postings 
to match it, those multiple postings should all be kept, *AND* you cannot 
easily specify the postings in separate Single-Subject lines.  The purpose 
of "Dont-Match" is to specify Subject lines which the Subject should *not* 
match, assuming that it has wildcards in it.  For example, since part 1 of 
the comp.benchmarks FAQ is posted to different newsgroups from the other 
parts and therefore has its own entry in the LoPIP, I don't want the 
wildcarded Subject in the entry for the other parts, 
"[l/m */*/*] *c.be FAQ", to match part 1, so that entry has a Dont-Match 
which contains "[l/m */*/*] benchmark 
source info-Intro*".  These two fields should also always be hidden.
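
A sketch of how Subject, Single-Subject and Dont-Match patterns might be 
applied follows.  It assumes that `*' is the only wildcard (so the 
brackets in "[l/m */*/*]" match literally) and that archived postings are 
given as (Subject, date) pairs with sortable dates.  The helper names are 
hypothetical, and checker.pl's real matching rules may differ in detail.

```python
import re


def compile_subject(pattern):
    """Turn a LoPIP Subject pattern into a regex.  Only "*" is special;
    everything else, including "[", "]" and "/", matches literally."""
    pieces = (re.escape(piece) for piece in pattern.split("*"))
    return re.compile(".*".join(pieces) + r"\Z", re.DOTALL)


def subject_matches(subject, pattern, dont_match=()):
    """True if subject matches pattern and none of the Dont-Match patterns."""
    if not compile_subject(pattern).match(subject):
        return False
    return not any(compile_subject(d).match(subject) for d in dont_match)


def single_subject_deletions(postings, single_subject, special=False):
    """postings: (subject, date) pairs found in the FAQ archive.

    If more than one posting matches the Single-Subject pattern, return the
    subjects of all but the newest, which cleanup would delete.  A special
    ("&") Single-Subject returns nothing, since the moderator is asked
    which copies to keep instead.
    """
    hits = sorted((p for p in postings if subject_matches(p[0], single_subject)),
                  key=lambda p: p[1])
    if special or len(hits) < 2:
        return []
    return [subject for subject, _ in hits[:-1]]
```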

The "Archive-Subject" field (not shown) should appear in the main part of 
an LoPIP entry and should always be hidden.  The primary purpose of the 
Archive-Subject field is to specify a general pattern which all of the 
Subject lines for an FAQ should match, so that if the maintainer of the 
FAQ changes the Subject line of one of his postings slightly, or adds a 
new posting with a different Subject line, it will still get archived (and 
furthermore, so that the integrity-checking script will find the 
changed/new posting and alert us to the inconsistency between it and 
what's in the LoPIP).  (I.e., it is used by the checker.pl script to find 
files in the archive and by the make-kill.pl script to determine what 
postings in a newsgroup to archive, but is not treated as a valid Subject 
line for a posting to have, when the integrity-checking script compares 
the Subject line in an actual posting to the Subject line(s) in its LoPIP 
entry.) For example, note the Archive-Subject line in the LoPIP entry for 
the Ferret FAQ postings.  An entry may have multiple Archive-Subject 
lines.

The comment field, which should be called "C" ("Comment" has been used in 
the past but is deprecated) and should also be hidden, is used to record 
the status of the posting, e.g., if it's in the process of being reviewed 
for *.answers, or if there's a feeler out to the maintainer to find out if 
it's still being posted so that the list can be kept updated.  You can 
delete comments that are no longer relevant (e.g., if the comment says, 
"The author says he'll be posting it again by the end of the week," and is 
dated earlier than the last time it was posted, you can probably delete 
it) when adding new comments to an entry, but you don't have to do so.  
(The $PPDIR/src/trim-comments.pl script, which is documented in 
$PPDIR/src/cleanup-steps, automatically trims old comments from the 
database.) This is where you should comment during each step in the 
submission of an FAQ to *.answers.  If you're not sure of exactly what you 
should be putting in the comments, feel free to ask, or to glance through 
the database to see what's there already.  As shown above, all comments 
should be prefixed by the date (in MM/DD/YY format) and the username of 
the person entering the comment.  Be sure to indent successive lines in 
the comment field lest they become visible in the public LoPIP postings.
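
A hypothetical helper for formatting one comment is sketched below.  
Following the sample entry, it writes dates as 4/6/93 rather than 
zero-padding them, and it indents wrapped lines (and any comment appended 
to an existing "+C" field) the same way the sample does, so that no line 
of the comment can become a visible field.

```python
import textwrap


def format_comment(when, user, text, first=False, width=74):
    """Format one dated moderator comment for a hidden "+C" field.

    when is a datetime.date.  first=True starts a new "+C:" field;
    otherwise the comment is indented so it continues an existing field.
    Every wrapped line after the first is indented as well.
    """
    stamp = "%d/%d/%02d" % (when.month, when.day, when.year % 100)
    return textwrap.fill(
        "%s: %s: %s" % (stamp, user, text),
        width=width,
        initial_indent="+C: " if first else "       ",
        subsequent_indent="            ",
    )
```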

Finally, the "Note" field (sometimes "N") is similar to the comment field, 
but contains long-term information about the entry that doesn't need to be 
dated and isn't going to become obsolete (and therefore won't ever need to 
be deleted).

Some entries also have non-hidden "Comment" (as opposed to hidden "+C") 
fields which are included in the posted LoPIP.  Typically these explain 
what articles make up a multi-part posting, what language a posting is in, 
etc.

Note that all entries in the database must have one each of the From, 
Newsgroups, and Frequency fields, and one *OR MORE* Subject fields.  
Please don't forget these when adding a new entry, and don't accidentally 
leave an entry without one of them after editing it (and again, remember never to 
delete the ID field of an existing entry!).  If the author didn't specify 
a posting frequency, use "Frequency: unknown".  Within each section (i.e., 
above and below the "--" in the long version of the file), lines can be 
entered in any order; they'll be sorted automatically each night.
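
The required-field rules can be checked mechanically.  The sketch below 
is hypothetical (it is not the check that "ledit -c" performs, whose 
details aren't described here); it assumes entries are lists of 
(tags, name, value) tuples in which continuation lines appear as embedded 
newlines.

```python
from collections import Counter

EXACTLY_ONE = ("From", "Newsgroups", "Frequency")
ALWAYS_HIDDEN = ("ID", "C", "Single-Subject", "Dont-Match", "Archive-Subject")
NO_CONTINUATION = ("Newsgroups", "Subject")


def entry_problems(fields, is_new=False):
    """Return a list of problems found in one LoPIP entry."""
    counts = Counter(name for _, name, _ in fields)
    problems = ["need exactly one %s field, found %d" % (name, counts[name])
                for name in EXACTLY_ONE if counts[name] != 1]
    if counts["Subject"] < 1:
        problems.append("need at least one Subject field")
    # New entries get their ID assigned automatically; existing ones must
    # keep theirs.
    if not is_new and counts["ID"] != 1:
        problems.append("existing entry must keep exactly one ID field")
    for tags, name, value in fields:
        if name in ALWAYS_HIDDEN and "+" not in tags:
            problems.append("%s field should be hidden with '+'" % name)
        if name in NO_CONTINUATION and "\n" in value:
            problems.append("%s field must not be split across lines" % name)
    return problems
```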

Note that there are a number of different ways through which multiple 
postings can be referenced by a single LoPIP entry.  They are not mutually 
exclusive.  Here is an explanation of when each is appropriate:

 * Wildcard(s) in a Subject line should be used when the information that 
the wildcard is replacing is "dispensable".  Information that is 
considered dispensable includes part numbers and posting dates.  For 
example, it's reasonable to represent a multi-posting FAQ that has "part 1 
of 3", "part 2 of 3", etc.  in its Subject lines with a single LoPIP entry 
with a single Subject line reading "part * of *".

 * Multiple Subject lines should be used when a multi-posting FAQ's 
different Subject lines convey useful information that can't be conveyed 
in a single Subject.  Two examples: (1) an FAQ posting and its diff 
posting (see, for example, the entry in the LoPIP for the LoPIP itself); 
(2) a multi-posting FAQ where each Subject line lists the topic of the 
posting (see, for example, the LoPIP entry for the Ferret FAQ postings).

 * Single-Subject lines should be used when we know that there is a chance 
that information in the Subject lines will change and cause multiple 
copies, some obsolete, of a posting in the archive.  Two examples: "FOO 
FAQ (*/*)", when the number of parts, after the slash, changes 
occasionally; "FOO FAQ (*/*/* version)" where "*/*/*" is the date of the 
version of the FAQ.

 * Archive-Subject lines, as described above.

It is completely reasonable for an LoPIP entry to use any of these in any 
combination.

======== 2.3.2.5. Publicizing overall changes in the LoPIP

When the LoPIP is changed in a way that will show up in the posted version 
of it and that people should be told about, a description of the change 
should be placed in $PPDIR/data/periodic-posting-news, which has an 
associated RCS file.  (This is pretty uncommon, and refers only to changes 
in the structure or format of the LoPIP, not just to its contents.) When 
the LoPIP posting headers are generated, the text in this file (if any) 
will be placed in between the archive name and the descriptive text of the 
part1 and diff postings, preceded and followed by a blank line.  You might 
want to include separator lines in the periodic-posting-news file to make 
the text stand out, e.g., centered "----- NOTE -----" lines at the 
beginning and end of the file.

The automatic job that actually posts the LoPIP will empty 
$PPDIR/data/periodic-posting-news and check in the emptied file, so that 
the same changes are not announced multiple times.

======== 2.3.3. Modifying the FAQ archives by hand

To make modifications to the archives, first log into rtfm.mit.edu and cd 
to ~ftp/pub; or log into penguin-lust, use "XXX" to acquire access to the 
FTP archives, and then cd to ~ftp/pub.

You should almost never create new files or replace existing files in the 
archives by hand.  (If you do so, you will probably waste space and forget 
to create or change some of the files.) Instead, you can usually get the 
right result by feeding the new file you want to place in the archives 
to the save-faq script ($PPDIR/src/save-faq).  For example, if someone 
replies to an LoPIP entry and their software does not add an "Re:" to the 
Subject line, their reply may end up replacing the saved PIP.  If you can 
still retrieve the original article from the primary newsgroup, you can 
save it to disk, feed it into save-faq, and that script will update all 
the correct places.

If you approved a new posting which will be in a directory which did not 
exist before, you do not need to create the directory in the appropriate 
places because the automatic save-faq script will do so.

If the archive name of a posting for which you're responsible changes, you 
need to rename the file in the *.answers archives on rtfm.mit.edu.  The 
typical FAQ is stored in at least 4 different places: in usenet-by-group/, 
under news.answers/ and under its home hierarchy's ___.answers/; and in 
usenet-by-hierarchy/, under news/answers/ and under its home hierarchy's 
___/answers/.  It is best not to rename the files manually, but again to 
use save-faq.  To do 
this, copy all the files in question into a temporary location (say, 
/tmp).  Next, do "save-faq -unlink -unlinkdirs < filename" for every 
filename; this deletes them from the archives.  Finally, do "save-faq -an 
new-archive-name < filename", which will reinstall them.  Be careful, 
however, of the presence of Newsgroup-archive-name: lines; those may have 
to be dealt with by hand.
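
To see which files a rename or split touches, it can help to list the 
expected locations.  The helper below is hypothetical and read-only (the 
actual changes should still be made with save-faq); it assumes that the 
layout described above applies to every newsgroup under which the posting 
is archived, with both trees under ~ftp/pub.

```python
def archive_paths(archive_name, newsgroups):
    """Paths, relative to ~ftp/pub, where a posting with this archive name
    would be stored: one under usenet-by-group/ and one under
    usenet-by-hierarchy/ for each newsgroup it is archived in."""
    paths = []
    for group in newsgroups:
        paths.append("usenet-by-group/%s/%s" % (group, archive_name))
        # In the hierarchy tree, the dots of the group name become directories.
        paths.append("usenet-by-hierarchy/%s/%s"
                     % (group.replace(".", "/"), archive_name))
    return paths
```

Run with the old archive name, the list doubles as a checklist of 
directories to confirm are gone when a split posting rejoins into one 
part.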

If a posting called "foo-faq" splits, you should follow the steps above, 
giving it a new archive name of "foo-faq/part1" or whatever archive name 
was chosen for part 1 of the split FAQ.  If a split posting rejoins into 
one part, it may sometimes be necessary to remove the old directories by 
hand; be sure to get them all, two per newsgroup to which the files were 
posted.

Finally, if an author tells you that a posting will no longer be posted to 
*.answers, you should remove it from the archives and the corresponding 
index files by running "save-faq -unlink -unlinkdirs < filename".

======== 2.4. faq-maintainers-announce

When someone submits to *.answers, they may ask to be subscribed to
the faq-maintainers-announce mailing list.  (If they ask to subscribe
to faq-maintainers, tell them to send the request to
majordomo@faqs.org .)  (If they subscribe to faq-maintainers, they are
also automatically subscribed to faq-maintainers-announce.)  You will
find the database of subscribers in $NADIR/faq-maintainers-announce.

To edit the list, you need to check it out with RCS as with the index.  
When adding or editing users, keep the list in alphabetical order by mail 
username (case-insensitive), and make sure the entries you add are in the 
same format as those that are already there (address first, then 
whitespace, then full name in parentheses).  It is important to use this 
format because the first field on each line is the only one that actually 
gets fed to the mailing list processor.  Try to get the full name whenever 
possible.  (Due to the Moira database software, addresses of the form 
"username@mit.edu" or "username@athena.mit.edu" should be listed in these 
files as "username" only for the first field.)
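
The ordering and formatting rules above are sketched below.  The helpers 
are hypothetical; they assume a single space between address and name 
(match whatever the existing lines use) and that every line is an active 
subscriber entry, so commented-out addresses would need separate handling.  
The edited list still has to be checked in and committed with 
update-lists.

```python
import re

# MIT addresses are listed by bare username (see the Moira note above).
MIT_ADDRESS = re.compile(r"^([^@\s]+)@(?:athena\.)?mit\.edu$", re.IGNORECASE)


def list_entry(address, full_name):
    """Format one subscriber line: address, whitespace, (full name)."""
    match = MIT_ADDRESS.match(address)
    if match:
        address = match.group(1)
    return "%s (%s)" % (address, full_name)


def add_subscriber(lines, address, full_name):
    """Return the list with a new subscriber added, kept in alphabetical
    order by mail username, ignoring case."""
    def username(line):
        return line.split()[0].split("@")[0].lower()
    return sorted(lines + [list_entry(address, full_name)], key=username)
```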

After you've made all the necessary changes for one session to the list 
and checked it back in, you must use the "update-lists" script in the same 
directory to actually commit the changes.  Note, however, that they won't 
propagate to MIT's mail server for a day or so.  Also, the Moira database, 
which the update-lists script accesses, is generally busy from shortly 
before midnight until a few hours later, while it dumps the list contents 
into "alias" files for sendmail running on the MIT mail hubs.  If you 
can't run update-lists because the database is busy, send mail to faq-
maintainers-request asking someone else to do it for you.

You should send a copy of $NADIR/mailing-lists-policy to new subscribers.

Note also that the queue sometimes contains requests from seemingly 
random people to be added to the list, and requests from current 
subscribers to be removed from these mailing lists; these should be 
processed accordingly (there is *no* requirement that someone actually be 
maintaining an FAQ in *.answers to be on the faq-maintainers-announce 
mailing list).

Lastly, with respect to subscriptions, a few subscribers to the faq-
maintainers-announce mailing list have X.400 addresses.  If one of them 
wishes to unsubscribe, or if someone new wishes to subscribe with an X.400 
address, send the request to rtfm-maintainers and ask them to tweak the 
mail aliases file for faq-maintainers-announce-redist (redistribution list 
for X.400 addresses) by hand.

Messages which people attempt to send to faq-maintainers-announce 
currently will be deposited into the same queue as messages sent to the 
various -request lists; this is to simulate the effect of a moderated 
mailing list.  If you lock the message and decide it is appropriate for 
the subscribers to faq-maintainers-announce, you should send the message 
below with the specified headers:

To: faq-maintainers-announce@MIT.EDU 
From: {submitter's email address} 
Reply-to: {submitter's email address, or possibly faq-maintainers} 
Subject: {submitter's subject line} 
Bcc: fma-actual-distribution, faq-maintainers@faqs.org
Comment: This message was approved for distribution by {JOHN SMITH}, one 
     of the list maintainers.  
Precedence: list

 {body of the submitter's message}

You should, of course, make the appropriate substitutions for the text in 
curly braces.  Also, if the submitter was impatient and already sent the 
announcement to faq-maintainers himself, delete the faq-maintainers from 
the Bcc line above.

======== 2.5. Routine maintenance

Various routine maintenance tasks have to be performed by the *.answers 
moderators.

======== 2.5.1. Trimming *.answers correspondence archives

In order to keep the size of the archives of *.answers correspondence 
manageable, the messages in the archive are periodically trimmed using the 
following heuristics (see the $NADIR/trim-folder.pl script):

 * If a message contains a "Newsgroups" line in its mail header or 
auxiliary header, and that header contains a *.answers newsgroup, 
everything after the first kilobyte of the body of the message is 
truncated and replaced with a line indicating the truncation.

 * Any text appearing between lines saying "(Begin form letter.)" and 
"(End form letter.)" or "[begin reference inclusion]" and "[end reference 
inclusion]" is removed and replaced with a line indicating that a 
reference inclusion was removed.
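
The two heuristics can be illustrated with the sketch below.  It is not 
trim-folder.pl: the function name and the wording of the replacement lines 
are hypothetical, it checks only the main header (not an auxiliary 
header), it treats a kilobyte as 1024 characters, and it removes 
inclusions before truncating.

```python
import re

LIMIT = 1024  # keep roughly the first kilobyte of *.answers submissions
MARKERS = [
    ("(Begin form letter.)", "(End form letter.)"),
    ("[begin reference inclusion]", "[end reference inclusion]"),
]


def trim_body(headers, body):
    """headers: dict of header fields; body: message body text."""
    for begin, end in MARKERS:
        # Marker lines must stand alone; everything between them goes.
        inclusion = re.compile(
            r"^[ \t]*%s[ \t]*$.*?^[ \t]*%s[ \t]*$" % (re.escape(begin), re.escape(end)),
            re.MULTILINE | re.DOTALL)
        body = inclusion.sub("[reference inclusion removed]", body)
    groups = [g.strip() for g in headers.get("Newsgroups", "").split(",")]
    if any(g.endswith(".answers") for g in groups) and len(body) > LIMIT:
        body = body[:LIMIT] + "\n[rest of message truncated]\n"
    return body
```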

If you use other lines to indicate an inclusion, you can modify 
$NADIR/trim-folder.pl to notice and remove your inclusions automatically 
too.  Just make sure you use something specific enough that there won't be 
any false matches, and make sure you test your changes (e.g., by creating 
a "test" folder, putting messages with previous inclusion styles and your 
new inclusion style in it, and running "trim-folder.pl -f test" to make 
sure that it does the right thing); if you're not sure how to do this, ask 
me for help.

======== 2.5.2. Unapproved and badly formatted postings in *.answers

If there are problems with one or more FAQs when the automatic FAQ 
archiver runs each day, it is supposed to send mail to news-answers-
request bringing them to our attention (and depositing them in the queue).  
The problems it notices include all those noticed by save-faq and the 
automatic FAQ checker, which is not surprising, since they all use the 
same script.

When the nightly archiver finds a problem posting, it saves it under 
/usr/spool/FAQ_archiver/problems.  You can deal with these postings as 
follows:

 1) Lock the LoPIP and edit it with "ledit -l".

 2) cd into /usr/spool/FAQ_archiver.

 3) Load the files into your favourite editor and look at why they might be 
a problem; if you can figure it out, go to step 5.

 4) Run "/usr/spool/FAQ_archiver/save-faq -n < filename".  It should 
report an error indicating what the problem with the file is.  If it 
doesn't, then there's no problem (any more) and you should go to step 6.

 5) Deal with the posting as appropriate; also add necessary comments to 
the LoPIP.

 6) Remove the file from /usr/spool/FAQ_archiver/problems.

Once you've checked all of the files, do:

 find /usr/spool/FAQ_archiver/problems -depth -type d -print | xargs rmdir

If there are any error messages, there still are some files; go back and 
check them.
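
The cleanup command removes empty directories deepest-first, and rmdir 
complains about any directory that still holds files.  A Python 
equivalent, hypothetical and for illustration only, that also returns the 
directories left behind is sketched below.  Unlike the find command, which 
also tries to remove the problems directory itself once it is empty, this 
version leaves the top directory in place.

```python
import os


def remove_empty_dirs(top):
    """Remove empty directories below top, deepest first.

    Returns the directories that could not be removed (typically because
    they still contain files that need checking).
    """
    left = []
    # topdown=False visits children before their parents, like find -depth.
    for dirpath, _dirnames, _filenames in os.walk(top, topdown=False):
        if dirpath == top:
            continue  # keep the problems directory itself
        try:
            os.rmdir(dirpath)
        except OSError:
            left.append(dirpath)
    return left
```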

Some possible responses to problem postings:

 * Approved *.answers postings with the archive names in the wrong place 
or no archive name at all -- send mail to the author pointing out the 
problem, ask him to fix it, and (depending on your mood) update the FAQ 
archive so that the posting appears under the archive name even though it 
wasn't specified in the posting.  

 * Unapproved *.answers postings -- send the author a flame.  For 
this, you can use the form letter $NADIR/unapproved-flame.  If it's been 
less than 24 hours since the article was posted, consider canceling it.  
If it contained an Archive-name line that wasn't a good archive name, 
remove it from the FAQ archive (and update the appropriate index files).  
Note that when save-faq reports that there is an invalid archive name 
(most commonly, when the archive name is not listed in $NADIR/index), it 
will not save the posting.  This means that you have to save by hand 
(using save-faq) any FAQ that was posted with a valid and approved archive 
name which someone simply forgot to add to $NADIR/index during the 
approval process.

 * Approved *.answers postings posted to only *.answers -- if we weren't 
informed that the posting would appear in only *.answers, write and ask 
what's up.  

 * Postings that are posted to a *.answers group for a hierarchy they don't 
post to otherwise -- write and ask why.  

 * Postings with a "Supercedes" header (a misspelling of "Supersedes") -- 
write and ask the author to correct it.

Basically, just use your common sense.  Log what happened, and your 
response, in the LoPIP.  Update the FAQ archive where appropriate.  
Remember that postings appear in multiple locations in the FAQ archive, 
e.g., articles in usenet-by-group/news.answers also appear in usenet-by-
hierarchy/news/answers.

======== 2.5.3. Checking the rtfm FAQ archive and the LoPIP

"Routine maintenance" on the FAQ archive and the LoPIP should be done more 
regularly than it is.  The file $PPDIR/src/cleanup-steps lists the steps 
involved in the cleanup and explains them in detail.

Also, part of the regular cleanup of the FAQ archive should include going 
through the faq-maintainers-announce file checking for disabled addresses 
and figuring out if they can be reenabled (try mailing a test message to 
the subscriber address and see what happens now) and/or should be removed 
from the list completely (rather than just commented out).  These files 
and others which are kept under RCS (e.g., $NADIR/index, $PPDIR/data/list) 
should have their old revisions outdated at least annually as that will 
improve speed when working with RCS gci/gco.  Leaving a month or two of 
revisions should suffice to track down any problems.

The script that runs nightly on rtfm.mit.edu to check the LoPIP and post 
it if necessary (see $PPDIR/src/nightly.csh) is also set up to send a 
message weekly to news-answers-request@MIT.Edu, reminding us that the 
routine maintenance (i.e., the tasks mentioned in this section as well as 
the two sections below) needs to be performed.  Whoever does the 
maintenance should lock that message before doing so.  If you lock that 
message and are only able to do part of the maintenance, you should note 
in the locked message what you have and haven't done and then requeue it.

======== 2.5.4. Keeping the LoPIP up to date

The process for keeping the LoPIP up-to-date is outlined in the 
$PPDIR/src/cleanup-steps file mentioned above.  Basically, it consists of 
two steps:

* Deal with LoPIP entries that are found during maintenance of the FAQ 
archive to be out-of-date.

* Deal with LoPIP entries that are listed as missing in the "check.out" 
file created nightly automatically.

See $PPDIR/src/cleanup-steps for more information.

======== 2.5.5. Packing the correspondence archives

There are three different correspondence archives that need to be 
maintained: the *.answers correspondence archive ($NADIR/Mail/archive), 
the faq-maintainers mailing list archive ($NADIR/Mail/faq-maintainers), 
and the faq-maintainers-announce mailing list archive ($NADIR/Mail/faq-
maintainers-announce).  For the *.answers archives, maintenance involves 
trimming messages in the archive (described in Section 2.5.1 above) and 
packing the archive into a file (and deleting the packed messages) when it 
gets too large.  For the faq-maintainers and faq-maintainers-announce 
archives, maintenance involves only packing the archive into a file when 
it gets too large.

Each week, the person who performs routine maintenance on the FAQ archive 
and the LoPIP should also perform the routine maintenance on the 
correspondence archives.  This consists of performing the following steps 
on each of the archives (the *.answers correspondence archive is used in 
the examples below, but the same steps apply to the other archives except 
where otherwise noted).

7) Check to see if the archive has grown large enough that it needs to be 
packed:

% cd $NADIR/Mail/archive 
% du .  
5995 ./packed 
8563 .  

The "packed" subdirectory of each archive contains the packed RMAIL 
files for the archive.  An archive needs to be packed when the unpacked 
messages in it total more than 500k in size.  To determine if that is the 
case, subtract from the total size of the archive directory the size of 
the "packed" subdirectory; in this case, the difference is over 2500k, so 
the archive obviously needs packing.  If the difference is below 1000k or 
so, the packing can be put off for a few days, since it will probably 
fall below the 500k limit after the duplicate-removal and message-trimming 
steps mentioned below.
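
The arithmetic in this step can be captured in a small hypothetical 
helper that reads du output like the example above (8563 - 5995 = 2568k 
unpacked).  It assumes that du reports sizes in kilobytes, as the example 
implies.

```python
def du_sizes(output):
    """Parse "du ." output into a {path: size} dict."""
    sizes = {}
    for line in output.splitlines():
        if line.strip():
            size, path = line.split(None, 1)
            sizes[path.strip()] = int(size)
    return sizes


def packing_advice(du_output, limit_k=500, defer_below_k=1000):
    """Return (unpacked size in k, advice) for one correspondence archive."""
    sizes = du_sizes(du_output)
    unpacked = sizes["."] - sizes.get("./packed", 0)
    if unpacked <= limit_k:
        return unpacked, "no packing needed"
    if unpacked < defer_below_k:
        # Duplicate removal and trimming may bring it under the limit.
        return unpacked, "can wait a few days"
    return unpacked, "pack now"
```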

8) Remove duplicates from the archive.  It's useful to run this in an 
emacs shell or another window with scrollback capability, or to redirect 
its output into a file.

 % $NADIR/mh-dups.pl [1-9]* 
<zero or more messages, as described below> 

This script can print the following messages:

 "No Date field in <msg number>." (this happens rarely)

 The script uses the Date fields of messages it's checking to figure out 
which ones are duplicates.  If it can't find the Date field in a message, 
it displays a warning.  You can either just ignore this message (the worst 
that will happen is that the message is a duplicate but will get archived 
in the packed archive anyway, and this isn't such a big deal) or look at 
the message, figure out why it doesn't have a Date field, and deal with it 
appropriately.

 "Removing <msg number>, duplicate of <msg number>."

 The script removed a message, as it's supposed to.  This is normal.

 "Error removing <msg number>: <error>." (this happens rarely)

 A system error of some sort occurred while trying to delete a duplicate.  
Investigate as appropriate.

 "Possible duplicates: <msg number> and <msg number>."

 Two messages have the same Date field, but the checksums of their bodies 
are different.  Check the two messages to see if they're really 
duplicates, and if so, delete one.

 "Error opening <msg number>: <error>." (this happens rarely)

 A system error of some sort occurred while trying to open a message to 
check if it's a duplicate.  Investigate as appropriate.

 "EOF before end of header in <msg number>." (this happens rarely)

 Possibly an empty message or a message that was somehow saved improperly 
in the archives.  Investigate and deal with it appropriately.

 "Empty body in <msg number>."

 Either a message with an empty body was actually sent to the archived 
address, or some sort of error occurred while saving a message into the 
archive, or something like that.  You can either investigate and possibly 
remove the message from the archive, or just leave it there; it doesn't do 
any harm to archive a message with no body.
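
The duplicate check described above can be sketched as follows.  This is 
not mh-dups.pl itself: the function is hypothetical, it uses an MD5 
checksum because the script's checksum algorithm isn't specified here, it 
compares each message only with the first one seen with the same Date, 
and although its report lines mirror the script's messages, it deletes 
nothing; removing the listed messages is left to the caller.

```python
import hashlib
import re

DATE_RE = re.compile(r"^Date:[ \t]*(.*)$", re.MULTILINE | re.IGNORECASE)


def find_duplicates(messages):
    """messages: dict mapping MH message number -> raw message text.

    Returns (numbers to remove, report lines).  Messages with the same Date
    and the same body checksum are duplicates; the same Date with a
    different body is only a possible duplicate, left for a human to check.
    """
    first_seen = {}  # Date value -> (message number, body checksum)
    remove, report = [], []
    for num in sorted(messages):
        header, sep, body = messages[num].partition("\n\n")
        if not sep:
            report.append("EOF before end of header in %d." % num)
            continue
        if not body.strip():
            report.append("Empty body in %d." % num)
        match = DATE_RE.search(header)
        if match is None:
            report.append("No Date field in %d." % num)
            continue
        date = match.group(1).strip()
        digest = hashlib.md5(body.encode("utf-8")).hexdigest()
        if date not in first_seen:
            first_seen[date] = (num, digest)
        elif first_seen[date][1] == digest:
            remove.append(num)
            report.append("Removing %d, duplicate of %d." % (num, first_seen[date][0]))
        else:
            report.append("Possible duplicates: %d and %d." % (first_seen[date][0], num))
    return remove, report
```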

9) (*.answers correspondence archive only) Trim the messages in the 
archive.  It's useful to run this in an emacs shell or another window with 
scrollback capability, or to redirect its output into a file.

 % $NADIR/trim-folder.pl 
<zero or more messages, as described below> 

 The originals of any messages that are modified by the script are given 
".#" prefixes.  If necessary, you can use the original to recover from any 
errors (as described below) reported by the script.

 The script can print the following messages:

 "Opening $NADIR/Mail/archive: <error>." (this happens rarely)

 A system error occurred trying to open the folder to be trimmed.  
Investigate and take appropriate action to fix the problem.

 "<msg number> has size <number of bytes in file>."

 The indicated message is big enough that it probably needs to be trimmed, 
but the script's heuristics weren't able to figure out how to trim it.  
Load the message into an editor and trim it as appropriate, leaving 
something like "[deleted]" in place of any deleted text.  If you use Emacs 
to do the trimming, remember to delete any "~" backup files it leaves 
behind when you're done.

 "Opening <msg number>: <error>." (this happens rarely)

 A system error occurred trying to open a message.  Investigate and take 
appropriate action to fix the problem.

 "Error opening <msg number>.new for write: <error>."
 "Error writing to <msg number>.new: <error>."
 "Error closing <msg number>.new: <error>."
 "Error renaming <msg number>: <error>."
 "Error renaming <msg number>.new: <error>."

 (these happen rarely)

 An error occurred while trying to open, write to, or move into place, a 
trimmed message file.  Investigate and fix the problem.
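The error messages above suggest the order in which trim-folder.pl installs a trimmed message: write <msg number>.new, move the original aside with a ".#" prefix, then rename the new file into place.  The following Python sketch reproduces that order so you can see which step a given error comes from.  It is inferred from the error messages and the ".#" convention described above, not taken from the script, and the function name is hypothetical.

```python
import os
from pathlib import Path


def replace_with_backup(folder, number, new_text):
    """Install a trimmed version of MH message `number` in `folder`.

    Order inferred from trim-folder.pl's error messages: write <n>.new,
    rename <n> to .#<n> (the backup "expunge" removes later), then
    rename <n>.new to <n>.  Returns the path of the backup file.
    """
    folder = Path(folder)
    original = folder / str(number)
    new = folder / f"{number}.new"
    backup = folder / f".#{number}"
    with open(new, "w") as f:       # "Error opening/writing/closing <n>.new"
        f.write(new_text)
    os.rename(original, backup)     # "Error renaming <n>"
    os.rename(new, original)        # "Error renaming <n>.new"
    return backup
```

If the second rename fails, the original is still available as .#<n>, which is why the ".#" copy can be used to recover.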

10) Use "rm *~" (for Emacs backup files) and "expunge" (for the ".#" 
files) to get rid of the original files which were modified or removed in 
earlier steps.
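For reference, here is a Python sketch of what the two commands in step 10 accomplish together.  "expunge" is a local command; this sketch assumes, from the description of rmm and expunge elsewhere in this guide, that it simply deletes the ".#"-prefixed files in the folder.  The function name is hypothetical.

```python
from pathlib import Path


def clean_folder(folder):
    """Remove leftover originals from an MH folder directory.

    Deletes ".#"-prefixed files (assumed to be what "expunge" removes)
    and Emacs "~" backups; like the shell glob *~, the latter skips
    names beginning with a dot.  Returns the sorted deleted names.
    """
    removed = []
    for path in Path(folder).iterdir():
        name = path.name
        is_backup = name.startswith(".#")
        is_emacs_tilde = name.endswith("~") and not name.startswith(".")
        if path.is_file() and (is_backup or is_emacs_tilde):
            path.unlink()
            removed.append(name)
    return sorted(removed)
```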

11) Use the "du ." command again to check if the archive is still big 
enough that it needs to be packed.  If it isn't, proceed to the next 
archive and leave this one alone for now.

12) Pack the archive:

% packf +$NADIR/Mail/archive -file /tmp/archive.packed 
Create file "/tmp/archive.packed"? yes 

If the archive is particularly large, there may not be space in /tmp for 
it.  In that case, attach the "bitbucket" locker (MIT's AFS-wide, large 
temporary directory, cleaned out every few days but handy for times like 
this) and use that:

% attach bitbucket
% mkdir /mit/bitbucket/news-answers
% packf +$NADIR/Mail/archive -file /mit/bitbucket/news-answers/archive.packed
Create file "/mit/bitbucket/news-answers/archive.packed"? yes

13) If, after this stage, the packed archive is smaller than 500k, stop 
here: remove /tmp/archive.packed (or 
/mit/bitbucket/news-answers/archive.packed) and move on to the next 
archive to be packed.

14) Split the packed file into 500k chunks:

% cd /tmp
 OR
% cd /mit/bitbucket/news-answers
% $NADIR/split-packf.pl archive.packed 

This will create packf files "xaa", "xab", etc.  All but the last one will 
be approximately 500k in length.  For example:

% ls -l xa? 
-rw-rw-r-- 1 jik 502154 Jun 7 19:04 xaa 
-rw-rw-r-- 1 jik 236427 Jun 7 19:04 xab 
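The exact rules split-packf.pl uses aren't documented here.  A minimal Python sketch of the same idea, assuming the packed file is in mbox format (each message begins with a line starting with "From ", with such body lines already escaped) and that "500k" means 500,000 bytes, closes a chunk at the first message boundary once it has reached the limit.  That is consistent with the 502154-byte xaa above.  The same check covers step 13: if the whole file is under the limit, only one chunk comes back.  Function names are hypothetical.

```python
def split_packed(text, limit=500_000):
    """Split mbox-format text into chunks without breaking any message.

    Assumptions: a message starts at each line beginning with "From ",
    and a chunk is closed as soon as it reaches `limit` bytes, so every
    chunk except the last is at least `limit` bytes long.
    """
    messages, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))

    chunks, chunk, size = [], [], 0
    for message in messages:
        chunk.append(message)
        size += len(message.encode())
        if size >= limit:
            chunks.append("".join(chunk))
            chunk, size = [], 0
    if chunk:
        chunks.append("".join(chunk))
    return chunks


def chunk_names(count):
    """split(1)-style file names: xaa, xab, ..., xaz, xba, ... (max 676)."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    return ["x" + letters[i // 26] + letters[i % 26] for i in range(count)]
```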

15) For each of the xa? files but the last one, figure out what date it 
should be given in the "packed" subdirectory, move it into place, and then 
gzip it.  See the files in the "packed" subdirectory of each archive to 
determine their name format; the month is that of the last message in the 
file:

% grep ^Date: xaa | uniq | tail 
Date: Sat, 30 May 1997 16:51:50 +0000 
Date: Sat, 30 May 97 19:14:03 EST 
Date: Sat, 30 May 1997 01:42:00 +0100 
Date: Sat, 30 May 97 19:51:35 EST 
Date: Sun, 1 Jun 97 19:53:28 EST 
Date: Sat, 30 May 97 18:52:41 EST 
Date: Sun, 1 Jun 97 18:52:38 EST 
Date: Sun, 1 Jun 97 20:15:23 EST 
Date: Sun, 1 Jun 97 17:48:29 -0800 
Date: Sun, 1 Jun 1997 21:10:22 -0500 (EST) 
 
% ls $NADIR/Mail/archive/packed/*.9706.* 
No match.
% mv xaa $NADIR/Mail/archive/packed/news.answers.9706.01 
% gzip --best !$ 
gzip --best $NADIR/Mail/archive/packed/news.answers.9706.01 
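Working out the file name can be sketched in Python as follows.  The sketch assumes the news.answers.YYMM.NN pattern from the example (two-digit sequence number, starting at 01 when "ls" finds no match for the month); other archives may use a different format, so check the "packed" subdirectory as described above.  It uses the standard library's email.utils.parsedate_tz, which maps two-digit years such as "97" to 1997.  The function name is hypothetical.

```python
import re
from email.utils import parsedate_tz


def packed_name(prefix, last_date, existing):
    """Name for a packed chunk, e.g. "news.answers.9706.01".

    prefix:    archive name prefix, e.g. "news.answers" (assumed format)
    last_date: Date: header value of the LAST message in the chunk
    existing:  file names already in the "packed" subdirectory
    """
    parsed = parsedate_tz(last_date)
    if parsed is None:
        raise ValueError(f"unparseable Date: {last_date!r}")
    year, month = parsed[0], parsed[1]
    yymm = f"{year % 100:02d}{month:02d}"
    # Use the next sequence number after any already taken for this month.
    pattern = re.compile(rf"^{re.escape(prefix)}\.{yymm}\.(\d+)")
    used = [int(m.group(1)) for name in existing if (m := pattern.match(name))]
    return f"{prefix}.{yymm}.{max(used, default=0) + 1:02d}"
```

With the Date: lines shown above, the last message (Sun, 1 Jun 1997) gives "news.answers.9706.01".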

16) (faq-maintainers and faq-maintainers-announce archives only) 
Also put a copy of each full xa? file (with the same names you used to put 
them into the "packed" subdirectory, but compressed instead of gzipped; 
i.e., "faq-maintainers.9???.?.Z" or "faq-maintainers-announce.9???.?.Z") 
into ~ftp/pub/faq-maintainers, the public archive of the mailing lists, 
and generate a scan file to go there too.

17) Load the last xa? file into Emacs, or view it with a pager such as 
"more", to find out what the first message in it is.  Then find the same 
message in the archive, and use rmm to delete all of the messages in the 
archive up to (but not including) that one.  To locate it, grep the last 
xa? file for '^Date:' and use 'wc' to count how many messages are in it, 
count back that many messages from the last message in the archive, and 
scan the messages near that one to find the specific one you want.

 Be very careful that you find the correct message to delete up to, and 
that you don't delete that message itself, so that no messages are lost in 
this process.

For example:

% grep '^Date:' xab | wc
18 2839 19874

The first number in wc's output (the line count) indicates that there are 
18 messages in xab.  If 'scan +$NADIR/Mail/archive' shows 350 messages, 
then try

% scan +$NADIR/Mail/archive 320-350

to look for the message you need.  Check its Message-ID against the first 
one in the last xa? file, to be sure.  Then delete everything up to, but 
not including, that one (say it's message 326 in this case):

% rmm +$NADIR/Mail/archive 1-325
% cd $NADIR/Mail/archive
% expunge

The "expunge" is necessary because rmm only renames removed messages, 
giving them a ".#" prefix; "expunge" gets rid of those files.
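The arithmetic and the Message-ID check in step 17 can be sketched in Python.  This is an illustration only: the function names are hypothetical, the `slack` margin is arbitrary (13 happens to reproduce the 320-350 window in the example), and the Message-IDs are assumed to have been collected from scan or the message files beforehand.

```python
def count_back_window(last_number, messages_in_chunk, slack=13):
    """Range of archive message numbers worth scanning.

    Counting back `messages_in_chunk` from the last message estimates
    where the chunk starts; `slack` widens the window because the
    estimate can be off (for instance, if mail arrived after packing).
    """
    estimate = last_number - messages_in_chunk + 1
    return max(1, estimate - slack), last_number


def rmm_range(archive_ids, first_kept_id):
    """rmm range deleting everything before the message whose Message-ID
    is first_kept_id, or None if nothing precedes it.

    archive_ids maps MH message number -> Message-ID.  Refuses to guess
    if the Message-ID is missing or appears more than once.
    """
    matches = [n for n, mid in archive_ids.items() if mid == first_kept_id]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}")
    keep = matches[0]
    lowest = min(archive_ids)
    if keep == lowest:
        return None
    return f"{lowest}-{keep - 1}"
```

For the example above, count_back_window(350, 18) gives (320, 350), and if the matching message is 326, rmm_range returns "1-325".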

18) Delete the unneeded files from /tmp or /mit/bitbucket:

% rm /tmp/archive.packed /tmp/xa?
  OR
% rm /mit/bitbucket/news-answers/archive.packed 
% rm /mit/bitbucket/news-answers/xa?

This is important because packf appends to an existing file rather than 
creating a new one.  If /tmp/archive.packed (or whatever file name you 
used) is still there when you pack the next archive, the next archive will 
be appended to the old one.

At this point, the archive is packed.  Use 'du .' to confirm that it has 
shrunk.

======== 2.6. Maintaining the FAQ server

The FAQ server uses the post_faq utility, with an added front-end to 
process incoming commands.  Maintenance of the FAQ server consists mainly 
of answering questions about it and handling the occasional problems that 
crop up.  The source code for the server is located in $PPDIR/faq_server, 
and the README file in that directory documents the server and the 
maintenance associated with it.

======== 2.7. Maintaining the autorepliers

======== 2.7.1. The procmail-based email filter and autoreply setup

In February 1995, a Procmail-based filter was placed upstream of the 
existing incoming-message handling and queuing programs.  In addition to 
generating automated return-receipts for all *.answers-related 
correspondence, it repairs the headers of news articles submitted from 
sites running outdated software (most importantly, restoring the correct 
From: line), and then passes the automatically-acknowledged, possibly 
repaired, messages to the existing program 
/usr/local/news.answers/bin/file-message (on penguin-lust).

In early 1997, this procmail filter was greatly expanded.  A number of 
sets of rules were added which attempt to identify certain types of "junk 
mail" and reply with appropriate messages.  Note that these filters are 
intentionally rather loose, and they are not applied to any message which 
was sent to a "-request" address (news-answers-request, faq-maintainers-
announce-request, etc.).  All incoming messages are stored in the 
correspondence archive, but messages identified as junk mail are NOT sent 
to the moderation queue.
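The routing decision described above can be summarized in a hypothetical Python sketch.  It is not the procmailrc-na rules themselves: the junk-type names are made up, and it leaves out the exceptions for moderators' own mail and for daemon-like senders.

```python
def route(recipient, junk_type):
    """Return the actions taken for one incoming message, in order.

    recipient: address the message was sent to.
    junk_type: name of the junk-mail rule that matched (hypothetical
               names), or None if no junk rule matched.
    """
    actions = ["archive"]  # every message is stored in the archive
    if recipient.split("@")[0].lower().endswith("-request"):
        junk_type = None  # junk rules are not applied to -request addresses
    if junk_type is None:
        actions += ["reply with autoreply-text", "queue"]
    else:
        # Junk mail gets a type-specific form letter and is not queued.
        actions.append(f"reply with autoreply-{junk_type}")
    return actions
```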

The files and executables used in this filtering system are all on 
penguin-lust and include:

/usr/local/news.answers/bin/procmail-news-answers-handler
   A Bourne-shell script that calls the procmail program with the incoming 
   mail stream as standard input.  It also performs post-filtering 
   processing on associated data and log files.

/usr/local/news.answers/bin/procmail
   The main filter executable for the Procmail package, which is a rugged 
   freeware mail-filtering tool that has become a de facto standard among 
   many other moderators, including Dave Lawrence of 
   news.announce.newgroups.

/usr/local/news.answers/bin/formail
   A mail-header formatting filter provided with the Procmail 
   distribution.  It is principally used to generate mail headers for 
   reply messages by changing existing headers and possibly adding new 
   ones.

/usr/local/news.answers/bin/headstrip
   A sed script that repairs the From: line and merges headers in news 
   articles submitted by sites running obsolete software (B-News, in 
   particular).

/usr/local/news.answers/bin/lockfile
   A very rugged, NFS-proof, lockfile-generating program provided with the 
   Procmail distribution.

/usr/local/news.answers/bin/autoreply
   A Perl script that sends a reply form-letter in response to incoming 
   correspondence that makes it to the queue.  It expands variable tokens 
   in the form-letter with information concerning the depth of the 
   *.answers incoming mail queue, the date of receipt, and the date of the 
   first message in the incoming mail queue (indicating how far behind we 
   are :-).

   This is a *.answers-moderation-team-originated and maintained program 
   whose RCS-controlled source may be found under:

   $NADIR/autoreply

   Changes should be made to this file first.  Someone with root 
   privileges (usually the chief moderator or one of the SIPB staff) then 
   needs to copy the updated file to /usr/local/news.answers/bin (this 
   scheme is similar to how /etc/aliases is maintained, and is motivated 
   by concerns over system security).

/usr/local/news.answers/etc/autoreply-text
/usr/local/news.answers/etc/autoreply-*
   Automatic reply messages.  "autoreply-text" contains unexpanded 
   variable tokens of the form %NAME% and is used by the autoreply script, 
   sent in response to messages which make it to the queue.  The others 
   are sent in response to various types of "junk mail".  The original, 
   RCS-controlled copies are found in

   $NADIR/autoreplies/

   and should be edited first, then copied to 

   /usr/local/news.answers/etc/.

/usr/local/news.answers/bin/file-message
   The original handling program.  It is executed by the last rule in the 
   Procmail rules file (which is always true) and places incoming messages 
   into $NADIR/Mail/inbox and $NADIR/Mail/archive.

/usr/local/news.answers/etc/procmailrc-na
   The filtering rules for the *.answers-specific E-mail handling system.  
   This too is a *.answers-moderation-team-originated and maintained file 
   whose RCS-controlled source may be found under:

   $NADIR/procmailrc-na


   For more information about the syntax of the Procmail filter rules 
   (which are based on Unix regular expressions), consult the manual pages 
   for the Procmail distribution, which may be found under:

   /usr/local/news.answers/man

   In particular, see man5/procmailrc.5 and man1/procmailex.1.

/var/spool/lopipusr/daemon-logs/procmail.log
   This contains at least the past few days' worth of incoming E-mail 
   diagnostic and log messages.  The moderators should probably check the 
   contents of this file every few days for significant errors (such as 
   filter programs crashing or not executing, mail bouncing, etc.), 
   particularly after tweaking the procmailrc-na file.  Depending on how 
   stable the overall system currently is (usually inversely proportional 
   to how recently it was changed) and how much diagnostic output is 
   desired, the VERBOSE variable in the rules file may be set or unset as 
   appropriate.
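The %NAME% token expansion done by the autoreply script can be sketched in Python.  This is not the Perl script: the token names (%QUEUE_DEPTH%, %OLDEST%) and the function names are hypothetical, and the queue depth is computed on the assumption that the queue is an ordinary MH folder, where each message is a file named by its number.

```python
import re
from pathlib import Path


def queue_depth(inbox):
    """Count messages in an MH folder: each message is a file whose name
    is just its number, so ".#" backups and sequence files are skipped."""
    return sum(1 for p in Path(inbox).iterdir()
               if p.is_file() and p.name.isdigit())


def expand_tokens(template, values):
    """Replace each %NAME% token with values[NAME].

    Tokens with no entry in `values` are left untouched, so a typo in the
    form letter shows up in the reply instead of silently vanishing.
    """
    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)

    return re.sub(r"%([A-Z_]+)%", substitute, template)
```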

The rules file contains the names of the moderators (principally so that 
moderators don't have to wade through autoreply messages to their own 
mail), so it should be updated periodically as moderators come and go.

There is at least one known bug that causes the reply message not to be 
sent to America Online addresses (aol.com and aol.net).  The reason is 
that part of Procmail's loop-detection and avoidance macros checks for 
"root" in the From: address and does not reply to messages that contain 
it.  The news server at AOL sends submissions to moderated 
newsgroups with a From: address of "root@aol.net" (contrary to the 
accepted practice of originating such messages from the "news" account).  
Since AOL submitters usually need the information in the reply letter the 
most, this is a significant problem :-).  As of the writing of this 
section (April 1995), E-mail queries to the AOL administrators regarding 
this problem have not yet been answered, nor has the From: address in AOL-
originated articles been fixed.
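The effect of that check can be shown with a rough Python approximation.  The relevant Procmail macro is presumably ^FROM_DAEMON (see procmailrc(5)), which is a much longer regular expression covering many more daemon-like names; the short list below is a hypothetical subset, and the function name is made up.

```python
import re
from email.utils import parseaddr

# Hypothetical subset of the daemon-like names Procmail refuses to answer.
DAEMON_LOCAL_PARTS = re.compile(
    r"^(root|daemon|mailer-daemon|postmaster|uucp)$", re.IGNORECASE)


def should_autoreply(from_header):
    """False if the From: address looks like a system account."""
    address = parseaddr(from_header)[1]
    local_part = address.rsplit("@", 1)[0]
    return not DAEMON_LOCAL_PARTS.match(local_part)
```

Under this approximation, a submission from "root@aol.net" gets no autoreply, which is exactly the AOL problem.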

The format of the rules file, with individual rules being essentially 
independent of one another, means that new filter rules can be added 
relatively safely without the risk of crashing procmail or losing incoming 
messages.  (Also, since the operational files are in a root-privileged 
directory hierarchy, someone qualified will have to review new rules 
before they are installed anyway.)  The rules file is clearly marked as to 
where to insert new rules.

Since the *.answers system benefited greatly from the Moderators' tools 
archive at ftp.sterling.com (maintained by the comp.sources.misc 
moderator, Kent Landfield, kent_landfield@sterling.com), it would probably 
be a good idea to contribute our own useful tools to that archive, both 
initially and as they are updated.

The system was installed and configured by Ping Huang and Paul Schleck 
(pshuang@mit.edu and pschleck@gonix.com, respectively), and later expanded 
by Pam Greene (pgreene@optics.rochester.edu).  Please feel free to contact 
any of them if you are interested in programming additional rules for the 
*.answers Procmail-based filtering system.

======== 2.7.2. The automated FAQ-checker

The automated FAQ-checker described in the *.answers submission guidelines 
uses its own address, news-answers-submit@rtfm.mit.edu, which sends 
submissions to the autochecker script.  The RCS-controlled original of 
this script lives in $NADIR/autochecker, and the local copy is in 
/usr/local/news.answers/bin/autochecker.  That script in turn uses the 
command

save-faq -submitted

(actually, it uses the local copy, in /var/spool/FAQ_archiver/save-faq).  
All incoming submissions are archived, but only ones which pass the tests 
appear in the queue.  They are given the new Subject "[checked]", to allow 
them to be easily pulled out of the queue; their original To: and Subject: 
headers are preserved as Old-To: and Old-Subject:.  Checked submissions 
should be given priority over other submissions.  Note that the FAQ-
checker is purely syntactical; it cannot check the appropriateness of the 
Subject or Archive-name, so you should make sure those are reasonable 
before giving a checked submission final approval.
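The header rewriting applied to checked submissions can be sketched with Python's standard email package.  This is an illustration, not the autochecker script: the guide does not say what the new To: header contains, so `queue_address` is a hypothetical parameter, and the function name is made up.

```python
from email import message_from_string


def mark_checked(raw, queue_address):
    """Rewrite headers as described for checked submissions.

    The original To: and Subject: are kept as Old-To: and Old-Subject:,
    and Subject: becomes "[checked]".  queue_address is a hypothetical
    stand-in for whatever the real script puts in the new To: header.
    """
    msg = message_from_string(raw)
    msg["Old-To"] = msg.get("To", "")
    msg["Old-Subject"] = msg.get("Subject", "")
    del msg["To"]        # removes every To: header (no error if absent)
    del msg["Subject"]
    msg["To"] = queue_address
    msg["Subject"] = "[checked]"
    return msg
```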

======== 3. Reference

======== 3.1. Documentation files you should be aware of

You should read and try to remember (at least in general) what's in the 
following files, in addition to the documentation files mentioned 
elsewhere in this document.  They tell you what files in corresponding 
directories are for.  If you add, delete, or drastically change the 
purpose of files in these directories, you should update these files.

$NADIR/README

 Documents many of the files in $NADIR.

$PPDIR/README
$PPDIR/src/README
$PPDIR/data/README

 Documents many of the files in $PPDIR, plus how to set up the LoPIP 
scripts.

$NADIR/README.rtfm

 This file tries to document everything that must be present on 
rtfm.mit.edu in order for all of the *.answers stuff to work.  It is 
primarily meant to serve as a guide for restoring *.answers functionality 
to rtfm.mit.edu if it crashes and no backups are available, or to serve as 
a checklist to make sure nothing was forgotten if/when rtfm.mit.edu is 
moved to a new machine.

======== 3.2. Lists supporting *.answers moderation

The following groups and mailing lists on Athena exist to support 
*.answers moderation efforts.  These groups are controlled by Athena's 
Moira database, and they can be manipulated with the "blanche" or "moira" 
program (the "update-lists" script mentioned below uses "blanche", and you 
should use that, rather than using moira or blanche directly, to update 
the faq-maintainers-announce list).  See the man pages for more 
information.

In particular, note that although Moira database updates occur in roughly 
real time (provided Moira hasn't crashed, in which case you'll see a 
timeout), the program which converts relevant information in the Moira 
database to a gigantic mail alias for the mailhubs (primarily the hosts 
mit.edu and athena-as-well.mit.edu, which is the MX target for the 
canonical name athena.mit.edu) only runs once a night.

 List: faqs

Description: Contains all Kerberos principals (i.e., Athena usernames) 
     currently doing *.answers moderation 
This is the group used to make *.answers directories accessible to all 
moderators.  It corresponds to the AFS group "system:faqs" and the UNIX 
group "faqs".  It's also the mailing list used to send mail to all the 
moderators without it being archived or queued.  (Owner: itself)

Note that although the AFS group faqs in the Athena AFS cell is updated 
automatically when the faqs group is updated in Moira, the corresponding 
faqs group (which is also owned by itself) in the SIPB AFS cell isn't 
updated automatically, and therefore needs to be updated by hand when 
someone is added to or deleted from the faqs group.

IMPORTANT NOTE: Never use the faqs@mit.edu mailing list in E-mail that is 
seen by anyone other than the maintainers of *.answers.  We don't want 
anyone else to know about it, because once people see the address they'll 
either intentionally or unintentionally send mail to it at one point or 
another in order to contact the *.answers moderators, and said mail won't 
get archived.

 List: news-answers

Description: Official submission address for *.answers.  
Contains the faqs list and the addresses news-answers-archive@rtfm.mit.edu 
and news-answers-incoming@rtfm.mit.edu, respectively for archiving and 
queueing incoming submissions and correspondence.  (Owner: faqs)

 List: news-answers-request

Description: Contains the news-answers list.  
This is the official moderator contact address for news.answers.  (Owner: 
faqs)

 List: faq-maintainers

Description: Used to be the faq-maintainers mailing list.  
The list is now maintained with Majordomo at faqs.org, so this Moira 
list is no longer in use.  For subscription help, send email to faq-
maintainers-request@faqs.org with "help" in the Subject; to contact 
the list maintainers, use owner-faq-maintainers@faqs.org .

 List: faq-maintainers-announce

Description: The faq-maintainers-announce mailing list as publicized.  
This is now a submission address which contains the list faq-maintainers-
request; messages sent to this list must be manually forwarded to the 
actual distribution, thus creating the illusion of a moderated mailing 
list.

 List: fma-actual-distribution

Description: The mailing list containing those who wish to subscribe to  
     faq-maintainers-announce.  
The contents of this list are kept in $NADIR/faq-maintainers-announce, and 
it's updated with $NADIR/update-lists as described below.  (Owner: faqs)

 List: faq-maintainers-request

Description: Contains the news-answers list.  
The former admin address for the faq-maintainers list, kept active because 
we still get requests on it now and again.  (Owner: faqs)

 List: faq-maintainers-announce-request

Description: Contains the faq-maintainers-request list.  
The admin address for the faq-maintainers-announce list.  (Owner: faqs)
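Because these lists contain one another, it can be hard to see where a message actually ends up.  The following Python sketch resolves a list name to its final addresses using the memberships described above.  The members of "faqs" are hypothetical placeholders for Kerberos principals, and the function name is made up.  It also shows why mail sent only to faqs is neither archived nor queued.

```python
# Memberships taken from the list descriptions above; the principals in
# "faqs" are hypothetical placeholders.
LISTS = {
    "faqs": ["moderator-a", "moderator-b"],
    "news-answers": ["faqs",
                     "news-answers-archive@rtfm.mit.edu",
                     "news-answers-incoming@rtfm.mit.edu"],
    "news-answers-request": ["news-answers"],
    "faq-maintainers-request": ["news-answers"],
    "faq-maintainers-announce": ["faq-maintainers-request"],
    "faq-maintainers-announce-request": ["faq-maintainers-request"],
}


def expand(name, lists, seen=None):
    """Resolve a list name to the set of final addresses it reaches.

    Members that are themselves list names are expanded recursively;
    `seen` prevents infinite recursion if lists contain each other.
    """
    if name not in lists:
        return {name}               # a plain address or principal
    seen = set() if seen is None else seen
    if name in seen:
        return set()                # already expanded in this walk
    seen.add(name)
    addresses = set()
    for member in lists[name]:
        addresses |= expand(member, lists, seen)
    return addresses
```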

The following *.answers-related mailing lists exist on rtfm.mit.edu.  They 
can be updated only by someone with root access to rtfm (if you don't have 
said access, send mail to the rest of the moderators [one of whom should 
have such access], or to rtfm-maintainers@mit.edu if it's urgent, which is 
unlikely to be the case).

 List: faq-maintainers-announce-redist

Description: A sub-list of the faq-maintainers-announce list, containing 
addresses with slashes in them, which Moira does not allow in its mailing 
lists.

 List: news-answers-incoming

Description: Feeder for the *.answers queue in $NADIR/Mail/inbox.

 List: news-answers-archive

Description: Feeder for the *.answers archive in $NADIR/Mail/archive.

 List: faq-maintainers-archive

Description: Feeder for the faq-maintainers list archive in  
     $NADIR/Mail/faq-maintainers.  
Archives list messages only, not administrative correspondence.

 List: faq-maintainers-announce-archive

Description: Feeder for the faq-maintainers-announce list archive in  
     $NADIR/Mail/faq-maintainers-announce.  
Archives list messages only, not administrative correspondence.

The lists which are described as feeders run scripts to store incoming 
email in their respective directories.  In the case of news-answers-
incoming, the scripts cause an automated reply message to be sent to all 
external, non-junk mail.  For further details about the automated reply 
mechanism, see Section 2.7.1.

======== 3.3. Jobs which run automatically each night

Rather than having cron run each job directly, the whole nightly-jobs 
system is kicked off by a single script, nightly_jobs.pl (on 
penguin-lust), which runs all the jobs for us, the mail server, and 
everybody else on rtfm.mit.edu.  The jobs 
themselves are listed in /etc/nightly_jobs.conf.  Only three of those jobs 
affect us directly; their descriptions, below, will make more sense once 
you're familiar with the rest of this guide and the moderation process.

  nightly rkive run /usr/spool/FAQ_archiver/do-rkive 
------------------------------------------------------ 
This is the one that checks the format and integrity of $PPDIR/data/list 
(the LoPIP), snags posts from the news feed, compares them to the LoPIP, 
complains about incorrect postings and stores copies in 
/usr/spool/FAQ_archiver/problems/, archives posts with no problems, and 
sends a report to us by email.

  nightly LoPIP run $PPDIR/src/nightly.csh 
-------------------------------------------- 
This locks copies of our own various periodic postings (*.answers 
guidelines, LoPIP postings, etc.) and runs post_faq on 
$PPDIR/data/faq.config to post them monthly.  It sends the weekly 
"Automatic weekly reminder" messages and, when appropriate, it sends the 
mailing list policy to all subscribers using $NADIR/mail_policy.pl.  If 
the LoPIP was posted, this script also clears 
$PPDIR/data/periodic-posting-news, the file of news items to be included 
in the LoPIP's headers.

  FAQ server posting /usr/spool/faq_server/post_faqs.pl 
--------------------------------------------------------- 
Runs the FAQ server.  Locks a copy of each posting, posts it using 
post_faq on /usr/spool/faq_server/faq.conf, and sends a report to us by 
email.

======== 4. Conclusion

If all of this makes the job of being a *.answers moderator seem 
difficult, don't worry, it isn't really.  It may be somewhat time-
consuming, with a non-trivial learning curve, but there isn't anything 
complicated about it, once you get used to it.

-- The *.answers moderation team (news-answers-request@mit.edu)

      dalamb@qucis.queensu.ca (David Alex Lamb)
      n.g.boalch@durham.ac.uk (Nick Boalch)

      jik@cam.ov.com (Jonathan I.  Kamens) [Emeritus]
      pshuang@mit.edu (Ping Huang) [Emeritus]
      pgreene@optics.rochester.edu (Pamela Greene) [Emeritus]
