From halazar Fri Jan  9 12:04:10 1987
Return-Path: davy@ee.ecn.purdue.edu
Date: Fri, 26 Dec 86 22:43:17 EST
From: davy@ee.ecn.purdue.edu (Dave Curry)
To: halazar@media-lab.media.mit.edu
Subject: webster
Status: RO


Well, I finally got a chance to do some hacking on webster.  I completely
rewrote the stuff that deals with the index file; the crap where we read
in 1.2MB bit the dust.  Now it uses dbm files and such, and takes up about
100 kbytes in memory instead of 1300.

Here's the list of things I did that I sent to people here.  If you want the
new stuff, you can get it through anonymous ftp from ee.ecn.purdue.edu.  Go
for pub/webster.tar (make sure you use image mode on the ftp).

Also, I forget if I told you, the new files from SRI-NIC have been fixed and
all the words actually seem to be there.  I strongly recommend you FTP them,
or else send me a tape and I'll put them on it for you (it takes about 4 or
5 hours to FTP the files, even late at night).

- --Dave

- --------------
Well, webster is much better now.  I'm extremely pleased with myself (and
happy, since I just happened to write code the first time that was easy to
massively change this time).

Anyway, here's a summary of changes.  Some of you may not care, but read on
anyway, what the hell.

	1. The new dictionary has been installed.  The previous copy
	   had several holes in it due to damaged files on SRI-NIC's
	   end.  The new copy does not have these holes.

	   Significance: all the words that start with "conv" are now
	   actually in the dictionary.

	2. Webster now uses your personal erase, line kill, and so on
	   characters instead of hardwiring ^H, ^X, and so on.

	   Significance: not much, but someone ragged about it today so
	   I fixed it while I was in there.

	3. Websterd no longer reads in a 1.2 megabyte index file when
	   you connect to it.  It instead uses a dbm-based index for
	   simple lookups, and uses a disk-based word search for
	   wildcard lookups.
	   
	   Significance: the first word is now defined right away
	   instead of several seconds into things; simple lookups take
	   less time in general; wildcard lookups (and Tenex word
	   completion) take slightly longer, but the time is still
	   more than acceptable; webster doesn't pound the crap out of
	   the machine when people run it.

	4. The bug in which the last word in a dictionary file (there
	   are 220 dictionary files) could not be defined, and instead
	   the response was just a blank line, has been fixed.

	   Significance: you people who have a fetish for the word
	   "fylfot" can stop bothering me now. :-)

	5. The "bug" (actually a "fix" for another bug) which caused
	   the server to constantly go away has (I think) been fixed.

	   Significance: you potato heads (bimbies, for you PUCC
	   people) who use this thing to do crossword puzzles can quit
	   calling me now. :-)

	6. The "bug" in which you can only use a numbered response once
	   instead of looking at all words (1, 2, 3, etc.) has not been
	   fixed.  It would be real hard given the design of the
	   program, and I don't think it's useful enough to bother
	   with.

	7. Thanks to Rich Kulawiec, we now have a list of how all the
	   words in the dictionary (all 59,963 of them) should be
	   hyphenated.  Our current theory is that the "syllables"
	   indicated in the definitions (and that often look wrong)
	   are actually indications of the acceptable places to split
	   a word between lines.
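
The lookup scheme in item 3 can be sketched roughly as follows.  This is a
minimal Python illustration, not the websterd source: the function names,
the "file:offset" value format, and the flat one-word-per-line list used
for wildcard scans are all assumptions made for the example.

```python
import dbm
import fnmatch
import os
import tempfile


def build_index(entries, index_path, wordlist_path):
    """Write a dbm index plus a flat, one-word-per-line list for scanning.

    entries: iterable of (word, file_number, byte_offset).  This layout is
    hypothetical; it only stands in for "where the definition lives".
    """
    with dbm.open(index_path, "n") as db, open(wordlist_path, "w") as words:
        for word, file_number, offset in entries:
            db[word.encode()] = f"{file_number}:{offset}".encode()
            words.write(word + "\n")


def lookup(index_path, word):
    """Simple lookup: one keyed fetch, nothing read into memory up front."""
    with dbm.open(index_path, "r") as db:
        raw = db.get(word.encode())
    if raw is None:
        return None
    file_number, offset = raw.decode().split(":")
    return int(file_number), int(offset)


def wildcard(wordlist_path, pattern):
    """Wildcard lookup: scan the on-disk word list line by line."""
    with open(wordlist_path) as words:
        return [w for w in map(str.strip, words)
                if fnmatch.fnmatchcase(w, pattern)]


def demo():
    """Build a three-word index in a scratch directory and query it."""
    workdir = tempfile.mkdtemp()
    index_path = os.path.join(workdir, "index")
    wordlist_path = os.path.join(workdir, "words")
    build_index(
        [("convene", 41, 0), ("convert", 41, 512), ("fylfot", 77, 9000)],
        index_path,
        wordlist_path,
    )
    return (
        lookup(index_path, "fylfot"),
        lookup(index_path, "nosuchword"),
        wildcard(wordlist_path, "conv*"),
    )
```

The keyed fetch is why the first word comes back right away, while the
wildcard path pays for a sequential scan of the word list each time.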

There are several files available from the dictionary on the ee.ecn.purdue.edu
machine in /m/webster/misc.  Included are the pronunciation guides, a list of
all the suffixes, a list of all the prefixes, the hyphenation guide, a list
of all the palindromes in the dictionary, and so on.  If you have an ee.ecn
account feel free to examine these.  If you don't, and really want the file,
send me mail.

Enjoy.  Please send any bug reports to me.

- --Dave


From mike  Fri Jan  9 14:08:35 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA10125; Fri, 9 Jan 87 14:08:35 EST
Date: Fri, 9 Jan 87 14:08:35 EST
From: Michael Hawley <mike>
Message-Id: <8701091908.AA10125@MEDIA-LAB.MIT.EDU>
To: info-dictionary
Subject: collins
Status: O

hm, well, since jrd has made this list, here's some info for you.
i have a collins english dictionary on tape (undecoded)
and here is what other people know about it:

----------------------------------------------------------------------------
>From vtisr1!irlistrq@seismo.CSS.GOV Fri Aug  8 10:55:37 1986
Received: from bellcore.ARPA by petrus.bellcore.com (4.12/4.7)
	id AA23072; Fri, 8 Aug 86 10:55:18 edt
Return-Path: <vtisr1!irlistrq>
Received: from vtisr1.UUCP by seismo.CSS.GOV with UUCP; Fri, 8 Aug 86 09:06:57 EDT
Date: Fri, 8 Aug 86 09:06:57 EDT
From: vtisr1!irlistrq@seismo.CSS.GOV
Message-Id: <8608081306.AA29234@seismo.CSS.GOV>
To: fox@seismo.CSS.GOV
Subject: IRList Digest V2 #33

IRList Digest           Thursday, 7 August 1986      Volume 2 : Issue 33

Today's Topics:
   Discussion - Machine Readable Collins Dict., Job at Leeds Univ.

----------------------------------------------------------------------


Date:     24-JUL-1986 23:09:36
From:     RAHTZ%UK.AC.OXFORD.VAX1@AC.UK
Subject: The Machine-Readable Collins English Dictionary, Job at Leeds

                The Machine-Readable Collins English
                     Summary of work in progress

                        Sebastian Rahtz
                Department of Computer Studies
                   University of Southampton

1. Introduction:
This short document summarizes the responses I had to a letter sent
out in June 1986 to all the people who have ordered a tape of Collins
English Dictionary from the Oxford Text Archive (my thanks to Lou
Burnard for the list of names and addresses). I am grateful to all
those who replied to my request for information about how they had
decoded the text, and what they were doing with it; since it was
apparent that quite a lot of work had been done, and that some were
much further on than others, it seemed sensible to send out a summary
of the replies. I have either included sections of electronic mail directly,
or summarized paper mail. ...


2. Philip Taylor, University of London
  Date:	1-JUL-1986 11:23:07
  From:	CHAA006@UK.AC.RHBNC.VAXA

I carried out some work on transliterating the dictionary from
phototypesetting codes to a more useable form some years ago, when I first received
the tape.  I had two objectives:- (1) to provide an online English-language
HELP system, using VMS help, for all entries in the dictionary, and (2) to
integrate the dictionary into the Dennison spelling checker (which also runs
on the VAX).  Neither of these projects was 100% successful, but the
intermediate results may be of some use to you.  (As part of (1), I also
implemented the core of the IPA on a Mellordate DT80/1 (VT-100 look-alike),
with reasonable success).

I should be happy to pass on all the work I have done, provided only that any
publications resulting from this work acknowledge the various contributors, and
that any further work which you carry out should be equally freely available
among the Academic community.

Philip Taylor (RHBNC, Univ. of London) [CHAA006@UK.AC.RHBNC.VAXB]

2.1 Pascal programs

Here are the more useful files from my work on the Collins English
Dictionary; they are written in Pascal and Macro-32.  The programs TYPESET,
DECRYPT and PARSE are good starting points.  TYPESET, as is, will produce quite
acceptable output even on unmodified VT100s; if you have any DT80/1s, I can
copy the IPA ROM for you, and the output will then be as close to Collins type-
set form as I was able to achieve within the time available.

If you have no DT80s, I could let you have the IPA in 8*8 dot-matrix form, and
you could burn it into ROMs for whatever devices you do have.


3. Ian Ellis, University of New England
  From:	ian%oz.neumann@oz.munnari  3-JUL-1986 03:41
  Date: Thu, 3 Jul 86 11:30:10 est

Thank you for your letter regarding CED. As yet no one on this Campus has
tried to use CED other than as a list of words. We did try to figure out some
of the symbols and produce a database, but lack of user pressure has allowed
us to put it on the back burner.

Ian Ellis,
Director,
Computer Centre,
University of New England


4. Edward Fox, Virginia Tech
  From:	vtisr1!fox@gov.css.seismo  3-JUL-1986 08:06

You have hit the jackpot!

I have worked with several students during the last year on
the Collins English Dictionary.  One completed his M.S. project
specifically on this.

We are almost done with production of a database, that can be
used from Prolog or from any relational database system, and
probably modified for other systems.  I hope to be sending a tape
to Oxford Text Archive by the end of August.

Ed Fox (BITNET[cheapest]:foxea@vtvax3 or foxea%vtvax3.bitnet@wiscvm.arpa;
  CSNET:fox@vt;Internet:fox%vtisr1.uucp@seismo.css.gov;UUCP:seismo!vtisr1!fox)
      Dr. Edward A. Fox; Dept. of Computer Science; 562 McBryde Hall
      Virginia Tech, Blacksburg VA 24061; (703) 961-5113 or 6931

   We have done everything EXCEPT for the phonetic and etymology
information - I hope you don't need them!  All I have so far is
the MS report - ...


5. David Eckersley, University of Salford
  Date:     MON, 07 JUL 86 13:57:38 GMT
  From:     D_ECK@UK.AC.SALFORD.SYSC
   University of Salford Computing Services:  Dr J B Slater, Director

I reply to your letter of June 24th concerning the Collins English Dictionary
from the Oxford Text Archive.  I'm afraid we have for the time being shelved our
plans for using this data.  The person who was to do the work left us, and I
have not taken it up.  We did not manage to attach any consistent meanings to
the embedded codes in the text.

D Eckersley
(Secretary, IUSC)


6. Eric Atwell, University of Leeds
  From: E S Atwell [eric@uk.ac.leeds.ai]
  Date: Tue, 15 Jul 86 13:26:11 bst

I'm afraid I haven't done anything of use to
you with the CED tape: I got it mainly to evaluate it and compare
it  to  the  machine-readable versions of two other dictionaries,
the Oxford Advanced Learner's Dictionary (OALD) and  the  Longman
Dictionary  of  Contemporary  English  (LDOCE).  I am researching
into aspects of parsing and grammatical analysis  of  unrestricted
`raw'  text,  for which a large non-`toy' dictionary is required.
Each  word  in  the   dictionary   needs   detailed   grammatical
information; and the grammatical codes used in OALD and LDOCE are
far more refined and detailed  than  those  of  CED,  so  I  have
concentrated  work on the other two.   In fact, LDOCE has already
been converted into a database-type  format,  and  this  form  is
available  for general (including commercial) research, though at
a price - at the Alvey workshop on linguistic theory and computer
applications  at  UMIST  last  September, a figure of pounds 30,000 was
mentioned!  As an alternative, I have a copy of  the  OALD  tape,
and  last  year  I  got  one  of  our undergraduates to attempt a
reformatting of this as a Third Year Project.  Unfortunately,  he
did  not get as far as a form worthy of general distribution, but
after graduating he stayed on here  over  the  summer  to  finish
parsing  the  original file; the end result is exemplified by the
sample at the end of this letter.

I am currently trying to get some funding from OUP to carry  this
work   further   (in  collaboration  with  Prof  Sampson  of  the
linguistics  dept.  and  Tony  Cowie  from  our  English   dept.)
However, if you are committed to using the CED, I suggest you get
in touch with the Speech research group at IBM Scientific  Centre
in  Winchester;  they  have  extracted a quarter-million wordlist
from CED I believe, with grammatical part-of-speech and  phonetic
transcription  codes  (but  with  other  fields ignored); the CED
phonetic transcriptions are, they say, better than those of  OALD
or LDOCE, which is why they are 'out on a limb' in the sense that
most other researchers I know of are using OALD or LDOCE.


 Eric Steven Atwell                  Artificial Intelligence Group
                                    Department of Computer Studies
 phone: +44 532 431751 ext 6307/6119              Leeds University
 JANET: eric@uk.ac.leeds.ai                          Leeds LS2 9JT
 UUCP:  ...!seismo!mcvax!ai.leeds.ac.uk!eric               England
 EARN/BITNET/ARPA: eric%uk.ac.leeds.ai@rl.earn

EXAMPLE OF PARSED REFORMATTED OALD FILE:
headword			    :B
alternative spelling of headword    :b
pronunciation			    :bi
+++++++start of pieces+++++++
conjugation or plural label	    :pl
conjugation or plural spelling	    :B's
conjugation or plural spelling	    :b's
pronunciation			    :biz
__________definition__________
text				    :the second letter of the English alphabet.
**********end of entry**********
headword			    :baa
pronunciation			    :bq
+++++++start of pieces+++++++
word class label		    :n
__________definition__________
text				    :cry of a sheep or lamb.
***change in part of speech***
word class label		    :vi
text				    :(baaing, baaed or baa'd /bqd/) make this
                                      cry; bleat.
====subentry====
derivative			    :%@-lamb
word class label		    :n
---subentry definition---
text				    :child's word for a sheep or lamb.
**********end of entry**********
headword			    :baas
pronunciation			    :bqs
+++++++start of pieces+++++++
word class label		    :n
__________definition__________
text				    :(S Africa) boss.
**********end of entry**********
headword			    :babble
pronunciation			    :%babl
+++++++start of pieces+++++++
word class label		    :vi
word class label		    :vt
__________definition__________
verb pattern			    :2A
verb pattern			    :2B
verb pattern			    :2C
text				    :talk in a way that is difficult to
                                     understand; make sounds like a b
__________definition__________
verb pattern			    :6A
verb pattern			    :15B
====subentry====
idiom				    :@ (out)
text				    :, repeat foolishly; tell (a secret):
                                     @ (out) nonsense/secrets.
***change in part of speech***
word class label		    :n
nountype			    :U
text				    :childish or foolish talk; confused talk
                                     not clearly to be understoo
__________definition__________
text				    :gentle sound of water flowing over
                                     stones, etc.
====subentry====
derivative			    :bab.bler
pronunciation			    :%bablE(r)
word class label		    :n
---subentry definition---
text				    :person who @s, esp one who tells secrets.
**********end of entry**********
headword			    :babe
pronunciation			    :beIb
+++++++start of pieces+++++++
word class label		    :n
__________definition__________
text				    :(liter) baby.
__________definition__________
text				    :inexperienced and easily deceived person.
__________definition__________
text				    :(US sl) girl or young woman.
**********end of entry**********
headword			    :babel
pronunciation			    :%beIbl
+++++++start of pieces+++++++
word class label		    :n
__________definition__________
text				    :the Tower of B@, tower built to reach
                                     heaven. (Gen 11).
__________definition__________
text				    :(sing with indef art) scene of noisy and
                                     confused talking: What a @
**********end of entry**********
headword			    :ba.boo
alternative spelling of headword    :babu
pronunciation			    :%bqbu
+++++++start of pieces+++++++
word class label		    :n
__________definition__________
text				    :(as Hindu title) Mr; Hindu gentleman;
                                     Hindu clerk; (old use, pej) H
**********end of entry**********
headword			    :ba.boon
pronunciation			    :bE%bun
US pronunciation		    :ba-
+++++++start of pieces+++++++
word class label		    :n
__________definition__________
text				    :large monkey (of Africa and southern Asia)
                                     with a dog-like face.
cross reference		    	    :the illus at ape
**********end of entry**********
headword			    :baby
pronunciation			    :%beIbI
+++++++start of pieces+++++++
word class label		    :n
conjugation or plural label	    :pl
conjugation or plural spelling	    :-bies
__________definition__________
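
The reformatted layout above is regular enough to read back mechanically.
The following is a minimal Python sketch under stated assumptions: that
field lines have the form "name<whitespace>:value", that every separator
line starts with one of the characters + _ * = -, and that indented lines
continue the previous value.  These rules are inferred from the sample
only; the name parse_entries and the ("marker", text) convention are
invented for the illustration.

```python
END_OF_ENTRY = "**********end of entry**********"
MARKER_CHARS = "+_*=-"  # every separator line in the sample starts with one


def parse_entries(lines):
    """Group lines into entries; each entry is a list of (field, value) pairs.

    Separator lines are kept as ("marker", text) so that definition and
    subentry boundaries survive the parse.
    """
    entries, current = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        stripped = line.strip()
        if not stripped:
            continue
        if stripped == END_OF_ENTRY:
            entries.append(current)
            current = []
        elif line[0].isspace() and current and current[-1][0] != "marker":
            # Indented line: continuation of the previous field's value.
            field, value = current[-1]
            current[-1] = (field, value + " " + stripped)
        elif stripped[0] in MARKER_CHARS:
            current.append(("marker", stripped.strip(MARKER_CHARS)))
        else:
            # Split on the first colon only; values may contain colons.
            field, _, value = line.partition(":")
            current.append((field.strip(), value))
    if current:
        entries.append(current)  # trailing entry with no end marker
    return entries
```

Keeping the separators as markers, rather than nesting on them, leaves the
choice of how to group definitions and subentries to the caller.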

6.1 Further remarks
It will be interesting to see what others are doing with CED  and
other  dictionary  tapes,  so  please do circulate your findings.
You may like to join Euralex, the European association for
lexicography,  and  find other related work through their bulletin (I
assume you are not already a member as your name did  not  appear
on  the  recent  membership list).  For details contact RRK
Hartmann, Language Centre, Exeter  University,  Exeter  EX4  4QH  (no
JANET address that I know of!)

I would also like to hear how your 3rd year project student  gets
on.   TEFL  students  might  prefer  a ``browser aid" for LDOCE or
OALD,  as  these  are  specifically  designed  for  2nd   language
learners;  in  my previous job at Lancaster University, I wrote a
browser aid for the LDOCE which ELT MA students could  use.   The
speaking  CED  sounds  a  great idea.  A major problem with
`off-the-shelf' speech synthesisers is that they have no way  of
producing  varied ``listenable" intonation contours for sentences and
longer texts; but this problem is neatly sidestepped in a talking
dictionary,  as most fields (keyword, part of speech, spelling) do
not require smooth continuous speech, and the  definition  fields
tend  to  be  short  sentences or sentence-fragments where a very
simple intonation contour would be quite acceptable to the  user.
Even  so, as you suggest, it is still quite ambitious for a third
year project!

6.2 an interesting job
  From:	E S Atwell [eric@uk.ac.leeds.ai] 22-JUL-1986 16:10
  Subj:	vacancy for NLP/AI/OR Software Engineer

I am collaborating with Professor Sampson on a Parsing research project, and
we have just had the go-ahead to advertise for a software engineer to work
with us on the project.  I would be most grateful if you could bring the
following details to the attention of any potential candidates you know of.

********* UNIVERSITY OF LEEDS ****** ANNEALING PARSER PROJECT *********
Applications are invited for a post of SOFTWARE ENGINEER, to work on a project
developing a parser for unrestricted English using the connexionist technique
of simulated annealing.  The project (funded by the Joint Speech Research
Unit) is supervised by Prof. Geoffrey Sampson of the Linguistics & Phonetics
Department (where the post will be tenable) and Eric Atwell of the Computer
Studies Department.  The person appointed will be working on a SUN-3/52M
Workstation dedicated to his/her use.  Candidates should have a good honours
degree; experience with natural language analysis, and of programming in a
Unix environment, will be advantages.

The post is available from 1 October 1986 for a fixed term of up to 3 years.
Starting salary will be within the range 8020 to 9495 pounds (under rev
Other-Related IA Grade, according to age, qualifications, and experience.

Informal enquiries may be made to Prof. Sampson on (0532) 431751 ext.6252;
or by electronic mail to Eric Atwell, eric@Leeds.AI via JANET or
eric%UK.ac.Leeds.AI@RL.EARN via EARN or BITNET.  For application forms and
further particulars write to the Registrar, The University, Leeds LS2 9JT,
quoting reference no. 14/20.
****** The closing date for applications is 14 AUGUST 1986 ******

Leeds University is one of the largest and most influential universities in
the country.  Leeds itself is the commercial, social and sporting centre for
much of North and West Yorkshire; it has all the facilities you would expect
of a major city, yet the outskirts of Leeds lead directly out onto 2,00 square
miles of outstandingly beautiful countryside.  Leeds also offers some of the
cheapest housing in England; for example, pounds 15,000 buys a two-bedroomed
se or a larger terraced house.

Simulated Annealing, a technique originating in statistical mechanics, can be
used in operational research and artificial intelligence in optimisation
problems requiring an efficient search of a very large search space.  We plan
to apply this technique to parsing unrestricted English, where the search
space is a set of trees.  The appointee will find a stimulating research
environment at Leeds: the University is a thriving centre for research in
Computer Analysis of Language and Speech, Artificial Intelligence, Operational
Research and related areas.  In addition to her/his dedicated workstation, the
appointee will have access to a wide range of equipment and software,
including specialist Departmental libraries, a VAX 11/750 dedicated to
Artificial Intelligence research, and a spacious SUN LOUNGE with a network of
Suns, fileserver, laserprinter, and large south-facing windows.
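
For readers unfamiliar with the technique, a generic simulated annealing
loop can be sketched as below.  This is a minimal Python illustration and
not the Leeds parser: the function name, the geometric cooling schedule,
and all parameter defaults are assumptions.  In the parsing application
the state would be a parse tree, which is not attempted here.

```python
import math
import random


def anneal(state, energy, neighbour, t_start=10.0, t_end=1e-3,
           cooling=0.95, steps_per_t=100, rng=None):
    """Minimise energy(state) by simulated annealing.

    neighbour(state, rng) must return a candidate state near the current
    one.  Worse candidates are accepted with probability exp(-delta / t),
    so the search can escape local minima while the temperature is high.
    """
    rng = rng or random.Random()
    current, current_e = state, energy(state)
    best, best_e = current, current_e
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            candidate = neighbour(current, rng)
            candidate_e = energy(candidate)
            delta = candidate_e - current_e
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_e = candidate, candidate_e
                if current_e < best_e:
                    best, best_e = current, current_e
        t *= cooling  # geometric cooling schedule (an assumption)
    return best, best_e
```

A toy call such as anneal(50, lambda x: (x - 3) ** 2, lambda x, rng: x +
rng.choice((-1, 1))) walks an integer towards 3; only the state, the
energy function, and the neighbour move change from problem to problem.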


7. Ron Hardie, Brighton Polytechnic
[summary of letter] has only just started thinking about CED;
interested to hear what others are doing.

Ron Hardie, Department of Modern Languages, Brighton Polytechnic,
Brighton BN1 9PH


8. Herbert Wenzel, Erlangen
[summary of letter] writing text retrieval system for PC, now
integrating dictionary, but has problems physically reading tape (sent
suggestion).

Professor H. Wenzel, Institut für Technische Chemie II,
Egerlandstr. 3, Erlangen, W. Germany.


9. Roger Mitton, University of London
[summary of letter] looked at Collins dictionary but finds the Oxford
Advanced Learner's more useful. Has produced a database from the
OALDCE as part of research into spelling checking, which is available
from the Oxford Text Archive.

Roger Mitton,
Dept Computer Science, Birkbeck College, Malet Street, London WC1E 7HX

------------------------------

END OF IRList Digest
********************


>From vtisr1!irlistrq@seismo.CSS.GOV Fri Aug  8 15:51:43 1986
Received: from bellcore.ARPA by petrus.bellcore.com (4.12/4.7)
	id AA28751; Fri, 8 Aug 86 15:51:34 edt
Return-Path: <vtisr1!irlistrq>
Received: from vtisr1.UUCP by seismo.CSS.GOV with UUCP; Fri, 8 Aug 86 14:28:54 EDT
Date: Fri, 8 Aug 86 14:28:54 EDT
From: vtisr1!irlistrq@seismo.CSS.GOV
Message-Id: <8608081828.AA11935@seismo.CSS.GOV>
To: fox@seismo.CSS.GOV
Subject: IRList Digest V2 #34

IRList Digest           Friday, 8 August 1986      Volume 2 : Issue 34

Today's Topics:
   Discussion - Differences between document files (diff -b)
   Announcement - Advance Program of ACM SIGIR 1986 Int'l Conf. (Pisa)
   Call for Papers - ACM SIGIR 1987 Int'l Conf. on R&D in IR
   COGSCI - Knowledge Bases as Qualitative Models 

----------------------------------------------------------------------

From: seismo!hplabs!pesnta!lsuc!dave
Date: Sun, 3 Aug 86 07:34:50 pdt
Subject: IRList Digest V2 #32 [Note: see issues 24, 31 too - Ed]

Re: significant differences
A rather trivial point, but the UNIX "diff" command has
a "-b" option which causes it to ignore differences which
are only in the blanks and tabs (whitespace).
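
The same idea is easy to apply outside diff itself.  Here is a minimal
Python sketch, assuming that "ignore blanks" means dropping trailing
whitespace and treating any run of blanks and tabs as one space before
comparing; the names squeeze and diff_b are invented for the example and
are not part of any standard tool.

```python
import difflib
import re


def squeeze(line):
    """Drop trailing whitespace and collapse runs of blanks/tabs to one space."""
    return re.sub(r"[ \t]+", " ", line.rstrip())


def diff_b(old_lines, new_lines):
    """Return only the lines that differ once whitespace is normalised."""
    old = [squeeze(line) for line in old_lines]
    new = [squeeze(line) for line in new_lines]
    return [d for d in difflib.ndiff(old, new) if d[:2] in ("- ", "+ ")]
```

Note that a blank inserted where there was none at all (for example "ab"
against "a b") still counts as a difference under this rule.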

Dave Sherman
The Law Society of Upper Canada
dave@lsuc.UUCP

------------------------------

Date: Fri, 8 Aug 86 13:43:23 EDT
From: fox
Subject: ACM SIGIR-86 Conference in Pisa, Italy

[Note: the following was typed based on program in upcoming issue of
 ACM SIGIR Forum. Await receipt of that for more details, or contact
 G. Salton <gs@cornell.arpa> with questions. - Ed]


                           ADVANCE PROGRAM

          1986 -- ACM Conference on Research and Development in
                         Information Retrieval

           Palazzo dei Congressi, Via Matteotti, 1, Pisa ITALY
                        September 8-10, 1986

    Sponsored by Italian National Research Council in cooperation with
          ACM SIGIR    AICA-GLIR    BCS-IRSG    IDI     ESA-IRS


SUNDAY Sept. 7, 1986
 16:00 - 21:00 Conference Registration (18-19:00 welcoming drink)

MONDAY Sept. 8, 1986
 8:00 - 9:00   Conference Registration
 9:00 - 9:30   Opening Session
 9:30 - 10:30  Keynote Speech - Recent trends in automatic IR (G. Salton)
 10:30 -11:00  Chairman: F. Rabitti
             Using structural representation of anomalous states of knowledge 
             for choosing doc. retrieval strategies (N.J. Belkin, B.H. Kwasnik)
 11:30-13:00   OFFICE SYSTEMS - Chairman F. Rabitti
             Doc. presentations and query formul. in Muse (Gibbs,Tsichritzis)
             Approach to multimedia inf. mgmt. (Gallelli,Iacobelli,Marchisio)
             Method. issues for the design of an office information server
                (Truckenmuller,Rathgeb)            
 14:30-16:00   USER INTERFACES - Chairman W.B. Croft
             IR, NLP, AI and UFOS: or IR-relevance, Natural Language Problems, 
                Artful Intelligence and User-Friendly Online Systems (Doszkocs)
             Visual display of info. in an IR environment (D. Crouch)
             Improved subject access, browsing and scanning mechanisms
                in modern on-line IR (Ingwersen, Wormell)
 16:30-18:00   STORAGE STRUCTURES - Chairman P. Willett
             S-Tree: Dynamic balanced signature index for office ret. (Deppisch)
             Improved hierarchical bit-vector compression in doc. ret.
                systems (Fraenkel, Klein, Choueka, Segal)

TUESDAY Sept. 9, 1986
 8:30 - 10:00   LINGUISTIC RETRIEVAL - Chairman Y. Chiaramella
             Incorporating syntactic information into a doc. ret.
                 strategy: An investigation (Smeaton)
             CALIN: A user interface based on a simple natural
                 language (Bosc, Courant, Robin)
             Solving grammatical ambiguities within a surface
                 syntactical parser for automatic indexing (Berrut, Palmer)
 10:30-12:00    INFORMATION RETRIEVAL SYSTEMS - Chairman D. Kraft
             A design of a distributed full text retrieval system
                 (Macleod, Martin, Nordin)
             REALIST: Retrieval aids by linguistics and statistics (Thurmair)
             COREL: A conceptual ret. system (DiBenigno,Cross,DeBessonet)
 13:30-14:45    CLUSTERING - Chairman P. Bollman
             Hierarchical doc. classification using Ward's clustering method
                 (El-Hamdouchi, Willett)
             User-oriented doc. clustering: A framework for learning
                 in IR (Raghavan, Deogun)
             The efficiency of inverted index and cluster searches (Voorhees)
 15:10-16:00    RETRIEVAL STRATEGIES - Chairman M. Agosti
             On extending the vector space model for Boolean query
                 processing (S.K.M. Wong, Ziarko, Raghavan, P.C.N. Wong)
             An experimental study of factors important in doc.
                 ranking (D. Williamson)

WEDNESDAY Sept. 10, 1986
 9:00-10:30    KNOWLEDGE BASED IR (I) - Chairman C.J. van Rijsbergen
             Invited paper - A new theoretical framework for IR 
                (C.J. van Rijsbergen)
             User-specified domain knowledge for doc. ret. (Croft)
 11:00-12:30   KNOWLEDGE BASED IR (II) - Chairman C.J. van Rijsbergen
             IOTA: A full text IR system (Chiaramella,Defude,Bruandet,Kerkouba)
             An IR system based on AI techniques (DeJaco,Garbolino)
             The use of inference mechanisms to improve the retrieval
                facilities from large relational databases (Zarri)
 14:00-15:30   LEARNING SYSTEMS - Chairman G. Salton
             A machine learning approach to IR (S.K.M. Wong, W. Ziarko)
             An automatic and tunable doc. indexing system (Ozkarahan,Can)
             Performance of self-taught documents (Bookstein)
 15:50-18:00   PROBABILISTIC RETRIEVAL - Chairman A. Bookstein
             Two models of retrieval with prob. indexing (Fuhr)
             Two Poisson and binary indep. assumptions for prob. doc.
                retrieval (Losee, Bookstein, Yu)
             Non-binary independence model (Yu, Lee)
             The maximum entropy principle in IR (Kantor, Lee)
             An interpretation of index term weighting schemes based
                on doc. components (Kwok)

THURSDAY Sept. 11, 1986
  The Special Interest Group in Information Retrieval (GLIR) of the
  Italian Computing Society (AICA) is organizing a Tutorial Day on
  FUTURE DIRECTIONS IN IR
 9:00-10:45    Design of automatic retrieval systems (G. Salton)
 11:00-12:30   Future directions in IR: theory   (C.J. van Rijsbergen)
 14:00-15:30   Future directions in IR: practice (C.J. van Rijsbergen)
 15:45-17:15   Technological trends in IR hardware (T. Doszkocs)
 17:15         Concluding remarks

------------------------------

Date: Tue, 29 Jul 86 11:06:46 cdt
From: Don <kraft%lsu.csnet@csnet-relay.arpa>
Subject: Re:  1987 conference [Reformatted for CRT - Ed]
     
              Association for Computing Machinery (ACM)
        Special Interest Group on Information Retrieval (SIGIR)

                 1987 International Conference on 
           Research and Development in Information Retrieval

                          June 3-5, 1987
              Monteleone Hotel (in the French Quarter)
                   New Orleans, Louisiana, USA

                         CALL FOR PAPERS

Papers are invited on theory, methodology, and applications of information
retrieval.  Emerging areas related to information retrieval, such as office
automation, computer hardware technology, and artificial intelligence and
natural language processing are welcome.

Topics include, but are not limited to:

retrieval system modelling                     user interfaces    
retrieval in office environments               hardware development
natural language processing                    mathematical models
retrieval system performance                   linguistic models    
system development and evaluation              multimedia retrieval    
storage and search techniques                  complexity problems
cognitive and semantic models                  knowledge representation
information retrieval and database management

Submitted papers can be either full length papers of approximately twenty
to twenty-five pages or extended abstracts of no more than ten pages.  All
papers should contain the authors' contributions in comparison to existing
solutions to the same or to similar problems.

Important Dates:
Submission Deadline        December 15, 1986
Acceptance Notification    February 15, 1987
Final Copy Due             March 20, 1987
Conference                 June 3-5, 1987

Four copies of each paper should be submitted.  Papers submitted from
North America can be sent to Clement T. Yu; submissions from outside North
America should be sent to C. J. "Keith" van Rijsbergen.

Conference Chairman    Treasurer               Publicity Chairman
Donald H. Kraft        Bert R. Boyce           Vijay Raghavan
Department of          School of Library and   Department of
  Computer Science       Information Science     Computer Science
Louisiana State Univ.  Louisiana State Univ.   Univ. of Regina
Baton Rouge, LA 70803  Baton Rouge, LA  70803  Regina, Saskatchewan Canada
(504) 388-1495         (504) 388-3158                  and 
                                               Center for Advanced Studies
                                               Univ. of Southwestern Louisiana
                                               P.O. Box 44330
                                               Lafayette, LA  70504


Arrangements Chairman  Technical Program Co-Chair  Technical Program Co-Chair
Michael C. Stinson     Clement T. Yu               C. J. "Keith" van Rijsbergen
Department of          Department of               Department of
  Computer Science       Elect. Engineering          Computer Science
Louisiana State Univ.    and Computer Science      University of Glasgow
Baton Rouge, LA 70803  Univ. of Illinois, Chicago  Lilybank Gardens,
                                                   Glasgow G12 8QQ
(504) 388-1495         Chicago, IL  60680          Scotland
                       (312) 996-2318              (041) 339-8855


Technical Program Committee Members:
Abraham Bookstein (USA)                 Nick Cercone (Canada)    
Stavros Christodoulakis (Canada)        Yves Chiaramella (France)    
Martha Evens (USA)                      Aviezri Fraenkel (Israel)
Jochum Friedbert (Germany)              Richard Frost (Scotland)    
Tetsuro Ito (Japan)                     W. S. Luk (Canada)    
Michael McGill (USA)                    Esen Ozkarahan (USA)
Fausto Rabitti (Italy)                  Gerard Salton (USA)    
Peter Scheuermann (USA)                 C. J. "Keith" van Rijsbergen (Scotland)
Michael Wong (Canada)                   Clement T.  Yu (USA)

------------------------------

Date: Fri, 25 Jul 86 18:57:49 edt
From: DEJONG%OZ.AI.MIT.EDU@MC.LCS.MIT.EDU
Subject: Cognitive Science Calendar [Extract - Ed]

    Date: Friday, 18 July 1986  10:09-EDT
    from: BGOODMAN@G.BBN.COM
    Subject: Seminar on Knowledge Bases

              BBN Laboratories Inc.
            Science Development Program
              AI/Education Seminar
             Friday, 1 August   10:30am

  From Guidon to Neomycin and Heracles--Viewing               
       Knowledge Bases as Qualitative Models 

          Dr. William J. Clancey 
          Stanford Knowledge Systems Laboratory 
          Computer Science Department 
          701 Welch Road, Bldg C 
          Palo Alto, CA 94304 

Beginning with early attempts to improve MYCIN's representation of
knowledge for use in teaching, we have followed the approach of
decomposing knowledge from how it is used, abstracting knowledge
structures and reasoning procedures, and formulating an increasingly more
general understanding of what knowledge engineering and knowledge bases
are all about.  In NEOMYCIN, medical knowledge and diagnostic procedure
are separately represented in well-structured languages to facilitate
explanation and student modeling.  In HERACLES, this knowledge base is
viewed as a classification model of some physical, cognitive, or social
system that is heuristically related to some design, modification,
prediction, or control action.  That is, we view the knowledge base as a
qualitative model of some system in the world, designed with practical
engineering value in mind.  From this perspective, the "diagnostic
strategy" of Neomycin is a general inference procedure that describes
memory activation and search for constructing a situation-specific model. 

This talk will review the development of NEOMYCIN from GUIDON and
summarize the generalizations that we are now exploiting in our 
development of the HERACLES shell and GUIDON2 teaching programs. 

------------------------------

END OF IRList Digest
********************


From jrd  Fri Jan 16 20:30:40 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA10942; Fri, 16 Jan 87 20:30:40 EST
Date: Fri, 16 Jan 87 20:30:40 EST
From: Jim Davis <jrd>
Message-Id: <8701170130.AA10942@MEDIA-LAB.MIT.EDU>
To: info-dictionary
Subject: Names for protocols
Status: O

It seems that there are two *different* dictionary protocols in
common use in the world today.  OZ has a server for the WEBSTER
protocol, which accepts a word and returns a slightly parsed
copy of the info from the Webster database.  What Mike Halle
recently implemented is not WEBSTER but rather DICTIONARY.

The DICTIONARY protocol is much richer, supporting such things
as wild cards, spell correction, completion.  It also requires
more complicated client code.

I just figured this out a few minutes ago.  Let us all be consistent
henceforth.


From jrd  Fri Jan 16 21:08:06 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA11364; Fri, 16 Jan 87 21:08:06 EST
Date: Fri, 16 Jan 87 21:08:06 EST
From: Jim Davis <jrd>
Message-Id: <8701170208.AA11364@MEDIA-LAB.MIT.EDU>
To: info-dictionary
Subject: Faster dictionary installed
Status: O

The diligent and clever Mr. Halle has installed
the new improved fast version of webster.  Hurrah.

From jrd  Fri Jan 16 21:15:24 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA11464; Fri, 16 Jan 87 21:15:24 EST
Date: Fri, 16 Jan 87 21:15:24 EST
From: Jim Davis <jrd>
Message-Id: <8701170215.AA11464@MEDIA-LAB.MIT.EDU>
To: info-dictionary
Subject: Suggestions
Status: RO

Please add these to the list of suggestions for the DICTIONARY server.
(Maintained by G Cathey.)


1) access based on pronunciation.  I send it a pronunciation string
and it sends me back a list of all words that are pronounced that way,
in the same manner that wild cards do.  This would be valuable for
the phonetic dictionary.

2) Same as 1, but allows me to (somehow) specify a phonetic similarity
predicate, a function of two pronunciations.  This would, for example, allow
me to specify that all vowels are to be considered alike.  This would
be very helpful for the PD, even if the only option available were
to ignore vowel differences.

3) At present, if I use wild cards (on the Vax Unix client), I get back a
numbered list of possibilities, and I can then type in a number to get
that definition.  But I can only do this once.  It would be nice to
be able to do this many times.  This is not very important.  I do not
know whether this limitation is in the Client or the Server.

4) Wildcards at the beginning of a word are very slow.  It would be nice
to speed this up.  It is not very important.
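
A rough sketch of the similarity predicate from suggestion 2, in Python.
It assumes pronunciations are lists of ARPABET-style phoneme symbols,
which is not how the Webster database codes them; the vowel set and both
function names are illustrative, not part of any existing server.

```python
# Hypothetical vowel symbols (ARPABET-style); an assumption for illustration.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}

def same_ignoring_vowels(pron_a, pron_b):
    """Phonetic similarity predicate: all vowels are considered alike."""
    def collapse(pron):
        # Replace each vowel by one placeholder so only its position counts.
        return ["<vowel>" if p in VOWELS else p for p in pron]
    return collapse(pron_a) == collapse(pron_b)

def words_pronounced_like(target, lexicon, similar=same_ignoring_vowels):
    """Return every word whose pronunciation satisfies the predicate."""
    return sorted(w for w, pron in lexicon.items() if similar(pron, target))
```

Passing the predicate as an argument keeps suggestion 1 (exact match) and
suggestion 2 (similarity) as the same lookup with a different function.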


From jrd  Sat Jan 17 03:26:24 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA18173; Sat, 17 Jan 87 03:26:24 EST
Date: Sat, 17 Jan 87 03:26:24 EST
From: Jim Davis <jrd>
Message-Id: <8701170826.AA18173@MEDIA-LAB.MIT.EDU>
To: info-dictionary
Subject: It WORKS!
Status: RO

I now have a Lisp Machine Client program that is able to talk to
the DICTIONARY protocol on the media lab.  It is much faster than
using the WEBSTER protocol via CHAOS to OZ.  It is very much
faster than the method the phonetic dictionary uses now.  Hurrah
for Mike Halle.  But...there seem to be some bugs in the server,
and I hope Mr. Cathey will soon fix them.

From jrd  Sat Jan 17 03:33:16 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA18249; Sat, 17 Jan 87 03:33:16 EST
Date: Sat, 17 Jan 87 03:33:16 EST
From: Jim Davis <jrd>
Message-Id: <8701170833.AA18249@MEDIA-LAB.MIT.EDU>
To: bug-dictionary
Subject: Some bugs in server
Status: RO

I think it is not correctly parsing some of the fields, resulting
in the following bizarre output:
Apparent bugs in the DICTIONARY server.

BUG: ] comes before end of etymology in 1, no closing ] in 3.  What is
that X doing there?

1. ball \'bo.l\ n [ME bal, fr. ON bo:llr; akin to OE bealluc testis, OHG 
   ba]lla ball, OE bula bull 1: a round or roundish body or mass : as 1a: a 
   spherical or ovoid body used in a game or sport 1b: EARTH, GLOBE 1c: a 
   spherical or conical projectile; also : projectiles used in firearms 1d: a 
   roundish protuberant anatomic structure; esp : the rounded eminence at the 
   base of the thumb or great toe 2: a game in which a ball is thrown, kicked, 
   or struck; esp : BASEBALL 3a: the delivery of the ball {curve ~} 3b: a 
   pitched baseball not struck at by the batter that fails to pass through the 
   strike zone
2. ball vb : to form or gather into a ball - ball.er n
3. ball n [F bal, fr. OF, fr. baller to dance, fr. LL ballare, fr. Gk 
   (Xballizein; akin to Skt balbali-ti he whirls 1: a large formal gathering 
   for social dancing 2: a good time : PICNIC

BUG - what is that capitalized Dog doing in def 3?

1. dog \'do.g\ \'do.-.gli-k\ n [ME, fr. OE docga] often attrib  1a: a 
   highly variable carnivorous domesticated mammal (Canis familiaris) prob. 
   descended from the common wolf; broadly : any animal of the dog family 
   (Canidae) to which this mammal belongs 1b: a male dog 2a: a worthless 
   fellow : 2b: CHAP, FELLOW {a gay ~} 3a: any of various usu. simple 
   mechanical devices for holding, gripping, or fastening consisting of a 
   spike, rod, or bar 3b: ANDIRON 4a: SUN DOG 4b: WATER DOG 4c: FOGBOW 5: 
   affected stylishness or dignity cap  6: either of the constellations Canis 
   Major or Canis Minor pl, slang  7: FEET slang  8: something inferior of its 
   kind pl  9: RUIN {go to the ~s} cap  10: any of various American Indian 
   peoples - dog.like aj
2. dog vt or dogged;  or dog.ging 1a: to hunt or track like a hound 1b: to 
   worry as if by dogs : HOUND 2: to fasten with a dog
3. dog av : EXTREMELY, UTTERLY {dog-tired}Dog \'do.g\  : - a communications 
   code word for the letter d

Maybe not a bug, but why are there two pronunciations here?
Note also misplaced ]
No number 1 beginning first def.  Maybe that's because there's only one def,
but then why was the second included?

ro.man \ro--'ma:n\ n [ME, fr. OF romans romance] : a metrical romance1. 
   Ro.man \'ro--m*n\ n [partly fr. ME, fr. OE, fr. L Romanus, adj. & n., fr. 
   Roma Rome; pa]rtly fr. ME Romain, fr. OF, fr. L Romanus 1: a native or 
   resident of Rome 2: ROMAN CATHOLIC - often taken to be offensive not cap  
   3: roman letters or type
2. Roman aj 1: of or relating to Rome or the people of Rome; specif : 
   characteristic of the ancient Romans {~ fortitude} 2: LATIN not cap  3: 
   UPRIGHT - used of numbers and letters whose capital forms are modeled on 
   ancient Roman inscriptions 4: of or relating to the see of Rome or the 
   Roman Catholic Church 5: having a semicircular intrados {~ arch} 6: having 
   a prominent slightly aquiline bridge {~ nose}

BUG has no number.  Maybe that's because it's solitary.

dog.house \'do.g-.hau.s\ n : a shelter for a dog : in a state of disfavor - 
   in the doghouse


From jrd  Sat Jan 17 03:37:33 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA18300; Sat, 17 Jan 87 03:37:33 EST
Date: Sat, 17 Jan 87 03:37:33 EST
From: Jim Davis <jrd>
Message-Id: <8701170837.AA18300@MEDIA-LAB.MIT.EDU>
To: info-dictionary
Subject: Protocol is not specific
Status: RO

The protocol does not specify the format of the ASCII data returned in a
definition.  It should.  If programs are going to parse this output they
need to know what to expect.  For instance, that the last character sent is a
128 decimal (NULL).

On the other hand, if we decide to implement the RAW mode (where one gets
just the raw data direct from the dictionary) then THAT would be the thing to
document, and it already is.  So perhaps we ought to just claim that the
info returned is human readable, and not commit to anything more than that.


From halazar  Sat Jan 17 12:18:43 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA23474; Sat, 17 Jan 87 12:18:43 EST
Message-Id: <8701171718.AA23474@MEDIA-LAB.MIT.EDU>
To: jrd
Cc: info-dictionary
Subject: Re: Some bugs in server
In-Reply-To: Your message of Sat, 17 Jan 87 03:33:16 EST.
             <8701170833.AA18249@MEDIA-LAB.MIT.EDU>
Date: Sat, 17 Jan 87 12:18:41 -0500
From: halazar
Status: RO


It seems unclear to me that we aren't just seeing database bugs, especially
since we KNOW that some of our files are garbled.  Get the files off
sri-nic, which have reportedly been fixed, then see.

						-Mike

From mike  Sat Jan 17 15:40:44 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA26600; Sat, 17 Jan 87 15:40:44 EST
Date: Sat, 17 Jan 87 15:40:44 EST
From: Michael Hawley <mike>
Message-Id: <8701172040.AA26600@MEDIA-LAB.MIT.EDU>
To: info-dictionary
Subject: dog
Status: RO

I wrote my webster program in about an hour, spending about half
the time getting it to pretty print the definitions in a way I like:
-------
dog ('d{o.}g)
   i[often attrib]
Etymology: ME, fr. OE i[docga]
1) a) n, a highly variable carnivorous domesticated mammal (i[Canis
      familiaris]) prob. descended from the common wolf; i[broadly]:
      any animal of the dog family (Canidae) to which this mammal belongs
   b) n, a male dog
2) a) n, a worthless fellow:
   b) n, CHAP, FELLOW <a gay ~>
3) a) n, any of various usu. simple mechanical devices for holding,
      gripping, or fastening consisting of a spike, rod, or bar
   b) n, ANDIRON
4) a) n, SUN DOG
   b) n, WATER DOG
   c) n, FOGBOW
5) n, affected stylishness or dignity i[cap]
6) n, either of the constellations Canis Major or Canis Minor i[pl], i[slang]
7) n, FEET i[slang]
8) n, something inferior of its kind i[pl]
9) n, RUIN <go to the ~i[s]> i[cap]
10) n, any of various American Indian peoples
 -- dog.like, aj
('d{o.}-.gl{i-}k)
--------
I find this easier to read.
The Webster definitions are nicely structured, it seems a shame to
spoil it by right justifying.  The code to do this is hacky but
I can give it to anyone who wants it.

Michael

From mike  Sat Jan 17 15:43:40 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA26642; Sat, 17 Jan 87 15:43:40 EST
Date: Sat, 17 Jan 87 15:43:40 EST
From: Michael Hawley <mike>
Message-Id: <8701172043.AA26642@MEDIA-LAB.MIT.EDU>
To: info-dictionary
Subject: formats
Status: RO

There should probably be some raw mode format -- like, a way to have
the server return the machine-format text (unformatted)
and let the client worry about decoding.
A pronunciation index would be wonderful - a rhyming dictionary
is something I've always wanted.  (I have a friend who is trying to
write code that will digest the AP news wire and read it in rap
on a dectalk.)

From jrd  Sun Jan 18 12:54:39 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA12321; Sun, 18 Jan 87 12:54:39 EST
Date: Sun, 18 Jan 87 12:54:39 EST
From: Jim Davis <jrd>
Message-Id: <8701181754.AA12321@MEDIA-LAB.MIT.EDU>
To: gmcathey, jrd@amt
Subject: Re:  format, versions, new database
Cc: info-dictionary
Status: RO

	From gmcathey Sat Jan 17 14:56:05 1987
	Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA25987; Sat, 17 Jan 87 14:56:02 EST
	Message-Id: <8701171956.AA25987@MEDIA-LAB.MIT.EDU>
	To: jrd@amt
	Cc: halazar@amt
	Subject: format, versions, new database
	Date: Sat, 17 Jan 87 14:56:00 -0500
	From: gmcathey
	
	
	....	
	NEW DATABASE
	
	I think we should get the corrected database.  Jim, what do you
	think?  I have some idea that you have made many
	?corrections?updates? to the database.  
	
I fixed the pronunciation errors as documented in all.badpron.  I can
do this again if need be.  I would expect, by the way, that getting new
source files would require recomputing the dbm file of word indices.
	
	
From gmcathey Sat Jan 17 14:56:05 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA25987; Sat, 17 Jan 87 14:56:02 EST
Message-Id: <8701171956.AA25987@MEDIA-LAB.MIT.EDU>
To: jrd@amt
Cc: halazar@amt
Subject: format, versions, new database
Date: Sat, 17 Jan 87 14:56:00 -0500
From: gmcathey
Status: RO


FORMAT


I have a copy of Webster's Seventh.  The server formats the 
definitions wrong in some cases.  The ones I know about:

1--labels are misplaced: instead of-> 5 -of an inanimate object- a: to  
	move... b: to stand...// it gives-> ...balls of an inanimate 
	object 5a: to move... 5b: to stand... (from "walk")

2--labels are misplaced: in "dog", the label "often attrib"
	should follow the pronunciation (or lead the etymology)

3--runons are ignored and are fundamental to understanding the word in
	a particular sense

4--] is misplaced when the etymology is longer than 80 characters
	that is, when a C: (continuation) line is used.

5--if a number is missing (e.g., Dog and Roman) then it is appended to 
	the last same word's definition.

I know the reasons for 1,3,4.
I can easily fix 1,3.
I haven't looked at the rest, so I can't say what the problem is.
However, the database is correct for these examples (except for 
Roman).  

One problem: the server does not distinguish between upper and lower
case characters for queries.

_________________________________________________________________________

VERSIONS

Mike, would you please tell me where the new working source code is?


_________________________________________________________________________

NEW DATABASE

I think we should get the corrected database.  Jim, what do you think?
I have some idea that you have made many ?corrections?updates? to
the database.  


________________________________________________________________________

Mike, good work!


Ade.

gmc


From jrd  Sun Jan 18 13:03:27 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA12804; Sun, 18 Jan 87 13:03:27 EST
Date: Sun, 18 Jan 87 13:03:27 EST
From: Jim Davis <jrd>
Message-Id: <8701181803.AA12804@MEDIA-LAB.MIT.EDU>
To: info-dictionary
Subject: Re:  formats
Status: RO

	From mike Sat Jan 17 15:43:43 1987
	Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA26642; Sat, 17 Jan 87 15:43:40 EST
	Date: Sat, 17 Jan 87 15:43:40 EST
	From: Michael Hawley <mike>
	Message-Id: <8701172043.AA26642@MEDIA-LAB.MIT.EDU>
	To: info-dictionary
	Subject: formats
	
	There should probably be some raw mode format -- like, a way to have
	the server return the machine-format text (unformatted) and let the
	client worry about decoding.

This is planned.

	A pronunciation index would be wonderful - a rhyming dictionary is
	something I've always wanted.  (I have a friend who is trying to
	write code that will digest the AP news wire and read it in rap on a
	dectalk.)

Yes, this would be fun.  It is not planned, however.  What I proposed was
the ability to extract a single word given its pron.  For rhyming, you want
*all* words that rhyme.  This requires wild cards in the phonetic string,
which is not something we'd planned.  What's more, initial position wild
cards seem to be slow.  I would not want this to be high on the priority
list.

By the way, your improved Client formatting looks like a win.  I've
been wondering myself whether it makes sense to have the server do the
formating that it does - it minimizes screen usage, but contains embedded
assumptions about the screen width of the client, and makes parsing more
difficult, although RAW mode is supposed to make that a moot point.

From jrd  Mon Jan 19 00:46:43 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA26857; Mon, 19 Jan 87 00:46:43 EST
Date: Mon, 19 Jan 87 00:46:43 EST
From: Jim Davis <jrd>
Message-Id: <8701190546.AA26857@MEDIA-LAB.MIT.EDU>
To: holtzman, lacsap
Subject: Missing Webster Daemon
Cc: info-dictionary
Status: RO

This evening the webster daemon is not running.
Pascal, did you perhaps kill it to get more time
for your big crunching job?  Or Henry, is there
not some provision in the system to start the
daemon when the system boots?  Did it get a fatal
error and die?  

Is there some way for a mortal to restart it?
I would like to use the daemon, and I can't.
If it requires Superpowers to do it, could you
send them to me, so that I'll know what to
ask some Superuser to do?  There is usually someone
around.

From halazar  Mon Jan 19 11:35:06 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA03854; Mon, 19 Jan 87 11:35:06 EST
Message-Id: <8701191635.AA03854@MEDIA-LAB.MIT.EDU>
To: jrd
Subject: Re: Missing Webster Daemon
Cc: holtzman, info-dictionary
In-Reply-To: Your message of Mon, 19 Jan 87 00:46:43 EST.
             <8701190546.AA26857@MEDIA-LAB.MIT.EDU>
Date: Mon, 19 Jan 87 11:35:04 -0500
From: halazar
Status: O


I killed it when I installed the new code into /projects/webster.
Restarting puts the message into /usr/adm/messages of :
websterd[pid]: bind: address already in use.  

Upon further research, I found the obvious (YOUR machine) has 3
connections (FIN_WAIT_2) to port 3012, the webster port.  I assume
that this is the cause for the error.  I don't know how to fix the
problem, short of rebooting.  Sorry.  That isn't to say there isn't a
fix.


						--Mike

From jrd  Mon Jan 19 16:14:01 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA09116; Mon, 19 Jan 87 16:14:01 EST
Date: Mon, 19 Jan 87 16:14:01 EST
From: Jim Davis <jrd>
Message-Id: <8701192114.AA09116@MEDIA-LAB.MIT.EDU>
To: halazar
Subject: Re: Missing Webster Daemon
Cc: holtzman, info-dictionary
Status: O

	From halazar Mon Jan 19 11:35:09 1987
	
	I killed it when I installed the new code into /projects/webster.
	Restarting puts the message into /usr/adm/messages of :
	websterd[pid]: bind: address already in use.  
	
	Upon further research, I found that Obvious (Lisp machine) has 3
	connections (FIN_WAIT_2) to port 3012, the webster port.  I assume
	that this is the cause for the error.  I don't know how to fix the
	problem, short of rebooting.  Sorry.  That isn't to say there isn't a
	fix.
	
For a short while there was a bug on Obvious where it was keeping ports
open after use.  If I killed the connections from Obvious, could you
try again?

Surely there is some way to start Websterd forcibly, killing any rivals,
without rebooting...
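
One standard BSD-socket approach to "address already in use" is to set
SO_REUSEADDR on the listening socket before calling bind.  The thread
does not say whether websterd did this, or whether it would have cured
the FIN_WAIT_2 case on this machine, so treat the following Python
sketch as an illustration only.  The function name is hypothetical;
port 3012 is the one named in the message.

```python
import socket

WEBSTER_PORT = 3012  # the webster port named in the message above

def open_listener(port=WEBSTER_PORT, host=""):
    """Create a listening TCP socket that tolerates lingering old connections."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); it lets bind() succeed while old
    # connections on the same local port are still winding down.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(5)
    return sock
```

This does not kill a rival daemon that is still listening; it only
removes the obstacle posed by half-closed leftover connections.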

	
From gmcathey  Tue Jan 20 16:30:54 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA06972; Tue, 20 Jan 87 16:30:54 EST
Message-Id: <8701202130.AA06972@MEDIA-LAB.MIT.EDU>
To: info-dictionary
Subject: possible additions to DICTIONARY protocol
Date: Tue, 20 Jan 87 16:30:51 -0500
From: gmcathey
Status: O


Here are some new commands that I would like to add to the
existing protocol.  Please send comments to info-dictionary.


Command: PART_OF_SPEECH<space>word<NL>

The PART_OF_SPEECH command asks only for the parts
of speech of a word from the dictionary.  The possible
responses are:

	PART_OF_SPEECH<NL>
	<part of speech><NL>
	<part of speech><NL>
	      . . .
	<part of speech><NL>
	<EOF>

Following the PART_OF_SPEECH response are the different
parts of speech of the word.


If the word contains wildcard characters, then a
WILD response will be returned, just like for the
DEFINE command.

If the word is not found in the dictionary, then a 
SPELL response will be returned, just like for the 
DEFINE command.


________________________________________________________________


Command: PRONUNCIATION<space>word<NL>

The PRONUNCIATION command asks only for the pronunciation of 
a word from the dictionary.  The possible responses are:

	PRONUNCIATION<NL>
	<pronunciation>
	<EOF>

Following the PRONUNCIATION response is the pronunciation
of the word.


If the word contains wildcard characters, then a
WILD response will be returned, just like for the
DEFINE command.

If the word is not found in the dictionary, then a 
SPELL response will be returned, just like for the 
DEFINE command.


__________________________________________________________________


Command: RAW_MODE<NL>

The RAW_MODE command requires the server to send the 
data in its "raw" form; that is, just as it exists in the
database file.


The response is:

	RAW_MODE_OK<NL>


__________________________________________________________________

Command: FORMAT_MODE<NL>

The FORMAT_MODE command requires the server to parse
(make it human readable) the data before sending the 
information.


The response is:

	FORMAT_MODE_OK<NL>


___________________________________________________________________

Command: MODE<NL>

The MODE command requests the server to tell the client
which mode the server is in.


The possible responses are:


	FORMAT_MODE_OK<NL>

This response indicates the server is in the format mode.

	
	RAW_MODE_OK<NL>

This response indicates the server is in the raw mode.


___________________________________________________________________


yours,


george cathey

From gmcathey  Tue Jan 20 17:00:27 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA08395; Tue, 20 Jan 87 17:00:27 EST
Message-Id: <8701202200.AA08395@MEDIA-LAB.MIT.EDU>
To: info-dictionary
Subject: addendum to documentation of possible additions to DICTIONARY protocol
Date: Tue, 20 Jan 87 17:00:22 -0500
From: gmcathey
Status: O


In RAW_MODE:

The server's response to a DEFINE command would be 
	RAW_DEFINITION<NL>
	--raw data--
	<EOF>

The server's response to a PRONUNCIATION command would be
	RAW_PRONUNCIATION<NL>
	--raw pronunciation data--
	<EOF>

In FORMAT_MODE:

The server's responses would be the same as currently documented and
implemented.


Please refer comments to info-dictionary.


george cathey


From jrd  Wed Jan 21 12:41:23 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA14100; Wed, 21 Jan 87 12:41:23 EST
Date: Wed, 21 Jan 87 12:41:23 EST
From: Jim Davis <jrd>
Message-Id: <8701211741.AA14100@MEDIA-LAB.MIT.EDU>
To: info-dictionary
Subject: Protocol comments
Status: RO

The proposed protocol extensions for DICTIONARY look good to me.
I have a few questions and suggestions:

1) I would suggest a different scheme for modes.  Instead of separate
commands for each mode, define a single new command, MODE, which takes one
required argument, the name of a mode, and one optional argument, a string
value for that mode.  For Boolean valued modes, the string YES and NO shall
be accepted.  The MODE command returns an error message if given an
unrecognized mode, an unacceptable value, or missing or extra args.

Then define a mode (or switch, if you prefer that term) FORMAT.  By default,
FORMAT is YES.  

The advantages of this plan are:
 1) it makes clear that RAW_MODE and FORMAT_MODE are mutually incompatible.

 2) It makes it easier to add new modes.

 3) it eliminates the "mode inquiry" command 

Perhaps there should also be the command MODES (no args) which returns
values of all modes.

2) I don't understand how FORMAT affects the PRONUNCIATION command.
Since PRONUNCIATION is new, there is no need for it to do formatting.
It *might* be helpful to have the server convert from the internal
Webster coding to ARPABET, but if it isn't going to do this it
may as well not do anything at all.

3) Would it be desirable and easy to allow the Client to specify the
kind of formatting done by the Server?  For example, you might want
to tell the Server the line length of the terminal, or whether 
some form of emphasis (bold, inverse, underline) was available as
the "man" program uses.  Or perhaps the X or other window system
provides methods of encoding font shifts...This is all pretty speculative,
and not very important to me at present, although it suggests
the value of an extensible protocol for setting modes (or switches)
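
The single-MODE scheme in point 1 might look roughly like this in
Python.  The mode table, the function names, and the choice to treat
"MODE name" with no value as an inquiry are all assumptions made for
illustration; only FORMAT, YES/NO, the default of YES, and the error
cases come from the proposal.

```python
class ModeError(Exception):
    """Raised for an unrecognized mode, a bad value, or wrong argument count."""

# Hypothetical mode table: mode name -> set of accepted values.
BOOLEAN = {"YES", "NO"}
MODE_VALUES = {"FORMAT": BOOLEAN}

def new_modes():
    """Default settings; FORMAT is YES by default, as proposed."""
    return {"FORMAT": "YES"}

def handle_mode(modes, args):
    """Apply 'MODE name [value]'.  With no value, report the current setting."""
    if len(args) not in (1, 2):
        raise ModeError("missing or extra arguments")
    name = args[0].upper()
    if name not in MODE_VALUES:
        raise ModeError("unrecognized mode: " + name)
    if len(args) == 2:
        value = args[1].upper()
        if value not in MODE_VALUES[name]:
            raise ModeError("unacceptable value: " + value)
        modes[name] = value
    return modes[name]

def handle_modes(modes):
    """'MODES' with no args: report the value of every mode."""
    return sorted(modes.items())
```

Adding a new mode then means adding one entry to the table rather than
two new commands and two new responses.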


From jrd  Wed Jan 21 17:48:03 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA22393; Wed, 21 Jan 87 17:43:38 EST
Date: Wed, 21 Jan 87 17:43:38 EST
From: Jim Davis <jrd>
Message-Id: <8701212243.AA22393@MEDIA-LAB.MIT.EDU>
To: bug-dictionary
Subject: Bugs in server
Status: O

There are many bugs in the supplied version of the server.  Here is an
example, the entry for HELP

1. help \'help, South also 'hep\ vb [ME helpen, fr. OE helpan; akin to OHG 
   helfan to help, Lith (Xshachek>elpti 1: to give assistance to : AID 2a: 
   REMEDY, RELIEVE archaic  2b: RESCUE, SAVE 2c: to get (oneself) out of a 
   difficulty 3a: to be of use to : BENEFIT 3b: to further the advancement of 
   : PROMOTE 4a: to change for the better : MEND 4b: to refrain from {couldn't 
   ~ laughing} 4c: to keep from occurring : PREVENT 5: to serve with food or 
   drink esp. at a meal 6: to appropriate for the use of (oneself) : to be of 
   use or benefit {every little bit ~s} accomplish an end, HELP carries a 
   strong implication of advance toward an objective; AID suggests the evident 
   need of help or relief and so imputes weakness to the one aided and 
   strength to the one aiding; ASSIST suggests a secondary role in the 
   assistant or a subordinate character in the assistance : cannot but : I 
   swear it - cannot help but  SYN syn HELP, AID, ASSIST mean to supply what 
   is needed to

Here is what is in the database:

F:help;1;;;;vb;;
P:'help, (XSouth also)X 'hep
E:ME (Xhelpen)X, fr. OE (Xhelpan)X; akin to OHG (Xhelfan)X to help, Lith (
C:Xs|B(Qhachek)Qelpti)X

Note the open paren at end of E:  It is the beginning of an overstrike
sequence to enter X font.  (Note also the use of (Q to get the hachek
character.)

Why are these ^M in here, anyway?

Note also that the Webster database includes an italicized comment in the
middle of the pronunciation:  A Trap and Snare for the unwary!

The Synonyms are messed up.  They are listed at the end, which is good,
but it included the Continuation lines of the Synonyms in the definition
instead of in the Synonym info.

D:6;;;vt;to appropriate for the use of (oneself)
D:0;;;vi;to be of use or benefit <(every little bit (R@)R(Xs)X)>
S:(Ysyn)Y (MHELP)M, (MAID)M, (MASSIST)M mean to supply what is needed to
C: accomplish an end, (MHELP)M carries a strong implication of advance toward

From jrd  Wed Jan 21 17:52:11 1987
Received: by MEDIA-LAB.MIT.EDU (5.51/4.8)  id AA22634; Wed, 21 Jan 87 17:52:11 EST
Date: Wed, 21 Jan 87 17:52:11 EST
From: Jim Davis <jrd>
Message-Id: <8701212252.AA22634@MEDIA-LAB.MIT.EDU>
To: info-dictionary
Subject: Plans
Status: O

George and I had a talk.  We found more bugs in the existing dictionary code
(which we henceforth call Version 0).  I reported them to bug-dictionary, if
you are curious.

Here are George's plans for now:
1) make the  protocol definition more explicit
  MODE
   ? response for invalid mode, bad value for mode
  MODES
  VERSION
  PRONUNCIATION

2) document format of Dictionary as you discover it.

3) implement the features
 new modes:
   DWIM
   FORMAT

4) Define a server for an inverted dictionary.  The basic command would be
to find the list of all words which are pronounced in a supplied manner.  It
will probably look like "WILD".  This may, or may not, be the same server as
the DICTIONARY server, at George's option.  This requires constructing a
program to build an inverted table from ARPABET to words using the existing
files.

5) I will make a translator from the Webster pronunciation codes
to ARPABET, in Lisp.  We may use this to build the tables for 4.
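
The inverted table in item 4 could be built along these lines.  This is
a Python sketch only: it assumes the translator from item 5 has already
produced (word, ARPABET string) pairs, and the function name is
hypothetical.

```python
from collections import defaultdict

def build_inverted_table(entries):
    """Map each pronunciation string to the sorted list of words having it."""
    table = defaultdict(set)
    for word, pron in entries:
        table[pron].add(word)
    return {pron: sorted(words) for pron, words in table.items()}
```

Looking up a pronunciation is then a single table access, and the list
it returns could be sent back in the same shape as a WILD response.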

From gmcathey  Mon Apr  6 00:29:37 1987
Received: by media-lab.MIT.EDU (5.54/4.8)  id AA15544; Mon, 6 Apr 87 00:29:37 EDT
From: George M Cathey <gmcathey>
Message-Id: <8704060429.AA15544@media-lab.MIT.EDU>
To: info-dictionary
Subject: updated protocol for webster
Date: Mon, 06 Apr 87 00:29:35 EDT
Status: O


This is the most up-to-date version of the dictionary protocol
as of April 5, 1987.  Please send any recommendations or comments
to info-dictionary.


DICTIONARY server protocol:

Contact name is "DICTIONARY".  A full connection is established
(additional data in the RFC is ignored; there's no simple mode).

Command lines to the server are of the form

	COMMAND[<space>ARGUMENT]<NL>

where the part in brackets, [], is optional.  <space> is ASCII space,
octal 40, and <NL> is the LispMachine NewLine character, octal 215.

The server responds with a single line of the same format, and then if
there's additional data it comes next, followed by an EOF packet.

The actual response will be either

	ERROR<space>RECOVERABLE<error message><NL>
or	ERROR<space>FATAL<error message><NL>

or a command-dependent response.  FATAL-type errors are just that,
fatal, and the server will go away after sending the ERROR message.
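
As an illustration of the line format just described, here is a small
Python sketch of a client-side encoder and response-line parser.  The
function names are invented for the example, and it assumes commands
and arguments are plain ASCII.

```python
NL = bytes([0o215])   # LispMachine NewLine, octal 215
SPACE = b" "          # ASCII space, octal 40

def encode_command(command, argument=None):
    """Build one command line: COMMAND[<space>ARGUMENT]<NL>."""
    line = command.encode("ascii")
    if argument is not None:
        line += SPACE + argument.encode("ascii")
    return line + NL

def parse_response_line(raw):
    """Split a response line into (keyword, rest); rest is '' if absent."""
    if not raw.endswith(NL):
        raise ValueError("response line is not terminated by <NL>")
    text = raw[:-1].decode("ascii")
    keyword, _, rest = text.partition(" ")
    return keyword, rest
```

For an ERROR line the keyword comes back as ERROR and the rest begins
with RECOVERABLE or FATAL, so the client can decide whether to expect
the connection to close.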

Command:	HELP<NL>

This command will send back the text of this document, the dictionary
protocol, followed by <EOF>.

Command:	DEFINE<space>WORD<NL>

This is the command that asks for the definition of a word from the
dictionary.  The possible responses are:

	WILD<space>0<NL>

or
	WILD<NL>
	<word#><space><word1><NL>
	<word#><space><word2><NL>
		. . .
	<word#><space><wordN><NL>
	<EOF>

A WILD response is given when the word to be defined contained
wildcard characters ('%' which matches exactly one character, or '*'
which matches 0 or more characters).  If the wild string had no
matches, a WILD response with argument 0 is returned.  If there are
one or more matches, a WILD with no arg is returned, and then the
matching words are sent, one per line, followed by an EOF packet.  For
each returned word there is a word#, a string of ASCII digits
representing a decimal number.  For user convenience, that word#
may be specified in place of the word itself, in a DEFINE request.
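
The wildcard rules can be illustrated with a short Python sketch that
translates a pattern into a regular expression.  This is not how the
server is known to do it; the function names are hypothetical, and
numbering the matches from 1 is an assumption.

```python
import re

def wildcard_to_regex(pattern):
    """Translate '%' (exactly one char) and '*' (zero or more) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def wild_matches(pattern, words):
    """Return (word#, word) pairs for every word matching the whole pattern."""
    rx = wildcard_to_regex(pattern)
    hits = [w for w in words if rx.fullmatch(w)]
    return list(enumerate(hits, start=1))
```

An empty result corresponds to the "WILD 0" response; a non-empty one
corresponds to "WILD" followed by one numbered line per pair.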

	SPELL<space>0<NL>

or
	SPELL<NL>
	{ same response as WILD }

When a word is specified that couldn't be found verbatim, Webster
attempts to Do What You Mean, and try to fix common typos (transposed
letter, one missing or one additional letter, or one letter wrong).
If any such matches are found, a SPELL response is returned, listing
all the "possible" words.  If no such words were found, (e.g. it
couldn't make ANY sense out of the input word), a SPELL with argument
0 is returned.
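
One way to picture the typo repair is to generate every string that is
one such edit away from the input and keep those that are real words.
The sketch below does that in Python; it is an assumption about the
general approach, not the server's actual code, and it only considers
lowercase ASCII letters.

```python
import string


def typo_candidates(word, alphabet=string.ascii_lowercase):
    """All strings one transposition, deletion, insertion or substitution away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Swap two adjacent letters (transposed letters).
    transposed = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    # Delete one letter (the input had one additional letter).
    dropped = {a + b[1:] for a, b in splits if b}
    # Insert one letter (the input was missing a letter).
    added = {a + c + b for a, b in splits for c in alphabet}
    # Replace one letter (the input had one letter wrong).
    replaced = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    return (transposed | dropped | added | replaced) - {word}


def spell_suggestions(word, dictionary):
    """Candidates that are actual dictionary words, sorted."""
    return sorted(typo_candidates(word) & set(dictionary))
```

An empty list here would correspond to the "SPELL 0" response.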

	DEFINITION<space>n<NL>
then n	{ WILD-response-like lines }
then	<any amount of ASCII text>
	<EOF>

A DEFINITION response means the word matched an entry, and the definition
follows.  The argument (always present), n,  is the # of cross-references
in the definition that might prove interesting.  If n > 0, then follows
one line per cross-reference, in the same form as the WILD responses.  Then
comes the body of the definition, followed by an <EOF>.
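
A client could split such a response as follows.  This is a sketch
only: parse_definition is a made-up name, and it assumes the lines have
already been read up to <EOF>, decoded, and stripped of their <NL>.

```python
def parse_definition(lines):
    """Split a DEFINITION response into (cross-references, body text)."""
    keyword, _, count = lines[0].partition(" ")
    if keyword != "DEFINITION":
        raise ValueError("not a DEFINITION response: %r" % lines[0])
    n = int(count)
    xrefs = []
    for entry in lines[1:1 + n]:
        # Each cross-reference line is <word#><space><word>.
        number, _, word = entry.partition(" ")
        xrefs.append((int(number), word))
    body = "\n".join(lines[1 + n:])
    return xrefs, body
```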

Command:	COMPLETE<space>word<NL>

This command is used to simulate the action of the TENEX/TWENEX <escape> completion
feature.  "word" is usually the beginning portion of a word that is
expected to be unique.  The response is either

	AMBIGUOUS<space>n<NL>

or
	COMPLETION<space>full-word<NL>

If the partial word you specified matches zero or more than one
dictionary entry, an AMBIGUOUS response is given, the argument
being the number of matches.

If the partial word matches one and only one entry, a COMPLETION
reply is sent back, containing the full text of the word that was
completed.  Note that COMPLETION and wildcard characters CAN be
mixed, so the user program should check the word being completed,
and if any wildcard characters exist (in the supplied part),
the entire word should be retyped, not just what was completed.
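
The choice between the two replies can be sketched like this.  The
function is hypothetical and handles only plain prefixes; the wildcard
case mentioned above is left out to keep the example short.

```python
def complete(prefix, words):
    """Return the response line for COMPLETE, given the known words."""
    matches = [w for w in words if w.startswith(prefix)]
    if len(matches) == 1:
        return "COMPLETION " + matches[0]
    # Zero matches and several matches are both reported as AMBIGUOUS.
    return "AMBIGUOUS %d" % len(matches)
```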

Command:	ENDINGS<space>word<NL>

This command is used to simulate the "?" TENEX/TWENEX feature.
The response is either

	ENDINGS<space>0<NL>

or
	ENDINGS<NL>
	{ WILD-like word-list }

What ENDINGS actually does is append a "*" to the word you gave and
check for WILD matches.  If there were none, you get the ENDINGS with
argument 0 response, else a WILD-like list of words.  NOTE that even
though the word-lines returned have a word#, that number does not work
the same as for WILD or SPELLING.  A number is present on the line so
that the response is in the same format as that of WILD or SPELLING,
to make it a little easier on the user program.  Those numbers shouldn't
be passed back as they don't mean anything.
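
Since the numbers carry no meaning here, a client can simply drop them
when reading the list.  A minimal sketch, with a made-up function name
and assuming decoded lines with the <NL> already removed:

```python
def parse_endings(lines):
    """Return just the words from an ENDINGS response, ignoring word#."""
    keyword, _, arg = lines[0].partition(" ")
    if keyword != "ENDINGS":
        raise ValueError("not an ENDINGS response: %r" % lines[0])
    if arg == "0":
        return []
    # Keep only the text after the first space on each line.
    return [entry.partition(" ")[2] for entry in lines[1:]]
```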

Command:	SPELL<space>word<NL>

The SPELL command is for people who want to (ab)use the dictionary
server as a SPELL server.  The word is looked up, and if it matches
zero or one entries you get a

	SPELL<space>matches<NL>

response where matches is 0 or 1.

If the word isn't found, but has possible alternate spellings, those
are returned exactly like a SPELLING response to DEFINE.


Command: PART-OF-SPEECH<space>word<NL>

The PART-OF-SPEECH command asks only for the parts
of speech of a word from the dictionary.  The possible
responses are:

	PART-OF-SPEECH<NL>
	<part of speech><NL>
	<part of speech><NL>
	      . . .
	<part of speech><NL>
	<EOF>

Following the PART-OF-SPEECH response are the different
parts of speech of the word.

            Frequency   Code   Part of Speech          Example
            ____________________________________________________________
                42590   n      noun
                13250   aj     adjective
                 4972   vt     verb, transitive
                 2423   vb     verb
                 1432   av     adverb
                 1363   vi     verb, intransitive
                  516   cf     combining form
                  164   pp     preposition
                  150   nc     noun combining form
                  121   tm     trademark
                  108   pn     pronoun
                  107   ns     noun suffix
                   96   cj     conjunction
                   94   ij     interjection
                   74   pf     prefix
                   51   js     adjective suffix
                   12   vs     verb suffix
                   10   vp     verb imperative
                    6   as     adjective suffix
                    4   va     verbal auxiliary
                    3   ia     indefinite article
                    2   vm     verb impersonal, past   meseems, methinks
                    2   vc     verb combining form     -lyze, -sect
                    2   sf     suffix                  -est, -fold
                    2   da     definite article        the, ye
                    1   np     noun plural suffix      -s
                    1   is     interjection suffix     -o
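
If the server sends the two-letter codes (an assumption; the response
template does not say whether codes or full names are sent), a client
could expand them with a lookup table.  The entries below are copied
from the table as given; expand_codes is a hypothetical helper.

```python
PARTS_OF_SPEECH = {
    "n": "noun", "aj": "adjective", "vt": "verb, transitive",
    "vb": "verb", "av": "adverb", "vi": "verb, intransitive",
    "cf": "combining form", "pp": "preposition",
    "nc": "noun combining form", "tm": "trademark", "pn": "pronoun",
    "ns": "noun suffix", "cj": "conjunction", "ij": "interjection",
    "pf": "prefix", "js": "adjective suffix", "vs": "verb suffix",
    "vp": "verb imperative",
    "as": "adjective suffix",  # listed with the same name as "js" in the table
    "va": "verbal auxiliary", "ia": "indefinite article",
    "vm": "verb impersonal, past", "vc": "verb combining form",
    "sf": "suffix", "da": "definite article",
    "np": "noun plural suffix", "is": "interjection suffix",
}


def expand_codes(codes):
    """Map codes to names, passing unknown codes through unchanged."""
    return [PARTS_OF_SPEECH.get(code, code) for code in codes]
```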


If the word contains wildcard characters, then a
WILD response will be returned, just like for the
DEFINE command.

If the word is not found in the dictionary, then a 
SPELL response will be returned, just like for the 
DEFINE command.

Command: PRONUNCIATION<space>word[<space>part-of-speech]<NL>

The PRONUNCIATION command asks only for the pronunciation of 
a word from the dictionary.  The possible responses are:

	PRONUNCIATION<NL>
	<pronunciation>
	<EOF>

Following the PRONUNCIATION response is the pronunciation
of the word. If a part-of-speech is specified, then the
corresponding pronunciation is returned.  If no part-of-speech
is specified, then the first pronunciation found is returned.

A list of possible parts-of-speech is found on the 
PART-OF-SPEECH command page.

If the word contains wildcard characters, then a
WILD response will be returned, just like for the
DEFINE command.

If the word is not found in the dictionary, then a 
SPELL response will be returned, just like for the 
DEFINE command.

If the word is not the specified part-of-speech, then 
the response will be:

	NO-SUCH-PART-OF-SPEECH<NL>
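
Putting the possible outcomes together, a client might handle this
command roughly as follows.  Both function names are invented for the
example, and the lines are assumed to be decoded with the <NL> removed.

```python
def pronunciation_request(word, part_of_speech=None):
    """Build PRONUNCIATION<space>word[<space>part-of-speech] (no <NL>)."""
    parts = ["PRONUNCIATION", word]
    if part_of_speech:
        parts.append(part_of_speech)
    return " ".join(parts)


def pronunciation_outcome(lines):
    """Classify the response by the keyword on its first line."""
    keyword = lines[0].partition(" ")[0]
    if keyword == "PRONUNCIATION":
        return "found", "\n".join(lines[1:])
    if keyword == "NO-SUCH-PART-OF-SPEECH":
        return "wrong-part-of-speech", None
    if keyword in ("WILD", "SPELL"):
        # Same word-list handling as for DEFINE.
        return keyword.lower(), lines[1:]
    raise ValueError("unexpected response: %r" % lines[0])
```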

Command: RAW-MODE<NL>

The RAW-MODE command requires the server to send the 
data in its "raw" form; that is, just as it exists in the
database file.


The response is:

	RAW-MODE-OK<NL>

Command: FORMAT-MODE<space>format-option<NL>

The FORMAT-MODE command requires the server to parse
(make it human readable) the data before sending the 
information.

The format-options are:
** none decided yet, but options may include: **
-variable width screen
-tty types
-others

The response is:

	FORMAT-MODE-OK<NL>

Command: MODE<NL>

The MODE command requests the server to tell the client
which mode the server is in.


The possible responses are:


	FORMAT-MODE-OK<NL>
	<format-option><NL>
	<format-option><NL>
	  .   .   .   .
	<format-option><NL>
	<EOF>

This response indicates that the server is in format mode.

	
	RAW-MODE-OK<NL>

This response indicates that the server is in raw mode.
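
On the client side the two MODE replies could be told apart like this.
It is a sketch with a hypothetical function name; since no
format-options have been decided yet, they are returned as opaque
strings.

```python
def parse_mode(lines):
    """Return (mode, format-options) from a MODE response."""
    if lines[0] == "RAW-MODE-OK":
        return "raw", []
    if lines[0] == "FORMAT-MODE-OK":
        # Any following lines, up to <EOF>, are format-options.
        return "format", list(lines[1:])
    raise ValueError("unexpected MODE response: %r" % lines[0])
```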


yours,

george cathey

From gmcathey  Wed Apr  8 00:23:53 1987
Received: by media-lab.MIT.EDU (5.54/4.8)  id AA22887; Wed, 8 Apr 87 00:23:53 EDT
From: George M Cathey <gmcathey>
Message-Id: <8704080423.AA22887@media-lab.MIT.EDU>
To: info-dictionary
Subject: Message from Dave Curry
Date: Wed, 08 Apr 87 00:23:51 EDT
Status: O


Date: Mon, 6 Apr 87 07:47:54 EST
From: davy@ee.ecn.purdue.edu (Dave Curry)
Message-Id: <8704061247.AA15666@ee.ecn.purdue.edu>
To: gmcathey@media-lab.media.mit.edu
Subject: Webster


The dictionary files were corrupted out on the NIC; they finally fixed
this back around Christmastime.  To get the new files, FTP to SRI-NIC
and log in as anonymous.  Change to the directory "W7".  Then get all
the files whose name ends in "DICTION".  The other files (INDEX.LST,
PRONUNCIATION.TXT, etc.) are all there also, but none of them changed,
so there's not much point in FTP'ing them (they're huge).  Note that
it took me two evenings to ftp this much data (19 MB); maybe you'll do
better.

Also, I did massive amounts of work on webster over Christmas.  The
stupid "INDEX" file and the habit of the server taking up 1.3MB on
each call has been deleted in favor of a database method.  Also, many
miscellaneous buglets have been fixed.  You should probably grab the
latest version by using anonymous ftp to ee.ecn.purdue.edu and getting
the file "pub/webster.tar".  Make sure you set "image" mode for the
transfer.

The protocol looks reasonably interesting I guess.  I'm not sure about
the "format" mode though.  It would be a MAJOR mistake to start hard
coding things like terminal types into the program.  FORMAT-MODE should
just mean "format it for a dumb 80 column terminal" like it does now.
If they want graphics or something, let them get it in RAW mode.

--Dave

From halazar  Wed Apr  8 01:07:44 1987
Received: by media-lab.MIT.EDU (5.54/4.8)  id AA24806; Wed, 8 Apr 87 01:07:44 EDT
From: Michael Halle <halazar>
Message-Id: <8704080507.AA24806@media-lab.MIT.EDU>
To: gmcathey
Cc: info-dictionary
Subject: Re: Message from Dave Curry 
In-Reply-To: Your message of Wed, 08 Apr 87 00:23:51 EDT.
             <8704080423.AA22887@media-lab.MIT.EDU> 
Date: Wed, 08 Apr 87 01:07:42 EDT
Status: O


We are running the latest version of webster that is available, which
includes silly bug fixes that davy doesn't seem to want to put in to
the real version, but that break things on AMT.  

						--Mike


From jrd  Wed Apr  8 09:28:15 1987
Received: by media-lab.MIT.EDU (5.54/4.8)  id AA11619; Wed, 8 Apr 87 09:28:15 EDT
Date: Wed, 8 Apr 87 09:28:15 EDT
From: Jim Davis <jrd>
Message-Id: <8704081328.AA11619@media-lab.MIT.EDU>
To: gmcathey, halazar
Subject: Re: Message from Dave Curry
Cc: info-dictionary
Status: O

	From halazar Wed Apr  8 01:07:53 1987
	
	We are running the latest version of webster that is available, which
	includes silly bug fixes that davy doesn't seem to want to put in to
	the real version, but that break things on AMT.  
	
							--Mike

We have to resolve this.  It is a bad thing for all potential Webster
users to have the server code diverging.
	
Mike: Do you have a list of the bugs fixed?
      What happened when you sent the fixes to Dave Curry?
      Did he say why he didn't take them?
      Could you try again to get him to accept them?

George:  you and I have discovered other bugs in Webster.
      Have you notified Dave of them?  He might be more
      willing to take our mods if they fix bugs that also
      affected him.