A 4.2BSD Interprocess Communication Primer
                   DRAFT of May 13, 1987


                     Samuel J. Leffler

                      Robert S. Fabry

                       William N. Joy

              Computer Systems Research Group
 Department of Electrical Engineering and Computer Science
             University of California, Berkeley
                Berkeley, California  94720
                       (415) 642-7780


                          _A_B_S_T_R_A_C_T


          This document provides an introduction to the
     interprocess  communication facilities included in
     the 4.2BSD release of the VAX* UNIX** system.

          It discusses the overall model for  interpro-
     cess communication and introduces the interprocess
     communication primitives which have been added  to
     the  system.  The majority of the document consid-
     ers the use  of  these  primitives  in  developing
     applications.   The reader is expected to be fami-
     liar with the C programming language as all  exam-
     ples are written in C.


_________________________
* DEC and VAX are trademarks of Digital Equipment  Cor-
poration.
** UNIX is a Trademark of Bell Laboratories.


                        May 13, 1987


4.2BSD IPC Primer          - 2 -                Introduction


                      _1. _I_N_T_R_O_D_U_C_T_I_O_N


One of the most important parts of 4.2BSD is  the  interpro-
cess  communication  facilities.   These  facilities are the
result of more than two years of  discussion  and  research.
The  facilities  provided  in 4.2BSD incorporate many of the
ideas from current research, while trying  to  maintain  the
UNIX  philosophy of simplicity and conciseness.  It is hoped
that the interprocess communication facilities  included  in
4.2BSD  will  establish  a  standard  for  UNIX.   From  the
response to the design, it appears many organizations carry-
ing out work with UNIX are adopting it.

     UNIX has previously been very weak in the area of
interprocess communication.  Prior to the 4.2BSD facilities,
the only standard mechanism which allowed two processes to
communicate was the pipe (the mpx files which were part of
Version 7 were experimental).  Unfortunately, pipes are very
restrictive in that the two communicating processes must be
related through a common ancestor.  Further, the semantics
of pipes make them almost impossible to maintain in a
distributed environment.

     Earlier attempts at extending the ipc facilities of
UNIX have met with mixed reaction.  The majority of the
problems have been related to the fact that these facilities
have been tied to the UNIX file system, either through
naming or implementation.  Consequently, the ipc facilities
provided in 4.2BSD have been designed as a totally
independent subsystem.  The 4.2BSD ipc allows processes to
rendezvous in many ways.  Processes may rendezvous through a
UNIX file system-like name space (a space where all names
are path names) as well as through a network name space.  In
fact, new name spaces may be added at a future time with
only minor changes visible to users.  Further, the
communication facilities have been extended to include more
than the simple byte stream provided by a pipe-like entity.
These extensions have resulted in a completely new part of
the system with which users will need time to familiarize
themselves.  It is likely that as more use is made of these
facilities they will be refined; only time will tell.

     The remainder of this document  is  organized  in  four
sections.  Section 2 introduces the new system calls and the
basic model of communication.  Section 3 describes  some  of
the  supporting  library  routines  users may find useful in
constructing distributed applications.  Section  4  is  con-
cerned  with  the  client/server  model  used  in developing
applications and includes examples of the two major types of
servers.   Section  5  delves  into  advanced  topics  which
sophisticated users are likely to encounter when  using  the
ipc facilities.


DRAFT of May 13, 1987                      Leffler/Fabry/Joy




                         _2. _B_A_S_I_C_S


     The basic  building  block  for  communication  is  the
_s_o_c_k_e_t.  A socket is an endpoint of communication to which a
name may be _b_o_u_n_d.  Each socket in use has a _t_y_p_e and one or
more  associated processes.  Sockets exist within _c_o_m_m_u_n_i_c_a_-
_t_i_o_n _d_o_m_a_i_n_s.  A  communication  domain  is  an  abstraction
introduced to bundle common properties of processes communi-
cating through sockets.  One such  property  is  the  scheme
used to name sockets.  For example, in the UNIX communication
domain sockets are named with UNIX path names; e.g. a
socket may be named ``/dev/foo''.  Sockets normally exchange
data only with sockets in the same domain (it may be
possible to cross domain boundaries, but only if some
translation process is performed).  The 4.2BSD ipc supports
two separate communication domains: the UNIX domain, and the
Internet domain, which is used by processes which communicate
using the DARPA standard communication protocols.  The
underlying communication facilities provided by these domains
have a significant influence on the internal system
implementation as well as the interface to socket facilities
available to a user.  An example of the latter is that a
socket ``operating'' in the UNIX domain sees only a subset of
the error conditions which are possible when operating in the
Internet domain.

_2._1.  _S_o_c_k_e_t _t_y_p_e_s

     Sockets are typed according to the  communication  pro-
perties visible to a user. Processes are presumed to commun-
icate only between sockets of the same type, although  there
is  nothing  that  prevents communication between sockets of
different types should the underlying  communication  proto-
cols support this.

     Three types of sockets currently  are  available  to  a
user.  A _s_t_r_e_a_m socket provides for the bidirectional, reli-
able, sequenced,  and  unduplicated  flow  of  data  without
record  boundaries.  Aside from the bidirectionality of data
flow, a pair of connected stream sockets provides an  inter-
face nearly identical to that of pipes*.

     A _d_a_t_a_g_r_a_m socket supports bidirectional flow  of  data
which is not promised to be sequenced, reliable, or undupli-
cated. That is, a process receiving messages on  a  datagram
socket  may  find  messages duplicated, and, possibly, in an
_________________________
* In the UNIX domain, in fact, the semantics are ident-
ical  and,  as one might expect, pipes have been imple-
mented internally as simply a pair of connected  stream
sockets.


DRAFT of May 13, 1987                      Leffler/Fabry/Joy


4.2BSD IPC Primer          - 4 -                      Basics


order different from the order in  which  it  was  sent.  An
important characteristic of a datagram socket is that record
boundaries in data are preserved.  Datagram sockets  closely
model  the  facilities  found  in  many  contemporary packet
switched networks such as the Ethernet.

     A _r_a_w socket provides users access  to  the  underlying
communication  protocols  which support socket abstractions.
These sockets are normally datagram oriented,  though  their
exact  characteristics  are  dependent on the interface pro-
vided by the protocol.  Raw sockets are not intended for the
general  user;  they  have  been  provided  mainly for those
interested in developing new communication protocols, or for
gaining access to some of the more esoteric facilities of an
existing protocol.  The use of raw sockets is considered  in
section 5.

     Two potential socket types which have interesting  pro-
perties  are  the  _s_e_q_u_e_n_c_e_d  _p_a_c_k_e_t socket and the _r_e_l_i_a_b_l_y
_d_e_l_i_v_e_r_e_d _m_e_s_s_a_g_e socket.   A  sequenced  packet  socket  is
identical  to a stream socket with the exception that record
boundaries are preserved.  This interface is very similar to
that  provided  by  the  Xerox NS Sequenced Packet protocol.
The reliably delivered message socket has similar properties
to  a  datagram  socket,  but with reliable delivery.  While
these two socket types have been loosely defined,  they  are
currently  unimplemented  in 4.2BSD.  As such, in this docu-
ment we will concern ourselves only with  the  three  socket
types for which support exists.

_2._2.  _S_o_c_k_e_t _c_r_e_a_t_i_o_n

     To create a socket the _s_o_c_k_e_t system call is used:

        s = socket(domain, type, protocol);

This call requests that the system create a  socket  in  the
specified  _d_o_m_a_i_n  and  of the specified _t_y_p_e.  A particular
protocol may also be requested.  If  the  protocol  is  left
unspecified  (a  value  of  0),  the  system  will select an
appropriate protocol from those protocols which comprise the
communication  domain  and  which may be used to support the
requested socket type.  The user is returned a descriptor (a
small  integer  number)  which  may  be used in later system
calls which operate on sockets.  The domain is specified  as
one   of   the   manifest  constants  defined  in  the  file
<_s_y_s/_s_o_c_k_e_t._h>.   For  the  UNIX  domain  the  constant   is
AF_UNIX*;  for the  Internet  domain  AF_INET.   The  socket
types  are also defined in this file and one of SOCK_STREAM,
_________________________
* The manifest constants are named AF_whatever as  they
indicate  the ``address format'' to use in interpreting
names.


DRAFT of May 13, 1987                      Leffler/Fabry/Joy


4.2BSD IPC Primer          - 5 -                      Basics


SOCK_DGRAM, or SOCK_RAW must  be  specified.   To  create  a
stream  socket  in  the  Internet  domain the following call
might be used:

        s = socket(AF_INET, SOCK_STREAM, 0);

This call would result in a stream socket being created with
the TCP protocol providing the underlying communication sup-
port.  To create a datagram socket for on-machine use a sam-
ple call might be:

        s = socket(AF_UNIX, SOCK_DGRAM, 0);


     To obtain a particular protocol one supplies the
protocol number, as defined within the communication domain.
For the Internet domain the available protocols are defined
in <_n_e_t_i_n_e_t/_i_n._h>; better yet, one may use one of the
library routines discussed in section 3, such as
_g_e_t_p_r_o_t_o_b_y_n_a_m_e:

        #include <sys/types.h>
        #include <sys/socket.h>
        #include <netinet/in.h>
        #include <netdb.h>
         ...
        struct protoent *pp;
         ...
        pp = getprotobyname("tcp");
        s = socket(AF_INET, SOCK_STREAM, pp->p_proto);


     There are several reasons a socket call may fail.
Aside from the rare occurrence of lack of memory (ENOBUFS),
a socket request may fail due to a request for an unknown
protocol (EPROTONOSUPPORT), or a request for a type of
socket for which there is no supporting protocol
(EPROTOTYPE).
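
     The failure cases just listed can be checked as in the
following sketch.  It assumes a present-day POSIX system rather
than 4.2BSD itself (ANSI prototypes, <errno.h>, <string.h>), and
the helper name make_socket is hypothetical, not part of the
system.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * make_socket is a hypothetical helper, not a system routine.
 * It returns a descriptor, or -1 after reporting why the
 * request failed (for example EPROTONOSUPPORT or EPROTOTYPE).
 */
int
make_socket(int domain, int type, int protocol)
{
        int s = socket(domain, type, protocol);

        if (s < 0)
                fprintf(stderr, "socket: %s\n", strerror(errno));
        return (s);
}
```

A caller would write, for instance, make_socket(AF_INET,
SOCK_STREAM, 0) and test the result against -1 before using the
descriptor.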

_2._3.  _B_i_n_d_i_n_g _n_a_m_e_s

     A socket is created without a name.  Until  a  name  is
bound  to  a  socket,  processes have no way to reference it
and, consequently, no messages may be received on  it.   The
_b_i_n_d call is used to assign a name to a socket:

        bind(s, name, namelen);

The bound name is a variable length  byte  string  which  is
interpreted  by the supporting protocol(s).  Its interpreta-
tion may vary from  communication  domain  to  communication
domain  (this  is  one  of the properties which comprise the
``domain'').  In the UNIX domain names are path names  while
in the Internet domain names contain an Internet address and
port number.  If one wanted to bind the name ``/dev/foo'' to
a UNIX domain socket, the following would be used:




        bind(s, "/dev/foo", sizeof ("/dev/foo") - 1);

(Note how the null byte in the name is not counted  as  part
of  the name.)  In binding an Internet address things become
more complicated.  The actual call is simple,

        #include <sys/types.h>
        #include <sys/socket.h>
        #include <netinet/in.h>
         ...
        struct sockaddr_in sin;
         ...
        bind(s, &sin, sizeof (sin));

but the selection of  what  to  place  in  the  address  _s_i_n
requires  some discussion.  We will come back to the problem
of formulating Internet addresses  in  section  3  when  the
library routines used in name resolution are discussed.
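
     Until then, the simplest Internet name to bind is the
wildcard: any local address, with the port left for the system
to choose.  The following is a sketch for a present-day POSIX
system.  The helper bind_any is hypothetical, and the cast to
struct sockaddr * and the socklen_t type are modern conventions
that do not appear in the 4.2BSD calls shown above.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/*
 * Bind s to any local Internet address and a system-chosen
 * port.  Returns the port (host byte order), or -1 on error.
 */
int
bind_any(int s)
{
        struct sockaddr_in sin;
        socklen_t len = sizeof (sin);

        memset(&sin, 0, sizeof (sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        sin.sin_port = htons(0);        /* 0: let the system pick */
        if (bind(s, (struct sockaddr *)&sin, sizeof (sin)) < 0)
                return (-1);
        /* Ask the system which name it actually bound. */
        if (getsockname(s, (struct sockaddr *)&sin, &len) < 0)
                return (-1);
        return (ntohs(sin.sin_port));
}
```

The address and port fields are kept in network byte order, which
is why htonl, htons, and ntohs appear.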

_2._4.  _C_o_n_n_e_c_t_i_o_n _e_s_t_a_b_l_i_s_h_m_e_n_t

     With a bound socket it is possible to  rendezvous  with
an  unrelated process.  This operation is usually asymmetric
with one process a ``client'' and the  other  a  ``server''.
The client requests services from the server by initiating a
``connection'' to the server's  socket.   The  server,  when
willing   to   offer   its  advertised  services,  passively
``listens'' on its socket.  On the client side  the  _c_o_n_n_e_c_t
call  is  used  to  initiate  a  connection.  Using the UNIX
domain, this might appear as,

        connect(s, "server-name", sizeof ("server-name") - 1);

while in the Internet domain,

        struct sockaddr_in server;
        connect(s, &server, sizeof (server));

If the client process's socket is unbound at the time of the
connect call, the system will automatically select and bind
a name to the socket; cf. section 5.4.  An error is
returned if the connection is unsuccessful (any name
automatically bound by the system, however, remains).
Otherwise, the socket is associated with the server and data
transfer may begin.

     Many errors can be returned when a  connection  attempt
fails.  The most common are:

ETIMEDOUT
     After failing to establish a connection for a period of
     time, the system decided there was no point in retrying
     the connection attempt any more.  This  usually  occurs
     because the destination host is down, or because
     problems in the network resulted in transmissions being
     lost.

ECONNREFUSED
     The host refused service for some  reason.   When  con-
     necting to a host running 4.2BSD this is usually due to
     a server process not being  present  at  the  requested
     name.

ENETDOWN or EHOSTDOWN
     These operational errors are returned based  on  status
     information  delivered to the client host by the under-
     lying communication services.

ENETUNREACH or EHOSTUNREACH
     These operational errors can occur either  because  the
     network  or host is unknown (no route to the network or
     host is present),  or  because  of  status  information
     returned  by  intermediate gateways or switching nodes.
     Many times the status returned  is  not  sufficient  to
     distinguish  a  network  being  down  from a host being
     down.  In these cases the system  is  conservative  and
     indicates the entire network is unreachable.
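
     A client may wish to turn these error numbers into a hint
for its user.  The sketch below assumes a present-day POSIX
<errno.h>.  The routine connect_error_hint and its message
strings are hypothetical; they only summarize the explanations
given above.

```c
#include <errno.h>

/*
 * Hypothetical helper: map the connect errors listed above
 * to a one-line summary of the explanation in the text.
 */
const char *
connect_error_hint(int err)
{
        switch (err) {
        case ETIMEDOUT:
                return ("no response; host down or packets lost");
        case ECONNREFUSED:
                return ("host refused; no server at that name");
        case ENETDOWN:
        case EHOSTDOWN:
                return ("network or host reported down");
        case ENETUNREACH:
        case EHOSTUNREACH:
                return ("no route to network or host");
        default:
                return ("other error");
        }
}
```

After a failed connect, a caller would pass the global errno to
this routine and print the result.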

     For the server to receive a client's connection it must
perform two steps after binding its socket.  The first is to
indicate a willingness to  listen  for  incoming  connection
requests:

        listen(s, 5);

The second parameter to the _l_i_s_t_e_n call specifies  the  max-
imum  number  of outstanding connections which may be queued
awaiting acceptance by the server process.  Should a connec-
tion  be  requested  while the queue is full, the connection
will not be refused,  but  rather  the  individual  messages
which  comprise  the  request will be ignored.  This gives a
harried server time to make room in its  pending  connection
queue  while the client retries the connection request.  Had
the connection been returned with  the  ECONNREFUSED  error,
the  client  would be unable to tell if the server was up or
not.  As it is now it is still possible to get the ETIMEDOUT
error  back,  though  this  is unlikely.  The backlog figure
supplied with the listen call is limited by the system to  a
maximum  of  5  pending  connections on any one queue.  This
avoids the problem of processes hogging system resources  by
setting  an  infinite  backlog, then ignoring all connection
requests.

     With a socket marked as listening, a server may  _a_c_c_e_p_t
a connection:

        fromlen = sizeof (from);
        snew = accept(s, &from, &fromlen);




A new descriptor is returned  on  receipt  of  a  connection
(along with a new socket).  If the server wishes to find out
who its client is, it may supply a  buffer  for  the  client
socket's  name.   The value-result parameter _f_r_o_m_l_e_n is ini-
tialized by the server to indicate how much space is associ-
ated  with _f_r_o_m, then modified on return to reflect the true
size of the name.  If the client's name is not of  interest,
the second parameter may be zero.

     Accept normally blocks.  That is, the  call  to  accept
will  not return until a connection is available or the sys-
tem  call  is  interrupted  by  a  signal  to  the  process.
Further,  there  is no way for a process to indicate it will
accept connections from only a specific individual, or indi-
viduals.   It  is up to the user process to consider who the
connection is from and close down the connection if it  does
not  wish  to  speak  to the process.  If the server process
wants to accept connections on more than one socket, or  not
block  on the accept call there are alternatives;  they will
be considered in section 5.
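
     The steps of this section (bind, listen, and accept on the
server; connect on the client) can be put together as in the
following sketch.  It assumes a present-day POSIX system and
uses the loopback address so that both ends run on one machine.
The routines make_listener, connect_loopback, and accept_one
are hypothetical names chosen for illustration.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/*
 * Create a stream socket listening on the loopback address.
 * The system picks the port, which is stored in *port
 * (host byte order).  Returns the descriptor or -1.
 */
int
make_listener(int *port)
{
        struct sockaddr_in sin;
        socklen_t len = sizeof (sin);
        int s = socket(AF_INET, SOCK_STREAM, 0);

        if (s < 0)
                return (-1);
        memset(&sin, 0, sizeof (sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        if (bind(s, (struct sockaddr *)&sin, sizeof (sin)) < 0 ||
            listen(s, 5) < 0 ||
            getsockname(s, (struct sockaddr *)&sin, &len) < 0) {
                close(s);
                return (-1);
        }
        *port = ntohs(sin.sin_port);
        return (s);
}

/* Client side: connect a new stream socket to the loopback port. */
int
connect_loopback(int port)
{
        struct sockaddr_in server;
        int s = socket(AF_INET, SOCK_STREAM, 0);

        if (s < 0)
                return (-1);
        memset(&server, 0, sizeof (server));
        server.sin_family = AF_INET;
        server.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        server.sin_port = htons(port);
        if (connect(s, (struct sockaddr *)&server, sizeof (server)) < 0) {
                close(s);
                return (-1);
        }
        return (s);
}

/* Server side: accept one connection, ignoring the client's name. */
int
accept_one(int s)
{
        return (accept(s, (struct sockaddr *)0, (socklen_t *)0));
}
```

Because the pending connection is queued by listen, the client's
connect may complete before the server calls accept.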

_2._5.  _D_a_t_a _t_r_a_n_s_f_e_r

     With a connection established, data may begin to flow.
To send and receive data there are a number of possible
calls.  With the peer entity at each end of a connection
anchored, a user can send or receive a message without
specifying the peer.  As one might expect, in this case the
normal _r_e_a_d and _w_r_i_t_e system calls are usable,

        write(s, buf, sizeof (buf));
        read(s, buf, sizeof (buf));

In addition to _r_e_a_d and _w_r_i_t_e, the new calls _s_e_n_d  and  _r_e_c_v
may be used:

        send(s, buf, sizeof (buf), flags);
        recv(s, buf, sizeof (buf), flags);

While _s_e_n_d and _r_e_c_v are  virtually  identical  to  _r_e_a_d  and
_w_r_i_t_e, the extra _f_l_a_g_s argument is important.  The flags may
be specified as a non-zero value if one or more of the  fol-
lowing is required:


        SOF_OOB         send/receive out of band data
        SOF_PREVIEW     look at data without reading
        SOF_DONTROUTE   send data without routing packets


Out of band data is a notion specific to stream sockets, and
one which we will not immediately consider.  The option to
have data sent without routing applied to the outgoing
packets is currently used only by the routing table
management process, and is unlikely to be of interest to the
casual user.  The ability to preview data is, however, of
interest.  When SOF_PREVIEW is specified with a _r_e_c_v call,
any data present is returned to the user, but treated as
still ``unread''.  That is, the next _r_e_a_d or _r_e_c_v call
applied to the socket will return the data previously
previewed.
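
     The preview behavior can be observed with the sketch below.
It assumes a present-day POSIX system, where the preview flag is
spelled MSG_PEEK in <sys/socket.h>; that spelling is used here in
place of the name given above.  The socketpair call is used only
to obtain two connected stream sockets without writing a server,
and the routine name peek_then_read is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Preview data on a connected pair of UNIX domain stream
 * sockets, then read it for real.  Returns 1 if both calls
 * saw the same five bytes, 0 otherwise, -1 on setup error.
 */
int
peek_then_read(void)
{
        int sv[2];
        char peeked[8], got[8];
        int ok;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
                return (-1);
        ok = write(sv[0], "hello", 5) == 5 &&
            /* look at the data; it stays queued on the socket */
            recv(sv[1], peeked, sizeof (peeked), MSG_PEEK) == 5 &&
            /* the ordinary read returns the same data again */
            read(sv[1], got, sizeof (got)) == 5 &&
            memcmp(peeked, got, 5) == 0 &&
            memcmp(got, "hello", 5) == 0;
        close(sv[0]);
        close(sv[1]);
        return (ok);
}
```

Previewing is useful when a process must inspect the start of a
message before deciding which routine should consume it.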

_2._6.  _D_i_s_c_a_r_d_i_n_g _s_o_c_k_e_t_s

     Once a socket is no longer of interest, it may be  dis-
carded by applying a _c_l_o_s_e to the descriptor,

        close(s);

If data is associated with a socket which promises  reliable
delivery  (e.g.  a  stream socket) when a close takes place,
the system will continue to attempt to  transfer  the  data.
However,  after a fairly long period of time, if the data is
still undelivered, it will be discarded.  Should a user have
no  use  for  any pending data, it may perform a _s_h_u_t_d_o_w_n on
the socket prior to closing it.  This call is of the form:

        shutdown(s, how);

where _h_o_w is 0 if the user is no longer interested in  read-
ing data, 1 if no more data will be sent, or 2 if no data is
to be sent or  received.   Applying  shutdown  to  a  socket
causes any data queued to be immediately discarded.
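
     One visible effect of _s_h_u_t_d_o_w_n can be sketched as follows,
assuming a present-day POSIX system: after one side applies
shutdown with a _h_o_w value of 1, its peer sees end of file on the
next read.  This observation describes current systems and is an
assumption with respect to 4.2BSD; the routine name
half_close_eof is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Shut down the sending side of one socket of a connected
 * pair and read from the other.  Returns 1 if the peer saw
 * end of file, 0 if not, -1 on setup error.
 */
int
half_close_eof(void)
{
        int sv[2];
        char buf[8];
        ssize_t n;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
                return (-1);
        if (shutdown(sv[0], 1) < 0) {   /* 1: no more data will be sent */
                close(sv[0]);
                close(sv[1]);
                return (-1);
        }
        n = read(sv[1], buf, sizeof (buf));     /* 0 means end of file */
        close(sv[0]);
        close(sv[1]);
        return (n == 0);
}
```

The descriptor must still be closed after the shutdown, as the
sketch does.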

_2._7.  _C_o_n_n_e_c_t_i_o_n_l_e_s_s _s_o_c_k_e_t_s

     To this point we have been concerned mostly with  sock-
ets  which  follow  a  connection  oriented model.  However,
there is also support for connectionless interactions  typi-
cal  of the datagram facilities found in contemporary packet
switched networks.  A datagram socket provides  a  symmetric
interface  to  data  exchange.   While  processes  are still
likely to be client and server, there is no requirement  for
connection  establishment.   Instead,  each message includes
the destination address.

     Datagram sockets are created as before, and each should
have  a  name  bound  to it in order that the recipient of a
message may identify the sender.  To send data,  the  _s_e_n_d_t_o
primitive is used,

        sendto(s, buf, buflen, flags, &to, tolen);

The _s, _b_u_f, _b_u_f_l_e_n, and _f_l_a_g_s parameters are used as before.
The  _t_o  and  _t_o_l_e_n values are used to indicate the intended
recipient of the message.  When using an unreliable datagram
interface, it is unlikely any errors will be reported to the
sender.  Where information is present locally to recognize a
message which may never be delivered (for instance when a
network is unreachable), the call will return -1 and the
global value _e_r_r_n_o will contain an error number.

     To receive messages on an unconnected datagram  socket,
the _r_e_c_v_f_r_o_m primitive is provided:

        recvfrom(s, buf, buflen, flags, &from, &fromlen);

Once again, the _f_r_o_m_l_e_n parameter is  handled  in  a  value-
result  fashion,  initially  containing the size of the _f_r_o_m
buffer.

     In addition to the two calls mentioned above, datagram
sockets may also use the _c_o_n_n_e_c_t call to associate a socket
with a specific address.  In this case, any data sent on the
socket will automatically be addressed to the connected
peer, and only data received from that peer will be
delivered to the user.  Only one connected address is
permitted for each socket (i.e. no multi-casting).  Connect
requests on datagram sockets return immediately, as this
simply results in the system recording the peer's address
(as compared to a stream socket where a connect request
initiates establishment of an end to end connection).  Some
of the less important details of datagram sockets are
described in section 5.
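
     The _s_e_n_d_t_o and _r_e_c_v_f_r_o_m calls can be exercised with the
following sketch, which assumes a present-day POSIX system and
sends two Internet datagrams over the loopback address.  The
routine name datagram_demo is hypothetical.  Since ordering is
not promised, the check accepts the two messages in either
order.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/*
 * Send two datagrams to a bound socket and receive them one
 * at a time.  Returns 1 if each recvfrom returned exactly one
 * whole message, 0 otherwise, -1 on setup error.
 */
int
datagram_demo(void)
{
        struct sockaddr_in to, from;
        socklen_t tolen = sizeof (to), fromlen;
        char buf[64];
        ssize_t n1, n2;
        int rx = socket(AF_INET, SOCK_DGRAM, 0);
        int tx = socket(AF_INET, SOCK_DGRAM, 0);
        int ok = -1;

        if (rx < 0 || tx < 0)
                goto out;
        /* Name the receiver: loopback address, system-chosen port. */
        memset(&to, 0, sizeof (to));
        to.sin_family = AF_INET;
        to.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        if (bind(rx, (struct sockaddr *)&to, sizeof (to)) < 0 ||
            getsockname(rx, (struct sockaddr *)&to, &tolen) < 0)
                goto out;
        /* Each message carries its destination address. */
        if (sendto(tx, "one", 3, 0, (struct sockaddr *)&to, tolen) != 3 ||
            sendto(tx, "three", 5, 0, (struct sockaddr *)&to, tolen) != 5)
                goto out;
        /* fromlen is value-result: reset it before every call. */
        fromlen = sizeof (from);
        n1 = recvfrom(rx, buf, sizeof (buf), 0,
            (struct sockaddr *)&from, &fromlen);
        fromlen = sizeof (from);
        n2 = recvfrom(rx, buf, sizeof (buf), 0,
            (struct sockaddr *)&from, &fromlen);
        /* Record boundaries are kept: one 3-byte and one 5-byte message. */
        ok = (n1 == 3 && n2 == 5) || (n1 == 5 && n2 == 3);
out:
        if (rx >= 0)
                close(rx);
        if (tx >= 0)
                close(tx);
        return (ok);
}
```

Although the buffer has room for both messages, each receive
returns only one of them; a stream socket would make no such
promise.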

_2._8.  _I_n_p_u_t/_O_u_t_p_u_t _m_u_l_t_i_p_l_e_x_i_n_g

     One last facility often used in developing applications
is  the  ability  to  multiplex  i/o requests among multiple
sockets and/or files.  This is done using the _s_e_l_e_c_t call:

        select(nfds, &readfds, &writefds, &exceptfds, &timeout);

_S_e_l_e_c_t takes as arguments three bit masks, one for the set
of file descriptors from which the caller wishes to read
data, one for those descriptors to which data is to be
written, and one for those on which exceptional conditions
are pending.  Bit masks are created by or-ing bits of the
form ``1 << fd''.  That is, a descriptor _f_d is selected if a
1 is present in the _f_d'th bit of the mask.  The parameter
_n_f_d_s specifies the range of file descriptors (i.e. one plus
the value of the largest descriptor) specified in a mask.

     A timeout value may be specified if the selection is
not to last more than a predetermined period of time.  If
the interval given by _t_i_m_e_o_u_t is zero, the selection takes
the form of a _p_o_l_l, returning immediately.  If the last
parameter is a null pointer, the selection will block
indefinitely*.  _S_e_l_e_c_t normally returns the number of file
descriptors selected.  If the _s_e_l_e_c_t call returns due to the
timeout expiring, then a value of 0 is returned.  If the call
is interrupted by a signal, a value of -1 is returned along
with the error number EINTR.
_________________________
* To be more specific, a return takes place only when a
descriptor is selectable, or when a signal is received
by the caller, interrupting the system call.

     _S_e_l_e_c_t  provides  a  synchronous  multiplexing  scheme.
Asynchronous  notification of output completion, input avai-
lability, and exceptional conditions is possible through use
of the SIGIO and SIGURG signals described in section 5.
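
     A polling use of _s_e_l_e_c_t can be sketched as follows.  The
sketch assumes a present-day POSIX system, where the bit masks
are wrapped in the fd_set type and manipulated with the FD_ZERO
and FD_SET macros rather than built by hand from ``1 << fd''.
On such systems an expired timeout yields a return value of 0.
The routine name poll_readable is hypothetical.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Poll descriptor fd for readability without blocking.
 * Returns what select returns: the number of ready
 * descriptors (0 or 1 here), or -1 on error.
 */
int
poll_readable(int fd)
{
        fd_set readfds;
        struct timeval timeout;

        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);           /* select fd in the read mask */
        timeout.tv_sec = 0;             /* zero interval: poll */
        timeout.tv_usec = 0;
        /* nfds is one plus the largest descriptor in any mask */
        return (select(fd + 1, &readfds, (fd_set *)0, (fd_set *)0,
            &timeout));
}
```

Passing a null pointer in place of &timeout would instead block
until the descriptor became readable or a signal arrived.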




                _3. _N_E_T_W_O_R_K _L_I_B_R_A_R_Y _R_O_U_T_I_N_E_S


     The discussion in section 2 indicated the possible need
to  locate  and  construct  network addresses when using the
interprocess  communication  facilities  in  a   distributed
environment.   To aid in this task a number of routines have
been added to the standard C run-time library.  In this sec-
tion  we  will consider the new routines provided to manipu-
late network addresses.  While the 4.2BSD networking facili-
ties  support  only  the  DARPA standard Internet protocols,
these routines have been designed with flexibility in  mind.
As more communication protocols become available, we hope
the same user interface will be maintained in accessing
network-related address data bases.  The only difference
should be the values returned to the user.  Since these
values are normally supplied by the system, users should not
need to be directly aware of the communication protocol
and/or naming conventions in use.

     Locating a service on a remote host requires many
levels of mapping before client and server may communicate.
A service is assigned a name which is intended for human
consumption; e.g. ``the _l_o_g_i_n _s_e_r_v_e_r on host monet''.  This
name, and the name of the peer host, must then be translated
into network _a_d_d_r_e_s_s_e_s which are not necessarily suitable
for human consumption.  Finally, the address must then be
used in locating a physical _l_o_c_a_t_i_o_n and _r_o_u_t_e to the
service.  The specifics of these three mappings are likely
to vary between network architectures.  For instance, it is
desirable for a network to not require hosts be named in
such a way that their physical location is known by the
client host.  Instead, underlying services in the network
may discover the actual location of the host at the time a
client host wishes to communicate.  This ability to have
hosts named in a location independent manner may induce
overhead in connection establishment, as a discovery process
must take place, but allows a host to be physically mobile
without requiring it to notify its clientele of its current
location.

     Standard routines are provided for: mapping host  names
to network addresses, network names to network numbers, pro-
tocol names to protocol numbers, and service names  to  port
numbers and the appropriate protocol to use in communicating
with  the  server  process.   The  file  <_n_e_t_d_b._h>  must  be
included when using any of these routines.

_3._1.  _H_o_s_t _n_a_m_e_s

     A host name to address mapping is  represented  by  the
_h_o_s_t_e_n_t structure:




        struct hostent {
               char      *h_name;             /* official name of host */
               char      **h_aliases;         /* alias list */
               int       h_addrtype;          /* host address type */
               int       h_length;            /* length of address */
               char      *h_addr;             /* address */
        };

The official name of the host and its public aliases are
returned, along with a variable length address and address
type.  The routine _g_e_t_h_o_s_t_b_y_n_a_m_e(3N) takes a host name and
returns a _h_o_s_t_e_n_t structure, while the routine
_g_e_t_h_o_s_t_b_y_a_d_d_r(3N) maps host addresses into a _h_o_s_t_e_n_t
structure.  It is possible for a host to have many
addresses, all having the same name.  _G_e_t_h_o_s_t_b_y_n_a_m_e returns
the first matching entry in the data base file /_e_t_c/_h_o_s_t_s;
if this is unsuitable, the lower level routine
_g_e_t_h_o_s_t_e_n_t(3N) may be used.  For example, to obtain a
_h_o_s_t_e_n_t structure for a host on a particular network the
following routine might be used (for simplicity, only
Internet addresses are considered):

        #include <sys/types.h>
        #include <sys/socket.h>
        #include <netinet/in.h>
        #include <arpa/inet.h>
        #include <netdb.h>
         ...
        struct hostent *
        gethostbynameandnet(name, net)
               char *name;
               int net;
        {
               register struct hostent *hp;
               register char **cp;

               sethostent(0);
               /* scan the host data base one entry at a time */
               while ((hp = gethostent()) != NULL) {
                      if (hp->h_addrtype != AF_INET)
                             continue;
                      /* official name differs: try each alias */
                      if (strcmp(name, hp->h_name)) {
                             for (cp = hp->h_aliases; cp && *cp != NULL; cp++)
                                    if (strcmp(name, *cp) == 0)
                                           goto found;
                             continue;
                      }
               found:
                      /* name matches; accept only if on the wanted network */
                      if (inet_netof(*(struct in_addr *)hp->h_addr) == net)
                             break;
               }
               endhostent();
               return (hp);
        }

(_i_n_e_t__n_e_t_o_f(3N) is a standard routine which returns the
network portion of an Internet address.)

_3._2.  _N_e_t_w_o_r_k _n_a_m_e_s

     As for host names, routines for mapping  network  names
to numbers, and back, are provided.  These routines return a
_n_e_t_e_n_t structure:

        /*
         * Assumption here is that a network number
         * fits in 32 bits -- probably a poor one.
         */
        struct netent {
               char      *n_name;             /* official name of net */
               char      **n_aliases;         /* alias list */
               int       n_addrtype;          /* net address type */
               int       n_net;               /* network # */
        };

The  routines  _g_e_t_n_e_t_b_y_n_a_m_e(3N),   _g_e_t_n_e_t_b_y_n_u_m_b_e_r(3N),   and
_g_e_t_n_e_t_e_n_t(3N)  are the network counterparts to the host rou-
tines described above.

_3._3.  _P_r_o_t_o_c_o_l _n_a_m_e_s

     For  protocols  the  _p_r_o_t_o_e_n_t  structure  defines   the
protocol-name     mapping    used    with    the    routines
_g_e_t_p_r_o_t_o_b_y_n_a_m_e(3N),        _g_e_t_p_r_o_t_o_b_y_n_u_m_b_e_r(3N),         and
_g_e_t_p_r_o_t_o_e_n_t(3N):

        struct protoent {
               char      *p_name;             /* official protocol name */
               char      **p_aliases;         /* alias list */
               int       p_proto;             /* protocol # */
        };
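
     A lookup through this structure might be wrapped as in the
following sketch.  The helper name proto_number is hypothetical;
the only library call used is getprotobyname(3N), and the result
depends on the contents of the local protocol data base.

```c
#include <netdb.h>
#include <stddef.h>

/*
 * Map a protocol name to its number using the protocol data
 * base.  Returns -1 if the name is not listed.
 */
int
proto_number(const char *name)
{
        struct protoent *pp;

        pp = getprotobyname(name);
        if (pp == NULL)
                return (-1);
        return (pp->p_proto);
}
```

On a host whose data base lists it, proto_number("tcp") would be
expected to return the protocol number assigned to TCP; an unknown
name returns -1.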


_3._4.  _S_e_r_v_i_c_e _n_a_m_e_s

     Information regarding services is a  bit  more  compli-
cated.   A  service  is  expected  to  reside  at a specific
``port'' and employ  a  particular  communication  protocol.
This view is consistent with the Internet domain, but incon-
sistent with other network architectures.  Further,  a  ser-
vice may reside on multiple ports or support multiple proto-
cols.  If either of these occurs, the higher  level  library
routines will have to be bypassed in favor of homegrown rou-
tines similar in spirit to the ``gethostbynameandnet''  rou-
tine described above.  A service mapping is described by the
_s_e_r_v_e_n_t structure,


        struct servent {
               char      *s_name;             /* official service name */
               char      **s_aliases;         /* alias list */
               int       s_port;              /* port # */
               char      *s_proto;            /* protocol to use */
        };

The routine _g_e_t_s_e_r_v_b_y_n_a_m_e(3N) maps service names to  a  ser-
vent structure by specifying a service name and, optionally,
a qualifying protocol.  Thus the call

        sp = getservbyname("telnet", (char *)0);

returns the service specification for a telnet server  using
any protocol, while the call

        sp = getservbyname("telnet", "tcp");

returns only that telnet server which uses the TCP protocol.
The  routines  _g_e_t_s_e_r_v_b_y_p_o_r_t(3N) and _g_e_t_s_e_r_v_e_n_t(3N) are also
provided.  The _g_e_t_s_e_r_v_b_y_p_o_r_t routine has an interface  simi-
lar  to that provided by _g_e_t_s_e_r_v_b_y_n_a_m_e; an optional protocol
name may be specified to qualify lookups.
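
     When a service is listed on several ports or protocols, a
homegrown routine can walk the data base itself.  The sketch
below is one possible shape for such a routine; serv_matches and
count_service_entries are hypothetical names, and only
setservent, getservent and endservent are taken from the library.

```c
#include <netdb.h>
#include <string.h>

/*
 * Return 1 if the entry answers to `name' (official name or
 * alias) and, when `proto' is non-null, uses that protocol;
 * otherwise return 0.
 */
int
serv_matches(const struct servent *sp, const char *name, const char *proto)
{
        char **cp;

        if (proto != NULL && strcmp(sp->s_proto, proto) != 0)
                return (0);
        if (strcmp(sp->s_name, name) == 0)
                return (1);
        for (cp = sp->s_aliases; cp != NULL && *cp != NULL; cp++)
                if (strcmp(*cp, name) == 0)
                        return (1);
        return (0);
}

/*
 * Count the data base entries that match.  A result greater
 * than 1 means the service is listed more than once.
 */
int
count_service_entries(const char *name, const char *proto)
{
        struct servent *sp;
        int n = 0;

        setservent(0);
        while ((sp = getservent()) != NULL)
                if (serv_matches(sp, name, proto))
                        n++;
        endservent();
        return (n);
}
```

Passing a null protocol mirrors the unqualified getservbyname call
shown above: any protocol is accepted.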

_3._5.  _M_i_s_c_e_l_l_a_n_e_o_u_s

     With the support routines described above, an  applica-
tion  program  should  rarely  have  to  deal  directly with
addresses.  This allows services to be developed as much  as
possible  in  a  network  independent fashion.  It is clear,
however, that purging all network dependencies is very difficult.
So long as the user is required to supply network addresses when
naming services and sockets there will always be some network
dependency in a program.  For example, the normal code included
in client programs, such as the remote
login program, is of the form  shown  in  Figure  1.   (This
example will be considered in more detail in section 4.)

     If we wanted to make the remote login program  indepen-
dent  of  the  Internet  protocols  and addressing scheme we
would be forced to add a layer of routines which masked  the
network  dependent  aspects  from the mainstream login code.
For the current facilities available in the system this does
not  appear  to  be  worthwhile.  Perhaps when the system is
adapted to different  network  architectures  the  utilities
will be reorganized more cleanly.

     Aside from  the  address-related  data  base  routines,
there  are  several other routines available in the run-time
library which are of interest to users.  These are  intended
mostly  to  simplify  manipulation  of  names and addresses.
Table 1 summarizes the routines  for  manipulating  variable
length  byte  strings  and handling byte swapping of network
addresses and values.

        #include <sys/types.h>
        #include <sys/socket.h>
        #include <netinet/in.h>
        #include <stdio.h>
        #include <netdb.h>
         ...
        main(argc, argv)
               char *argv[];
        {
               struct sockaddr_in sin;
               struct servent *sp;
               struct hostent *hp;
               int s;
               ...
               sp = getservbyname("login", "tcp");
               if (sp == NULL) {
                      fprintf(stderr, "rlogin: tcp/login: unknown service\n");
                      exit(1);
               }
               hp = gethostbyname(argv[1]);
               if (hp == NULL) {
                      fprintf(stderr, "rlogin: %s: unknown host\n", argv[1]);
                      exit(2);
               }
               bzero((char *)&sin, sizeof (sin));
               bcopy(hp->h_addr, (char *)&sin.sin_addr, hp->h_length);
               sin.sin_family = hp->h_addrtype;
               sin.sin_port = sp->s_port;
               s = socket(AF_INET, SOCK_STREAM, 0);
               if (s < 0) {
                      perror("rlogin: socket");
                      exit(3);
               }
               ...
               if (connect(s, (char *)&sin, sizeof (sin)) < 0) {
                      perror("rlogin: connect");
                      exit(5);
               }
               ...
        }

            Figure 1.  Remote login client code.

     The byte swapping routines are provided because the
operating system expects addresses to be supplied in network
order.  On a VAX, or a machine with a similar architecture,
the host byte order is the reverse of network order.
Consequently, programs are sometimes required to byte swap
quantities.  The library routines which return network
addresses provide them in network order so that they may
simply be copied into the structures provided to the system.
This implies users should encounter the byte swapping problem
only when _i_n_t_e_r_p_r_e_t_i_n_g network addresses.  For example, if an
Internet port is to be printed out the following code would be
required:

        printf("port number %d\n", ntohs(sp->s_port));

On machines whose native byte order already matches network
order these routines are defined as null macros.

 ____________________________________________________________________________
  Call               Synopsis
 ____________________________________________________________________________
  bcmp(s1, s2, n)    compare byte-strings; 0 if same, not 0 otherwise
  bcopy(s1, s2, n)   copy n bytes from s1 to s2
  bzero(base, n)     zero-fill n bytes starting at base
  htonl(val)         convert 32-bit quantity from host to network byte order
  htons(val)         convert 16-bit quantity from host to network byte order
  ntohl(val)         convert 32-bit quantity from network to host byte order
  ntohs(val)         convert 16-bit quantity from network to host byte order
 ____________________________________________________________________________

               Table 1.  C run-time routines.
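
     The effect of the conversion routines in Table 1 can be seen
with a small sketch.  The helpers port_to_wire and port_from_wire
are hypothetical names; they only show that a value passed through
htons is laid out most significant byte first, whatever the byte
order of the host.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Store a port number in network byte order into a 2-byte
 * buffer, as it would appear in sin_port or on the network.
 */
void
port_to_wire(uint16_t host_port, unsigned char wire[2])
{
        uint16_t net = htons(host_port);

        memcpy(wire, &net, sizeof (net));
}

/* Recover the host-order value from the 2 network-order bytes. */
uint16_t
port_from_wire(const unsigned char wire[2])
{
        uint16_t net;

        memcpy(&net, wire, sizeof (net));
        return (ntohs(net));
}
```

For port 513 (0x0201) the buffer holds the bytes 0x02, 0x01 in
that order, and port_from_wire returns 513 again.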


9


DRAFT of May 13, 1987                      Leffler/Fabry/Joy


4.2BSD IPC Primer          - 18 -        Client/Server Model


                   _4. _C_L_I_E_N_T/_S_E_R_V_E_R _M_O_D_E_L


     The most commonly used paradigm in constructing distri-
buted  applications  is  the  client/server  model.  In this
scheme client applications request services  from  a  server
process.  This implies an asymmetry in establishing communi-
cation between the client and server which has been examined
in  section 2.  In this section we will look more closely at
the interactions between client  and  server,  and  consider
some  of the problems in developing client and server appli-
cations.

     Client and server require a well known set  of  conven-
tions  before  service may be rendered (and accepted).  This
set of conventions comprises a protocol which must be imple-
mented  at  both  ends  of  a  connection.  Depending on the
situation, the protocol may be symmetric or asymmetric.   In
a  symmetric  protocol,  either  side may play the master or
slave roles.  In an asymmetric protocol, one side is  immut-
ably  recognized as the master, with the other the slave. An
example of a symmetric protocol is the TELNET protocol  used
in  the  Internet for remote terminal emulation.  An example
of an asymmetric protocol is the Internet file transfer pro-
tocol, FTP.  No matter whether the specific protocol used in
obtaining a service is symmetric or asymmetric, when access-
ing  a  service there is a ``client process'' and a ``server
process''.  We will first consider the properties of  server
processes, then client processes.

     A server  process  normally  listens  at  a  well known
address for service requests.  Alternative schemes which use
a service server may be used to eliminate a flock of  server
processes  clogging  the system while remaining dormant most
of the time.  The Xerox Courier  protocol  uses  the  latter
scheme.   When  using Courier, a Courier client process con-
tacts a Courier server at the remote host and identifies the
service  it  requires.   The  Courier  server  process  then
creates the appropriate server process based on a data  base
and  ``splices'' the client and server together, voiding its
part in the transaction.  This scheme is attractive in  that
the  Courier  server  process  may  provide a single contact
point for all services, as well as carrying out the  initial
steps  in authentication.  However, while this is an attrac-
tive possibility for standardizing access  to  services,  it
does  introduce  a  certain  amount  of  overhead due to the
intermediate process involved.  Implementations  which  pro-
vide this type of service within the system can minimize the
cost  of  client  server  rendezvous.   The  _p_o_r_t_a_l   notion
described  in  the ``4.2BSD System Manual'' embodies many of
the ideas found in Courier, with  the  rendezvous  mechanism
implemented internal to the system.


_4._1.  _S_e_r_v_e_r_s

     In 4.2BSD most  servers  are  accessed  at  well  known
Internet  addresses  or UNIX domain names.  When a server is
started at boot time it advertises its services by listening
at  a  well  known location.  For example, the remote login
server's main loop is of the form shown in Figure 2.

        main(argc, argv)
               int argc;
               char **argv;
        {
               int f;
               struct sockaddr_in sin, from;
               struct servent *sp;

               sp = getservbyname("login", "tcp");
               if (sp == NULL) {
                      fprintf(stderr, "rlogind: tcp/login: unknown service\n");
                      exit(1);
               }
               ...
        #ifndef DEBUG
               <<disassociate server from controlling terminal>>
        #endif
               ...
               sin.sin_port = sp->s_port;
               ...
               f = socket(AF_INET, SOCK_STREAM, 0);
               ...
               if (bind(f, (caddr_t)&sin, sizeof (sin)) < 0) {
                      ...
               }
               ...
               listen(f, 5);
               for (;;) {
                      int g, len = sizeof (from);

                      g = accept(f, &from, &len);
                      if (g < 0) {
                             if (errno != EINTR)
                                    perror("rlogind: accept");
                             continue;
                      }
                      if (fork() == 0) {
                             close(f);
                             doit(g, &from);
                      }
                      close(g);
               }
        }

              Figure 2.  Remote login server.


     The first step taken by the server is to look up its service
definition:

     sp = getservbyname("login", "tcp");
     if (sp == NULL) {
            fprintf(stderr, "rlogind: tcp/login: unknown service\n");
            exit(1);
     }

This definition is used in later portions  of  the  code  to
define  the  Internet  port  at which it listens for service
requests (indicated by a connection).

     Step two is to disassociate the server  from  the  con-
trolling  terminal of its invoker.  This is important as the
server will likely not want to receive signals delivered  to
the process group of the controlling terminal.

     Once a server has established a  pristine  environment,
it  creates  a socket and begins accepting service requests.
The _b_i_n_d call is required to insure the  server  listens  at
its  expected location.  The main body of the loop is fairly
simple:

        for (;;) {
               int g, len = sizeof (from);

               g = accept(f, &from, &len);
               if (g < 0) {
                      if (errno != EINTR)
                             perror("rlogind: accept");
                      continue;
               }
               if (fork() == 0) {
                      close(f);
                      doit(g, &from);
               }
               close(g);
        }

An _a_c_c_e_p_t call blocks the server  until  a  client  requests
service.   This  call  could  return a failure status if the
call is interrupted by a signal such as SIGCHLD (to be  dis-
cussed  in  section  5).   Therefore,  the return value from
_a_c_c_e_p_t is checked to insure a connection has  actually  been
established.   With  a  connection  in hand, the server then
forks a child process and  invokes  the  main  body  of  the
remote  login protocol processing.  Note how the socket used
by the parent for queueing connection requests is closed  in
the  child,  while  the  socket  created  as a result of the
accept is closed in the parent.  The address of  the  client
is also handed to the _d_o_i_t routine, which requires it for
authenticating clients.


_4._2.  _C_l_i_e_n_t_s

     The client side of the remote login service  was  shown
earlier  in  Figure 1.  One can see the separate, asymmetric
roles of the client and server clearly  in  the  code.   The
server  is  a  passive  entity, listening for client connec-
tions, while the client process is an  active  entity,  ini-
tiating a connection when invoked.

     Let us consider more closely the  steps  taken  by  the
client  remote  login process.  As in the server process the
first step is to locate the service definition for a  remote
login:

        sp = getservbyname("login", "tcp");
        if (sp == NULL) {
                fprintf(stderr, "rlogin: tcp/login: unknown service\n");
                exit(1);
        }

Next the destination host is looked up with a  _g_e_t_h_o_s_t_b_y_n_a_m_e
call:

        hp = gethostbyname(argv[1]);
        if (hp == NULL) {
                fprintf(stderr, "rlogin: %s: unknown host\n", argv[1]);
                exit(2);
        }

With this accomplished, all that is required is to establish
a  connection  to the server at the requested host and start
up  the  remote  login  protocol.   The  address  buffer  is
cleared,  then  filled  in  with the Internet address of the
foreign host and the port number at which the login  process
resides:

        bzero((char *)&sin, sizeof (sin));
        bcopy(hp->h_addr, (char *)&sin.sin_addr, hp->h_length);
        sin.sin_family = hp->h_addrtype;
        sin.sin_port = sp->s_port;

A socket is created, and a connection initiated.

        s = socket(hp->h_addrtype, SOCK_STREAM, 0);
        if (s < 0) {
                perror("rlogin: socket");
                exit(3);
        }
         ...
        if (connect(s, (char *)&sin, sizeof (sin)) < 0) {
                perror("rlogin: connect");
                exit(4);
        }


The details of the remote login protocol will  not  be  con-
sidered here.
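
     The address set-up step can be tried without consulting the
host and service data bases.  The sketch below fills the same
structure from a dotted-quad string and a host-order port; the
helper fill_inet_addr is hypothetical, and memset and inet_pton
stand in for the bzero and bcopy calls used in the text.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill in an Internet socket address from a dotted-quad string
 * and a port given in host byte order.  Returns 0 on success,
 * -1 if the address string is not valid.
 */
int
fill_inet_addr(struct sockaddr_in *sin, const char *dotted,
    unsigned short port)
{
        memset(sin, 0, sizeof (*sin));  /* clear the address buffer */
        if (inet_pton(AF_INET, dotted, &sin->sin_addr) != 1)
                return (-1);
        sin->sin_family = AF_INET;
        sin->sin_port = htons(port);    /* store in network order */
        return (0);
}
```

The resulting structure could then be handed to connect exactly as
sin is in the client code above.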

_4._3.  _C_o_n_n_e_c_t_i_o_n_l_e_s_s _s_e_r_v_e_r_s

     While connection-based services are the norm, some ser-
vices  are  based  on  the use of datagram sockets.  One, in
particular, is the ``rwho''  service  which  provides  users
with  status information for hosts connected to a local area
network.  This service, while predicated on the  ability  to
_b_r_o_a_d_c_a_s_t information to all hosts connected to a particular
network, is of interest as  an  example  usage  of  datagram
sockets.

     A user on any machine running the rwho server may  find
out the current status of a machine with the _r_u_p_t_i_m_e(1) pro-
gram.  The output generated is illustrated in Figure 3.


arpa        up   9:45,       5 users, load   1.15,   1.39,   1.31
cad         up   2+12:04,    8 users, load   4.67,   5.13,   4.59
calder      up   10:10,      0 users, load   0.27,   0.15,   0.14
dali        up   2+06:28,    9 users, load   1.04,   1.20,   1.65
degas       up   25+09:48,   0 users, load   1.49,   1.43,   1.41
ear         up   5+00:05,    0 users, load   1.51,   1.54,   1.56
ernie     down   0:24
esvax     down   17:04
ingres    down   0:26
kim         up   3+09:16,    8 users, load   2.03,   2.46,   3.11
matisse     up   3+06:18,    0 users, load   0.03,   0.03,   0.05
medea       up   3+09:39,    2 users, load   0.35,   0.37,   0.50
merlin    down   19+15:37
miro        up   1+07:20,    7 users, load   4.59,   3.28,   2.12
monet       up   1+00:43,    2 users, load   0.22,   0.09,   0.07
oz        down   16:09
statvax     up   2+15:57,    3 users, load   1.52,   1.81,   1.86
ucbvax      up   9:34,       2 users, load   6.08,   5.16,   3.28


                 Figure 3. ruptime output.


     Status information for each host is periodically broad-
cast  by  rwho  server  processes on each machine.  The same
server process also receives the status information and uses
it  to update a database.  This database is then interpreted
to generate the status information for each  host.   Servers
operate  autonomously, coupled only by the local network and
its broadcast capabilities.

     The rwho server, in a simplified form, is  pictured  in
Figure  4.   There  are  two separate tasks performed by the
server.  The first task is to act as a  receiver  of  status
information  broadcast  by other hosts on the network.  This
job is carried out in the main loop of the program.  Packets
received at the rwho port are interrogated to insure they've
been sent by another rwho  server  process,  then  are  time
stamped  with  their  arrival time and used to update a file
indicating the status of the host.  When a host has not been
heard  from  for  an  extended  period of time, the database
interpretation routines assume the host is down and indicate
such  on  the  status  reports.   This algorithm is prone to
error as a server may be down while a host is  actually  up,
but serves our current needs.
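
     The decision made by the interpretation routines amounts to
a comparison of time stamps.  A minimal sketch follows; the name
host_is_down and the `limit' parameter are hypothetical, since the
text does not say how long an ``extended period'' is.

```c
#include <time.h>

/*
 * Decide whether a host should be reported as down, given the
 * arrival time stamped on its last status packet.  `limit' is
 * the number of seconds of silence tolerated (an assumed
 * parameter, chosen by the caller).
 */
int
host_is_down(time_t last_heard, time_t now, time_t limit)
{
        return (now - last_heard > limit);
}
```

With a limit of 300 seconds, a host last heard from 60 seconds ago
is reported up, and one silent for 301 seconds is reported down,
whether or not the host itself has actually failed.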

     The second task performed by the server  is  to  supply
information regarding the status of its host.  This involves
periodically acquiring system status information,  packaging
it  up in a message and broadcasting it on the local network
for other rwho servers to  hear.   The  supply  function  is
triggered  by  a  timer and runs off a signal.  Locating the
system status information is somewhat  involved,  but  unin-
teresting.   Deciding where to transmit the resultant packet
does, however, indicate some problems with the current protocol.

     Status information is broadcast on the  local  network.
For  networks  which  do not support the notion of broadcast
another scheme must be used to simulate  or  replace  broad-
casting.   One  possibility is to enumerate the known neigh-
bors (based on the status received).   This,  unfortunately,
requires some bootstrapping information, as a server started
up on a quiet network will have no known neighbors and  thus
never receive, or send, any status information.  This is the
identical problem faced by the routing table management pro-
cess  in  propagating routing status information.  The stan-
dard solution, unsatisfactory as it may be, is to inform one
or  more  servers  of  known neighbors and request that they
always communicate with these neighbors.  If each server has
at least one neighbor supplied to it, status information may
then propagate through a neighbor to hosts which are
(possibly) not direct neighbors.  If the server is able to
support networks which provide a  broadcast  capability,  as
well  as those which do not, then networks with an arbitrary
topology may share status information*.

_________________________
* One must,  however,  be  concerned  about  ``loops''.
That  is,  if a host is connected to multiple networks,
it will receive status information from  itself.   This
can lead to an endless, wasteful, exchange of information.

     The second problem with the current scheme is that  the
rwho  process services only a single local network, and this
network is found by reading a file.  It  is  important  that
software operating in a distributed environment not have any
site-dependent information compiled  into  it.   This  would
require a separate copy of the server at each host and  make
maintenance  a  severe headache.  4.2BSD attempts to isolate
host-specific information  from  applications  by  providing
system calls which return the necessary information†.
Unfortunately, no straightforward mechanism currently exists
for  finding  the  collection of networks to which a host is
directly connected.  Thus the rwho server performs a  lookup
in a file to find its local network.  A better, though still
unsatisfactory, scheme used by the  routing  process  is  to
interrogate  the  system  data  structures  to  locate those
directly connected networks.  A mechanism  to  acquire  this
information from the system would be a useful addition.

_________________________
† An example of such a system call is the _g_e_t_h_o_s_t_n_a_m_e(2)
call which returns the host's ``official'' name.

        main()
        {
               ...
               sp = getservbyname("who", "udp");
               net = getnetbyname("localnet");
               /* network number first, then the local (host) part */
               sin.sin_addr = inet_makeaddr(net->n_net, INADDR_ANY);
               sin.sin_port = sp->s_port;
               ...
               s = socket(AF_INET, SOCK_DGRAM, 0);
               ...
               bind(s, &sin, sizeof (sin));
               ...
               sigset(SIGALRM, onalrm);
               onalrm();
               for (;;) {
                      struct whod wd;
                      int cc, whod, len = sizeof (from);

                      cc = recvfrom(s, (char *)&wd, sizeof (struct whod), 0, &from, &len);
                      if (cc <= 0) {
                             if (cc < 0 && errno != EINTR)
                                    perror("rwhod: recv");
                             continue;
                      }
                      if (from.sin_port != sp->s_port) {
                             fprintf(stderr, "rwhod: %d: bad from port\n",
                                    ntohs(from.sin_port));
                             continue;
                      }
                      ...
                      if (!verify(wd.wd_hostname)) {
                             fprintf(stderr, "rwhod: malformed host name from %x\n",
                                    ntohl(from.sin_addr.s_addr));
                             continue;
                      }
                      (void) sprintf(path, "%s/whod.%s", RWHODIR, wd.wd_hostname);
                      whod = open(path, FWRONLY|FCREATE|FTRUNCATE, 0666);
                      ...
                      (void) time(&wd.wd_recvtime);
                      (void) write(whod, (char *)&wd, cc);
                      (void) close(whod);
               }
        }

                  Figure 4.  rwho server.


                     _5. _A_D_V_A_N_C_E_D _T_O_P_I_C_S


     A number of facilities have yet to be  discussed.   For
most users of the IPC facilities the mechanisms already described
will suffice in constructing distributed applications.  However,
others will find a need to use some of the features which
we consider in this section.

_5._1.  _O_u_t _o_f _b_a_n_d _d_a_t_a

     The stream socket abstraction includes  the  notion  of
``out  of  band''  data.   Out  of  band data is a logically
independent transmission channel associated with  each  pair
of  connected stream sockets.  Out of band data is delivered
to the user independently of  normal  data  along  with  the
SIGURG  signal.   In  addition  to the information passed, a
logical mark is placed in the data stream  to  indicate  the
point  at  which  the out of band data was sent.  The remote
login and remote shell applications  use  this  facility  to
propagate signals between client and server processes.
When a signal is expected to flush any pending  output  from
the  remote process(es), all data up to the mark in the data
stream is discarded.

     The stream abstraction defines that  the  out  of  band
data  facilities  must  support  the reliable delivery of at
least one out of band message at a time.  This  message  may
contain  at least one byte of data, and at least one message
may be pending delivery to the user at any  one  time.   For
communications  protocols which support only in-band signal-
ing (i.e. the urgent data is delivered in sequence with  the
normal  data)  the  system extracts the data from the normal
data stream and stores it separately.  This allows users  to
choose  between  receiving  the  urgent  data  in  order and
receiving it out of sequence without having  to  buffer  all
the intervening data.

     To send an out of band message the SOF_OOB flag is supplied
to a _s_e_n_d or _s_e_n_d_t_o call, while to receive out of
band data SOF_OOB should  be  indicated  when  performing  a
_r_e_c_v_f_r_o_m  or  _r_e_c_v call.  To find out if the read pointer is
currently pointing at the  mark  in  the  data  stream,  the
SIOCATMARK ioctl is provided:

        ioctl(s, SIOCATMARK, &yes);

If _y_e_s is a 1 on return, the  next  read  will  return  data
after  the  mark.   Otherwise (assuming out of band data has
arrived), the next read will provide data sent by the client
prior  to  transmission of the out of band signal.  The rou-
tine used in the remote login process  to  flush  output  on
receipt of an interrupt or quit signal is shown in Figure 5.


        oob()
        {
                int out = 1+1, mark;
                char waste[BUFSIZ], c;

                signal(SIGURG, oob);
                /* flush local terminal input and output */
                ioctl(1, TIOCFLUSH, (char *)&out);
                for (;;) {
                        /* SIOCATMARK stores an int: nonzero at the mark */
                        if (ioctl(rem, SIOCATMARK, &mark) < 0) {
                                perror("ioctl");
                                break;
                        }
                        if (mark)
                                break;
                        (void) read(rem, waste, sizeof (waste));
                }
                /* read the single byte of out of band data */
                recv(rem, &c, 1, SOF_OOB);
                ...
        }

Figure 5.  Flushing terminal i/o on receipt of out of band data.

_5._2.  _S_i_g_n_a_l_s _a_n_d _p_r_o_c_e_s_s _g_r_o_u_p_s

     Due to the existence of the SIGURG  and  SIGIO  signals
each socket has an associated process group (just as is done
for terminals).  This process group is  initialized  to  the
process  group  of  its  creator,  but may be redefined at a
later time with the SIOCSPGRP ioctl:

        ioctl(s, SIOCSPGRP, &pgrp);

A similar ioctl, SIOCGPGRP, is available for determining the
current process group of a socket.

_5._3.  _P_s_e_u_d_o _t_e_r_m_i_n_a_l_s

     Many programs will not function properly without a ter-
minal  for standard input and output.  Since a socket is not
a terminal, it is often necessary to have a process communi-
cating  over the network do so through a _p_s_e_u_d_o _t_e_r_m_i_n_a_l.  A
pseudo terminal is actually a pair of  devices,  master  and
slave,  which allow a process to serve as an active agent in
communication between processes and users.  Data written  on
the  slave side of a pseudo terminal is supplied as input to
a process reading from the master side.  Data written on the
master  side  is given the slave as input.  In this way, the
process manipulating the master side of the pseudo  terminal
has  control  over  the  information read and written on the
slave side.  The remote login server uses  pseudo  terminals
for  remote  login sessions.  A user logging in to a machine
across the network is provided a shell with a  slave  pseudo
terminal  as  standard input, output, and error.  The server
process then handles the communication between the  programs
invoked by the remote shell and the user's local client pro-
cess.  When a user sends an interrupt or quit  signal  to  a
process executing on a remote machine, the client login pro-
gram traps the signal, sends an out of band message  to  the
server process, which then uses the signal number, sent as the
data value  in  the  out  of  band  message,  to  perform  a
_k_i_l_l_p_g(2) on the appropriate process group.

_5._4.  _I_n_t_e_r_n_e_t _a_d_d_r_e_s_s _b_i_n_d_i_n_g

     Binding addresses to sockets in the Internet domain can
be  fairly complex.  Communicating processes are bound by an
_a_s_s_o_c_i_a_t_i_o_n.   An  association  is  composed  of  local  and
foreign  addresses,  and  local  and  foreign  ports.   Port
numbers are allocated out of separate spaces, one  for  each
Internet  protocol.   Associations  are always unique.  That
is, there may never be duplicate <protocol,  local  address,
local port, foreign address, foreign port> tuples.
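
     The uniqueness rule can be pictured with a small data
structure.  The names struct association and assoc_equal below
are hypothetical, chosen only for illustration; the system keeps
this information in its own internal tables.

```c
#include <netinet/in.h>

/* One Internet domain association, as described in the text. */
struct association {
        int             proto;          /* protocol number */
        struct in_addr  laddr;          /* local address */
        unsigned short  lport;          /* local port */
        struct in_addr  faddr;          /* foreign address */
        unsigned short  fport;          /* foreign port */
};

/* Return 1 if two associations are the same tuple, else 0. */
int
assoc_equal(const struct association *a, const struct association *b)
{
        return (a->proto == b->proto &&
            a->laddr.s_addr == b->laddr.s_addr &&
            a->lport == b->lport &&
            a->faddr.s_addr == b->faddr.s_addr &&
            a->fport == b->fport);
}
```

Because the protocol is part of the tuple, two associations which
agree on both addresses and both ports but use different protocols
are distinct; this is what allows each protocol its own port
space.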

     The bind system call allows a process to  specify  half
of  an  association,  <local address, local port>, while the
connect  and  accept  primitives  are  used  to  complete  a
socket's  association.   Since the association is created in
two steps the association uniqueness  requirement  indicated
above  could  be violated unless care is taken.  Further, it
is unrealistic to expect user programs to always know proper
values  to  use for the local address and local port since a
host may reside on multiple networks and the  set  of  allo-
cated port numbers is not directly accessible to a user.

     To simplify local  address  binding  the  notion  of  a
``wildcard''  address has been provided.  When an address is
specified as INADDR_ANY  (a  manifest  constant  defined  in
<netinet/in.h>),  the system interprets the address as ``any
valid address''.  For  example,  to  bind  a  specific  port
number to a socket, but leave the local address unspecified,
the following code might be used:

        #include <sys/types.h>
        #include <sys/socket.h>
        #include <netinet/in.h>
         ...
        struct sockaddr_in sin;
         ...
        s = socket(AF_INET, SOCK_STREAM, 0);
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = INADDR_ANY;       /* any local address */
        sin.sin_port = htons(MYPORT);           /* network byte order */
        bind(s, (char *)&sin, sizeof (sin));

Sockets with wildcarded local addresses may receive messages
directed to the specified port number and addressed to any
of the possible addresses assigned to a host.  For example, if
a host is on networks 46 and 10, a socket is bound as above,
and an accept call is then performed, the process will be
able to accept connection requests which arrive from either
network 46 or network 10.

     In a similar fashion, a local port may be left unspeci-
fied  (specified  as  zero),  in  which case the system will
select an appropriate port number for it.  For example:

        sin.sin_addr.s_addr = MYADDRESS;
        sin.sin_port = 0;
        bind(s, (char *)&sin, sizeof (sin));
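
     A process which lets the system choose its port will
usually want to learn which port was assigned.  The sketch
below shows one way this might be done with getsockname.  It
assumes a present-day POSIX sockets environment rather than
4.2BSD itself, binds the wildcard address instead of
MYADDRESS so that it is self-contained, and uses the
hypothetical helper name bind_any_port.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Bind a new stream socket to the wildcard address with port 0 and
 * report the port the system chose.  Returns the socket descriptor,
 * or -1 on failure; *portp receives the port in host byte order.
 */
int
bind_any_port(unsigned short *portp)
{
        struct sockaddr_in sin;
        socklen_t len = sizeof (sin);
        int s;

        s = socket(AF_INET, SOCK_STREAM, 0);
        if (s < 0)
                return (-1);
        memset(&sin, 0, sizeof (sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        sin.sin_port = htons(0);        /* let the system choose */
        if (bind(s, (struct sockaddr *)&sin, sizeof (sin)) < 0 ||
            getsockname(s, (struct sockaddr *)&sin, &len) < 0) {
                close(s);
                return (-1);
        }
        *portp = ntohs(sin.sin_port);
        return (s);
}
```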

The system selects the port number based  on  two  criteria.
The first is that ports numbered 0 through 1023 are reserved
for privileged users (i.e. the super user).  The  second  is
that  the  port  number is not currently bound to some other
socket.  In  order  to  find  a  free  port  number  in  the
privileged  range  the  following code is used by the remote
shell server:

        #include <errno.h>
         ...
        struct sockaddr_in sin;
        int lport;
         ...
        lport = IPPORT_RESERVED - 1;    /* start at the top: port 1023 */
        sin.sin_addr.s_addr = INADDR_ANY;
         ...
        for (;;) {
                sin.sin_port = htons((u_short)lport);
                if (bind(s, (caddr_t)&sin, sizeof (sin)) >= 0)
                        break;          /* bound to lport */
                /* give up on any error other than a port in use */
                if (errno != EADDRINUSE && errno != EADDRNOTAVAIL) {
                        perror("socket");
                        break;
                }
                lport--;                /* try the next lower port */
                if (lport == IPPORT_RESERVED/2) {
                        fprintf(stderr, "socket: All ports in use\n");
                        break;
                }
        }

The restriction on allocating ports was imposed to allow
processes  executing  in a ``secure'' environment to perform
authentication based on the  originating  address  and  port
number.
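
     A server relying on this convention might test the
originating port of a connection before trusting it.  The
following is a sketch of such a test, not the code of any
particular server.  The function name from_reserved_port is
hypothetical, the lower bound of IPPORT_RESERVED/2 is an
assumption chosen to match the range searched by the loop
above, and a present-day POSIX environment is assumed.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Return 1 if the peer address is an Internet address whose port
 * satisfies IPPORT_RESERVED/2 <= port < IPPORT_RESERVED, else 0.
 */
int
from_reserved_port(const struct sockaddr_in *from)
{
        unsigned short port = ntohs(from->sin_port);

        if (from->sin_family != AF_INET)
                return (0);
        return (port >= IPPORT_RESERVED / 2 && port < IPPORT_RESERVED);
}
```

The address passed in would normally be the one filled in by
accept.  Such a check is only as trustworthy as the hosts on
the network, which is why the text speaks of a ``secure''
environment.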

     In certain cases the algorithm used by  the  system  in
selecting  port  numbers  is  unsuitable for an application.
This is due to associations being created in a two step pro-
cess.   For  example,  the  Internet file transfer protocol,
FTP, specifies that data connections must  always  originate
from  the  same local port.  However, duplicate associations
are avoided by connecting to different  foreign  ports.   In
this  situation  the  system would disallow binding the same
local address and port number to a socket if a previous data
connection's socket still existed.  To override the default
port selection algorithm, an option call must be performed
prior to address binding:

        setsockopt(s, SOL_SOCKET, SO_REUSEADDR, (char *)0, 0);
        bind(s, (char *)&sin, sizeof (sin));

With the above call, local addresses may be bound which  are
already  in  use.   This  does  not  violate  the uniqueness
requirement as the system still checks at connect time to be
sure  any other sockets with the same local address and port
do not have the same foreign address and port (if an associ-
ation already exists, the error EADDRINUSE is returned).
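
     On later systems derived from this interface, setsockopt
expects a pointer to an integer flag and its length rather
than a null value.  The sketch below shows the call in that
form.  It assumes a present-day POSIX environment, and
allow_reuse is a hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Enable SO_REUSEADDR on socket s; call this before bind.
 * Returns 0 on success, -1 on failure (errno set by setsockopt).
 */
int
allow_reuse(int s)
{
        int on = 1;

        return (setsockopt(s, SOL_SOCKET, SO_REUSEADDR,
            (char *)&on, sizeof (on)));
}
```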

     Local address binding by the system is  currently  done
somewhat  haphazardly  when  a host is on multiple networks.
Logically, one would expect the system  to  bind  the  local
address associated with the network through which a peer was
communicating.  For instance, if the local host is connected
to networks 46 and 10 and the foreign host is on network 32,
and traffic from network 32 were arriving  via  network  10,
the local address to be bound would be the host's address on
network 10, not network 46.  This, unfortunately, is not
always the case.  For reasons too complicated to discuss
here, the local address bound may appear to be chosen at
random.  This property of local address binding will normally
be invisible to users unless the foreign host does not
understand how to reach the address selected*.

_5._5.  _B_r_o_a_d_c_a_s_t_i_n_g _a_n_d _d_a_t_a_g_r_a_m _s_o_c_k_e_t_s

     By using a datagram  socket  it  is  possible  to  send
broadcast  packets  on many networks supported by the system
(the network itself must support the notion of broadcasting;
the  system  provides  no broadcast simulation in software).
Broadcast messages can place a high load on a network  since
they  force every host on the network to service them.  Con-
sequently, the ability to send broadcast  packets  has  been
limited to the super user.

     To send  a  broadcast  message,  an  Internet  datagram
socket should be created:

        s = socket(AF_INET, SOCK_DGRAM, 0);

and at least a port number should be bound to the socket:
_________________________
* For example, if network 46 were unknown to  the  host
on network 32, and the local address were bound to that
located on network 46, then even though a route between
the  two hosts existed through network 10, a connection
would fail.


        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = INADDR_ANY;
        sin.sin_port = htons(MYPORT);
        bind(s, (char *)&sin, sizeof (sin));

Then the message should be addressed as:

        dst.sin_family = AF_INET;
        dst.sin_addr.s_addr = INADDR_ANY;
        dst.sin_port = htons(DESTPORT);

and, finally, a sendto call may be used:

        sendto(s, buf, buflen, 0, &dst, sizeof (dst));


     Received broadcast messages contain the sender's address
and  port (datagram sockets are anchored before a message is
allowed to go out).
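
     A receiver can obtain the sender's address and port with
recvfrom.  The sketch below wraps that call.  It assumes a
present-day POSIX environment, and recv_from_sender is a
hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Receive one datagram on s and record who sent it.
 * Returns the number of bytes received, or -1 on error;
 * *from is filled in with the sender's address and port.
 */
int
recv_from_sender(int s, char *buf, int buflen, struct sockaddr_in *from)
{
        socklen_t fromlen = sizeof (*from);

        return ((int)recvfrom(s, buf, buflen, 0,
            (struct sockaddr *)from, &fromlen));
}
```

A reply may then be directed to the sender by passing the
same address to sendto.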

_5._6.  _S_i_g_n_a_l_s

     Two new signals have been added to the system which may
be  used  in conjunction with the interprocess communication
facilities.   The  SIGURG  signal  is  associated  with  the
existence  of  an ``urgent condition''.  The SIGIO signal is
used with ``interrupt driven i/o'' (not presently implemented).
SIGURG is currently supplied to a process when out of
band data is present at a socket.  If multiple sockets have
out  of  band  data  awaiting delivery, a select call may be
used to determine those sockets with such data.

     An old signal which is useful when constructing server
processes is SIGCHLD.  This signal is delivered to a process
when any of its child processes changes state.  Normally
servers use the signal to ``reap'' child processes after
they exit.  For example, the remote login server loop shown in
Figure 2 may be augmented as follows:


        int reaper();
         ...
        signal(SIGCHLD, reaper);
        listen(f, 10);
        for (;;) {
                int g, len = sizeof (from);

                g = accept(f, &from, &len);
                if (g < 0) {
                        /* EINTR: accept was interrupted by SIGCHLD */
                        if (errno != EINTR)
                                perror("rlogind: accept");
                        continue;
                }
                ...
        }
         ...
        #include <sys/wait.h>
        reaper()
        {
                union wait status;

                /* collect every exited child without blocking */
                while (wait3(&status, WNOHANG, 0) > 0)
                        ;
        }


     If the parent server process fails to  reap  its  chil-
dren, a large number of ``zombie'' processes may be created.
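
     The same reaping loop can be written with the waitpid
call found on present-day POSIX systems.  The sketch below
is offered under that assumption and uses waitpid in place
of wait3; reap_children is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>

/*
 * Collect every child that has already exited, without blocking.
 * Returns the number of children reaped.  Written so that it may be
 * called either from a SIGCHLD handler or from ordinary code.
 */
int
reap_children(void)
{
        int status, count = 0;
        int saved = errno;      /* waitpid may disturb errno */

        while (waitpid(-1, &status, WNOHANG) > 0)
                count++;
        errno = saved;
        return (count);
}
```

Looping until waitpid returns zero or an error matters
because several children may exit before a single SIGCHLD is
delivered.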

