1  Outline

Several approaches to updating information in a DSA -
all have their problems

 o  The bad old ways

    --  Stop DSA - modify EDBs - restart DSA

    --  Modify EDBs while DSA still running

 o  An improvement

    --  The DM (Data Management) tools - management
        over DAP

Whatever the merits of the tools, we should be moving
towards doing all database management over DAP.


2  Are there really alternatives to data management
   tools?

 o  Updating the database files directly is
    unsatisfactory

    --  Have to do this on the directory machine

    --  Creating a ``fresh'' EDB from a data source
        only works if you have a single data source per
        EDB

    --  But if data in any EDB is drawn from more than
        one source (maybe the user him/herself), can't
        just regenerate the EDB

    --  Could rebuild entire EDB database, but that
        requires merging tools if more than one data
        source is used.

    --  EDBs may be replaced with a different database

 o  Conclusions


    --  Initially one can ``get away with'' data
        modification by editing EDB files

    --  Should use DAP to do this

3  Stop DSA - modify EDBs - restart DSA

 o  Simplest approach

 o  Requires DSA downtime

 o  Can thwart replication mechanism (which looks at
    the times EDBs were last updated)


Before stopping a DSA, it is considerate to see if
there are currently any users.  Look at the file
.../quipu.log to see if there are any connections open.
(Note:  a DSA console is currently being written which
will display information on current connections.)

dir> ps x
  PID TT STAT  TIME COMMAND
  158 ?  I     4:25 /usr/etc/ros.quipu

dir> kill 158

dir> vi EDB (or similar)

dir> /usr/etc/ros.quipu >&/dev/null &

Can use a local ``test DSA'' to check the format of the
data.

4  Modify EDBs while DSA still running

 o  Doesn't require DSA downtime

 o  Following steps:
    Bind as the manager
    dsacontrol -lock "c=GB@o=ABC@ou=DEF"
    Edit or replace .../c=GB/o=ABC/ou=DEF/EDB
    dsacontrol -refresh "c=GB@o=ABC@ou=DEF"
    dsacontrol -unlock "c=GB@o=ABC@ou=DEF"
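
The steps above can be collected into a small shell
function.  This is only a sketch: the function name
edb_update_plan and the EDBROOT variable are
hypothetical, it assumes each DN component maps to one
directory level, and it prints the commands rather than
running them.  Binding as the manager is left as a
reminder because the bind arguments are site specific.

```shell
#!/bin/sh
# Hypothetical helper: print the steps for updating one EDB while
# the DSA is running.  Nothing is executed; the output is a plan.
# EDBROOT is an assumed variable naming the top of the EDB tree.
edb_update_plan() {
    dn=$1
    # The DIT hierarchy maps onto UNIX directories: "@" becomes "/"
    path=$(printf '%s\n' "$dn" | sed 's|@|/|g')
    echo "# bind as the manager first"
    echo "dsacontrol -lock \"$dn\""
    echo "# edit or replace ${EDBROOT:-...}/$path/EDB"
    echo "dsacontrol -refresh \"$dn\""
    echo "dsacontrol -unlock \"$dn\""
}
```

Printing the plan first makes it easy to check that the
lock and unlock refer to the same subtree before any
change is made.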

5  Some simple data management tools

 o  dm tools basically a prototype, to evaluate
    techniques, but nothing else has come along ...

 o  Doesn't attempt to solve all the problems (some of
    those not solved are enumerated later)

 o  Tools have been revitalised recently, although
    still have deficiencies

    --  I'll try to fix problems which are reported

    --  We are keen to learn more about the practical
        problems

    --  But, tools aren't officially supported - best
        effort only

 o  Handles some common problems quite well

    --  Adds new entries

    --  Modifies existing entries

    --  Deletes entries


6  How the tools work

 o  Uses very similar syntax to EDB files

 o  Special data file difference tool - dmdiff -
    produces a set of diffs between datafile.old and
    datafile.new

 o  Another tool - crmods - processes these differences
    into a shell/dish script

 o  Run the script - entries not in the Directory are
    added, those in the Directory are modified or
    deleted in line with the changes indicated

7  The bulk data format - dmformat

 o  Very similar to EDB format - differences are

+---------------------------+----------------------------------+
| EDB                       | DMFORMAT                         |
+---------------------------+----------------------------------+
| DIT hierarchy mapped      | Flat file with embedded info     |
| onto UNIX directory       | saying where entries should be   |
| structure                 | loaded in the DIT                |
|                           |                                  |
| Files start with:         | Files don't start with ...       |
| MASTER                    |                                  |
| date in UTC format        |                                  |
|                           |                                  |
|                           | File contains "rootedAt" info    |
|                           |                                  |
|                           | Syntax includes mechanism for    |
|                           | specifying deletion of an entry  |
|                           | / attribute                      |
|                           |                                  |
| Can only represent one    | Can represent information in an  |
| set of sibling entries    | entire subtree or collection of  |
|                           | subtrees                         |
+---------------------------+----------------------------------+


8  dmformat - syntax

Comments may be interspersed throughout the file.  A
comment line begins with a ``#'' character.

rootedAt indicates the parent node in the DIT for
subsequent entries in the file.  Separate a rootedAt
line from entries by one or more blank lines.

A set of entries follows a rootedAt line.  These are
formatted in the same way as in an EDB file:  i.e.  an
entry is a sequence of attribute type-value pairs,
where the first pair is the RDN for the entry.
Entries are separated from other entries by blank
lines.

In addition to the conventional syntax it is possible
to specify deletion of entries and attributes.

 o  Specify entry deletion by prefixing the RDN with
    the ``!''  character.

 o  Specify attribute value deletion by prefixing the
    attribute type=value line with a ``!''  character.

A file can contain information for many DIT subtrees by
including more rootedAt lines.
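
The syntax rules above can be illustrated by a small
awk filter that labels each significant line of a dm
file.  This is a sketch, not part of the dm tools: the
function name classify_dm and the labels it prints are
hypothetical, and it assumes the rules exactly as
stated here (``#'' comments, blank lines between
entries, RDN first, ``!'' for deletion).

```shell
#!/bin/sh
# Hypothetical helper: label each significant line of a dm file
# read from standard input.  Blank lines separate entries and the
# first line of an entry is its RDN.
classify_dm() {
    awk '
    BEGIN              { first = 1 }
    /^#/               { next }               # comment line
    /^[ \t]*$/         { first = 1; next }    # blank: an entry may follow
    /^rootedAt[ \t]*=/ {
        sub(/^rootedAt[ \t]*=[ \t]*/, "")
        print "root " $0
        first = 1
        next
    }
    first && /^!/      { print "delete-entry " substr($0, 2); first = 0; next }
    first              { print "entry " $0; first = 0; next }
    /^!/               { print "delete-value " substr($0, 2); next }
                       { print "value " $0 }
    '
}
```

For example, ``classify_dm < newfile'' gives a quick
view of which entries and values a file would add or
delete before it is passed to crmods.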

9  dmformat - example

#subsequent entries are relative to this point
# in the DIT
rootedAt= c=gb@o=UCL@ou=CS

# add this entry with these attributes
#   if it doesn't already exist
# try to add in these attribute values if
#   the entry already exists
cn=Paul Barker
surname=Barker
telephoneNumber=+44 71 380 7366
objectClass=organizationalPerson & quipuObject & ...

# Add the first telephone number attribute
# value and delete the second
cn=Steve Kille
telephoneNumber=+44 71 380 7294
!telephoneNumber=+44 71 380 1234

# Delete this entry
!cn=Colin Robbins
# don't have to supply attributes, but can
# if you like
!telephoneNumber=+44 71 387 7050 x3688

#subsequent entries are relative to this point
# in the DIT
rootedAt= c=gb@o=UCL@ou=Physics


10  Using the tools

 o  Can be used to load the database initially


    --  Produce a file ``newfile'' of entries to be
        loaded

    --  Make a file of dish operations to effect the
        update
        crmods < newfile

    --  Apply the updates
        sh modfile

 o  Can be used for subsequent amendments

    --  Create a file of difference data
        dmdiff oldfile newfile > difffile

    --  Create a shell/dish script to do the update
        crmods < difffile

    --  Apply the updates
        sh modfile


11  An outline Makefile for using the dm tools

Thanks to Colin Robbins who wrote this for UCL-CS

Processes data in source file dbdata, and applies
changes to the directory.

applied:    modfile
        sh modfile
        date +%y%m%d%H%M%SZ > applied

modfile:    diff.EDB
        ./crmods < $?

diff.EDB:    new.EDB
        ./dmdiff old.EDB new.EDB > $@

new.EDB:    dbdata
        -cp $@ old.EDB
        echo "rootedAt=c=GB@o=UCL@ou=CS" > $@
        echo >> $@
        sort $? | db2quipu >> $@


12  Creating organisational unit entries

 o  Many of the details will be site dependent,
    according to local data formats

 o  The following is an outline tool for creating
    organisational unit entries (assuming a single
    level of OUs under the organisation entry)

#!/bin/sh
# configure some local variables
DSANAME="c=GB@cn=..."
ORGDN="c=GB@o=..."
OBJCLASS="organizationalUnit & quipuNonLeafObject & \
pilotObject"

# use -v so that orgdn is already set when BEGIN runs
nawk -v orgdn="$ORGDN" -v dsaname="$DSANAME" \
    -v objectclass="$OBJCLASS" ' BEGIN {
    printf "rootedAt= %s\n\n", orgdn
}

{
    # customise this according to local data
    print "ou= " $1
    print "telephoneNumber= " $2
    print "masterDSA= " dsaname
    print "objectClass= " objectclass
    print ""
} ' < oudata > oudata.dm

To load ou data using dm tools

crmods < oudata.dm
sh modfile


13  Modifying organisational unit entries

Use dmdiff to work out differences between old and new
versions of the oudata file.
Example output might look as follows:

rootedAt=c=GB@o=University College London

# ATTRIB CHANGES FOR THIS ENTRY
ou=test dept one
telephoneNumber=235
!telephoneNumber=234

# THIS IS A NEW ENTRY
ou=test dept three
telephoneNumber=789
masterDSA=c=GB@cn=Giant Armadillo
objectClass=organizationalUnit & ...

# THIS ENTRY NO LONGER EXTANT
!ou=test dept two
!telephoneNumber=456
!objectClass=organizationalUnit & ...
!masterDSA=c=GB@cn=Giant Armadillo

CAVEATS:

 o  Need to remove any entries in ``test dept two''
    first, as deletion of OUs only possible if they
    contain no children


 o  If the OUs have associated replication information,
    need to modify DSA entries - use dish to edit
    edbinfo attributes.  This could be automated ...

14  Preparing data for use with dm tools

The tools will work more efficiently if the following
guidelines are followed:


 o  Attribute type strings in dm files should be the
    same as those written out by dish when using
    ``showentry -edb''
    In practice this means using the abbreviated
    attribute names as specified in
    $(ETCDIR)/oidtable.at.  E.g.  use ``cn'' rather
    than ``commonName'', and ``mail'' rather than
    ``rfc822Mailbox''.

 o  Be consistent with capitalisation and case in
    general between dm files produced from the various
    sources.

 o  Attribute values with DN syntax should have the
    country name part represented in capitals, as in
    ``c=GB''. This is because Quipu always writes them
    out that way.  In all other cases, Quipu maintains
    the case with which entries' attributes are
    created.
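
The guidelines above could be applied mechanically
before running dmdiff.  The following is a sketch only:
the function name normalise_dm is hypothetical, the
name table holds just the two pairs quoted above (a
real table would be taken from the local oidtable.at),
and any value beginning ``c=XX'' is assumed to be a DN.

```shell
#!/bin/sh
# Hypothetical helper: normalise a dm file read from standard input.
# - long attribute type names are replaced by their short forms
# - the country part of a DN-valued attribute is capitalised
normalise_dm() {
    awk '
    BEGIN {
        # only the pairs quoted in the text; extend as needed
        short["commonname"]    = "cn"
        short["rfc822mailbox"] = "mail"
    }
    # leave comments, blank lines and rootedAt lines alone
    /^#/ || /^[ \t]*$/ || /^rootedAt/ { print; next }
    {
        eq = index($0, "=")
        if (eq == 0) { print; next }
        type  = substr($0, 1, eq - 1)
        value = substr($0, eq + 1)
        bang  = ""
        if (substr(type, 1, 1) == "!") { bang = "!"; type = substr(type, 2) }
        key = tolower(type)
        if (key in short) type = short[key]
        if (match(value, /^[ \t]*[cC]=[A-Za-z][A-Za-z](@|$)/)) {
            p = index(value, "=")
            value = substr(value, 1, p - 2) "c=" \
                    toupper(substr(value, p + 1, 2)) substr(value, p + 3)
        }
        print bang type "=" value
    }'
}
```

Running every source through the same filter also helps
with the consistency point, since all dm files then use
the same attribute names.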

15  Some specific shortcomings of the dm tools

 o  Scale - the shell script, modfile, which crmods
    produces, is very large for substantial amounts of
    data or data differences
    It may be more manageable to split data into a set
    of department files, as for EDBs, and apply a set
    of updates.

 o  Matching of attribute types and attribute values is
    case-sensitive, whereas almost always it should be
    case-independent.
    In practice this is not too much of a problem


    --  At worst, it means that too many
        ``differences'' are discovered

    --  Quipu does the ``right thing'' anyway

 o  No explicit mechanism for renaming entries -
    achieved by deleting entry with old name and
    creating a new entry.
    You may thus discard attribute information which
    has been loaded from another source.
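
For the case-sensitivity point above, it can be useful
to see which ``differences'' are differences of case
only.  The following sketch is not part of the dm
tools; the function name case_only_diffs is
hypothetical, and it compares whole lines rather than
parsed attributes.

```shell
#!/bin/sh
# Hypothetical helper: print lines of NEW that match a line of OLD
# except for letter case.  Usage: case_only_diffs OLD NEW
# Note: awk reads an operand such as c=GB/EDB as a variable
# assignment, so give such paths as ./c=GB/EDB.
case_only_diffs() {
    awk '
    FILENAME == ARGV[1] {            # first file: remember every line
        exact[$0] = 1
        folded[tolower($0)] = $0
        next
    }
    /^#/ || /^[ \t]*$/ { next }      # skip comments and blank lines
    {
        k = tolower($0)
        if (!($0 in exact) && (k in folded))
            print folded[k] " -> " $0
    }' "$1" "$2"
}
```

Each line of output shows the old form, then the new
form, of a line that differs only in case.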


16  Some specific shortcomings of the dm tools (cont)

 o  Tools have no knowledge that entries may be
    mastered by more than one source.
    If an entry is deleted from one source, it will be
    deleted from the Directory even if the entry still
    exists in another source.  This may, or may not, be
    what you want!

 o  No explicit support for maintenance of seeAlso,
    roleOccupant and other attributes which have DN
    syntax.
    All necessary management to avoid ``dangling
    pointers'' must be achieved externally

 o  No support for management of aliases

 o  Updating over DAP can be rather slow for entries
    with large numbers of siblings (in Quipu terms, in
    a large EDB file).
    There is a solution - use the TURBO_DISK option.
    This makes use of GNU's gdbm package.  Consider
    this if you do a lot of updating and you have large
    EDB files.  Read about it in the manual.

17  General data management problems not catered for

 o  Management of data from multiple sources is very
    difficult - no support for merging data from
    different sources, or for consistent deletion.


 o  No framework for discrimination between quality of
    data sources - this must be handled manually

 o  Relying on diffs not really satisfactory - need to
    rebuild database periodically from source data

 o  Naming of entries

18  Naming entries

The dm tools offer no help with naming to the person
maintaining the Directory database.  This administrator
should be aware of at least the following problems:

 o  Two sources may name an entity differently


    source one: P Barker
    source two: Paul Barker

 o  Need to be careful that no duplicate RDNs are
    formed when processing the source data into EDBs or
    dm files.

    --  If building EDBs, Quipu will detect duplicate
        RDNs as it loads its database.

    --  dm tools will perform multiple updates on a
        single entry

 o  Even when loading from a single source, the name
    which is systematically derivable may be
    unsatisfactory.  E.g.
    PHYS & ASTRO
    rather than
    Physics and Astronomy
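
Since the dm tools will not warn about duplicate RDNs,
a check can be run on a dm file before it is given to
crmods.  This is a sketch under the syntax described
earlier; the function name dup_rdns is hypothetical,
and RDNs are compared without regard to case but
otherwise character for character.

```shell
#!/bin/sh
# Hypothetical helper: report RDNs that occur more than once under
# the same rootedAt line in a dm file read from standard input.
dup_rdns() {
    awk '
    BEGIN              { first = 1 }
    /^#/               { next }
    /^[ \t]*$/         { first = 1; next }
    /^rootedAt[ \t]*=/ { root = $0; first = 1; next }
    first {
        first = 0
        rdn = $0
        sub(/^!/, "", rdn)           # a deletion still names the entry
        key = tolower(root) SUBSEP tolower(rdn)
        if (++count[key] == 2)       # report each duplicate once
            print "duplicate RDN: " rdn " (" root ")"
    }'
}
```

No output means that no RDN is repeated within any one
rootedAt block.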

19  Naming entries (cont)

 o  A source's view of what constitutes a department
    may be parochial, suiting particular requirements.
    For example, the UCL telephone directory database
    has the following two departments

    BIOLOGY (DARWIN)
    BIOLOGY (MEDAWAR)

    whereas the University view, which must be
    represented, is that there is just a single
    ``Biology'' department

 o  Need to be careful when joining departments in this
    way that no RDN clashes occur.  If they do occur, a
    solution is to name entries with a multiple value
    RDN.
    cn=Fred Bloggs%ou=Biology (Medawar)
