1 Outline

Several approaches to updating information in a DSA - all have their problems

o The bad old ways
  -- Stop DSA - modify EDBs - restart DSA
  -- Modify EDBs while DSA still running
o An improvement
  -- The DM (Data Management) tools - management over DAP

Whatever the merits of the tools, we should be moving towards doing all database management over DAP.

2 Are there really alternatives to data management tools?

o Updating the database files directly is unsatisfactory
  -- Have to do this on the directory machine
  -- Creating a ``fresh'' EDB from a data source only works if you have a single data source per EDB
  -- But if data in any EDB drawn from more than one source (maybe the user him/herself) ... can't just regenerate EDB
  -- Could rebuild entire EDB database, but that requires merging tools if more than one data source is used
  -- EDBs may be replaced with different database
o Conclusions
  -- Initially one can ``get away with'' data modification by editing EDB files
  -- Should use DAP to do this

3 Stop DSA - modify EDBs - restart DSA

o Simplest approach
o Requires DSA downtime
o Can thwart replication mechanism (which looks at the times EDBs last updated)

Before stopping a DSA, it is considerate to see if there are currently any users. Look at the file .../quipu.log to see if there are any connections open. (Note: a DSA console is currently being written which will display information on current connections.)

    dir> ps x
      PID TT STAT  TIME COMMAND
      158 ?  I     4:25 /usr/etc/ros.quipu
    dir> kill 158
    dir> vi EDB                  (or similar)
    dir> /usr/etc/ros.quipu >&/dev/null &

Can use a local ``test DSA'' to check the format of the data.

4 Modify EDBs while DSA still running

o Doesn't require DSA downtime
o Following steps:

    Bind as the manager
    dsacontrol -lock "c=GB@o=ABC@ou=DEF"
    Edit or replace .../c=GB/o=ABC/ou=DEF/EDB
    dsacontrol -refresh "c=GB@o=ABC@ou=DEF"
    dsacontrol -unlock "c=GB@o=ABC@ou=DEF"

5 Some simple data management tools

o dm tools basically a prototype, to evaluate techniques, but nothing else has come along ...
o Don't attempt to solve all the problems (some of those not solved are enumerated later)
o Tools have been revitalised recently, although still have deficiencies
  -- I'll try and fix problems which are reported
  -- We are keen to learn more about the practical problems
  -- But, tools aren't officially supported - best effort only
o Handle some common problems quite well
  -- Add new entries
  -- Modify existing entries
  -- Delete entries

6 How the tools work

o Use very similar syntax to EDB files
o Special data file difference tool - dmdiff - produces a set of diffs between datafile.old and datafile.new
o Another tool - crmods - processes these differences into a shell/dish script
o Run the script - entries not in the Directory are added, those in the Directory are modified or deleted in line with the changes indicated

7 The bulk data format - dmformat

o Very similar to EDB format - differences are

    +-----------------------------+----------------------------------+
    | EDB                         | DMFORMAT                         |
    +-----------------------------+----------------------------------+
    | DIT hierarchy mapped        | Flat file with embedded info     |
    | onto UNIX directory         | saying where entries should be   |
    | structure                   | loaded in the DIT                |
    |                             |                                  |
    | Files start with:           | Files don't start with ...       |
    |   MASTER                    |                                  |
    |   date in UTC format        |                                  |
    |                             |                                  |
    |                             | File contains "rootedAt" info    |
    |                             |                                  |
    |                             | Syntax includes mechanism for    |
    |                             | specifying deletion of an entry /|
    |                             | attribute                        |
    |                             |                                  |
    | Can only represent one set  | Can represent information        |
    | of sibling entries          | in an entire subtree or          |
    |                             | collection of subtrees           |
    +-----------------------------+----------------------------------+

8 dmformat - syntax

Comments may be interspersed throughout the file. A comment line begins with a ``#'' character.

rootedAt indicates the parent node in the DIT for subsequent entries in the file. Separate a rootedAt line from entries by one or more blank lines.

A set of entries follows a rootedAt line. These are formatted in the same way as in an EDB file: i.e. an entry is a sequence of attribute type-value pairs, where the first pair is the RDN for the entry. Entries are separated from other entries by blank lines.

In addition to the conventional syntax it is possible to specify deletion of entries and attributes.

o Specify entry deletion by prefixing the RDN with the ``!'' character.
o Specify attribute value deletion by prefixing the attribute type=value line with a ``!'' character.

A file can contain information for many DIT subtrees by including more rootedAt lines.

9 dmformat - example

    #subsequent entries are relative to this point
    # in the DIT
    rootedAt= c=gb@o=UCL@ou=CS

    # add this entry with these attributes
    # if it doesn't already exist
    # try to add in these attribute values if
    # the entry already exists
    cn=Paul Barker
    surname=Barker
    telephoneNumber=+44 71 380 7366
    objectClass=organizationalPerson & quipuObject & ...

    # Add the first telephone number attribute
    # value and delete the second
    cn=Steve Kille
    telephoneNumber=+44 71 380 7294
    !telephoneNumber=+44 71 380 1234

    # Delete this entry
    !cn=Colin Robbins
    # don't have to supply attributes, but can
    # if you like
    !telephoneNumber=+44 71 387 7050 x3688

    #subsequent entries are relative to this point
    # in the DIT
    rootedAt= c=gb@o=UCL@ou=Physics

10 Using the tools

o Can be used to load the database initially
  -- Produce a file ``newfile'' of entries to be loaded
  -- Make a file of dish operations to effect the update
         crmods < newfile
  -- Apply the updates
         sh modfile
o Can be used for subsequent amendments
  -- Create a file of difference data
         dmdiff oldfile newfile > difffile
  -- Create a shell/dish script to do the update
         crmods < difffile
  -- Apply the updates
         sh modfile

11 An outline Makefile for using the dm tools

Thanks to Colin Robbins who wrote this for UCL-CS

Processes data in source file dbdata, and applies changes to the directory.

    applied: modfile
            sh modfile
            date +%y%m%d%H%M%SZ > applied

    modfile: diff.EDB
            ./crmods < $?

    diff.EDB: new.EDB
            ./dmdiff old.EDB new.EDB > $@

    new.EDB: dbdata
            -cp $@ old.EDB
            echo "rootedAt=c=GB@o=UCL@ou=CS" > $@
            echo >> $@
            sort $? | db2quipu >> $@

12 Creating organisational unit entries

o Many of the details will be site dependent, according to local data formats
o The following is an outline tool for creating organisational unit entries (assuming a single level of OUs under the organisation entry)

    #!/bin/sh
    # configure some local variables
    DSANAME="c=GB@cn=..."
    ORGDN="c=GB@o=..."
    OBJCLASS="organizationalUnit & quipuNonLeafObject & \
    pilotObject"

    # pass the variables with -v so that they are already set
    # when the BEGIN block prints the rootedAt line
    nawk -v orgdn="$ORGDN" -v dsaname="$DSANAME" \
         -v objectclass="$OBJCLASS" '
    BEGIN { printf "rootedAt= %s\n\n", orgdn }
    {
        # customise this according to local data
        print "ou= " $1
        print "telephoneNumber= " $2
        print "masterDSA= " dsaname
        print "objectClass= " objectclass
        print ""
    }
    ' < oudata > oudata.dm

To load ou data using dm tools

    crmods < oudata.dm
    sh modfile

13 Modifying organisational unit entries

Use dmdiff to work out differences between old and new versions of the oudata file. Example output might look as follows:

    rootedAt=c=GB@o=University College London

    # ATTRIB CHANGES FOR THIS ENTRY
    ou=test dept one
    telephoneNumber=235
    !telephoneNumber=234

    # THIS IS A NEW ENTRY
    ou=test dept three
    telephoneNumber=789
    masterDSA=c=GB@cn=Giant Armadillo
    objectClass=organizationalUnit & ...

    # THIS ENTRY NO LONGER EXTANT
    !ou=test dept two
    !telephoneNumber=456
    !objectClass=organizationalUnit & ...
    !masterDSA=c=GB@cn=Giant Armadillo

CAVEATS:

o Need to remove any entries in ``test dept two'' first, as deletion of OUs is only possible if they contain no children
o If the OUs have associated replication information, need to modify DSA entries - use dish to edit edbinfo attributes. This could be automated ...

14 Preparing data for use with dm tools

The tools will work more efficiently if the following guidelines are followed:

o Attribute type strings in dm files should be the same as those written out by dish when using ``showentry -edb''. In practice this means using the abbreviated attribute names as specified in $(ETCDIR)/oidtable.at. E.g. use ``cn'' rather than ``commonName'', and ``mail'' rather than ``rfc822Mailbox''.
o Be consistent with capitalisation and case in general between dm files produced from the various sources.
o Attribute values with DN syntax should have the country name part represented in capitals, as in ``c=GB''. This is because Quipu always writes them out that way.
  In all other cases, Quipu maintains the case with which entries' attributes are created.

15 Some specific shortcomings of the dm tools

o Scale - the shell script, modfile, which crmods produces, is very large for substantial amounts of data or data differences. It may be more manageable to split data into a set of department files, as for EDBs, and apply a set of updates.
o Matching of attribute types and attribute values is case-sensitive, whereas almost always it should be case-independent. In practice this is not too much of a problem
  -- At worst, it means that too many ``differences'' are discovered
  -- Quipu does the ``right thing'' anyway
o No explicit mechanism for renaming entries - achieved by deleting the entry with the old name and creating a new entry. You may thus discard attribute information which has been loaded from another source.

16 Some specific shortcomings of the dm tools (cont)

o Tools have no knowledge that entries may be mastered by more than one source. If an entry is deleted from one source, it will be deleted from the Directory even if the entry still exists in another source. This may, or may not, be what you want!
o No explicit support for maintenance of seeAlso, roleOccupant and other attributes which have DN syntax. All necessary management to avoid ``dangling pointers'' must be achieved externally
o No support for management of aliases
o Updating over DAP can be rather slow for entries with large numbers of siblings (in Quipu terms, in a large EDB file). There is a solution - use the TURBO_DISK option. This makes use of GNU's gdbm package. Consider this if you do a lot of updating and you have large EDB files. Read about it in the manual.

17 General data management problems not catered for

o Management of data from multiple sources is very difficult - no support for merging data from different sources, or for consistent deletion.
o No framework for discrimination between quality of data sources - this must be handled manually
o Relying on diffs not really satisfactory - need to rebuild database periodically from source data
o Naming of entries

18 Naming entries

dm tools offer no help with naming to the person maintaining the Directory database. This administrator should be aware of at least the following problems

o Two sources may name an entity differently

    source one: P Barker
    source two: Paul Barker

o Need to be careful that no duplicate RDNs are formed when processing the source data into EDBs or dm files.
  -- If building EDBs, Quipu will detect multiple RDNs as it loads its database.
  -- dm tools will perform multiple updates on a single entry
o Even in the case where one is loading from a single source, the name which is systematically derivable may be unsatisfactory. E.g. PHYS & ASTRO rather than Physics and Astronomy

19 Naming entries (cont)

o A source's view of what constitutes a department may be parochial, suiting particular requirements. For example, the UCL telephone directory database has the following two departments

    BIOLOGY (DARWIN)
    BIOLOGY (MEDAWAR)

  whereas the University view, which must be represented, is that there is just a single ``Biology'' department
o Need to be careful when joining departments in this way that no RDN clashes occur. If they do occur, a solution is to name entries with a multiple-value RDN.

    cn=Fred Bloggs%ou=Biology (Medawar)
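
Since the dm tools silently apply multiple updates to a single entry, it may help to check a dm file for duplicate RDNs before running crmods. The following is only a sketch of such a check, and is not part of the dm tools. It assumes the dmformat syntax described earlier (``#'' comments, rootedAt lines, blank-line separated entries whose first line is the RDN). The names find_duplicate_rdns, normalise and SAMPLE are hypothetical. Comparison here is case-independent and ignores a leading ``!'', which is a choice of this sketch and not something the tools do.

```python
from collections import Counter

# Hypothetical sample data in dmformat, used only to illustrate the check.
SAMPLE = """\
rootedAt= c=GB@o=UCL@ou=CS

cn=Paul Barker
surname=Barker

cn=Steve Kille
telephoneNumber=+44 71 380 7294

cn=paul barker
telephoneNumber=+44 71 380 7366

rootedAt= c=GB@o=UCL@ou=Physics

cn=Paul Barker
surname=Barker
"""


def normalise(pair):
    """Lower-case a type=value line and tidy whitespace around the '='."""
    attr, _, value = pair.partition("=")
    return attr.strip().lower() + "=" + " ".join(value.split()).lower()


def find_duplicate_rdns(text):
    """Return {(parent_dn, rdn): count} for RDNs seen more than once
    under the same rootedAt parent."""
    counts = Counter()
    parent = None
    at_entry_start = True  # the next data line is the RDN of a new entry
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            at_entry_start = True  # a blank line ends the current entry
            continue
        if line.startswith("#"):
            continue  # comments may appear anywhere, even inside an entry
        if line.lower().startswith("rootedat"):
            _, _, value = line.partition("=")
            parent = value.strip().lower()
            at_entry_start = True
            continue
        if at_entry_start:
            # first pair of an entry is its RDN; drop any deletion marker
            counts[(parent, normalise(line.lstrip("!")))] += 1
            at_entry_start = False
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    for (parent, rdn), n in find_duplicate_rdns(SAMPLE).items():
        print("%s appears %d times under %s" % (rdn, n, parent))
```

In the sample, ``cn=Paul Barker'' under ou=CS is reported because it occurs twice (differing only in case), while the entry of the same name under ou=Physics is not, since RDNs need only be unique among siblings.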