Anatomy of a repository
=======================

:Author: Edward Z. Yang <ezyang@mit.edu>

Wizard is all about using Git's excellent directed acyclic graph model of
history, both to perform file-system merges and to keep track of user
changes on top of ours.  If you are not familiar with the way Git internally
represents commits, I highly recommend reading
`Git for Computer Scientists <http://eagain.net/articles/git-for-computer-scientists/>`_
first.

Wizard takes a simplified view of upstream: from the point of view of the
pristine branch pointer, history should be a straightforward progression of
versions.  Internal development history is discarded, and there is a one-to-one
mapping between releases and commits.

.. digraph:: pristine_dag

    node [shape=square]
    subgraph cluster_pristine {
        c -> b -> a
        a [label="1.0"]
        b [label="1.1"]
        c [label="2.0"]
        label = "pristine"
        color = white
    }

From here, we build "scriptsified" versions of the application, which
correspond to the master branch.  Every time upstream releases an update,
we import it into our pristine branch, and then merge the changes into
master.

.. digraph:: master_dag

    node [shape=square]
    subgraph cluster_master {
        cs -> bs -> as
        as [label="1.0-scripts"]
        bs [label="1.1-scripts"]
        cs [label="2.0-scripts"]
        label = "master"
        color = white
    }
    subgraph cluster_pristine {
        c -> b -> a
        a [label="1.0"]
        b [label="1.1"]
        c [label="2.0"]
        label = "pristine"
        color = white
    }
    as -> a
    bs -> b
    cs -> c
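
The import-then-merge cycle described above can be sketched as two lists of
Git commands.  This is only an illustration: the helper names
``import_commands`` and ``merge_commands`` are hypothetical, the tag names
simply reuse the labels from the graph, and the real procedure may involve
more steps.

.. code-block:: python

    def import_commands(version):
        """Commands that record an upstream release on the pristine branch.

        Assumes the pristine branch is checked out and its working tree has
        already been replaced with the unpacked upstream release.
        """
        return [
            ["git", "add", "--all"],
            ["git", "commit", "-m", version],
            ["git", "tag", version],              # tag name is an assumption
        ]


    def merge_commands(version):
        """Commands that merge the freshly imported release into master."""
        scripts = version + "-scripts"
        return [
            ["git", "checkout", "master"],
            ["git", "merge", "-m", scripts, "pristine"],
            ["git", "tag", scripts],              # tag name is an assumption
        ]

Each inner list can be passed to ``subprocess.check_call`` from inside the
repository.  If the merge stops on conflicts, they have to be resolved and
committed by hand before the final tag is created.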

If there was an error in a deployed scripts version, you might see a structure
like this:

.. digraph:: scripts2_dag

    node [shape=square]
    subgraph cluster_master {
        cs -> bs2 -> bs -> as
        as [label="1.0-scripts"]
        bs [label="1.1-scripts",style=dashed]
        bs2 [label="1.1-scripts2"]
        cs [label="2.0-scripts"]
        label = "master"
        color = white
    }
    subgraph cluster_pristine {
        c -> b -> a
        a [label="1.0"]
        b [label="1.1"]
        c [label="2.0"]
        label = "pristine"
        color = white
    }
    as -> a
    bs -> b
    cs -> c

But such occasions should be rare.  In this particular graph, ``1.1-scripts`` was
defective, and ``1.1-scripts2`` was the fixed version.

There is another layer to this graph, which is not visible from the canonical
repository: it contains the user's commits and is unique to each user.

.. digraph:: user_dag

    node [shape=square]
    subgraph cluster_user {
        node [shape=ellipse]
        u -> x -> y -> z
        u [style=filled,fillcolor=red,fontcolor=white,color=red]
        label = "master"
        color = white
    }
    subgraph cluster_master {
        bs -> as
        as [label="1.0-scripts"]
        bs [label="1.1-scripts"]
        color = white
    }
    subgraph cluster_pristine {
        b -> a
        a [label="1.0"]
        b [label="1.1"]
        label = "pristine"
        color = white
    }
    as -> a
    bs -> b
    x -> bs
    z -> as

The red node ``u`` represents uncommitted changes that may exist
in a user's checkout at any given time.  The untagged commits
``x``, ``y`` and ``z`` each have a particular story: ``z`` is
the commit generated when the install took place and the user's
specific configuration was versioned.  ``y`` is the pre-upgrade
commit, generated so that we could then perform a merge; ``x`` is
the resulting merge commit, whose parents are ``y`` and ``1.1-scripts``.
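
To make the shape of that merge concrete, here is a toy model of the graph
above in plain Python.  It is a sketch, not Wizard's code: the ``PARENTS``
table is transcribed by hand from the diagram, and ``ancestors`` and
``merge_bases`` are hypothetical helpers that mimic what Git computes when it
looks for a merge base.

.. code-block:: python

    # Parent links copied from the diagram.  The uncommitted node ``u`` is
    # left out because it is not a commit.
    PARENTS = {
        "1.0": [],
        "1.1": ["1.0"],
        "1.0-scripts": ["1.0"],
        "1.1-scripts": ["1.0-scripts", "1.1"],
        "z": ["1.0-scripts"],
        "y": ["z"],
        "x": ["y", "1.1-scripts"],
    }


    def ancestors(commit, parents=PARENTS):
        """Return ``commit`` together with every commit reachable from it."""
        seen = set()
        stack = [commit]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(parents[current])
        return seen


    def merge_bases(a, b, parents=PARENTS):
        """Return the common ancestors of ``a`` and ``b`` that are not
        themselves ancestors of another common ancestor."""
        common = ancestors(a, parents) & ancestors(b, parents)
        return {
            c for c in common
            if not any(c in ancestors(other, parents) - {other}
                       for other in common)
        }

In this model, ``merge_bases("y", "1.1-scripts")`` yields ``1.0-scripts``: the
merge that produces ``x`` compares the user's pre-upgrade state ``y`` and the
new release ``1.1-scripts`` against the version the user originally installed.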

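All user repositories are created with ``--shared`` (as in
``git clone --shared``), so they borrow objects from the canonical repository
and take up almost no space at the very beginning.  However, this also
makes it vitally important that the canonical repository in the scripts
locker not lose revisions: a user repository cannot read history that has
disappeared from it.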
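
Assuming ``--shared`` here refers to ``git clone --shared``, the dependency is
recorded in the clone's ``.git/objects/info/alternates`` file, which lists the
object directories it borrows from, one per line.  The sketch below reads that
file and reports entries that no longer exist; the helper names
``read_alternates`` and ``missing_alternates`` are hypothetical.

.. code-block:: python

    import os


    def read_alternates(repo):
        """Return the object directories that ``repo`` borrows objects from."""
        path = os.path.join(repo, ".git", "objects", "info", "alternates")
        if not os.path.exists(path):
            return []  # a self-contained repository has no alternates file
        with open(path) as handle:
            return [line.strip() for line in handle if line.strip()]


    def missing_alternates(repo):
        """Return alternates that are gone, which would break ``repo``.

        Assumes the entries are absolute paths; relative entries are resolved
        by Git against the object directory and are not handled here.
        """
        return [p for p in read_alternates(repo) if not os.path.isdir(p)]

A non-empty result from ``missing_alternates`` means the user repository can
no longer reach the objects it borrowed.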
