This is monotone.info, produced by makeinfo version 4.8 from monotone.texi.

INFO-DIR-SECTION Programming
START-INFO-DIR-ENTRY
* monotone: (monotone).         Monotone version control system
END-INFO-DIR-ENTRY

File: monotone.info,  Node: Additional Lua Functions,  Prev: Hooks,  Up: Hook Reference

6.2 Additional Lua Functions
============================

This section documents the additional Lua functions made available to
hook writers.

`existonpath(POSSIBLE_COMMAND)'
     This function receives a string containing the name of an external
     program and returns 0 if it exists on the path and is executable,
     -1 otherwise.  As an example, `existonpath("xxdiff")' returns 0 if
     the program xxdiff is available.  On Windows, this function
     automatically appends ".exe" to the program name.  In the previous
     example, `existonpath' would search for "xxdiff.exe".

`get_confdir()'
     Returns the path to the configuration directory, either implied or
     given with `--confdir'.

`get_ostype()'
     Returns the operating system flavor as a string.

`guess_binary_file_contents(FILESPEC)'
     Returns true if the file appears to be binary, i.e. contains one
     or more of the following characters:
          0x00 thru 0x06
          0x0E thru 0x1A
          0x1C thru 0x1F

`include(SCRIPTFILE)'
     This function tries to load and execute the script contained in
     SCRIPTFILE.  It returns true on success and false if there is an
     error.

`includedir(SCRIPTPATH)'
     This function loads and executes, in alphabetical order, all the
     scripts contained in the directory SCRIPTPATH.  If one of the
     scripts has an error, the function does not process the remaining
     scripts and immediately returns false.

`includedirpattern(SCRIPTPATH, PATTERN)'
     This function loads and executes, in alphabetical order, all the
     scripts contained in the directory SCRIPTPATH that match the given
     PATTERN.  If one of the scripts has an error, the function does
     not process the remaining scripts and immediately returns false.
`is_executable(FILESPEC)'
     This function returns true if the file is executable, false
     otherwise.  On Windows this function always returns false.

`kill(PID [, SIGNAL])'
     This function calls the kill() C library function on POSIX
     systems and TerminateProcess on Win32 (in that case PID is the
     process handle).  If the optional SIGNAL parameter is missing,
     SIGTERM will be used.  Returns 0 on success, -1 on error.

`make_executable(FILESPEC)'
     This function marks the named file as executable.  On Windows it
     has no effect.

`match(GLOB, STRING)'
     Returns true if GLOB matches STRING, false otherwise.

`mkstemp(TEMPLATE)'
     Like its C library counterpart, mkstemp creates a unique name and
     returns a file descriptor for the newly created file.  The value
     of TEMPLATE should be a string that consists of contiguous, legal
     file and path name characters followed by six Xs.  The function
     mkstemp replaces the Xs with an alphanumeric sequence chosen to
     ensure that no file in the chosen directory has that name.
     Furthermore, subsequent calls to mkstemp within the same process
     each yield different file names.  Unlike other implementations,
     monotone's mkstemp allows the template string to contain a
     complete path, not only a filename, allowing users to create
     temporary files outside the current directory.

     *Important notice:* To create a temporary file, you must use the
     `temp_file()' function, unless you need to run monotone with the
     `--nostd' option.  `temp_file()' builds on `mkstemp()' and creates
     a file in the standard TMP/TEMP directories.  For the definition
     of `temp_file()', see *Note Default hooks::.

`parse_basic_io(DATA)'
     Parse the string DATA, which should be in basic_io format.  It
     returns nil if it can't parse the string; otherwise it returns a
     table.
     This will be a list of all statements, with each entry being a
     table having a "name" element that is the symbol beginning the
     statement and a "values" element that is a list of all the
     arguments.  For example, given this as input:

          thingy "foo" "bar"
          thingy "baz"
          spork
          frob "oops"

     The output table will be:

          { 1 = { name = "thingy", values = { 1 = "foo", 2 = "bar" } },
            2 = { name = "thingy", values = { 1 = "baz" } },
            3 = { name = "spork",  values = { } },
            4 = { name = "frob",   values = { 1 = "oops" } } }

`regex.search(REGEXP, STRING)'
     Returns true if a match for REGEXP is found in STRING, false
     otherwise.

`sleep(SECONDS)'
     Makes the calling process sleep for the specified number of
     seconds.

`spawn(EXECUTABLE [, ARGS ...])'
     Starts the named executable with the given arguments.  Returns
     the process PID on POSIX systems, the process handle on Win32, or
     -1 if there was an error.  Calls fork/execvp on POSIX,
     CreateProcess on Win32.

     *Important notice:* To spawn a process and wait for its
     completion, use the `execute()' function, unless you need to run
     monotone with the `--nostd' option.  `execute()' builds on
     `spawn()' and `wait()' in a standardized way.

`spawn_pipe(EXECUTABLE [, ARGS ...])'
     Like spawn(), but returns three values: the first two are the
     subprocess's standard input and standard output, and the last is
     the process PID on POSIX systems, the process handle on Win32, or
     -1 if there was an error.

`spawn_redirected(INFILE, OUTFILE, ERRFILE, EXECUTABLE [, ARGS ...])'
     Like spawn(), but with standard input, standard output and
     standard error redirected to the given files.

`wait(PID)'
     Wait until the process with the given PID (process handle on
     Win32) exits.  Returns two values: a result value and the exit
     code of the waited-for process.  The exit code is meaningful only
     if the result value is 0.
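To make the calling conventions above concrete, here is a hedged
sketch of a monotonerc fragment exercising a few of these functions.
The "site_hooks.lua" filename and the use of `/bin/ls' are
illustrative assumptions, not standard names.

```lua
-- Note that existonpath() signals success with 0, not true.
if existonpath("xxdiff") == 0 then
   io.write("xxdiff is available\n")
end

-- include() returns false on error rather than aborting, so an
-- optional, hypothetical site script can be loaded tolerantly:
if not include(get_confdir() .. "/site_hooks.lua") then
   io.write("no site_hooks.lua loaded\n")
end

-- spawn/wait is the pattern that the standard execute() hook wraps:
local pid = spawn("/bin/ls", "-l", "/tmp")
if pid ~= -1 then
   local res, code = wait(pid)
   -- the exit code is meaningful only when res is 0
   if res == 0 then
      io.write("child exited with code " .. code .. "\n")
   end
end
```

These functions exist only inside monotone's embedded Lua interpreter,
so a fragment like this runs as part of a hook file, not standalone.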
File: monotone.info,  Node: Special Topics,  Next: Default hooks,  Prev: Hook Reference,  Up: Top

7 Special Topics
****************

This chapter describes some "special" issues which are not directly
related to monotone's _use_, but which are occasionally of interest to
people researching monotone or trying to learn the specifics of how it
works.  Most users can ignore these sections.

* Menu:

* Internationalization::        Using monotone in non-English locales.
* Hash Integrity::              Notes on probability and failure.
* Rebuilding ancestry::         In case of corruption.
* Mark-Merge::                  The merging algorithm used by Monotone.

File: monotone.info,  Node: Internationalization,  Next: Hash Integrity,  Up: Special Topics

7.1 Internationalization
========================

Monotone initially dealt with only ASCII characters, in file path
names, certificate names, key names, and packets.  Some conservative
extensions are provided to permit internationalized use.  These
extensions can be summarized as follows:

   * Monotone uses GNU gettext to provide localized progress and error
     messages.  Translations may or may not exist for your locale, but
     the infrastructure is present to add them.

   * All command-line arguments are mapped from your local character
     set to UTF-8 before processing.  This means that monotone can
     _only_ handle key names, file names and certificate names which
     map cleanly into UTF-8.

   * Monotone's control files are stored in UTF-8.  This includes:
     revisions and manifests, both inside the database and when written
     to the `_MTN/' directory of the workspace; the `_MTN/options' and
     `_MTN/revision' files.  Converting these files to any other
     character set will cause monotone to break; do not do so.

   * File path names in the workspace are converted to the locale's
     character set (determined via the LANG or CHARSET environment
     variables) before monotone interacts with the file system.
     If you are accustomed to being able to use file names in your
     locale's character set, this should "just work" with monotone.

   * Key and cert names, and similar "name-like" entities are subject
     to some cleaning and normalization, and conversion into
     network-safe subsets of ASCII (typically ACE).  Generally, you
     should be able to use "sensible" strings in your locale's
     character set as names, but they may appear mangled or escaped in
     certain contexts such as network transmission.

   * Monotone's transmission and storage forms are otherwise
     unchanged.  Packets and database contents are 7-bit clean ASCII.

The remainder of this section is a precise specification of monotone's
internationalization behavior.

General Terms
=============

Character set conversion
     The process of mapping a string of bytes representing wide
     characters from one encoding to another.  Per-file character set
     conversions are specified by a Lua hook `get_charset_conv' which
     takes a filename and returns a table of two strings: the first
     represents the "internal" (database) charset, the second
     represents the "external" (file system) charset.

LDH
     Letters, digits, and hyphen: the set of ASCII bytes `0x2D',
     `0x30..0x39', `0x41..0x5A', and `0x61..0x7A'.

stringprep
     RFC 3454, a general framework for mapping, normalizing,
     prohibiting and bidirectionality checking for international names
     prior to use in public network protocols.

nameprep
     RFC 3491, a specific profile of stringprep, used for preparing
     international domain names (IDNs).

punycode
     RFC 3492, a "bootstring" encoding of Unicode into ASCII.

IDNA
     RFC 3490, international domain names for applications, a
     combination of the above technologies (nameprep, punycoding,
     limiting to LDH characters) to form a specific "ASCII compatible
     encoding" (ACE) of Unicode, signified by the presence of an
     "unlikely" ACE prefix string "xn--".  IDNA is intended to make it
     possible to use Unicode relatively "safely" over legacy
     ASCII-based applications.
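A minimal `get_charset_conv' hook might look like the following
sketch.  The `.txt' pattern and the ISO-8859-1 external charset are
illustrative assumptions, not defaults.

```lua
-- Keep database content in UTF-8, but check out files matching *.txt
-- in Latin-1; everything else stays UTF-8 on disk as well.
function get_charset_conv(filename)
   if string.find(filename, "%.txt$") then
      return { "UTF-8", "ISO-8859-1" }
   end
   return { "UTF-8", "UTF-8" }
end
```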
The general picture of an IDNA string is this:

     {ACE-prefix}{LDH-sanitized(punycode(nameprep(UTF-8-string)))}

It is important to understand that IDNA encoding does _not_ preserve
the input string: it both prohibits a wide variety of possible strings
and normalizes non-equal strings to supposedly "equivalent" forms.

By default, monotone does _not_ decode IDNA when printing to the
console (IDNA names are ASCII, which is a subset of UTF-8, so the
normal form conversion can still apply, albeit oddly).  This behavior
is to protect users against security problems associated with
malicious use of "similar-looking" characters.  If the hook
`display_decoded_idna' returns true, IDNA names are decoded for
display.

Filenames
=========

   * Filenames are subject to normal form conversion.

   * Filenames are subject to an additional normal form stage which
     adjusts for platform name semantics, for example changing the
     Windows `0x5C' '\' path separator to `0x2F' '/'.  This extra
     processing is performed by boost::filesystem.

   * FIXME: Monotone does not properly handle case insensitivity on
     Windows.

   * A filename (in normal form) is constrained to be a nonempty
     sequence of path components, separated by byte `0x2F' (ASCII '/'),
     and without a leading or trailing `0x2F'.

   * A path component is a nonempty sequence of any UTF-8 character
     codes except the path separator byte `0x2F' and any ASCII
     "control codes" (`0x00..0x1F' and `0x7F').

   * The path components "." and ".." are prohibited.

   * Manifests and revisions are constructed from the normal form
     (UTF-8).  The LC_COLLATE locale category is _not_ used to sort
     manifest or revision entries.

File contents
=============

   * Files are subject to character set conversion and line ending
     conversion.

   * File SHA1 values are calculated from the internal form of the
     conversions.
     If the external form of a file differs from the internal form,
     running a 3rd party program such as `sha1sum' will produce
     different results than those entries shown in a corresponding
     manifest.

UI messages
===========

UI messages are displayed via calls to `gettext()'.

Host names
==========

Host names are read on the command line and subject to normal form
conversion.  Host names are then split at `0x2E' (ASCII '.'), each
component is subject to IDNA encoding, and the components are
rejoined.  After processing, host names are stored internally as
ASCII.  The invariant is that a host name inside monotone contains
only sequences of LDH separated by `0x2E'.

Cert names
==========

Read on the command line and subject to normal form conversion and
IDNA encoding as a single component.  The invariant is that a cert
name inside monotone is a single LDH ASCII string.

Cert values
===========

Cert values may be either text or binary, depending on the return
value of the hook `cert_is_binary'.  If binary, the cert value is
never printed to the screen (a literal placeholder string is displayed
instead), and is never subjected to line ending or character
conversion.  If text, the cert value is subject to normal form
conversion, as well as having all UTF-8 codes corresponding to ASCII
control codes (`0x00..0x1F' and `0x7F') prohibited in the normal form,
except `0x0A' (ASCII LF).

Var domains
===========

Read on the command line and subject to normal form conversion and
IDNA encoding as a single component.  The invariant is that a var
domain inside monotone is a single LDH ASCII string.

Var names and values
====================

Var names and values are assumed to be text, and subject to normal
form conversion.

Key names
=========

Read on the command line and subject to normal form conversion and
IDNA encoding as an email address (split and joined at '.' and '@'
characters).  The invariant is that a key name inside monotone
contains only LDH, `0x2E' (ASCII '.') and `0x40' (ASCII '@')
characters.
Packets
=======

Packets are 7-bit ASCII.  The characters permitted in packets are the
union of these character sets:

   * The 65 characters of base64 encoding (64 coding + "=" pad).

   * The 16 characters of hex encoding.

   * LDH, '@' and '.' characters, as required for key and cert names.

   * '[' and ']', the packet delimiters.

   * ASCII codes 0x0D (CR), 0x0A (LF), 0x09 (HT), and 0x20 (SP).

File: monotone.info,  Node: Hash Integrity,  Next: Rebuilding ancestry,  Prev: Internationalization,  Up: Special Topics

7.2 Hash Integrity
==================

Some proponents of a competing, proprietary version control system
have suggested, in a Usenix paper
(http://www.usenix.org/events/hotos03/tech/full_papers/henson/henson_html/),
that the use of a cryptographic hash function such as SHA1 as an
identifier for a version is unacceptably unsafe.  This section
addresses the argument presented in that paper and describes
monotone's additional precautions.  To summarize our position:

   * the analysis in the paper is wrong,

   * even if it were right, monotone is sufficiently safe.

The analysis is wrong
=====================

The paper displays a fundamental lack of understanding about what a
_cryptographic_ hash function is, and how it differs from a normal
hash function.  Furthermore it confuses accidental collision with
attack scenarios, and mixes up its analysis of the risk involved in
each.  We will try to untangle these issues here.

A cryptographic hash function such as SHA1 is more than just a
uniform spread of inputs to an output range.  Rather, it must be
designed to withstand attempts at:

   * reversal: deriving an input value from the output

   * collision: finding two different inputs which hash to the same
     output

Collision is the problem the paper is concerned with.  Formally, an
n-bit cryptographic hash should cost 2^n work units to collide
against a given value, and sqrt(2^n) tries to find a random pair of
colliding values.
This latter probability is sometimes called the hash's "birthday
paradox probability".

Accidental collision
--------------------

One way of measuring these bounds is by measuring how single-bit
changes in the input affect bits in the hash output.  The SHA1 hash
has a strong _avalanche property_, which means that flipping _any one
bit_ in the input will cause on average half the 160 bits in the
output code to change.  The fanciful VAL1 hash presented in the paper
does not have such a property -- flipping its first bit when all the
rest are zero causes _no change_ to any of the 160 output bits -- and
is completely unsuited for use as a _cryptographic hash_, regardless
of the general shape of its probability distribution.

The paper also suggests that birthday paradox probability cannot be
used to measure the chance of accidental SHA1 collision on "real
inputs", because birthday paradox probability assumes a uniformly
random sample and "real inputs" are not uniformly random.  The paper
is wrong: the inputs to SHA1 are not what is being measured (and in
any case can be arbitrarily long); the collision probability being
measured is of _output space_.  On output space, the hash is designed
to produce a uniformly random spread, even given nearly identical
inputs.  In other words, it is _a primary design criterion_ of such a
hash that a birthday paradox probability is a valid approximation of
its collision probability.

The paper's characterization of risk when hashing "non-random inputs"
is similarly deceptive.  It presents a fanciful case of a program
which is _storing_ every possible 2kb block in a file system
addressed by SHA1 (the program is trying to find a SHA1 collision).
While this scenario _will_ very likely encounter a collision
_somewhere_ in the course of storing all such blocks, the paper
neglects to mention that we only expect it to collide after storing
about 2^80 of the 2^16384 possible such blocks (not to mention the
requirements for compute time to search, or disk space to store 2^80
2kb blocks).  Noting that monotone can only store 2^41 bytes in a
database, and thus probably some lower number (say 2^32 or so) of
active rows, we consider such birthday paradox probability well out
of practical sight.  Perhaps it will be a serious concern when
multi-yottabyte hard disks are common.

Collision attacks
-----------------

Setting aside accidental collisions, then, the paper's underlying
theme of vulnerability rests on the assertion that someone will break
SHA1.  Breaking a cryptographic hash usually means finding a way to
collide it trivially.  While we note that SHA1 has in fact resisted
attempts at breaking for 8 years already, we cannot say that it will
last forever.  Someone might break it.

We can say, however, that finding a way to trivially collide it only
changes the resistance to _active attack_, rather than the behavior
of the hash on benign inputs.  Therefore the vulnerability is not
that the hash might suddenly cease to address benign blocks well, but
merely that additional security precautions might become a
requirement to ensure that blocks are benign, rather than malicious.
The paper fails to make this distinction, suggesting that a hash
becomes "unusable" when it is broken.  This is plainly not true, as a
number of systems continue to get useful low-collision hashing
behavior -- just not good security behavior -- out of "broken"
cryptographic hashes such as MD4.

Monotone is probably safe anyway
================================

Perhaps our arguments above are unconvincing, or perhaps you are the
sort of person who thinks that practice never lines up with theory.
Fair enough.
Below we present _practical_ procedures you can follow to compensate
for the supposed threats presented in the paper.

Collision attacks
-----------------

A successful collision attack on SHA1, as mentioned, does not disrupt
the _probability_ features of SHA1 on benign blocks.  So if, at any
time, you believe SHA1 is "broken", it does _not_ mean that you
cannot use it for your work with monotone.  It means, rather, that
you cannot base your _trust_ on SHA1 values anymore.  You must trust
who you communicate with.

The way around this is reasonably simple: if you do not trust SHA1 to
prevent malicious blocks from slipping into your communications, you
can always augment it by enclosing your communications in more
security, such as tunnels or additional signatures on your email
posts.  If you choose to do this, you will still have the benefit of
self-identifying blocks; you will simply cease to trust such blocks
unless they come with additional authentication information.

If in the future SHA1 (or, indeed, RSA) becomes accepted as broken,
we will naturally upgrade monotone to a newer hash or public key
scheme, and provide migration commands to recalculate existing
databases based on the new algorithm.  Similarly, if you do not trust
our vigilance in keeping up to date with the cryptography literature,
you can modify monotone to use any stronger hash you like, at the
cost of isolating your own communications to a group using the
modified version.  Monotone is free software, and runs atop `botan',
so it is both legal and relatively simple to change it to use some
other algorithm.

File: monotone.info,  Node: Rebuilding ancestry,  Next: Mark-Merge,  Prev: Hash Integrity,  Up: Special Topics

7.3 Rebuilding ancestry
=======================

As described in *Note Historical records::, monotone revisions
contain the SHA1 hashes of their predecessors, which in turn contain
the SHA1 hashes of _their_ predecessors, and so on until the
beginning of history.
This means that it is _mathematically impossible_ to modify the
history of a revision without some way to defeat SHA1.  This is
generally a good thing; having immutable history is the point of a
version control system, after all, and it turns out to be very
important to building a _distributed_ version control system like
monotone.

It does have one unfortunate consequence, though.  It means that on
the rare occasion where one _needs_ to change a historical revision,
doing so will change the SHA1 of that revision, which will change the
text of its children, which will change their SHA1s, and so on;
basically the entire history graph will diverge from that point
(invalidating all certs in the process).  In practice there are two
situations where this might be necessary:

   * bugs: monotone has occasionally allowed nonsense,
     uninterpretable changesets to be generated and stored in the
     database, and this was not detected until further work had been
     based off of them.

   * advances in crypto: if or when SHA1 is broken, we will need to
     migrate to a different secure hash.

Obviously, we hope neither of these things will happen, and we've
taken lots of precautions against the first recurring; but it is
better to be prepared.  If either of these events occurs, we will
provide migration commands and explain how to use them for the
situation in question; this much is necessarily somewhat
unpredictable.

In the past we've used the (now defunct) `db rebuild' command, and
more recently the `db rosterify' command, for such changes as
monotone developed.  These commands were used to recreate revisions
with new formats.  Because the revision ids changed, all the existing
certs that you trust must also be reissued, signed with your key.(1)
While such commands can reconstruct the ancestry graph in _your_
database, there are practical problems which arise when working in a
distributed work group.
For example, suppose our group consists of the fictional developers
Jim and Beth, and they need to rebuild their ancestry graph.  Jim
performs a rebuild, and sends Beth an email telling her that he has
done so, but the email gets caught by Beth's spam filter, she doesn't
see it, and she blithely syncs her database with Jim's.  This creates
a problem: Jim and Beth have combined the pre-rebuild and
post-rebuild databases.  Their databases now contain two complete,
parallel (but possibly overlapping) copies of their project's
ancestry.  The "bad" old revisions that they were trying to get rid
of are still there, mixed up with the "good" new revisions.

To prevent such messy situations, monotone keeps a table of branch
"epochs" in each database.  An epoch is just a large bit string
associated with a branch.  Initially each branch's epoch is zero.
Most monotone commands ignore epochs; they are relevant in only two
circumstances:

   * When monotone rebuilds ancestry, it generates a new _random_
     epoch for each branch in the database.

   * When monotone runs netsync between databases, it checks to make
     sure that all branches involved in the synchronization have the
     same epochs.  If any epochs differ, the netsync is aborted with
     no changes made to either database.  If either side is seeing a
     branch for the first time, it adopts the epoch of the other side.

Thus, when a user rebuilds their ancestry graph, they select a new
epoch and thus effectively disassociate themselves from the group of
colleagues they had previously been communicating with.  Other
members of that group can then decide whether to follow the
rebuilding user into a new group -- by pulling the newly rebuilt
ancestry -- or to remain behind in the old group.

In our example, if Jim and Beth have epochs, Jim's rebuild creates a
new epoch for their branch in his database.  This causes monotone to
reject netsync operations between Jim and Beth; it doesn't matter if
Beth loses Jim's email.
When she tries to synchronize with him, she receives an error message
indicating that the epochs do not match.  She must then discuss the
matter with Jim and settle on a new course of action -- probably
pulling Jim's database into a fresh database on Beth's end -- before
future synchronizations will succeed.

Best practices
==============

The previous section described the theory and rationale behind
rebuilds and epochs.  Here we discuss the practical consequences of
that discussion.

If you decide you must rebuild your ancestry graph -- generally
because of the announcement of a bug by the monotone developers --
the first thing to do is get everyone to sync their changes with the
central server; if people have unshared changes when the database is
rebuilt, they will have trouble sharing them afterwards.  Next, the
project should pick a designated person to take down the netsync
server, rebuild their database, and put the server back up with the
rebuilt ancestry in it.  Everybody else should then pull this history
into a fresh database, check out again from this database, and
continue working as normal.

In complicated situations, where people have private branches, or
ancestries cross organizational boundaries, matters are more complex.
The basic approach is to do a local rebuild, then, after carefully
examining the new revision IDs to convince yourself that the rebuilt
graph is the same as the upstream subgraph, use the special `db
epoch' commands to force your local epochs to match the upstream
ones.  (You may also want to do some fiddling with certs, to avoid
getting duplicate copies of all of them; if this situation ever
arises in real life we'll figure out how exactly that should work.)
Be very careful when doing this; you're explicitly telling monotone
to let you shoot yourself in the foot, and it will let you.

Fortunately, this process should be extremely rare; with luck, it
will never happen at all.  But this way we're prepared.
---------- Footnotes ----------

(1) Regardless of who originally signed the certs, after the rebuild
they will be signed by you.  This means you should be somewhat
careful when rebuilding, but it is unavoidable -- if you could sign
with other people's keys, that would be a rather serious security
problem!

File: monotone.info,  Node: Mark-Merge,  Prev: Rebuilding ancestry,  Up: Special Topics

7.4 Mark-Merge
==============

Monotone makes use of the Mark-Merge (also known as *-merge)
algorithm.  The emails reproduced below document the algorithm.
Further information can be found at revctrl.org
(http://revctrl.org/MarkMerge).

Initial mark-merge proposal
---------------------------

From: Nathaniel Smith pobox.com>
Subject: [cdv-devel] more merging stuff (bit long...)
Newsgroups: gmane.comp.version-control.codeville.devel,
    gmane.comp.version-control.monotone.devel
Date: 2005-08-06 09:08:09 GMT

I set myself a toy problem a few days ago: is there a really, truly,
right way to merge two heads of an arbitrary DAG, when the object
being merged is as simple as possible: a single scalar value?  I
assume that I'm given a graph, and each node in the graph has a
value, and no other annotation; I can add annotations, but they have
to be derived from the values and topology.  Oh, and I assume that no
revision has more than 2 parents; probably things can be generalized
to the case of indegree 3 or higher, but it seems like a reasonable
restriction...

So, anyway, here's what I came up with.  Perhaps you all can tell me
if it makes sense.

User model
----------

Since the goal was to be "really, truly, right", I had to figure out
what exactly that meant... basically, what I'm calling a "user model"
-- a formal definition of how the user thinks about merging, to give
an operational definition of "should conflict" and "should clean
merge".
My rules are these:

  1) whenever a user explicitly sets the value, they express a claim
     that their setting is superior to the old setting

  2) whenever a user chooses to commit a new revision, they implicitly
     affirm the validity of the decisions that led to that revision's
     parents

  Corollary of (1) and (2): whenever a user explicitly sets the
     value, they express that they consider their new setting to be
     superior to _all_ old settings

  3) A "conflict" should occur if, and only if, the settings on each
     side of the merge express parallel claims.

This in itself is not an algorithm, or anything close to it; the hope
is that it's a good description of what people actually want out of a
merge algorithm, expressed clearly enough that we can create an
algorithm that fits these desiderata.

Algorithm
---------

I'll use slightly novel notation.  Lower case letters represent
values that the scalar takes.  Upper case letters represent nodes in
the graph.

Now, here's an algorithm, that is supposed to just be a transcription
of the above rules, one step more formal:

First, we need to know where users actively expressed an intention.
Intention is defined by (1), above.  We use * to mark where this
occurred:

    i)   a*       graph roots are always marked

         a
   ii)   |        no mark, value was not set
         a

         a
  iii)   |        b != a, so b node marked
         b*

         a   b
   iv)    \ /     c is totally new, so marked
           c*

         a   a
          \ /
           c*

         a   b    we're marking places where users expressed
    v)    \ /     intention; so b should be marked iff this
           b?     was a conflict (!)

         a   a    for now I'm not special-casing the coincidental
   vi)    \ /     clean merge case, so let's consider this to be
           a?     a subclass of (v).

That's all the cases possible.

So, suppose we go through and annotate our graph with *s, using the
above rules; we have a graph with some *s peppered through it, each *
representing one point that a user took action.

Now, a merge algorithm per se:

Let's use *(A) to mean the unique nearest marked ancestor of node A.
Suppose we want to merge A and B.
There are exactly 3 cases:

   - *(A) is an ancestor of B, but not vice versa: B wins.

   - *(B) is an ancestor of A, but not vice versa: A wins.

   - *(A) is _not_ an ancestor of B, and vice versa: conflict,
     escalate to user

Very intuitive, right?  If B supersedes the intention that led to A,
then B should win, and vice versa; if not, the user has expressed two
conflicting intentions, and that, by definition, is a conflict.

This lets us clarify what we mean by "was a conflict" in case (v)
above.  When we have a merge of a and b that gives b, we simply
calculate *(a); if it is an ancestor of 'b', then we're done, but if
it isn't, then we mark the merge node.  (Subtle point: this is
actually not _quite_ the same as detecting whether merging 'a' and
'b' would have given a conflict; if we somehow managed to get a point
in the graph that would have clean merged to 'a', but in fact was
merged to 'b', then this algorithm will still mark the merge node.)
For cases where the two parents differ, you have to do this using the
losing one; for cases where the two parents are the same, you should
check both, because it could have been a clean merge two different
ways.  If *(a1) = *(a2), i.e., both sides have the same nearest
marked ancestor, consider that a clean merge.

That's all.

Examples
--------

Of course, I haven't shown you this is well-defined or anything, but
to draw out the suspense a little, have some worked examples (like
most places in this document, I draw graphs with two leaves and
assume that those are being merged):

graph:    a*
         /  \
        a    b*

result:  *(a) is an ancestor of b, but *(b) is not an ancestor of a;
         clean merge with result 'b'.

graph:    a*
         /  \
        b*   c*

result:  *(b) = b is not an ancestor of c, and *(c) = c is not an
         ancestor of b; conflict.

graph:    a*
         /  \
        b*   c*   <--- these are both marked, by (iii)
        |\   /|
        | \ / |
        |  X  |
        | / \ |
        |/   \|
        b*   c*   <--- which means these were conflicts, and
                       thus marked

result:  the two leaves are both marked, and thus generate a
         conflict, as above.

Right, enough of that.
Math time.

Math
----

Theorem: In a graph marked following the above rules, every node N
  will have a unique least marked ancestor M, and the values of M and
  N will be the same.

Proof: By downwards induction on the graph structure.  The base cases
are graph roots, which by (i) are always marked, so the statement is
trivially true.  Proceeding by cases, (iii) and (iv) are trivially
true, since they produce nodes that are themselves marked.  (ii) is
almost as simple; in a graph 'a' -> 'a', the child obviously inherits
the parent's unique least marked ancestor, which by the inductive
hypothesis exists.  The interesting cases are (v) and (vi):

   a   b
    \ /
     b

If the child is marked, then again the statement is trivial; so
suppose it is not.  By definition, this only occurs when *(a) is an
ancestor of 'b'.  But, by assumption, 'b' has a unique nearest marked
ancestor, whose value is 'b'.  Therefore, *(a) is also an ancestor of
*(b).  If we're in the weird edge case (vi) where a = b, then these
may be the same ancestor, which is fine.  Otherwise, the fact that
a != b, and that *(a)'s value = a's value and *(b)'s value = b's
value, implies that *(a) is a strict ancestor of *(b).  Either way,
the child has a unique least marked ancestor, and it is the same ULMA
as its same-valued parent, so the ULMA also has the right value.  QED.

Corollary: *(N) is a well-defined function.

Corollary: The three cases mentioned in the merge algorithm are the
  only possible cases.  In particular, it cannot be that *(A) is an
  ancestor of B and *(B) is an ancestor of A simultaneously, unless
  the two values being merged are identical (and why are you running
  your merge algorithm then?).  Or in other words: ambiguous clean
  merge does not exist.

Proof: Suppose *(A) is an ancestor of B, and *(B) is an ancestor of A.
*(B) is unique, so *(A) must also be an ancestor of *(B).  Similarly,
*(B) must be an ancestor of *(A).  Therefore *(A) = *(B).  We also
have value(*(A)) = value(A) and value(*(B)) = value(B), which implies
value(A) = value(B).
QED.

Therefore, the above algorithm is well-defined in all possible cases.

We can prove another somewhat interesting fact:

Theorem: If A and B would merge cleanly with A winning, then any
  descendant D of A will also merge cleanly with B, with D winning.

Proof: *(B) is an ancestor of A, and A is an ancestor of D, so *(B)
is an ancestor of D.

I suspect that this is enough to show that clean merges are order
invariant, but I don't have a proof together at the moment.

Not sure what other properties would be interesting to prove; any
suggestions?  It'd be nice to have some sort of proof that "once a
conflict is resolved, you don't have to resolve it again" -- which is
the problem that makes ambiguous clean merge so bad -- but I'm not
sure how to state such a property formally.  Something about it being
possible to fully converge a graph by resolving a finite number of
conflicts, perhaps?

Funky cases
-----------

There are two funky cases I know of.

Coincidental clean merge:

      |
      a
     / \
    b*  b*

Two people independently made the same change.  When we're talking
about textual changes, some people argue this should give a conflict
(reasoning that perhaps the same line _should_ be inserted twice).  In
our context that argument doesn't even apply, because these are just
scalars; so obviously this should be a clean merge.

Currently, the only way this algorithm has to handle this is to treat
it as an "automatically resolved conflict" -- there's a real conflict
here, but the VCS, acting as an agent for the user, may decide to just
go ahead and resolve it, because it knows perfectly well what the user
will do.  In this interpretation everything works fine, and all the
above results apply; it's somewhat dissatisfying, though, because it's
a violation of the user model -- the user has not necessarily looked
at this merge, but we put the * of user assertion on the result
anyway.  Not a show-stopper, I guess...
It's quite possible that the above stuff could be generalized to allow
non-unique least marked ancestors, which could only arise in exactly
this case.  I'm not actually sure what the right semantics would be,
though.  If we're merging:

      |
      a
     / \
    b   b
     \ / \
      b   c

should that be a clean merge?  'b' was set twice, and only one of
those settings was overridden; is that good enough?  Do you still have
the same opinion if the graph is:

      |
      a
      |
      b
     / \
    c   b
    |  / \
    b b   c
     \ /
      b

?  Here the reason for the second setting of 'b' was that a change
away from it was reverted; to make it extra cringe-inducing, I threw
in that the change being reverted was another change to 'c'...  (This
may just be an example of how any merge algorithm has some particular
case you can construct where it will get something wrong, because it
doesn't _actually_ know how to read the users' minds.)

Supporting these cases may irresistibly lead back to ambiguous clean
merge, as well:

       |
       a
      / \
    b*   c*
    / \  / \
  c*   \/   b*
    \  /\  /
     \/  \/
     c    b

The other funky case is this thing (any clever name suggestions?):

     a
    / \
   b*  c*
    \ / \
     c*  d*

Merging here will give a conflict with my algorithm; 3-way merge would
resolve it cleanly.  Polling people on #monotone and #revctrl, the
consensus seems to be that they agree with 3-way merge, but that
giving a conflict is really not _that_ bad.  (It also seems to cause
some funky effects with darcs-merge; see zooko's comments on #revctrl
and darcs-users.)

This is really a problem with the user model, rather than the
algorithm.  Apparently people do not interpret the act of resolving
the b/c merge to be "setting" the result; they seem to interpret it as
"selecting" the result 'c'; the 'c' in the result is in some sense the
"same" 'c' as in the parent.
The difference between "setting" and "selecting" is the universe of
possible options; if you see

   a   b
    \ /
     c

then you figure that the person doing the merge was picking from all
possible resolution values; when you see

   a   b
    \ /
     b

you figure that the user was just picking between the two options
given by the parents.  My user model is too simple to take this into
account.  It's not a huge extension to the model to do so; it's quite
possible that an algorithm could be devised that gave a clean merge
here, perhaps by separately tracking each node's nearest marked
ancestor and the original source of its value as two separate things.

Relation to other work
----------------------

This algorithm is very close to the traditional codeville-merge
approach to this problem; the primary algorithmic difference is the
marking of conflict resolutions as being "changes".  The more
important new material here, I think, is the user model and the
proofs.

Traditionally, merge algorithms are evaluated by coming up with some
set of examples, eyeballing them to make some guess as to what the
"correct" answer is, comparing that to the algorithm's output, and
then arguing with people whose intuitions were different.
Fundamentally, merging is about deterministically guessing the user's
intent in situations where the user has not expressed any intent.
Humans are very good at guessing intent; we have big chunks of squishy
hardware designed to form sophisticated models of others' intents, and
it's completely impossible for a VCS to duplicate that in full.  My
suggestion here, with my "user model", is to seriously and explicitly
study this part of the problem.  There are complicated trade-offs
between accuracy (correctly modeling intention), conservatism
(avoiding incorrectly modeling intention), and implementability
(describing the user's thought processes exactly isn't so useful if
you can't apply the description in practice).
It's hard to make an informed judgement when we don't have a name for
the thing we're trying to optimize, and hard to evaluate an algorithm
when we can't even say what it's supposed to be doing.

I suspect the benefit of the proofs is obvious to anyone who has spent
much time banging their head against this problem; until a few days
ago I was skeptical there _was_ a way to design a merge algorithm that
didn't run into problems like ambiguous clean merge.  I'm still
skeptical, of course, until people read this; merging is like crypto:
you can't trust anything until everyone's tried to break it... so
let's say I'm cautiously optimistic.

If this holds up, I'm quite happy; between the user model and the
proofs, I'm far more confident that this does something sensible in
all cases and has no lurking edge cases than I have been in any
previous algorithm.  The few problem cases I know of display a
pleasing conservatism -- perhaps more cautious than they need to be,
but even if they do cause an occasional unnecessary conflict, once the
conflict is resolved it should stay resolved.

So... do your worst!

-- Nathaniel

-- So let us espouse a less contested notion of truth and falsehood,
even if it is philosophically debatable (if we listen to philosophers,
we must debate everything, and there would be no end to the
discussion).  -- Serendipities, Umberto Eco

Replies and further discussion concerning this email can be found in
the monotone-devel archives
(http://thread.gmane.org/gmane.comp.version-control.monotone.devel/4297).

Improvements to *-merge
-----------------------

From: Nathaniel Smith
Subject: improvements to *-merge
Newsgroups: gmane.comp.version-control.revctrl,
  gmane.comp.version-control.monotone.devel
Date: 2005-08-30 09:21:18 GMT

This is a revised version of *-merge:
  http://thread.gmane.org/gmane.comp.version-control.monotone.devel/4297
that properly handles accidental clean merges.
It does not improve any of the other parts, just the handling of
accidental clean merges.  It shows a way to relax the uniqueness of
the *() operator, while still preserving the basic results from the
above email.  For clarity, I'll say 'unique-*-merge' to refer to the
algorithm given above, and 'multi-*-merge' to refer to this one.

This work is totally due to Timothy Brownawell.  All I did was polish
up the proofs and write it up.  He has a more complex version at:
  http://article.gmane.org/gmane.comp.version-control.monotone.devel/4496
that also attempts to avoid the conflict with:

     a
    / \
   b*  c*
    \ / \
     c*  d*

and has some convergence in it, but the analysis for that is not done.

So:

User model
----------

We keep exactly the same user model as unique-*-merge:

 1) whenever a user explicitly sets the value, they express a claim
    that their setting is superior to the old setting
 2) whenever a user chooses to commit a new revision, they implicitly
    affirm the validity of the decisions that led to that revision's
    parents

 Corollary of (1) and (2): whenever a user explicitly sets the value,
 they express that they consider their new setting to be superior to
 _all_ old settings

 3) A "conflict" should occur if, and only if, the settings on each
    side of the merge express parallel claims.

The difference is that unique-*-merge does not _quite_ fulfill this
model, because in real life your algorithm will automatically resolve
coincidental clean merge cases without asking for user input; but
unique-*-merge is not smart enough to take this into account when
inferring user intentions.

Algorithm
---------

We start by marking the graph of previous revisions.  For each node in
the graph, we either mark it (denoted by a *) or do not.  A mark
indicates our inference that a human expressed an intention at this
node.
         a*     graph roots are always marked
   i)

         a1
   ii)   |      no mark, value was not set
         a2

         a
   iii)  |      b != a, so 'b' node marked
         b*

        a   b
   iv)   \ /    'c' is totally new, so marked
          c*

        a1  a2
         \ /
          c*

        a   b1  we're marking places where users expressed
   v)    \ /    intention; so 'b' should be marked iff this
         b2?    was a conflict

        a1  a2  'a' matches its parents, and so is not marked
   vi)   \ /    (alternatively, we can say this is a special
         a3     case of (v) that is never a conflict)

Case (vi) is the only one that differs from unique-*-merge.  However,
because of it, we must use a new definition of *():

Definition: By *(A), we mean the set of minimal marked ancestors of A.

"Minimal" here is used in the mathematical sense of a node in a graph
that has no descendants in that graph.

Algorithm: Given two nodes to merge, A and B, we consider four cases:
  a) value(A) = value(B): return the shared value
  b) *(A) > B: return value(B)
  c) *(B) > A: return value(A)
  d) else: conflict; escalate to user

Here "*(A) > B" means "all elements of the set *(A) are non-strict
ancestors of the revision B".  The right way to read this is "try (a)
first, and then if that fails try (b), (c), (d) simultaneously".

Note that except for the addition of rule (a), this is a strict
generalization of the unique-* algorithm; if *(A) and *(B) are
single-element sets, then this performs _exactly_ the same
computations as the unique-* algorithm.

Now we can say what we mean by "was a conflict" in case (v) above:
given a -> b2 and b1 -> b2, we leave b2 unmarked if and only if
*(a) > b1.

Examples
--------

1.
       a1*
      /   \
    a2     b*
result: *(a2) = {a1}; a1 > b, so b wins.

2.
       a*
      /  \
     b*   c*
result: *(b) = {b}, *(c) = {c}; neither *(b) > c nor *(c) > b, so
conflict.

3.
       a*
      /  \
    b1*   b2*
      \  /  \
      b3     c1*
result: *(b3) = {b1, b2}; b2 > c1, but b1 is not > c1, so c1 does not
win.  *(c1) = {c1}, which is not > b3.  Conflict.

note: this demonstrates that this algorithm does _not_ do convergence.
Instead, it takes the conservative position that for one node to
silently beat another, the winning node must pre-empt _all_ the
intentions that created the losing node.  While it's easy to come up
with just-so stories where this is the correct thing to do (e.g., b1
and b2 each contain some other changes that independently require 'a'
to become 'b'; c1 will have fixed up b2's changes, but not b1's), this
doesn't actually mean much.  Whether this is good or bad behavior is a
somewhat unresolved question, one that may ultimately be answered by
which merge algorithms turn out to be more tractable...

4.
       a*
      /  \
    b1*   b2*
     |\   /|
     | \ / |
     |  X  |
     | / \ |
     |/   \|
     b3    c*
result: *(b3) = {b1, b2} > c.  *(c) = {c}, which is not > b3.  c wins
cleanly.

5.
        a*
       /  \
     b1*   c1*
     / \   / \
   c2*  \ /  b2*
     \   X   /
      \ / \ /
      c3   b3
result: *(c3) = {c1, c2}; c1 > b3 but c2 is not > b3, so b3 does not
win.  Likewise, *(b3) = {b1, b2}; b1 > c3 but b2 is not > c3, so c3
does not win either.  Conflict.

6.
        a*
       /  \
     b1*   c1*
     / \   / \
   c2*  \ /  b2*
     \   X   /
      \ / \ /
      c3   b3
      |\   /|
      | \ / |
      |  X  |
      | / \ |
      |/   \|
      c4*   b4*

(this was my best effort to trigger an ambiguous clean merge with this
algorithm; it fails pitifully:)

result: *(c4) = {c4}, *(b4) = {b4}, obvious conflict.

Math
----

The interesting thing about this algorithm is that all the unique-*
proofs still go through, in a generalized form.  The key one that
makes *-merge tractable is:

Theorem: In a graph marked by the above rules, given a node N, all
  nodes in *(N) will have the same value as N.

Proof: By induction.  We consider the cases (i)-(vi) above.  (i)
through (iv) are trivially true.  (v) is the interesting one: b2 is
marked when *(a) is not > b1, and b2 being marked makes the statement
trivial, so suppose *(a) > b1.  All elements of *(a) are marked and
are ancestors of b1; therefore, by the definition of *() and
"minimal", they are also all ancestors of things in *(b1).  Thus no
element of *(a) can be a minimal marked ancestor of b2.  (vi) is also
trivial, because *(a3) = *(a1) union *(a2).  QED.
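The four-case decision rule above can be put in code the same way as
before.  This is again my own sketch, not monotone's implementation:
the graph encoding and names are assumed, and the star sets are
supplied directly rather than computed by a marking pass.

```python
def make_is_ancestor(parents):
    """Return a strict-ancestor test over a {node: [parents]} DAG."""
    def is_ancestor(x, y):
        stack, seen = list(parents[y]), set()
        while stack:
            n = stack.pop()
            if n == x:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(parents[n])
        return False
    return is_ancestor

def multi_star_merge(a, b, value, star, is_ancestor):
    """Cases (a)-(d): return the winning value, or None on conflict.
    star[n] is the set of n's minimal marked ancestors; '*(A) > B'
    means every element of star[a] is an ancestor of b."""
    if value[a] == value[b]:                          # (a) shared value
        return value[a]
    if all(is_ancestor(s, b) for s in star[a]):       # (b) *(A) > B
        return value[b]
    if all(is_ancestor(s, a) for s in star[b]):       # (c) *(B) > A
        return value[a]
    return None                                       # (d) conflict
```

Run on examples 1 and 3 above, this gives the stated results: b wins
against a2 because *(a2) = {a1} and a1 > b, while b3 against c1
conflicts because only one element of *(b3) = {b1, b2} is an ancestor
of c1.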
We also have to do a bit of extra work because of the sets:

Corollary 1: If *(A) > B, and some element R of *(B) satisfies R > A,
  then value(A) = value(B).

Proof: Let such an R be given.  R > A, and R marked, imply that there
is some element S of *(A) such that R > S.  On the other hand,
*(A) > B implies that S > B.  By similar reasoning to the above, this
means that there is some element T of *(B) such that S > T.  So,
recapping, we have:

   nodes:   R   >   S   >   T
   from:  *(B)    *(A)    *(B)

*(B) is a set of minimal nodes, yet we have R > T with R and T both in
*(B).  This implies that R = T.  R > S > R implies that S = R, because
we are in a DAG.  Thus value(A) = value(S) = value(R) = value(B).
QED.

Corollary 2: If *(A) > B and *(B) > A, then not only does
  value(A) = value(B), but *(A) = *(B).

Proof: By the above, each element of *(B) is equal to some element of
*(A), and vice versa.

This is good, because it means our algorithm is well-defined.  The
only time options (b) and (c) (in the algorithm) can simultaneously be
true is when the two values being merged are identical to start with.
I.e., there is no anomalous "4th case" of ambiguous clean merge.

Actually, this deserves some more discussion.  With *() returning a
set, there are some more subtle "partial ambiguous clean" cases to
think about -- should we be worrying about cases where some, but not
all, of the marked ancestors are pre-empted?  This is possible, as in
example 5 above:

        a*
       /  \
     b1*   c1*
     / \   / \
   c2*  \ /  b2*
     \   X   /
      \ / \ /
      c3   b3

A hypothetical (convergence-supporting?) algorithm that said A beats B
if _any_ elements of *(A) are > B would give an ambiguous clean merge
on this case.  (Maybe that wouldn't be so bad, so long as we marked
the result, but I'm in no way prepared to do any sort of sufficient
analysis right now...)
The nastiest case of this is where *(A) > B, but some elements of
*(B) are > A -- so we silently make B win, even though it's really
not _quite_ clear that's a good idea, since A also beat B sometimes --
and we're ignoring those users' intentions.  This is the nice thing
about Corollary 1 (and why I didn't just collapse it into Corollary 2)
-- it assures us that the only time this _weak_ form of ambiguous
clean can happen is when A and B already have the same value.  This
_can_ happen, for what it's worth:

        a*
       /|\
      / | \
     /  |  \
    /   |   \
  b1*  b2*   d*
   |\   /\   /
   | \ /  \ /
   |  X   b3*
   | / \   /
   |/   \ /
   b4    b5

Here *(b4) = {b1, b2} and *(b5) = {b1, b3}.  If we ignore for a
moment that b4 and b5 have the same value, this is a merge that b5
would win and b4 would lose (since *(b4) > b5), even though b4
pre-empts one of b5's marked ancestors, namely b1.  However, it can
_only_ happen if we ignore that they have the same value...

The one other thing we proved about unique-*-merge also still applies;
the proof goes through word for word:

Theorem: If A and B would merge cleanly with A winning, then any
  descendant D of A will also merge cleanly with B, with D winning.

Proof: *(B) > A, and A > D, so *(B) > D.

Discussion
----------

This algorithm resolves one of the two basic problems I observed with
unique-*-merge -- coincidental clean merges are now handled, well,
cleanly, and the user model is fully implemented.

However, we still do not handle the unnamed case (you guys totally let
me down when I requested names for this case last time):

     a
    / \
   b*  c*
    \ / \
     c*  d*

which still gives a conflict.  We also, of course, continue to not
support more exotic features like convergence or implicit rollback.

Not the most exciting thing in the world.  OTOH, it does strictly
increase the complexity of the algorithms that are tractable to formal
analysis.

Comments and feedback appreciated.

-- Nathaniel

-- "The problem...is that sets have a very limited range of
activities -- they can't carry pianos, for example, nor drink beer."
Replies and further discussion concerning this email can be found in
the monotone-devel archives
(http://thread.gmane.org/gmane.comp.version-control.revctrl/93).

More on "mark-merge"
--------------------

From: Timothy Brownawell
Subject: more on "mark-merge"
Newsgroups: gmane.comp.version-control.revctrl,
  gmane.comp.version-control.monotone.devel

Prerequisite:
http://thread.gmane.org/gmane.comp.version-control.monotone.devel/4297

A user can make 2 types of merge decisions:
  (1): One parent is better than the other (represented by *)
  (2): Both parents are wrong (represented by ^)

Since there are 2 types of merge decisions, it would be bad to treat
all merge decisions the same.  Also, in the case of merge(a, a) = a,
it is possible for there to be multiple least decision ancestors.

=====

Define:

^(A) is the set of ancestors of A that it gets its value from (found
by setting N = A and iterating N = *(N) until there is no change)

*(A) is the set of least ancestors of A in which the user made a
decision

Note that erase_ancestors(^(A)) = ^(A), and
erase_ancestors(*(A)) = *(A).

=====

& is intersection, | is union

*(A) has the same properties as before, except that it is not a
single ancestor, but a set.
This set can acquire more than one member only in the case of

   Aa  Ba
    \ /
    Ca

where *(A) and *(B) are different; *(C) will be
erase_ancestors(*(A) | *(B)).

The ancestry corollary becomes: any ancestor C of A with
value(C) != value(A) will be an ancestor of at least one member of
*(A).

When merging A and B:

# if one side knows of _all_ places that the other side was chosen,
# it wins
(1) set X = erase_ancestors(*(A) | *(B))
    if X & *(B) = {}, A wins
    if X & *(A) = {}, B wins
    else, X contains members of both *(A) and *(B)

# if one side knows of _all_ places that the other side originated,
# it wins
(2) set Y = erase_ancestors(*(A) | ^(B))
    set Z = erase_ancestors(*(B) | ^(A))
    if Y & ^(B) = {} and Z & ^(A) = {}, conflict
    if Y & ^(B) = {}, A wins
    if Z & ^(A) = {}, B wins

# if one side knows of _any_ places that the other side originated,
# it wins
(3) if Y & ^(B) != ^(B) and Z & ^(A) != ^(A), conflict
    if Y & ^(B) != ^(B), A wins
    if Z & ^(A) != ^(A), B wins

# else, nobody knows anything
(4) conflict

(3) is convergence, and can be safely left out if unwanted.

====

"Funky cases"

Coincidental clean merge does not exist; a mark is only needed when
there is user intervention.

      |
      a
     / \
    b   b
     \ / \
      b   c

and the example after it will resolve cleanly iff (3) is included.

       |
       a
      / \
    b*   c*
    / \  / \
  c*   \/   b*
    \  /\  /
     \/  \/
     c    b

will be a conflict.

     a
    / \
   b*  c*
    \ / \
     c*  d*

This ("the other funky case") is handled by (2), and resolves cleanly.

Tim

Replies and further discussion concerning this email can be found in
the monotone-devel archives
(http://thread.gmane.org/gmane.comp.version-control.revctrl/92).


File: monotone.info,  Node: Default hooks,  Next: General Index,  Prev: Special Topics,  Up: Top

Appendix A Default hooks
************************

This section contains the entire source code of the standard hook
file that is built into the monotone executable and read before any
user hook files (unless `--nostd' is passed).  It contains the default
values for all hooks.
-- this is the standard set of lua hooks for monotone;
-- user-provided files can override it or add to it.

function temp_file(namehint)
   local tdir
   tdir = os.getenv("TMPDIR")
   if tdir == nil then tdir = os.getenv("TMP") end
   if tdir == nil then tdir = os.getenv("TEMP") end
   if tdir == nil then tdir = "/tmp" end
   local filename
   if namehint == nil then
      filename = string.format("%s/mtn.XXXXXX", tdir)
   else
      filename = string.format("%s/mtn.%s.XXXXXX", tdir, namehint)
   end
   local name = mkstemp(filename)
   local file = io.open(name, "r+")
   return file, name
end

function execute(path, ...)
   local pid
   local ret = -1
   pid = spawn(path, unpack(arg))
   if (pid ~= -1) then ret, pid = wait(pid) end
   return ret
end

-- Wrapper around execute to let user confirm in the case where a
-- subprocess returns immediately.
-- This is needed to work around some brokenness with some merge tools
-- (e.g. on OS X).
function execute_confirm(path, ...)
   ret = execute(path, unpack(arg))
   if (ret ~= 0) then
      print(gettext("Press enter"))
   else
      print(gettext("Press enter when the subprocess has completed"))
   end
   io.read()
   return ret
end

-- attributes are persistent metadata about files (such as execute
-- bit, ACLs, various special flags) which we want to have set and
-- re-set any time the files are modified.  the attributes themselves
-- are stored in the roster associated with the revision.  each (f,k,v)
-- attribute triple turns into a call to attr_functions[k](f,v) in lua.
if (attr_init_functions == nil) then
   attr_init_functions = {}
end

attr_init_functions["mtn:execute"] =
   function(filename)
      if (is_executable(filename)) then
         return "true"
      else
         return nil
      end
   end

attr_init_functions["mtn:manual_merge"] =
   function(filename)
      if (binary_file(filename)) then
         return "true" -- binary files must be merged manually
      else
         return nil
      end
   end

if (attr_functions == nil) then
   attr_functions = {}
end

attr_functions["mtn:execute"] =
   function(filename, value)
      if (value == "true") then
         make_executable(filename)
      end
   end

function dir_matches(name, dir)
   -- helper for ignore_file, matching files within dir, or dir itself.
   -- eg for dir of 'CVS', matches CVS/, CVS/*, */CVS/ and */CVS/*
   if (string.find(name, "^" .. dir .. "/")) then return true end
   if (string.find(name, "^" .. dir .. "$")) then return true end
   if (string.find(name, "/" .. dir .. "/")) then return true end
   if (string.find(name, "/" .. dir .. "$")) then return true end
   return false
end

function ignore_file(name)
   -- project specific
   if (ignored_files == nil) then
      ignored_files = {}
      local ignfile = io.open(".mtn-ignore", "r")
      if (ignfile ~= nil) then
         local line = ignfile:read()
         while (line ~= nil) do
            if line ~= "" then
               table.insert(ignored_files, line)
            end
            line = ignfile:read()
         end
         io.close(ignfile)
      end
   end

   for i, line in pairs(ignored_files) do
      local pcallstatus, result =
         pcall(function() return regex.search(line, name) end)
      if pcallstatus == true then
         -- no error from the regex.search call
         if result == true then return true end
      else
         -- regex.search had a problem; warn the user their
         -- .mtn-ignore file syntax is wrong
         io.stderr:write("WARNING: the line '" .. line ..
                         "' in your .mtn-ignore file caused error '" ..
                         result .. "'" ..
                         " while matching filename '" .. name ..
                         "'.\nignoring this regex for all remaining files.\n")
         table.remove(ignored_files, i)
      end
   end

   local file_pats = {
      -- c/c++
      "%.a$", "%.so$", "%.o$", "%.la$", "%.lo$", "^core$",
      "/core$", "/core%.%d+$",
      -- java
      "%.class$",
      -- python
      "%.pyc$", "%.pyo$",
      -- gettext
      "%.g?mo$",
      -- intltool
      "%.intltool%-merge%-cache$",
      -- TeX
      "%.aux$",
      -- backup files
      "%.bak$", "%.orig$", "%.rej$", "%~$",
      -- vim creates .foo.swp files
      "%.[^/]*%.swp$",
      -- emacs creates #foo# files
      "%#[^/]*%#$",
      -- other VCSes (where metadata is stored in named files):
      "%.scc$",
      -- desktop/directory configuration metadata
      "^%.DS_Store$", "/%.DS_Store$", "^desktop%.ini$", "/desktop%.ini$"
   }

   local dir_pats = {
      -- autotools detritus:
      "autom4te%.cache", "%.deps", "%.libs",
      -- Cons/SCons detritus:
      "%.consign", "%.sconsign",
      -- other VCSes (where metadata is stored in named dirs):
      "CVS", "%.svn", "SCCS", "_darcs", "%.cdv", "%.git", "%.bzr", "%.hg"
   }

   for _, pat in ipairs(file_pats) do
      if string.find(name, pat) then return true end
   end
   for _, pat in ipairs(dir_pats) do
      if dir_matches(name, pat) then return true end
   end

   return false;
end

-- return true means "binary", false means "text",
-- nil means "unknown, try to guess"
function binary_file(name)
   -- some known binaries, return true
   local bin_pats = {
      "%.gif$", "%.jpe?g$", "%.png$", "%.bz2$", "%.gz$", "%.zip$",
      "%.class$", "%.jar$", "%.war$", "%.ear$"
   }
   -- some known text, return false
   local txt_pats = {
      "%.cc?$", "%.cxx$", "%.hh?$", "%.hxx$", "%.cpp$", "%.hpp$",
      "%.lua$", "%.texi$", "%.sql$", "%.java$"
   }

   local lowname = string.lower(name)
   for _, pat in ipairs(bin_pats) do
      if string.find(lowname, pat) then return true end
   end
   for _, pat in ipairs(txt_pats) do
      if string.find(lowname, pat) then return false end
   end
   -- unknown - read file and use the guess-binary
   -- monotone built-in function
   return guess_binary_file_contents(name)
end

-- given a file name, return a regular expression which will match
-- lines that name top-level constructs in that file, or "", to
-- disable matching.
function get_encloser_pattern(name)
   -- texinfo has special sectioning commands
   if (string.find(name, "%.texi$")) then
      -- sectioning commands in texinfo: @node, @chapter, @top,
      -- @((sub)?sub)?section, @unnumbered(((sub)?sub)?sec)?,
      -- @appendix(((sub)?sub)?sec)?, @(|major|chap|sub(sub)?)heading
      return ("^@(" ..
              "node|chapter|top" ..
              "|((sub)?sub)?section" ..
              "|(unnumbered|appendix)(((sub)?sub)?sec)?" ..
              "|(major|chap|sub(sub)?)?heading" ..
              ")")
   end
   -- LaTeX has special sectioning commands.  This rule is applied to
   -- ordinary .tex files too, since there's no reliable way to
   -- distinguish those from latex files anyway, and there's no good
   -- pattern we could use for arbitrary plain TeX anyway.
   if (string.find(name, "%.tex$")
       or string.find(name, "%.ltx$")
       or string.find(name, "%.latex$")) then
      return ("\\\\(" ..
              "part|chapter|paragraph|subparagraph" ..
              "|((sub)?sub)?section" ..
              ")")
   end
   -- There's no good way to find section headings in raw text, and
   -- trying just gives distracting output, so don't even try.
   if (string.find(name, "%.txt$")
       or string.upper(name) == "README") then
      return ""
   end
   -- This default is correct surprisingly often -- in pretty much any
   -- text written with code-like indentation.
   return "^[[:alnum:]$_]"
end

function edit_comment(basetext, user_log_message)
   local exe = nil
   if (program_exists_in_path("vi")) then exe = "vi" end
   if (string.sub(get_ostype(), 1, 6) ~= "CYGWIN"
       and program_exists_in_path("notepad.exe")) then
      exe = "notepad.exe"
   end
   local debian_editor = io.open("/usr/bin/editor")
   if (debian_editor ~= nil) then
      debian_editor:close()
      exe = "/usr/bin/editor"
   end
   local visual = os.getenv("VISUAL")
   if (visual ~= nil) then exe = visual end
   local editor = os.getenv("EDITOR")
   if (editor ~= nil) then exe = editor end

   if (exe == nil) then
      io.write("Could not find editor to enter commit message\n" ..
               "Try setting the environment variable EDITOR\n")
      return nil
   end

   local tmp, tname = temp_file()
   if (tmp == nil) then return nil end
   basetext = "MTN: " .. string.gsub(basetext, "\n", "\nMTN: ") .. "\n"
   tmp:write(user_log_message)
   if user_log_message == "" or string.sub(user_log_message, -1) ~= "\n" then
      tmp:write("\n")
   end
   tmp:write(basetext)
   io.close(tmp)

   if (execute(exe, tname) ~= 0) then
      io.write(string.format(gettext("Error running editor '%s' to enter log message\n"),
                             exe))
      os.remove(tname)
      return nil
   end

   tmp = io.open(tname, "r")
   if (tmp == nil) then os.remove(tname); return nil end
   local res = ""
   local line = tmp:read()
   while (line ~= nil) do
      if (not string.find(line, "^MTN:")) then
         res = res .. line .. "\n"
      end
      line = tmp:read()
   end
   io.close(tmp)
   os.remove(tname)
   return res
end

function persist_phrase_ok()
   return true
end

function use_inodeprints()
   return false
end

-- trust evaluation hooks

function intersection(a, b)
   local s = {}
   local t = {}
   for k, v in pairs(a) do s[v] = 1 end
   for k, v in pairs(b) do
      if s[v] ~= nil then table.insert(t, v) end
   end
   return t
end

function get_revision_cert_trust(signers, id, name, val)
   return true
end

function get_manifest_cert_trust(signers, id, name, val)
   return true
end

function get_file_cert_trust(signers, id, name, val)
   return true
end

function accept_testresult_change(old_results, new_results)
   local reqfile = io.open("_MTN/wanted-testresults", "r")
   if (reqfile == nil) then return true end
   local line = reqfile:read()
   local required = {}
   while (line ~= nil) do
      required[line] = true
      line = reqfile:read()
   end
   io.close(reqfile)
   for test, res in pairs(required) do
      if old_results[test] == true and new_results[test] ~= true then
         return false
      end
   end
   return true
end

-- merger support

-- Fields in the mergers structure:
-- cmd : a function that performs the merge operation using the chosen
--       program, best try.
-- available : a function that checks that the needed program is
--       installed and in $PATH
-- wanted : a function that checks whether the user doesn't want to
--       use this method, and returns false if so.  This should
--       normally return true, but in some cases, especially when the
--       merger is really an editor, the user might have a preference
--       in EDITOR and we need to respect that.
--       NOTE: wanted is only used when the user has NOT defined the
--       `merger' variable or the MTN_MERGE environment variable.

mergers = {}

mergers.meld = {
   cmd = function (tbl)
      io.write(string.format("\nWARNING: 'meld' was chosen to perform external 3-way merge.\n" ..
                             "You should merge all changes to *CENTER* file due to limitation of program\n" ..
                             "arguments.\n\n"))
      local path = "meld"
      local ret = execute(path, tbl.lfile, tbl.afile, tbl.rfile)
      if (ret ~= 0) then
         io.write(string.format(gettext("Error running merger '%s'\n"), path))
         return false
      end
      return tbl.afile
   end,
   available = function () return program_exists_in_path("meld") end,
   wanted = function () return true end
}

mergers.tortoise = {
   cmd = function (tbl)
      local path = "tortoisemerge"
      local ret = execute(path,
                          string.format("/base:%s", tbl.afile),
                          string.format("/theirs:%s", tbl.lfile),
                          string.format("/mine:%s", tbl.rfile),
                          string.format("/merged:%s", tbl.outfile))
      if (ret ~= 0) then
         io.write(string.format(gettext("Error running merger '%s'\n"), path))
         return false
      end
      return tbl.outfile
   end,
   available = function () return program_exists_in_path("tortoisemerge") end,
   wanted = function () return true end
}

mergers.vim = {
   cmd = function (tbl)
      io.write(string.format("\nWARNING: 'vim' was chosen to perform external 3-way merge.\n" ..
                             "You should merge all changes to *LEFT* file due to limitation of program\n" ..
                             "arguments.
The order of the files is ancestor, left, right.\n\n")) local vim local exec if os.getenv ("DISPLAY") ~= nil and program_exists_in_path ("gvim") then vim = "gvim" exec = execute_confirm else vim = "vim" exec = execute end local ret = exec(vim, "-f", "-d", "-c", string.format("file %s", tbl.outfile), tbl.afile, tbl.lfile, tbl.rfile) if (ret ~= 0) then io.write(string.format(gettext("Error running merger '%s'\n"), vim)) return false end return tbl.outfile end , available = function () return program_exists_in_path("vim") or program_exists_in_path("gvim") end , wanted = function () local editor = os.getenv("EDITOR") if editor and not (string.find(editor, "vim") or string.find(editor, "gvim")) then return false end return true end } mergers.rcsmerge = { cmd = function (tbl) -- XXX: This is tough - should we check if conflict markers stay or not? -- If so, we should certainly give the user some way to still force -- the merge to proceed since they can appear in the files (and I saw -- that). 
--pasky local merge = os.getenv("MTN_RCSMERGE") if execute(merge, tbl.lfile, tbl.afile, tbl.rfile) == 0 then copy_text_file(tbl.lfile, tbl.outfile); return tbl.outfile end local ret = execute("vim", "-f", "-c", string.format("file %s", tbl.outfile ), tbl.lfile) if (ret ~= 0) then io.write(string.format(gettext("Error running merger '%s'\n"), "vim")) return false end return tbl.outfile end, available = function () local merge = os.getenv("MTN_RCSMERGE") return merge and program_exists_in_path(merge) and program_exists_in_path("vim") end , wanted = function () return os.getenv("MTN_RCSMERGE") ~= nil end } mergers.diffutils = { cmd = function (tbl) local ret = execute( "diff3", "--merge", "--label", string.format("%s [left]", tbl.left_path ), "--label", string.format("%s [ancestor]", tbl.anc_path ), "--label", string.format("%s [right]", tbl.right_path), tbl.lfile, tbl.afile, tbl.rfile ) if (ret ~= 0) then io.write(gettext("Error running GNU diffutils 3-way difference tool 'diff3'\n")) return false end local ret = execute( "sdiff", "--diff-program=diff", "--suppress-common-lines", "--minimal", "--output", tbl.outfile, tbl.lfile, tbl.rfile ) if (ret == 2) then io.write(gettext("Error running GNU diffutils 2-two merging tool 'sdiff'\n")) return false end return tbl.outfile end, available = function () return program_exists_in_path("diff3") and program_exists_in_path("sdiff"); end, wanted = function () return true end } mergers.emacs = { cmd = function (tbl) local emacs if program_exists_in_path("xemacs") then emacs = "xemacs" else emacs = "emacs" end local elisp = "(ediff-merge-files-with-ancestor \"%s\" \"%s\" \"%s\" nil \"%s\")" local ret = execute(emacs, "--eval", string.format(elisp, tbl.lfile, tbl.rfile, tbl.afile, tbl.outfile)) if (ret ~= 0) then io.write(string.format(gettext("Error running merger '%s'\n"), emacs)) return false end return tbl.outfile end, available = function () return program_exists_in_path("xemacs") or program_exists_in_path("emacs") end , 
   wanted =
      function ()
         local editor = os.getenv("EDITOR")
         if editor and
            not (string.find(editor, "emacs") or
                 string.find(editor, "gnu")) then
            return false
         end
         return true
      end
}

mergers.xxdiff = {
   cmd = function (tbl)
      local path = "xxdiff"
      local ret = execute(path,
                          "--title1", tbl.left_path,
                          "--title2", tbl.right_path,
                          "--title3", tbl.merged_path,
                          tbl.lfile, tbl.afile, tbl.rfile,
                          "--merge",
                          "--merged-filename", tbl.outfile,
                          "--exit-with-merge-status")
      if (ret ~= 0) then
         io.write(string.format(gettext("Error running merger '%s'\n"), path))
         return false
      end
      return tbl.outfile
   end,
   available = function () return program_exists_in_path("xxdiff") end,
   wanted = function () return true end
}

mergers.kdiff3 = {
   cmd = function (tbl)
      local path = "kdiff3"
      local ret = execute(path,
                          "--L1", tbl.anc_path,
                          "--L2", tbl.left_path,
                          "--L3", tbl.right_path,
                          tbl.afile, tbl.lfile, tbl.rfile,
                          "--merge",
                          "--o", tbl.outfile)
      if (ret ~= 0) then
         io.write(string.format(gettext("Error running merger '%s'\n"), path))
         return false
      end
      return tbl.outfile
   end,
   available = function () return program_exists_in_path("kdiff3") end,
   wanted = function () return true end
}

mergers.opendiff = {
   cmd = function (tbl)
      local path = "opendiff"
      -- As opendiff immediately returns, let user confirm manually
      local ret = execute_confirm(path,
                                  tbl.lfile, tbl.rfile,
                                  "-ancestor", tbl.afile,
                                  "-merge", tbl.outfile)
      if (ret ~= 0) then
         io.write(string.format(gettext("Error running merger '%s'\n"), path))
         return false
      end
      return tbl.outfile
   end,
   available = function () return program_exists_in_path("opendiff") end,
   wanted = function () return true end
}

function write_to_temporary_file(data, namehint)
   tmp, filename = temp_file(namehint)
   if (tmp == nil) then
      return nil
   end;
   tmp:write(data)
   io.close(tmp)
   return filename
end

function copy_text_file(srcname, destname)
   src = io.open(srcname, "r")
   if (src == nil) then return nil end
   dest = io.open(destname, "w")
   if (dest == nil) then return nil end

   while true do
      local line = src:read()
      if line == nil then break end
      dest:write(line, "\n")
   end

   io.close(dest)
   io.close(src)
end

function read_contents_of_file(filename, mode)
   tmp = io.open(filename, mode)
   if (tmp == nil) then return nil end
   local data = tmp:read("*a")
   io.close(tmp)
   return data
end

function program_exists_in_path(program)
   return existsonpath(program) == 0
end

function get_preferred_merge3_command (tbl)
   local default_order = {"kdiff3", "xxdiff", "opendiff", "tortoise", "emacs",
                          "vim", "meld", "diffutils"}

   local function existmerger(name)
      local m = mergers[name]
      if type(m) == "table" and m.available(tbl) then
         return m.cmd
      end
      return nil
   end

   local function trymerger(name)
      local m = mergers[name]
      if type(m) == "table" and m.available(tbl) and m.wanted(tbl) then
         return m.cmd
      end
      return nil
   end

   -- Check if there's a merger given by the user.
   local mkey = os.getenv("MTN_MERGE")
   if not mkey then mkey = merger end
   if not mkey and os.getenv("MTN_RCSMERGE") then mkey = "rcsmerge" end

   -- If there was a user-given merger, see if it exists.  If it does, return
   -- the cmd function.  If not, return nil.
   local c
   if mkey then c = existmerger(mkey) end
   if c then return c,mkey end
   if mkey then return nil,mkey end

   -- If there wasn't any user-given merger, take the first that's available
   -- and wanted.
   for _,mkey in ipairs(default_order) do
      c = trymerger(mkey) ; if c then return c,nil end
   end
end

function merge3 (anc_path, left_path, right_path, merged_path,
                 ancestor, left, right)
   local ret = nil
   local tbl = {}

   tbl.anc_path = anc_path
   tbl.left_path = left_path
   tbl.right_path = right_path
   tbl.merged_path = merged_path
   tbl.afile = nil
   tbl.lfile = nil
   tbl.rfile = nil
   tbl.outfile = nil
   tbl.meld_exists = false
   tbl.lfile = write_to_temporary_file (left, "left")
   tbl.afile = write_to_temporary_file (ancestor, "ancestor")
   tbl.rfile = write_to_temporary_file (right, "right")
   tbl.outfile = write_to_temporary_file ("", "merged")

   if tbl.lfile ~= nil and tbl.rfile ~= nil and
      tbl.afile ~= nil and tbl.outfile ~= nil
   then
      local cmd,mkey = get_preferred_merge3_command (tbl)
      if cmd ~= nil then
         io.write (string.format(gettext("executing external 3-way merge command\n")))
         ret = cmd (tbl)
         if not ret then
            ret = nil
         else
            ret = read_contents_of_file (ret, "r")
            if string.len (ret) == 0 then
               ret = nil
            end
         end
      else
         if mkey then
            io.write (string.format("The possible commands for the "..mkey.." merger aren't available.\n"..
                "You may want to check that $MTN_MERGE or the lua variable `merger' is set\n"..
                "to something available.  If you want to use vim or emacs, you can also\n"..
                "set $EDITOR to something appropriate"))
         else
            io.write (string.format("No external 3-way merge command found.\n"..
                "You may want to check that $EDITOR is set to an editor that supports 3-way\n"..
                "merge, set this explicitly in your get_preferred_merge3_command hook,\n"..
                "or add a 3-way merge program to your path.\n\n"))
         end
      end
   end

   os.remove (tbl.lfile)
   os.remove (tbl.rfile)
   os.remove (tbl.afile)
   os.remove (tbl.outfile)

   return ret
end

-- expansion of values used in selector completion

function expand_selector(str)
   -- something which looks like a generic cert pattern
   if string.find(str, "^[^=]*=.*$") then
      return ("c:" .. str)
   end
   -- something which looks like an email address
   if string.find(str, "[%w%-_]+@[%w%-_]+") then
      return ("a:" .. str)
   end
   -- something which looks like a branch name
   if string.find(str, "[%w%-]+%.[%w%-]+") then
      return ("b:" .. str)
   end
   -- a sequence of nothing but hex digits
   if string.find(str, "^%x+$") then
      return ("i:" .. str)
   end
   -- tries to expand as a date
   local dtstr = expand_date(str)
   if dtstr ~= nil then
      return ("d:" .. dtstr)
   end
   return nil
end

-- expansion of a date expression

function expand_date(str)
   -- simple date patterns
   if string.find(str, "^19%d%d%-%d%d") or
      string.find(str, "^20%d%d%-%d%d")
   then
      return (str)
   end
   -- "now"
   if str == "now" then
      local t = os.time(os.date('!*t'))
      return os.date("%FT%T", t)
   end
   -- "today" doesn't use the time
   -- (for xgettext's sake, an extra quote: ')
   if str == "today" then
      local t = os.time(os.date('!*t'))
      return os.date("%F", t)
   end
   -- "yesterday", the source of all hangovers
   if str == "yesterday" then
      local t = os.time(os.date('!*t'))
      return os.date("%F", t - 86400)
   end
   -- "CVS style" relative dates such as "3 weeks ago"
   local trans = {
      minute = 60;
      hour = 3600;
      day = 86400;
      week = 604800;
      month = 2678400;
      year = 31536000
   }
   local pos, len, n, type = string.find(str, "(%d+) ([minutehordaywk]+)s? ago")
   if trans[type] ~= nil then
      local t = os.time(os.date('!*t'))
      if trans[type] <= 3600 then
         return os.date("%FT%T", t - (n * trans[type]))
      else
         return os.date("%F", t - (n * trans[type]))
      end
   end
   return nil
end

external_diff_default_args = "-u"

-- default external diff, works for gnu diff
function external_diff(file_path, data_old, data_new, is_binary, diff_args, rev_old, rev_new)
   local old_file = write_to_temporary_file(data_old);
   local new_file = write_to_temporary_file(data_new);

   if diff_args == nil then diff_args = external_diff_default_args end
   execute("diff", diff_args,
           "--label", file_path .. "\told", old_file,
           "--label", file_path .. "\tnew", new_file);

   os.remove (old_file);
   os.remove (new_file);
end

-- netsync permissions hooks (and helper)

function globish_match(glob, str)
   local pcallstatus, result = pcall(function()
         if (globish.match(glob, str)) then return true else return false end
      end)
   if pcallstatus == true then
      -- no error
      return result
   else
      -- globish.match had a problem
      return nil
   end
end

function get_netsync_read_permitted(branch, ident)
   local permfile = io.open(get_confdir() .. "/read-permissions", "r")
   if (permfile == nil) then return false end
   local dat = permfile:read("*a")
   io.close(permfile)
   local res = parse_basic_io(dat)
   if res == nil then
      io.stderr:write("file read-permissions cannot be parsed\n")
      return false
   end

   local matches = false
   local cont = false
   for i, item in pairs(res) do
      -- legal names: pattern, allow, deny, continue
      if item.name == "pattern" then
         if matches and not cont then return false end
         matches = false
         cont = false
         for j, val in pairs(item.values) do
            if globish_match(val, branch) then matches = true end
         end
      elseif item.name == "allow" then
         if matches then
            for j, val in pairs(item.values) do
               if val == "*" then return true end
               if val == "" and ident == nil then return true end
               if globish_match(val, ident) then return true end
            end
         end
      elseif item.name == "deny" then
         if matches then
            for j, val in pairs(item.values) do
               if val == "*" then return false end
               if val == "" and ident == nil then return false end
               if globish_match(val, ident) then return false end
            end
         end
      elseif item.name == "continue" then
         if matches then
            cont = true
            for j, val in pairs(item.values) do
               if val == "false" or val == "no" then cont = false end
            end
         end
      elseif item.name ~= "comment" then
         io.stderr:write("unknown symbol in read-permissions: " ..
                         item.name .. "\n")
         return false
      end
   end
   return false
end

function get_netsync_write_permitted(ident)
   local permfile = io.open(get_confdir() ..
"/write-permissions", "r") if (permfile == nil) then return false end local matches = false local line = permfile:read() while (not matches and line ~= nil) do local _, _, ln = string.find(line, "%s*([^%s]*)%s*") if ln == "*" then matches = true end if globish_match(ln, ident) then matches = true end line = permfile:read() end io.close(permfile) return matches end -- This is a simple function which assumes you're going to be spawning -- a copy of mtn, so reuses a common bit at the end for converting -- local args into remote args. You might need to massage the logic a -- bit if this doesn't fit your assumptions. function get_netsync_connect_command(uri, args) local argv = nil if uri["scheme"] == "ssh" and uri["host"] and uri["path"] then argv = { "ssh" } if uri["user"] then table.insert(argv, "-l") table.insert(argv, uri["user"]) end if uri["port"] then table.insert(argv, "-p") table.insert(argv, uri["port"]) end -- ssh://host/~/dir/file.mtn or -- ssh://host/~user/dir/file.mtn should be home-relative if string.find(uri["path"], "^/~") then uri["path"] = string.sub(uri["path"], 2) end table.insert(argv, uri["host"]) end if uri["scheme"] == "file" and uri["path"] then argv = { } end if argv then table.insert(argv, get_mtn_command(uri["host"])) if args["debug"] then table.insert(argv, "--debug") else table.insert(argv, "--quiet") end table.insert(argv, "--db") table.insert(argv, uri["path"]) table.insert(argv, "serve") table.insert(argv, "--stdio") table.insert(argv, "--no-transport-auth") end return argv end function use_transport_auth(uri) if uri["scheme"] == "ssh" or uri["scheme"] == "file" then return false else return true end end function get_mtn_command(host) return "mtn" end  File: monotone.info, Node: General Index, Prev: Default hooks, Up: Top General Index ************* [index] * Menu: * accept_testresult_change (OLD_RESULTS, NEW_RESULTS): Hooks. (line 546) * attr_functions [ATTRIBUTE] (FILENAME, VALUE): Hooks. 
          (line 703)
* attr_init_functions [ATTRIBUTE] (FILENAME): Hooks. (line 726)
* edit_comment (COMMENTARY, USER_LOG_MESSAGE): Hooks. (line 212)
* existonpath(POSSIBLE_COMMAND): Additional Lua Functions. (line 9)
* expand_date (STR): Hooks. (line 680)
* expand_selector (STR): Hooks. (line 672)
* external_diff (FILE_PATH, OLD_DATA, NEW_DATA, IS_BINARY,: Hooks. (line 598)
* get_author (BRANCHNAME, KEYPAIR_ID): Hooks. (line 186)
* get_branch_key (BRANCHNAME): Hooks. (line 167)
* get_confdir(): Additional Lua Functions. (line 17)
* get_encloser_pattern (FILE_PATH): Hooks. (line 583)
* get_mtn_command(HOST): Hooks. (line 470)
* get_netsync_connect_command (URI, ARGS): Hooks. (line 361)
* get_netsync_read_permitted (BRANCH, IDENTITY): Hooks. (line 280)
* get_netsync_write_permitted (IDENTITY): Hooks. (line 323)
* get_ostype(): Additional Lua Functions. (line 21)
* get_passphrase (KEYPAIR_ID): Hooks. (line 177)
* get_preferred_merge3_command(TBL): Hooks. (line 653)
* get_revision_cert_trust (SIGNERS, ID, NAME, VAL): Hooks. (line 500)
* guess_binary_file_contents(FILESPEC): Additional Lua Functions. (line 24)
* ignore_branch (BRANCHNAME): Hooks. (line 265)
* ignore_file (FILENAME): Hooks. (line 251)
* include(SCRIPTFILE): Additional Lua Functions. (line 31)
* includedir(SCRIPTPATH): Additional Lua Functions. (line 36)
* includedirpattern(SCRIPTPATH, PATTERN): Additional Lua Functions. (line 42)
* is_executable(FILESPEC): Additional Lua Functions. (line 49)
* kill(PID [, SIGNAL]): Additional Lua Functions. (line 53)
* make_executable(FILESPEC): Additional Lua Functions. (line 59)
* match(GLOB, STRING): Additional Lua Functions. (line 63)
* merge3 (ANCESTOR_PATH, LEFT_PATH, RIGHT_PATH, MERGED_PATH, ANCESTOR_TEXT, LEFT_TEXT, RIGHT_TEXT): Hooks. (line 631)
* mkstemp(TEMPLATE): Additional Lua Functions. (line 66)
* mtn --branch=BRANCHNAME checkout DIRECTORY: Tree. (line 19)
* mtn --branch=BRANCHNAME co DIRECTORY: Tree. (line 20)
* mtn [--bookkeep-only] drop PATHNAME...: Workspace.
          (line 44)
* mtn [--bookkeep-only] mv SRC DST: Workspace. (line 72)
* mtn [--bookkeep-only] mv SRC1 ... DST/: Workspace. (line 74)
* mtn [--bookkeep-only] rename SRC DST: Workspace. (line 70)
* mtn [--bookkeep-only] rename SRC1 ... DST/: Workspace. (line 73)
* mtn [--no-respect-ignore] mkdir DIRECTORY...: Workspace. (line 37)
* mtn add --unknown: Workspace. (line 20)
* mtn add PATHNAME...: Workspace. (line 18)
* mtn annotate [--revision=ID] [--brief] FILE: Informative. (line 69)
* mtn annotate FILE: Informative. (line 68)
* mtn approve ID: Certificate. (line 15)
* mtn automate ancestors REV1 [REV2 [...]]: Automation. (line 67)
* mtn automate ancestry_difference NEW [OLD1 [OLD2 [...]]]: Automation. (line 323)
* mtn automate attributes FILE: Automation. (line 1221)
* mtn automate branches: Automation. (line 417)
* mtn automate cert REVISION NAME VALUE: Automation. (line 1820)
* mtn automate certs ID: Automation. (line 781)
* mtn automate children REV: Automation. (line 193)
* mtn automate common_ancestors REV1 [REV2 [...]]: Automation. (line 99)
* mtn automate content_diff [--revision=ID1 [--revision=ID2]] [FILES ...]: Automation. (line 1298)
* mtn automate db_get DOMAIN NAME: Automation. (line 1698)
* mtn automate db_put DOMAIN NAME VALUE: Automation. (line 1724)
* mtn automate descendents REV1 [REV2 [...]]: Automation. (line 161)
* mtn automate erase_ancestors [REV1 [REV2 [...]]]: Automation. (line 258)
* mtn automate get_base_revision_id: Automation. (line 1031)
* mtn automate get_content_changed ID FILE: Automation. (line 1623)
* mtn automate get_corresponding_path SOURCE_ID FILE TARGET_ID: Automation. (line 1656)
* mtn automate get_current_revision_id: Automation. (line 1056)
* mtn automate get_file ID: Automation. (line 1352)
* mtn automate get_file_of FILENAME [--revision=ID]: Automation. (line 1380)
* mtn automate get_manifest_of: Automation. (line 1082)
* mtn automate get_manifest_of REVID: Automation. (line 1084)
* mtn automate get_option OPTION: Automation.
          (line 1413)
* mtn automate get_revision: Automation. (line 936)
* mtn automate get_revision ID: Automation. (line 938)
* mtn automate graph: Automation. (line 222)
* mtn automate heads [BRANCH]: Automation. (line 39)
* mtn automate identify PATH: Automation. (line 538)
* mtn automate interface_version: Automation. (line 10)
* mtn automate inventory: Automation. (line 564)
* mtn automate keys: Automation. (line 1437)
* mtn automate leaves: Automation. (line 359)
* mtn automate packet_for_certs ID: Automation. (line 1508)
* mtn automate packet_for_fdata ID <1>: Automation. (line 1564)
* mtn automate packet_for_fdata ID: Packet I/O. (line 26)
* mtn automate packet_for_fdelta FROM-ID TO-ID: Automation. (line 1592)
* mtn automate packet_for_fdelta ID1 ID2: Packet I/O. (line 34)
* mtn automate packet_for_rdata ID <1>: Automation. (line 1478)
* mtn automate packet_for_rdata ID: Packet I/O. (line 27)
* mtn automate packets_for_certs ID: Packet I/O. (line 20)
* mtn automate parents REV: Automation. (line 132)
* mtn automate put_file [BASE-ID] CONTENTS: Automation. (line 1747)
* mtn automate put_revision REVISION-DATA: Automation. (line 1773)
* mtn automate roots: Automation. (line 391)
* mtn automate select SELECTOR: Automation. (line 512)
* mtn automate stdio: Automation. (line 853)
* mtn automate tags [BRANCH_PATTERN]: Automation. (line 442)
* mtn automate toposort [REV1 [REV2 [...]]]: Automation. (line 292)
* mtn cat --revision=ID PATH: Tree. (line 7)
* mtn cat PATH: Tree. (line 6)
* mtn cert ID CERTNAME: Certificate. (line 6)
* mtn cert ID CERTNAME CERTVAL: Certificate. (line 7)
* mtn checkout --revision=ID DIRECTORY: Tree. (line 17)
* mtn ci: Workspace. (line 88)
* mtn ci --message-file=LOGFILE: Workspace. (line 92)
* mtn ci --message-file=LOGFILE PATHNAME...: Workspace. (line 98)
* mtn ci --message=LOGMSG [--message=LOGMSG...]: Workspace. (line 90)
* mtn ci --message=LOGMSG [--message=LOGMSG...] PATHNAME...: Workspace. (line 96)
* mtn ci PATHNAME...: Workspace.
          (line 94)
* mtn clone --branch=BRANCHNAME ADDRESS DIRECTORY: Tree. (line 54)
* mtn co --revision=ID DIRECTORY: Tree. (line 18)
* mtn comment ID: Certificate. (line 20)
* mtn comment ID COMMENT: Certificate. (line 21)
* mtn commit: Workspace. (line 86)
* mtn commit --message-file=LOGFILE: Workspace. (line 91)
* mtn commit --message-file=LOGFILE PATHNAME...: Workspace. (line 97)
* mtn commit --message=LOGMSG [--message=LOGMSG...]: Workspace. (line 89)
* mtn commit --message=LOGMSG [--message=LOGMSG...] PATHNAME...: Workspace. (line 95)
* mtn commit PATHNAME...: Workspace. (line 93)
* mtn complete [--brief] key PARTIAL-ID: Informative. (line 82)
* mtn complete [--brief] revision PARTIAL-ID: Informative. (line 83)
* mtn complete file PARTIAL-ID: Informative. (line 81)
* mtn cvs_import PATHNAME: RCS. (line 13)
* mtn db check --db=DBFILE: Database. (line 50)
* mtn db dump --db=DBFILE: Database. (line 25)
* mtn db execute SQL-STATEMENT: Database. (line 234)
* mtn db info --db=DBFILE: Database. (line 17)
* mtn db init --db=DBFILE: Database. (line 14)
* mtn db kill_branch_certs_locally BRANCH: Database. (line 208)
* mtn db kill_rev_locally ID: Database. (line 178)
* mtn db kill_tag_locally TAG: Database. (line 222)
* mtn db load --db=DBFILE: Database. (line 32)
* mtn db migrate --db=DBFILE: Database. (line 41)
* mtn db version --db=DBFILE: Database. (line 21)
* mtn diff --context [--show-encloser]: Informative. (line 111)
* mtn diff --external [--diff-args=ARGSTRING]: Informative. (line 112)
* mtn diff --revision=ID: Informative. (line 114)
* mtn diff --revision=ID PATHNAME...: Informative. (line 115)
* mtn diff --revision=ID1 --revision=ID2: Informative. (line 116)
* mtn diff --revision=ID1 --revision=ID2 PATHNAME...: Informative. (line 117)
* mtn diff [--unified] [--show-encloser]: Informative. (line 110)
* mtn diff PATHNAME...: Informative. (line 113)
* mtn disapprove ID: Tree. (line 63)
* mtn drop --missing: Workspace. (line 45)
* mtn dropkey KEYID: Key and Cert Trust.
          (line 23)
* mtn explicit_merge ID ID DESTBRANCH: Tree. (line 119)
* mtn genkey KEYID: Key and Cert Trust. (line 6)
* mtn heads --branch=BRANCHNAME: Tree. (line 80)
* mtn import --branch=BRANCH [--message=MESSAGE] [--dry-run] DIR: Tree. (line 147)
* mtn import --revision=REVISION [--message=MESSAGE] [--dry-run] DIR: Tree. (line 149)
* mtn list branches: Informative. (line 241)
* mtn list certs ID: Informative. (line 178)
* mtn list changed: Informative. (line 331)
* mtn list changed PATHNAME...: Informative. (line 334)
* mtn list ignored: Informative. (line 296)
* mtn list ignored PATHNAME...: Informative. (line 299)
* mtn list keys: Informative. (line 228)
* mtn list keys PATTERN: Informative. (line 231)
* mtn list known: Informative. (line 261)
* mtn list known PATHNAME...: Informative. (line 264)
* mtn list missing: Informative. (line 314)
* mtn list missing PATHNAME...: Informative. (line 317)
* mtn list tags: Informative. (line 246)
* mtn list unknown: Informative. (line 279)
* mtn list unknown PATHNAME...: Informative. (line 282)
* mtn list vars: Informative. (line 251)
* mtn list vars DOMAIN: Informative. (line 255)
* mtn log: Informative. (line 29)
* mtn log [--last=N] [--next=N] [--from=ID [...]] [--to=ID [...]] [--brief] [--no-merges] [--no-files] [--diffs] [FILE [...]]: Informative. (line 30)
* mtn ls branches: Informative. (line 243)
* mtn ls certs ID: Informative. (line 180)
* mtn ls changed: Informative. (line 333)
* mtn ls changed PATHNAME...: Informative. (line 335)
* mtn ls ignored: Informative. (line 298)
* mtn ls ignored PATHNAME...: Informative. (line 300)
* mtn ls keys: Informative. (line 230)
* mtn ls keys PATTERN: Informative. (line 232)
* mtn ls known: Informative. (line 263)
* mtn ls known PATHNAME...: Informative. (line 265)
* mtn ls missing: Informative. (line 316)
* mtn ls missing PATHNAME...: Informative. (line 318)
* mtn ls tags: Informative. (line 248)
* mtn ls unknown: Informative. (line 281)
* mtn ls unknown PATHNAME...: Informative.
          (line 283)
* mtn ls vars: Informative. (line 253)
* mtn ls vars DOMAIN: Informative. (line 257)
* mtn merge [--branch=BRANCHNAME]: Tree. (line 91)
* mtn merge_into_dir SOURCEBRANCH DESTBRANCH DIR: Tree. (line 128)
* mtn passphrase ID: Key and Cert Trust. (line 29)
* mtn pivot_root [--bookkeep-only] pivot_root NEW_ROOT PUT_OLD: Workspace. (line 279)
* mtn pluck --revision=FROM --revision=TO: Workspace. (line 237)
* mtn pluck --revision=TO: Workspace. (line 236)
* mtn privkey KEYID: Packet I/O. (line 40)
* mtn propagate SOURCEBRANCH DESTBRANCH: Tree. (line 104)
* mtn pubkey KEYID: Packet I/O. (line 41)
* mtn pull [--set-default] [URI-OR-ADDRESS] [GLOB [...] [--exclude=EXCLUDE-GLOB]]]: Network. (line 9)
* mtn push [--set-default] [URI-OR-ADDRESS] [GLOB [...] [--exclude=EXCLUDE-GLOB]]]: Network. (line 10)
* mtn rcs_import FILENAME...: RCS. (line 6)
* mtn read: Packet I/O. (line 46)
* mtn read FILE1 FILE2...: Packet I/O. (line 48)
* mtn refresh_inodeprints: Workspace. (line 272)
* mtn revert --missing PATHNAME...: Workspace. (line 180)
* mtn revert PATHNAME...: Workspace. (line 179)
* mtn serve --stdio [--no-transport-auth]: Network. (line 8)
* mtn serve [--bind=[ADDRESS][:PORT]]: Network. (line 6)
* mtn set DOMAIN NAME VALUE: Database. (line 6)
* mtn setup [DIRECTORY]: Workspace. (line 6)
* mtn show_conflicts REV REV: Informative. (line 349)
* mtn ssh_agent_export FILENAME: Key and Cert Trust. (line 41)
* mtn status: Informative. (line 6)
* mtn status PATHNAME...: Informative. (line 7)
* mtn sync [--set-default] [URI-OR-ADDRESS] [GLOB [...] [--exclude=EXCLUDE-GLOB]]]: Network. (line 11)
* mtn tag ID TAGNAME: Certificate. (line 25)
* mtn testresult ID 0: Certificate. (line 33)
* mtn testresult ID 1: Certificate. (line 34)
* mtn trusted ID CERTNAME CERTVAL SIGNERS: Key and Cert Trust. (line 33)
* mtn unset DOMAIN NAME: Database. (line 10)
* mtn update: Workspace. (line 193)
* mtn update --revision=REVISION: Workspace.
          (line 194)
* note_commit (NEW_ID, REVISION, CERTS): Hooks. (line 20)
* note_mtn_startup (...): Hooks. (line 151)
* note_netsync_cert_received (REV_ID, KEY, NAME, VALUE, SESSION_ID): Hooks. (line 79)
* note_netsync_end (SESSION_ID, STATUS,: Hooks. (line 98)
* note_netsync_pubkey_received (KEYNAME, SESSION_ID): Hooks. (line 90)
* note_netsync_revision_received (NEW_ID, REVISION, CERTS, SESSION_ID): Hooks. (line 66)
* note_netsync_start (SESSION_ID, MY_ROLE, SYNC_TYPE,: Hooks. (line 35)
* parse_basic_io(DATA): Additional Lua Functions. (line 87)
* persist_phrase_ok (): Hooks. (line 228)
* regex.search(REGEXP, STRING): Additional Lua Functions. (line 110)
* sleep(SECONDS): Additional Lua Functions. (line 114)
* spawn(EXECUTABLE [, ARGS ...]): Additional Lua Functions. (line 118)
* spawn_pipe(EXECUTABLE [, ARGS ...]): Additional Lua Functions. (line 130)
* spawn_redirected(INFILE, OUTFILE, ERRFILE, EXECUTABLE [, ARGS ...]): Additional Lua Functions. (line 136)
* use_inodeprints (): Hooks. (line 241)
* use_transport_auth (URI): Hooks. (line 444)
* validate_commit_message (MESSAGE, REVISION_TEXT, BRANCHNAME): Hooks. (line 765)
* wait(PID): Additional Lua Functions. (line 140)