SQL Tables
----------
These mostly parallel the data structures in the gnc engine.  See
table-create.sql for more info.

General Theory of Operation
---------------------------
The entire backend is built upon the premise that the local gnucash
process acts only as a 'cache' of (some of) the 'true' data in a remote
SQL database.  The motivation for this design is to allow the local
process to quickly access and manipulate data (because it is local),
without losing the benefits of a remote, shared, persistent storage
server (the SQL database).  If the code seems complex, that is because
it expends a considerable amount of effort on keeping the local cache
consistent with the remote store.

In the following text (and in the notes in the source code), "the
engine" is used as a synonym for "the cache", and always refers to the
local data in the local process.

Session Table, Session Modes
----------------------------
There are four basic modes for accessing the database: "Single User
File Mode", "Single User Update Mode", "Polled Multi-User" and
"Event-Driven Multi-User".  The session table in the database indicates
the mode in which the database is functioning.

-- "Single User File Mode" -- mode=single-file
Only one user can have access to the database at a time.  The database
is used as a glorified file: engine data is 'saved' to the database
only when the user selects 'save' from the GUI dialog.  Just as it
would be for a file, this mode erases the contents of the database and
over-writes it with the new data.

This mode exists because it is easy to implement, easy to debug, and
makes reasonable demands on the database for small datasets.  It is not
efficient for large datasets, and carries a certain amount of risk:
because it erases the old data before writing the new data, it is
vulnerable to any problem that occurs during the 'save'.  For example,
if the database crashes during the write (very unlikely), data could be
lost.
If GnuCash has a certain kind of bug (possible), not all of the data
may be saved.  If GnuCash crashes before a save, then any changes since
the last save are lost.

This mode is mutually exclusive of any other access mode: a lockout
mechanism prevents other users from accessing the database while it is
open in single-user mode.

-- "Single User Update Mode" -- mode=single-update
Only one user can have access to the database at a time.  Updates to
data are stored in the DB as they are made in the GUI; there is no need
to perform a mass-save at the end of a session.  When a database is
opened in this mode, *all* of the database data is copied from the
database into the local engine.  This might result in the gnucash
process getting quite large, especially if the database is large.  This
mode is more robust in that there is minimal/no data loss in the event
of a crash.  If the database needs to be accessed in single-user mode,
this is the preferred mode to use.

-- "Multi-User Polled" -- mode=multi-user-poll
Multiple users are assumed; GnuCash polls the database to detect
changes in the data.  Data storage is incremental, essentially the same
as in the single-update mode.  Data fetch is also incremental: only the
data that is immediately being viewed/edited is pulled into the local
engine.  The idea is that the SQL database may be quite large, but that
only some smaller amount of data will be viewed/edited by a user at any
given time.  Thus, we save on network traffic and local memory usage by
pulling in only the data that we need.  Note that 'old' data is not
purged, so if a gnucash session runs for a long time in this mode, it
may pull in significant amounts of the dataset, and thus grow quite
large.

Multiple users may simultaneously edit the data in this mode.  However,
changes made by remote users do not 'automatically' show up as they are
made.  To see other users' changes, one has to (for example) close a
register window and reopen it.
The local cache of data is synced with the remote database on an
as-needed basis: for example, if the local user tries to edit the same
transaction that some remote user has modified, then the local copy of
the transaction is updated, and the user is warned that a remote change
has occurred.

-- "Multi-User Events" (Default Mode) -- mode=multi-user
Multiple users are assumed.  This mode works essentially the same way
as the multi-user-poll mode, except that (asynchronous) events are
delivered to the local process when remote users make changes.  This
causes the local GUI display to update automatically when remote users
make changes.  (In the current design, there is a 10-second delay
between updates.)  This automatic update puts slightly more burden on
the local process, on the network, and on the SQL server.  Thus, it
won't scale well with lots of users.  We don't know where that limit
is; we are guessing it's somewhere between 5 and 20 simultaneous users.
It depends on the power of the DB server, of course.

Safety Lockout
--------------
There is a safety lockout that prevents a database that is currently
open in single-user mode from being opened by a second user at the same
time.  Similarly, a database that is currently open in multi-user mode
cannot be opened in single-user mode.  Note, however, that switching
modes after all users have logged off is perfectly safe: if all the
multi-user sessions log off, then the database can be opened in
single-user mode (and vice versa).

Logoff happens 'automatically' when a user exits gnucash.  If gnucash
dies or is killed, a user might be left logged on.  If it dies in
multi-user mode, it may seem that many users are still logged on; this
will prevent the database from subsequently being opened in single-user
mode.  To force a log-off, you can issue the following command:

   echo "UPDATE gncsession SET time_off='NOW' WHERE time_off = 'infinity';" | psql dbname

Just be sure that all users really are logged off when you do this.
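The lockout rules above can be sketched as a simple mode-compatibility
check.  This is a hypothetical illustration only, not the actual
backend code; the function and variable names are invented, and the
"active modes" stand in for the modes of the gncsession rows whose
time_off is still 'infinity':

```python
# Hypothetical sketch of the safety-lockout decision: given the modes
# of the sessions still logged on, may a new session open in the
# requested mode?  Not the actual backend code; names are illustrative.

SINGLE_MODES = {"single-file", "single-update"}
MULTI_MODES = {"multi-user-poll", "multi-user"}

def may_open(requested_mode, active_modes):
    """Return True if a session in requested_mode may be opened."""
    if not active_modes:
        return True              # nobody logged on: any mode is fine
    if requested_mode in SINGLE_MODES:
        return False             # single-user demands exclusive access
    # Multi-user sessions may coexist with other multi-user sessions,
    # but not with an active single-user session.
    return all(m in MULTI_MODES for m in active_modes)
```

For example, may_open("multi-user", ["multi-user-poll"]) succeeds,
while may_open("single-update", ["multi-user"]) is refused until the
multi-user sessions log off (or are forced off as shown above).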
gncSession Table
----------------
Shows logon and logoff time, login name, the user's 'true name', and
the FQDN hostname.

Session Design Notes
--------------------
The pgendSyncSingleFile() subroutine performs the equivalent of 'file
save'.  Note that it does this by deleting the entire contents of the
database, and then writing out the entire contents of the engine.  It
works this way (and needs to work this way) in order to make sure that
deleted transactions, etc. are really deleted from the database.  This
is because in this mode, the backend never finds out about deletions.
If you want incremental deletion, then use the 'Single Update' mode.

Connecting to Postgres
----------------------
The Postgres API requires a database to connect to.  The initial
connect is made using the "template1" database, which is the default
database that is always created when Postgres is installed.  Thus, we
assume it's always present.

m4 macros
---------
Some of the code is auto-gen'ed from m4 macros.  This mostly just
simplifies some rather repetitive, cut-n-paste code that's identical
from function to function.  If you can think of a better way, let me
know.

String escapes
--------------
The GUI and the engine support any characters within a string; however,
in SQL some characters are reserved.  These reserved characters are
escaped by the builder.c routine sqlBuilder_escape() before storage: it
converts single quotes and backslashes to escaped quotes and
backslashes, to prevent SQL corruption.

KVP frames
----------
Storage of KVP values in the SQL database is treated more or less as
described in the main KVP docs.  The hierarchical structure is
converted into 'paths' by concatenating key names, using / as the
separator.  (Thus, paths look like file-paths.)  The root of each frame
is associated with a guid (and thus a url kvp://12341234/some/kvp/keys,
where 12341234 is the guid).  The implementation caches the paths,
associating a 32-bit inode number with each path.
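The path flattening and the inode cache just described can be sketched
as follows.  This is a hypothetical illustration, not the actual
implementation; the function and class names are invented, and a nested
dict stands in for a KVP frame:

```python
# Illustrative sketch of the KVP path scheme: hierarchical frames are
# flattened into '/'-separated paths, and each distinct path is
# assigned a small integer (an 'inode' number) by a cache, so that a
# recurring path like /reconcile-info/last-date is stored only once.
# Hypothetical code, not the actual backend implementation.

def flatten_kvp(frame, prefix=""):
    """Yield (path, value) pairs for a nested dict of KVP values."""
    for key, value in frame.items():
        path = prefix + "/" + key
        if isinstance(value, dict):
            yield from flatten_kvp(value, path)   # descend into subframe
        else:
            yield path, value

class PathCache:
    """Map each distinct path string to a unique small integer."""
    def __init__(self):
        self.inodes = {}
    def inode(self, path):
        # Assign the next integer the first time a path is seen;
        # return the same integer on every later lookup.
        return self.inodes.setdefault(path, len(self.inodes) + 1)
```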
Caching is done because the same path names will recur frequently for
different guids (e.g. /reconcile-info/last-date will occur in most
accounts).  The implementation also caches guids (associating a unique
32-bit int with each), although the utility of this is a bit dubious.
But hey, it works, and it saves a little bit of storage.

The actual values are stored in one of six different tables, depending
on the type.  Note that the binary type and the glist type are not
currently implemented.  The glist type could be implemented, as long as
the glist only stored strings; the binary type could be implemented
with blobs.

Version Numbers
---------------
Both the Account and the Transaction structures have version numbers in
them.  These are used to compare the SQL and the engine contents, and
to update the one or the other as appropriate.  Version numbers would
not be necessary for single-user access, but are important for
multi-user access, where several engines must be kept in sync with the
database contents.

An alternative to version numbers might have been the date of the last
update.  However, version numbers are better than dates in the case
where the engines reside on machines whose clocks are not closely
synchronized.  (This may happen, e.g., if the machines are not using
NTP for time synchronization, or if one machine failed to have
daylight-savings time set correctly: its transactions would be an hour
newer/older than the others, leading to bad updates.)

/* The pgendAccountCompareVersion() routine compares the version
 * number of the account in the engine and the sql database.  It
 * returns a negative number if the sql version is older (or the
 * account is not present in the sql db).  It returns a positive
 * number if the sql version is newer.  It returns zero if the
 * two are equal.
 */

Version numbers still need to be written to the XML file.

Version Timestamp
-----------------
The engine is structured so that whenever the GUI issues a query, that
query is passed to the backend.  (The engine assumes the worst: that
the engine data cache is out of date and needs to be updated.)
Unfortunately, the GUI frequently queries for the same data several
times in rapid succession.  Taken at face value, this would result in
multiple redundant queries to the SQL server in quick succession.  The
version_check field is used to minimize these redundant queries.  It
stores a timestamp; if the timestamp is less than 10 seconds old
(MAX_VERSION_AGE), then the backend assumes that the engine data is
sufficiently recent, and the SQL query is skipped.  In certain
situations, a considerable amount of querying can be avoided this way.

Audit Trails
------------
The backend keeps an audit trail of created, modified and deleted
transactions.  These are stored in a set of tables inheriting from the
gncAuditTrail table.  Recorded are the date of modification, the
session id, and the type of change.  There is currently no support
within the GUI for viewing the audit trails, nor is there any way to
roll back to some previous state based on this trail.

The audit trails are used internally to resolve certain multi-user
editing conflicts, for example, when one user attempts to edit a
transaction that has been deleted by another.  They are also used to
discover changes that other users have made, and thus update the local
GUI display.  This is done in the pgendProcessEvents() routine, and
specifically in the get_event_cb() callback.

Balances
--------
The GUI displays a running balance in the register display.  When the
engine has a copy of all the data, this is easy to compute.  However,
if the dataset is large, then we don't want the engine to have a copy
of all of the data; we want to leave the bulk of it in the database.
That presents a problem for computing the running balances.
We could store a running balance with each journal entry.  However,
this has the potential of making balance updates slow: potentially a
lot of entries would need to be updated.  As an alternative technique,
we store running balances in a set of 'checkpoints', each showing a
subtotal balance for a date interval.  Unfortunately, quite a bit of
machinery needs to be implemented in order to make this effective.

Account balances can be computed using advanced SQL statements.  The
basic idea looks like this:

   UPDATE checkpoint SET balance =
       (SELECT sum(expr) FROM ....
        WHERE txdate BETWEEN today - 7 AND today)

The above is not a valid SQL statement; below is one that actually
works.  Note that this statement updates *all* checkpoints for the
indicated accountguid.

   UPDATE gnccheckpoint
       SET balance =
           (SELECT sum(gncentry.amount)
            FROM gncentry, gnctransaction
            WHERE gncentry.accountguid = gnccheckpoint.accountguid
              AND gncentry.transguid = gnctransaction.transguid
              AND gnctransaction.date_posted
                  BETWEEN date_xpoint AND date_xpoint + 360)
       WHERE accountguid='111';

It's better to create a function that does the computation:

   CREATE FUNCTION gncsubtotal (char(32), datetime, datetime)
       RETURNS numeric
       AS 'SELECT sum(gncentry.amount)
           FROM gncentry, gnctransaction
           WHERE gncentry.accountguid = $1
             AND gncentry.transguid = gnctransaction.transguid
             AND gnctransaction.date_posted BETWEEN $2 AND $3'
       LANGUAGE 'sql';

and use it like this:

   UPDATE gnccheckpoint
       SET balance = (gncsubtotal (accountGuid, date_start, date_end))
       WHERE accountguid='4c9cad7be044559705988c63ea7affc5';

We can find dates for creating checkpoints like so:

   SELECT gnctransaction.date_posted
       FROM gnctransaction, gncentry
       WHERE gncentry.transguid = gnctransaction.transguid
         AND gncentry.accountguid='4c9cad7be044559705988c63ea7affc5'
       ORDER BY gnctransaction.date_posted ASC
       LIMIT 2 OFFSET 10;

----------------------------

Upgrading
---------
If you find you need to change the structure of the SQL tables, then
note that there is a fully general automatic upgrade mechanism in
upgrade.c and upgrade.h.  Upgrades are treated like patches; the SQL db
is 'patched'.  Each patch carries three version numbers: major, minor,
rev.  A client can work with a database only if the client's and the
database's major version numbers are equal, and the client's minor
number is equal to or newer than the database's.  The third number, the
rev number, is irrelevant if the above condition is met.  The rev
number is handy only for making sure that patches are applied in a
specified order.

The gncVersion table stores these three numbers.  It also stores a
human-readable text string, so that a sysadmin can review the installed
upgrades.  Most of the contents of upgrade.c is a mechanism to make
sure that 'ad hoc' upgrades are installed in the appropriate order,
i.e. that the upgrade process stays 'safe' and backwards-compatible.

The login process in PostgresBackend.c is structured so that older
databases are detected: the GUI should pop up a message asking the user
whether they want to upgrade or not.  If the user wants to upgrade,
then the pgendUpgradeDB() routine is called.  This routine is a set of
nested case statements that compare version numbers and apply patches
as needed.

As of this writing, there is only one upgrade: put_iguid_in_tables().
Note how put_iguid_in_tables() is written in a simple ad hoc manner.
That's because everything else in upgrade.c is focused on trying to
figure out when it's appropriate to call put_iguid_in_tables().  Other
upgrades should follow this same pattern: create the ad hoc routine,
and plug it into a case statement in the pgendUpgradeDB() call.
Everything else is automatic.

This upgrade process only applies to newer clients connecting to older
databases.
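The compatibility rule above (major numbers must match; the client's
minor number must be at least the database's; rev only orders patches)
can be sketched like this.  This is an assumption-laden illustration,
not the actual upgrade.c code, and the function names are invented:

```python
# Sketch of the (major, minor, rev) compatibility rule described
# above.  Hypothetical helpers, not the actual upgrade.c routines.

def client_can_use_db(client, db):
    """client, db: (major, minor, rev) tuples as in the gncVersion
    table.  The rev number does not affect compatibility; it only
    orders the patches."""
    c_major, c_minor, _ = client
    d_major, d_minor, _ = db
    return c_major == d_major and c_minor >= d_minor

def db_needs_upgrade(client, db):
    """True when a compatible but older database should be patched."""
    return client_can_use_db(client, db) and client[1] > db[1]
```

For example, a (1, 5, 0) client may open a (1, 3, 0) database and
should offer to upgrade it, while a (2, 0, 0) client must refuse a
(1, 9, 9) database outright.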
Otherwise, if the client is creating a fresh, brand-new db, then one
does not need to upgrade: put the newest db design into
table-create.sql, as usual, and put the version number into
table-version.sql.

-----------------------------------------------------------------------
End of Document.