SQL Tables
----------
These mostly parallel the data structures in the gnc engine.  See
table-create.sql for more info.

General Theory of Operation
---------------------------
The entire backend is built upon the premise that the local gnucash
process acts only as a 'cache' of (some of) the 'true' data in a remote
SQL database.  The motivation for this design is to allow the local
process to quickly access and manipulate data (because it is local),
without losing the benefits of a remote, shared, persistent storage
server (the SQL database).  If the code seems complex, that is because
it expends a considerable amount of effort on keeping the local cache
consistent with the remote store.

In the following text (and in the notes in the source code), "the
engine" is used as a synonym for "the cache", and always refers to the
local data in the local process.

Session Table, Session Modes
----------------------------
There are four basic modes for accessing the database: "Single User
File Mode", "Single User Update Mode", "Polled Multi-User" and
"Event-Driven Multi-User".  The session table in the database indicates
the mode in which the database is functioning.

-- "Single User File Mode" -- mode=single-file
Only one user can have access to the database at a time.  The database
is used as a glorified file: engine data is 'saved' to the database
only when the user selects 'save' from the GUI dialog.  Just as it
would be for a file, this mode erases the contents of the database and
over-writes it with the new data.

This mode exists because it is easy to implement, easy to debug, and
makes reasonable demands on the database for small datasets.  It is not
efficient for large datasets, and carries a certain amount of risk:
because it erases the old data before writing the new data, it is
vulnerable to any problem that occurs during the 'save'.  For example,
if the database crashes during the write (very unlikely), data could be
lost.
If GnuCash has a certain kind of bug (possible), not all of the data
may be saved.  If GnuCash crashes before a save, then any changes since
the last save are lost.

This mode is mutually exclusive of any other access mode: a lockout
mechanism prevents other users from accessing the database while it is
open in single-user mode.

-- "Single User Update Mode" -- mode=single-update
Only one user can have access to the database at a time.  Updates to
data are stored in the DB as they are made in the GUI; there is no need
to perform a mass-save at the end of a session.  When a database is
opened in this mode, *all* of the database data is copied from the
database into the local engine.  This might result in the gnucash
process getting quite large, especially if the database is large.  This
mode is more robust in that there is minimal/no data loss in the event
of a crash.  If the database needs to be accessed in single-user mode,
this is the preferred mode to use.

-- "Multi-User Polled" -- mode=multi-user-poll
Multiple users are assumed; GnuCash polls the database to detect
changes in the data.  Data storage is incremental, essentially the same
as in the single-update mode.  Data fetch is also incremental: only the
data that is immediately being viewed/edited is pulled into the local
engine.  The idea is that the SQL database may be quite large, but that
only some smaller amount of data will be viewed/edited by a user at any
given time.  Thus, we save on network traffic and local memory usage by
pulling in only the data that we need.  Note that 'old' data is not
purged, so if a gnucash session runs for a long time in this mode, it
may pull in significant amounts of the dataset, and thus grow quite
large.

Multiple users may simultaneously edit the data in this mode.  However,
changes made by remote users do not 'automatically' show up as they are
made.  To see other users' changes, one has to (for example) close a
register window and reopen it.
The local cache of data is synced with the remote database on an
as-needed basis: for example, if the local user tries to edit the same
transaction that some remote user has modified, then the local copy of
the transaction is updated, and the user is warned that a remote change
has occurred.

-- "Multi-User Events" (Default Mode) -- mode=multi-user
Multiple users are assumed.  This mode works essentially the same way
as the multi-user-poll mode, except that (asynchronous) events are
delivered to the local process when remote users make changes.  This
causes the local GUI display to update automatically when remote users
make changes.  (In the current design, there is a 10-second delay
between updates.)  This automatic update puts slightly more burden on
the local process, on the network, and on the SQL server.  Thus, it
won't scale well with lots of users.  We don't know where that limit
is; we are guessing it's somewhere between 5 and 20 simultaneous users.
It depends on the power of the DB server, of course.

Safety Lockout
--------------
There is a safety lockout that prevents a database that is currently
open in single-user mode from being opened by a second user at the same
time.  Similarly, a database that is currently open in multi-user mode
cannot be opened in single-user mode.  Note, however, that switching
modes after all users have logged off is perfectly safe: if all the
multi-user sessions log off, then the database can be opened in
single-user mode (and vice versa).

Logoff happens 'automatically' when a user exits gnucash.  If gnucash
dies or is killed, a user might be left logged on.  If it dies in
multi-user mode, it may seem that many users are still logged on; this
will prevent the database from subsequently being opened in single-user
mode.  To force a log-off, you can issue the following command:

   echo "UPDATE gncsession SET time_off='NOW' WHERE time_off = 'infinity';" | psql dbname

Just be sure that all users really are logged off when you do this.
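The lockout rules above can be sketched as a simple mode-compatibility
check.  This is a hypothetical illustration only, not the actual
backend code; the function and variable names are invented, and the
"active modes" stand in for the modes of the gncsession rows whose
time_off is still 'infinity':

```python
# Hypothetical sketch of the safety-lockout decision: given the modes
# of the sessions still logged on, may a new session open in the
# requested mode?  Not the actual backend code; names are illustrative.

SINGLE_MODES = {"single-file", "single-update"}
MULTI_MODES = {"multi-user-poll", "multi-user"}

def may_open(requested_mode, active_modes):
    """Return True if a session in requested_mode may be opened."""
    if not active_modes:
        return True              # nobody logged on: any mode is fine
    if requested_mode in SINGLE_MODES:
        return False             # single-user demands exclusive access
    # Multi-user sessions may coexist with other multi-user sessions,
    # but not with an active single-user session.
    return all(m in MULTI_MODES for m in active_modes)
```

For example, may_open("multi-user", ["multi-user-poll"]) succeeds,
while may_open("single-update", ["multi-user"]) is refused until the
multi-user sessions log off (or are forced off as shown above).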
gncSession Table
----------------
Shows logon and logoff time, login name, the user's 'true name', and
the FQDN hostname.

Session Design Notes
--------------------
The pgendSyncSingleFile() subroutine performs the equivalent of 'file
save'.  Note that it does this by deleting the entire contents of the
database, and then writing out the entire contents of the engine.  It
works this way (and needs to work this way) in order to make sure that
deleted transactions, etc. are really deleted from the database.  This
is because in this mode, the backend never finds out about deletions.
If you want incremental deletion, then use the 'Single Update' mode.

Connecting to Postgres
----------------------
The Postgres API requires a database to connect to.  The initial
connect is made using the "template1" database, which is the default
database that is always created when Postgres is installed.  Thus, we
assume it's always present.

m4 macros
---------
Some of the code is auto-gen'ed from m4 macros.  This mostly just
simplifies some rather repetitive, cut-n-paste code that's identical
from function to function.  If you can think of a better way, let me
know.

String escapes
--------------
The GUI and the engine support any characters within a string; however,
in SQL some characters are reserved.  These reserved characters are
escaped by the builder.c routine sqlBuilder_escape() before storage: it
converts single quotes and backslashes to escaped quotes and
backslashes, to prevent SQL corruption.

KVP frames
----------
Storage of KVP values in the SQL database is treated more or less as
described in the main KVP docs.  The hierarchical structure is
converted into 'paths' by concatenating key names, using / as the
separator.  (Thus, paths look like file-paths.)  The root of each frame
is associated with a guid (and thus a url kvp://12341234/some/kvp/keys,
where 12341234 is the guid).  The implementation caches the paths,
associating a 32-bit inode number with each path.
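The path flattening and the inode cache just described can be sketched
as follows.  This is a hypothetical illustration, not the actual
implementation; the function and class names are invented, and a nested
dict stands in for a KVP frame:

```python
# Illustrative sketch of the KVP path scheme: hierarchical frames are
# flattened into '/'-separated paths, and each distinct path is
# assigned a small integer (an 'inode' number) by a cache, so that a
# recurring path like /reconcile-info/last-date is stored only once.
# Hypothetical code, not the actual backend implementation.

def flatten_kvp(frame, prefix=""):
    """Yield (path, value) pairs for a nested dict of KVP values."""
    for key, value in frame.items():
        path = prefix + "/" + key
        if isinstance(value, dict):
            yield from flatten_kvp(value, path)   # descend into subframe
        else:
            yield path, value

class PathCache:
    """Map each distinct path string to a unique small integer."""
    def __init__(self):
        self.inodes = {}
    def inode(self, path):
        # Assign the next integer the first time a path is seen;
        # return the same integer on every later lookup.
        return self.inodes.setdefault(path, len(self.inodes) + 1)
```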
Caching is done because the same path names will recur frequently for
different guids (e.g. /reconcile-info/last-date will occur in most
accounts).  The implementation also caches guids (associating a unique
32-bit int with each), although the utility of this is a bit dubious.
But hey, it works, and it saves a little bit of storage.

The actual values are stored in one of six different tables, depending
on the type.  Note that the binary type and the glist type are not
currently implemented.  The glist type could be implemented, as long as
the glist only stored strings; the binary type could be implemented
with blobs.

Version Numbers
---------------
Both the Account and the Transaction structures have version numbers in
them.  These are used to compare the SQL and the engine contents, and
to update the one or the other as appropriate.  Version numbers would
not be necessary for single-user access, but are important for
multi-user access, where several engines must be kept in sync with the
database contents.

An alternative to version numbers might have been the date of the last
update.  However, version numbers are better than dates in the case
where the engines reside on machines whose clocks are not closely
synchronized.  (This may happen, e.g., if the machines are not using
NTP for time synchronization, or if one machine failed to have
daylight-savings time set correctly: its transactions would be an hour
newer/older than the others, leading to bad updates.)

/* The pgendAccountCompareVersion() routine compares the version
 * number of the account in the engine and the sql database.  It
 * returns a negative number if the sql version is older (or the
 * account is not present in the sql db).  It returns a positive
 * number if the sql version is newer.  It returns zero if the
 * two are equal.
 */

Version numbers still need to be written to the XML file.

Version Timestamp
-----------------
The engine is structured so that whenever the GUI issues a query, that
query is passed to the backend.  (The engine assumes the worst: that
the engine data cache is out of date and needs to be updated.)
Unfortunately, the GUI frequently queries for the same data several
times in rapid succession.  Taken at face value, this would result in
multiple redundant queries to the SQL server in quick succession.  The
version_check field is used to minimize these redundant queries.  It
stores a timestamp; if the timestamp is less than 10 seconds old
(MAX_VERSION_AGE), then the backend assumes that the engine data is
sufficiently recent, and the SQL query is skipped.  In certain
situations, a considerable amount of querying can be avoided this way.

Audit Trails
------------
The backend keeps an audit trail of created, modified and deleted
transactions.  These are stored in a set of tables inheriting from the
gncAuditTrail table.  Recorded are the date of modification, the
session id, and the type of change.  There is currently no support
within the GUI for viewing the audit trails, nor is there any way to
roll back to some previous state based on this trail.

The audit trails are used internally to resolve certain multi-user
editing conflicts, for example, when one user attempts to edit a
transaction that has been deleted by another.  They are also used to
discover changes that other users have made, and thus update the local
GUI display.  This is done in the pgendProcessEvents() routine, and
specifically in the get_event_cb() callback.

Balances
--------
The GUI displays a running balance in the register display.  When the
engine has a copy of all the data, this is easy to compute.  However,
if the dataset is large, then we don't want the engine to have a copy
of all of the data; we want to leave the bulk of it in the database.
That presents a problem for computing the running balances.
We could store a running balance with each journal entry.  However,
this has the potential of making balance updates slow: potentially a
lot of entries would need to be updated.  As an alternative technique,
we store running balances in a set of 'checkpoints', each showing a
subtotal balance for a date interval.  Unfortunately, quite a bit of
machinery needs to be implemented in order to make this effective.

Account balances can be computed using advanced SQL statements.  The
basic idea looks like this:

   UPDATE checkpoint SET balance =
       (SELECT sum(expr) FROM ....
        WHERE txdate BETWEEN today - 7 AND today)

The above is not a valid SQL statement; below is one that actually
works.  Note that this statement updates *all* checkpoints for the
indicated accountguid.

   UPDATE gnccheckpoint
       SET balance =
           (SELECT sum(gncentry.amount)
            FROM gncentry, gnctransaction
            WHERE gncentry.accountguid = gnccheckpoint.accountguid
              AND gncentry.transguid = gnctransaction.transguid
              AND gnctransaction.date_posted
                  BETWEEN date_xpoint AND date_xpoint + 360)
       WHERE accountguid='111';

It's better to create a function that does the computation:

   CREATE FUNCTION gncsubtotal (char(32), datetime, datetime)
       RETURNS numeric
       AS 'SELECT sum(gncentry.amount)
           FROM gncentry, gnctransaction
           WHERE gncentry.accountguid = $1
             AND gncentry.transguid = gnctransaction.transguid
             AND gnctransaction.date_posted BETWEEN $2 AND $3'
       LANGUAGE 'sql';

and use it like this:

   UPDATE gnccheckpoint
       SET balance = (gncsubtotal (accountGuid, date_start, date_end))
       WHERE accountguid='4c9cad7be044559705988c63ea7affc5';

We can find dates for creating checkpoints like so:

   SELECT gnctransaction.date_posted
       FROM gnctransaction, gncentry
       WHERE gncentry.transguid = gnctransaction.transguid
         AND gncentry.accountguid='4c9cad7be044559705988c63ea7affc5'
       ORDER BY gnctransaction.date_posted ASC
       LIMIT 2 OFFSET 10;

----------------------------

Upgrading
---------
If you find you need to change the structure of the SQL tables, then
note that there is a fully general automatic upgrade mechanism in
upgrade.c and upgrade.h.  Upgrades are treated like patches; the SQL db
is 'patched'.  Each patch carries three version numbers: major, minor,
rev.  A client can work with a database only if the client's and the
database's major version numbers are equal, and the client's minor
number is equal to or newer than the database's.  The third number, the
rev number, is irrelevant if the above condition is met.  The rev
number is handy only for making sure that patches are applied in a
specified order.

The gncVersion table stores these three numbers.  It also stores a
human-readable text string, so that a sysadmin can review the installed
upgrades.  Most of the contents of upgrade.c is a mechanism to make
sure that 'ad hoc' upgrades are installed in the appropriate order,
i.e. that the upgrade process stays 'safe' and backwards-compatible.

The login process in PostgresBackend.c is structured so that older
databases are detected: the GUI should pop up a message asking the user
whether they want to upgrade or not.  If the user wants to upgrade,
then the pgendUpgradeDB() routine is called.  This routine is a set of
nested case statements that compare version numbers and apply patches
as needed.

As of this writing, there is only one upgrade: put_iguid_in_tables().
Note how put_iguid_in_tables() is written in a simple ad hoc manner.
That's because everything else in upgrade.c is focused on trying to
figure out when it's appropriate to call put_iguid_in_tables().  Other
upgrades should follow this same pattern: create the ad hoc routine,
and plug it into a case statement in the pgendUpgradeDB() call.
Everything else is automatic.

This upgrade process only applies to newer clients connecting to older
databases.
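The compatibility rule above (major numbers must match; the client's
minor number must be at least the database's; rev only orders patches)
can be sketched like this.  This is an assumption-laden illustration,
not the actual upgrade.c code, and the function names are invented:

```python
# Sketch of the (major, minor, rev) compatibility rule described
# above.  Hypothetical helpers, not the actual upgrade.c routines.

def client_can_use_db(client, db):
    """client, db: (major, minor, rev) tuples as in the gncVersion
    table.  The rev number does not affect compatibility; it only
    orders the patches."""
    c_major, c_minor, _ = client
    d_major, d_minor, _ = db
    return c_major == d_major and c_minor >= d_minor

def db_needs_upgrade(client, db):
    """True when a compatible but older database should be patched."""
    return client_can_use_db(client, db) and client[1] > db[1]
```

For example, a (1, 5, 0) client may open a (1, 3, 0) database and
should offer to upgrade it, while a (2, 0, 0) client must refuse a
(1, 9, 9) database outright.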
Otherwise, if the client is creating a fresh, brand-new db, then one
does not need to upgrade: put the newest db design into
table-create.sql, as usual, and put the version number into
table-version.sql.

-----------------------------------------------------------------------
End of Document.