Because PostgreSQL requires access to system tables for almost every operation, getting those system tables in place is a problem. You can't just create the tables and insert data into them in the normal way, because table creation and insertion requires the tables to already exist. This code jams the data directly into tables using a special syntax used only by the bootstrap procedure.
This checks the process name (argv[0]) and various flags, then passes control to the postmaster or to the postgres backend code.
This creates shared memory, and then goes into a loop waiting for connection requests. When a connection request arrives, a postgres backend is started, and the connection is passed to it.
This handles communication to the client processes.
This contains the postgres backend main handler, as well as the code that makes calls to the parser, optimizer, executor, and /commands functions.
This converts SQL queries coming from libpq into command-specific structures to be used by the optimizer/executor or the /commands routines. The SQL is lexically analyzed into keywords, identifiers, and constants, and passed to the parser. The parser creates command-specific structures to hold the elements of the query. These structures are then broken apart, checked, and either passed to the /commands processing routines or converted into Lists of Nodes to be handled by the optimizer and executor.
This uses the parser output to generate an optimal plan for the executor.
This takes the parser query output and generates all possible methods of executing the request. It examines table join order, WHERE clause restrictions, and optimizer table statistics to evaluate each possible execution method, and assigns a cost to each.
optimizer/path evaluates all possible ways to join the requested tables. As the number of tables grows, the number of join orderings to test grows explosively. The Genetic Query Optimizer instead considers each table separately, then determines a near-optimal order in which to perform the joins. For a few tables this method takes longer, but for a large number of tables it is faster. There is an option to control when this feature is used.
This takes the optimizer/path output, chooses the path with the least cost, and creates a plan for the executor.
This does special plan processing.
This contains support routines used by other parts of the optimizer.
This handles select, insert, update, and delete statements. The operations required to handle these statement types include heap scans, index scans, sorting, joining tables, grouping, aggregates, and uniqueness.
These process SQL commands that do not require complex handling. They include vacuum, copy, alter, create table, create type, and many others. The code is called with the structures generated by the parser. Most of the routines do some processing, then call lower-level functions in the catalog directory to do the actual work.
This contains functions that manipulate the system tables or catalogs. Table, index, procedure, operator, type, and aggregate creation and manipulation routines are here. These are low-level routines, and are usually called by upper routines that pre-format user requests into a predefined format.
These allow uniform resource access by the backend.
storage/buffer - shared buffer pool manager
storage/file - file manager
storage/ipc - semaphores and shared memory
storage/large_object - large objects
storage/lmgr - lock manager
storage/page - page manager
storage/smgr - storage/disk manager
These control the way data is accessed in heaps, indexes, and transactions.
access/common - common access routines
access/gist - easy-to-define access method system
access/hash - hash index access method
access/heap - heap is used to store data rows
access/index - routines used by all index types
access/nbtree - Lehman and Yao's btree management algorithm
access/rtree - used for indexing of 2-dimensional data
access/transam - transaction manager (BEGIN/ABORT/COMMIT)
PostgreSQL stores information about SQL queries in structures called nodes. Nodes are generic containers with a type field and a type-specific data section. Nodes are usually placed in Lists. A List is a container with an elem element and a next field that points to the next List. These List structures are chained together in a forward linked list. In this way, a chain of Lists can contain an unlimited number of Node elements, and each Node can contain any data type. These are used extensively in the parser, optimizer, and executor to store requests and data.
This contains all the PostgreSQL builtin data types.
PostgreSQL supports arbitrary data types, so no data types are hard-coded into the core backend routines. When the backend needs to find out about a type, it looks it up in a system table. Because these system tables are referred to often, a cache is maintained that speeds lookups. There is a system relation cache, a function/operator cache, and a relation information cache. This last cache maintains information about all recently-accessed tables, not just system ones.
Reports backend errors to the front end.
This handles the calling of dynamically-loaded functions, and the calling of functions defined in the system tables.
These hash routines are used by the cache and memory-manager routines to do quick lookups of dynamic data storage structures maintained by the backend.
When PostgreSQL allocates memory, it does so in an explicit context. Contexts can be statement-specific, transaction-specific, or persistent/global. By doing this, the backend can easily free memory once a statement or transaction completes.
When statement output must be sorted as part of a backend operation, this code sorts the tuples, either in memory or using disk files.
These routines check a tuple's internal columns to determine whether the current row is still valid, or is part of a non-committed transaction, or has been superseded by a newer row.
There are include directories for each subsystem.
This houses several generic routines.
This is used for regular expression handling in the backend, e.g. the '~' operator.
This does processing for the rules system.