Because PostgreSQL requires access to system tables for almost every operation, getting those system tables in place is a problem. You can't just create the tables and insert data into them in the normal way, because table creation and insertion requires the tables to already exist. This code jams the data directly into tables using a special syntax used only by the bootstrap procedure.
This checks the process name (argv[0]) and various flags, then passes control to the postmaster or to the postgres backend code.
This creates shared memory, and then goes into a loop waiting for connection requests. When a connection request arrives, a postgres backend is started, and the connection is passed to it.
This handles communication to the client processes.
This contains the postgres backend main handler, as well as the code that makes calls to the parser, optimizer, executor, and /commands functions.
This converts SQL queries coming from libpq into command-specific structures to be used by the optimizer/executor or the /commands routines. The SQL is lexically analyzed into keywords, identifiers, and constants, and passed to the parser. The parser creates command-specific structures to hold the elements of the query. These structures are then broken apart, checked, and either passed to the /commands processing routines or converted into Lists of Nodes to be handled by the optimizer and executor.
This uses the parser output to generate an optimal plan for the executor.
This takes the parser query output and generates all possible methods of executing the request. It examines table join order, WHERE clause restrictions, and optimizer table statistics to evaluate each possible execution method, and assigns a cost to each.
optimizer/path evaluates all possible ways to join the requested tables. As the number of tables grows, the number of join orderings to test grows explosively. The Genetic Query Optimizer instead considers each table separately, then determines a near-optimal order in which to perform the joins. For a few tables this method takes longer, but for a large number of tables it is faster. There is an option to control when this feature is used.
This takes the optimizer/path output, chooses the path with the least cost, and creates a plan for the executor.
This does special plan processing.
This contains support routines used by other parts of the optimizer.
This handles select, insert, update, and delete statements. The operations required to handle these statement types include heap scans, index scans, sorting, joining tables, grouping, aggregates, and uniqueness.
These process SQL commands that do not require complex handling. They include vacuum, copy, alter, create table, create type, and many others. The code is called with the structures generated by the parser. Most of the routines do some processing, then call lower-level functions in the catalog directory to do the actual work.
This contains functions that manipulate the system tables or catalogs. Table, index, procedure, operator, type, and aggregate creation and manipulation routines are here. These are low-level routines, and are usually called by upper routines that pre-format user requests into a predefined format.
These allow uniform resource access by the backend.
storage/buffer - shared buffer pool manager
storage/file - file manager
storage/ipc - semaphores and shared memory
storage/large_object - large objects
storage/lmgr - lock manager
storage/page - page manager
storage/smgr - storage/disk manager
These control the way data is accessed in heaps, indexes, and transactions.
access/common - common access routines
access/gist - easy-to-define access method system
access/hash - hash index access method
access/heap - heap is used to store data rows
access/index - routines used by all index types
access/nbtree - Lehman and Yao's btree management algorithm
access/rtree - used for indexing of 2-dimensional data
access/transam - transaction manager (BEGIN/ABORT/COMMIT)
PostgreSQL stores information about SQL queries in structures called nodes. Nodes are generic containers with a type field and a type-specific data section. Nodes are usually placed in Lists. A List is a container with an elem element and a next field that points to the next List. These List structures are chained together in a forward linked list. In this way, a chain of Lists can contain an unlimited number of Node elements, and each Node can contain any data type. These are used extensively in the parser, optimizer, and executor to store requests and data.
This contains all the PostgreSQL builtin data types.
PostgreSQL supports arbitrary data types, so no data types are hard-coded into the core backend routines. When the backend needs to find out about a type, it looks it up in a system table. Because these system tables are referred to often, a cache is maintained that speeds lookups. There is a system relation cache, a function/operator cache, and a relation information cache. This last cache maintains information about all recently-accessed tables, not just system ones.
Reports backend errors to the front end.
This handles the calling of dynamically-loaded functions, and the calling of functions defined in the system tables.
These hash routines are used by the cache and memory-manager routines to do quick lookups of dynamic data storage structures maintained by the backend.
When PostgreSQL allocates memory, it does so in an explicit context. Contexts can be statement-specific, transaction-specific, or persistent/global. By doing this, the backend can easily free memory once a statement or transaction completes.
When statement output must be sorted as part of a backend operation, this code sorts the tuples, either in memory or using disk files.
These routines check a tuple's internal columns to determine whether the current row is still valid, or is part of a non-committed transaction, or has been superseded by a newer row.
There are include directories for each subsystem.
This houses several generic routines.
This is used for regular expression handling in the backend, e.g. the '~' operator.
This does processing for the rules system.