Getting Started with PostgreSQL
PostgreSQL is a commonly requested RDBMS, and for good reason. I'm not going into advocacy or even advantages, I shall merely explain the steps for what I consider to be a basic setup; it's not minimal by any means, but heavy usage, large datasets and special cases such as tablespaces will not be considered, neither will high availability or replication.
Installation
PostgreSQL is available in binary package form such as rpm or deb for most major linux distributions. If the version provided by your distribution is not new or sparkly enough for you, please build the latest and greatest with the mods and tweaks you desire and then make a package for it.
Basic Configuration
PostgreSQL requires its own user and cluster creation must be done by that user:
$ mkdir -p /var/pgsql/data; chown -R postgres /var/pgsql $ su - postgres $ initdb -D /var/pgsql/data
Starting, testing:
$ postgres -D /var/pgsql/data > /var/pgsql/logfile 2>&1 & $ createdb test $ psql test
And don't forget you probably want it to start upon reboot as well.
You'll probably want to change a few settings in postgresql.conf:
- shared_buffers
- work_mem
- maintenance_work_mem
- checkpoint_timeout
- You can see all settings with something like::
- SELECT name,setting,unit from pg_settings;
Consider also:
- putting logs/WAL on a different storage subsystem
- asynchronous commits with synchronous commit=off (not as risky as fsync=off, because you will never get corruption, although you are risking loss)
Tuning
As is common in the world of database administration, there are many knobs to fiddle; some of them can easily change the experience from "this sucks" to "wow" (or the other way around). But it can get rather complex, sometimes controversial and is usually highly dependent on your environment and usage situation, so I'm not going to go into it here. I'll merely list a few places I've looked when I've wanted more detail.
FreeBSD
The above steps are taken care of for you by the port; after you install the port, all you need is:
/usr/local/etc/rc.d/postgresql initdb /usr/local/etc/rc.d/postgresql start
CentOS
The Postgresql Development Group (PGDG) publishes yum repositories for up-to-date versions of postgresql for CentOS/RHEL, which makes it very easy to stay current:
rpm -Uvh http://yum.postgresql.org/9.1/redhat/rhel-6-x86_64/pgdg-centos91-9.1-4.noarch.rpm yum install postgresql91{,-libs,-devel,-server} service postgresql-9.1 initdb service postgresql-9.1 start
OSX
If you installed via macports:
/opt/local/etc/LaunchDaemons/org.macports.postgresql84-server/postgresql84-server.wrapper start sudo launchctl load -w /Library/LaunchDaemons/org.macports.postgresql84-server.plist
If you installed via homebrew:
brew install postgresql initdb /usr/local/var/postgres -E utf8 pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start
User Administration
Authentication/Access
pg_hba.conf specifies permissible authentication methods for users -> databases from specified hosts/networks.
Roles/Privileges
Create a role for ordinary usage such as for a webapp and grant it privileges:
CREATE ROLE rolename WITH LOGIN NOSUPERUSER NOCREATEROLE ENCRYPTED PASSWORD 'clear_text_password' GRANT SELECT, INSERT, UPDATE, DELETE, REFERENCES ON tablename TO rolename
Roles allow nested/group privileges.
If you create sequences, you need to grant select,update on them in order to allow a user to use nextval.
Managing Databases
Create a new database, the rolename should already exist (defaults to the current user):
CREATE DATABASE dbname WITH OWNER rolename
Logging
Following the basic configuration above will result in a logfile at /var/pgsql/logfile. If you take out the shell redirection of stderr when starting postgres and set some parameters, you can have postgres take care of your logging and log rotation for you:
redirect_stderr = on (must be set at server startup) log_filename postgresql_log.%a (default postgresql-%Y-%m-%d_%H%M%S.log) log_rotation_age = 1440 log_truncate_on_rotation = on
This will keep 7 days of logs, one log file per day named server_log.Mon, server_log.Tue, etc, and automatically overwrite last week's log with this week's log.
Vacuum
- From the manual:
"PostgreSQL's VACUUM command must be run on a regular basis for several reasons:
- To recover or reuse disk space occupied by updated or deleted rows.
- To update data statistics used by the PostgreSQL query planner.
- To protect against loss of very old data due to transaction ID wraparound."
and
"There are two variants of the VACUUM command. The first form, known as "lazy vacuum" or just VACUUM, marks expired data in tables and indexes for future reuse; it does not attempt to reclaim the space used by this expired data unless the space is at the end of the table and an exclusive table lock can be easily obtained. Unused space at the start or middle of the file does not result in the file being shortened and space returned to the operating system. This variant of VACUUM can be run concurrently with normal database operations.
The second form is the VACUUM FULL command. This uses a more aggressive algorithm for reclaiming the space consumed by expired row versions. Any space that is freed by VACUUM FULL is immediately returned to the operating system. Unfortunately, this variant of the VACUUM command acquires an exclusive lock on each table while VACUUM FULL is processing it. Therefore, frequently using VACUUM FULL can have an extremely negative effect on the performance of concurrent database queries. "
For lazy vacuuming, some edits to postgresql.conf:
autovacuum = on autovacuum_naptime = 1000min stats_start_collector = on stats_row_level = on vacuum_cost_delay = 1000
For full vacuuming, I run the following from cron, at times when usage is low:
vacuumdb --all --full --analyze
For high usage databases, defined as highly I/O constrained databases, other values for these parameters or indeed other approaches should be considered.
Reindexing
Unless you have very active databases (heavy usage or heavy churn), reindexing is probably not necessary. Consider it a performance optimisation. However, there is one major exception: after a recovery operation (any time you replay the Write Ahead Log) you will need to reindex any hash indexes
Backup and Restore
pg_dump and pg_dumpall are the tools for dumping postgresql databases. If you use --format to specify an archive format, you will be able to selectively restore from the archive, but you will need to use pg_restore to do so. Otherwise the ordinary psql is your restore tool.
Basic usage
The best way to get postgresql to show the DDL for your tables is:
pg_dump -s <dbname> [-t tablename1 [-t tablenameN]]
which, more-or-less, gives effectively the same result as MySQL's SHOW CREATE TABLE.
This will dump all the databases, roles and tablespace information:
pg_dumpall --clean > dumpfile psql -f dumpfile postgres
For individual databases:
pg_dump --clean *dbname* > dumpfile psql *dbname* < dumpfile
To a remote destination:
pg_dump --clean *dbname* | ssh hostname 'psql *dbname*'
From the manual: "If your database schema relies on OIDs (for instance as foreign keys) you must instruct pg_dump to dump the OIDs as well. To do this, use the -o command line option."