PostgreSQL CookbookSchemaidentity and defaults
0 / 63 (0%)

identity and defaults

Brew was migrating off an old orders table and backfilling history with a script. The table was on serial, the script inserted rows with explicit ids from the export (1001..1500) — but nobody moved the sequence past those values; it stayed where it was. Everything worked for a few weeks, orders flowed, and then the app started failing out of nowhere: duplicate key value violates unique constraint "orders_pkey". The sequence had climbed to 1500, tried to hand out an already-taken id, and crashed into its own history. The classic serial foot-gun.

The goal of this unit is to understand who owns the primary key. With serial (and its direct heir GENERATED BY DEFAULT AS IDENTITY) ownership is shared between the DB and the app: the DB supplies an id if you don't, but it'll also accept an explicit one — which means a migration script can silently drift away from the sequence. The modern PG18 default is GENERATED ALWAYS AS IDENTITY: the column is owned only by the DB, you simply cannot write your own id (without an explicit OVERRIDING SYSTEM VALUE). Along the way we'll cover the plain DEFAULT — an expression substituted when a value is omitted.

Two IDENTITY modes: who owns the key

GENERATED ALWAYS AS IDENTITY — the column belongs entirely to the DB. An INSERT without id gets the next value from the internal sequence; an INSERT with id is rejected with SQLSTATE 428C9 — not a bug but a guard: the app physically cannot desync the counter. That very foot-gun from the story is impossible here.

GENERATED BY DEFAULT AS IDENTITY behaves like the old serial: without id the DB supplies its own, but it'll also silently accept an explicit idwithout moving the sequence. Tomorrow the sequence reaches that same value and emits a duplicate. Handy for data backfills (you can preserve the original ids), but that very leniency plants the mine; after a migration you must bump the sequence manually with setval (exactly what our schema/seed.sql does for the base orders table).

DEFAULT: the default value

DEFAULT is simpler: an expression the DB substitutes into a column when the INSERT skipped it. For created_at that's now() — pass no time, get the current one. IDENTITY is essentially a special DEFAULT over a sequence; the difference is that a plain DEFAULT can always be overridden with your own value, while GENERATED ALWAYS cannot.

ALWAYS vs BY DEFAULT

The difference is about who owns the key and whether you can write your own id:

AxisGENERATED ALWAYSGENERATED BY DEFAULT (≈ serial)
Who owns the keythe DB onlythe DB and the app
INSERT without idnext from the sequencenext from the sequence
INSERT with explicit idrejected (428C9)accepted, counter not moved
Sequence desyncimpossiblepossible → needs setval after a load
When to usenew tables (the PG18 default)backfill/import preserving original ids

What our code shows

Two dedicated tables (DDL in schema.sql) — the Brew base tables are byte-compatible with kafka-cookbook and their ids must not be touched, so we show the contrast on new ones:

sql
CREATE TABLE id_always (
    id          BIGINT       GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    label       TEXT         NOT NULL,
    created_at  TIMESTAMPTZ  NOT NULL DEFAULT now()
);
CREATE TABLE id_by_default (
    id     BIGINT  GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    label  TEXT    NOT NULL
);

query.sql runs three experiments: an insert without id (DB hands out the key), an attempt to write an explicit id into ALWAYS and into BY DEFAULT:

sql
-- name: InsertAlways :one
INSERT INTO id_always (label) VALUES ($1)
RETURNING id, (created_at IS NOT NULL)::boolean AS created_at_filled;
 
-- name: InsertAlwaysExplicitID :exec      -- will reject: SQLSTATE 428C9
INSERT INTO id_always (id, label) VALUES ($1, $2);
 
-- name: InsertByDefaultExplicitID :one    -- will accept the explicit id
INSERT INTO id_by_default (id, label) VALUES ($1, $2)
RETURNING id;

main.go is thin: it clears the tables, does three inserts without id (gets 1,2,3), then catches the error from the explicit insert into ALWAYS and prints its SQLSTATE — the code is deterministic, unlike the message text, so the output reproduces verbatim:

go
err = queries.InsertAlwaysExplicitID(ctx, db.InsertAlwaysExplicitIDParams{ID: 999, Label: "smuggled"})
fmt.Printf("... отклонён: SQLSTATE %s ...\n", sqlState(err)) // 428C9

Running it

sh
docker compose up -d
make lecture L=02-schema-and-constraints/02-01-identity-and-defaults T=db-reset
make lecture L=02-schema-and-constraints/02-01-identity-and-defaults

Output:

plaintext
1) GENERATED ALWAYS AS IDENTITY — id присваивает БД:
   три INSERT без id → id = [1 2 3]; created_at заполнен по DEFAULT now(): true
2) Явный id=999 в GENERATED ALWAYS → отклонён: SQLSTATE 428C9 (нельзя писать в GENERATED ALWAYS)
3) Явный id=999 в GENERATED BY DEFAULT → принят: id = 999
   (счётчик IDENTITY при этом не сдвинулся — отсюда классический рассинхрон serial; см. заборчик)

(The demo prints in Russian.) The three inserts without id landed as 1, 2, 3, and created_at filled itself from DEFAULT now(). The explicit id into GENERATED ALWAYS was batted away by the DB (428C9) — the app simply can't touch the key. And GENERATED BY DEFAULT accepted 999 silently: the sequence stayed at one — that's the very path to a future duplicate.

The fence

What we simplified: we showed that BY DEFAULT silently desyncs the counter, but we didn't drive it to an actual collision (duplicate key) — that's tens of thousands of inserts. Beyond that lie the boundaries your DBA holds, and the practical rules:

  • Desync is fixed with setval: after any data load with explicit ids you bump the sequence via SELECT setval(pg_get_serial_sequence('t','id'), (SELECT max(id) FROM t)) — exactly what our schema/seed.sql does for the base orders table.
  • Start new tables on GENERATED ALWAYS AS IDENTITY — it forbids the app from touching the key and kills a whole class of bugs.
  • Reach for BY DEFAULT deliberately and only where you need to temporarily substitute ids (import, replicating test data), and don't forget the setval afterward.
  • OVERRIDING SYSTEM VALUE exists, but it's an emergency hatch, not a daily tool.

Takeaways

  • GENERATED ALWAYS AS IDENTITY (the PG18 default) — the DB owns the key: an explicit id is rejected (428C9), sequence desync is impossible.
  • GENERATED BY DEFAULT AS IDENTITY (like serial) — accepts an explicit id without moving the counter; handy for migrations, but needs a setval afterward, otherwise a future duplicate.
  • DEFAULT <expression> is substituted when a column is omitted (created_at DEFAULT now()); IDENTITY is a special DEFAULT that ALWAYS won't let you override.
  • Print the SQLSTATE, not the error text: the code is deterministic, the text is not.

Next up — the 02-02 "NOT NULL, PK, natural vs surrogate key" unit: how PRIMARY KEY differs from a plain UNIQUE, why it's NOT NULL right away, and when to key on a surrogate id (as here) versus a natural business code.

·Module 03

This lesson is still ahead

The course goes in order — to open this step, finish the previous ones first. Context builds up without gaps that way.

/ you tried to open
Schema / identity and defaults