diff --git a/.cursorrules b/.cursorrules new file mode 100644 index 0000000..444d0e5 --- /dev/null +++ b/.cursorrules @@ -0,0 +1,55 @@ +You are working on `git.codelab.vc/pkg/dbx`, a Go PostgreSQL cluster library built on pgx/v5. + +## Architecture + +- Cluster manages master + replicas with method-based routing (no SQL parsing) +- Write ops (Exec, Query, QueryRow, Begin, BeginTx, CopyFrom, SendBatch) → master +- Read ops (ReadQuery, ReadQueryRow) → replicas with master fallback +- Retry with exponential backoff + jitter, iterates nodes then backs off +- Round-robin balancer skips unhealthy nodes +- Background health checker pings all nodes on interval +- RunTx — panic-safe transaction wrapper (recover → rollback → re-panic) +- InjectQuerier/ExtractQuerier — context-based Querier for service layers + +## Package structure + +- `dbx` (root) — Cluster, Node, Balancer, retry, health, errors, tx, config, options + - `dbx.go` — interfaces: Querier, DB, Logger, MetricsHook + - `cluster.go` — Cluster routing and query execution + - `node.go` — Node wrapping pgxpool.Pool with health state + - `balancer.go` — Balancer interface + RoundRobinBalancer + - `retry.go` — retrier with backoff and node fallback + - `health.go` — background health checker goroutine + - `tx.go` — RunTx, RunTxOptions, InjectQuerier, ExtractQuerier + - `errors.go` — IsRetryable, IsConnectionError, IsConstraintViolation, PgErrorCode + - `config.go` — Config, NodeConfig, PoolConfig, RetryConfig, HealthCheckConfig + - `options.go` — functional options (WithLogger, WithMetrics, WithRetry, WithHealthCheck) +- `dbxtest/` — test helpers: NewTestCluster, TestLogger + +## Code conventions + +- Struct-based Config with defaults() method for zero-value defaults +- Functional options (Option func(*Config)) used via ApplyOptions +- stdlib only testing — no testify, no gomock +- Thread safety with atomic.Bool (Node.healthy, Cluster.closed) +- dbxtest.NewTestCluster skips on unreachable DB, auto-closes via t.Cleanup +- Sentinel errors: ErrNoHealthyNode, ErrClusterClosed, ErrRetryExhausted +- retryError multi-unwrap for errors.Is compatibility + +## When writing new code + +- New node type → add to Cluster struct, Config, connect in NewCluster, add to `all` for health checking +- New balancer → implement Balancer interface, check IsHealthy(), return nil if no suitable node +- New retry logic → provide RetryConfig.RetryableErrors or extend IsRetryable() +- New metrics hook → add field to MetricsHook, nil-check before calling +- Close() is required — leaking a Cluster leaks goroutines and connections +- No SQL parsing — routing is method-based, Exec with SELECT still goes to master + +## Commands + +```bash +go build ./... # compile +go test ./... # test +go test -race ./... # test with race detector +go vet ./... # static analysis +``` diff --git a/.gitea/workflows/ci.yml b/.gitea/workflows/ci.yml new file mode 100644 index 0000000..a33b0dc --- /dev/null +++ b/.gitea/workflows/ci.yml @@ -0,0 +1,23 @@ +name: CI + +on: + push: + branches: [main] + pull_request: + branches: [main] + +jobs: + test: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - uses: actions/setup-go@v5 + with: + go-version: "1.24" + + - name: Vet + run: go vet ./... + + - name: Test + run: go test -race -count=1 ./... diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..5f6d88d --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,109 @@ +# AGENTS.md — dbx + +Universal guide for AI coding agents working with this codebase. + +## Overview + +`git.codelab.vc/pkg/dbx` is a Go PostgreSQL cluster library built on **pgx/v5**. It provides master/replica routing, automatic retries, load balancing, background health checking, panic-safe transactions, and context-based Querier injection. + +## Package map + +``` +dbx/ Root — Cluster, Node, Balancer, retry, health, errors, tx, config +├── dbx.go Interfaces: Querier, DB, Logger, MetricsHook +├── cluster.go Cluster — routing, write/read operations +├── node.go Node — pgxpool.Pool wrapper with health state +├── balancer.go Balancer interface + RoundRobinBalancer +├── retry.go retrier — exponential backoff with jitter and node fallback +├── health.go healthChecker — background goroutine pinging nodes +├── tx.go RunTx, RunTxOptions, InjectQuerier, ExtractQuerier +├── errors.go Error classification (IsRetryable, IsConnectionError, etc.) +├── config.go Config, NodeConfig, PoolConfig, RetryConfig, HealthCheckConfig +├── options.go Functional options (WithLogger, WithMetrics, WithRetry, etc.) +└── dbxtest/ + └── dbxtest.go Test helpers: NewTestCluster, TestLogger +``` + +## Routing architecture + +``` + ┌──────────────┐ + │ Cluster │ + └──────┬───────┘ + │ + ┌───────────────┴───────────────┐ + │ │ + Write ops Read ops + Exec, Query, QueryRow ReadQuery, ReadQueryRow + Begin, BeginTx, RunTx + CopyFrom, SendBatch + │ │ + ▼ ▼ + ┌──────────┐ ┌────────────────────────┐ + │ Master │ │ Balancer → Replicas │ + └──────────┘ │ fallback → Master │ + └────────────────────────┘ + +Retry loop (retrier.do): + For each attempt (up to MaxAttempts): + For each node in [target nodes]: + if healthy → execute → on retryable error → continue + Backoff (exponential + jitter) +``` + +## Common tasks + +### Add a new node type (e.g., analytics replica) + +1. Add a field to `Cluster` struct (e.g., `analytics []*Node`) +2. Add corresponding config to `Config` struct +3. Connect nodes in `NewCluster`, add to `all` slice for health checking +4. Add routing methods (e.g., `AnalyticsQuery`) + +### Customize retry logic + +1. Provide `RetryConfig.RetryableErrors` — custom `func(error) bool` classifier +2. Or modify `IsRetryable()` in `errors.go` to add new PG error codes +3. Adjust `MaxAttempts`, `BaseDelay`, `MaxDelay` in `RetryConfig` + +### Add a metrics hook + +1. Add a new callback field to `MetricsHook` struct in `dbx.go` +2. Call it at the appropriate point (nil-check the hook and the field) +3. See existing hooks in `cluster.go` (queryStart/queryEnd) and `health.go` (OnNodeDown/OnNodeUp) + +### Add a new balancer strategy + +1. Implement the `Balancer` interface: `Next(nodes []*Node) *Node` +2. Must return `nil` if no suitable node is available +3. Must check `node.IsHealthy()` to skip down nodes + +## Gotchas + +- **Close() is required**: `Cluster.Close()` stops the health checker goroutine and closes all pools. Leaking a Cluster leaks goroutines and connections +- **RunTx panic safety**: `runTx` uses `defer` with `recover()` — it rolls back on panic, then re-panics. Do not catch panics outside `RunTx` expecting the tx to be committed +- **Context-based Querier injection**: `ExtractQuerier` returns the fallback if no Querier is in context. Always pass the cluster/pool as fallback so code works both inside and outside transactions +- **Health checker goroutine**: Starts immediately in `NewCluster`. Uses `time.NewTicker` — the first check happens after one interval, not immediately. Nodes start as healthy (`healthy.Store(true)` in `newNode`) +- **readNodes ordering**: `readNodes()` returns `[replicas..., master]` — the retrier tries replicas first, master is the last fallback +- **errRow for closed cluster**: When cluster is closed, `QueryRow`/`ReadQueryRow` return `errRow{err: ErrClusterClosed}` — the error surfaces on `Scan()` +- **No SQL parsing**: Routing is purely method-based. If you call `Exec` with a SELECT, it still goes to master + +## Commands + +```bash +go build ./... # compile +go test ./... # all tests +go test -race ./... # tests with race detector +go test -v -run TestName ./... # single test +go vet ./... # static analysis +``` + +## Conventions + +- **Struct-based Config** with `defaults()` method for zero-value defaults +- **Functional options** (`Option func(*Config)`) used via `ApplyOptions` (primarily in dbxtest) +- **stdlib only** testing — no testify, no gomock +- **Thread safety** — `atomic.Bool` for `Node.healthy` and `Cluster.closed` +- **dbxtest helpers** — `NewTestCluster` skips on unreachable DB, auto-closes via `t.Cleanup`; `TestLogger` routes to `testing.T` +- **Sentinel errors** — `ErrNoHealthyNode`, `ErrClusterClosed`, `ErrRetryExhausted` +- **retryError** uses multi-unwrap (`Unwrap() []error`) so both `ErrRetryExhausted` and the last error can be matched with `errors.Is` diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..729be5a --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,46 @@ +# CLAUDE.md — dbx + +## Commands + +```bash +go build ./... # compile +go test ./... # all tests +go test -race ./... # tests with race detector +go test -v -run TestName ./... # single test +go vet ./... # static analysis +``` + +## Architecture + +- **Module**: `git.codelab.vc/pkg/dbx`, Go 1.24, depends on pgx/v5 +- **Single package** `dbx` (+ `dbxtest` for test helpers) + +### Core patterns + +- **Cluster** is the entry point — connects master + replicas, routes writes to master, reads to replicas with master fallback +- **Routing is method-based**: `Exec`/`Query`/`QueryRow`/`Begin`/`BeginTx`/`CopyFrom`/`SendBatch` → master; `ReadQuery`/`ReadQueryRow` → replicas +- **Retry** with exponential backoff + jitter, node fallback; retrier.do() iterates nodes then backs off +- **Balancer** interface (`Next([]*Node) *Node`) — built-in `RoundRobinBalancer` skips unhealthy nodes +- **Health checker** — background goroutine pings all nodes on an interval, flips `Node.healthy` atomic bool +- **RunTx** — panic-safe transaction wrapper: recovers panics, rolls back, re-panics +- **Querier injection** — `InjectQuerier`/`ExtractQuerier` pass `Querier` via context for service layers + +### Error classification + +- `IsRetryable(err)` — connection errors (class 08), serialization failures (40001), deadlocks (40P01), too_many_connections (53300) +- `IsConnectionError(err)` — PG class 08 + string matching for pgx-wrapped errors +- `IsConstraintViolation(err)` — PG class 23 +- `PgErrorCode(err)` — extract raw code from `*pgconn.PgError` + +## Conventions + +- Struct-based `Config` with `defaults()` method (not functional options for NewCluster constructor, but `Option` type exists for `ApplyOptions` in tests) +- Functional options (`Option func(*Config)`) used via `ApplyOptions` (e.g., in dbxtest) +- stdlib-only tests — no testify, no gomock +- `atomic.Bool` for thread safety (`Node.healthy`, `Cluster.closed`) +- `dbxtest.NewTestCluster` skips tests when DB unreachable, auto-closes via `t.Cleanup` +- `dbxtest.TestLogger` writes to `testing.T` for test log output + +## See also + +- `AGENTS.md` — universal AI agent guide with common tasks, gotchas, and ASCII diagrams diff --git a/README.md b/README.md index 31765b8..05e1fb6 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,178 @@ # dbx +PostgreSQL cluster library for Go built on pgx/v5. Master/replica routing, automatic retries with exponential backoff, round-robin load balancing, background health checking, panic-safe transactions, and context-based Querier injection. + +``` +go get git.codelab.vc/pkg/dbx +``` + +## Quick start + +```go +cluster, err := dbx.NewCluster(ctx, dbx.Config{ + Master: dbx.NodeConfig{ + Name: "master", + DSN: "postgres://user:pass@master:5432/mydb", + Pool: dbx.PoolConfig{MaxConns: 20, MinConns: 5}, + }, + Replicas: []dbx.NodeConfig{ + {Name: "replica-1", DSN: "postgres://user:pass@replica1:5432/mydb"}, + {Name: "replica-2", DSN: "postgres://user:pass@replica2:5432/mydb"}, + }, +}) +if err != nil { + log.Fatal(err) +} +defer cluster.Close() + +// Write → master +cluster.Exec(ctx, "INSERT INTO users (name) VALUES ($1)", "alice") + +// Read → replica with automatic fallback to master +rows, err := cluster.ReadQuery(ctx, "SELECT * FROM users WHERE active = $1", true) + +// Transaction → master, panic-safe +cluster.RunTx(ctx, func(ctx context.Context, tx pgx.Tx) error { + tx.Exec(ctx, "UPDATE accounts SET balance = balance - $1 WHERE id = $2", 100, fromID) + tx.Exec(ctx, "UPDATE accounts SET balance = balance + $1 WHERE id = $2", 100, toID) + return nil +}) +``` + +## Components + +| Component | What it does | +|-----------|-------------| +| `Cluster` | Entry point. Connects to master + replicas, routes queries, manages lifecycle. | +| `Node` | Wraps `pgxpool.Pool` with health state and a human-readable name. | +| `Balancer` | Interface for replica selection. Built-in: `RoundRobinBalancer`. | +| `retrier` | Exponential backoff with jitter, node fallback, custom error classifiers. | +| `healthChecker` | Background goroutine that pings all nodes on an interval. | +| `Querier` injection | `InjectQuerier` / `ExtractQuerier` — context-based Querier for service layers. | +| `MetricsHook` | Optional callbacks: query start/end, retry, node up/down, replica fallback. | + +## Routing + +The library uses explicit method-based routing (no SQL parsing): + +``` + ┌──────────────┐ + │ Cluster │ + └──────┬───────┘ + │ + ┌───────────────┴───────────────┐ + │ │ + Write ops Read ops + Exec, Query, QueryRow ReadQuery, ReadQueryRow + Begin, BeginTx, RunTx + CopyFrom, SendBatch + │ │ + ▼ ▼ + ┌──────────┐ ┌────────────────────────┐ + │ Master │ │ Balancer → Replicas │ + └──────────┘ │ fallback → Master │ + └────────────────────────┘ +``` + +Direct node access: `cluster.Master()` and `cluster.Replica()` return `DB`. + +## Multi-replica setup + +```go +cluster, _ := dbx.NewCluster(ctx, dbx.Config{ + Master: dbx.NodeConfig{ + Name: "master", + DSN: "postgres://master:5432/mydb", + Pool: dbx.PoolConfig{MaxConns: 20, MinConns: 5}, + }, + Replicas: []dbx.NodeConfig{ + {Name: "replica-1", DSN: "postgres://replica1:5432/mydb"}, + {Name: "replica-2", DSN: "postgres://replica2:5432/mydb"}, + {Name: "replica-3", DSN: "postgres://replica3:5432/mydb"}, + }, + Retry: dbx.RetryConfig{ + MaxAttempts: 5, + BaseDelay: 100 * time.Millisecond, + MaxDelay: 2 * time.Second, + }, + HealthCheck: dbx.HealthCheckConfig{ + Interval: 3 * time.Second, + Timeout: 1 * time.Second, + }, +}) +defer cluster.Close() +``` + +## Transactions + +`RunTx` is panic-safe — if the callback panics, the transaction is rolled back and the panic is re-raised: + +```go +err := cluster.RunTx(ctx, func(ctx context.Context, tx pgx.Tx) error { + _, err := tx.Exec(ctx, "UPDATE accounts SET balance = balance - $1 WHERE id = $2", amount, fromID) + if err != nil { + return err + } + _, err = tx.Exec(ctx, "UPDATE accounts SET balance = balance + $1 WHERE id = $2", amount, toID) + return err +}) +``` + +For custom isolation levels use `RunTxOptions`: + +```go +cluster.RunTxOptions(ctx, pgx.TxOptions{ + IsoLevel: pgx.Serializable, +}, fn) +``` + +## Context-based Querier injection + +Pass the Querier through context so service layers work both inside and outside transactions: + +```go +// Repository +func CreateUser(ctx context.Context, db dbx.Querier, name string) error { + q := dbx.ExtractQuerier(ctx, db) + _, err := q.Exec(ctx, "INSERT INTO users (name) VALUES ($1)", name) + return err +} + +// Outside transaction — uses cluster directly +CreateUser(ctx, cluster, "alice") + +// Inside transaction — uses tx +cluster.RunTx(ctx, func(ctx context.Context, tx pgx.Tx) error { + ctx = dbx.InjectQuerier(ctx, tx) + return CreateUser(ctx, cluster, "alice") // will use tx from context +}) +``` + +## Error classification + +```go +dbx.IsRetryable(err) // connection errors, serialization failures, deadlocks, too_many_connections +dbx.IsConnectionError(err) // PG class 08 + common connection error strings +dbx.IsConstraintViolation(err) // PG class 23 (unique, FK, check violations) +dbx.PgErrorCode(err) // extract raw PG error code +``` + +Sentinel errors: `ErrNoHealthyNode`, `ErrClusterClosed`, `ErrRetryExhausted`. + +## dbxtest helpers + +The `dbxtest` package provides test helpers: + +```go +func TestMyRepo(t *testing.T) { + cluster := dbxtest.NewTestCluster(t, dbx.WithLogger(&dbxtest.TestLogger{T: t})) + // cluster is auto-closed when test finishes + // skips test if DB is not reachable +} +``` + +Set `DBX_TEST_DSN` env var to override the default DSN (`postgres://postgres:postgres@localhost:5432/dbx_test?sslmode=disable`). + +## Requirements + +Go 1.24+, [pgx/v5](https://github.com/jackc/pgx).